Table partitioning models in Django with PostgreSQL
Hello.
This topic is about how relatively quickly and painlessly configure partitioning (protezirovanie) table for months, if you use Django+PostgreSQL. Much of the above is suitable for other frameworks and ORM.
About what partitioning is and why it is needed, you can read, for example, here, here and here.
So, there is a project on Django and the table is one of the models must be very large. If the reading from this table occurs frequently, and always known the time period in which the recording was made, the partitioning can speed up the database.
Every time to write queries to include a partition is not very desirable, so let's try to automate it. Well, if the outcome is something I can use and not much familiar with SQL. I've read the docs, so you don't have to.
First I will tell you how quickly try my groundwork, and then about what's under the hood. Act as follows:
Please install the package from repository
the
and connect a couple of apps:
the
Suppose there is a model in the
the
add the file
the
run
the
If this is the first
Well, the base is now ready, but Django yet. The fact that, since version 1.3 Django is designed for PostgreSQL, the queries
We can roughly make Django to not use the
the
And can edit our model so that the
the
So, the base is ready, Django is ready, but are we ready? To the read requests have not resulted in a survey base of all sections, we need to filter
the
Consider also that never was, unnecessarily, the
Now everything is ready. All.
Pro Hood
The most boring and difficult part for me was creating indexes for the partitions. When you have written the SQL to create the table and we go to the schema with partitioning, there is no problem — copy-paste. But when Django creates tables, it is not clear how to force it not to create the indexes for the master table, but save the corresponding SQL. In the end, I decided: let Django will create the indexes (they will still be empty), and I will copy them in the section.
Just have to figure out how to do it.
From
the
A little copypasta and we have everything in order to create indexes for tables-partitions.
The package I used it in a few projects and quite successfully copes with its tasks.
Thanks to all who read it. Hope made useful to someone.
Also please note in the comments and on github found the flaws, suggest improvements and extensions.
Article based on information from habrahabr.ru
This topic is about how relatively quickly and painlessly configure partitioning (protezirovanie) table for months, if you use Django+PostgreSQL. Much of the above is suitable for other frameworks and ORM.
About what partitioning is and why it is needed, you can read, for example, here, here and here.
So, there is a project on Django and the table is one of the models must be very large. If the reading from this table occurs frequently, and always known the time period in which the recording was made, the partitioning can speed up the database.
Every time to write queries to include a partition is not very desirable, so let's try to automate it. Well, if the outcome is something I can use and not much familiar with SQL. I've read the docs, so you don't have to.
First I will tell you how quickly try my groundwork, and then about what's under the hood. Act as follows:
-
the
- Catch the
syncdb
to add commands partitioning.
the - Plug-in SQL that will pull the set indexes, create partitions, activate indexes on them, add functions, and triggers. the
- Activate the partitioning into the target applications and models.
Please install the package from repository
the
pip install git+https://github.com/podshumok/django-post-syncdb-hooks
and connect a couple of apps:
the
INSTALLED APPS = (
# ...
'post_syncdb_hooks',
'post_syncdb_hooks.partitioning',
'your.app',
# ...
)
Suppose there is a model in the
yourapp/models.py
:the
from django.db import models
class MyModel(models.Model):
date_created = models.DateTimeField()
my_data = models.IntegerField(db_index=True)
add the file
yourapp/sql/post_syncdb-hook.postgresql_psycopg2.sql
(you can edit intervals to the needs):the
SELECT month_partition_creation(
date_trunc('MONTH', NOW())::date,
date_trunc('MONTH', NOW() + INTERVAL '1 year' )::date,
'yourapp_mymodel', 'date_created');
run
syncdb
:the
./manage.py syncdb
... and a section created.If this is the first
syncdb
, the created partitions will not have indexes. To fix this, you need to run syncdb
again.Well, the base is now ready, but Django yet. The fact that, since version 1.3 Django is designed for PostgreSQL, the queries
INSERT INTO
, adding RETURNING...
to id
inserted records. A partitioning method that we use does not support this feature.We can roughly make Django to not use the
RETURNING
anywhere:the
from django.db import connections, DEFAULT_DB_ALIAS
connections[DEFAULT_DB_ALIAS].features.can_return_id_from_insert = False
And can edit our model so that the
RETURNING
used only:the
from post_syncdb_hooks.partitioning import to_partition
class MyModel(models.Model):
"..."
#...
@to_partition
def save(self, *args, **kwargs):
return super(MyModel, self).save(*args, **kwargs)
So, the base is ready, Django is ready, but are we ready? To the read requests have not resulted in a survey base of all sections, we need to filter
QuerySet
's on the field, for which the partitioning is implemented (in the example — date_created
):the
qs = MyModel.objects.filter(date_created__lte=..., date_created__gt=...)
Consider also that never was, unnecessarily, the
count()
all entries: for example, paginator
in the admin loves to do that.Now everything is ready. All.
Pro Hood post_syncdb_hooks
post_syncdb_hooks
contains management.py
, in which the receiver is connected to signal post_syncdb
. This receiver or hook is invoked for all installed applications. It scans, does not lie next to the file models.py
of the application folder is sql
, and if is, whether there is a file post_syncdb-hook.sql
or post_syncdb-hook.(backend).sql
that can be run.post_syncdb_hooks.partitioning
just contains the SQL file. In it we create the SQL function that takes four parameters: a start date, an end date, table name and field name. When calling the function, for the specified table creates partitions on the selected field, from the starting date to the ending date increments per month, as well as a trigger for INSERT INTO
indexes and tables-sections (if indexes are defined for the base table).Hood about the indexes
The most boring and difficult part for me was creating indexes for the partitions. When you have written the SQL to create the table and we go to the schema with partitioning, there is no problem — copy-paste. But when Django creates tables, it is not clear how to force it not to create the indexes for the master table, but save the corresponding SQL. In the end, I decided: let Django will create the indexes (they will still be empty), and I will copy them in the section.
Just have to figure out how to do it.
From
psql
has the option -E
, which causes it to output the SQL generated by internal commands. So,the
sudo -u postgres psql -E db
db=# \di
********* QUERY **********
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
c2.relname as "Table"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
LEFT JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid
WHERE c.relkind IN ('i',")
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************
db=#
A little copypasta and we have everything in order to create indexes for tables-partitions.
Thank you!
The package I used it in a few projects and quite successfully copes with its tasks.
Thanks to all who read it. Hope made useful to someone.
Also please note in the comments and on github found the flaws, suggest improvements and extensions.
Комментарии
Отправить комментарий