Table partitioning models in Django with PostgreSQL

Hello.
This topic is about how relatively quickly and painlessly configure partitioning (protezirovanie) table for months, if you use Django+PostgreSQL. Much of the above is suitable for other frameworks and ORM.

About what partitioning is and why it is needed, you can read, for example, here, here and here.

So, there is a project on Django and the table is one of the models must be very large. If the reading from this table occurs frequently, and always known the time period in which the recording was made, the partitioning can speed up the database.

Every time to write queries to include a partition is not very desirable, so let's try to automate it. Well, if the outcome is something I can use and not much familiar with SQL. I've read the docs, so you don't have to.

First I will tell you how quickly try my groundwork, and then about what's under the hood. Act as follows:

Catch the syncdb to add commands partitioning.
Plug-in SQL that will pull the set indexes, create partitions, activate indexes on them, add functions, and triggers.
Activate the partitioning into the target applications and models.

Please install the package from repository
the

pip install git+https://github.com/podshumok/django-post-syncdb-hooks

and connect a couple of apps:
the

INSTALLED APPS = (
# ...
'post_syncdb_hooks',
'post_syncdb_hooks.partitioning',

'your.app',
# ...
)

Suppose there is a model in the yourapp/models.py:
the

from django.db import models

class MyModel(models.Model):
date_created = models.DateTimeField()
my_data = models.IntegerField(db_index=True)

add the file yourapp/sql/post_syncdb-hook.postgresql_psycopg2.sql (you can edit intervals to the needs):
the

SELECT month_partition_creation(
date_trunc('MONTH', NOW())::date,
date_trunc('MONTH', NOW() + INTERVAL '1 year' )::date,
'yourapp_mymodel', 'date_created');

run syncdb:
the

./manage.py syncdb

... and a section created.

If this is the first syncdb, the created partitions will not have indexes. To fix this, you need to run syncdb again.

Well, the base is now ready, but Django yet. The fact that, since version 1.3 Django is designed for PostgreSQL, the queries INSERT INTO, adding RETURNING... to id inserted records. A partitioning method that we use does not support this feature.

We can roughly make Django to not use the RETURNING anywhere:
the

from django.db import connections, DEFAULT_DB_ALIAS

connections[DEFAULT_DB_ALIAS].features.can_return_id_from_insert = False

And can edit our model so that the RETURNING used only:
the

from post_syncdb_hooks.partitioning import to_partition

class MyModel(models.Model):
"..."
#...

@to_partition
def save(self, *args, **kwargs):
return super(MyModel, self).save(*args, **kwargs)

So, the base is ready, Django is ready, but are we ready? To the read requests have not resulted in a survey base of all sections, we need to filter QuerySet's on the field, for which the partitioning is implemented (in the example — date_created):
the

qs = MyModel.objects.filter(date_created__lte=..., date_created__gt=...)

Consider also that never was, unnecessarily, the count() all entries: for example, paginator in the admin loves to do that.

Now everything is ready. All.

Pro Hood `post_syncdb_hooks`

post_syncdb_hooks contains management.py, in which the receiver is connected to signal post_syncdb. This receiver or hook is invoked for all installed applications. It scans, does not lie next to the file models.py of the application folder is sql, and if is, whether there is a file post_syncdb-hook.sql or post_syncdb-hook.(backend).sql that can be run.

post_syncdb_hooks.partitioning just contains the SQL file. In it we create the SQL function that takes four parameters: a start date, an end date, table name and field name. When calling the function, for the specified table creates partitions on the selected field, from the starting date to the ending date increments per month, as well as a trigger for INSERT INTO indexes and tables-sections (if indexes are defined for the base table).

Hood about the indexes

The most boring and difficult part for me was creating indexes for the partitions. When you have written the SQL to create the table and we go to the schema with partitioning, there is no problem — copy-paste. But when Django creates tables, it is not clear how to force it not to create the indexes for the master table, but save the corresponding SQL. In the end, I decided: let Django will create the indexes (they will still be empty), and I will copy them in the section.

Just have to figure out how to do it.

From psql has the option -E, which causes it to output the SQL generated by internal commands. So,
the

sudo -u postgres psql -E db

db=# \di
********* QUERY **********
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
c2.relname as "Table"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
LEFT JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid
WHERE c.relkind IN ('i',")
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************
db=#

A little copypasta and we have everything in order to create indexes for tables-partitions.

Thank you!

The package I used it in a few projects and quite successfully copes with its tasks.
Thanks to all who read it. Hope made useful to someone.
Also please note in the comments and on github found the flaws, suggest improvements and extensions.

Article based on information from habrahabr.ru

Поиск по этому блогу

computer express