Acceleration query SELECT COUNT(*) for large tables in PostgreSQL

As is well known, the queries SELECT COUNT(*) from big tables in PostgreSQL are very slow. Offer a complete solution for acceleration of this query by using functions and triggers.

Consider the example of a table with ~200 000 records:
SELECT COUNT(*) FROM users;
count
— 205043

Now do the query with enable_seqscan included:

the
SET enable_seqscan TO on;
EXPLAIN ANALYZE SELECT COUNT(*) FROM users;

QUERY PLAN
— Aggregate (cost=15813.70 15813.71..rows=1 width=0) (actual time=..82.907 82.908 rows=1 loops=1)
-> Seq Scan on users (cost=0.00..15296.16 rows=207016 width=0) (actual time=0.014..55.505 rows=205043 loops=1)
Planning time: 0.211 ms
Execution time: 82.967 ms

Run time was: 82.967 ms.

Now with enable_seqscan turned off:

the
SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT COUNT(*) FROM users;

QUERY PLAN
— Aggregate (cost=20156.95 20156.96..rows=1 width=0) (actual time=..117.553 117.554 rows=1 loops=1)
-> Bitmap Heap Scan on users (cost=4343.25 19639.41..rows=207016 width=0) (actual time=..rows 28.354 92.228=205043 loops=1)
Heap Blocks: exact=13226
-> Bitmap Index Scan on users_pkey (cost=0.00..4291.50 rows=207016 width=0) (actual time=..rows 25.247 25.247=229621 loops=1)
Planning time: 0.141 ms
Execution time: 117.724 ms

Run time was: 117.724 ms, although in this case postgresql is used the index users_pkey, but it got worse.

As you can see the execution time of these queries is too large.

I have developed a solution to reduce the execution time of the query to less than 1ms. The solution is the following:

1. for each table to start the count of the number of records with the name ТАБЛИЦА_count_seq.
2. write the function rows_count(), which will output the value of the counter or reset the counter.
3. write the function rows_count_update_trigger() that will run the triggers of the tables to automatically change the counter when the queries INSERT, DELETE, TRUNCATE.
4. the plug-in triggers that will:
— increase the counter after INSERT
zoom in DELETE
to reset TRUNCATE
5. instead of SELECT COUNT(*), will use SELECT rows_count('TABLE')

So, let's begin.

1. Create a counter which will store the current number of records in the table.

the
CREATE SEQUENCE users_count_seq MINVALUE 0 START 0;

2. Function rows_count () output the counter value or reset.

the
CREATE OR REPLACE FUNCTION rows_count(
text tablename, 
reset bool default false )
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
rows_count bigint;
tablename_seq text;
BEGIN
tablename_seq := tablename || '_count_seq';
-- reset the counter
IF reset IS TRUE THEN
EXECUTE 'SELECT setval($1,count(*)) FROM '||tablename
USING tablename_seq
INTO rows_count;
-- output the current count
ELSE
EXECUTE 'SELECT last_value FROM '||tablename_seq
INTO rows_count;
END IF;
RETURN rows_count;
END;
$$;

3. Function rows_count_update_trigger() — trigger function for automatic change counter.

the
CREATE OR REPLACE FUNCTION rows_count_update_trigger()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
DECLARE
tablename_seq text;
BEGIN
tablename_seq := TG_TABLE_NAME || '_count_seq';
-- increase the counter with INSERT
IF TG_OP = 'INSERT' THEN
EXECUTE 'SELECT nextval($1) FOR UPDATE'
USING tablename_seq;
RETURN NEW;
-- decrement the counter when you DELETE
ELSEIF TG_OP = 'DELETE' THEN
EXECUTE 'SELECT setval($1,nextval($1)-2) FOR UPDATE'
USING tablename_seq;
RETURN OLD;
-- reset the counter when TRUNCATE
ELSEIF TG_OP = 'TRUNCATE' THEN
EXECUTE 'SELECT setval($1,0) FOR UPDATE'
USING tablename_seq;
RETURN OLD;
END IF;
END;
$$;

4. The connection of the trigger function to a table.

CREATE TRIGGER rows_count_update_trigger
AFTER INSERT OR DELETE ON users
FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger();

the
CREATE CONSTRAINT TRIGGER rows_count_update_trigger
AFTER INSERT OR DELETE ON users INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger();
CREATE TRIGGER rows_count_reset_trigger
AFTER TRUNCATE ON users
FOR EACH STATEMENT EXECUTE PROCEDURE rows_count_update_trigger();

5. Watch rezultaty using SELECT rows_count('TABLE')

First you need to reset the counter, so it kept the current number of records in the table.

Do a reset:

SELECT rows_count('users',true);
rows_count
— 205043

We see that the counter has reset and shows the actual number of records 205043. SELECT query rows_count('users') will return the same result 205043.

Analysis of a SELECT query rows_count('users'):

the
EXPLAIN ANALYZE SELECT rows_count('users');

Conclusion:
QUERY PLAN
— Result (cost=0.00..5.25 rows=1000 width=0) (actual time=0.242..0.244 rows=1 loops=1)
Planning time: 0.033 ms

lead Time were: 0.260 ms.

Another advantage is that the time to perform SELECT rows_count('TABLE') will always be the same for any quantity of records in the table.

Thank you for your attention.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Templates ESKD and GOST 7.32 for Lyx 1.6.x

Monitoring PostgreSQL + php-fpm + nginx + disk using Zabbix

Custom table in MODx Revolution