Acceleration query SELECT COUNT(*) for large tables in PostgreSQL
As is well known, the queries SELECT COUNT(*) from big tables in PostgreSQL are very slow. Offer a complete solution for acceleration of this query by using functions and triggers.
Consider the example of a table with ~200 000 records:
Now do the query with enable_seqscan included:
the
Run time was: 82.967 ms.
Now with enable_seqscan turned off:
the
Run time was: 117.724 ms, although in this case postgresql is used the index users_pkey, but it got worse.
As you can see the execution time of these queries is too large.
I have developed a solution to reduce the execution time of the query to less than 1ms. The solution is the following:
1. for each table to start the count of the number of records with the name ТАБЛИЦА_count_seq.
2. write the function rows_count(), which will output the value of the counter or reset the counter.
3. write the function rows_count_update_trigger() that will run the triggers of the tables to automatically change the counter when the queries INSERT, DELETE, TRUNCATE.
4. the plug-in triggers that will:
— increase the counter after INSERT
— zoom in DELETE
to reset TRUNCATE
5. instead of SELECT COUNT(*), will use SELECT rows_count('TABLE')
So, let's begin.
1. Create a counter which will store the current number of records in the table.
the
2. Function rows_count () output the counter value or reset.
the
3. Function rows_count_update_trigger() — trigger function for automatic change counter.
the
4. The connection of the trigger function to a table.
CREATE TRIGGER rows_count_update_trigger
AFTER INSERT OR DELETE ON users
FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger();
the
5. Watch rezultaty using SELECT rows_count('TABLE')
First you need to reset the counter, so it kept the current number of records in the table.
Do a reset:
We see that the counter has reset and shows the actual number of records 205043. SELECT query rows_count('users') will return the same result 205043.
Analysis of a SELECT query rows_count('users'):
the
Conclusion:
lead Time were: 0.260 ms.
Another advantage is that the time to perform SELECT rows_count('TABLE') will always be the same for any quantity of records in the table.
Thank you for your attention.
Article based on information from habrahabr.ru
Consider the example of a table with ~200 000 records:
SELECT COUNT(*) FROM users;
count
— 205043
Now do the query with enable_seqscan included:
the
SET enable_seqscan TO on;
EXPLAIN ANALYZE SELECT COUNT(*) FROM users;
QUERY PLAN
— Aggregate (cost=15813.70 15813.71..rows=1 width=0) (actual time=..82.907 82.908 rows=1 loops=1)
-> Seq Scan on users (cost=0.00..15296.16 rows=207016 width=0) (actual time=0.014..55.505 rows=205043 loops=1)
Planning time: 0.211 ms
Execution time: 82.967 ms
Run time was: 82.967 ms.
Now with enable_seqscan turned off:
the
SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT COUNT(*) FROM users;
QUERY PLAN
— Aggregate (cost=20156.95 20156.96..rows=1 width=0) (actual time=..117.553 117.554 rows=1 loops=1)
-> Bitmap Heap Scan on users (cost=4343.25 19639.41..rows=207016 width=0) (actual time=..rows 28.354 92.228=205043 loops=1)
Heap Blocks: exact=13226
-> Bitmap Index Scan on users_pkey (cost=0.00..4291.50 rows=207016 width=0) (actual time=..rows 25.247 25.247=229621 loops=1)
Planning time: 0.141 ms
Execution time: 117.724 ms
Run time was: 117.724 ms, although in this case postgresql is used the index users_pkey, but it got worse.
As you can see the execution time of these queries is too large.
I have developed a solution to reduce the execution time of the query to less than 1ms. The solution is the following:
1. for each table to start the count of the number of records with the name ТАБЛИЦА_count_seq.
2. write the function rows_count(), which will output the value of the counter or reset the counter.
3. write the function rows_count_update_trigger() that will run the triggers of the tables to automatically change the counter when the queries INSERT, DELETE, TRUNCATE.
4. the plug-in triggers that will:
— increase the counter after INSERT
— zoom in DELETE
to reset TRUNCATE
5. instead of SELECT COUNT(*), will use SELECT rows_count('TABLE')
So, let's begin.
1. Create a counter which will store the current number of records in the table.
the
CREATE SEQUENCE users_count_seq MINVALUE 0 START 0;
2. Function rows_count () output the counter value or reset.
the
CREATE OR REPLACE FUNCTION rows_count(
text tablename,
reset bool default false )
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
rows_count bigint;
tablename_seq text;
BEGIN
tablename_seq := tablename || '_count_seq';
-- reset the counter
IF reset IS TRUE THEN
EXECUTE 'SELECT setval($1,count(*)) FROM '||tablename
USING tablename_seq
INTO rows_count;
-- output the current count
ELSE
EXECUTE 'SELECT last_value FROM '||tablename_seq
INTO rows_count;
END IF;
RETURN rows_count;
END;
$$;
3. Function rows_count_update_trigger() — trigger function for automatic change counter.
the
CREATE OR REPLACE FUNCTION rows_count_update_trigger()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
DECLARE
tablename_seq text;
BEGIN
tablename_seq := TG_TABLE_NAME || '_count_seq';
-- increase the counter with INSERT
IF TG_OP = 'INSERT' THEN
EXECUTE 'SELECT nextval($1) FOR UPDATE'
USING tablename_seq;
RETURN NEW;
-- decrement the counter when you DELETE
ELSEIF TG_OP = 'DELETE' THEN
EXECUTE 'SELECT setval($1,nextval($1)-2) FOR UPDATE'
USING tablename_seq;
RETURN OLD;
-- reset the counter when TRUNCATE
ELSEIF TG_OP = 'TRUNCATE' THEN
EXECUTE 'SELECT setval($1,0) FOR UPDATE'
USING tablename_seq;
RETURN OLD;
END IF;
END;
$$;
4. The connection of the trigger function to a table.
AFTER INSERT OR DELETE ON users
FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger();
the
CREATE CONSTRAINT TRIGGER rows_count_update_trigger
AFTER INSERT OR DELETE ON users INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE rows_count_update_trigger();
CREATE TRIGGER rows_count_reset_trigger
AFTER TRUNCATE ON users
FOR EACH STATEMENT EXECUTE PROCEDURE rows_count_update_trigger();
5. Watch rezultaty using SELECT rows_count('TABLE')
First you need to reset the counter, so it kept the current number of records in the table.
Do a reset:
SELECT rows_count('users',true);
rows_count
— 205043
We see that the counter has reset and shows the actual number of records 205043. SELECT query rows_count('users') will return the same result 205043.
Analysis of a SELECT query rows_count('users'):
the
EXPLAIN ANALYZE SELECT rows_count('users');
Conclusion:
QUERY PLAN
— Result (cost=0.00..5.25 rows=1000 width=0) (actual time=0.242..0.244 rows=1 loops=1)
Planning time: 0.033 ms
lead Time were: 0.260 ms.
Another advantage is that the time to perform SELECT rows_count('TABLE') will always be the same for any quantity of records in the table.
Thank you for your attention.
Комментарии
Отправить комментарий