Thread: hash aggregation
Hello! Is it possible to speed up the plan?

hashes=# \d hashcheck
                                 Table "public.hashcheck"
 Column |       Type        |                        Modifiers
--------+-------------------+--------------------------------------------------------
 id     | integer           | not null default nextval('hashcheck_id_seq'::regclass)
 name   | character varying |
 value  | character varying |
Indexes:
    "btr" btree (name)

hashes=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2), 64-bit
(1 row)

hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=573977.88..573978.38 rows=200 width=32) (actual time=10351.280..10351.551 rows=4000 loops=1)
   Output: name, (count(name))
   Sort Key: hashcheck.name
   Sort Method: quicksort  Memory: 315kB
   ->  HashAggregate  (cost=573968.24..573970.24 rows=200 width=32) (actual time=10340.507..10341.288 rows=4000 loops=1)
         Output: name, count(name)
         ->  Seq Scan on public.hashcheck  (cost=0.00..447669.16 rows=25259816 width=32) (actual time=0.019..2798.058 rows=25259817 loops=1)
               Output: id, name, value
 Total runtime: 10351.989 ms
(9 rows)

hashes=#

Thank you.
On Wed, Oct 10, 2012 at 9:09 AM, Korisk <Korisk@yandex.ru> wrote:
> Hello! Is it possible to speed up the plan?
>  Sort  (cost=573977.88..573978.38 rows=200 width=32) (actual time=10351.280..10351.551 rows=4000 loops=1)
>    Output: name, (count(name))
>    Sort Key: hashcheck.name
>    Sort Method: quicksort  Memory: 315kB
>    ->  HashAggregate  (cost=573968.24..573970.24 rows=200 width=32) (actual time=10340.507..10341.288 rows=4000 loops=1)
>          Output: name, count(name)
>          ->  Seq Scan on public.hashcheck  (cost=0.00..447669.16 rows=25259816 width=32) (actual time=0.019..2798.058 rows=25259817 loops=1)
>                Output: id, name, value
>  Total runtime: 10351.989 ms

AFAIU there is no query-optimization solution for this.

It may be worth creating a table hashcheck_stat (name, cnt) and
incrementing/decrementing the cnt values with triggers if you need to
get counts fast.

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com  Skype: gray-hemp  Phone: +14158679984
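A minimal sketch of the trigger-maintained counter table Sergey describes (the table, function, and trigger names are illustrative, not from the thread; it assumes the hashcheck schema shown above and handles only INSERT/DELETE - an UPDATE of name would need extra handling; 9.2 has no INSERT ... ON CONFLICT, hence the manual upsert):

```sql
-- Hypothetical counter table: one row per distinct name.
CREATE TABLE hashcheck_stat (
    name character varying PRIMARY KEY,
    cnt  bigint NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION hashcheck_stat_maintain() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- Manual upsert: try to bump the counter, insert on first sight.
        UPDATE hashcheck_stat SET cnt = cnt + 1 WHERE name = NEW.name;
        IF NOT FOUND THEN
            INSERT INTO hashcheck_stat (name, cnt) VALUES (NEW.name, 1);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE hashcheck_stat SET cnt = cnt - 1 WHERE name = OLD.name;
    END IF;
    RETURN NULL;  -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER hashcheck_stat_trg
    AFTER INSERT OR DELETE ON hashcheck
    FOR EACH ROW EXECUTE PROCEDURE hashcheck_stat_maintain();
```

The aggregate query then becomes a scan of the small stats table instead of the 25M-row heap:

```sql
SELECT name, cnt FROM hashcheck_stat ORDER BY name DESC;
```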
Thanks for the advice, but a counter table is not acceptable because
there would need to be a great many of them. Nevertheless, the
investigation made some progress (7.4 sec vs 19.6 sec). But with the
index-only scan you can see an abnormal cost calculation, which makes
me suspect small bugs.

Thanks for your answer.

hashes=# \d hashcheck
                                 Table "public.hashcheck"
 Column |       Type        |                        Modifiers
--------+-------------------+--------------------------------------------------------
 id     | integer           | not null default nextval('hashcheck_id_seq'::regclass)
 name   | character varying |
 value  | character varying |
Indexes:
    "hashcheck_name_idx" btree (name)

hashes=# vacuum hashcheck;
VACUUM
hashes=# set random_page_cost=0.1;
SET
hashes=# set seq_page_cost=0.1;
SET
hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=407366.72..407367.22 rows=200 width=32) (actual time=10712.505..10712.765 rows=4001 loops=1)
   Output: name, (count(name))
   Sort Key: hashcheck.name
   Sort Method: quicksort  Memory: 315kB
   ->  HashAggregate  (cost=407357.08..407359.08 rows=200 width=32) (actual time=10702.285..10703.054 rows=4001 loops=1)
         Output: name, count(name)
         ->  Seq Scan on public.hashcheck  (cost=0.00..277423.12 rows=25986792 width=32) (actual time=0.054..2877.100 rows=25990002 loops=1)
               Output: id, name, value
 Total runtime: 10712.989 ms
(9 rows)

hashes=# set enable_seqscan = off;
SET
hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10000000000.00..10000528610.88 rows=200 width=32) (actual time=0.116..7452.005 rows=4001 loops=1)
   Output: name, count(name)
   ->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck  (cost=10000000000.00..10000398674.92 rows=25986792 width=32) (actual time=0.104..3785.767 rows=25990002 loops=1)
         Output: name
         Heap Fetches: 0
 Total runtime: 7452.509 ms
(6 rows)

Thanks to the tricks described (in Russian) at:
http://www.sql.ru/forum/actualthread.aspx?tid=974484

11.10.2012, 01:30, "Sergey Konoplev" <gray.ru@gmail.com>:
> On Wed, Oct 10, 2012 at 9:09 AM, Korisk <Korisk@yandex.ru> wrote:
>> Hello! Is it possible to speed up the plan?
>> [cut]
>
> AFAIU there is no query-optimization solution for this.
>
> It may be worth creating a table hashcheck_stat (name, cnt) and
> incrementing/decrementing the cnt values with triggers if you need to
> get counts fast.
On 10/11/2012 12:13 PM, Korisk wrote:
> Thanks for the advice, but a counter table is not acceptable because
> there would need to be a great many of them.
> Nevertheless, the investigation made some progress (7.4 sec vs 19.6 sec).
> But using IOS scan

"IOS scan"? Do you mean some kind of I/O monitoring tool?

> you can see an abnormal cost calculation, which makes me suspect small bugs.

Abnormal how? The cost estimates aren't times, I/Os, or anything like
that; they're a purely relative figure for comparing plan costs.

> hashes=# set enable_seqscan = off;
> SET

What are your seq_page_cost and random_page_cost?

> hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
>                                                        QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=10000000000.00..10000528610.88 rows=200 width=32) (actual time=0.116..7452.005 rows=4001 loops=1)
>    Output: name, count(name)
>    ->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck
>        ^^^^^^^^^^^^^^^^^^^^^^^^

If you don't mind the increased cost of insert/update/delete, try:

    CREATE INDEX hashcheck_name_rev_idx
    ON public.hashcheck (name DESC);

i.e. create the index in descending order.

--
Craig Ringer
On Wed, Oct 10, 2012 at 9:13 PM, Korisk <Korisk@yandex.ru> wrote:
> ->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck
>     (cost=10000000000.00..10000398674.92 rows=25986792 width=32)

It seems odd. Is it possible to look at the non-default configuration?

SELECT name, setting, reset_val FROM pg_settings WHERE setting <> reset_val;

> (actual time=0.104..3785.767 rows=25990002 loops=1)
> Output: name
> Heap Fetches: 0
> Total runtime: 7452.509 ms
> (6 rows)
>
> Thanks to the tricks described (in Russian) at:
> http://www.sql.ru/forum/actualthread.aspx?tid=974484
>
> 11.10.2012, 01:30, "Sergey Konoplev" <gray.ru@gmail.com>:
> [cut]

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com  Skype: gray-hemp  Phone: +14158679984
> "IOS scan"?

Index Only Scan.

> What are your seq_page_cost and random_page_cost?

hashes=# SELECT name, setting, reset_val FROM pg_settings WHERE setting <> reset_val;
           name           |    setting     | reset_val
--------------------------+----------------+-----------
 archive_command          | (disabled)     |
 enable_bitmapscan        | off            | on
 enable_indexscan         | off            | on
 enable_seqscan           | off            | on
 log_file_mode            | 0600           | 384
 random_page_cost         | 0.1            | 4
 seq_page_cost            | 0.1            | 1
 transaction_isolation    | read committed | default
 unix_socket_permissions  | 0777           | 511
(9 rows)

PostgreSQL 9.2.1 was configured and built with default settings.

Thank you.
On Thu, Oct 11, 2012 at 8:15 AM, Korisk <Korisk@yandex.ru> wrote:
> What are your seq_page_cost and random_page_cost?
> hashes=# SELECT name, setting, reset_val FROM pg_settings WHERE setting <> reset_val;
>            name           |    setting     | reset_val
> --------------------------+----------------+-----------
>  archive_command          | (disabled)     |
>  enable_bitmapscan        | off            | on
>  enable_indexscan         | off            | on
>  enable_seqscan           | off            | on
>  log_file_mode            | 0600           | 384
>  random_page_cost         | 0.1            | 4
>  seq_page_cost            | 0.1            | 1
>  transaction_isolation    | read committed | default
>  unix_socket_permissions  | 0777           | 511

Could you please try to set *_page_cost to 1 and then EXPLAIN ANALYZE it again?

> ->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck
>     (cost=10000000000.00..10000398674.92 rows=25986792 width=32)
>     (actual time=0.104..3785.767 rows=25990002 loops=1)

I am just guessing, but it might be some kind of a precision bug, and I
would like to check this.

> (9 rows)
>
> Postgresql 9.2.1 was configured and built with default settings.
>
> Thank you.

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com  Skype: gray-hemp  Phone: +14158679984
Again the same cost.

hashes=# SELECT name, setting, reset_val FROM pg_settings WHERE setting <> reset_val;
           name           |    setting     | reset_val
--------------------------+----------------+-----------
 archive_command          | (disabled)     |
 enable_bitmapscan        | off            | on
 enable_indexscan         | off            | on
 enable_seqscan           | off            | on
 log_file_mode            | 0600           | 384
 random_page_cost         | 1              | 4
 transaction_isolation    | read committed | default
 unix_socket_permissions  | 0777           | 511
(8 rows)

hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10000000000.00..10000596612.97 rows=200 width=32) (actual time=0.136..7272.240 rows=4001 loops=1)
   Output: name, count(name)
   ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=10000000000.00..10000466660.96 rows=25990002 width=32) (actual time=0.121..3624.624 rows=25990002 loops=1)
         Output: name
         Heap Fetches: 0
 Total runtime: 7272.735 ms
(6 rows)

11.10.2012, 21:55, "Sergey Konoplev" <gray.ru@gmail.com>:
> On Thu, Oct 11, 2012 at 8:15 AM, Korisk <Korisk@yandex.ru> wrote:
>> What are your seq_page_cost and random_page_cost?
>> [cut]
>> random_page_cost         | 0.1            | 4
>> seq_page_cost            | 0.1            | 1
>> [cut]
>
> Could you please try to set *_page_cost to 1 and then EXPLAIN ANALYZE it again?
>
>> ->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck
>>     (cost=10000000000.00..10000398674.92 rows=25986792 width=32)
>>     (actual time=0.104..3785.767 rows=25990002 loops=1)
>
> I am just guessing, but it might be some kind of a precision bug, and
> I would like to check this.
>
> [cut]
On Thu, Oct 11, 2012 at 8:55 PM, Korisk <Korisk@yandex.ru> wrote:
> hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;

Now set enable_bitmapscan and enable_indexscan to on and try it again.
Then set enable_seqscan to on and run it one more time.

>                                                        QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=10000000000.00..10000596612.97 rows=200 width=32) (actual time=0.136..7272.240 rows=4001 loops=1)
>    Output: name, count(name)
>    ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=10000000000.00..10000466660.96 rows=25990002 width=32) (actual time=0.121..3624.624 rows=25990002 loops=1)
>          Output: name
>          Heap Fetches: 0
>  Total runtime: 7272.735 ms
> (6 rows)
>
> 11.10.2012, 21:55, "Sergey Konoplev" <gray.ru@gmail.com>:
> [cut]

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com  Skype: gray-hemp  Phone: +14158679984
Strange situation: after enabling indexscan the cost decreased dramatically.

hashes=# set enable_bitmapscan=on;
SET
hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10000000000.00..10000596612.97 rows=200 width=32) (actual time=0.187..7424.799 rows=4001 loops=1)
   Output: name, count(name)
   ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=10000000000.00..10000466660.96 rows=25990002 width=32) (actual time=0.166..3698.776 rows=25990002 loops=1)
         Output: name
         Heap Fetches: 0
 Total runtime: 7425.403 ms
(6 rows)

hashes=# set enable_indexscan=on;
SET
hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..596612.97 rows=200 width=32) (actual time=0.148..7339.115 rows=4001 loops=1)
   Output: name, count(name)
   ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=0.00..466660.96 rows=25990002 width=32) (actual time=0.129..3653.848 rows=25990002 loops=1)
         Output: name
         Heap Fetches: 0
 Total runtime: 7339.592 ms
(6 rows)

hashes=# set enable_seqscan=on;
SET
hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=565411.67..565412.17 rows=200 width=32) (actual time=21746.799..21747.026 rows=4001 loops=1)
   Output: name, (count(name))
   Sort Key: hashcheck.name
   Sort Method: quicksort  Memory: 315kB
   ->  HashAggregate  (cost=565402.03..565404.03 rows=200 width=32) (actual time=21731.551..21733.277 rows=4001 loops=1)
         Output: name, count(name)
         ->  Seq Scan on public.hashcheck  (cost=0.00..435452.02 rows=25990002 width=32) (actual time=29.431..13383.812 rows=25990002 loops=1)
               Output: id, name, value
 Total runtime: 21747.356 ms
(9 rows)
Hi,

On 12 October 2012 15:14, Korisk <Korisk@yandex.ru> wrote:
> Strange situation: after enabling indexscan the cost decreased dramatically.

You cannot really disable any scan method; enable_xxx = off just sets a
very high cost (10000000000) for that operation.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)
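This is easy to see in a session sketch (the plan line below is illustrative, not actual output from the thread): the planner adds a fixed penalty (disable_cost = 1.0e10 in the source) to "disabled" paths, which is where the 10000000000.00 in the EXPLAIN output above comes from.

```sql
SET enable_seqscan = off;
-- The seq scan can still be chosen if no other path exists; it just
-- carries the 1.0e10 penalty in its estimate, e.g. something like:
EXPLAIN SELECT count(*) FROM hashcheck;
--   Seq Scan on hashcheck  (cost=10000000000.00..10000447669.16 ...)
```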
On Thu, Oct 11, 2012 at 9:14 PM, Korisk <Korisk@yandex.ru> wrote:
> Strange situation: after enabling indexscan the cost decreased dramatically.

AFAIK when the planner has to choose between index scans and seq scans
and both of these options are off, it uses one of these strategies
anyway, but puts 10000000000.00 as a lower cost for it (thanks Maxim
Boguk for the explanation in chat).

> ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=10000000000.00..10000466660.96 rows=25990002 width=32) (actual time=0.166..3698.776 rows=25990002 loops=1)

So when you enabled one of these options it started using it as usual.

> hashes=# set enable_indexscan=on;
> SET
> hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;

[cut]

> ->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  (cost=0.00..466660.96 rows=25990002 width=32) (actual time=0.129..3653.848 rows=25990002 loops=1)

What I cannot understand is why the seq scan's estimated cost is better
than the index scan's. It depends on the number of pages in the
index/relation. Maybe the index is heavily bloated?

Let's see the sizes:

select pg_total_relation_size('hashcheck');
select pg_total_relation_size('hashcheck_name_rev_idx');

> hashes=# set enable_seqscan=on;
> SET
> hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
> [cut]
>  Total runtime: 21747.356 ms
> (9 rows)

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com  Skype: gray-hemp  Phone: +14158679984
> What I cannot understand is why the seq scan's estimated cost is better
> than the index scan's. It depends on the number of pages in the
> index/relation. Maybe the index is heavily bloated?

Mm, I don't know how to check the bloat level. But the index was created with

    create index on hashcheck using btree (name)

after the table was populated.

Sizes:

hashes=# select pg_total_relation_size('hashcheck');
 pg_total_relation_size
------------------------
             2067701760
(1 row)

hashes=# select pg_total_relation_size('hashcheck_name_rev_idx');
 pg_total_relation_size
------------------------
              629170176
(1 row)
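One way to check b-tree bloat, not mentioned in the thread but available in this release: the contrib module pgstattuple ships a pgstatindex() function (installable as an extension since 9.1). A sketch, assuming the contrib modules were built:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density near 90% is typical for a freshly built index;
-- much lower values (with many leaf pages) suggest bloat.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('hashcheck_name_rev_idx');
```

Since this index was built after the table was populated, heavy bloat is unlikely here; the check mainly rules it out.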
On 12.10.2012 09:10, Sergey Konoplev wrote:
> What I cannot understand is why the seq scan's estimated cost is better
> than the index scan's. It depends on the number of pages in the
> index/relation. Maybe the index is heavily bloated?

The IOS cost depends on other things too. The index can't be read simply
as a sequence of pages; the scan needs to jump around the tree to read
the tuples in the right order. With the index size being close to the
size of the table, the cost of these operations may easily outweigh the
benefits.

And I suspect this is the case here, because the table has only three
columns (an INT and two text ones), and each row has some overhead (the
header) that may further decrease the difference between index and table
size.

Nevertheless, the cost estimate here is wrong - either it's estimating
something incorrectly, or maybe everything is in the cache and the
planner does not know about that.

Tomas
On 11.10.2012 17:15, Korisk wrote:
> hashes=# SELECT name, setting, reset_val FROM pg_settings WHERE setting <> reset_val;
>            name           |    setting     | reset_val
> --------------------------+----------------+-----------
>  archive_command          | (disabled)     |
>  enable_bitmapscan        | off            | on
>  enable_indexscan         | off            | on
>  enable_seqscan           | off            | on
>  log_file_mode            | 0600           | 384
>  random_page_cost         | 0.1            | 4
>  seq_page_cost            | 0.1            | 1
>  transaction_isolation    | read committed | default
>  unix_socket_permissions  | 0777           | 511
> (9 rows)
>
> Postgresql 9.2.1 was configured and built with default settings.

Hi,

so how much RAM does the system have? Because if you're using the
default shared_buffers size (32MB IIRC), that's the first thing you
should bump up. It's usually recommended to set it to ~25% of RAM, but
not more than ~10GB.

Set also work_mem and maintenance_work_mem, depending on the amount of
RAM you have. Then set effective_cache_size to 75% of RAM (this is just
a hint to the planner; it won't really allocate memory).

Restart the database and try the queries again. Don't run them with
EXPLAIN ANALYZE, because that adds overhead that may easily make some of
the queries much slower. It's great to see the estimates and actual row
counts, but for timing queries it's a poor choice (even the TIMING OFF
option added in 9.2 is not exactly overhead-free). Maybe this is what
made the seqscan look much slower?

I usually run them from psql like this

    \o /dev/null
    \timing on
    SELECT ...

which gives me more reliable timing results (especially when executed
right on the server).

Only if all this tuning fails is it time to fine-tune the knobs, i.e.
the cost variables. Please don't change seq_page_cost - always keep it
at 1.0 and change only the other values.

For example, if everything fits into RAM you may change random_page_cost
to 1.5 or lower (I'd never recommend setting it lower than
seq_page_cost), and then you may start tuning the cpu_* costs. But
please, this is the last thing you should do - tune the server properly
first.

There's even a very nice wiki page about tuning PostgreSQL servers:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

regards
Tomas
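The advice above, collected into a postgresql.conf fragment (the values assume a hypothetical 16 GB server, which the thread never states; scale them to your actual RAM):

```
# postgresql.conf -- illustrative values for a 16 GB machine
shared_buffers = 4GB            # ~25% of RAM, but not more than ~10GB
work_mem = 64MB                 # per sort/hash operation, per backend
maintenance_work_mem = 1GB      # for VACUUM, CREATE INDEX, etc.
effective_cache_size = 12GB     # ~75% of RAM; planner hint only, no allocation
seq_page_cost = 1.0             # always keep at 1.0
random_page_cost = 1.5          # only lower this if the data set fits in RAM
```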