Thread: Strange performance problem with query
Hi All,

Please see the output from the following query analysis:

=# explain analyze
   select count(1)
   from jbpmprocess.jbpm_taskinstance ti
   join jbpmprocess.jbpm_task task on (ti.task_ = task.id_)
   join jbpmprocess.jbpm_processinstance pi on ti.procinst_ = pi.id_
   where ti.isopen_ = true;

                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=47372.04..47372.05 rows=1 width=0) (actual time=647.070..647.071 rows=1 loops=1)
   ->  Hash Join  (cost=44806.99..47336.72 rows=14127 width=0) (actual time=605.077..645.410 rows=20359 loops=1)
         Hash Cond: (ti.task_ = task.id_)
         ->  Hash Join  (cost=44779.80..47115.28 rows=14127 width=8) (actual time=604.874..640.541 rows=20359 loops=1)
               Hash Cond: (ti.procinst_ = pi.id_)
               ->  Index Scan using idx_task_instance_isopen on jbpm_taskinstance ti  (cost=0.00..1995.84 rows=22672 width=16) (actual time=0.011..16.606 rows=20359 loops=1)
                     Index Cond: (isopen_ = true)
                     Filter: isopen_
               ->  Hash  (cost=28274.91..28274.91 rows=1320391 width=8) (actual time=604.601..604.601 rows=1320391 loops=1)
                     Buckets: 262144  Batches: 1  Memory Usage: 51578kB
                     ->  Seq Scan on jbpm_processinstance pi  (cost=0.00..28274.91 rows=1320391 width=8) (actual time=0.004..192.166 rows=1320391 loops=1)
         ->  Hash  (cost=18.75..18.75 rows=675 width=8) (actual time=0.196..0.196 rows=675 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 27kB
               ->  Seq Scan on jbpm_task task  (cost=0.00..18.75 rows=675 width=8) (actual time=0.003..0.106 rows=675 loops=1)
 Total runtime: 652.266 ms
(15 rows)

I'm not sure why the planner insists on a sequential scan of jbpm_processinstance, even though the 22672 rows from jbpm_taskinstance that it has to match against are only around 1-2% of the rows in jbpm_processinstance. So far my best guess is that the values in procinst_ of jbpm_taskinstance are not entirely unique.
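For what it's worth, what the planner thinks about the uniqueness of that column can be checked with something like this (just a sketch; schema and column names taken from the query above):

=# SELECT n_distinct, null_frac, correlation
   FROM pg_stats
   WHERE schemaname = 'jbpmprocess'
     AND tablename = 'jbpm_taskinstance'
     AND attname = 'procinst_';

(A positive n_distinct is the estimated number of distinct values; a negative value is minus the fraction of distinct values relative to the row count. A small positive number would confirm that procinst_ repeats a lot.)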
The very strange thing, though, is how the query plan changes if I repeat the where clause:

=# explain analyze
   select count(1)
   from jbpmprocess.jbpm_taskinstance ti
   join jbpmprocess.jbpm_task task on (ti.task_ = task.id_)
   join jbpmprocess.jbpm_processinstance pi on ti.procinst_ = pi.id_
   where ti.isopen_ = true and ti.isopen_ = true;

                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2074.61..2074.62 rows=1 width=0) (actual time=80.126..80.126 rows=1 loops=1)
   ->  Hash Join  (cost=27.19..2074.24 rows=151 width=0) (actual time=0.217..77.959 rows=20359 loops=1)
         Hash Cond: (ti.task_ = task.id_)
         ->  Nested Loop  (cost=0.00..2044.97 rows=151 width=8) (actual time=0.016..71.429 rows=20359 loops=1)
               ->  Index Scan using idx_task_instance_isopen on jbpm_taskinstance ti  (cost=0.00..29.72 rows=243 width=16) (actual time=0.012..16.928 rows=20359 loops=1)
                     Index Cond: ((isopen_ = true) AND (isopen_ = true))
                     Filter: (isopen_ AND isopen_)
               ->  Index Scan using jbpm_processinstance_pkey on jbpm_processinstance pi  (cost=0.00..8.28 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=20359)
                     Index Cond: (id_ = ti.procinst_)
         ->  Hash  (cost=18.75..18.75 rows=675 width=8) (actual time=0.196..0.196 rows=675 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 27kB
               ->  Seq Scan on jbpm_task task  (cost=0.00..18.75 rows=675 width=8) (actual time=0.002..0.107 rows=675 loops=1)
 Total runtime: 80.170 ms

I get a similar plan for the original query if I set enable_seqscan to off (see the P.S. below), and I much prefer the second result.

My questions are:
1. Why is this happening?
2. How can I encourage the behaviour of the second query without changing the original query? Is there some column-level setting I can set?

(BTW the tables are analyzed, and I currently have no special settings/attributes set for any of the tables.)

--
Kind Regards
Stefan
Cell : 072-380-1479
Desk : 087-577-7241
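P.S. For anyone wanting to reproduce the comparison: a transaction-local toggle keeps the setting from affecting anything outside that one test (just a sketch, not necessarily exactly what I ran):

=# begin;
=# set local enable_seqscan = off;   -- only affects the current transaction
=# explain analyze
   select count(1)
   from jbpmprocess.jbpm_taskinstance ti
   join jbpmprocess.jbpm_task task on (ti.task_ = task.id_)
   join jbpmprocess.jbpm_processinstance pi on ti.procinst_ = pi.id_
   where ti.isopen_ = true;
=# rollback;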
"Van Der Berg, Stefan" <SvanderBerg@fnb.co.za> wrote: > I get a similar plan selected on the original query if I set > enable_seqscan to off. I much prefer the second result. > My questions are: > 1. Why is this happening? Your cost factors don't accurately model actual costs. > 2. How can I encourage the behavior of the second query without > changing the original query? You didn't give enough information to really give solid advice, but when people see what you are seeing, some common tuning needed is: Set shared_buffers to about 25% of system RAM or 8GB, whichever is lower. Set effective_cache_size to 50% to 75% of system RAM. Set work_mem to about 25% of system RAM divided by max_connections. If you have a high cache hit ratio (which you apparently do) reduce random_page_cost, possibly to something near or equal to seq_page_cost. Increase cpu_tuple_cost, perhaps to 0.03. You might want to play with the above, and if you still have a problem, read this page and post with more detail: http://wiki.postgresql.org/wiki/SlowQueryQuestions > Is there some column level setting I can set? The statistics looked pretty accurate, so that shouldn't be necessary. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Kevin,

Thanks for the advice. I opted for setting random_page_cost a bit lower, as that made the most sense in the context of the current setup, where there is quite a high cache hit ratio. Is 97% high enough?

=# SELECT 'cache hit rate' AS name,
          sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
   FROM pg_statio_user_tables;
      name      |         ratio
----------------+------------------------
 cache hit rate | 0.97344836172381212996

When I set random_page_cost down from 4 to 2, the query plan changes to the faster one.

Kind Regards
Stefan
Cell : 072-380-1479
Desk : 087-577-7241

On 2014/09/15 03:25 PM, Kevin Grittner wrote:
> If you have a high cache hit ratio (which you apparently do), reduce
> random_page_cost, possibly to something near or equal to
> seq_page_cost.
> [...]
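P.S. In case it helps anyone else, the change can be tried per session and then pinned to just this database rather than editing postgresql.conf globally (the database name below is a placeholder, not our actual one):

-- test in the current session only
SET random_page_cost = 2;

-- persist for new connections to this database only
ALTER DATABASE jbpm SET random_page_cost = 2;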