Thread: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
On a 9.3.1 server , I have a key busy_table in that is hit by most transactions running on our system. One DB's copy of this table has 60K rows and 1/3 of that tables rows can updated every minute. Autovacuum autovacuum_analyze_scale_factor is set 0.02, so that analyse runs nearly every minute. But when autovacuum vacuum runs I sometimes see the following message in logs: LOG: automatic vacuum of table "busy_table":* index scans: 0* pages: 0 removed, 22152 remain tuples: 0 removed, 196927 remain buffer usage: 46241 hits, 478 misses, 715 dirtied avg read rate: 0.561 MB/s, avg write rate: 0.839 MB/s system usage: CPU 0.07s/0.06u sec elapsed 6.66 sec and the tuples remaining is then overestimated by a factor >3 , and have seen this over estimate as large at >20 times IE 5M This causes the query planner to then fail to get the best plan, in fact this can result in queries that take 30 Minutes that normally return in 4-6 seconds. IE it starts table scanning tables in joins to busy_table rather than using the index. As soon as following appears: LOG: automatic vacuum of table "busy_table": *index scans: 1* all is well again for the queries that follow this , but for the 20-30 user interactions during the bad period never return as they take too long. -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
Just to add a little more detail about my busy_table 1) there are no rows deleted 2) 98% of updates are HOT 3) there are two DB's on this postgres instance both with the same table, both seeing the same issue -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5804424.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
Tom Lane
Date:
tim_wilson <tim.wilson@telogis.com> writes: > On a 9.3.1 server , I have a key busy_table in that is hit by most > transactions running on our system. One DB's copy of this table has 60K rows > and 1/3 of that tables rows can updated every minute. > Autovacuum autovacuum_analyze_scale_factor is set 0.02, so that analyse runs > nearly every minute. But when autovacuum vacuum runs I sometimes see the > following message in logs: > LOG: automatic vacuum of table "busy_table":* index scans: 0* > pages: 0 removed, 22152 remain > tuples: 0 removed, 196927 remain > buffer usage: 46241 hits, 478 misses, 715 dirtied > avg read rate: 0.561 MB/s, avg write rate: 0.839 MB/s > system usage: CPU 0.07s/0.06u sec elapsed 6.66 sec > and the tuples remaining is then overestimated by a factor >3 , and have > seen this over estimate as large at >20 times IE 5M FWIW, I tried to reproduce this without success. There's some code in there that attempts to extrapolate the total number of live tuples when VACUUM has not scanned the entire table. It's surely plausible that that logic went off the rails ... but without a test case or at least a more specific description of the problem scenario, it's hard to know what's wrong exactly. regards, tom lane
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
Thanks for you response Tom: but what does index_scans:0 mean? vs index scans: 1? I have had a look at the c code but cannot see when it that would be the case. -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5806283.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
Tom Lane
Date:
tim_wilson <tim.wilson@telogis.com> writes: > Thanks for you response Tom: > but what does index_scans:0 mean? vs index scans: 1? I believe the former means that VACUUM found no removable tuples, so it had no need to make any passes over the table's indexes. (Ordinarily you wouldn't see the number of scans as more than 1, unless VACUUM removed quite a lot of dead tuples, more than it could remember within maintenance_work_mem; in which case it would make multiple passes over the indexes to remove index entries.) regards, tom lane
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
I have now created a repeatable test for this ...bug, well that may be debatable, but getting the query plan this wrong after vacum and analyze have run certainly looks like a bug to me. I have created a test case that matches my problem domain but can probably be simplified. postgres_bug.sql <http://postgresql.1045698.n5.nabble.com/file/n5806302/postgres_bug.sql> Even after autovac vacuum and autovac analyze have run the query plan goes from using indexes on the big table to table scanning them as it thinks my unit_test table is large due to the error in the estimate for rows in the table that autovac generated. Once you run a cluster on the table all goes back to using the correct indexes. And the next autovacuum gets things straight again on the stats table. Look forward to solution as this is hurting us. A very hot table on our system goes bad every night and needs constant watching. regards Tim -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5806302.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
Yes I can create a simpler version that exhibits the problem: postgres_bug_simpler.sql <http://postgresql.1045698.n5.nabble.com/file/n5806320/postgres_bug_simpler.sql> This only now involves one smaller table 60K rows, and a linked table with 20M rows. I tried with 6K and 1M but could not get problem to occur. Both are now unchanging in size. The smaller table gets updated frequently, and then starts exhibiting the bad query plan, it seems especially after the 2nd auto vacuum and auto analyze. When the dead_rows goes to zero in the stats the live_tup can stay at an huge factor larger than the table really is for some time. In my system the smaller table that is updated frequently grows only slightly if at all. I never want a table scan to happen of the big table, but even with enable_seq_scan=false set in functions that query these tables I can get the bad query plan. Would it be possible to have a setting on a table that gave an expression for determining the table size? IE For key highly updated tables I could set and maintain the meta-data for the size table and even shape of the data. Then for these tables autovac would not need to make the effort of having to estimate size. regards Tim -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5806320.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
Was my example not able to be repeated or do I need to give you a better example of the problem, or is there just a lot of stuff happening? Happy to do more work on example sql for the problem if it needs it, just need some feed back on how whether this problem is going to be looked at or not. regards Tim -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5806743.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
Tom Lane
Date:
tim_wilson <tim.wilson@telogis.com> writes: > Was my example not able to be repeated or do I need to give you a better > example of the problem, or is there just a lot of stuff happening? The latter ... http://www.postgresql.org/message-id/flat/10787.1402083865@sss.pgh.pa.us regards, tom lane
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
Great thanks a lot. We will be ready to build and test a patch or 9.4 version as soon as you have a test patch you want to try. regards Tim -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5806747.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: autovacuum vacuum creates bad statistics for planner when it log index scans: 0
From
tim_wilson
Date:
Have been looking at lazyvacuum.c trying to think of a way of changing the calculation of the stats to ensure that my hot table stats do not get so badly distorted. What I have noticed is that when I reindex my hot table the stats on the table do not seem to change in pg_stat_user_tables (ie they stay bad) but the EXPLAIN for the query starts using the correct index again. The index itself does not seem very bloated but reindex seems to alter the query optimizer choices. I can't see in index.c where side effect would be caused. Why is this? Not sure if this is entirely helpful as reindex takes exclusive lock, but if I used a concurrent rebuild and rename of the index I might be able to work around this. -- View this message in context: http://postgresql.1045698.n5.nabble.com/autovacuum-vacuum-creates-bad-statistics-for-planner-when-it-log-index-scans-0-tp5804416p5808509.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.