Thread: autovacuum vacuum creates bad statistics for planner when it logs index scans: 0

On a 9.3.1 server, I have a key table, busy_table, that is hit by most
transactions running on our system. One DB's copy of this table has 60K rows,
and 1/3 of that table's rows can be updated every minute.

autovacuum_analyze_scale_factor is set to 0.02, so analyze runs nearly every
minute. But when an autovacuum vacuum runs, I sometimes see the following
message in the logs:

LOG:  automatic vacuum of table "busy_table": index scans: 0
        pages: 0 removed, 22152 remain
        tuples: 0 removed, 196927 remain
        buffer usage: 46241 hits, 478 misses, 715 dirtied
        avg read rate: 0.561 MB/s, avg write rate: 0.839 MB/s
        system usage: CPU 0.07s/0.06u sec elapsed 6.66 sec

The "tuples remain" figure is then overestimated by a factor of more than 3,
and I have seen this overestimate as large as more than 20 times, i.e. 5M rows.

This causes the query planner to fail to get the best plan; in fact, it can
result in queries that normally return in 4-6 seconds taking 30 minutes.

I.e. it starts sequentially scanning the tables joined to busy_table rather
than using the index.
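For what it's worth, this is roughly how I confirm the misestimate after one
of these vacuums: the planner's row estimate comes from pg_class.reltuples,
so I compare it against a real count. A minimal sketch, assuming the table is
named busy_table as above:

    -- planner's current idea of the row count
    SELECT reltuples::bigint AS estimated_rows, relpages
    FROM pg_class
    WHERE relname = 'busy_table';

    -- the actual row count, for comparison
    SELECT count(*) AS actual_rows FROM busy_table;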

As soon as the following appears:

LOG:  automatic vacuum of table "busy_table": index scans: 1

all is well again for the queries that follow, but the 20-30 user
interactions issued during the bad period never return, as they take too long.






Just to add a little more detail about my busy_table:
1) no rows are ever deleted
2) 98% of updates are HOT (the query below shows how I measure this)
3) there are two DBs on this postgres instance, both with the same table and
both seeing the same issue
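For reference, this is the sort of query I use to check the HOT update ratio;
a minimal sketch against pg_stat_user_tables, again assuming the table is
named busy_table:

    SELECT n_tup_upd,
           n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
    FROM pg_stat_user_tables
    WHERE relname = 'busy_table';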






tim_wilson <tim.wilson@telogis.com> writes:
> On a 9.3.1 server, I have a key table, busy_table, that is hit by most
> transactions running on our system. One DB's copy of this table has 60K rows,
> and 1/3 of that table's rows can be updated every minute.

> autovacuum_analyze_scale_factor is set to 0.02, so analyze runs nearly every
> minute. But when an autovacuum vacuum runs, I sometimes see the following
> message in the logs:

> LOG:  automatic vacuum of table "busy_table": index scans: 0
>         pages: 0 removed, 22152 remain
>         tuples: 0 removed, 196927 remain
>         buffer usage: 46241 hits, 478 misses, 715 dirtied
>         avg read rate: 0.561 MB/s, avg write rate: 0.839 MB/s
>         system usage: CPU 0.07s/0.06u sec elapsed 6.66 sec

> The "tuples remain" figure is then overestimated by a factor of more than 3,
> and I have seen this overestimate as large as more than 20 times, i.e. 5M rows.

FWIW, I tried to reproduce this without success.

There's some code in there that attempts to extrapolate the total number
of live tuples when VACUUM has not scanned the entire table.  It's surely
plausible that that logic went off the rails ... but without a test case
or at least a more specific description of the problem scenario, it's
hard to know what's wrong exactly.

            regards, tom lane


Thanks for your response, Tom,
but what does "index scans: 0" mean vs "index scans: 1"?

I have had a look at the C code but cannot see when that would be the case.





tim_wilson <tim.wilson@telogis.com> writes:
> Thanks for your response, Tom,
> but what does "index scans: 0" mean vs "index scans: 1"?

I believe the former means that VACUUM found no removable tuples, so it
had no need to make any passes over the table's indexes.

(Ordinarily you wouldn't see the number of scans as more than 1, unless
VACUUM removed quite a lot of dead tuples, more than it could remember
within maintenance_work_mem; in which case it would make multiple passes
over the indexes to remove index entries.)

            regards, tom lane


I have now created a repeatable test for this... bug. Well, that may be
debatable, but getting the query plan this wrong after vacuum and analyze
have run certainly looks like a bug to me.

I have created a test case that matches my problem domain but can probably
be simplified.
postgres_bug.sql
<http://postgresql.1045698.n5.nabble.com/file/n5806302/postgres_bug.sql>

Even after an autovacuum vacuum and autovacuum analyze have run, the query
plan goes from using indexes on the big table to table scanning it, because
the planner thinks my unit_test table is large due to the error in the row
estimate that autovacuum generated. Once you run CLUSTER on the table,
everything goes back to using the correct indexes, and the next autovacuum
gets the stats for the table straight again.
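The manual fix-up looks roughly like this; a minimal sketch, where
busy_table_pkey stands in for whatever clustering index the hot table has:

    -- rewrite the table (takes an exclusive lock) and refresh the stats
    CLUSTER busy_table USING busy_table_pkey;
    ANALYZE busy_table;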

I look forward to a solution, as this is hurting us: a very hot table on our
system goes bad every night and needs constant watching.

regards
Tim










Yes, I can create a simpler version that exhibits the problem:
postgres_bug_simpler.sql
<http://postgresql.1045698.n5.nabble.com/file/n5806320/postgres_bug_simpler.sql>

This now involves only one smaller table of 60K rows and a linked table with
20M rows. I tried with 6K and 1M rows but could not get the problem to occur.
Both tables are now unchanging in size. The smaller table gets updated
frequently and then starts exhibiting the bad query plan, especially after the
second autovacuum vacuum and autovacuum analyze. When dead_rows goes to zero
in the stats, live_tup can stay a huge factor larger than the table really is
for some time.
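This is the sort of query I watch while the test runs; a minimal sketch
against pg_stat_user_tables, assuming the hot table is named busy_table:

    SELECT relname, n_live_tup, n_dead_tup,
           last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'busy_table';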

In my system the smaller table that is updated frequently grows only slightly,
if at all. I never want a table scan of the big table to happen, but even with
enable_seq_scan=false set on the functions that query these tables (as
sketched below), I can still get the bad query plan.
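For completeness, this is how that per-function setting is applied; a minimal
sketch, with get_busy_rows(integer) standing in for one of our real functions:

    -- attach the planner setting to the function so it applies whenever the
    -- function runs, without changing the session default
    ALTER FUNCTION get_busy_rows(integer) SET enable_seq_scan = off;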

Would it be possible to have a setting on a table that gave an expression for
determining the table size? I.e. for key, highly updated tables I could set
and maintain the metadata for the size of the table, and even the shape of
the data. Then for these tables autovacuum would not need to estimate the
size at all.
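The closest existing knobs I know of are the per-table autovacuum storage
parameters, which only tune how often the table is processed and don't let me
pin the size. A minimal sketch of the per-table form, assuming busy_table
(ours may equally well be set globally in postgresql.conf):

    ALTER TABLE busy_table SET (
        autovacuum_vacuum_scale_factor  = 0.01,
        autovacuum_analyze_scale_factor = 0.02
    );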

regards
Tim






Was my example not able to be reproduced, do I need to give you a better
example of the problem, or is there just a lot of other stuff happening?

I'm happy to do more work on the example SQL if it needs it; I just need some
feedback on whether this problem is going to be looked at or not.

regards
Tim





tim_wilson <tim.wilson@telogis.com> writes:
> Was my example not able to be reproduced, do I need to give you a better
> example of the problem, or is there just a lot of other stuff happening?

The latter ...

http://www.postgresql.org/message-id/flat/10787.1402083865@sss.pgh.pa.us

            regards, tom lane


Great, thanks a lot.

We will be ready to build and test a patch on the 9.4 version as soon as you
have a test patch you want us to try.

regards
Tim







I have been looking at vacuumlazy.c, trying to think of a way of changing the
calculation of the stats to ensure that my hot table's stats do not get so
badly distorted.

What I have noticed is that when I reindex my hot table, the stats on the
table do not seem to change in pg_stat_user_tables (i.e. they stay bad), but
the EXPLAIN for the query starts using the correct index again.
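This is roughly what I compare before and after the reindex; a minimal
sketch, where busy_table_idx and the id column are placeholders for our real
names:

    -- the stats view I was watching (these stay bad after the REINDEX)
    SELECT n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'busy_table';

    REINDEX INDEX busy_table_idx;

    -- yet the plan flips back to an index scan after the REINDEX
    EXPLAIN SELECT * FROM busy_table WHERE id = 42;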

The index itself does not seem very bloated, but REINDEX seems to alter the
query optimizer's choices, and I can't see where in index.c that side effect
would be caused.

Why is this? I'm not sure this is entirely helpful as a workaround, since
REINDEX takes an exclusive lock, but if I used a concurrent rebuild and
rename of the index I might be able to work around this (sketched below).
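The concurrent rebuild-and-rename I have in mind would look something like
this; a minimal sketch, with busy_table_idx and its definition standing in
for the real index:

    -- build a replacement without blocking writers
    CREATE INDEX CONCURRENTLY busy_table_idx_new ON busy_table (id);

    -- swap it in; the plain DROP still needs a brief exclusive lock
    BEGIN;
    DROP INDEX busy_table_idx;
    ALTER INDEX busy_table_idx_new RENAME TO busy_table_idx;
    COMMIT;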




