Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date
Msg-id CAMkU=1yQz4OEmoWSnY_wE03KdpoQ4zpzmjpyPDUQwz+tbsU21g@mail.gmail.com
Whole thread Raw
In response to [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.  (David Gould <daveg@sonic.net>)
Responses Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.  (David Gould <daveg@sonic.net>)
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.  (David Gould <daveg@sonic.net>)
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.  (David Gould <daveg@sonic.net>)
List pgsql-hackers
On Wed, Jan 17, 2018 at 4:49 PM, David Gould <daveg@sonic.net> wrote:

Please add the attached patch and this discussion to the open commit fest. The
original bugs thread is here: 20180111111254.1408.8342@wrigleys.postgresql.org.

Bug reference:      15005
Logged by:          David Gould
Email address:      daveg@sonic.net
PostgreSQL version: 10.1 and earlier
Operating system:   Linux
Description:

ANALYZE can make pg_class.reltuples wildly inaccurate compared to the actual
row counts for tables that are larger than the default_statistics_target.

Example from one of a clients production instances:

# analyze verbose pg_attribute;
INFO:  analyzing "pg_catalog.pg_attribute"
INFO:  "pg_attribute": scanned 30000 of 24519424 pages, containing 6475 live rows and 83 dead rows; 6475 rows in sample, 800983035 estimated total rows.

This is a large complex database - pg_attribute actually has about five
million rows and needs about one hundred thouand pages. However it has
become extremely bloated and is taking 25 million pages (192GB!), about 250
times too much. This happened despite aggressive autovacuum settings and a
periodic bloat monitoring script. Since pg_class.reltuples was 800 million,
the bloat monitoring script did not detect that this table was bloated and
autovacuum did not think it needed vacuuming.

I can see how this issue would prevent ANALYZE from fixing the problem, but I don't see how it could have caused the problem in the first place.  In your demonstration case, you had to turn off autovac in order to get it to happen, and then when autovac is turned back on, it is all primed for an autovac to launch, go through, touch almost all of the pages, and fix it for you.  How did your original table get into a state where this wouldn't happen?

Maybe a well-timed crash caused n_dead_tup to get reset to zero and that is why autovac is not kicking in?  What are the pg_stat_user_table number and the state of the visibility map for your massively bloated table, if you still have them?

In any event, I agree with your analysis that ANALYZE should set the number of tuples from scratch.  After all, it sets the other estimates, such as MCV, from scratch, and those are much more fragile to sampling than just the raw number of tuples are.  But if the default target is set to 1, that would scan only 300 pages.  I think that that is a little low of a sample size to base an estimate on, but it isn't clear to that using 300 pages plus whacking them around with an exponential averaging is really going to be much better.  And if you set your default target to 1, that is more-or-less what you signed up for.

It is little weird to have VACUUM incrementally update and then ANALYZE compute from scratch and discard the previous value, but no weirder than what we currently do of having ANALYZE incrementally update despite that it is specifically designed to representatively sample the entire table.  So I don't think we need to decide what to do about VACUUM before we can do something about ANALYZE.

So I support your patch.  There is probably more investigation and work that could be done in this area, but those could be different patches, not blocking this one.

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Fwd: automatic disable unicode line style when terminal is not unicode
Next
From: Tom Lane
Date:
Subject: Re: Fwd: automatic disable unicode line style when terminal is not unicode