Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. - Mailing list pgsql-hackers

From David Gould
Subject Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.
Date
Msg-id 20180302140037.00af305d@engels
Whole thread Raw
In response to Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Responses Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
List pgsql-hackers
On Fri, 2 Mar 2018 18:47:44 +0300
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:

> The calculation I made for the first step applies to the next steps too, 
> with minor differences. So, the estimate increases at each step. Just 
> out of interest, I plotted the reltuples for 60 steps, and it doesn't 
> look like it's going to converge anytime soon (see attached).
> Looking at the formula, this overshoot term is created when we multiply 
> the old density by the new number of pages. I'm not sure how to fix 
> this. I think we could average the number of tuples, not the densities. 
> The attached patch demonstrates what I mean.

I'm confused at this point, I provided a patch that addresses this and a
test case. We seem to be discussing everything as if we first noticed the
issue. Have you reviewed the patch and and attached analysis and tested it?
Please commment on that?

Thanks.

Also, here is a datapoint that I found just this morning on a clients
production system:

INFO:  "staging_xyz": scanned 30000 of   pages, containing 63592 live rows and 964346 dead rows;
30000 rows in sample, 1959918155 estimated total rows

# select (50000953.0/30000*63592)::int as nrows;
   nrows          
-----------
 105988686

This tables reltuples is 18 times the actual row count. It will never converge
because with 50000953 pages analyze can only adjust reltuples by 0.0006 each time.

It will also almost never get vacuumed because the autovacuum threshold of
0.2 * 1959918155 = 391983631 about 3.7 times larger than the actual row count.

The submitted patch is makes analyze effective in setting reltuples to within
a few percent of the count(*) value.

-dg


-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: heap_lock_updated_tuple_rec can leak a buffer refcount
Next
From: Magnus Hagander
Date:
Subject: Re: Online enabling of checksums