Home > mailing lists

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date	March 1, 2018 21:09:34
Msg-id	18522.1519916974@sss.pgh.pa.us Whole thread Raw
In response to	Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Responses	Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate. (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
List	pgsql-hackers

Tree view

Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> On 01.03.2018 06:23, David Gould wrote:
>> In theory the sample pages analyze uses should represent the whole table
>> fairly well. We rely on this to generate pg_statistic and it is a key
>> input to the planner. Why should we not believe in it as much only for
>> reltuples? If the analyze sampling does not work, the fix would be to improve
>> that, not to disregard it piecemeal.

> Well, that sounds reasonable. But the problem with the moving average
> calculation remains. Suppose you run vacuum and not analyze. If the
> updates are random enough, vacuum won't be able to reclaim all the
> pages, so the number of pages will grow. Again, we'll have the same
> thing where the number of pages grows, the real number of live tuples
> stays constant, and the estimated reltuples grows after each vacuum run.

You claimed that before, with no more evidence than this time, and I still
don't follow your argument.  The number of pages may indeed bloat but the
number of live tuples per page will fall.  Ideally, at least, the estimate
would remain on-target.  If it doesn't, there's some other effect that
you haven't explained.  It doesn't seem to me that the use of a moving
average would prevent that from happening.  What it *would* do is smooth
out errors from the inevitable sampling bias in any one vacuum or analyze
run, and that seems like a good thing.

> I did some more calculations on paper to try to understand this. If we
> average reltuples directly, instead of averaging tuple density, it
> converges like it should. The error with this density calculation seems
> to be that we're effectively multiplying the old density by the new
> number of pages. I'm not sure why we even work with tuple density. We
> could just estimate the number of tuples based on analyze/vacuum, and
> then apply moving average to it. The calculations would be shorter, too.
> What do you think?

I think you're reinventing the way we used to do it.  Perhaps consulting
the git history in the vicinity of this code would be enlightening.

            regards, tom lane

pgsql-hackers by date:

From: Amit Kapila
Date: 01 March 2018, 21:05:12
Subject: Re: zheap: a new storage format for PostgreSQL

From: Magnus Hagander
Date: 01 March 2018, 21:18:48
Subject: Re: Online enabling of checksums

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-hackers

Previous

Next