Re: Remaining case where reltuples can become distorted across multiple VACUUM operations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
Date
Msg-id CAH2-WznSniwVEQv7pT03SGytNj6EKw1-YOv4N6WZgYuWrwmZvQ@mail.gmail.com
Whole thread Raw
In response to Re: Remaining case where reltuples can become distorted across multiple VACUUM operations  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
List pgsql-hackers
On Mon, Aug 8, 2022 at 9:17 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Because if a subset of the pages of a relation contains more tuples
> than your current total expected tuples in the table, you should
> update your expectations regardless of which blocks or which number of
> blocks you've scanned - the previous stored value is a strictly worse
> estimation than your last measurement.

The previous stored value could be -1, which represents the idea that
we don't know the tuple density yet. So it doesn't necessarily follow
that the new estimate is strictly better, even in this exact scenario.

> A 33-block relation with first 32 1-tuple pages is still enough to
> have a last page with 250 tuples, which would be ignored in that
> scheme and have a total tuple count of 33 or so.

The simple fact is that there is only so much we can do with the
limited information/context that we have. Heuristics are not usually
free of all bias. Often the bias is the whole point -- the goal can be
to make sure that we have the bias that we know we can live with, and
not the opposite bias, which is much worse. Details of which are
usually very domain specific.

I presented my patch with a very simple test case -- a very clear
problem. Can you do the same for this scenario?

I accept that it is possible that we'll keep an old reltuples which is
provably less accurate than doing something with the latest
information from vacuumlazy.c. But the conditions under which this can
happen are *very* narrow. I am not inclined to do anything about it
for that reason.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: 2022-08-11 release announcement draft
Next
From: Mark Wong
Date:
Subject: Re: Asking for feedback on Pgperffarm