Re: Remaining case where reltuples can become distorted across multiple VACUUM operations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
Msg-id CAH2-WzkgL90RS4b-k0Mr2UBXoBc-0cEOhArwBU6mkHffOHx7eQ@mail.gmail.com
In response to Re: Remaining case where reltuples can become distorted across multiple VACUUM operations  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
On Thu, Aug 11, 2022 at 1:48 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> I think I understand your reasoning, but I don't agree with the
> conclusion. The attached patch 0002 does fix that skew too, at what I
> consider negligible cost. 0001 is your patch with a new version
> number.

Your patch added allowSystemTableMods to one of the tests. I guess
that this was an oversight?

> I'm fine with your patch as is, but would appreciate it if known
> estimate mistakes would also be fixed.

Why do you think that this particular scenario/example deserves
special attention? As I've acknowledged already, it is true that your
scenario is one in which we provably give a less accurate estimate,
based on already-available information. But other than that, I don't
see any underlying principle that would be violated by my original
patch (any kind of principle, held by anybody). reltuples is just an
estimate.

I was thinking of going your way on this, purely because it didn't
seem like there'd be much harm in it (why not just handle your case
and be done with it?). But I don't think that it's a good idea now.
reltuples is usually derived by ANALYZE using a random sample, so the
idea that tuple density can be derived accurately enough from a random
sample is pretty baked in. You're talking about a case where ignoring
just one page ("sampling" all but one of the pages) *isn't* good
enough. It just doesn't seem like something that needs to be addressed
-- it's quite awkward to do so.
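
To make the sampling argument concrete, here is a minimal sketch of extrapolating a tuple count from the tuple density of the scanned pages only. It is an illustration, not PostgreSQL's code: the function name `extrapolate_reltuples` and the page layouts are hypothetical. It shows that "sampling" all but one page is exact when the skipped page is typical, and off by a small amount when it is not.

```python
def extrapolate_reltuples(tuples_per_page, scanned):
    """Estimate the table's tuple count from the scanned pages only.

    tuples_per_page: true per-page tuple counts (hypothetical data).
    scanned: indexes of the pages that were actually looked at.
    """
    scanned = list(scanned)
    scanned_tuples = sum(tuples_per_page[i] for i in scanned)
    density = scanned_tuples / len(scanned)  # tuples per scanned page
    return round(density * len(tuples_per_page))


# 50 pages, every page holds 100 tuples.
uniform = [100] * 50
# Same table, except the final page holds a single tuple.
skewed = [100] * 49 + [1]

# "Sample" all but the last page in both cases.
all_but_last = range(49)

print(extrapolate_reltuples(uniform, all_but_last), sum(uniform))
print(extrapolate_reltuples(skewed, all_but_last), sum(skewed))
```

In the skewed case the estimate overshoots the true count by roughly 2%, which is the kind of error that any sample-based estimate already tolerates.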

Barring any further objections, I plan on committing the original
version tomorrow.

> An alternative solution could be doing double-vetting, where we ignore
> tuples_scanned if <2% of pages AND <2% of previous estimated tuples
> was scanned.

I'm not sure that I've understood you, but I think that you're talking
about remembering more information (in pg_class), which is surely out
of scope for a bug fix.
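
For reference, one possible reading of the quoted double-vetting idea is sketched below. This is an assumption about what was meant and may not match the intended proposal: the function name `keep_old_reltuples`, its parameters, and the way the 2% threshold is applied to both quantities are all hypothetical.

```python
def keep_old_reltuples(scanned_pages, total_pages,
                       scanned_tuples, old_reltuples,
                       threshold=0.02):
    """Return True if the previous reltuples estimate should be kept.

    Hypothetical rule: discard the new scan's tuple count only when the
    scan covered fewer than `threshold` of the pages AND saw fewer than
    `threshold` of the previously estimated tuples.
    """
    if total_pages <= 0 or old_reltuples <= 0:
        # No usable previous estimate to fall back on.
        return False
    few_pages = scanned_pages < threshold * total_pages
    few_tuples = scanned_tuples < threshold * old_reltuples
    return few_pages and few_tuples


# 1% of pages and 0.5% of the old tuple estimate: keep the old value.
print(keep_old_reltuples(1, 100, 50, 10000))
# 1% of pages but 5% of the old tuple estimate: use the new scan.
print(keep_old_reltuples(1, 100, 500, 10000))
# 5% of pages: use the new scan regardless of the tuple share.
print(keep_old_reltuples(5, 100, 50, 10000))
```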

-- 
Peter Geoghegan


