My bugfix commit 74388a1a (which was pushed back in February) added
heuristics to VACUUM's reltuples calculation/estimate. This prevented
VACUUM from distorting our estimate of reltuples over time, across
successive VACUUM operations run against the same table. The problem
was that VACUUM could scan the same single heap page again and again,
while believing it saw a random sample each time. This eventually
leads to a pg_class.reltuples value that is based on the assumption
that every single heap page in the table is just like the heap page
that gets "sampled" again and again. This was always the last heap
page (due to implementation details related to the work done by
commit e8429082), which in practice tends to be a particularly poor
representation of the table's overall tuple density.
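
To see concretely how that plays out, here's a small standalone
simulation of the extrapolation step, written from my reading of
vac_estimate_reltuples. The table geometry (45 pages, 226 tuples per
full page, 56 tuples on the final page) is an assumption that roughly
matches the foo table from the test case below:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  total_pages = 45.0;     /* rel_pages (assumed) */
    double  scanned_pages = 1.0;    /* only the final heap page */
    double  scanned_tuples = 56.0;  /* tuples on that partial page */
    double  reltuples = 10000.0;    /* correct initial estimate */

    for (int i = 1; i <= 10; i++)
    {
        /* extrapolate: assume unscanned pages match the old density */
        double  old_density = reltuples / total_pages;
        double  unscanned_pages = total_pages - scanned_pages;

        reltuples = floor(old_density * unscanned_pages +
                          scanned_tuples + 0.5);
        printf("after VACUUM %d: reltuples = %.0f\n", i + 1, reltuples);
    }

    return 0;
}

The estimate ratchets downwards on every iteration, converging on
45 * 56 = 2520, as if every page looked just like the final one.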
I have discovered a gap in these heuristics: there are remaining
cases where the percentage threshold doesn't prevent reltuples
distortion as intended. It can still happen with tables that are
small enough that a cutoff of 2% of rel_pages is less than a single
page, yet still large enough that vacuumlazy.c will consider it worth
its while to skip some pages using the visibility map. It will
typically skip all but the final heap page of the relation, just as
in the scenario that commit 74388a1a addressed.
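
To put numbers on it: with the default 8KB block size, the foo table
from the test case below occupies about 45 pages. 2% of 45 pages is
0.9 pages, so a VACUUM that scans exactly one page never trips the
threshold (1 is not less than 0.9), and the extrapolation goes ahead
treating the final page, with its relative handful of tuples, as
representative of all 45 pages.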
Here is a test case that shows how this can still happen on HEAD (and
in Postgres 15):
regression=# create table foo(bar int);
CREATE TABLE
regression=# insert into foo select i from generate_series(1, 10000) i;
INSERT 0 10000
Now run vacuum verbose against the table several times:
regression=# vacuum verbose foo;
*** SNIP ***
regression=# vacuum verbose foo;
The first vacuum shows "tuples: 0 removed, 10000 remain...", which is
correct. However, each subsequent vacuum verbose revises the estimate
downwards, eventually making pg_class.reltuples significantly
underestimate the table's true tuple density.
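
You can also watch the distortion directly between runs:

regression=# select relpages, reltuples from pg_class where relname = 'foo';

reltuples drops a little further after each vacuum verbose, even
though the table's contents never change.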
The attached patch closes the remaining gap. With the patch applied,
the second and subsequent vacuum verbose operations from the test
case will show that reltuples is still 10000 (it won't ever change).
The patch just extends an old behavior that was applied when
scanned_pages == 0 to cases where scanned_pages <= 1 (unless we
happened to scan all of the relation's pages, of course). It doesn't
remove the original test from commit 74388a1a, which still seems like
a good idea to me.
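
For reference, this is roughly the shape of the guards in
vac_estimate_reltuples with the patch applied. This is a sketch of
the intent as described above, not the literal diff:

/* Scanned every page: just trust the full count */
if (scanned_pages >= total_pages)
    return scanned_tuples;

/*
 * Original test from commit 74388a1a: distrust tiny (< 2%) samples
 * when the relation's size hasn't changed at all
 */
if (old_rel_pages == total_pages &&
    scanned_pages < (double) total_pages * 0.02)
    return old_rel_tuples;

/*
 * Extended behavior: a sample of at most one page is never enough
 * to revise the existing reltuples (this generalizes the old
 * scanned_pages == 0 rule)
 */
if (scanned_pages <= 1)
    return old_rel_tuples;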
--
Peter Geoghegan