pgsql: Avoid VACUUM reltuples distortion. - Mailing list pgsql-committers

From Peter Geoghegan
Subject pgsql: Avoid VACUUM reltuples distortion.
Date
Msg-id E1nKVP3-0001GW-8I@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Avoid VACUUM reltuples distortion.

Add a heuristic that avoids distortion in the pg_class.reltuples
estimates used by VACUUM.  Without the heuristic, successive manually
run VACUUM commands (run against a table that is never modified after
initial bulk loading) will scan the same page in each VACUUM operation.
Eventually pg_class.reltuples may reach the point where one single heap
page is accidentally considered highly representative of the entire
table.  This is likely to be completely wrong, since the last heap page
typically has fewer tuples than average for the table.

It's not obvious that this was a problem prior to commit 44fa8488, which
made vacuumlazy.c consistently scan the last heap page (even when it is
all-visible in the visibility map).  It seems possible that there were
more subtle variants of the same problem that went unnoticed for quite
some time, though.  Commit 44fa8488 simplified certain aspects of when
and how relation truncation was considered, but it did not introduce the
"scan the last page" behavior.  Essentially the same behavior was
introduced much earlier, in commit e8429082.  It was conditioned on
whether or not truncation looked promising towards the end of the
initial heap pass by VACUUM until recently, which was at least somewhat
protective.  That doesn't seem like something that we should be relying
on, though.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/74388a1ac36d2f0206c5477eeddc636d7947a5a4

Modified Files
--------------
src/backend/commands/vacuum.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)


pgsql-committers by date:

Previous
From: Michael Paquier
Date:
Subject: pgsql: Remove all traces of tuplestore_donestoring() in the C code
Next
From: Peter Geoghegan
Date:
Subject: pgsql: Increase hash_mem_multiplier default to 2.0.