Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date
Msg-id CAH2-WzmvXXEKtEph7U360umZ5pN3d18RBfu=nyPg9neBLDUWdw@mail.gmail.com
Whole thread Raw
In response to Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
List pgsql-hackers
On Mon, Dec 6, 2021 at 12:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
> So does this. If some of the table is now all-visible when it wasn't
> before, it's certainly a good guess that the portions that still
> aren't have about the same distribution of dead tuples that they did
> before ... although the other direction is less clear: it seems
> possible that newly not-all-visible pages have fewer dead tuples than
> ones which have been not-all-visible for a while. But you have to make
> some guess.

To me, it seems natural to accept and even embrace the inherent
uncertainty about the number of dead tuples. We should model our
current belief about how many dead tuples are in the table as a
probability density function (or something along the same lines).
There is a true "sample space" here. Once we focus on not-all-visible
pages, using authoritative VM info, many kinds of misestimations are
clearly impossible. For example, there are only so many
not-all-visible heap pages, and they can only hold so many dead tuples
(up to MaxHeapTuplesPerPage). This is a certainty.

The number of dead tuples in the table is an inherently dynamic thing,
which makes it totally dissimilar to the pg_statistics-based stats.
And so a single snapshot of a point in time is inherently much less
useful -- we ought to keep a few sets of old statistics within our new
pgstat_report_analyze() -- maybe 3 or 5. Each set of statistics
includes the total number of relpages at the time, the total number of
not-all-visible pages (i.e. interesting pages) at the time, and the
average number of live and dead tuples encountered. This is
interpreted (along with a current visibilitymap_count()) to get our
so-called probability density function (probably not really a PDF,
probably something simpler and only vaguely similar) within
autovacuum.c.

It just occurred to me that it makes zero sense that
pgstat_report_vacuum() does approximately the same thing as
pgstat_report_analyze() -- we make no attempt to compensate for the
fact that the report is made by VACUUM specifically, and so reflects
the state of each page in the table immediately after it was processed
by VACUUM. ISTM that this makes it much more likely to appear as an
underestimate later on --  pgstat_report_vacuum() gets the furthest
possible thing from a random sample. Whereas if we had more context
(specifically that there are very few or even 0 all-visible pages), it
wouldn't hurt us at all, and we wouldn't need to have special
pgstat_report_vacuum()-only heuristics.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Pavel Luzanov
Date:
Subject: Re: GiST operator class for bool
Next
From: Tom Lane
Date:
Subject: Triage for unimplemented geometric operators