Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Msg-id CA+TgmoZnKXf4YPdtzWam+WoF7B-jjsDCZMG5x+j+wnCci6-oXQ@mail.gmail.com
In response to Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
List pgsql-hackers
On Tue, Dec 7, 2021 at 2:13 PM Peter Geoghegan <pg@bowt.ie> wrote:
> For example, why should we count dead heap-only tuples from earlier in
> a HOT chain, even when we see no evidence that opportunistic HOT
> pruning can't keep up on that page? Since we actually care about the
> direction of things, not just the present state of things, we'd be
> justified in completely ignoring those dead tuples. Similarly, it
> might well make sense to give more weight to concentrations of LP_DEAD
> items on a page -- that is a signal that things are not going well *at
> the level of the page*. Not so much when you have a few LP_DEAD stubs,
> but certainly when you have dozens of them on one page, or even
> hundreds. And so ISTM that the conditions of the page should influence
> how we interpret/count that page's dead tuples, in both directions
> (interpret the page as having more dead tuples, or fewer).

Well... I mean, I think we're almost saying the same thing, then, but
I think you're saying it more confusingly. I have no objection to
counting the number of dead HOT chains rather than the number of dead
tuples, because that's what affects the index contents, but there's no
need to characterize that as "not the literal truth." There's nothing
fuzzy or untrue about it if we simply say that's what we're doing.

> Right. And as I keep saying, the truly important thing is to not
> *completely* ignore any relevant dimension of cost. I just don't want
> to ever be wildly wrong -- not even once. We can tolerate being
> somewhat less accurate all the time (not that we necessarily have to
> make a trade-off), but we cannot tolerate pathological behavior. Of
> course I include new/theoretical pathological behaviors here (not just
> the ones we know about today).

Sure, but we don't *need* to be less accurate, and I don't think we
even *benefit* from being less accurate. If we do something like count
dead HOT chains instead of dead tuples, let's not call that a
less-accurate count of dead tuples. Let's call it an accurate count of
dead HOT chains.
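
To make the distinction concrete, here is a toy sketch. It is not
PostgreSQL code: the page model, the ToyTuple type, and both function
names are hypothetical, and it assumes that a "dead HOT chain" means a
chain with no live member left (so its root item is what index
vacuuming would eventually have to deal with). Both functions return
exact counts; they just count different things.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTuple:
    """Hypothetical stand-in for a heap tuple on a single page."""
    chain_id: int  # assumption: identifies the HOT chain the tuple belongs to
    dead: bool     # True if no transaction can still see this version


def count_dead_tuples(page):
    """Count every dead tuple version on the page."""
    return sum(1 for t in page if t.dead)


def count_dead_hot_chains(page):
    """Count chains in which every member is dead (assumed definition)."""
    chain_has_live = defaultdict(bool)
    for t in page:
        chain_has_live[t.chain_id] |= not t.dead
    return sum(1 for has_live in chain_has_live.values() if not has_live)


# Chain 1: two dead versions followed by a live one (prunable, chain alive).
# Chain 2: every version dead (the whole chain is dead).
# Chain 3: a single live tuple.
page = [
    ToyTuple(1, True), ToyTuple(1, True), ToyTuple(1, False),
    ToyTuple(2, True), ToyTuple(2, True),
    ToyTuple(3, False),
]

print(count_dead_tuples(page))      # 4
print(count_dead_hot_chains(page))  # 1
```

In this toy page the two numbers differ (4 versus 1), yet neither is
an approximation of the other. Each is an accurate count of what its
name says it counts.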

-- 
Robert Haas
EDB: http://www.enterprisedb.com


