
From Robert Haas
Subject Re: autovacuum prioritization
Msg-id CA+TgmoadggMo0_xEHpjOcBqJgurTNOxKNoqmuMP77PrFdaX3Tw@mail.gmail.com
In response to Re: autovacuum prioritization  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Jan 25, 2022 at 3:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
> For example, a
> page that has 5 dead heap-only tuples is vastly different to a similar
> page that has 5 LP_DEAD items instead -- and yet our current approach
> makes no distinction. Chances are very high that if the only dead
> tuples are heap-only tuples, then things are going just fine on that
> page -- opportunistic pruning is actually keeping up.

Hmm, I think that's a really good insight. Perhaps we ought to forget
about counting updates and deletes and instead count dead line
pointers. Or maybe we can estimate the number of dead line pointers by
knowing how many updates and deletes there were, as long as we can
distinguish HOT updates from non-HOT updates, which I think we can.
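
For what it's worth, we already track the raw counts that estimate
would need in the per-table stats. Here's a rough sketch of the
arithmetic (field names as in pgstat.h around PG 14; illustrative, not
proposed code): a DELETE or a non-HOT UPDATE each leaves behind a line
pointer that only an index-cleaning vacuum can reclaim, while a HOT
update leaves a dead heap-only tuple that pruning can deal with on its
own.

#include "postgres.h"
#include "pgstat.h"

/*
 * Sketch only: estimate how many dead line pointers a table has
 * accumulated, from counters pgstat already keeps.  DELETEs and
 * non-HOT UPDATEs each leave behind a line pointer that only an
 * index-cleaning vacuum can reclaim; HOT updates leave dead
 * heap-only tuples that opportunistic pruning can handle.
 */
static int64
estimate_dead_line_pointers(const PgStat_StatTabEntry *tabentry)
{
	int64		non_hot_updates;

	non_hot_updates = tabentry->tuples_updated -
		tabentry->tuples_hot_updated;

	return tabentry->tuples_deleted + non_hot_updates;
}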

> if successive ANALYZE operations notice
> a consistent pattern where pages that had a non-zero number of LP_DEAD
> items last time now have a significantly higher number, then it's a
> good idea to err in the direction of more aggressive vacuuming.
> *Growing* concentrations of LP_DEAD items signal chaos. I think that
> placing a particular emphasis on pages with non-zero LP_DEAD items as
> a qualitatively distinct category of page might well make sense --
> relatively few blocks with a growing number of LP_DEAD items seems
> like it should be enough to make autovacuum run aggressively.

I think measuring the change over time here might be fraught with
peril. If vacuum makes a single pass over the indexes, it can retire
as many dead line pointers as we have, or as will fit in memory, and
the effort doesn't really depend too much on exactly how many dead
line pointers we're trying to find. (I hear that it does depend more
than you'd think ... but I still don't think that should be the
dominant consideration here.) So to be efficient, we want to do that
pass over the indexes when we have a suitably large batch of dead line
pointers. The efficiency of that pass doesn't really depend on how long
it took the batch to get to that size. I don't want to vacuum a terabyte of
indexes with a much-smaller-than-normal batch of dead TIDs just
because the number of dead TIDs seems to be increasing quickly at the
moment: it's hard to imagine that the results will be worth the
resources I'll have to expend to get there. On the other hand, I also
don't want to postpone vacuuming the indexes just because the count is
really big but not growing that fast.
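
To spell out the economics I have in mind, here's a toy illustration
(not code I'm proposing): if the index pass costs roughly the same
however many TIDs it retires, then the per-TID cost is just that fixed
cost divided by the batch size, which is why a bigger batch always wins
and the elapsed accumulation time never enters into it.

#include "postgres.h"

/*
 * Toy illustration of the amortization argument: scanning all the
 * indexes costs about the same no matter how many dead TIDs the pass
 * retires, so the cost charged to each retired TID is the fixed scan
 * cost spread over the batch.  A bigger batch is strictly cheaper
 * per TID; how long the batch took to accumulate appears nowhere.
 */
static double
cost_per_retired_tid(double index_pass_cost, int64 ndead_tids)
{
	Assert(ndead_tids > 0);
	return index_pass_cost / (double) ndead_tids;
}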

I feel like my threshold for the number of dead TIDs that ought to
trigger a vacuum grows as the table gets bigger, capped by how much
memory I've got. But I don't feel like the rate at which it's changing
necessarily matters. Like if I create a million dead line pointers
really quickly, wait a month, and then create another million dead
line pointers, I feel like I want the system to respond just as
aggressively as if the month-long delay were omitted.
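
Concretely, the triggering rule I'm gesturing at might look something
like this (a sketch with a made-up scaling constant, not a proposal):
the threshold grows with the size of the table, but is capped by how
many TIDs fit in maintenance_work_mem, since a batch bigger than that
forces extra index passes anyway.

#include "postgres.h"
#include "storage/block.h"
#include "storage/itemptr.h"

/* Illustrative scaling factor: dead TIDs tolerated per table page. */
#define DEAD_TIDS_PER_PAGE	10

/*
 * Sketch only: the dead-TID count that should trigger an
 * index-cleaning vacuum.  It grows with the table, but is capped by
 * how many 6-byte TIDs fit in memory.  Note that the rate at which
 * the dead TIDs accumulated plays no role at all.
 */
static int64
dead_tid_vacuum_threshold(BlockNumber table_pages, Size mem_bytes)
{
	int64		size_based = (int64) table_pages * DEAD_TIDS_PER_PAGE;
	int64		memory_cap = mem_bytes / sizeof(ItemPointerData);

	return Min(size_based, memory_cap);
}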

Maybe my feelings are wrong here. I'm just saying that, to me, it
doesn't feel like the rate of change is all that relevant.

> Even busy production DBs should usually only be vacuuming one large
> table at a time. Also might make sense to strategically align the work
> with the beginning of a new checkpoint.

I'm not sure that either of those statements is correct. But on the
other hand, I am also not sure that either of them is incorrect.

--
Robert Haas
EDB: http://www.enterprisedb.com


