Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Msg-id CA+TgmoaGoZ2wX6T4sj0eL5YAOQKW3tS8ViMuN+tcqWJqFPKFaA@mail.gmail.com
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
List pgsql-hackers
On Fri, Jan 7, 2022 at 5:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I thought I was being conservative by suggesting
> autovacuum_freeze_max_age/2. My first thought was to teach VACUUM to
> make its FreezeLimit "OldestXmin - autovacuum_freeze_max_age". To me
> these two concepts really *are* the same thing: vacrel->FreezeLimit
> becomes a backstop, just as anti-wraparound autovacuum (the
> autovacuum_freeze_max_age cutoff) becomes a backstop.

I can't follow this. If the idea is that we're going to
opportunistically freeze a page whenever that allows us to mark it
all-visible, then the remaining question is what XID age we should use
to force freezing when that rule doesn't apply. It seems to me that
there is a rebuttable presumption that that case ought to work just as
it does today - and I think I hear you saying that it should NOT work
as it does today, but should use some other threshold. Yet I can't
understand why you think that.
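
To make the thresholds under discussion concrete, here is a rough arithmetic sketch. It assumes the stock defaults (vacuum_freeze_min_age = 50 million, autovacuum_freeze_max_age = 200 million), ignores 32-bit XID wraparound and the clamping the real code performs, and uses made-up function names that do not appear in vacuumlazy.c.

```python
# Illustrative arithmetic only: plain integers, no XID wraparound handling.
VACUUM_FREEZE_MIN_AGE = 50_000_000        # stock default
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000   # stock default


def freeze_limit_today(oldest_xmin):
    """Cutoff as it works today: OldestXmin - vacuum_freeze_min_age."""
    return oldest_xmin - VACUUM_FREEZE_MIN_AGE


def freeze_limit_half_max_age(oldest_xmin):
    """The 'conservative' suggestion: OldestXmin - autovacuum_freeze_max_age/2."""
    return oldest_xmin - AUTOVACUUM_FREEZE_MAX_AGE // 2


def freeze_limit_full_max_age(oldest_xmin):
    """The 'first thought': OldestXmin - autovacuum_freeze_max_age."""
    return oldest_xmin - AUTOVACUUM_FREEZE_MAX_AGE


def must_freeze(xid, freeze_limit):
    """Freezing is forced for XIDs older than the cutoff (hypothetical helper)."""
    return xid < freeze_limit


if __name__ == "__main__":
    oldest_xmin = 1_000_000_000
    xid = oldest_xmin - 120_000_000  # an XID that is 120 million old
    for fn in (freeze_limit_today, freeze_limit_half_max_age,
               freeze_limit_full_max_age):
        print(fn.__name__, must_freeze(xid, fn(oldest_xmin)))
```

Under these assumptions, an XID 120 million transactions old is forced to freeze by the first two cutoffs but not by the third, which is the size of the behavior change being debated.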

> I couldn't agree more. In fact, I was mostly thinking about how to
> *help* these users. Insisting on waiting for a cleanup lock before it
> becomes strictly necessary (when the table age is only 50
> million/vacuum_freeze_min_age) is actually a big part of the problem
> for these users. vacuum_freeze_min_age enforces a false dichotomy on
> aggressive VACUUMs, that just isn't helpful. Why should waiting on a
> cleanup lock fix anything?

Because waiting on a lock means that we'll acquire it as soon as it's
available. If you repeatedly call your local Pizzeria Uno's and ask
whether there is a wait, and head to the restaurant only when the
answer is in the negative, you may never get there, because they may
be busy every time you call - especially if you always call around
lunch or dinner time. Even if you eventually get there, it may take
multiple days before you find a time when a table is immediately
available, whereas if you had just gone over there and stood in line,
you likely would have been seated in under an hour and savoring the
goodness of quality deep-dish pizza not too long thereafter. The same
principle applies here.
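
The restaurant analogy can be put into a toy model. This is a deliberately simplified sketch and not a model of PostgreSQL's buffer pins: time is a sequence of ticks, the resource is assumed to be free only on the last tick of every 10-tick period, and all names are hypothetical.

```python
def is_free(t, period=10):
    """Assumed workload: the resource is free only on the last tick of each period."""
    return t % period == period - 1


def wait_for_lock(start, horizon):
    """Waiter: stays in line and takes the resource at the first free tick."""
    for t in range(start, horizon):
        if is_free(t):
            return t
    return None  # never became free within the horizon


def poll_for_lock(start, interval, horizon):
    """Poller: checks only every `interval` ticks and walks away if busy."""
    for t in range(start, horizon, interval):
        if is_free(t):
            return t
    return None  # every check happened to land on a busy tick


if __name__ == "__main__":
    print("wait:", wait_for_lock(0, 1000))           # 9
    print("poll every 10:", poll_for_lock(0, 10, 1000))  # None
    print("poll every 7:", poll_for_lock(0, 7, 1000))    # 49
```

In this model the waiter gets the resource at tick 9. A poller whose schedule lines up with the busy pattern (always calling at lunch time) never gets it, and a poller on a different schedule gets it eventually, but later than the waiter would have.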

I do think that waiting for a cleanup lock when the age of the page is
only vacuum_freeze_min_age seems like it might be too aggressive, but
I don't think that's how it works. AFAICS, it's based on whether the
vacuum is marked as aggressive, which has to do with
vacuum_freeze_table_age, not vacuum_freeze_min_age. Let's turn the
question around: if the age of the oldest XID on the page is >150
million transactions and the buffer cleanup lock is not available now,
what makes you think that it's any more likely to be available when
the XID age reaches 200 million or 300 million or 700 million? There
is perhaps an argument for some kind of tunable that eventually shoots
the other session in the head (if we can identify it, anyway) but it
seems to me that regardless of what threshold we pick, polling is
strictly less likely to find a time when the page is available than
waiting for the cleanup lock. It has the counterbalancing advantage of
allowing the autovacuum worker to do other useful work in the meantime
and that is indeed a significant upside, but at some point you're
going to have to give up and admit that polling is a failed strategy,
and it's unclear why 150 million XIDs - or probably even 50 million
XIDs - isn't long enough to say that we're not getting the job done
with half measures.
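
As a rough sketch of the distinction drawn above, the decision can be written out as follows. This reflects my reading of the behavior rather than the actual code: it assumes the stock defaults, ignores XID wraparound, treats the documented cap of vacuum_freeze_table_age at 95% of autovacuum_freeze_max_age as the only clamp, and uses hypothetical function names.

```python
# Simplified sketch; plain integer ages, no XID wraparound handling.
VACUUM_FREEZE_MIN_AGE = 50_000_000        # stock default
VACUUM_FREEZE_TABLE_AGE = 150_000_000     # stock default
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000   # stock default


def is_aggressive(table_age):
    """A vacuum is aggressive once the table's relfrozenxid age reaches
    vacuum_freeze_table_age (capped at 95% of autovacuum_freeze_max_age)."""
    cap = AUTOVACUUM_FREEZE_MAX_AGE * 95 // 100
    return table_age >= min(VACUUM_FREEZE_TABLE_AGE, cap)


def on_busy_page(table_age, oldest_xid_age_on_page):
    """What to do when the cleanup lock is not immediately available.

    Assumption: only an aggressive vacuum waits, and only for a page that
    holds an XID old enough to require freezing; otherwise the page is skipped.
    """
    needs_freeze = oldest_xid_age_on_page > VACUUM_FREEZE_MIN_AGE
    if is_aggressive(table_age) and needs_freeze:
        return "wait"
    return "skip"


if __name__ == "__main__":
    print(on_busy_page(60_000_000, 60_000_000))    # skip
    print(on_busy_page(160_000_000, 155_000_000))  # wait
    print(on_busy_page(160_000_000, 10_000_000))   # skip
```

With these assumptions, a table whose age is only a little past vacuum_freeze_min_age never causes a wait, because the vacuum is not aggressive until the table age reaches the vacuum_freeze_table_age threshold.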

-- 
Robert Haas
EDB: http://www.enterprisedb.com


