Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Msg-id CAH2-WzmfyfmmANV6RRchHmUBgjU9QYBUP_6CoFuQ3d23T4E6nQ@mail.gmail.com
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Feb 18, 2022 at 12:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I'd like to have a clearer idea of exactly what is in each of the
> remaining patches before forming a final opinion.

Great.

> What's tricky about 0001? Does it change any other behavior, either as
> a necessary component of advancing relfrozenxid more eagerly, or
> otherwise?

It does not change any other behavior. It's totally mechanical.

0001 is tricky in the sense that there are a lot of fine details, and
if you get any one of them wrong the result might be a subtle bug. For
example, the heap_tuple_needs_freeze() code path is only used when we
cannot get a cleanup lock, which is rare -- and some of the branches
within the function are relatively rare themselves. The obvious
concern is: What if some detail of how we track the new relfrozenxid
value (and new relminmxid value) in this seldom-hit codepath is just
wrong, in whatever way we didn't think of?

On the other hand, we must already be precise in almost the same way
within heap_tuple_needs_freeze() today -- it's not all that different
(we currently need to avoid leaving any XIDs < FreezeLimit behind,
which isn't made much less complicated by the fact that it's a static
XID cutoff). Plus, we have experience with bugs like this. There was
hardening added to catch stuff like this back in 2017, following the
"freeze the dead" bug.
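
To make the bookkeeping concern concrete, here is a minimal,
self-contained sketch of the kind of tracking involved. It is not the
code from the patch: OldestXidTracker and tracker_note_remaining_xid()
are hypothetical names, and xid_precedes() is a simplified stand-in
modeled on the circular comparison PostgreSQL uses for normal XIDs.
The idea is that every XID left unfrozen on a page must be able to
hold back the value that later becomes relfrozenxid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are special (invalid, bootstrap, frozen). */
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Simplified circular comparison, valid only for normal XIDs:
 * returns true if "a" is logically older than "b".
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical tracker for the oldest XID that will remain unfrozen. */
typedef struct OldestXidTracker
{
    TransactionId new_relfrozenxid;
} OldestXidTracker;

/*
 * Record one XID that is being left behind in the table.  Missing a
 * call to this on any rarely taken branch is exactly the kind of
 * subtle bug described above.
 */
static void
tracker_note_remaining_xid(OldestXidTracker *t, TransactionId xid)
{
    assert(t != NULL);

    /* Special XIDs never hold back relfrozenxid. */
    if (xid < FirstNormalTransactionId)
        return;

    if (xid_precedes(xid, t->new_relfrozenxid))
        t->new_relfrozenxid = xid;
}
```

The tracker would start out at the most optimistic value (assumed here
to be something like OldestXmin) and can only move backwards as older
unfrozen XIDs are encountered.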

> If there's a way you can make the precise contents of 0002 and 0003
> more clear, I would like that, too.

The really big one is 0002 -- even 0003 (the FSM PageIsAllVisible()
thing) wasn't on the table before now. 0002 is the patch that changes
the basic criteria for freezing, making it block-based rather than
based on the FreezeLimit cutoff (barring edge cases that are important
for correctness, but shouldn't noticeably affect freezing overhead).
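
As a rough illustration of the difference between the two criteria,
consider the sketch below. It is only an assumed reading of
"block-based": the trigger used here (the page will become all-visible
after pruning) and all of the names (PageSketch,
count_freeze_by_cutoff(), count_freeze_by_block()) are hypothetical,
and real eligibility rules involve more than xmin. The fallback to the
cutoff stands in for the correctness edge cases mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/* Simplified circular comparison for normal XIDs: is "a" older than "b"? */
static bool
xid_is_older(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical, heavily simplified view of one heap page. */
typedef struct PageSketch
{
    const TransactionId *xmins;     /* xmin of each tuple on the page */
    size_t      ntuples;
    bool        all_visible_after_prune;
} PageSketch;

/* Cutoff-based: freeze only tuples whose xmin is older than the cutoff. */
static size_t
count_freeze_by_cutoff(const PageSketch *page, TransactionId freeze_limit)
{
    size_t      nfrozen = 0;

    assert(page != NULL);
    for (size_t i = 0; i < page->ntuples; i++)
    {
        TransactionId xmin = page->xmins[i];

        if (xmin >= FirstNormalTransactionId &&
            xid_is_older(xmin, freeze_limit))
            nfrozen++;
    }
    return nfrozen;
}

/*
 * Block-based (assumed policy): if the whole page is about to become
 * all-visible, freeze every tuple with a normal xmin so the page can
 * also be all-frozen.  Otherwise fall back to the cutoff, which must
 * still be honored for correctness.
 */
static size_t
count_freeze_by_block(const PageSketch *page, TransactionId freeze_limit)
{
    size_t      nfrozen = 0;

    assert(page != NULL);
    if (!page->all_visible_after_prune)
        return count_freeze_by_cutoff(page, freeze_limit);

    for (size_t i = 0; i < page->ntuples; i++)
    {
        if (page->xmins[i] >= FirstNormalTransactionId)
            nfrozen++;
    }
    return nfrozen;
}
```

With the block-based policy a page is either frozen as a unit or left
mostly alone, which is what keeps old tuples from piling up until an
aggressive VACUUM has to deal with all of them at once.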

The single biggest practical improvement from 0002 is that it
eliminates what I've called the freeze cliff, which is where many old
tuples (much older than FreezeLimit/vacuum_freeze_min_age) must be
frozen all at once, in a balloon payment during an eventual aggressive
VACUUM. Although it's easy to see that this could be useful, it is
much harder to justify than anything else. Because we're
freezing more eagerly overall, we're also bound to do more freezing
without benefit in certain cases. Although I think that this can be
justified as the cost of doing business, that's a hard argument to
make.

In short, 0001 is mechanically tricky, but easy to understand at a
high level. Whereas 0002 is mechanically simple, but tricky to
understand at a high level (and therefore far trickier than 0001
overall).

-- 
Peter Geoghegan


