Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date:
Msg-id: CAH2-WzmjHQJ7pbdO4BtWVJ6CLG-Mp9CNe914WUJdiScOTNRKRw@mail.gmail.com
In response to: Re: New strategies for freezing, advancing relfrozenxid early (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: New strategies for freezing, advancing relfrozenxid early
           Re: New strategies for freezing, advancing relfrozenxid early
List: pgsql-hackers
On Thu, Dec 15, 2022 at 10:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I agree that the burden of catch-up freezing is excessive here (in
> fact I already wrote something to that effect on the wiki page). The
> likely solution can be simple enough.

Attached is v10, which fixes this issue, but using a different
approach to the one I sketched here.

This revision also changes the terminology around VM skipping: per
feedback from Jeff and John, we now call those strategies "scanning
strategies". That does seem a lot clearer.

Also cleaned up the docs a little bit, which were messed up by a
rebasing issue in v9.

I ended up fixing the aforementioned "too much catch-up freezing"
issue by just getting rid of the whole concept of a second table-size
threshold that forces the eager scanning strategy. I now believe that
it's fine to just rely on the generic logic that determines the
scanning strategy based on a combination of table age and the added
cost of eager scanning. That avoids too much of a freezing spike
during any one VACUUM operation, without waiting until an
antiwraparound autovacuum to advance relfrozenxid. Advancement will
happen far earlier than that, though still quite a lot later than it
would with v9 -- late enough to avoid the big spike in freezing that
v9 could cause in pgbench_history-like tables [1].
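
To make the trade-off concrete, here is a minimal sketch of the kind
of decision rule I have in mind. It is purely illustrative: the
function name, the parameters, and the linear scaling are my own
stand-ins, not the patch's actual code.

/*
 * Illustrative sketch only -- not the patch's code.  The idea: the
 * closer a table gets to its XID-age limit, the more "extra" page
 * reads we're willing to pay for eager scanning (which is what lets
 * relfrozenxid be advanced).  All names and the linear scaling rule
 * are assumptions.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

static bool
should_use_eager_scanning(double tableagefrac,     /* table age as a fraction of its limit */
                          BlockNumber rel_pages,   /* total heap pages */
                          BlockNumber extra_pages) /* added pages eager scanning must read */
{
    if (tableagefrac >= 1.0)
        return true;            /* table age alone settles it: scan eagerly */

    /* Otherwise accept eager scanning only when its added cost is proportionate */
    return extra_pages <= (BlockNumber) (tableagefrac * (double) rel_pages);
}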

This means that vacuum_freeze_strategy_threshold is now strictly
concerned with freezing. A table that is always frozen eagerly will
inevitably fall into a pattern of advancing relfrozenxid in every
VACUUM operation, but that isn't something that needs to be documented
or anything. We don't need to introduce a special case here.

The other notable change in v10 is in the final patch, which removes
aggressive mode altogether. v10 makes lazy_scan_noprune less willing
to give up on setting relfrozenxid to a relatively recent XID:
lazy_scan_noprune is now willing to wait a short while (a few tens of
milliseconds) for a cleanup lock on a heap page, when doing so might
be all it takes to preserve VACUUM's ability to advance relfrozenxid
all the way up to FreezeLimit, which is the traditional guarantee
made by aggressive mode VACUUM.

This makes lazy_scan_noprune "under-promise and over-deliver". It now
only promises to advance relfrozenxid up to MinXid in the very worst
case -- even if keeping that promise means waiting indefinitely long
for a cleanup lock. That's not a very strong promise, because
advancing relfrozenxid only as far as MinXid is barely adequate. At
the same time, lazy_scan_noprune is willing to go to extra trouble to
advance relfrozenxid all the way up to FreezeLimit -- it'll wait for
a few tens of milliseconds. It's just not willing to wait
indefinitely. This seems likely to give us the best of both worlds.
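
As a rough illustration of the "short wait" idea, something along
these lines could sit in front of the existing lazy_scan_noprune
fallback. ConditionalLockBufferForCleanup() and pg_usleep() are real
primitives; the retry loop, the ~30ms budget, and the function name
are assumptions of mine, not what the patch actually does:

/*
 * Sketch of a bounded wait for a cleanup lock.  Retry a conditional
 * cleanup lock a handful of times, sleeping briefly in between, and
 * give up after roughly 30ms instead of waiting indefinitely.
 */
#include "postgres.h"
#include "storage/bufmgr.h"

#define CLEANUP_LOCK_RETRIES    6
#define CLEANUP_LOCK_DELAY_US   5000L   /* 5ms per retry, ~30ms in total */

static bool
cleanup_lock_with_short_wait(Buffer buf)
{
    for (int i = 0; i < CLEANUP_LOCK_RETRIES; i++)
    {
        if (ConditionalLockBufferForCleanup(buf))
            return true;        /* got the cleanup lock after all */
        pg_usleep(CLEANUP_LOCK_DELAY_US);
    }

    /* Give up; the caller falls back to lazy_scan_noprune processing */
    return false;
}

A true return would let the page go through lazy_scan_prune as usual,
while false would fall through to lazy_scan_noprune, much as today.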

This was based in part on something that Andres said about cleanup
locks a while back. He had a concern about cases where even MinXid was
before OldestXmin. To some degree that's addressed here, because I've
also changed the way that MinXid is determined, so that it'll be a
much earlier value. That doesn't have much downside now, because of the
way that lazy_scan_noprune is now "aggressive-ish" when that happens to
make sense.

Not being able to get a cleanup lock on the first attempt is
relatively rare, and when it happens it's often something completely
benign. For example, it might just be that the checkpointer was
writing out the same page at the time, which says nothing about a
cleanup lock really being hard to get -- the checkpointer will have
dropped its conflicting buffer pin almost immediately. It would be a
shame to accept a significantly older final relfrozenxid during an
infrequent, long-running antiwraparound autovacuum of a larger table
just because of that -- we should be willing to wait 30 milliseconds
(just not 30 minutes, or 30 days).

None of this even comes up for pages whose XIDs are >= FreezeLimit,
which is actually most pages with the patch, even in larger tables.
It's relatively rare for VACUUM to need to process any heap page in
lazy_scan_noprune, but it'll be much rarer still for it to have to do
a "short wait" like this. So "short waits" have a very small downside,
and (at least occasionally) a huge upside.
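
For what it's worth, the "does this page even matter?" test already
has a natural shape. A minimal sketch, assuming PostgreSQL 15's
heap_tuple_would_freeze() helper -- the wrapper function, its name,
and its use here are mine, not the patch's:

/*
 * A page only needs "short wait" treatment if it holds XIDs/MXIDs
 * older than the freeze cutoffs, i.e. if skipping it could hold the
 * final relfrozenxid back.  The oldest_xid/oldest_mxid trackers are
 * ratcheted back by heap_tuple_would_freeze() as a side effect.
 */
#include "postgres.h"
#include "access/heapam.h"
#include "access/htup_details.h"
#include "storage/bufpage.h"

static bool
page_would_hold_back_relfrozenxid(Page page,
                                  TransactionId freeze_limit,
                                  MultiXactId multi_cutoff,
                                  TransactionId *oldest_xid,
                                  MultiXactId *oldest_mxid)
{
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    bool         holds_back = false;

    for (OffsetNumber off = FirstOffsetNumber; off <= maxoff;
         off = OffsetNumberNext(off))
    {
        ItemId      itemid = PageGetItemId(page, off);

        if (!ItemIdIsNormal(itemid))
            continue;           /* unused, dead, or redirect line pointer */

        if (heap_tuple_would_freeze((HeapTupleHeader) PageGetItem(page, itemid),
                                    freeze_limit, multi_cutoff,
                                    oldest_xid, oldest_mxid))
            holds_back = true;  /* page has XIDs/MXIDs older than the cutoffs */
    }

    return holds_back;          /* false: skipping this page costs us nothing */
}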

Inventing a third alternative behavior (to go along with processing
pages via standard lazy_scan_noprune skipping and processing pages in
lazy_scan_prune) gives VACUUM the flexibility to respond in a way
that's proportionate to the problem at hand in any one particular
heap page. The new behavior has zero chance of mattering in most
individual tables/workloads, but it's good to have every possible
eventuality covered. I really hate the idea of getting a
significantly worse outcome just because of something that happened
in one single heap page -- because the wind changed direction at the
wrong time.

[1] https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch

--
Peter Geoghegan
