Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New strategies for freezing, advancing relfrozenxid early
Date
Msg-id CAH2-Wzn9KRf3rBSDkYsK3tvW2JegLfqCCG2Kwi4Ay9QVmhwPSA@mail.gmail.com
In response to Re: New strategies for freezing, advancing relfrozenxid early  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Dec 6, 2022 at 1:45 PM Peter Geoghegan <pg@bowt.ie> wrote:
> v9 will also address some of the concerns you raised in your review
> that weren't covered by v8, especially about the VM snapshotting
> infrastructure. But also your concerns about the transition from lazy
> strategies to eager strategies.

Attached is v9. Highlights:

* VM snapshot infrastructure now spills to temp files when required
(only in larger tables).

v9 is the first version with a credible approach to resource
management, something I had put off until recently. We now use only a
fixed amount of memory, which should be acceptable from a VACUUM
resource management point of view. The temp files use the BufFile
infrastructure in a relatively straightforward way.
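
To give a rough idea of what I mean (this is just an illustrative
standalone sketch using plain stdio, not the patch's actual code,
which goes through the BufFile routines): keep a fixed-size array of
staged BlockNumbers, and dump it to an unnamed temp file whenever it
fills up.

#include <stdio.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define STAGE_CAPACITY 2048     /* fixed in-memory budget */

typedef struct vmsnap_spill
{
    BlockNumber stage[STAGE_CAPACITY];  /* in-memory staging area */
    int         nstaged;                /* blocks currently staged */
    FILE       *spillfile;              /* temp file, created lazily */
} vmsnap_spill;

/* Remember a block to scan later, spilling to the temp file when full */
static void
vmsnap_remember(vmsnap_spill *v, BlockNumber blkno)
{
    if (v->nstaged == STAGE_CAPACITY)
    {
        if (v->spillfile == NULL)
            v->spillfile = tmpfile();   /* stand-in for BufFileCreateTemp() */
        fwrite(v->stage, sizeof(BlockNumber), v->nstaged, v->spillfile);
        v->nstaged = 0;
    }
    v->stage[v->nstaged++] = blkno;     /* error handling omitted */
}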

* VM snapshot infrastructure now uses explicit prefetching.

Our approach is straightforward, and perhaps even obvious: we prefetch
at the point that VACUUM requests the next block in line. There is a
configurable prefetch distance, controlled by
maintenance_io_concurrency. We "stage" a couple of thousand
BlockNumbers in VACUUM's vmsnap by bulk-reading from the vmsnap's
local copy of the visibility map -- these staged blocks are returned
to VACUUM to scan, with interlaced prefetching of later blocks from
the same local BlockNumber array.
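
Very roughly, the interlacing works along these lines (again just an
illustrative sketch with made-up names; prefetch_block() is a stand-in
for real buffer prefetching, and PREFETCH_DISTANCE stands in for
maintenance_io_concurrency):

#include <stdint.h>

typedef uint32_t BlockNumber;

#define PREFETCH_DISTANCE 32    /* stand-in for maintenance_io_concurrency */

/* Stand-in for PostgreSQL's buffer prefetching; a no-op in this sketch */
static void
prefetch_block(BlockNumber blkno)
{
    (void) blkno;
}

/*
 * Hand back the next staged block for VACUUM to scan, keeping prefetch
 * requests a fixed distance ahead of the block being returned.  Caller
 * must not call past nstaged.
 */
static BlockNumber
next_block_to_scan(const BlockNumber *staged, int nstaged, int *position)
{
    int         cur = (*position)++;

    /* on the first call, issue prefetches for the initial window */
    if (cur == 0)
    {
        for (int i = 1; i < PREFETCH_DISTANCE && i < nstaged; i++)
            prefetch_block(staged[i]);
    }

    /* then keep the window topped up as the caller consumes blocks */
    if (cur + PREFETCH_DISTANCE < nstaged)
        prefetch_block(staged[cur + PREFETCH_DISTANCE]);

    return staged[cur];
}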

The addition of prefetching ought to be enough to avoid regressions
that might otherwise result from the removal of SKIP_PAGES_THRESHOLD
from vacuumlazy.c (see commit bf136cf6 from around the time the
visibility map first went in for the full context). While I definitely
need to do more performance validation work around prefetching
(especially on high latency network-attached storage), I imagine that
it won't be too hard to get into shape for commit. It's certainly not
committable yet, but it's vastly better than v8.

The visibility map snapshot interface (presented by visibilitymap.h)
also changed in v9, mostly to support prefetching. We now have an
iterator-style interface (so vacuumlazy.c cannot request random
access). The iterator is implemented by visibilitymap.c using logic
similar to the current lazy_scan_skip() logic from vacuumlazy.c (a
function that is now gone).
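
The overall shape of the interface is roughly as follows (the names
and signatures shown here are invented for illustration, not the
actual declarations in visibilitymap.h):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Opaque iterator state; the real state would live in visibilitymap.c */
struct vmsnap_iter;

/* Begin iterating over the heap blocks that this VACUUM will scan */
extern struct vmsnap_iter *vmsnap_iter_start(void *vmsnap, bool eager_skipping);

/* Get the next block to scan (prefetching ahead); returns false at the end */
extern bool vmsnap_iter_next(struct vmsnap_iter *iter, BlockNumber *blkno);

/* Release the iterator, including any temp file backing the snapshot */
extern void vmsnap_iter_end(struct vmsnap_iter *iter);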

All told, visibilitymap.c knows quite a bit more than it used to about
high-level requirements from vacuumlazy.c. For example, it has
explicit awareness of VM skipping strategies.

* The page-level freezing commit now freezes a page whenever VACUUM
detects that pruning ran and generated an FPI.

Following a suggestion by Andres, page-level freezing is now always
triggered when pruning needs an FPI. Note that this optimization gets
applied regardless of freezing strategy (unless you turn off
full_page_writes, I suppose).

This optimization is added by the second patch
(v9-0002-Add-page-level-freezing-to-VACUUM.patch).
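
The decision rule amounts to something like this (an intentionally
simplified sketch with made-up names, not code from the patch):

#include <stdbool.h>

/*
 * Sketch of the page-level decision (made-up names, heavily simplified).
 * When pruning already emitted an FPI for the page, the marginal WAL cost
 * of freezing it too is comparatively small, so just freeze it.
 */
static bool
should_freeze_page(bool prune_emitted_fpi,
                   bool freeze_required_by_cutoffs,
                   bool eager_freeze_strategy)
{
    if (freeze_required_by_cutoffs)
        return true;            /* e.g. XIDs older than FreezeLimit */
    if (prune_emitted_fpi)
        return true;            /* FPI already paid for; freezing is cheap */
    return eager_freeze_strategy;   /* otherwise, defer to the strategy */
}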

* Fixed the doc build.

* Much improved criteria for deciding on freezing and vmsnap skipping
strategies in vacuumlazy.c lazy_scan_strategy function -- improved
"cost model".

VACUUM should now give users a far smoother "transition" from lazy
processing to eager processing. A table that starts out small (smaller
than vacuum_freeze_strategy_threshold), gradually grows, and
eventually becomes fairly large (perhaps a multiple of
vacuum_freeze_strategy_threshold in size) will now experience a far
more gradual transition, with catch-up freezing spread out across
multiple VACUUM operations. We avoid big jumps in the overhead of
freezing, where one particular VACUUM operation does all required
"catch-up freezing" in one go.

My approach is to "stagger" the timelines for switching freezing
strategy and vmsnap skipping strategy. We still change over from lazy
to eager freezing strategy when the table size threshold (controlled
by vacuum_freeze_strategy_threshold) is first crossed, just like in
v8. But unlike v8, v9 will switch over to eager skipping in some later
VACUUM operation (barring edge cases). This is implemented in a fairly
simple way: eager skipping is governed by a separate threshold, set to
*twice* the current value of the vacuum_freeze_strategy_threshold
GUC/reloption.
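
In sketch form (made-up names again; I'm also expressing the threshold
in heap pages here purely for illustration, which may not match the
GUC's actual unit):

#include <stdbool.h>
#include <stdint.h>

typedef struct vacuum_strategies
{
    bool        eager_freezing; /* freeze pages proactively? */
    bool        eager_skipping; /* scan all-visible pages as well? */
} vacuum_strategies;

/*
 * Sketch only (made-up names): eager freezing kicks in once the table
 * first crosses the size threshold, while eager skipping waits until the
 * table reaches twice that size, staggering the two behaviors.
 */
static vacuum_strategies
choose_strategies(uint64_t rel_pages, uint64_t freeze_strategy_threshold)
{
    vacuum_strategies result;

    result.eager_freezing = (rel_pages >= freeze_strategy_threshold);
    result.eager_skipping = (rel_pages >= freeze_strategy_threshold * 2);

    return result;
}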

My approach of "staggering" multiple distinct behaviors to avoid
having them all kick in during the same VACUUM operation isn't new to
v9. The behavior around waiting for cleanup locks (added by
v9-0005-Finish-removing-aggressive-mode-VACUUM.patch) is another
example of the same general idea.

In general I think that VACUUM shouldn't switch to more aggressive
behaviors all at the same time, in the same VACUUM. Each distinct
aggressive behavior has totally different properties, so there is no
reason why VACUUM should start to apply each and every one of them at
the same time. Some "aggressive" behaviors have the potential to make
things quite a lot worse, in fact. The cure must not be worse than the
disease.

* Related to the previous item (about the "cost model" that chooses a
strategy), we now have a much more sophisticated approach to when and
how we decide to advance relfrozenxid in smaller tables (tables whose
size is < vacuum_freeze_strategy_threshold). This improves things for
tables that start out small and stay small: tables where we're
unlikely to want to advance relfrozenxid in every single VACUUM
(better to be lazy with such a table), but where we still want to be
clever about advancing relfrozenxid "opportunistically".

The way that VACUUM weighs both table age and the added cost of
relfrozenxid advancement is more sophisticated in v9. The goal is to
make it more likely that VACUUM will stumble upon opportunities to
advance relfrozenxid when it happens to be cheap, which can happen for
many reasons, all of which have a great deal to do with workload
characteristics.

As in v8, v9 makes VACUUM willing to advance relfrozenxid without
concern for table age whenever it notices that doing so happens to be
very cheap (in practice this means that the number of "extra" heap
pages scanned is < 5% of rel_pages). However, in v9 we now go further
by scaling this threshold through interpolation, based on table age.

We have the same "5% of rel_pages" threshold when table age is less
than halfway towards the point that autovacuum.c will launch an
antiwraparound autovacuum -- when we still have only minimal concern
about table age. But the rel_pages-wise threshold starts to grow once
table age gets past that "halfway towards antiwraparound AV" point. We
interpolate the rel_pages-wise threshold using a new approach in v9.

At first the rel_pages-wise threshold grows quite slowly (relative to
the rate at which table age approaches the point of forcing an
antiwraparound AV). For example, when we're 60% of the way towards
needing an antiwraparound AV, and VACUUM runs, we'll eagerly advance
relfrozenxid provided that the "extra" cost of doing so happens to be
less than ~22% of rel_pages. It "accelerates" from there (assuming
fixed rel_pages).
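
Rather than spelling out the exact formula, here's a rough sketch that
reproduces the numbers above (purely for illustration: I'm assuming a
linear ramp from 5% at the halfway point up to a ~90% cap at the
antiwraparound point, whereas the real curve accelerates rather than
staying linear):

/*
 * Purely illustrative, not the patch's formula: scale the "extra pages"
 * threshold for opportunistic relfrozenxid advancement with table age,
 * where tableagefrac is the fraction of the distance to the point at
 * which an antiwraparound autovacuum would be launched.
 */
static double
extra_scan_threshold(double tableagefrac)
{
    const double low = 0.05;    /* 5% of rel_pages while age is low */
    const double high = 0.90;   /* assumed cap near the antiwraparound point */

    if (tableagefrac <= 0.5)
        return low;
    if (tableagefrac >= 1.0)
        return high;

    /* linear ramp; e.g. tableagefrac = 0.6 gives 0.05 + 0.85 * 0.2 = 0.22 */
    return low + (high - low) * ((tableagefrac - 0.5) / 0.5);
}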

VACUUM will now tend to take advantage of individual table
characteristics that make it relatively cheap to advance relfrozenxid.
Bear in mind that these characteristics are not fixed for the same
table. The "extra" cost of advancing relfrozenxid during this VACUUM
(whether measured in absolute terms, or as a proportion of the net
amount of work just to do simple vacuuming) just isn't predictable
with real workloads. Especially not with the FPI opportunistic
freezing stuff from the second patch (the "freeze when heap pruning
gets an FPI" thing) in place. We should expect significant "natural
variation" among tables, and within the same table over time -- this
is a good thing.

For example, imagine a table that experiences a bunch of random
deletes, which leads to a VACUUM that must visit most heap pages (say
85% of rel_pages). Let's suppose that those deletes are a once-off
thing. The added cost of advancing relfrozenxid in the next VACUUM
still isn't trivial (assuming the remaining 15% of pages are
all-visible). But it is probably still worth doing if table age is at
least starting to become a concern. It might actually be a lot cheaper
to advance relfrozenxid early than to put it off until table age
forces the issue.

* Numerous structural improvements, lots of code polishing.

The patches have been reordered in a way that should make review a bit
easier. Now the commit messages are written in a way that clearly
anticipates the removal of aggressive mode VACUUM, which the last
patch actually finishes. Most of the earlier commits are presented as
preparation for completely removing aggressive mode VACUUM.

The first patch (which refactors how VACUUM passes around cutoffs like
FreezeLimit and OldestXmin by using a dedicated struct) is much
improved. heap_prepare_freeze_tuple() now takes a more explicit
approach to tracking what needs to happen for the tuple's freeze plan.
This allowed me to pepper it with defensive assertions. It's also a
lot clearer IMV. For example, we now have separate freeze_xmax and
replace_xmax tracker variables.
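
To give a flavor of the "separate trackers plus assertions" idea (a
simplified sketch with invented fields, not the actual
heap_prepare_freeze_tuple() code):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Simplified sketch with made-up fields (not the actual freeze plan
 * representation): track "remove xmax entirely" separately from "replace
 * xmax with another value", so the two outcomes can be cross-checked.
 */
typedef struct xmax_plan
{
    bool            freeze_xmax;    /* set xmax to InvalidTransactionId */
    bool            replace_xmax;   /* overwrite xmax with new_xmax */
    TransactionId   new_xmax;       /* only meaningful when replace_xmax */
} xmax_plan;

static void
check_xmax_plan(const xmax_plan *plan)
{
    /* defensive assertion: the two outcomes must be mutually exclusive */
    assert(!(plan->freeze_xmax && plan->replace_xmax));
}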

The second patch in the series (the page-level freezing patch) is also
much improved. I'm much happier with the way that
heap_prepare_freeze_tuple() now explicitly delegates control of
page-level freezing to FreezeMultiXactId() in v9, for example.

Note that I squashed the patch that taught VACUUM to size dead_items
using scanned_pages into the main visibility map patch
(v9-0004-Add-eager-and-lazy-VM-strategies-to-VACUUM.patch). That's why
there are only 5 patches (down from 6) in v9.

--
Peter Geoghegan
