Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: New strategies for freezing, advancing relfrozenxid early
Date
Msg-id 9fd17e87889845d74b8e3fdac7d2748a07950a92.camel@j-davis.com
In response to Re: New strategies for freezing, advancing relfrozenxid early  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: New strategies for freezing, advancing relfrozenxid early
List pgsql-hackers
On Sat, 2022-12-10 at 18:11 -0800, Peter Geoghegan wrote:
> On Tue, Dec 6, 2022 at 1:45 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > v9 will also address some of the concerns you raised in your review
> > that weren't covered by v8, especially about the VM snapshotting
> > infrastructure. But also your concerns about the transition from lazy
> > strategies to eager strategies.
>
> Attached is v9. Highlights:

Comments:

* The documentation shouldn't have a heading like "Managing the 32-bit
Transaction ID address space". We already have a concept of "age"
documented, and I think that's all that's needed in the relevant
section. Freezing is driven by the need to keep the age of the oldest
transaction ID in a table below ~2B, and also by the need to
truncate the clog (and reduce lookups of really old xids). It's fine to
give a brief explanation about why we can't track very old xids, but
it's more of an internal detail and not the main point.
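
To make the "age" concept concrete, here is a minimal sketch of how
the distance between two 32-bit transaction IDs can be computed with
wraparound arithmetic. It is an illustration only: xid_age() is a
made-up helper, not a PostgreSQL function, and it ignores the reserved
special xids.

```python
# Illustrative sketch only; xid_age() is a hypothetical helper,
# not a PostgreSQL API, and reserved special xids are ignored.
XID_SPACE = 2**32  # transaction IDs are 32-bit counters that wrap around


def xid_age(current_xid: int, xid: int) -> int:
    """Distance from xid to current_xid as a signed 32-bit value."""
    diff = (current_xid - xid) % XID_SPACE
    # Distances of 2^31 or more are read as negative ("in the future"),
    # which is why ages have to stay below roughly 2 billion.
    return diff - XID_SPACE if diff >= 2**31 else diff


print(xid_age(1_000, 900))          # 100
print(xid_age(50, XID_SPACE - 50))  # 100, even though the counter wrapped
```

The second call shows why only the age matters to users: the raw xid
values say nothing useful once the counter has wrapped.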

* I'm still having a hard time with vacuum_freeze_strategy_threshold.
Part of it is the name, which doesn't seem to convey the meaning. But
the heuristic also seems off to me. What if you have lots of partitions
in an append-only range-partitioned table? That would tend to use the
lazy freezing strategy (because each partition is small), but that's
not what you want. I understand heuristics aren't perfect, but it feels
like we could do something better. Also, another purpose of this seems
to be to achieve v15 behavior (if v16 behavior causes a problem for
some workload), which seems like a good idea, but perhaps we should
have a more direct setting for that?
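
A rough sketch of the partitioning concern, assuming the strategy is
picked by a per-relation size test. The threshold value, the partition
sizes, and choose_strategy() are all made up for illustration and are
not taken from the patch.

```python
# Hypothetical numbers and names; this is not the patch's actual logic.
GB = 1024**3
STRATEGY_THRESHOLD = 4 * GB  # assumed threshold, for illustration only


def choose_strategy(rel_size_bytes: int,
                    threshold: int = STRATEGY_THRESHOLD) -> str:
    """Per-relation size test: eager at or above the threshold, else lazy."""
    return "eager" if rel_size_bytes >= threshold else "lazy"


partitions = [1 * GB] * 100  # 100 append-only partitions of 1GB each
strategies = {choose_strategy(size) for size in partitions}
total = sum(partitions)

print(strategies)              # every partition is judged on its own size
print(choose_strategy(total))  # the same data as one table
```

Under these assumptions every partition gets the lazy strategy, while
the same 100GB stored as a single table would get the eager one.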

* The comment above lazy_scan_strategy() is phrased in terms of the
"traditional approach". It would be clearer if you described the
current strategies and how they're chosen. The pre-16 behavior was as
lazy as possible, so that's easy enough to describe without referring
to history.

* "eager skipping behavior" seems like a weird phrasing because it's
not immediately clear if that means "skip more pages" (eager to skip
pages and lazy to process them) or "skip fewer pages" (lazy to skip the
pages and eager to process the pages).

* The skipping behavior for all-visible pages is binary: skip them
all, or skip none. That makes sense in the context of relfrozenxid
advancement. But how does that avoid IO spikes? It would seem perfectly
reasonable to me, if relfrozenxid advancement is not a pressing
problem, to process some fraction of the all-visible pages (or perhaps
process enough of them to freeze some fraction). That would ensure that
each VACUUM makes a payment on the deferred costs of freezing. I think
this has already been discussed but it keeps reappearing in my mind, so
maybe we can settle this with a comment (and/or docs)?
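
To be clear about what I mean by "some fraction", here is a sketch.
pick_all_visible_pages() and the fraction parameter are hypothetical;
nothing like this exists in the patch as far as I know.

```python
# Hypothetical sketch: the function name and the fraction knob are
# assumptions made for illustration, not part of the patch.
import math


def pick_all_visible_pages(all_visible: list, fraction: float) -> list:
    """Return the first ceil(fraction * n) all-visible pages to process."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    budget = math.ceil(fraction * len(all_visible))
    return all_visible[:budget]


pages = list(range(1000))  # block numbers of all-visible pages
print(len(pick_all_visible_pages(pages, 0.0)))   # skip them all
print(len(pick_all_visible_pages(pages, 1.0)))   # skip none
print(len(pick_all_visible_pages(pages, 0.25)))  # pay down a quarter
```

The two existing behaviors are the endpoints (0.0 and 1.0); the
question is whether anything in between is worth having.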

* I'm wondering whether vacuum_freeze_min_age makes sense anymore. It
only takes effect when the page is not skipped, which is confusing
from a usability standpoint, and we have better heuristics to decide if
the whole page should be frozen or not anyway (i.e. if an FPI was
already taken then freezing is cheaper).
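
A sketch of the usability problem, under the simplifying assumption
that a scanned page is frozen either because an FPI was already taken
or because its oldest xid is past the age cutoff. page_gets_frozen()
and its parameters are hypothetical, not the patch's code.

```python
# Hypothetical sketch; the real decision in the patch is more involved.
def page_gets_frozen(skipped: bool, fpi_taken: bool,
                     oldest_xid_age: int, freeze_min_age: int) -> bool:
    """Simplified model of whether VACUUM freezes a given page."""
    if skipped:
        # The age setting is never consulted for a skipped page.
        return False
    # An FPI that was already taken makes freezing cheap regardless of age.
    return fpi_taken or oldest_xid_age >= freeze_min_age


# Same page and same setting, different outcome depending on skipping:
print(page_gets_frozen(True, False, 90_000_000, 50_000_000))
print(page_gets_frozen(False, False, 90_000_000, 50_000_000))
# The FPI heuristic freezes a young page without looking at the age:
print(page_gets_frozen(False, True, 10, 50_000_000))
```

In this model the setting only decides the outcome for pages that are
scanned and have no FPI, which is hard to explain to users.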


--
Jeff Davis
PostgreSQL Contributor Team - AWS




