Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New strategies for freezing, advancing relfrozenxid early
Date
Msg-id CAH2-Wz=hfTw=3rX3Kew5gt8LHnUnO7G5gfp6k4iT_JTs7zP9XA@mail.gmail.com
In response to Re: New strategies for freezing, advancing relfrozenxid early  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Oct 4, 2022 at 7:59 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I am fine with that, but I'd like us all to understand what the
> downsides are.

Although I'm sure that there must be some case that loses measurably,
it's not obvious where to start looking for one. It's easy to imagine
individual pages that we lose on, but it's much harder to imagine a
practical test case where most of the pages reliably look like that.

> If I understand correctly:
>
> 1. Eager freezing (meaning to freeze at the same time as setting all-
> visible) causes a modest amount of WAL traffic, hopefully before the
> next checkpoint so we can avoid FPIs. Lazy freezing (meaning set all-
> visible but don't freeze) defers the work, and it might never need to
> be done; but if it does, it can cause spikes at unfortunate times and
> is more likely to generate more FPIs.

Lazy freezing means to freeze every eligible tuple (every XID <
OldestXmin) when one or more XIDs are before FreezeLimit. Eager
freezing means freezing every eligible tuple when the page is about to
be set all-visible, or whenever lazy freezing would trigger freezing.
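To make the two triggers concrete, here is a minimal illustrative sketch (not PostgreSQL source; the function names, cutoff values, and signatures are assumptions for illustration only):

```python
# Illustrative sketch of the lazy vs eager page-freezing triggers
# described above. FREEZE_LIMIT stands in for FreezeLimit; the
# function names are hypothetical, not PostgreSQL's actual routines.

FREEZE_LIMIT = 1000  # assumed FreezeLimit cutoff for this example

def lazy_freezes_page(xids_on_page):
    """Lazy: freeze every eligible tuple on the page, but only once
    at least one XID on the page is before FreezeLimit."""
    return any(xid < FREEZE_LIMIT for xid in xids_on_page)

def eager_freezes_page(xids_on_page, setting_all_visible):
    """Eager: additionally freeze whenever the page is about to be
    set all-visible, even if no XID has crossed FreezeLimit yet."""
    return setting_all_visible or lazy_freezes_page(xids_on_page)
```

Note that eager is a strict superset of lazy here: any page that lazy would freeze, eager freezes too.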

Eager freezing tends to avoid big spikes in larger tables, which is
very important. It can sometimes be cheaper and better in every way
than lazy freezing. Lazy freezing sometimes retains an advantage by
avoiding altogether freezing that will never be needed, though
typically only in small tables.

Lazy freezing is fairly similar to what we do on HEAD now, though it's
not identical. It's still "page level freezing", just with lazy
criteria for triggering it.

> 2. You're trying to mitigate the downsides of eager freezing by:
>   a. when freezing a tuple, eagerly freeze other tuples on that page
>   b. optimize WAL freeze records

Sort of.

Both of these techniques apply to eager freezing too, in fact. It's
just that eager freezing is likely to do the bulk of all freezing that
actually goes ahead, so it will be helped disproportionately by these
techniques. That holds even when most VACUUM operations use the lazy
freezing strategy, which is probably the common case -- lazy freezing,
after all, freezes lazily.

> 3. You're trying to capture the trade-off in #1 by using the table size
> as a proxy. Deferred work is only really a problem for big tables, so
> that's where you use eager freezing.

Right.
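The size-based choice amounts to something like the following sketch (the threshold value and names are assumptions for illustration; they are not the patch's actual GUC or default):

```python
# Hypothetical sketch of using table size as a proxy for picking a
# freezing strategy. Deferred freezing debt mainly hurts big tables,
# so those go eager. EAGER_THRESHOLD_PAGES is an assumed cutoff.

EAGER_THRESHOLD_PAGES = 4096  # illustrative only

def choose_freezing_strategy(rel_pages):
    """Pick a per-VACUUM freezing strategy based on table size."""
    return "eager" if rel_pages >= EAGER_THRESHOLD_PAGES else "lazy"
```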

> But maybe we can just always use
> eager freezing?:

That doesn't seem like a bad idea, though it might be tricky to put
into practice. It might be possible to totally unite the concept of
all-visible and all-frozen pages in the scope of this work. But there
are surprisingly many tricky details involved. I'm not surprised that
you're suggesting this -- it basically makes sense to me. It's just
the practicalities that I worry about here.

>   a. You're mitigating the WAL work for freezing.

I don't see why this would be true. Lazy vs Eager are exactly the same
for a given page at the point that freezing is triggered. We'll freeze
all eligible tuples (often though not always every tuple), or none at
all.

Lazy vs Eager describe the policy for deciding to freeze a page, but
do not affect the actual execution steps taken once we decide to
freeze.
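In other words, the strategy only decides *whether* a page gets frozen; once it does, the execution path is shared. A minimal sketch of that shared path (freeze_page() is an illustrative assumption, not PostgreSQL's actual code, though FrozenTransactionId really is 2):

```python
# Whichever strategy triggered the freeze, the same routine runs:
# every eligible tuple (xmin < OldestXmin) on the page is frozen.

FROZEN_XID = 2  # stand-in for PostgreSQL's FrozenTransactionId

def freeze_page(xmins_on_page, oldest_xmin):
    """Freeze all eligible tuples on a page; ineligible tuples
    (xmin >= OldestXmin) are left unchanged."""
    return [FROZEN_XID if xmin < oldest_xmin else xmin
            for xmin in xmins_on_page]
```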

>   b. A lot of people run with checksums on, meaning that setting the
> all-visible bit requires WAL work anyway, and often FPIs.

The idea of rolling the WAL records into one does seem appealing, but
we'd still need the original WAL record to set a page all-visible in
VACUUM's second heap pass (only setting a page all-visible in the
first heap pass could be optimized by making the FREEZE_PAGE WAL
record mark the page all-visible too). Or maybe we'd roll that into
the VACUUM WAL record at the same time.

In any case the second heap pass would have to have a totally
different WAL logging strategy to the first heap pass. Not
insurmountable, but not exactly an easy thing to do in passing either.

>   c. All-visible is conceptually similar to freezing, but less
> important, and it feels more and more like the design concept of all-
> visible isn't carrying its weight.

Well, not quite -- at least not on the VM side itself.

There are cases where heap_lock_tuple() will update a tuple's xmax,
replacing it with a new Multi. This will necessitate clearing the
page's all-frozen bit in the VM -- but the all-visible bit will stay
set. This is why it's possible for small numbers of all-visible pages
to appear even in large tables that have been eagerly frozen.

>   d. (tangent) I had an old patch[1] that actually removed
> PD_ALL_VISIBLE (the page bit, not the VM bit), which was rejected, but
> perhaps its time has come?

I remember that pgCon developer meeting well.  :-)

If anything your original argument for getting rid of PD_ALL_VISIBLE
is weakened by the proposal to merge together the WAL records for
freezing and for setting a heap page all visible. You'd know for sure
that the page will be dirtied when such a WAL record needed to be
written, so there is actually no reason to care about dirtying the
page. No?

I'm in favor of reducing the number of WAL records required in common
cases if at all possible -- purely because the generic WAL record
overhead of having an extra WAL record does probably add to the WAL
overhead for work performed in lazy_scan_prune(). But it seems like
separate work to me.

-- 
Peter Geoghegan


