Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Date
Msg-id CAH2-WzmtP1rpk6At_xS4zLYL+s-645o-SbnVE3ZzkWwa3kkmAw@mail.gmail.com
Whole thread Raw
In response to Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
List pgsql-hackers
On Tue, Sep 20, 2022 at 3:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v2, which I'm just posting to keep CFTester happy. No real
> changes here.

Attached is v3. I'd like to move forward with commit soon. I'll do so
in the next few days, barring objections.

v3 has vacuumlazy.c pass NewRelfrozenXid instead of FreezeLimit for
the purposes of generating recovery conflicts during subsequent REDO
of the resulting xl_heap_freeze_page WAL record. This more general
approach is preparation for my patch to add page-level freezing [1].
It might theoretically lead to more recovery conflicts, but in
practice the impact should be negligible. For one thing VACUUM must
freeze *something* before any recovery conflict can happen during
subsequent REDO on a replica in hot standby. It's far more likely that
any disruptive recovery conflicts come from pruning.

It also makes the cutoff_xid field from the xl_heap_freeze_page WAL
record into a "standard latestRemovedXid format" field. In other words
it backs up an XID passed by vacuumlazy.c caller during original
execution (not in the REDO routine, as on HEAD). To make things
clearer, the patch also renames the nearby xl_heap_visible.cutoff_xid
field to xl_heap_visible.latestRemovedXid. Now there are no WAL
records with a field called "cutoff_xid" (they're all called
"latestRemovedXid" now). This matches PRUNE records, and B-Tree DELETE
records.

The overall picture is that all REDO routines (for both heapam and
index AMs) now advertise that they have a field that they use to
generate recovery conflicts that follows a standard format.  All
latestRemovedXid XIDs are applied in a standard way during REDO: by
passing them to ResolveRecoveryConflictWithSnapshot(). Users can grep
the output of tools like pg_waldump to find latestRemovedXid fields,
without necessarily needing to give any thought to which kind of WAL
records are involved, or even the rmgr. Presenting this information
precisely and uniformity seems useful to me. (Perhaps we should have a
truly generic name, which latestRemovedXid isn't, but that can be
handled separately.)

[1] https://commitfest.postgresql.org/39/3843/
--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] ALTER TABLE ... SET STORAGE default
Next
From: Jeff Davis
Date:
Subject: Re: Lack of PageSetLSN in heap_xlog_visible