Re: Why clearing the VM doesn't require registering vm buffer in wal record - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Why clearing the VM doesn't require registering vm buffer in wal record
Msg-id CAAKRu_YAvPWTuEotqN_yHnM0To3igVK_eWh_mRn12odr_20azg@mail.gmail.com
In response to Re: Why clearing the VM doesn't require registering vm buffer in wal record  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Mar 5, 2026 at 4:01 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2026-03-05 15:38:24 -0500, Andres Freund wrote:
>
> > There's explicit code for ignoring the FSM, but I don't see the same for the
> > VM. And that makes sense: VM changes are mostly WAL logged, just not
> > completely / generically (i.e. this complaint), whereas FSM changes are not
> > WAL logged at all.
>
> Unfortunately I can confirm that incremental backups end up with an outdated
> VM.
>
> An unfortunate kicker: It looks like verify_heapam() doesn't even notice :(.

Attached is a patch set to fix the issue, based largely on the work you
started on your branch. I attached two versions: the one targeting
master/19 is prefixed with v1_PGMASTER and the one targeting 18 is
prefixed with v1_PG18. The PG 18 changes aren't a straight cherry-pick
to 17 (the earliest release I'll backpatch to, because that is when
incremental backup was introduced) because the redo functions live in a
different file in 18 than in 17, but I want to avoid discussing three
different versions of this patch set on this thread.

The backpatched changes are different for a few reasons, but the
biggest difference from a review standpoint is that in PG 18, the redo
routines can read WAL in either the old format or the new format, so
that people can reasonably upgrade to the new minor version.

I added a TAP test that does something like Andres' incremental backup
repro but tests more variations. Because init_from_backup() runs
pg_combinebackup with the --debug flag, the log output is very verbose
(i.e. every single copied file is logged). I suggest we turn --debug
off by default in tests. I included a patch that does that. It isn't
required to commit my test, but my test does produce 5000 lines of
regression log output which seems...not ideal.

The patch set targeting master includes a few more changes aimed at
catching bugs like this in the future. 0005 adds a check to
verify_heapam() that PD_ALL_VISIBLE is never clear when the VM is set.
0006 stops masking PD_ALL_VISIBLE during WAL consistency checking. And
0007 is a version of a patch Andres started which validates that every
block registered with a WAL record is read during recovery.
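
To make the invariant behind the 0005 check concrete, here is a minimal toy sketch in C. It is not the code from the patch: the ToyHeapPage struct, the TOY_* constants, and toy_vm_consistent() are hypothetical stand-ins invented for illustration, and the real check would work on actual heap pages and visibility map bits inside verify_heapam().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for illustration only. These are NOT
 * PostgreSQL's real definitions or the patch's code.
 */
#define TOY_PD_ALL_VISIBLE  0x0004  /* page-level "all visible" flag */
#define TOY_VM_ALL_VISIBLE  0x01    /* VM bit: all tuples visible */
#define TOY_VM_ALL_FROZEN   0x02    /* VM bit: all tuples frozen */

typedef struct ToyHeapPage
{
    uint16_t pd_flags;              /* toy version of the page header flags */
} ToyHeapPage;

/*
 * Return true if the page flag and the VM bits are consistent.
 *
 * The only combination treated as corruption is "VM bit set but
 * page-level flag clear". The reverse (flag set, VM clear) is allowed.
 */
static bool
toy_vm_consistent(const ToyHeapPage *page, uint8_t vm_bits)
{
    bool page_all_visible = (page->pd_flags & TOY_PD_ALL_VISIBLE) != 0;
    bool vm_set = (vm_bits & (TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN)) != 0;

    return !(vm_set && !page_all_visible);
}
```

In this toy model, a stale VM of the kind the incremental backup repro produces would show up as toy_vm_consistent() returning false for a page whose flag was cleared while its VM bit stayed set.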

- Melanie
