On 13/02/2026 22:31, Sebastian Webber wrote:
> PostgreSQL version: 17.8 (standby), 17.5 (primary)
>
> Primary: PostgreSQL 17.5 (Debian 17.5-1.pgdg130+1) on aarch64-unknown-
> linux-gnu
> Standby: PostgreSQL 17.8 (Debian 17.8-1.pgdg13+1) on aarch64-unknown-
> linux-gnu
>
> Platform: Docker containers on macOS (Apple Silicon / aarch64), Docker
> Desktop
>
>
> Description
> -----------
>
> A PostgreSQL 17.8 standby crashes during WAL replay when streaming
> from a 17.5 primary. The crash occurs after replaying a
> MultiXact/TRUNCATE_ID record followed by a MultiXact/CREATE_ID
> record.

Thanks for the report, I can repro it with your script. It is indeed a
regression introduced in the latest minor release (commit 8ba61bc063),
in the logic to replay multixact WAL generated on older minor versions.
Adding the folks from the thread that led to that commit.
The commit added this in RecordNewMultiXact():
> /*
>  * Older minor versions didn't set the next multixid's offset in this
>  * function, and therefore didn't initialize the next page until the next
>  * multixid was assigned. If we're replaying WAL that was generated by
>  * such a version, the next page might not be initialized yet. Initialize
>  * it now.
>  */
> if (InRecovery &&
>     next_pageno != pageno &&
>     pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number) == pageno)
> {
>     elog(DEBUG1, "next offsets page is not initialized, initializing it now");
The idea is that if the next offset falls on a different page
(next_pageno != pageno), and we have not yet initialized the next page
(pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number) ==
pageno), we initialize it now. However, that last check goes wrong after
a truncation record is replayed. Replaying a truncation record does this:
>
> /*
>  * During XLOG replay, latest_page_number isn't necessarily set up
>  * yet; insert a suitable value to bypass the sanity test in
>  * SimpleLruTruncate.
>  */
> pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
> pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
>                     pageno);
Thanks to that, latest_page_number moves backwards to a much older page
number. That breaks the "was the next offset page already initialized?"
test in RecordNewMultiXact().
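
To illustrate the interaction described above, here is a toy model in
standalone C. It is only a sketch, not PostgreSQL code: the plain global
stands in for MultiXactOffsetCtl->shared->latest_page_number, the helper
names next_page_needs_init(), replay_truncate() and demo_regression() are
made up for this example, and the page numbers are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for MultiXactOffsetCtl->shared->latest_page_number. */
static uint64_t latest_page_number;

/*
 * Simplified form of the condition quoted from RecordNewMultiXact():
 * the next offset is on another page, and that page is assumed to be
 * uninitialized because latest_page_number still points at our page.
 */
static bool
next_page_needs_init(uint64_t pageno, uint64_t next_pageno)
{
	return next_pageno != pageno && latest_page_number == pageno;
}

/* Mimics truncation replay overwriting latest_page_number (hypothetical). */
static void
replay_truncate(uint64_t end_trunc_page)
{
	latest_page_number = end_trunc_page;
}

/* Walks through the sequence: truncation record, then a create record. */
static void
demo_regression(void)
{
	latest_page_number = 10;

	/* Without a truncation in between, the check fires as intended. */
	assert(next_page_needs_init(10, 11));

	/* Truncation replay moves latest_page_number back to an old page. */
	replay_truncate(3);

	/* The same create record no longer triggers the initialization. */
	assert(!next_page_needs_init(10, 11));
}
```

In this model the second check returns false, so the next page would be
left uninitialized even though nothing has initialized it yet.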
I don't understand why that "bypass the sanity check" is needed. As far
as I can see, latest_page_number is tracked accurately during WAL
replay, and should already be set up. It's initialized in
StartupMultiXact(), and updated whenever the next page is initialized.
That code was introduced a long time ago, in commit 4f627f8973, which in
turn was backpatched and had to deal with WAL that was generated before
that commit. I suspect it was necessary back then, for backwards
compatibility, but isn't necessary any more. Hence, I propose to remove
that "bypass the sanity check" code (attached). Does anyone see a
scenario where latest_page_number might not be set correctly?
If we want to play it even safer -- and I guess that's the right
thing to do for backpatching -- we could set latest_page_number
*temporarily* while we do the truncation, and restore the old value
afterwards.
This fixes the bug. With this fix, you can replay WAL that's already
been generated.
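
For the more cautious variant mentioned above (set latest_page_number
temporarily, then restore it), a rough standalone sketch in C could look
like the following. This is not the attached patch. toy_truncate() is a
hypothetical stand-in for the real truncation call, and the global
variables only model the shared state.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for MultiXactOffsetCtl->shared->latest_page_number. */
static uint64_t latest_page_number;

/* Records what the toy truncation observed, for demonstration only. */
static uint64_t truncate_saw_latest;

/* Hypothetical stand-in for the truncation step and its sanity test. */
static void
toy_truncate(uint64_t cutoff_page)
{
	(void) cutoff_page;
	truncate_saw_latest = latest_page_number;
}

/*
 * Replay a truncation, overriding latest_page_number only for the
 * duration of the truncation call and restoring it afterwards.
 */
static void
replay_truncate_restoring(uint64_t end_trunc_page)
{
	uint64_t	saved = latest_page_number;

	latest_page_number = end_trunc_page;
	toy_truncate(end_trunc_page);
	latest_page_number = saved;

	/* The value tracked during replay is intact again. */
	assert(latest_page_number == saved);
}
```

With latest_page_number at 10 beforehand, calling
replay_truncate_restoring(3) lets the truncation see 3 and leaves the
value at 10 afterwards, so a later "is the next page initialized?" check
would compare against the tracked value again.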
- Heikki