Home > mailing lists

Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL
Date	September 25, 2025 22:19:55
Msg-id	CA+TgmoYDgcV8viFQNLHYyC2Zct4sRsOf99rWWzjt6k3Z2hpNig@mail.gmail.com Whole thread
In response to	[PATCH] Fix pg_rewind false positives caused by shutdown-only WAL (Srinath Reddy Sadipiralla <srinath2133@gmail.com>)
List	pgsql-hackers

Tree view

On Sat, Sep 6, 2025 at 12:34 PM Srinath Reddy Sadipiralla
<srinath2133@gmail.com> wrote:
> While working with pg_rewind, I noticed that it can sometimes request a rewind even when no actual changes exist
aftera failover. 
>
> Problem:
> Currently, pg_rewind determines the end-of-WAL on the target by using the last shutdown checkpoint (or
minRecoveryPointfor a standby). This creates a false positive scenario: 
>
> 1)Suppose a standby is promoted to become the new primary.
> 2)Later, the old primary is cleanly shut down.
> 3)The only WAL record generated on the old primary after divergence is a shutdown checkpoint.
>
> At this point, the old primary and new primary contain identical data.

This isn't true, because the control file changes when the shutdown
checkpoint is written.

Also, I think if you tested this even once, you'd find out that it
doesn't actually work. At least, it doesn't work for me, and I don't
see how it would work for you. I created a primary. Then I made a
standby. Then I promoted the standby while doing a clean shutdown of
the primary. Then I tried to restart the primary as a standby, and I
get an infinite loop like this:

2025-09-25 15:07:03.094 EDT [58031] FATAL:  terminating walreceiver
process due to administrator command
2025-09-25 15:07:03.094 EDT [56485] LOG:  new timeline 2 forked off
current database system timeline 1 before current recovery point
0/040000D8

Now, admittedly, I didn't run pg_rewind here. But if I understand
correctly, your patch's idea is to have pg_rewind say that everything
is fine in this scenario. What it won't do is make postgres itself
agree with that conclusion. I'm actually not sure the patch would have
achieved that goal here, because the WAL actually looks like this:

rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn:
0/04000028, prev 0/03000120, desc: RUNNING_XACTS nextXid 754
latestCompletedXid 753 oldestRunningXid 754
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
0/04000060, prev 0/04000028, desc: CHECKPOINT_SHUTDOWN redo
0/04000060; tli 1; prev tli 1; fpw true; wal_level replica; xid 0:754;
oid 14029; multi 1; offset 0; oldest xid 746 in DB 1; oldest multi 1
in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid
0; shutdown

Your patch doesn't seem to contain any logic to ignore RUNNING_XACTS,
just CHECKPOINT_SHUTDOWN. So in this scenario it might have still
thought a rewind was necessary. But I think this test case still makes
the point that you can't just change pg_rewind if the server has a
different idea about how things work.

Also, the server is correct to be concerned about the control file
changing, and your patch is wrong to try to ignore such a change. It's
true that the control file is not a relation data file, but its
contents are very important, and updates to them matter when deciding
whether servers are in sync.

Also, even if the idea of your patch were correct, the logic it uses
to try to find an XLOG_CHECKPOINT_SHUTDOWN or XLOG_SWITCH record is
wrong. It's mandatory to first test XLogRecGetRmid(xlogreader) against
the appropriate RM_whatever_ID; see walsummarizer.c for an example of
how to do this correctly.

--
Robert Haas
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Masahiko Sawada
Date: 25 September 2025, 22:15:27
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Tom Lane
Date: 25 September 2025, 22:36:18
Subject: Re: Avoiding roundoff error in pg_sleep()

Re: [PATCH] Fix pg_rewind false positives caused by shutdown-only WAL - Mailing list pgsql-hackers

Previous

Next