Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery - Mailing list pgsql-hackers

From	Marco Nenciarini
Subject	Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Date	May 6 14:27:48
Msg-id	CA+nrD2eVHHNpKkEc=RsPkcbe033EyZqa_1YTFcSLqwCfZ9r2xA@mail.gmail.com
In response to	Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery (Xuneng Zhou <xunengzhou@gmail.com>)
List	pgsql-hackers

Hi Xuneng,

You're right that polling isn't ideal. For a backpatchable bug fix
though, the trade-off seems reasonable: the change is contained in
the walreceiver, doesn't touch the wire protocol, and applies to all
back branches. Exploring better designs would be worthwhile but
probably belongs in a separate effort.

Best regards,
Marco

On Fri, May 1, 2026 at 4:57 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:

Hi Marco,

On Tue, Apr 28, 2026 at 12:50 AM Marco Nenciarini <marco.nenciarini@enterprisedb.com> wrote:
v7 patches attached. No code changes from v6, just rebased on
current master to remove a minor offset, and the backpatch file is
renamed with a "nocfbot-" prefix so the commitfest bot picks up
only the master patch.

On Mon, Apr 27, 2026 at 6:00 PM Marco Nenciarini <marco.nenciarini@enterprisedb.com> wrote:
Registered in PG20-1: https://commitfest.postgresql.org/patch/6716/

On Sat, Mar 21, 2026 at 11:52 AM Marco Nenciarini <marco.nenciarini@enterprisedb.com> wrote:
Here are the v6 patches.

Xuneng correctly pointed out that RequestXLogStreaming rounds down,
not up, so it isn't the cause of the gap. The actual mechanism is
that archive recovery processes whole segment files: after both nodes
replay the same archived segment N, the cascade's next read position
lands at the start of segment N+1, while the upstream's
GetStandbyFlushRecPtr returns replayPtr inside segment N.
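
As an illustration only (not code from the patch), the segment arithmetic described above can be sketched in C. The helper names, the local XLogRecPtr typedef, and the 16 MB segment size are assumptions made for this example; a real cluster may use a different wal_segment_size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's 64-bit WAL position type. */
typedef uint64_t XLogRecPtr;

/* Assumed segment size (the 16 MB default); hypothetical constant. */
#define WAL_SEGMENT_SIZE ((XLogRecPtr) 16 * 1024 * 1024)

/* Number of the segment that contains a WAL position. */
static inline uint64_t
segment_of(XLogRecPtr ptr)
{
    return ptr / WAL_SEGMENT_SIZE;
}

/* First byte of the segment following the one that contains ptr. */
static inline XLogRecPtr
next_segment_start(XLogRecPtr ptr)
{
    return (segment_of(ptr) + 1) * WAL_SEGMENT_SIZE;
}

/*
 * True when the cascade would request streaming from a position the
 * upstream has not reached yet (hypothetical helper).
 */
static inline bool
cascade_is_ahead(XLogRecPtr cascade_start, XLogRecPtr upstream_flush)
{
    return cascade_start > upstream_flush;
}
```

With an upstream replay pointer anywhere inside segment N, next_segment_start() yields the start of segment N+1, so cascade_is_ahead() holds even though both nodes replayed the same archived file.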

Changes from v5:

- Updated the code comment and commit message to describe the correct
root cause (archive recovery segment granularity, not
RequestXLogStreaming truncation).

- Reset the catchup state when the upstream is no longer behind.
Without this, if the walreceiver successfully streams, the
connection breaks, and it loops back to find itself ahead again,
the stale deadline from the previous wait would cause an immediate
timeout.
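
The reset described in the second item can be sketched roughly as follows. This is a simplified model, not the patch itself: the CatchupWait struct, the catchup_check function, the millisecond timestamps, and the timeout parameter are all hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* local stand-in for the WAL position type */
typedef int64_t  TimestampMs;  /* hypothetical millisecond clock */

/* Hypothetical state kept across walreceiver connection attempts. */
typedef struct CatchupWait
{
    bool        waiting;       /* are we waiting for the upstream? */
    TimestampMs deadline;      /* when the current wait gives up */
} CatchupWait;

typedef enum
{
    CATCHUP_PROCEED,           /* upstream has the WAL; start streaming */
    CATCHUP_WAIT,              /* upstream is behind; poll again later */
    CATCHUP_TIMEOUT            /* waited too long; give up */
} CatchupAction;

static inline CatchupAction
catchup_check(CatchupWait *st, XLogRecPtr start, XLogRecPtr upstream_flush,
              TimestampMs now, TimestampMs timeout_ms)
{
    if (upstream_flush >= start)
    {
        /* Upstream is no longer behind: clear any stale wait state. */
        st->waiting = false;
        st->deadline = 0;
        return CATCHUP_PROCEED;
    }

    if (!st->waiting)
    {
        /* First time we see the upstream behind: arm a fresh deadline. */
        st->waiting = true;
        st->deadline = now + timeout_ms;
        return CATCHUP_WAIT;
    }

    return (now >= st->deadline) ? CATCHUP_TIMEOUT : CATCHUP_WAIT;
}
```

Without the reset in the first branch, a later call that finds the cascade ahead again would still see waiting set and compare the clock against the old deadline, which is the immediate-timeout case described above.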

Two patches attached: v6-0001 for master (extends the
walrcv_identify_system API) and v6-backpatch-0001 for stable branches
(global variable to preserve ABI).

Polling at intervals still doesn't seem good to me, but I don't have a better idea for now.

--
Best,
Xuneng
