Re: Add an option to skip loading missing publication to avoid logical replication failure - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Add an option to skip loading missing publication to avoid logical replication failure
Date
Msg-id CABPTF7XH8Uh+K-x3RMt6fOkK3xwSD2YVQehCfp_hb1TS0abe+w@mail.gmail.com
In response to Re: Add an option to skip loading missing publication to avoid logical replication failure  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
Yeah, thanks for the clarification. I have a basic understanding of it now. What I mean is: is this considered a bug or a design defect in the codebase? If so, should we prevent it from occurring in general, not just for this specific test?

vignesh C <vignesh21@gmail.com> wrote:

We have three processes involved in this scenario:
1. A walsender process on the publisher, responsible for decoding and
   sending WAL changes.
2. An apply worker process on the subscriber, which applies the changes.
3. A session executing the ALTER SUBSCRIPTION command.

Due to the asynchronous nature of these processes, the ALTER
SUBSCRIPTION command may not be immediately observed by the apply
worker. Meanwhile, the walsender may process and decode an INSERT
statement.

If the insert targets a table (e.g., tab_3) that does not belong to
the current publication (pub1), the walsender silently skips
replicating the record and advances its decoding position. This
position is sent in a keepalive message to the subscriber, and since
there are no pending transactions to flush, the apply worker reports
it as the latest received LSN.

Later, when the apply worker eventually detects the subscription
change, it restarts, but by then the insert has already been skipped
and is no longer eligible for replay, as the table was not part of the
publication (pub1) at the time of decoding.

This race condition arises because the three processes run
independently and may progress at different speeds due to CPU
scheduling or system load.
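
The sequence described above can be sketched as a small toy model. This is
only an illustration of the ordering problem, not PostgreSQL code: the
classes ToyWalsender and ToyApplyWorker, the publication name pub_new, the
table tab_1, and the LSN value 100 are all hypothetical, and the model
assumes the subscription change adds a publication that contains tab_3.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    lsn: int      # position of the change in the (toy) WAL
    table: str    # table the INSERT targets


@dataclass
class ToyWalsender:
    """Hypothetical stand-in for the publisher-side walsender."""
    publications: dict   # publication name -> set of table names
    requested: set       # publications named when streaming was started

    def decode(self, change):
        # A change is sent only if one of the requested publications
        # contains its table; otherwise only the position is reported.
        published = any(
            change.table in self.publications.get(name, set())
            for name in self.requested
        )
        if published:
            return ("data", change.lsn, change.table)
        return ("keepalive", change.lsn, None)


@dataclass
class ToyApplyWorker:
    """Hypothetical stand-in for the subscriber-side apply worker."""
    applied: list = field(default_factory=list)
    received_lsn: int = 0

    def receive(self, message):
        kind, lsn, table = message
        if kind == "data":
            self.applied.append(table)
        # With nothing pending to flush, the keepalive position is
        # reported back as the latest received LSN.
        self.received_lsn = max(self.received_lsn, lsn)


def simulate_race():
    publications = {"pub1": {"tab_1"}, "pub_new": {"tab_3"}}
    wal = [Change(lsn=100, table="tab_3")]
    worker = ToyApplyWorker()

    # ALTER SUBSCRIPTION has run, but the worker has not noticed yet,
    # so the walsender still decodes against pub1 only.
    sender = ToyWalsender(publications, requested={"pub1"})
    for change in wal:
        worker.receive(sender.decode(change))

    # The worker now notices the change and restarts with the new
    # publication list, resuming after the position it already reported.
    sender = ToyWalsender(publications, requested={"pub1", "pub_new"})
    for change in wal:
        if change.lsn > worker.received_lsn:
            worker.receive(sender.decode(change))

    return worker.applied, worker.received_lsn


if __name__ == "__main__":
    applied, lsn = simulate_race()
    print("applied tables:", applied)
    print("restart position:", lsn)
```

In this model the insert into tab_3 is never applied, because the reported
position has already moved past it by the time the worker restarts with
the new publication list.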
Thoughts?

Regards,
Vignesh
