Yeah, thanks for the clarification. I have a basic understanding of it now. Is this considered a bug or a design defect in the codebase? If so, should we prevent it from occurring in general, not just in this specific test?
Three processes are involved in this scenario:
1. A walsender process on the publisher, which decodes and sends WAL changes.
2. An apply worker process on the subscriber, which applies those changes.
3. A session executing the ALTER SUBSCRIPTION command.
Because these processes run asynchronously, the apply worker may not observe the ALTER SUBSCRIPTION change immediately. In the meantime, the walsender may decode an INSERT. If the insert targets a table (e.g., tab_3) that does not belong to the current publication (pub1), the walsender skips the record without replicating it and advances its decoding position. That position is sent to the subscriber in a keepalive message, and since there are no pending transactions to flush, the apply worker reports it as the latest received LSN. When the apply worker later detects the subscription change, it restarts, but by then the insert has already been skipped and will not be replayed, because the table was not part of the publication (pub1) at the time of decoding.

This race arises because the three processes run independently and may progress at different speeds depending on CPU scheduling or system load.

Thoughts?
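To make the interleaving concrete, here is a toy, deterministic Python model of the race. It is not PostgreSQL code. The publication pub3 (assumed to be what ALTER SUBSCRIPTION adds, and to contain tab_3), the LSN value 100, and the helpers TABLE_PUBS, decode, and simulate are all hypothetical. The model only shows that the outcome depends on whether the apply worker picks up the change before or after the walsender decodes the insert.

```python
"""Toy model of the skipped-insert race; all names and LSNs are hypothetical."""

# Hypothetical mapping of each table to the publication(s) that contain it.
TABLE_PUBS = {"tab_1": {"pub1"}, "tab_3": {"pub3"}}

# Hypothetical publisher WAL: (lsn, table) pairs for INSERTs, in commit order.
WAL = [(100, "tab_3")]


def decode(wal, start_lsn, publications):
    """Walsender model: return (changes_sent, last_decoded_lsn).

    Changes for tables outside `publications` are skipped, but the
    decoding position still advances past them.
    """
    sent, pos = [], start_lsn
    for lsn, table in wal:
        if lsn <= start_lsn:
            continue  # already reported by the subscriber; never decoded again
        if TABLE_PUBS.get(table, set()) & publications:
            sent.append((lsn, table))
        pos = lsn
    return sent, pos


def simulate(worker_sees_alter_first):
    """Return the changes the subscriber ends up applying."""
    applied = []
    subscribed = {"pub1"}
    new_subscribed = {"pub1", "pub3"}  # assumed effect of ALTER SUBSCRIPTION
    confirmed = 0

    if worker_sees_alter_first:
        # The apply worker restarts with the new list before the insert is decoded.
        subscribed = new_subscribed
    sent, pos = decode(WAL, confirmed, subscribed)
    applied += sent
    # No pending transaction, so the keepalive position is reported back as-is.
    confirmed = pos

    if not worker_sees_alter_first:
        # The worker notices the change late and resumes from `confirmed`.
        sent, pos = decode(WAL, confirmed, new_subscribed)
        applied += sent
        confirmed = pos
    return applied


print(simulate(worker_sees_alter_first=True))   # [(100, 'tab_3')]
print(simulate(worker_sees_alter_first=False))  # []
```

When the ALTER is observed first, the insert is applied. When it is observed late, the restart begins at the position already reported via the keepalive, so the skipped insert is never decoded again.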