On Fri, Aug 1, 2025 at 10:54 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi Vignesh, Amit,
> We encountered a situation where a customer dropped a publication
> accidentally and that broke logical replication in an irrecoverable
> manner. This is PG 15.3 but the team confirmed that the behaviour is
> reproducible with PG 17 as well.
>
> When a WAL sender processes a WAL record recording a change in
> publication, it ends up calling LoadPublication() which throws an
> error if a publication mentioned in START_REPLICATION command is not
> found. The downstream tries to reconnect but the WAL sender again
> repeats the same process going in an error loop. Creating the
> publication does not help since WAL sender will always encounter the
> WAL record dropping the publication first.
>
> There are ways to get out of this situation, but they are not always clean:
> 1. Remove publication from subscription, run logical replication till
> it passes the point where publication was added, add the publication
> back and continue. It's not always possible to know when the
> publication was added back and thus it becomes tedious or next to
> impossible to apply these steps.
> 2. Reseeding the replication slot, which involves copying all the data
> again and is not feasible for large databases.
> 3. Skipping the transaction which dropped the publication. This will
> work if drop publication was the only thing in that transaction but
> not otherwise. Confirming that is tricky and requires some expert
> help.
>
> In PG 18 onwards, this behaviour is fixed by throwing a WARNING
> instead of an error. In the relevant thread [1] where the fix to PG 18
> was discussed, backpatching was also discussed. Back then it was
> deferred because of lack of field reports. But we are seeing this
> situation now.
>
Thanks for the report. One more reason we were hesitant to backpatch
was that some users may expect replication to stop in this case, as
Tomas mentioned in one of his emails [1] (see the paragraph starting
with "Imagine you have a subscriber ..."). Since it could be perceived
as a behavior change, we thought it better to make it a HEAD-only
change.

Now, seeing this report, it seems the customer(s) are probably okay
with skipping a missing publication and letting replication continue.
So, we should consider backpatching this change, but it would be better
if a few more people shared their opinion on this matter.
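
For readers following along, the difference between the two behaviours
can be illustrated with a toy model. This is only a sketch under
simplifying assumptions, not PostgreSQL code: the WAL is modelled as a
plain list, the catalog as a set, and every name in it
(load_publications, decode, PublicationMissing, missing_ok) as well as
the message strings are hypothetical.

```python
# Toy model only; none of these names are PostgreSQL APIs.
import warnings


class PublicationMissing(Exception):
    """Stands in for the ERROR raised when a publication is not found."""


def load_publications(requested, catalog, missing_ok):
    """Return the requested publications that exist in `catalog`."""
    found = []
    for name in requested:
        if name in catalog:
            found.append(name)
        elif missing_ok:
            # Newer behaviour in this model: warn and skip.
            warnings.warn(f'skipped loading publication "{name}"')
        else:
            # Older behaviour in this model: fail the whole decode.
            raise PublicationMissing(f'publication "{name}" does not exist')
    return found


def decode(wal, start, requested, missing_ok):
    """Replay `wal` from `start`; return the position reached."""
    # Assumption: every requested publication exists at `start`.
    catalog = set(requested)
    for pos in range(start, len(wal)):
        kind, name = wal[pos]
        if kind == "drop":
            catalog.discard(name)
        elif kind == "create":
            catalog.add(name)
        else:
            # A data change: publications are looked up as of this point
            # in the WAL, not as of the present.
            load_publications(requested, catalog, missing_ok)
    return len(wal)


# The publication is dropped, a change follows, then it is re-created.
wal = [("change", None), ("drop", "pub1"), ("change", None),
       ("create", "pub1"), ("change", None)]
```

With missing_ok=False, decode(wal, 0, ["pub1"], False) raises at the
change that follows the drop on every attempt, because the later
"create" record is never reached. With missing_ok=True the same call
emits a warning and proceeds to the end of the list.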

[1] - https://www.postgresql.org/message-id/dc08add3-10a8-738b-983a-191c7406707b%40enterprisedb.com

--
With Regards,
Amit Kapila.