Re: Dropping publication breaks logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Dropping publication breaks logical replication
Date
Msg-id CAA4eK1K6hgYQyyFKJBwB27yMa=NWe7-NQNumYtnP7JiEr99mDw@mail.gmail.com
In response to Dropping publication breaks logical replication  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Dropping publication breaks logical replication
List pgsql-hackers
On Fri, Aug 1, 2025 at 10:54 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi Vignesh, Amit,
> We encountered a situation where a customer dropped a publication
> accidentally and that broke logical replication in an irrecoverable
> manner. This is PG 15.3 but the team confirmed that the behaviour is
> reproducible with PG 17 as well.
>
> When a WAL sender processes a WAL record recording a change in
> publication, it ends up calling LoadPublication() which throws an
> error if a publication mentioned in START_REPLICATION command is not
> found. The downstream tries to reconnect but the WAL sender again
> repeats the same process going in an error loop. Creating the
> publication does not help since WAL sender will always encounter the
> WAL record dropping the publication first.
>
> There are ways to come out of this situation, but not very clean always
> 1. Remove publication from subscription, run logical replication till
> it passes the point where publication was added, add the publication
> back and continue. It's not always possible to know when the
> publication was added back and thus it becomes tedious or next to
> impossible to apply these steps.
> 2. Reseeding the replication slot which involves copying all the data
> again and not feasible in case of large databases.
> 3. Skipping the transaction which dropped the publication. This will
> work if drop publication was the only thing in that transaction but
> not otherwise. Confirming that is tricky and requires some expert
> help.
>
> In PG 18 onwards, this behaviour is fixed by throwing a WARNING
> instead of an error. In the relevant thread [1] where the fix to PG 18
> was discussed, backpatching was also discussed. Back then it was
> deferred because of lack of field reports. But we are seeing this
> situation now.
>
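
For readers following along, the error loop described above can be
illustrated with a toy model. This is only a sketch in Python, not
PostgreSQL code: the WAL records, the Slot class, publications_at(),
stream() and the "strict" flag are all hypothetical names invented for
illustration. It assumes that decoding sees the catalog as it was at the
LSN of each change, and that the PG 18 behaviour amounts to skipping
changes for a missing publication with a warning.

```python
# Toy model only: not PostgreSQL code. Every name here is hypothetical.

class MissingPublication(Exception):
    """Stands in for the ERROR raised when a requested publication is absent."""


class Slot:
    """Stands in for a replication slot: remembers progress across reconnects."""
    def __init__(self):
        self.confirmed_lsn = 0
        self.applied = []   # rows the subscriber has received
        self.warnings = []  # messages emitted in lenient mode


# Simplified WAL stream of (lsn, kind, payload) records.
WAL = [
    (10, "insert", "row-1"),
    (20, "drop_publication", "pub1"),
    (30, "insert", "row-2"),
    (40, "create_publication", "pub1"),
    (50, "insert", "row-3"),
]


def publications_at(wal, lsn, initial=("pub1",)):
    """Return the set of publications that exist as of the given LSN."""
    pubs = set(initial)
    for rec_lsn, kind, payload in wal:
        if rec_lsn > lsn:
            break
        if kind == "drop_publication":
            pubs.discard(payload)
        elif kind == "create_publication":
            pubs.add(payload)
    return pubs


def stream(wal, slot, wanted_pub, strict):
    """Decode WAL after slot.confirmed_lsn, updating the slot as it goes."""
    for lsn, kind, payload in wal:
        if lsn <= slot.confirmed_lsn:
            continue  # already processed before the last disconnect
        if kind == "insert":
            if wanted_pub in publications_at(wal, lsn):
                slot.applied.append(payload)
            elif strict:
                # Old behaviour: error out; the slot does not advance.
                raise MissingPublication(f"{wanted_pub} missing at LSN {lsn}")
            else:
                # Assumed PG 18-like behaviour: warn and skip the change.
                slot.warnings.append(f"skipping {payload}: {wanted_pub} missing")
        slot.confirmed_lsn = lsn


if __name__ == "__main__":
    strict_slot = Slot()
    for attempt in range(3):  # each iteration models one reconnect
        try:
            stream(WAL, strict_slot, "pub1", strict=True)
        except MissingPublication as exc:
            print(f"attempt {attempt + 1}: ERROR {exc}")

    lenient_slot = Slot()
    stream(WAL, lenient_slot, "pub1", strict=False)
    print(lenient_slot.applied, lenient_slot.warnings)
```

In the strict case every reconnect fails at the same record, even though
the publication was re-created later in the stream, because the slot
never advances past the change that follows the drop. In the lenient
case the stream reaches the end, at the cost of not sending the change
made while the publication was missing.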

Thanks for the report. One more reason we were hesitant to backpatch
was that some users may expect replication to stop in this case, as
Tomas mentioned in one of his emails [1] (see the paragraph starting
with "Imagine you have a subscriber ..."). Since the fix could be
perceived as a behavior change, we thought it better to make it a
HEAD-only change.

Now, seeing this report, it seems the customer(s) are probably okay
with skipping a missing publication and letting replication continue.
So, we should consider backpatching this change, but it would be better
if a few more people shared their opinions on this matter.

[1] - https://www.postgresql.org/message-id/dc08add3-10a8-738b-983a-191c7406707b%40enterprisedb.com

--
With Regards,
Amit Kapila.


