Re: Add an option to skip loading missing publication to avoid logical replication failure - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Add an option to skip loading missing publication to avoid logical replication failure
Date
Msg-id CAA4eK1KxSbo0XG4xXRm_6VX6Q5U1w1w8s3qbX8dKCKaSvw2dfQ@mail.gmail.com
Whole thread Raw
In response to Re: Add an option to skip loading missing publication to avoid logical replication failure  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Add an option to skip loading missing publication to avoid logical replication failure
List pgsql-hackers
On Mon, Mar 10, 2025 at 10:15 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Mar 10, 2025 at 9:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Tue, Mar 4, 2025 at 6:54 PM vignesh C <vignesh21@gmail.com> wrote:
>> >
>> > On further thinking, I felt the use of publications_updated variable
>> > is not required we can use publications_valid itself which will be set
>> > if the publication system table is invalidated. Here is a patch for
>> > the same.
>> >
>>
>> The patch relies on the fact that whenever a publication's data is
>> invalidated, it will also invalidate all the RelSyncEntires as per
>> publication_invalidation_cb. But note that we are discussing removing
>> that inefficiency in the thread  [1]. So, we should try to rebuild the
>> entry when we have skipped the required publication previously.
>>
>> Apart from this, please consider updating the docs, as mentioned in my
>> response to Sawada-San's email.
>
>
> I'm not sure I fully understand it, but based on your previous email and the initial email from Vignesh, if IIUC, the
issueoccurs when a publication is created after a certain LSN. When ALTER SUBSCRIPTION ... SET PUBLICATION is executed,
thesubscriber workers restart and request the changes based on restart_lsn, which is at an earlier LSN in the WAL than
theLSN at which the publication was created. This leads to an error, and we are addressing this behavior as part of the
fixby skipping the changes which are between the restart_lsn of subscriber and the lsn at which publication is created
andthis behavior looks fine. 
>

Yes, your understanding is correct, but note that as such, the patch
is simply skipping the missing publication. The skipped changes are
because those were on the table that is not part of any publication
w.r.t historic snapshot we have at the point of time.

>> BTW, I am planning to commit this only on HEAD as this is a behavior
>> change. Please let me know if you guys think otherwise.
>
>
> Somehow this looks like a bug fix which should be backported no?  Am I missing something?
>

We can consider this a bug-fix and backpatch it, but I am afraid that
because this is a behavior change, some users may not like it. Also, I
don't remember seeing public reports for this behavior; that is
probably because it is hard to hit. FYI, we found this via BF
failures. So, I thought it would be better to get this field tested
via HEAD only at this point in time.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: torikoshia
Date:
Subject: Re: RFC: Logging plan of the running query
Next
From: Michael Paquier
Date:
Subject: Re: Back-patch of: avoid multiple hard links to same WAL file after a crash