Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoCU2PLm+SxdOdUMpjgHioMB6baOoxqNi_XHD6QQJF+RKg@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> > >
> > > On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >>
> > >> Yeah, I think it's a good idea to clear the subskipxid after the first
> > >> transaction regardless of whether the worker skipped it.
> > >>
> > >
> > > So basically instead of stopping the worker with an error you
> > > suggest having the worker continue applying changes (after
> > > resetting subskipxid, and - arguably - the ?_error_* fields).  Log
> > > the transaction xid mis-match as a warning in the log file as
> > > opposed to an error.
> >
> > Agreed, I think it's better to log a warning than to raise an error.
> > In the case where the user specified the wrong XID, the worker should
> > fail again due to the same error.
> >
>
> IIUC, the proposal is to compare the skip_xid with the very first
> transaction the apply worker received to apply and raise a warning if
> it doesn't match with skip_xid and then continue. This seems like a
> reasonable idea but can we guarantee that it is always the first
> transaction that we want to skip? We seem to guarantee that we won't
> get something again once it is written durably/flushed on the
> subscriber side. I guess here it can happen that before the errored
> transaction, there is some empty xact, or maybe part of the stream
> (consider streaming transactions) of some xact, or there could be
> other cases as well where the server will send those xacts again.

Good point.

I guess that once the worker has entered an error loop, we can
guarantee that the worker fails while applying the first non-empty
transaction after logical replication starts, and that transaction is
the one we’d like to skip. If a transaction that can be applied without
an error is resent after a restart, that’s a problem in logical
replication itself. As you pointed out, it's possible that there are
some empty transactions before the transaction in question, since we
don't advance the replication origin LSN if the transaction is empty.
Probably the same is true for a streamed transaction that is rolled
back, or for ROLLBACK-PREPARED transactions. So, could we also skip
clearing subskipxid if the transaction is empty? That is, we would make
sure to clear it only after applying the first non-empty transaction.
We would need to think carefully about this solution; otherwise ALTER
SUBSCRIPTION SKIP could end up not working at all in some cases.
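
To make the rule above concrete, here is a minimal sketch in Python. It
is a toy model, not PostgreSQL code: Txn, Subscription, apply_stream and
the "changes" counter are all hypothetical names, and skip_xid only
stands in for subskipxid. It shows empty transactions leaving the skip
setting untouched, and the first non-empty transaction clearing it
whether or not the XID matched.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Txn:
    xid: int
    changes: int  # hypothetical: 0 models an empty transaction


@dataclass
class Subscription:
    skip_xid: Optional[int] = None  # stands in for subskipxid


def apply_stream(sub: Subscription, txns: List[Txn]) -> List[Tuple[int, str]]:
    """Return (xid, action) pairs describing what the toy worker did."""
    log = []
    for txn in txns:
        if txn.changes == 0:
            # Empty transaction: the origin LSN is not advanced, so it may
            # be resent after a restart. Leave skip_xid untouched.
            log.append((txn.xid, "empty"))
            continue
        if sub.skip_xid is None:
            log.append((txn.xid, "applied"))
            continue
        if txn.xid == sub.skip_xid:
            log.append((txn.xid, "skipped"))
        else:
            # Wrong XID: warn and apply. In a real error loop this apply
            # would fail again with the same error; the toy model does not
            # simulate that failure.
            log.append((txn.xid, "applied with warning"))
        # Clear after the first non-empty transaction, matched or not.
        sub.skip_xid = None
    return log
```

For example, with skip_xid set to 742 and the stream (740 empty, 742
non-empty, 743 non-empty), the sketch reports 740 as empty, skips 742,
and applies 743 normally because the setting was already cleared.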

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


