Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Skipping logical replication transactions on subscriber side
Msg-id CAA4eK1JC6hFqG08V2GZvddMUm4n0uR+0VNCsD-pydd5LQnFcEw@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Tue, Jun 15, 2021 at 6:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jun 2, 2021 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 1, 2021 at 9:05 PM Peter Eisentraut
> > <peter.eisentraut@enterprisedb.com> wrote:
> > >
> > > On 01.06.21 06:01, Amit Kapila wrote:
> > > > But, won't that be costly in cases where we have errors in the
> > > > processing of very large transactions? Subscription has to process all
> > > > the data before it gets an error. I think we can even imagine this
> > > > feature to be extended to use commitLSN as a skip candidate in which
> > > > case we can even avoid getting the data of that transaction from the
> > > > publisher. So if this information is persistent, the user can even set
> > > > the skip identifier after the restart before the publisher can send
> > > > all the data.
> > >
> > > At least in current practice, skipping parts of the logical replication
> > > stream on the subscriber is a rare, emergency-level operation when
> > > something that shouldn't have happened happened.  So it doesn't really
> > > matter how costly it is.  It's not going to be more costly than the
> > > error happening in the first place.  All you'd need is one shared memory
> > > slot per subscription to store a xid to skip.
> > >
> >
> > Leaving aside the performance point, how can we do by just storing
> > skip identifier (XID/commitLSN) in shared_memory? How will the apply
> > worker know about that information after restart? Do you expect the
> > user to set it again, if so, I think users might not like that? Also,
> > how will we prohibit users to give some identifier other than for
> > failed transactions, and if users provide that what should be our
> > action? Without that, if users provide XID of some in-progress
> > transaction, we might need to do more work (rollback) than just
> > skipping it.
>
> I think the simplest solution would be to have a fixed-size array on
> the shared memory to store information of skipping transactions on the
> particular subscription. Given that this feature is meant to be a
> repair tool in emergency cases, 32 or 64 entries seem enough.
>

IIUC, here you are talking about XIDs specified by the user to skip?
If so, how will you get that information after a restart, and
why do you need 32 or 64 entries for it?

>
> Anyway, it seems to me that we need to consider the user interface
> first, especially how and what the user specifies the transaction to
> skip. My current feeling is that specifying XID is intuitive and
> flexible but the user needs to have 2 steps: checks XID and then
> specifies it, and there is a risk that the user mistakenly specifies a
> wrong XID. On the other hand, the idea of specifying to skip the first
> transaction doesn’t require the user to check and specify XID but is
> less flexible, and “the first” transaction might be ambiguous for the
> user.
>

I see your point in allowing the user to specify the first N
transactions, but OTOH I am slightly afraid that it might lead to
skipping some useful transactions, which would leave the replica out of
sync. BTW, is there any data point the user can check to see how many
transactions can be skipped? Normally, we won't be able to proceed
until we resolve/skip the transaction that is generating an error. One
possibility could be that we provide some *superuser* functions like
pg_logical_replication_skip_xact()/pg_logical_replication_reset_skip_xact()
which take the subscription name/id and xid as input parameters. Then, I
think we can store this information in ReplicationState and probably
map the subscription name/id to an originid to retrieve that
info. We can probably document that the effects of these functions
won't last after a restart. Now, if these functions are used only by
superusers, then we can probably trust that the XIDs they provide are
safe to skip, but OTOH restricting these functions to superusers might
limit the usage of this repair tool.
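
As a rough illustration only (not code from any patch in this thread),
the set/reset/check behaviour described above could be sketched in
standalone C as below. All names here (SkipSlot, skip_xact_set,
skip_xact_reset, skip_xact_matches, MAX_SKIP_SLOTS) are hypothetical,
and a plain static array stands in for shared memory, so the state is
lost on restart, which matches the non-persistent behaviour mentioned
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SKIP_SLOTS 64   /* hypothetical limit, not from the thread's patch */
#define INVALID_XID    0u   /* PostgreSQL's InvalidTransactionId is also 0 */

/* One entry per subscription; subid == 0 marks a free slot. */
typedef struct SkipSlot
{
    uint32_t subid;     /* subscription id (an Oid in PostgreSQL terms) */
    uint32_t skip_xid;  /* remote transaction id the apply worker should skip */
} SkipSlot;

/* Stand-in for shared memory: zero-initialized, lost on restart. */
static SkipSlot skip_slots[MAX_SKIP_SLOTS];

/* Return the slot whose subid matches, or NULL if there is none. */
static SkipSlot *
find_slot(uint32_t subid)
{
    for (int i = 0; i < MAX_SKIP_SLOTS; i++)
        if (skip_slots[i].subid == subid)
            return &skip_slots[i];
    return NULL;
}

/* Register xid as the transaction to skip; false if no slot is free. */
bool
skip_xact_set(uint32_t subid, uint32_t xid)
{
    assert(subid != 0 && xid != INVALID_XID);

    SkipSlot *slot = find_slot(subid);
    if (slot == NULL)
        slot = find_slot(0);    /* claim the first free slot */
    if (slot == NULL)
        return false;

    slot->subid = subid;
    slot->skip_xid = xid;
    return true;
}

/* Forget any skip request registered for this subscription. */
void
skip_xact_reset(uint32_t subid)
{
    SkipSlot *slot = (subid != 0) ? find_slot(subid) : NULL;
    if (slot != NULL)
    {
        slot->subid = 0;
        slot->skip_xid = INVALID_XID;
    }
}

/* Would be consulted by the apply worker when a remote transaction begins. */
bool
skip_xact_matches(uint32_t subid, uint32_t xid)
{
    if (subid == 0 || xid == INVALID_XID)
        return false;

    SkipSlot *slot = find_slot(subid);
    return slot != NULL && slot->skip_xid == xid;
}
```

The sketch does no locking and does not check that the XID belongs to a
failed transaction, so it leaves the validation question raised earlier
in this thread open.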

--
With Regards,
Amit Kapila.


