Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | CAD21AoAZb4k4BgnHYPw27aEhZzqqA3XMYQ43Cs1+99w9SJsZRg@mail.gmail.com |
In response to | Re: Skipping logical replication transactions on subscriber side (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Skipping logical replication transactions on subscriber side |
List | pgsql-hackers |

On Wed, Jun 2, 2021 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 1, 2021 at 9:05 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> >
> > On 01.06.21 06:01, Amit Kapila wrote:
> > > But, won't that be costly in cases where we have errors in the
> > > processing of very large transactions? Subscription has to process all
> > > the data before it gets an error. I think we can even imagine this
> > > feature to be extended to use commitLSN as a skip candidate in which
> > > case we can even avoid getting the data of that transaction from the
> > > publisher. So if this information is persistent, the user can even set
> > > the skip identifier after the restart before the publisher can send
> > > all the data.
> >
> > At least in current practice, skipping parts of the logical replication
> > stream on the subscriber is a rare, emergency-level operation when
> > something that shouldn't have happened happened. So it doesn't really
> > matter how costly it is. It's not going to be more costly than the
> > error happening in the first place. All you'd need is one shared memory
> > slot per subscription to store a xid to skip.
> >
>
> Leaving aside the performance point, how can we do by just storing
> skip identifier (XID/commitLSN) in shared_memory? How will the apply
> worker know about that information after restart? Do you expect the
> user to set it again, if so, I think users might not like that? Also,
> how will we prohibit users to give some identifier other than for
> failed transactions, and if users provide that what should be our
> action? Without that, if users provide XID of some in-progress
> transaction, we might need to do more work (rollback) than just
> skipping it.

I think the simplest solution would be to have a fixed-size array on the shared memory to store information of skipping transactions on the particular subscription.
Given that this feature is meant to be a repair tool for emergency cases, 32 or 64 entries seem enough. That information should be visible to users via a system view, and each entry is cleared once the worker has skipped the transaction. We would also need to clear the entries if the subscription's meta information, such as conninfo or slot name, has been changed.

The worker reads that information at least when starting logical replication. The worker receives changes from the publication and, when it starts to apply those changes, checks whether the transaction should be skipped. If so, the worker skips applying all changes of the transaction and removes its stream files, if any exist.

Regarding the point of how to check whether the XID specified by the user is valid, I guess it’s not easy to do that since XIDs sent from the publisher are in random order. Considering the use case of this tool, the situation seems to be that logical replication gets stuck on a problem transaction and the worker repeatedly restarts and raises an error. So I guess it would also be a good idea to let the user specify skipping the first transaction (or the first N transactions) after the subscription starts logical replication. It’s less flexible but seems enough to solve such a situation, and it doesn’t have the problem of validating the XID. If functionality such as letting the subscriber know the oldest XID that could possibly be sent is also useful for other purposes, it would be a good idea to implement it, but I’m not sure about other use cases.

Anyway, it seems to me that we need to consider the user interface first, especially how the user specifies the transaction to skip and what identifier they specify. My current feeling is that specifying the XID is intuitive and flexible, but the user needs two steps: check the XID and then specify it, and there is a risk that the user mistakenly specifies the wrong XID.
On the other hand, the idea of skipping the first transaction doesn’t require the user to check and specify an XID, but it is less flexible, and “the first” transaction might be ambiguous for the user.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
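
To make the two interface ideas discussed above a bit more concrete, here is a minimal standalone sketch in C. It is only an illustration under assumptions, not a proposed patch: the names SkipTable, skip_table_add, skip_table_should_skip, skip_table_reset and MAX_SKIP_ENTRIES are hypothetical, the table is an ordinary struct rather than real shared memory, and locking, persistence across restarts, and the system view are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SKIP_ENTRIES 32            /* hypothetical size; the mail suggests 32 or 64 */
#define INVALID_XID ((uint32_t) 0)     /* same value as PostgreSQL's InvalidTransactionId */

/* Hypothetical per-subscription table; a real one would live in shared memory. */
typedef struct SkipTable
{
    uint32_t xids[MAX_SKIP_ENTRIES];   /* INVALID_XID marks a free slot */
    uint32_t skip_first_n;             /* alternative UI: skip the first N transactions */
} SkipTable;

/* Clear every entry, e.g. when conninfo or the slot name changes. */
void skip_table_reset(SkipTable *tbl)
{
    assert(tbl != NULL);
    for (size_t i = 0; i < MAX_SKIP_ENTRIES; i++)
        tbl->xids[i] = INVALID_XID;
    tbl->skip_first_n = 0;
}

/* Register an XID to skip; returns false if the XID is invalid or the table is full. */
bool skip_table_add(SkipTable *tbl, uint32_t xid)
{
    assert(tbl != NULL);
    if (xid == INVALID_XID)
        return false;
    for (size_t i = 0; i < MAX_SKIP_ENTRIES; i++)
    {
        if (tbl->xids[i] == INVALID_XID)
        {
            tbl->xids[i] = xid;
            return true;
        }
    }
    return false;
}

/*
 * Called when the worker starts applying a transaction. Returns true if the
 * transaction should be skipped, and consumes the matching request so that
 * each entry is cleared once it has been used.
 */
bool skip_table_should_skip(SkipTable *tbl, uint32_t xid)
{
    assert(tbl != NULL);
    if (tbl->skip_first_n > 0)
    {
        tbl->skip_first_n--;           /* "skip the first N" needs no XID check */
        return true;
    }
    if (xid == INVALID_XID)
        return false;
    for (size_t i = 0; i < MAX_SKIP_ENTRIES; i++)
    {
        if (tbl->xids[i] == xid)
        {
            tbl->xids[i] = INVALID_XID; /* clear the entry after skipping */
            return true;
        }
    }
    return false;
}
```

In this sketch the caller would run skip_table_reset once, register XIDs with skip_table_add (or set skip_first_n), and call skip_table_should_skip at the start of each incoming transaction. Note that nothing here validates that a registered XID belongs to a failed transaction, which is exactly the open question raised in the thread.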