Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Slow catchup of 2PC (twophase) transactions on replica in LR |
Date | |
Msg-id | CAA4eK1KY=uwXXuVMtuNTYHGFbbXgDveoFoP3UbxNXqxCAx8GBQ@mail.gmail.com |
In response to | RE: Slow catchup of 2PC (twophase) transactions on replica in LR ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses | RE: Slow catchup of 2PC (twophase) transactions on replica in LR |
List | pgsql-hackers |
On Mon, Apr 15, 2024 at 1:28 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> > Vitaly, does the minimal solution provided by the proposed patch
> > (Allow to alter two_phase option of a subscriber provided no
> > uncommitted
> > prepared transactions are pending on that subscription.) address your use case?
>
> I think we do not have to handle cases which there are prepared transactions on
> publisher/subscriber, as the first step. It leads additional complexity and we
> do not have smarter solutions, especially for problem 2.
> IIUC it meets the Vitaly's condition, right?
>
> > > 1. While toggling two_phase from true to false, we could probably get a list of
> > > prepared transactions for this subscriber id and rollback/abort the prepared
> > > transactions. This will allow the transactions to be re-applied like a normal
> > > transaction when the commit comes. Alternatively, if this isn't appropriate doing it
> > > in the ALTER SUBSCRIPTION context, we could store the xids of all prepared
> > > transactions of this subscription in a list and when the corresponding xid is being
> > > committed by the apply worker, prior to commit, we make sure the previously
> > > prepared transaction is rolled back. But this would add the overhead of checking
> > > this list every time a transaction is committed by the apply worker.
> > >
> >
> > In the second solution, if you check at the time of commit whether
> > there exists a prior prepared transaction then won't we end up
> > applying the changes twice? I think we can first try to achieve it at
> > the time of Alter Subscription because the other solution can add
> > overhead at each commit?
>
> Yeah, at least the second solution might be problematic. I prototyped
> the first one and worked well. However, to make the feature more consistent,
> it is prohibit to exist prepared transactions on subscriber for now.
> We can ease based on the requirement.
>
> > > 2. No solution yet.
> > >
> >
> > One naive idea is that on the publisher we can remember whether the
> > prepare has been sent and if so then only send commit_prepared,
> > otherwise send the entire transaction. On the subscriber-side, we
> > somehow, need to ensure before applying the first change whether the
> > corresponding transaction is already prepared and if so then skip the
> > changes and just perform the commit prepared. One drawback of this
> > approach is that after restart, the prepare flag wouldn't be saved in
> > the memory and we end up sending the entire transaction again. One way
> > to avoid this overhead is that the publisher before sending the entire
> > transaction checks with subscriber whether it has a prepared
> > transaction corresponding to the current commit. I understand that
> > this is not a good idea even if it works but I don't have any better
> > ideas. What do you think?
>
> I considered but not sure it is good to add such mechanism. Your idea requires
> additional wait-loop, which might lead bugs and unexpected behavior. And it may
> degrade the performance based on the network environment.
> As for the another solution (worker sends a list of prepared transactions), it
> is also not so good because list of prepared transactions may be huge.
>
> Based on above, I think we can reject the case for now.
>
> FYI - We also considered the idea which walsender waits until all prepared transactions
> are resolved before decoding and sending changes, but it did not work well
> - the restarted walsender sent only COMMIT PREPARED record for transactions which
> have been prepared before disabling the subscription. This happened because
> 1) if the two_phase option of slots is false, the confirmed_flush can be ahead of
> PREPARE record, and
> 2) after the altering and restarting, start_decoding_at becomes same as
> confirmed_flush and records behind this won't be decoded.
>

I don't understand the exact problem you are facing. IIUC, if the
commit is after start_decoding_at point and prepare was before it, we
expect to send the entire transaction followed by a commit record. The
restart_lsn should be before the start of such a transaction and we
should have recorded the changes in the reorder buffer.

--
With Regards,
Amit Kapila.
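
To make the expectation in the last paragraph concrete, here is a toy model of the case being discussed: the prepare lies before start_decoding_at and the commit lies at or after it. This is only a sketch of the expectation as stated in this message, not of PostgreSQL's decoder. The class and function names, the integer LSNs, the inclusive boundary on the commit side, and the message labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PreparedTxn:
    """Hypothetical record of one two-phase transaction in the WAL."""
    prepare_lsn: int  # position of the PREPARE record (plain int, assumption)
    commit_lsn: int   # position of the COMMIT PREPARED record


def straddles_start_point(txn: PreparedTxn, start_decoding_at: int) -> bool:
    """True when the prepare is before the start point and the commit is not."""
    return txn.prepare_lsn < start_decoding_at <= txn.commit_lsn


def expected_stream(txn: PreparedTxn, start_decoding_at: int) -> Optional[List[str]]:
    """Return what the message expects to be sent for the straddling case.

    The labels are illustrative only; they are not logical replication
    protocol message names.
    """
    if straddles_start_point(txn, start_decoding_at):
        # Expectation from the message: the entire transaction,
        # followed by a commit record.
        return ["BEGIN", "CHANGES", "COMMIT"]
    # Other orderings are not covered by the statement being modelled.
    return None


if __name__ == "__main__":
    txn = PreparedTxn(prepare_lsn=100, commit_lsn=300)
    print(expected_stream(txn, start_decoding_at=200))
```

The behaviour reported in the quoted text (only a COMMIT PREPARED record being sent after the restart) would correspond to a stream that differs from what this function returns for the same inputs, which is the discrepancy the question above is asking about.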