Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date
Msg-id CAA4eK1K1fSkeK=kc26G5cq87vQG4=1qs_b+no4+ep654SeBy1w@mail.gmail.com
In response to Re: Slow catchup of 2PC (twophase) transactions on replica in LR  (Ajin Cherian <itsajin@gmail.com>)
List pgsql-hackers
On Fri, Apr 5, 2024 at 4:59 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>
>> I think this would probably be better than the current situation but
>> can we think of a solution to allow toggling the value of two_phase
>> even when prepared transactions are present? Can you please summarize
>> the reason for the problems in doing that and the solutions, if any?
>>
>
>
> Updated the patch, as it wasn't addressing updating of two-phase in the remote slot.
>

Vitaly, does the minimal solution provided by the proposed patch
(allow altering the two_phase option of a subscriber, provided no
uncommitted prepared transactions are pending on that subscription)
address your use case?

> Currently the main issue that needs to be handled is the handling of
> pending prepared transactions while the two_phase is altered. I see 3
> issues with the current approach.
>
> 1. Uncommitted prepared transactions when toggling two_phase from true to false
>   When two_phase was true, prepared transactions were decoded at
>   PREPARE time and sent to the subscriber, where they were prepared
>   with a new gid. Once two_phase is toggled to false, the COMMIT
>   PREPARED on the publisher is converted to a commit, and the entire
>   transaction is decoded and sent to the subscriber. This leaves the
>   previously prepared transaction pending.
>
> 2. Uncommitted prepared transactions when toggling two_phase from false to true
>   When two_phase was false, prepared transactions were ignored and not
>   decoded at PREPARE time on the publisher. Once two_phase is toggled
>   to true, the apply worker and the walsender are restarted and
>   replication is restarted from a new "start_decoding_at" LSN. This
>   new "start_decoding_at" could be past the LSN of the PREPARE record,
>   and if so, the PREPARE record is skipped and not sent to the
>   subscriber. Look at the comments in DecodeTXNNeedSkip() for details.
>   Later, when the user issues COMMIT PREPARED, this is decoded and
>   sent to the subscriber, but there is no prepared transaction on the
>   subscriber, and this fails because the corresponding gid of the
>   transaction couldn't be found.
>
> 3. While altering the two_phase of the subscription, it is required
>   to also alter the two_phase field of the slot on the primary. The
>   subscription cannot remotely alter the two_phase option of the slot
>   when the subscription is enabled, as the slot is owned by the
>   walsender on the publisher side.
>

Thanks for summarizing the reasons for not allowing altering the
two_phase property for a subscription.
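
To make issues 1 and 2 above concrete, here is a toy model in Python.
It is only a sketch for discussion: ToySubscriber, true_to_false() and
false_to_true() are made-up names, and nothing here corresponds to
actual PostgreSQL code.

```python
# Toy model only: none of these names exist in PostgreSQL.

class ToySubscriber:
    def __init__(self):
        self.prepared = {}   # gid -> changes held by a prepared transaction
        self.committed = []  # changes that have been committed

    def prepare(self, gid, changes):
        self.prepared[gid] = list(changes)

    def commit_prepared(self, gid):
        if gid not in self.prepared:
            raise LookupError(f"prepared transaction {gid!r} does not exist")
        self.committed.extend(self.prepared.pop(gid))

    def commit(self, changes):
        self.committed.extend(changes)


def true_to_false():
    """Issue 1: PREPARE was sent, then two_phase was switched off."""
    sub = ToySubscriber()
    sub.prepare("gid_1", ["insert row 1"])  # sent at PREPARE time
    sub.commit(["insert row 1"])            # COMMIT PREPARED arrives as a plain commit
    return sub


def false_to_true():
    """Issue 2: PREPARE was skipped, then two_phase was switched on."""
    sub = ToySubscriber()
    try:
        sub.commit_prepared("gid_1")        # nothing was ever prepared here
    except LookupError as exc:
        return str(exc)
    return None
```

In the first case the changes are committed but "gid_1" stays in
sub.prepared, which is the dangling prepared transaction. In the second
case the COMMIT PREPARED fails because the gid cannot be found.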

> Possible solutions for the 3 problems:
>
> 1. While toggling two_phase from true to false, we could probably get
>   a list of prepared transactions for this subscriber id and
>   rollback/abort the prepared transactions. This will allow the
>   transactions to be re-applied like a normal transaction when the
>   commit comes. Alternatively, if doing this in the ALTER SUBSCRIPTION
>   context isn't appropriate, we could store the xids of all prepared
>   transactions of this subscription in a list, and when the
>   corresponding xid is being committed by the apply worker, prior to
>   commit, we make sure the previously prepared transaction is rolled
>   back. But this would add the overhead of checking this list every
>   time a transaction is committed by the apply worker.
>

In the second solution, if you check at the time of commit whether
there exists a prior prepared transaction, then won't we end up
applying the changes twice? I think we can first try to achieve it at
the time of ALTER SUBSCRIPTION, because the other solution can add
overhead at each commit.
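
As a rough sketch of the first option (roll back at ALTER SUBSCRIPTION
time), the Python below picks out the prepared transactions that belong
to one subscription and builds the rollback statements for them. It
assumes the apply worker names its prepared transactions
pg_gid_<subscription oid>_<remote xid>; please treat that layout, the
helper names, and the sample OIDs as assumptions of this sketch, not as
what the patch does.

```python
import re

# Assumption: gids created by the apply worker look like
# pg_gid_<subscription oid>_<remote xid>.
GID_RE = re.compile(r"^pg_gid_(\d+)_(\d+)$")


def gids_for_subscription(all_gids, subid):
    """Return the gids whose subscription oid part equals subid."""
    matches = []
    for gid in all_gids:
        m = GID_RE.match(gid)
        if m and int(m.group(1)) == subid:
            matches.append(gid)
    return matches


def rollback_statements(all_gids, subid):
    """Build one ROLLBACK PREPARED statement per matching gid."""
    return [f"ROLLBACK PREPARED '{gid}'"
            for gid in gids_for_subscription(all_gids, subid)]
```

The list of gids would come from something like pg_prepared_xacts on
the subscriber. Prepared transactions created by applications, or by
other subscriptions, are left alone.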

> 2. No solution yet.
>

One naive idea is that on the publisher we can remember whether the
prepare has been sent and, if so, send only commit_prepared;
otherwise, send the entire transaction. On the subscriber side, we
somehow need to check, before applying the first change, whether the
corresponding transaction is already prepared and, if so, skip the
changes and just perform the commit prepared. One drawback of this
approach is that the prepare flag is kept only in memory, so after a
restart it is lost and we end up sending the entire transaction again.
One way to avoid this overhead is for the publisher, before sending the
entire transaction, to check with the subscriber whether it has a
prepared transaction corresponding to the current commit. I understand
that this is not a good idea even if it works, but I don't have any
better ideas. What do you think?
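
Here is a minimal Python sketch of the naive idea above, just to show
the message flow. ToyPublisher, ToySubscriber, and the message tuples
are invented for illustration and do not reflect the walsender, the
apply worker, or the replication protocol.

```python
# Toy model only: none of these names exist in PostgreSQL.

class ToyPublisher:
    def __init__(self):
        self.prepare_sent = set()  # kept in memory only, lost on restart

    def on_prepare(self, xid, changes, two_phase):
        if not two_phase:
            return []              # PREPARE is not decoded
        self.prepare_sent.add(xid)
        return [("prepare", xid, changes)]

    def on_commit_prepared(self, xid, changes):
        if xid in self.prepare_sent:
            return [("commit_prepared", xid, None)]
        return [("full_txn", xid, changes)]  # send the entire transaction

    def restart(self):
        self.prepare_sent.clear()


class ToySubscriber:
    def __init__(self):
        self.prepared = {}  # xid -> changes of a prepared transaction
        self.applied = []   # committed changes

    def receive(self, message):
        kind, xid, changes = message
        if kind == "prepare":
            self.prepared[xid] = changes
        elif kind == "commit_prepared":
            self.applied.extend(self.prepared.pop(xid))
        elif kind == "full_txn":
            if xid in self.prepared:
                # Already prepared: skip the resent changes and
                # just commit the prepared transaction.
                self.applied.extend(self.prepared.pop(xid))
            else:
                self.applied.extend(changes)
```

If the publisher restarts between the prepare and the commit, it sends
the full transaction again; the subscriber-side check keeps the changes
from being applied twice, at the cost of the extra data on the wire.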

> 3. We could mandate that the altering of two_phase state only be done
>   after disabling the subscription, just like how it is handled for
>   the failover option.
>

Makes sense.
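
Combined with the minimal patch, the precondition for altering
two_phase could look roughly like the check below. This is a Python
sketch under my own assumptions: AlterRejected, check_alter_two_phase(),
and the error texts are hypothetical and are not taken from the patch.

```python
class AlterRejected(Exception):
    """Hypothetical error type for this sketch."""


def check_alter_two_phase(subscription_enabled, pending_prepared_gids):
    """Allow the change only when both preconditions hold."""
    if subscription_enabled:
        # The slot is owned by the walsender while the subscription runs.
        raise AlterRejected("disable the subscription before changing two_phase")
    if pending_prepared_gids:
        raise AlterRejected("uncommitted prepared transactions are pending")
    return True
```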

--
With Regards,
Amit Kapila.


