RE: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Date
Msg-id OSBPR01MB2552AF02B039F00EF7791401F5DA2@OSBPR01MB2552.jpnprd01.prod.outlook.com
In response to RE: Slow catchup of 2PC (twophase) transactions on replica in LR  ("Vitaly Davydov" <v.davydov@postgrespro.ru>)
Responses Re: Slow catchup of 2PC (twophase) transactions on replica in LR
List pgsql-hackers
Dear Vitaly,

Thanks for the comments! Please see the attached new version of the patch.

> Thank you very much for the patch. In general, it seems to work well for me, but
> there seems to be a memory access problem in libpqrcv_alter_slot ->
> quote_identifier in case of NULL slot_name. It happens, if the two_phase option
> is altered on a subscription without slot. I think, a simple check for NULL may
> fix the problem. I guess, the same problem may be for failover option.

You are right. The failover option already requires a valid slot_name.
For two_phase, we must connect to the publisher only when altering the option
from "true" to "false", so slot_name must be set only in that case. Updated.
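
To make the rule concrete, here is a minimal standalone sketch of the decision
described above. It is a toy model, not code from the patch: the enum, the
function name decide_two_phase_alter() and its parameters are hypothetical and
do not exist in PostgreSQL. The point is only that the NULL check happens
before the slot name could reach any quoting routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not PostgreSQL identifiers. */
typedef enum
{
    ALTER_LOCAL_ONLY,       /* no publisher connection is needed */
    ALTER_REMOTE_SLOT,      /* connect and alter the remote slot */
    ALTER_ERROR_NO_SLOT     /* remote alter is needed but slot_name is NULL */
} AlterAction;

/*
 * Decide what to do when the two_phase option of a subscription changes.
 * Assumption taken from the mail: only the true -> false transition has to
 * contact the publisher, so slot_name is required only in that case.
 */
AlterAction
decide_two_phase_alter(bool old_two_phase, bool new_two_phase,
                       const char *slot_name)
{
    /* Any change other than true -> false stays local. */
    if (!(old_two_phase && !new_two_phase))
        return ALTER_LOCAL_ONLY;

    /* Reject a missing slot name before it is ever dereferenced. */
    if (slot_name == NULL)
        return ALTER_ERROR_NO_SLOT;

    return ALTER_REMOTE_SLOT;
}
```

In the real code the error case would presumably be reported through the
usual error machinery rather than a return code; the return code is used here
only to keep the sketch self-contained.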

> Another possible problem is related to my use case. I haven't reproduced this
> case, just some thoughts. I guess, when two_phase is ON, the PREPARE statement
> may be truncated from the WAL at checkpoint, but COMMIT PREPARED is still kept
> in the WAL. On catchup, I would ask the master to send transactions from some
> restart LSN. I would like to get all such transactions completely, with their
> bodies, not only COMMIT PREPARED messages.

I don't think this is a real issue. WAL for prepared transactions is retained
until they are committed or aborted.
When two_phase is on and transactions are PREPAREd, they are not cleaned up
from memory (see ReorderBufferProcessTXN()). A RUNNING_XACTS record then
triggers an update of the slot's restart_lsn, but it cannot move forward
because ReorderBufferGetOldestTXN() returns the prepared transaction (see
SnapBuildProcessRunningXacts()). The restart_decoding_lsn of each transaction,
which is a candidate for the slot's restart_lsn, is always behind the start
point of its txn.
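
The argument can be illustrated with a small toy model. This is a simplified
sketch, not the actual logic of SnapBuildProcessRunningXacts(): the types
ToyLsn and ToyTxn and the function toy_candidate_restart_lsn() are
hypothetical. It assumes the transactions are ordered by start position and
that the candidate restart point comes from the oldest transaction still kept
in memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;    /* stand-in for a WAL position */

typedef struct
{
    ToyLsn  start_lsn;              /* first WAL record of the transaction */
    ToyLsn  restart_decoding_lsn;   /* restart candidate, behind start_lsn */
    bool    in_progress;            /* true while PREPAREd but not finished */
} ToyTxn;

/*
 * Return the candidate restart position when a running-xacts record at
 * running_xacts_lsn is processed.  txns must be sorted by start_lsn.
 * While a prepared transaction is still tracked, its restart_decoding_lsn
 * pins the candidate; otherwise the candidate can advance to the record.
 */
ToyLsn
toy_candidate_restart_lsn(const ToyTxn *txns, size_t ntxns,
                          ToyLsn running_xacts_lsn)
{
    for (size_t i = 0; i < ntxns; i++)
    {
        /* The first unfinished entry is the oldest tracked transaction. */
        if (txns[i].in_progress)
            return txns[i].restart_decoding_lsn;
    }

    /* Nothing is tracked any more, so the slot may move up to here. */
    return running_xacts_lsn;
}
```

With a prepared transaction that starts at 200 and has restart_decoding_lsn
180, the candidate stays at 180 no matter how far the WAL has advanced, so the
PREPARE and the transaction body remain available until COMMIT PREPARED or
ROLLBACK PREPARED is decoded.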

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 

