RE: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: Slow catchup of 2PC (twophase) transactions on replica in LR |
Date | |
Msg-id | OSBPR01MB25528F4B0B8178D3AA8DE2BFF5082@OSBPR01MB2552.jpnprd01.prod.outlook.com |
In response to | Re: Slow catchup of 2PC (twophase) transactions on replica in LR (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Slow catchup of 2PC (twophase) transactions on replica in LR |
List | pgsql-hackers |
Dear Amit,

> > FYI - We also considered the idea which walsender waits until all prepared transactions
> > are resolved before decoding and sending changes, but it did not work well
> > - the restarted walsender sent only COMMIT PREPARED record for transactions which
> > have been prepared before disabling the subscription. This happened because
> > 1) if the two_phase option of slots is false, the confirmed_flush can be ahead of
> > PREPARE record, and
> > 2) after the altering and restarting, start_decoding_at becomes same as
> > confirmed_flush and records behind this won't be decoded.
> >
>
> I don't understand the exact problem you are facing. IIUC, if the
> commit is after start_decoding_at point and prepare was before it, we
> expect to send the entire transaction followed by a commit record. The
> restart_lsn should be before the start of such a transaction and we
> should have recorded the changes in the reorder buffer.

This behavior is right for the two_phase = false case. But if the parameter is
altered between PREPARE and COMMIT PREPARED, there is a possibility that only
COMMIT PREPARED is sent. First, the workload I executed is below.

1. created a subscription with (two_phase = false)
2. prepared a transaction on publisher
3. disabled the subscription once
4. altered the subscription to two_phase = true
5. enabled the subscription again
6. did COMMIT PREPARED on the publisher

-> Apply worker would raise an ERROR while applying the COMMIT PREPARED record:

ERROR: prepared transaction with identifier "pg_gid_XXX_YYY" does not exist

The part below describes why the ERROR occurred.

======

### Regarding 1) the confirmed_flush can be ahead of PREPARE record,

If two_phase is off, as you might know, confirmed_flush can be ahead of the
PREPARE record because of the keepalive mechanism.

Walsender sometimes sends a keepalive message in WalSndKeepalive(). The LSN
written there is that of the last decoded record.
Since the PREPARE record is skipped (just handled by ReorderBufferProcessXid()),
the LSN written in the message can sometimes be ahead of the PREPARE record.
If the WAL records are aligned like below, the LSN can point to CHECKPOINT_ONLINE.

...
INSERT
PREPARE txn1
CHECKPOINT_ONLINE
...

On the worker side, when it receives the keepalive, it compares the LSN in the
message with the last received LSN, and advances last_received. Then the worker
replies to the walsender, and at that time it reports that the last_received
record has been flushed on the subscriber. See send_feedback().

On the publisher, when the walsender receives the message from the subscriber,
it reads the message and advances confirmed_flush to the written value. If the
walsender sent an LSN that is located ahead of PREPARE, confirmed_flush is
updated to that LSN as well.

### Regarding 2) after the altering, records behind the confirmed_flush are not decoded

Next, the decoding phase. The snapshot builder determines the point where
decoding is resumed, as start_decoding_at. After the restart, the value is the
same as confirmed_flush of the slot. Since confirmed_flush is ahead of PREPARE,
start_decoding_at is ahead as well, so the PREPARE record and the changes of
the prepared transaction are not decoded.

======

The attached zip file contains the PoC and the script I used. You can refer to
it to see what I actually did.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/
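
The two points above can be illustrated with a small toy model. This is only a sketch and not PostgreSQL code: the LSNs are made-up integers, and `decode`, `WAL`, and the other names are hypothetical. It assumes, as a simplification of the description in the mail, that with two_phase off a PREPARE produces no message, and that with two_phase on a COMMIT PREPARED is sent on its own.

```python
# Toy model only: LSNs are small integers and every name here is hypothetical.
WAL = [
    (100, "INSERT", "txn1"),
    (200, "PREPARE", "txn1"),
    (300, "CHECKPOINT_ONLINE", None),
    (400, "COMMIT PREPARED", "txn1"),
]


def decode(wal, start_decoding_at, upto, two_phase):
    """Walk records with start_decoding_at <= LSN <= upto.

    Returns (messages sent to the subscriber, LSN of the last decoded record).
    """
    sent = []
    last_decoded = start_decoding_at
    for lsn, kind, xid in wal:
        if lsn < start_decoding_at or lsn > upto:
            continue  # records behind start_decoding_at are never decoded
        last_decoded = lsn
        if kind == "PREPARE" and two_phase:
            sent.append(("PREPARE", xid))
        elif kind == "COMMIT PREPARED":
            # Simplification: two_phase on sends only COMMIT PREPARED,
            # two_phase off sends the whole transaction as a plain commit.
            sent.append(("COMMIT PREPARED" if two_phase else "COMMIT", xid))
    return sent, last_decoded


# Steps 1-3: two_phase = false, the walsender has decoded up to CHECKPOINT_ONLINE.
sent_before, keepalive_lsn = decode(WAL, start_decoding_at=0, upto=300, two_phase=False)
# The worker reports the keepalive LSN as flushed, so the slot advances to it.
confirmed_flush = keepalive_lsn

# Steps 4-6: two_phase = true, decoding resumes from confirmed_flush.
sent_after, _ = decode(WAL, start_decoding_at=confirmed_flush, upto=400, two_phase=True)

# The subscriber only knows transactions for which it received a PREPARE.
prepared_on_subscriber = {xid for kind, xid in sent_before + sent_after if kind == "PREPARE"}
missing = [xid for kind, xid in sent_after
           if kind == "COMMIT PREPARED" and xid not in prepared_on_subscriber]

print("confirmed_flush:", confirmed_flush)
print("sent after restart:", sent_after)
print("COMMIT PREPARED without a PREPARE on the subscriber:", missing)
```

In this model confirmed_flush ends up at the CHECKPOINT_ONLINE record, which is past PREPARE, so after the restart only COMMIT PREPARED reaches the subscriber. That corresponds to the "prepared transaction ... does not exist" error in the workload.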
Attachment