RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | OS0PR01MB57169619BD3AF79CA2D2A59894042@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Conflict detection for update_deleted in logical replication
|
List | pgsql-hackers |
On Monday, December 16, 2024 7:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: Hi, > > On Wed, Dec 11, 2024 at 2:32 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Attach the V16 patch set which addressed above comments. > > > > There is a new 0002 patch where I tried to dynamically adjust the interval for > > advancing the transaction ID. Instead of always waiting for > > wal_receiver_status_interval, we can start with a short interval and increase > > it if there is no activity (no xid assigned on subscriber), but not beyond > > wal_receiver_status_interval. > > > > The intention is to more effectively advance xid to avoid retaining too much > > dead tuples. My colleague will soon share detailed performance data and > > analysis related to this enhancement. > > I am starting to review the patches, and trying to understand the > concept that how you are preventing vacuum to remove the dead tuple > which might required by the concurrent remote update, so I was looking > at the commit message which explains the idea quite clearly but I have > one question Thanks for the review! > > The process of advancing the non-removable transaction ID in the apply worker > involves: > > == copied from commit message of 0001 start== > 1) Call GetOldestActiveTransactionId() to take oldestRunningXid as the > candidate xid. > 2) Send a message to the walsender requesting the publisher status, which > includes the latest WAL write position and information about transactions > that are in the commit phase. > 3) Wait for the status from the walsender. After receiving the first status, do > not proceed if there are concurrent remote transactions that are still in the > commit phase. These transactions might have been assigned an earlier commit > timestamp but have not yet written the commit WAL record. Continue to > request > the publisher status until all these transactions have completed. > 4) Advance the non-removable transaction ID if the current flush location has > reached or surpassed the last received WAL position. > == copied from commit message of 0001 start== > > So IIUC in step 2) we send the message and get the list of all the > transactions which are in the commit phase? What do you exactly mean by a > transaction which is in the commit phase? I was referring to transactions calling RecordTransactionCommit() and have entered the commit critical section. In the patch, we checked if the proc has marked the new flag DELAY_CHKPT_IN_COMMIT in 'MyProc->delayChkptFlags'. > Can I assume transactions which are currently running on the publisher? I think it's a subset of the running transactions. We only get the transactions in commit phase with the intention to avoid delays caused by waiting for long-running transactions to complete, which can result in the long retention of dead tuples. We decided to wait for running(committing) transactions due to the WAL/LSN inversion issue[1]. The original idea is to directly return the latest WAL write position without checking running transactions. But since there is a gap between when we acquire the commit_timestamp and the commit LSN, it's possible the transactions might have been assigned an earlier commit timestamp but have not yet written the commit WAL record. > And in step 3) we wait for all the transactions to get committed which we saw > running (or in the commit phase) and we anyway don't worry about the newly > started transactions as they would not be problematic for us. And in step 4) > we would wait for all the flush location to reach "last received WAL > position", here my question is what exactly will be the "last received WAL > position" I assume it would be the position somewhere after the position of > the commit WAL of all the transaction we were interested on the publisher? Yes, your understanding is correct. It's a position after the position of all the interesting transactions. In the patch, we get the latest WAL write position(GetXLogWriteRecPtr()) in walsender after all interesting transactions have finished and reply it to apply worker. > At high level the overall idea looks promising to me but wanted to put > more thought on lower level details about what transactions exactly we > are waiting for and what WAL LSN we are waiting to get flushed. Yeah, that makes sense, thanks. [1] https://www.postgresql.org/message-id/OS0PR01MB571628594B26B4CC2346F09294592%40OS0PR01MB5716.jpnprd01.prod.outlook.com> Best Regards, Hou zj
pgsql-hackers by date: