RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | OS0PR01MB5716C8B3C364EEE86F91B3D294102@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Conflict detection for update_deleted in logical replication
Re: Conflict detection for update_deleted in logical replication Re: Conflict detection for update_deleted in logical replication |
List | pgsql-hackers |
On Friday, January 3, 2025 2:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: Hi, > > I have one comment on the 0001 patch: Thanks for the comments! > > + /* > + * The changes made by this and later transactions are still > non-removable > + * to allow for the detection of update_deleted conflicts when > applying > + * changes in this logical replication worker. > + * > + * Note that this info cannot directly protect dead tuples from being > + * prematurely frozen or removed. The logical replication launcher > + * asynchronously collects this info to determine whether to advance > the > + * xmin value of the replication slot. > + * > + * Therefore, FullTransactionId that includes both the > transaction ID and > + * its epoch is used here instead of a single Transaction ID. This is > + * critical because without considering the epoch, the transaction ID > + * alone may appear as if it is in the future due to transaction ID > + * wraparound. > + */ > + FullTransactionId oldest_nonremovable_xid; > > The last paragraph of the comment mentions that we need to use > FullTransactionId to properly compare XIDs even after the XID wraparound > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from > being wraparound, no? I mean that workers' > oldest-nonremovable-xid values and slot's non-removal-xid (i.e., its > xmin) are never away from more than 2^31 XIDs. I think the issue is that the launcher may create the replication slot after the apply worker has already set the 'oldest_nonremovable_xid' because the launcher are doing that asynchronously. So, Before the slot is created, there's a window where transaction IDs might wrap around. If initially the apply worker has computed a candidate_xid (755) and the xid wraparound before the launcher creates the slot, causing the new current xid to be (740), then the old candidate_xid(755) looks like a xid in the future, and the launcher could advance the xmin to 755 which cause the dead tuples to be removed prematurely. (We are trying to reproduce this to ensure that it's a real issue and will share after finishing) We thought of another approach, which is to create/drop this slot first as soon as one enables/disables detect_update_deleted (E.g. create/drop slot during DDL). But it seems complicate to control the concurrent slot create/drop. For example, if one backend A enables detect_update_deteled, it will create a slot. But if another backend B is disabling the detect_update_deteled at the same time, then the newly created slot may be dropped by backend B. I thought about checking the number of subscriptions that enables detect_update_deteled before dropping the slot in backend B, but the subscription changes caused by backend A may not visable yet (e.g. not committed yet). Does that make sense to you, or do you have some other ideas? Best Regards, Hou zj
pgsql-hackers by date: