Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Conflict detection for update_deleted in logical replication
Date:
Msg-id: CAA4eK1Li8XLJ5f-pYvPJ8pXxyA3G-QsyBLNzHY940amF7jm=3A@mail.gmail.com
In response to: RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses: Re: Conflict detection for update_deleted in logical replication
List: pgsql-hackers
On Mon, Jan 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Friday, January 3, 2025 2:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > + /*
> > +  * The changes made by this and later transactions are still non-removable
> > +  * to allow for the detection of update_deleted conflicts when applying
> > +  * changes in this logical replication worker.
> > +  *
> > +  * Note that this info cannot directly protect dead tuples from being
> > +  * prematurely frozen or removed. The logical replication launcher
> > +  * asynchronously collects this info to determine whether to advance the
> > +  * xmin value of the replication slot.
> > +  *
> > +  * Therefore, FullTransactionId that includes both the transaction ID and
> > +  * its epoch is used here instead of a single Transaction ID. This is
> > +  * critical because without considering the epoch, the transaction ID
> > +  * alone may appear as if it is in the future due to transaction ID
> > +  * wraparound.
> > +  */
> > + FullTransactionId oldest_nonremovable_xid;
> >
> > The last paragraph of the comment mentions that we need to use
> > FullTransactionId to properly compare XIDs even after the XID wraparound
> > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from
> > wrapping around, no? I mean that workers' oldest-nonremovable-xid values
> > and the slot's non-removal xid (i.e., its xmin) are never more than 2^31
> > XIDs apart.
>
> I think the issue is that the launcher may create the replication slot after
> the apply worker has already set the 'oldest_nonremovable_xid', because the
> launcher does that asynchronously. So, before the slot is created, there is
> a window where transaction IDs might wrap around. If the apply worker has
> initially computed a candidate_xid (755) and the XID wraps around before the
> launcher creates the slot, causing the new current xid to be (740), then the
> old candidate_xid (755) looks like an xid in the future, and the launcher
> could advance the xmin to 755, which causes dead tuples to be removed
> prematurely. (We are trying to reproduce this to ensure that it's a real
> issue and will share after finishing.)
>
> We thought of another approach, which is to create/drop this slot as soon as
> one enables/disables detect_update_deleted (e.g., create/drop the slot during
> DDL). But it seems complicated to control concurrent slot create/drop. For
> example, if one backend A enables detect_update_deleted, it will create a
> slot. But if another backend B is disabling detect_update_deleted at the same
> time, then the newly created slot may be dropped by backend B. I thought
> about checking the number of subscriptions that enable detect_update_deleted
> before dropping the slot in backend B, but the subscription changes caused by
> backend A may not be visible yet (e.g., not committed yet).
>

This means that for a transaction whose changes are not yet visible, we may
have already created the slot, and backend B would end up dropping it. Is it
possible that, while changing this new option via DDL, we take
AccessExclusiveLock on pg_subscription, as we do in DropSubscription(), to
ensure that concurrent transactions can't drop the slot? Will that help in
solving the above scenario?
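To make the first idea a bit more concrete, here is a rough, untested sketch
(the helper name and the option handling are placeholders, not code from the
patch) of how the DDL path could serialize slot creation/drop by locking
pg_subscription the way DropSubscription() does:

#include "postgres.h"

#include "access/table.h"
#include "catalog/pg_subscription.h"
#include "utils/rel.h"

/*
 * Placeholder sketch only: serialize enable/disable of the new option by
 * locking pg_subscription, the same way DropSubscription() does, so that a
 * concurrent backend cannot drop the shared slot that another backend has
 * just created based on a not-yet-visible catalog change.
 */
static void
ToggleRetainDeadTuples(bool new_value)		/* hypothetical helper */
{
	Relation	rel;

	/* Serializes all backends changing this option until commit. */
	rel = table_open(SubscriptionRelationId, AccessExclusiveLock);

	/* ... update the pg_subscription tuple for this subscription ... */

	if (new_value)
	{
		/* create the shared slot if it does not exist yet */
	}
	else
	{
		/* drop the slot only if no other subscription still needs it */
	}

	/* Keep the lock until transaction end so concurrent DDL waits. */
	table_close(rel, NoLock);
}

The slot create/drop logic itself would live elsewhere; the point is only that
holding AccessExclusiveLock on pg_subscription until commit serializes every
backend that enables or disables the option, which is what removes the
concurrent create/drop race described above.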
The second idea could be that each worker first checks whether the slot exists along with a subscription flag (new option). Checking the existence of the slot each time would be costly, so we would somehow need to cache it. But if we do that, then we need to invent some cache-invalidation mechanism for the slot, and I am not sure we can design a race-free mechanism for that. I mean, we need to think of a solution for race conditions between the launcher and the apply workers to ensure that, after dropping the slot, the launcher doesn't recreate it (say, if some subscription enables this option) before all the workers have cleared their existing values of oldest_nonremovable_xid.

The third idea to avoid the race condition could be that in InitializeLogRepWorker(), after CommitTransactionCommand(), we check whether the retain_dead_tuples flag is true for MySubscription, and if so, check whether the system slot exists. If it exists, go ahead; otherwise, wait until the slot is created. That could cost some additional cycles during worker startup, but it is a one-time effort and only when the flag is set. In addition to this, we anyway need to create the slot in the launcher before launching the workers, and after re-reading the subscription, a change in the retain_dead_tuples flag (off->on) should cause the worker to restart.

Now, in the third idea, the issue can still arise if, after waiting for the slot to be created, the user sets retain_dead_tuples to false and back to true again immediately. The launcher may have noticed the "retain_dead_tuples=false" operation and dropped the slot, while the apply worker has not noticed it and still holds an old candidate_xid. The XID may wrap around in this window before retain_dead_tuples is set back to true. And the apply worker would not restart, because by the time it calls maybe_reread_subscription(), retain_dead_tuples would have been set back to true again. To avoid this race condition, the launcher can wait for each worker to reset its oldest_nonremovable_xid before dropping the slot.

Even after doing the above, the third idea could still have another race condition:
1. The launcher creates the replication slot and starts a worker with retain_dead_tuples = true; the worker is waiting for the publisher status and has not yet set oldest_nonremovable_xid.
2. The user sets retain_dead_tuples to false; the launcher notices that and drops the replication slot.
3. The worker receives the status and sets oldest_nonremovable_xid to a valid value (say 750).
4. XID wraparound happens at this point and, say, new_available_xid becomes 740.
5. The user sets retain_dead_tuples = true again.

After the above steps, the apply worker holds an old oldest_nonremovable_xid (750) and will not restart if it does not call maybe_reread_subscription() before step 5. So such a case can again lead to an incorrect slot->xmin value. We can probably try to find some way to avoid this race condition as well, but I haven't thought more about this, as this idea sounds a bit risky and bug-prone to me.

Among the above ideas, the first idea of taking AccessExclusiveLock on pg_subscription sounds safest to me. I haven't evaluated the changes for the first approach, so I could be missing something that makes it difficult to achieve, but I think it is worth investigating unless we have better ideas or we think that the current approach used in the patch of using FullTransactionId is okay.

--
With Regards,
Amit Kapila.