RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | TY4PR01MB1690721034ADBE0447A3FFA3C943AA@TY4PR01MB16907.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
On Thursday, August 28, 2025 6:07 PM shveta malik <shveta.malik@gmail.com> wrote: > On Thu, Aug 28, 2025 at 8:02 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > > I noticed that Cfbot failed to compile the document due to a typo > > after renaming the subscription option. Here are the updated V67 > > patches to fix that, only the doc in 0001 is modified. > > > > Please find a few comments: Thanks for the comments. > > patch001 : > 1) > Assert is hit while testing patch001 alone: > TRAP: failed Assert("TransactionIdIsValid(nonremovable_xid)"), File: > "launcher.c", Line: 1394, PID: 55350. > > Scenario: > I have 2 subs, both have stopped retention. Now on one of the sub, if I do this: > > --switch off rdt: > alter subscription sub1 disable; > alter subscription sub1 set (retain_dead_tuples=off); alter subscription sub1 > enable; > > --switch back rdt, launcher asserts after this alter subscription sub1 disable; > alter subscription sub1 set (retain_dead_tuples=on); alter subscription sub1 > enable; I fixed it by skipping collecting the xid from workers when the overall retention has been stopped. When fixing the reported Assert failure, I noticed another race condition that can hit this Assert: If the launcher has obtained pg_subscription.retentionactive and has not yet called compute_min_nonremovable_xid(), then if the apply worker stops retention and sets oldest_nonremovable_xid to invalid, the launcher will hit the assertion again when calling compute_min_nonremovable_xid(). I fixed it by adding a check in compute function: /* * Return if the apply worker has stopped retention concurrently. * * Although this function is invoked only when retentionactive is true, the * apply worker might stop retention after the launcher fetches the * retentionactive flag. */ if (!TransactionIdIsValid(nonremovable_xid)) return; The logic in 0002 is also adjusted to avoid assigning the slot.xmin to worker in this case. > > 2) > + Maximum duration for which this subscription's apply worker > is allowed > + to retain the information useful for conflict detection when > + <literal>retain_dead_tuples</literal> is enabled for the > associated > + subscriptions. > > associated subscriptions --> associated subscription (since we are talking > about 'this subscription's apply worker') I have reworded this part based on Amit's suggestion. > > 3) > + The information useful for conflict detection is no longer > retained if > + all apply workers associated with the subscriptions, where > + <literal>retain_dead_tuples</literal> is enabled, confirm that the > + retention duration exceeded the > > A trivial thing: 'retention duration has exceeded' sounds better to me. Changed > > ~~ > > patch002: > (feel free to defer these comments if we are focusing on patch001 right now): > > 4) > stop_conflict_info_retention() has the correct message and detail in > patch01 while in patch02, it is switched back to the older one (wrong one). > Perhaps some merge mistake Right, It a miss when rebasing the 0002. Fixed. > > 5) > resume_conflict_info_retention() still refers to the old GUC name: > max_conflict_retention_duration. Fixed. > > 6) > In compute_min_nonremovable_xid(), shall we have > Assert(TransactionIdIsValid(nonremovable_xid)) before assigning it to > worker->oldest_nonremovable_xid? Here: > > + nonremovable_xid = MyReplicationSlot->data.xmin; > + > + SpinLockAcquire(&worker->relmutex); > + worker->oldest_nonremovable_xid = nonremovable_xid; > + SpinLockRelease(&worker->relmutex); Added. > 7) > Now since compute_min_nonremovable_xid() is also taking care of assigning > valid xid to the worker, shall we update that in comment as well? We can have > one more line: > 'Additionally if any of the apply workers has invalid xid, assign slot's xmin to it.' Added. Best Regards, Hou zj
pgsql-hackers by date: