Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Conflict detection for update_deleted in logical replication
Date
Msg-id CAA4eK1JkYpDUCFgGNEg=O4m6XjQ-9oohxrohgpockKuy7eo9gA@mail.gmail.com
Whole thread Raw
In response to Re: Conflict detection for update_deleted in logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Conflict detection for update_deleted in logical replication
List pgsql-hackers
On Wed, Dec 25, 2024 at 8:13 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the new version patch set which addressed all other comments.
>

Some more miscellaneous comments:
=============================
1.
@@ -1431,9 +1431,9 @@ RecordTransactionCommit(void)
  * modifying it.  This makes checkpoint's determination of which xacts
  * are delaying the checkpoint a bit fuzzy, but it doesn't matter.
  */
- Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
+ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_IN_COMMIT) == 0);
  START_CRIT_SECTION();
- MyProc->delayChkptFlags |= DELAY_CHKPT_START;
+ MyProc->delayChkptFlags |= DELAY_CHKPT_IN_COMMIT;

  /*
  * Insert the commit XLOG record.
@@ -1536,7 +1536,7 @@ RecordTransactionCommit(void)
  */
  if (markXidCommitted)
  {
- MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
+ MyProc->delayChkptFlags &= ~DELAY_CHKPT_IN_COMMIT;
  END_CRIT_SECTION();

The comments related to this change should be updated in EndPrepare()
and RecordTransactionCommitPrepared(). They still refer to the
DELAY_CHKPT_START flag. We should update the comments explaining why a
similar change is not required for prepare or commit_prepare, if there
is one.

2.
 static bool
 tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
- TypeCacheEntry **eq)
+ TypeCacheEntry **eq, Bitmapset *columns)
 {
  int attrnum;

@@ -337,6 +340,14 @@ tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
  if (att->attisdropped || att->attgenerated)
  continue;

+ /*
+ * Ignore columns that are not listed for checking.
+ */
+ if (columns &&
+ !bms_is_member(att->attnum - FirstLowInvalidHeapAttributeNumber,
+    columns))
+ continue;

Update the comment atop tuples_equal to reflect this change.

3.
+FindMostRecentlyDeletedTupleInfo(Relation rel, TupleTableSlot *searchslot,
+ TransactionId *delete_xid,
+ RepOriginId *delete_origin,
+ TimestampTz *delete_time)
...
...
+ /* Try to find the tuple */
+ while (table_scan_getnextslot(scan, ForwardScanDirection, scanslot))
+ {
+ bool dead = false;
+ TransactionId xmax;
+ TimestampTz localts;
+ RepOriginId localorigin;
+
+ if (!tuples_equal(scanslot, searchslot, eq, indexbitmap))
+ continue;
+
+ tuple = ExecFetchSlotHeapTuple(scanslot, false, NULL);
+ buf = hslot->buffer;
+
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
+
+ if (HeapTupleSatisfiesVacuum(tuple, oldestXmin, buf) ==
HEAPTUPLE_RECENTLY_DEAD)
+ dead = true;
+
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+ if (!dead)
+ continue;

Why do we need to check only for HEAPTUPLE_RECENTLY_DEAD and not
HEAPTUPLE_DEAD? IIUC, we came here because we couldn't find the live
tuple, now whether the tuple is DEAD or RECENTLY_DEAD, why should it
matter to detect update_delete conflict?

4. In FindMostRecentlyDeletedTupleInfo(), add comments to state why we
need to use SnapshotAny.

5.
+
+      <varlistentry
id="sql-createsubscription-params-with-detect-update-deleted">
+        <term><literal>detect_update_deleted</literal>
(<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the detection of <xref
linkend="conflict-update-deleted"/>
+          is enabled. The default is <literal>false</literal>. If set to
+          true, the dead tuples on the subscriber that are still useful for
+          detecting <xref linkend="conflict-update-deleted"/>
+          are retained,

One of the purposes of retaining dead tuples is to detect
update_delete conflict. But, I also see the following in 0001's commit
message: "Since the mechanism relies on a single replication slot, it
not only assists in retaining dead tuples but also preserves commit
timestamps and origin data. These information will be displayed in the
additional logs generated for logical replication conflicts.
Furthermore, the preserved commit timestamps and origin data are
essential for consistently detecting update_origin_differs conflicts."
which indicates there are other cases where retaining dead tuples can
help. So, I was thinking about whether to name this new option as
retain_dead_tuples or something along those lines?

BTW, it is not clear how retaining dead tuples will help the detection
update_origin_differs. Will it happen when the tuple is inserted or
updated on the subscriber and then when we try to update the same
tuple due to remote update, the commit_ts information of the xact is
not available because the same is already removed by vacuum? This
should happen for the update case for the new row generated by the
update operation as that will be used in comparison. Can you please
show it be a test case even if it is manual?

Can't it happen for delete_origin_differs as well for the same reason?

6. I feel we should keep 0004 as a later patch. We can ideally
consider committing 0001, 0002, 0003, 0005, and 0006 (or part of 0006
to get some tests that are relevant) as one unit and then the patch to
detect and report update_delete conflict. What do you think?

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Logical Replication of sequences
Next
From: Masahiko Sawada
Date:
Subject: Re: Conflict detection for update_deleted in logical replication