Re: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Conflict detection for update_deleted in logical replication
Date
Msg-id CAA4eK1+q77+PdWUWzU_DVfVd3VsGGWZOiJPDy9NME-5iGJATBw@mail.gmail.com
Whole thread Raw
In response to Re: Conflict detection for update_deleted in logical replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Nov 21, 2024 at 3:03 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V10 patch set which addressed above comments
> and fixed a CFbot warning due to un-initialized variable.
>

We should make the v10_2-0001* as the first main patch for review till
we have a consensus to resolve LSN<->Timestamp inversion issue. This
is because v10_2 doesn't rely on the correctness of LSN<->Timestamp
mapping. Now, say in some later release, we fix the LSN<->Timestamp
inversion issue, we can simply avoid sending remote_xact information
and it will behave the same as your v10_1 approach.

Comments on v10_2_0001*:
======================
1.
+/*
+ * The phases involved in advancing the non-removable transaction ID.
+ *
+ * Refer to maybe_advance_nonremovable_xid() for details on how the function
+ * transitions between these phases.
+ */
+typedef enum
+{
+ DTR_GET_CANDIDATE_XID,
+ DTR_REQUEST_PUBLISHER_STATUS,
+ DTR_WAIT_FOR_PUBLISHER_STATUS,
+ DTR_WAIT_FOR_LOCAL_FLUSH
+} DeadTupleRetainPhase;

First, can we have a better name for this enum like
RetainConflictInfoPhase or something like that? Second, the phase
transition is not very clear from the comments atop
maybe_advance_nonremovable_xid. You can refer to comments atop
tablesync.c or snapbuild.c to see other cases where we have explained
phase transitions.

2.
+ *   Wait for the status from the walsender. After receiving the first status
+ *   after acquiring a new candidate transaction ID, do not proceed if there
+ *   are ongoing concurrent remote transactions.

In this part of the comments: " .. after acquiring a new candidate
transaction ID ..." appears misplaced.

3. In maybe_advance_nonremovable_xid(), the handling of each phase
looks ad-hoc though I see that you have done that have so that you can
handle the phase change functionality after changing the phase
immediately. If we have to ever extend this functionality, it will be
tricky to handle the new phase or at least the code will become
complicated. How about handling each phase in the order of their
occurrence and having separate functions for each phase as we have in
apply_dispatch()? That way it would be convenient to invoke the phase
handling functionality even if it needs to be called multiple times in
the same function.

4.
/*
+ * An invalid position indiates the publisher is also
+ * a physical standby. In this scenario, advancing the
+ * non-removable transaction ID is not supported. This
+ * is because the logical walsender on the standby can
+ * only get the WAL replay position but there may be
+ * more WALs that are being replicated from the
+ * primary and those WALs could have earlier commit
+ * timestamp. Refer to
+ * maybe_advance_nonremovable_xid() for details.
+ */
+ if (XLogRecPtrIsInvalid(remote_lsn))
+ {
+ ereport(WARNING,
+ errmsg("cannot get the latest WAL position from the publisher"),
+ errdetail("The connected publisher is also a standby server."));
+
+ /*
+ * Continuously revert to the request phase until
+ * the standby server (publisher) is promoted, at
+ * which point a valid WAL position will be
+ * received.
+ */
+ phase = DTR_REQUEST_PUBLISHER_STATUS;
+ }

Shouldn't this be an ERROR as the patch doesn't support this case? The
same should be true for later patches for the subscription option.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Masahiro Ikeda
Date:
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Next
From: Marcos Pegoraro
Date:
Subject: Re: Missing INFO on client_min_messages