Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAA4eK1+0fpMj-jvfch+7nkdCTxx6qbwNYrhQVGAbPfZLViaTcQ@mail.gmail.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Thu, Dec 16, 2021 at 11:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > So if skip_xid is already changed, the apply worker would do
> > > replorigin_advance() with WAL logging, instead of committing the
> > > catalog change?
> > >
> >
> > Right. BTW, how are you planning to advance the origin? Normally, a
> > commit transaction would do it but when we are skipping all changes,
> > the commit might not do it as there won't be any transaction id
> > assigned.
>
> I've not tested it yet but replorigin_advance() with wal_log = true
> seems to work for this case.
>

IIUC, the changes corresponding to above in the latest patch are as follows:

--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -921,7 +921,8 @@ replorigin_advance(RepOriginId node,
  LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

  /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
+ replication_state->acquired_by != MyProcPid)
  {
...

clear_subscription_skip_xid()
{
..
+ else if (!XLogRecPtrIsInvalid(origin_lsn))
+ {
+ /*
+ * User has already changed subskipxid before clearing the subskipxid, so
+ * don't change the catalog but just advance the replication origin.
+ */
+ replorigin_advance(replorigin_session_origin, origin_lsn,
+    GetXLogInsertRecPtr(),
+    false, /* go_backward */
+    true /* wal_log */);
+ }
..
}

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Few other comments on the latest patch:
=================================
1.
A conflict will produce an error and will stop the replication; it must be
    resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+   <xref linkend="monitoring-pg-stat-subscription-workers"/> as well as the
+   subscriber's server log.

Can we slightly change the modified line to: "Details about the
conflict can be found in <xref
linkend="monitoring-pg-stat-subscription-workers"/> and the
subscriber's server log."? I think we can commit this change
separately as this is true even without this patch.

2.
    The resolution can be done either by changing data on the subscriber so
-   that it does not conflict with the incoming change or by skipping the
-   transaction that conflicts with the existing data.  The transaction can be
-   skipped by calling the <link linkend="pg-replication-origin-advance">
+   that it does not conflict with the incoming changes or by skipping the whole
+   transaction.  This option specifies the ID of the transaction whose
+   application is to be skipped by the logical replication worker.  The logical
+   replication worker skips all data modification transaction conflicts with
+   the existing data. When a conflict produce an error, it is shown in
+   <structname>pg_stat_subscription_workers</structname> view as follows:

I don't think most of the additional text added in the above paragraph
is required. We can rephrase it as: "The resolution can be done either
by changing data on the subscriber so that it does not conflict with
the incoming change or by skipping the transaction that conflicts with
the existing data. When a conflict produces an error, it is shown in
<structname>pg_stat_subscription_workers</structname> view as
follows:". After that keep the text, you have.

3.
They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.

Can we slightly reword the above text as: "Skipping the whole
transaction includes skipping the changes that may not violate any
constraint.  This can easily make the subscriber inconsistent,
especially if a user specifies the wrong transaction ID or the
position of origin."?

4.
The logical replication worker skips all data
+      modification changes within the specified transaction.  Therefore, since
+      it skips the whole transaction including the changes that may not violate
+      the constraint, it should only be used as a last resort. This option has
+      no effect for the transaction that is already prepared with enabling
+      <literal>two_phase</literal> on susbscriber.

Let's slightly reword the above text as: "The logical replication
worker skips all data modification changes within the specified
transaction including the changes that may not violate the constraint,
so, it should only be used as a last resort. This option has no effect
on the transaction that is already prepared by enabling
<literal>two_phase</literal> on the subscriber."

5.
+          by the logical replication worker. Setting
<literal>NONE</literal> means
+          to reset the transaction ID.

Let's slightly reword the second part of the sentence as: "Setting
<literal>NONE</literal> resets the transaction ID."

6.
Once we start skipping
+ * changes, we don't stop it until the we skip all changes of the
transaction even
+ * if the subscription invalidated and MySubscription->skipxid gets
changed or reset.

/subscription invalidated/subscription is invalidated

What do you mean by subscription invalidated and how is it related to
this feature? I think we should mention something on these lines in
the docs as well.

7.
"Please refer to the comments in these functions for details.". We can
slightly modify this part of the comment as: "Please refer to the
comments in corresponding functions for details."

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: ICU for global collation
Next
From: Juan José Santamaría Flecha
Date:
Subject: Re: Fix vcregress plpython3 warning