Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | CAA4eK1+0fpMj-jvfch+7nkdCTxx6qbwNYrhQVGAbPfZLViaTcQ@mail.gmail.com Whole thread Raw |
In response to | Re: Skipping logical replication transactions on subscriber side (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Skipping logical replication transactions on subscriber side
|
List | pgsql-hackers |
On Thu, Dec 16, 2021 at 11:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > So if skip_xid is already changed, the apply worker would do > > > replorigin_advance() with WAL logging, instead of committing the > > > catalog change? > > > > > > > Right. BTW, how are you planning to advance the origin? Normally, a > > commit transaction would do it but when we are skipping all changes, > > the commit might not do it as there won't be any transaction id > > assigned. > > I've not tested it yet but replorigin_advance() with wal_log = true > seems to work for this case. > IIUC, the changes corresponding to above in the latest patch are as follows: --- a/src/backend/replication/logical/origin.c +++ b/src/backend/replication/logical/origin.c @@ -921,7 +921,8 @@ replorigin_advance(RepOriginId node, LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE); /* Make sure it's not used by somebody else */ - if (replication_state->acquired_by != 0) + if (replication_state->acquired_by != 0 && + replication_state->acquired_by != MyProcPid) { ... clear_subscription_skip_xid() { .. + else if (!XLogRecPtrIsInvalid(origin_lsn)) + { + /* + * User has already changed subskipxid before clearing the subskipxid, so + * don't change the catalog but just advance the replication origin. + */ + replorigin_advance(replorigin_session_origin, origin_lsn, + GetXLogInsertRecPtr(), + false, /* go_backward */ + true /* wal_log */); + } .. } I was thinking what if we don't advance origin explicitly in this case? Actually, that will be no different than the transactions where the apply worker doesn't apply any change because the initial sync is in progress (see should_apply_changes_for_rel()) or we have received an empty transaction. In those cases also, the origin lsn won't be advanced even though we acknowledge the advanced last_received location because of keep_alive messages. Now, it is possible after the restart we send the old start_lsn location because the replication origin was not updated before restart but we handle that case in the server by starting from the last confirmed location. See below code: CreateDecodingContext() { .. else if (start_lsn < slot->data.confirmed_flush) .. Few other comments on the latest patch: ================================= 1. A conflict will produce an error and will stop the replication; it must be resolved manually by the user. Details about the conflict can be found in - the subscriber's server log. + <xref linkend="monitoring-pg-stat-subscription-workers"/> as well as the + subscriber's server log. Can we slightly change the modified line to: "Details about the conflict can be found in <xref linkend="monitoring-pg-stat-subscription-workers"/> and the subscriber's server log."? I think we can commit this change separately as this is true even without this patch. 2. The resolution can be done either by changing data on the subscriber so - that it does not conflict with the incoming change or by skipping the - transaction that conflicts with the existing data. The transaction can be - skipped by calling the <link linkend="pg-replication-origin-advance"> + that it does not conflict with the incoming changes or by skipping the whole + transaction. This option specifies the ID of the transaction whose + application is to be skipped by the logical replication worker. The logical + replication worker skips all data modification transaction conflicts with + the existing data. When a conflict produce an error, it is shown in + <structname>pg_stat_subscription_workers</structname> view as follows: I don't think most of the additional text added in the above paragraph is required. We can rephrase it as: "The resolution can be done either by changing data on the subscriber so that it does not conflict with the incoming change or by skipping the transaction that conflicts with the existing data. When a conflict produces an error, it is shown in <structname>pg_stat_subscription_workers</structname> view as follows:". After that keep the text, you have. 3. They skip the whole transaction, including changes that may not violate any + constraint. They may easily make the subscriber inconsistent, especially if + a user specifies the wrong transaction ID or the position of origin. Can we slightly reword the above text as: "Skipping the whole transaction includes skipping the changes that may not violate any constraint. This can easily make the subscriber inconsistent, especially if a user specifies the wrong transaction ID or the position of origin."? 4. The logical replication worker skips all data + modification changes within the specified transaction. Therefore, since + it skips the whole transaction including the changes that may not violate + the constraint, it should only be used as a last resort. This option has + no effect for the transaction that is already prepared with enabling + <literal>two_phase</literal> on susbscriber. Let's slightly reword the above text as: "The logical replication worker skips all data modification changes within the specified transaction including the changes that may not violate the constraint, so, it should only be used as a last resort. This option has no effect on the transaction that is already prepared by enabling <literal>two_phase</literal> on the subscriber." 5. + by the logical replication worker. Setting <literal>NONE</literal> means + to reset the transaction ID. Let's slightly reword the second part of the sentence as: "Setting <literal>NONE</literal> resets the transaction ID." 6. Once we start skipping + * changes, we don't stop it until the we skip all changes of the transaction even + * if the subscription invalidated and MySubscription->skipxid gets changed or reset. /subscription invalidated/subscription is invalidated What do you mean by subscription invalidated and how is it related to this feature? I think we should mention something on these lines in the docs as well. 7. "Please refer to the comments in these functions for details.". We can slightly modify this part of the comment as: "Please refer to the comments in corresponding functions for details." -- With Regards, Amit Kapila.
pgsql-hackers by date: