Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CALj2ACUMXrW9+XUgjzpfLaCS09A5RgMawX-e_S18d6KVmrVf0A@mail.gmail.com
Whole thread Raw
In response to Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi all,
>
> If a logical replication worker cannot apply the change on the
> subscriber for some reason (e.g., missing table or violating a
> constraint, etc.), logical replication stops until the problem is
> resolved. Ideally, we resolve the problem on the subscriber (e.g., by
> creating the missing table or removing the conflicting data, etc.) but
> occasionally a problem cannot be fixed and it may be necessary to skip
> the entire transaction in question. Currently, we have two ways to
> skip transactions: advancing the LSN of the replication origin on the
> subscriber and advancing the LSN of the replication slot on the
> publisher. But both ways might not be able to skip exactly one
> transaction in question and end up skipping other transactions too.

Does it mean pg_replication_origin_advance() can't skip exactly one
txn? I'm not familiar with the function or never used it though, I was
just searching for "how to skip a single txn in postgres" and ended up
in [1]. Could you please give some more details on scenarios when we
can't skip exactly one txn? Is there any other way to advance the LSN,
something like directly updating the pg_replication_slots catalog?

[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

> I’d like to propose a way to skip the particular transaction on the
> subscriber side. As the first step, a transaction can be specified to
> be skipped by specifying remote XID on the subscriber. This feature
> would need two sub-features: (1) a sub-feature for users to identify
> the problem subscription and the problem transaction’s XID, and (2) a
> sub-feature to skip the particular transaction to apply.
>
> For (1), I think the simplest way would be to put the details of the
> change being applied in errcontext. For example, the following
> errcontext shows the remote XID as well as the action name, the
> relation name, and commit timestamp:
>
> ERROR:  duplicate key value violates unique constraint "test_pkey"
> DETAIL:  Key (c)=(1) already exists.
> CONTEXT:  during apply of "INSERT" for relation "public.test" in
> transaction with xid 590 commit timestamp 2021-05-21
> 14:32:02.134273+09
>
> The user can identify which remote XID has a problem during applying
> the change (XID=590 in this case). As another idea, we can have a
> statistics view for logical replication workers, showing information
> of the last failure transaction.

Agree with Amit on this. At times, it is difficult to look around in
the server logs, so it will be better to have it in both places.

> For (2), what I'm thinking is to add a new action to ALTER
> SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
> TRANSACTION 590. Also, we can have actions to reset it; ALTER
> SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
> XID to a new column of pg_subscription or a new catalog, having the
> worker reread its subscription information. Once the worker skipped
> the specified transaction, it resets the transaction to skip on the
> catalog. The syntax allows users to specify one remote XID to skip. In
> the future, it might be good if users can also specify multiple XIDs
> (a range of XIDs or a list of XIDs, etc).

What's it like skipping a txn with txn id? Is it that the particular
txn is forced to commit or abort or just skipping some of the code in
the apply worker? IIUC, the behavior of RESET SKIP TRANSACTION is just
to forget the txn id specified in SET SKIP TRANSACTION right?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Move pg_attribute.attcompression to earlier in struct for reduced size?
Next
From: Bharath Rupireddy
Date:
Subject: Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options