Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
Whole thread Raw
Responses Re: Skipping logical replication transactions on subscriber side
Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers
Hi all,

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

I’d like to propose a way to skip the particular transaction on the
subscriber side. As the first step, a transaction can be specified to
be skipped by specifying remote XID on the subscriber. This feature
would need two sub-features: (1) a sub-feature for users to identify
the problem subscription and the problem transaction’s XID, and (2) a
sub-feature to skip the particular transaction to apply.

For (1), I think the simplest way would be to put the details of the
change being applied in errcontext. For example, the following
errcontext shows the remote XID as well as the action name, the
relation name, and commit timestamp:

ERROR:  duplicate key value violates unique constraint "test_pkey"
DETAIL:  Key (c)=(1) already exists.
CONTEXT:  during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09

The user can identify which remote XID has a problem during applying
the change (XID=590 in this case). As another idea, we can have a
statistics view for logical replication workers, showing information
of the last failure transaction.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog. The syntax allows users to specify one remote XID to skip. In
the future, it might be good if users can also specify multiple XIDs
(a range of XIDs or a list of XIDs, etc).

Feedback and comment are very welcome.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Skip partition tuple routing with constant partition key
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Skip partition tuple routing with constant partition key