Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id F7A797EB-6E0D-43F9-9AE7-10776DD383D3@enterprisedb.com
Whole thread Raw
In response to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side
List pgsql-hackers

> On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached rebased patches.

Here are some review comments:

For the v12-0002 patch:

The documentation changes for ALTER SUBSCRIPTION .. RESET look strange to me.  You grouped SET and RESET together, much
likesql-altertable.html has them grouped, but I don't think it flows naturally here, as the two commands do not support
thesame set of parameters.  It might look better if you documented these separately.  It might also be good to order
theparameters the same, so that the differences can more quickly be seen. 

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful foot-gun.  Can we do anything to make it less
likelythat users will hurt themselves with this tool? 

I am thinking back to support calls I have attended.  When a production system is down, there is often some hesitancy
toperform ad-hoc operations on the database, but once the decision has been made to do so, people try to get the whole
processdone as quickly as possible.  If multiple transactions on the publisher fail on the subscriber, they will do so
inseries, not in parallel.  The process of clearing these errors will amount to copying the xid of each failed
transactionto the ALTER SUBSCRIPTION ... SET (skip_xid = xxx) command and running it, then the next, then the next,
.... Perhaps the first couple times through the process, the customer will look to see that the failure is of the same
typeand on the same table, but after a short time they will likely just script something to clear the rest as quickly
aspossible.  In the heat of the moment, they may not include a check of the failure message, but merely a grep of the
failingxid. 

If the user could instead clear all failed transactions of the same type, that might make it less likely that they
unthinkinglyalso skip subsequent errors of some different type.  Perhaps something like ALTER SUBSCRIPTION ... SET
(skip_failures= 'duplicate key value violates unique constraint "test_pkey"')?  This is arguably a different feature
request,and not something your patch is required to address, but I wonder how much we should limit people shooting
themselvesin the foot?  If we built something like this using your skip_xid feature, rather than instead of your
skip_xidfeature, would your feature need to be modified? 

The docs could use some rewording, too:

+          If incoming data violates any constraints the logical replication
+          will stop until it is resolved.

In my experience, logical replication doesn't stop, but instead goes into an infinite loop of retries.

+          The resolution can be done either
+          by changing data on the subscriber so that it doesn't conflict with
+          incoming change or by skipping the whole transaction.

I'm having trouble thinking of an example conflict where skipping a transaction would be better than writing a BEFORE
INSERTtrigger on the conflicting table which suppresses or redirects conflicting rows somewhere else.  Particularly for
largertransactions containing multiple statements, suppressing the conflicting rows using a trigger would be less messy
thanskipping the transaction.  I think your patch adds a useful tool to the toolkit, but maybe we should mention more
alternativesin the docs?  Something like, "changing the data on the subscriber so that it doesn't conflict with
incomingchanges, or dropping the conflicting constraint or unique index, or writing a trigger on the subscriber to
suppressor redirect conflicting incoming changes, or as a last resort, by skipping the whole transaction"? 

Perhaps I'm reading your phrase "changing the data on the subscriber" too narrowly.  To me, that means running DML
(eithera DELETE or an UPDATE) on the existing data in the table where the conflict arises.  These other options are DDL
anddo not easily come to mind when I read that phrase. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Zhihong Yu
Date:
Subject: Re: [PATCH] Partial foreign key updates in referential integrity triggers