Re: Skipping logical replication transactions on subscriber side - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | F7A797EB-6E0D-43F9-9AE7-10776DD383D3@enterprisedb.com Whole thread Raw |
In response to | Re: Skipping logical replication transactions on subscriber side (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Skipping logical replication transactions on subscriber side
|
List | pgsql-hackers |
> On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've attached rebased patches. Here are some review comments: For the v12-0002 patch: The documentation changes for ALTER SUBSCRIPTION .. RESET look strange to me. You grouped SET and RESET together, much likesql-altertable.html has them grouped, but I don't think it flows naturally here, as the two commands do not support thesame set of parameters. It might look better if you documented these separately. It might also be good to order theparameters the same, so that the differences can more quickly be seen. For the v12-0003 patch: I believe this feature is needed, but it also seems like a very powerful foot-gun. Can we do anything to make it less likelythat users will hurt themselves with this tool? I am thinking back to support calls I have attended. When a production system is down, there is often some hesitancy toperform ad-hoc operations on the database, but once the decision has been made to do so, people try to get the whole processdone as quickly as possible. If multiple transactions on the publisher fail on the subscriber, they will do so inseries, not in parallel. The process of clearing these errors will amount to copying the xid of each failed transactionto the ALTER SUBSCRIPTION ... SET (skip_xid = xxx) command and running it, then the next, then the next, .... Perhaps the first couple times through the process, the customer will look to see that the failure is of the same typeand on the same table, but after a short time they will likely just script something to clear the rest as quickly aspossible. In the heat of the moment, they may not include a check of the failure message, but merely a grep of the failingxid. If the user could instead clear all failed transactions of the same type, that might make it less likely that they unthinkinglyalso skip subsequent errors of some different type. Perhaps something like ALTER SUBSCRIPTION ... SET (skip_failures= 'duplicate key value violates unique constraint "test_pkey"')? This is arguably a different feature request,and not something your patch is required to address, but I wonder how much we should limit people shooting themselvesin the foot? If we built something like this using your skip_xid feature, rather than instead of your skip_xidfeature, would your feature need to be modified? The docs could use some rewording, too: + If incoming data violates any constraints the logical replication + will stop until it is resolved. In my experience, logical replication doesn't stop, but instead goes into an infinite loop of retries. + The resolution can be done either + by changing data on the subscriber so that it doesn't conflict with + incoming change or by skipping the whole transaction. I'm having trouble thinking of an example conflict where skipping a transaction would be better than writing a BEFORE INSERTtrigger on the conflicting table which suppresses or redirects conflicting rows somewhere else. Particularly for largertransactions containing multiple statements, suppressing the conflicting rows using a trigger would be less messy thanskipping the transaction. I think your patch adds a useful tool to the toolkit, but maybe we should mention more alternativesin the docs? Something like, "changing the data on the subscriber so that it doesn't conflict with incomingchanges, or dropping the conflicting constraint or unique index, or writing a trigger on the subscriber to suppressor redirect conflicting incoming changes, or as a last resort, by skipping the whole transaction"? Perhaps I'm reading your phrase "changing the data on the subscriber" too narrowly. To me, that means running DML (eithera DELETE or an UPDATE) on the existing data in the table where the conflict arises. These other options are DDL anddo not easily come to mind when I read that phrase. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: