Re: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Optionally automatically disable logical replication subscriptions on error
Msg-id CAA4eK1KkhsNAW3=XxOdVRD9+RcQG5PRRFBFzAMibMG0YFFmAmg@mail.gmail.com
In response to Optionally automatically disable logical replication subscriptions on error  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Fri, Jun 18, 2021 at 1:48 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
>
> Hackers,
>
> Logical replication apply workers for a subscription can easily get
> stuck in an infinite loop of attempting to apply a change, triggering
> an error (such as a constraint violation), exiting with an error
> written to the subscription worker log, and restarting.
>
> As things currently stand, only superusers can create subscriptions.
> Ongoing work to delegate superuser tasks to non-superusers creates the
> potential for even more errors to be triggered, specifically, errors
> where the apply worker does not have permission to make changes to the
> target table.
>
> The attached patch makes it possible to create a subscription using a
> new subscription_parameter, "disable_on_error", such that rather than
> going into an infinite loop, the apply worker will catch errors and
> automatically disable the subscription, breaking the loop.  The new
> parameter defaults to false.  When false, the PG_TRY/PG_CATCH overhead
> is avoided, so for subscriptions which do not use the feature, there
> shouldn't be any change.  Users can manually clear the error after
> fixing the underlying issue with an ALTER SUBSCRIPTION .. ENABLE
> command.
>

I see this idea has merit, and it will help users repair failing
subscriptions. A few points from a quick look at the patch: (a) The
patch seems to assume that the error can happen only in the apply
worker, but I think the constraint violation can happen in one of the
table sync workers as well. (b) What happens if an error occurs while
you are updating the error information in the catalog table? I think
that instead of seeing the actual apply-time error, the user might see
some other error, for which it won't be clear what the appropriate
action is.

We are also discussing another action, skipping the apply of the
transaction on an error [1]. I think it is better to evaluate both
proposals together, as one seems to be an extension of the other.
Adding Sawada-San, as he is working on the other proposal.

[1] - https://www.postgresql.org/message-id/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK%3D30xJfUVihNZDA%40mail.gmail.com

--
With Regards,
Amit Kapila.


