Re: subscription disable_on_error not working after ALTER SUBSCRIPTION set bad conninfo - Mailing list pgsql-hackers

From Peter Smith
Subject Re: subscription disable_on_error not working after ALTER SUBSCRIPTION set bad conninfo
Date
Msg-id CAHut+PtoTp6GT7xcYN_BDDQnvr-0CkcuxUCB_YyPVaGL7oMQgg@mail.gmail.com
Whole thread Raw
In response to Re: subscription disable_on_error not working after ALTER SUBSCRIPTION set bad conninfo  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: subscription disable_on_error not working after ALTER SUBSCRIPTION set bad conninfo
List pgsql-hackers
On Thu, Jan 18, 2024 at 12:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
...
>
> Although we can improve it to handle this case too, I'm not sure it's
> a bug. The doc says[1]:
>
> Specifies whether the subscription should be automatically disabled if
> any errors are detected by subscription workers during data
> replication from the publisher.
>
> When an apply worker is trying to establish a connection, it's not
> replicating data from the publisher.
>
> Regards,
>
> [1]
https://www.postgresql.org/docs/devel/sql-createsubscription.html#SQL-CREATESUBSCRIPTION-PARAMS-WITH-DISABLE-ON-ERROR
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

Yeah, I had also seen that wording of those docs. And I agree it
leaves open some room for doubts because strictly from that wording it
can be interpreted that establishing the connection is not actually
"data replication from the publisher" in which case maybe there is no
bug.

OTOH, it was not clear to me if that precise wording was the intention
or not. It could have been written as "Specifies whether the
subscription should be automatically disabled if any errors are
detected by subscription workers." which would mean the same thing 99%
of the time except that would mean that the current behaviour is a
bug.

I tried looking at the original thread where this feature was born [1]
but it is still unclear to me if 'disable_on_error' was meant for
every kind of error or only data replication errors.

Indeed. even the commit message [2] seems to have an each-way bet:
* It talks about errors applying changes --- "Logical replication
apply workers for a subscription can easily get stuck in an infinite
loop of attempting to apply a change..."
* But, it also says it covers any errors --- "When true, both the
tablesync worker and apply worker catch any errors thrown..."

~

Maybe we should be asking ourselves how a user intuitively expects
this option to behave. IMO the answer is right there in the option
name - the subscription says 'disable_on_error' and I got an error, so
I expected the subscription to be disabled. To wriggle out of it by
saying "Ah, but we did not mean _those_ kinds of errors" doesn't quite
seem genuine to me.

======
[1]
https://www.postgresql.org/message-id/flat/CAA4eK1KsaVgkO%3DRbjj0bcXZTpeV1QVm0TGkdxZiH73MHfxf6oQ%40mail.gmail.com#d4a0db154fbeca356a494c50ac877ff1
[2] https://github.com/postgres/postgres/commit/705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Adding facility for injection points (or probe points?) for more advanced tests
Next
From: Michael Paquier
Date:
Subject: Re: heavily contended lwlocks with long wait queues scale badly