RE: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Optionally automatically disable logical replication subscriptions on error
Date
Msg-id TYCPR01MB83730138EF8B26278647E8D6ED029@TYCPR01MB8373.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Optionally automatically disable logical replication subscriptions on error  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Optionally automatically disable logical replication subscriptions on error
List pgsql-hackers
On Tuesday, March 1, 2022 9:49 AM Peter Smith <smithpb2250@gmail.com> wrote:
> Please see below my review comments for v22.
> 
> ======
> 
> 1. Commit message
> 
> "table sync worker" -> "tablesync worker"
Fixed.

> ~~~
> 
> 2. doc/src/sgml/catalogs.sgml
> 
> +      <para>
> +       If true, the subscription will be disabled when subscription
> +       workers detect any errors
> +      </para></entry>
> 
> It felt a bit strange to say "subscription" 2x in the sentence, but I am not sure
> how to improve it. Maybe like below?
> 
> BEFORE
> If true, the subscription will be disabled when subscription workers detect any
> errors
> 
> SUGGESTED
> If true, the subscription will be disabled if one of its workers detects an error
Fixed.


> ~~~
> 
> 3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> 
> @@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr
> last_received)  }
> 
>  /*
> + * Disable the current subscription, after error recovery processing.
> + */
> +static void
> +DisableSubscriptionOnError(void)
> 
> I thought the "after error recovery processing" part was a bit generic and did not
> really say what it was doing.
> 
> BEFORE
> Disable the current subscription, after error recovery processing.
> SUGGESTED
> Disable the current subscription, after logging the error that caused this
> function to be called.
Fixed.

> ~~~
> 
> 4. src/backend/replication/logical/worker.c - start_apply
> 
> + if (MySubscription->disableonerr)
> + {
> + DisableSubscriptionOnError();
> + return;
> + }
> +
> + MemoryContextSwitchTo(ecxt);
> + PG_RE_THROW();
> + }
> + PG_END_TRY();
> 
> The current code looks correct, but I felt it is a bit tricky to easily see the
> execution path after the return.
> 
> Since it will effectively just exit anyhow I think it will be simpler just to do that
> explicitly right here instead of the 'return'. This will also make the code
> consistent with the same 'disableonerr' logic of the start_start_sync.
> 
> SUGGESTION
> if (MySubscription->disableonerr)
> {
> DisableSubscriptionOnError();
> proc_exit(0);
> }
Fixed.

> ~~~
> 
> 5. src/bin/pg_dump/pg_dump.c
> 
> @@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
>   if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
>   appendPQExpBufferStr(query, ", two_phase = on");
> 
> + if (strcmp(subinfo->subdisableonerr, "f") != 0)
> + appendPQExpBufferStr(query, ", disable_on_error = true");
> +
> 
> Although the code is correct, I think it would be more natural to set this option
> as true when the user wants it true. e.g. check for "t"
> same as 'subbinary' is doing. This way, even if there was some
> unknown/corrupted value the code would do nothing, which is the behaviour
> you want...
> 
> SUGGESTION
> if (strcmp(subinfo->subdisableonerr, "t") == 0)
Fixed.


> ~~~
> 
> 6. src/include/catalog/pg_subscription.h
> 
> @@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
> 
>   char subtwophasestate; /* Stream two-phase transactions */
> 
> + bool subdisableonerr; /* True if occurrence of apply errors
> + * should disable the subscription */
> 
> The comment seems not quite right because it's not just about apply errors. E.g.
> I think any error in tablesync will cause disablement too.
> 
> BEFORE
> True if occurrence of apply errors should disable the subscription SUGGESTED
> True if a worker error should cause the subscription to be disabled
Fixed.


> ~~~
> 
> 7. src/test/regress/sql/subscription.sql - whitespace
> 
> +-- now it works
> +CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false,
> disable_on_error = false);
> +
> +\dRs+
> +
> +ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
> +
> +\dRs+
> +ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE); DROP
> +SUBSCRIPTION regress_testsub;
> +
> 
> I think should be a blank line after that last \dRs+ just like the other one,
> because it belongs logically with the code above it, not with the ALTER
> slot_name.
Fixed.


> ~~~
> 
> 8. src/test/subscription/t/028_disable_on_error.pl - filename
> 
> The 028 number needs to be bumped because there is already a TAP test
> called 028 now
This is already done in v22, so I've skipped this.

> ~~~
> 
> 9. src/test/subscription/t/028_disable_on_error.pl - missing test
> 
> There was no test case for the last combination where the user correct the
> apply worker problem: E.g. After a previous error/disable of the subscriber,
> remove the index, publish the inserts again, and check they get applied
> properly.
Fixed.

Attached the updated version v24.


Best Regards,
    Takamichi Osumi


Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Next
From: Pavel Stehule
Date:
Subject: Re: Separate the result of \watch for each query execution (psql)