RE: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Optionally automatically disable logical replication subscriptions on error |
Date | |
Msg-id | TYCPR01MB83730138EF8B26278647E8D6ED029@TYCPR01MB8373.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Optionally automatically disable logical replication subscriptions on error (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Optionally automatically disable logical replication subscriptions on error
|
List | pgsql-hackers |
On Tuesday, March 1, 2022 9:49 AM Peter Smith <smithpb2250@gmail.com> wrote: > Please see below my review comments for v22. > > ====== > > 1. Commit message > > "table sync worker" -> "tablesync worker" Fixed. > ~~~ > > 2. doc/src/sgml/catalogs.sgml > > + <para> > + If true, the subscription will be disabled when subscription > + workers detect any errors > + </para></entry> > > It felt a bit strange to say "subscription" 2x in the sentence, but I am not sure > how to improve it. Maybe like below? > > BEFORE > If true, the subscription will be disabled when subscription workers detect any > errors > > SUGGESTED > If true, the subscription will be disabled if one of its workers detects an error Fixed. > ~~~ > > 3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError > > @@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr > last_received) } > > /* > + * Disable the current subscription, after error recovery processing. > + */ > +static void > +DisableSubscriptionOnError(void) > > I thought the "after error recovery processing" part was a bit generic and did not > really say what it was doing. > > BEFORE > Disable the current subscription, after error recovery processing. > SUGGESTED > Disable the current subscription, after logging the error that caused this > function to be called. Fixed. > ~~~ > > 4. src/backend/replication/logical/worker.c - start_apply > > + if (MySubscription->disableonerr) > + { > + DisableSubscriptionOnError(); > + return; > + } > + > + MemoryContextSwitchTo(ecxt); > + PG_RE_THROW(); > + } > + PG_END_TRY(); > > The current code looks correct, but I felt it is a bit tricky to easily see the > execution path after the return. > > Since it will effectively just exit anyhow I think it will be simpler just to do that > explicitly right here instead of the 'return'. This will also make the code > consistent with the same 'disableonerr' logic of the start_start_sync. > > SUGGESTION > if (MySubscription->disableonerr) > { > DisableSubscriptionOnError(); > proc_exit(0); > } Fixed. > ~~~ > > 5. src/bin/pg_dump/pg_dump.c > > @@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const > SubscriptionInfo *subinfo) > if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0) > appendPQExpBufferStr(query, ", two_phase = on"); > > + if (strcmp(subinfo->subdisableonerr, "f") != 0) > + appendPQExpBufferStr(query, ", disable_on_error = true"); > + > > Although the code is correct, I think it would be more natural to set this option > as true when the user wants it true. e.g. check for "t" > same as 'subbinary' is doing. This way, even if there was some > unknown/corrupted value the code would do nothing, which is the behaviour > you want... > > SUGGESTION > if (strcmp(subinfo->subdisableonerr, "t") == 0) Fixed. > ~~~ > > 6. src/include/catalog/pg_subscription.h > > @@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > > char subtwophasestate; /* Stream two-phase transactions */ > > + bool subdisableonerr; /* True if occurrence of apply errors > + * should disable the subscription */ > > The comment seems not quite right because it's not just about apply errors. E.g. > I think any error in tablesync will cause disablement too. > > BEFORE > True if occurrence of apply errors should disable the subscription SUGGESTED > True if a worker error should cause the subscription to be disabled Fixed. > ~~~ > > 7. src/test/regress/sql/subscription.sql - whitespace > > +-- now it works > +CREATE SUBSCRIPTION regress_testsub CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, > disable_on_error = false); > + > +\dRs+ > + > +ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true); > + > +\dRs+ > +ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE); DROP > +SUBSCRIPTION regress_testsub; > + > > I think should be a blank line after that last \dRs+ just like the other one, > because it belongs logically with the code above it, not with the ALTER > slot_name. Fixed. > ~~~ > > 8. src/test/subscription/t/028_disable_on_error.pl - filename > > The 028 number needs to be bumped because there is already a TAP test > called 028 now This is already done in v22, so I've skipped this. > ~~~ > > 9. src/test/subscription/t/028_disable_on_error.pl - missing test > > There was no test case for the last combination where the user correct the > apply worker problem: E.g. After a previous error/disable of the subscriber, > remove the index, publish the inserts again, and check they get applied > properly. Fixed. Attached the updated version v24. Best Regards, Takamichi Osumi
Attachment
pgsql-hackers by date:
Previous
From: Bharath RupireddyDate:
Subject: Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Next
From: Pavel StehuleDate:
Subject: Re: Separate the result of \watch for each query execution (psql)