RE: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Optionally automatically disable logical replication subscriptions on error |
Date | |
Msg-id | TYCPR01MB83734B2F88F1FD435A8EBD98ED039@TYCPR01MB8373.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Optionally automatically disable logical replication subscriptions on error (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Optionally automatically disable logical replication subscriptions on error
RE: Optionally automatically disable logical replication subscriptions on error |
List | pgsql-hackers |
On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > After more thoughts, should we do both AbortOutOfAnyTransaction() and error > message handling while holding interrupts? That is, > > HOLD_INTERRUPTS(); > EmitErrorReport(); > FlushErrorState(); > AbortOutOfAny Transaction(); > RESUME_INTERRUPTS(); > pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work > er()); > > I think it's better that we do clean up first and then do other works such as > sending the message to the stats collector and updating the catalog. I agree. Fixed. Along with this change, I corrected the header comment of DisableSubscriptionOnError, too. > Here are some comments on v24 patch: > > + /* Look up our subscription in the catalogs */ > + tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId, > + > CStringGetDatum(MySubscription->name)); > > s/catalogs/catalog/ > > Why don't we use SUBSCRIPTIONOID with MySubscription->oid? Changed. > --- > + if (!HeapTupleIsValid(tup)) > + ereport(ERROR, > + errcode(ERRCODE_UNDEFINED_OBJECT), > + errmsg("subscription \"%s\" does not > exist", > + MySubscription->name)); > > I think we should use elog() here rather than ereport() since it's a > should-not-happen error. Fixed > --- > + /* Notify the subscription will be no longer valid */ > > I'd suggest rephrasing it to like "Notify the subscription will be disabled". (the > subscription is still valid actually, but just disabled). Fixed. Also, I've made this sentence past one, because of the code place change below. > --- > + /* Notify the subscription will be no longer valid */ > + ereport(LOG, > + errmsg("logical replication subscription > \"%s\" will be disabled due to an error", > + MySubscription->name)); > + > > I think we can report the log at the end of this function rather than during the > transaction. Fixed. In this case, I needed to adjust the comment to indicate the processing to disable the sub has *completed* as well. > --- > +my $cmd = qq( > +CREATE TABLE tbl (i INT); > +ALTER TABLE tbl REPLICA IDENTITY FULL; > +CREATE INDEX tbl_idx ON tbl(i)); > > I think we don't need to set REPLICA IDENTITY FULL to this table since there is > notupdate/delete. > > Do we need tbl_idx? Removed both the replica identity and tbl_idx; > --- > +$cmd = qq( > +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE > +sr.srsubstate IN ('s', 'r')); > +$node_subscriber->poll_query_until('postgres', $cmd); > > It seems better to add a condition of srrelid just in case. Makes sense. Fixed. > --- > +$cmd = qq( > +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s WHERE > s.subname = > +'sub' AND s.subenabled IS FALSE); > +$node_subscriber->poll_query_until('postgres', $cmd) > + or die "Timed out while waiting for subscriber to be disabled"; > > I think that it's more natural to directly check the subscription's subenabled. > For example: > > SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub'; Fixed. I modified another code similar to this for tablesync error as well. > --- > +$cmd = q(ALTER SUBSCRIPTION sub ENABLE); > +$node_subscriber->safe_psql('postgres', $cmd); $cmd = q(SELECT > COUNT(1) > += 3 FROM tbl WHERE i = 3); > +$node_subscriber->poll_query_until('postgres', $cmd) > + or die "Timed out while waiting for applying"; > > I think it's better to wait for the subscriber to catch up and check the query > result instead of using poll_query_until() so that we can check the query result > in case where the test fails. I modified the code to wait for the subscriber and deleted poll_query_until. Also, when I consider the test failure for this test as you mentioned, expecting and checking the number of return value that equals 3 would be better. So, I expressed this point in my test as well, according to your advice. > --- > +$cmd = qq(DROP INDEX tbl_unique); > +$node_subscriber->safe_psql('postgres', $cmd); > > In the newly added tap tests, all queries are consistently assigned to $cmd and > executed even when the query is used only once. It seems a different style from > the one in other tap tests. Is there any reason why we do this style for all queries > in the tap tests? Fixed. I removed the 'cmd' variable itself. Attached an updated patch v26. Best Regards, Takamichi Osumi
Attachment
pgsql-hackers by date: