Re: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Optionally automatically disable logical replication subscriptions on error |
Date | |
Msg-id | CAD21AoDPCnVpMB-nR_5S-ZsV-+c1VH2uVxJ99ZyO+H_tnO7KTg@mail.gmail.com Whole thread Raw |
In response to | Re: Optionally automatically disable logical replication subscriptions on error (Peter Smith <smithpb2250@gmail.com>) |
Responses |
RE: Optionally automatically disable logical replication subscriptions on error
|
List | pgsql-hackers |
On Wed, Mar 2, 2022 at 9:34 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Please see below my review comments for v24. > > ====== > > 1. src/backend/replication/logical/worker.c - start_table_sync > > + /* Report the worker failed during table synchronization */ > + pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker()); > > (This review comment is just FYI in case you did not do this deliberately) > > FYI, you didn't really need to call am_tablesync_worker() here because > it is already asserted for the sync phase that it MUST be a tablesync > worker. > > OTOH, IMO it documents the purpose of the parm so if it was deliberate > then that is OK too. > > ~~~ > > 2. src/backend/replication/logical/worker.c - start_table_sync > > + PG_CATCH(); > + { > + /* > + * Abort the current transaction so that we send the stats message in > + * an idle state. > + */ > + AbortOutOfAnyTransaction(); > + > + /* Report the worker failed during table synchronization */ > + pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker()); > + > > [Maybe you will say that this review comment is unrelated to > disable_on_err, but since this is a totally new/refactored function > then I think maybe there is no problem to make this change at the same > time. Anyway there is no function change, it is just rearranging some > comments.] > > I felt the separation of those 2 statements and comments makes that > code less clean than it could/should be. IMO they should be grouped > together. > > SUGGESTED > /* > * Report the worker failed during table synchronization. Abort the > * current transaction so that the stats message is sent in an idle > * state. > */ > AbortOutOfAnyTransaction(); > pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker()); After more thoughts, should we do both AbortOutOfAnyTransaction() and error message handling while holding interrupts? That is, HOLD_INTERRUPTS(); EmitErrorReport(); FlushErrorState(); AbortOutOfAny Transaction(); RESUME_INTERRUPTS(); pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker()); I think it's better that we do clean up first and then do other works such as sending the message to the stats collector and updating the catalog. Here are some comments on v24 patch: + /* Look up our subscription in the catalogs */ + tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId, + CStringGetDatum(MySubscription->name)); s/catalogs/catalog/ Why don't we use SUBSCRIPTIONOID with MySubscription->oid? --- + if (!HeapTupleIsValid(tup)) + ereport(ERROR, + errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("subscription \"%s\" does not exist", + MySubscription->name)); I think we should use elog() here rather than ereport() since it's a should-not-happen error. --- + /* Notify the subscription will be no longer valid */ I'd suggest rephrasing it to like "Notify the subscription will be disabled". (the subscription is still valid actually, but just disabled). --- + /* Notify the subscription will be no longer valid */ + ereport(LOG, + errmsg("logical replication subscription \"%s\" will be disabled due to an error", + MySubscription->name)); + I think we can report the log at the end of this function rather than during the transaction. --- +my $cmd = qq( +CREATE TABLE tbl (i INT); +ALTER TABLE tbl REPLICA IDENTITY FULL; +CREATE INDEX tbl_idx ON tbl(i)); I think we don't need to set REPLICA IDENTITY FULL to this table since there is notupdate/delete. Do we need tbl_idx? --- +$cmd = qq( +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr +WHERE sr.srsubstate IN ('s', 'r')); +$node_subscriber->poll_query_until('postgres', $cmd); It seems better to add a condition of srrelid just in case. --- +$cmd = qq( +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s +WHERE s.subname = 'sub' AND s.subenabled IS FALSE); +$node_subscriber->poll_query_until('postgres', $cmd) + or die "Timed out while waiting for subscriber to be disabled"; I think that it's more natural to directly check the subscription's subenabled. For example: SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub'; --- +$cmd = q(ALTER SUBSCRIPTION sub ENABLE); +$node_subscriber->safe_psql('postgres', $cmd); +$cmd = q(SELECT COUNT(1) = 3 FROM tbl WHERE i = 3); +$node_subscriber->poll_query_until('postgres', $cmd) + or die "Timed out while waiting for applying"; I think it's better to wait for the subscriber to catch up and check the query result instead of using poll_query_until() so that we can check the query result in case where the test fails. --- +$cmd = qq(DROP INDEX tbl_unique); +$node_subscriber->safe_psql('postgres', $cmd); In the newly added tap tests, all queries are consistently assigned to $cmd and executed even when the query is used only once. It seems a different style from the one in other tap tests. Is there any reason why we do this style for all queries in the tap tests? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: