Re: Optionally automatically disable logical replication subscriptions on error - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Optionally automatically disable logical replication subscriptions on error |
Date | |
Msg-id | CAHut+Pt+Mm=312rZ+YZkMdC0xZqUGfdLzt=V9burF6coW_vRkg@mail.gmail.com Whole thread Raw |
In response to | RE: Optionally automatically disable logical replication subscriptions on error ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>) |
Responses |
RE: Optionally automatically disable logical replication subscriptions on error
|
List | pgsql-hackers |
On Mon, Feb 21, 2022 at 11:44 PM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > On Monday, February 21, 2022 2:56 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Thanks for addressing my previous comments. Now I have looked at v19. > > > > On Mon, Feb 21, 2022 at 11:25 AM osumi.takamichi@fujitsu.com > > <osumi.takamichi@fujitsu.com> wrote: > > > > > > On Friday, February 18, 2022 3:27 PM Peter Smith > > <smithpb2250@gmail.com> wrote: > > > > Hi. Below are my code review comments for v18. > > > Thank you for your review ! > > ... > > > > 5. src/backend/replication/logical/worker.c - > > > > DisableSubscriptionOnError > > > > > > > > + /* > > > > + * We would not be here unless this subscription's disableonerr > > > > + field was > > > > + * true when our worker began applying changes, but check whether > > > > + that > > > > + * field has changed in the interim. > > > > + */ > > > > > > > > Apparently, this function might just do nothing if it detects some > > > > situation where the flag was changed somehow, but I'm not 100% sure > > > > that the callers are properly catering for when nothing happens. > > > > > > > > IMO it would be better if this function would return true/false to > > > > mean "did disable subscription happen or not?" because that will > > > > give the calling code the chance to check the function return and do > > > > the right thing - e.g. if the caller first thought it should be disabled but then > > it turned out it did NOT disable... > > > I don't think we need to do something more. > > > After this function, table sync worker and the apply worker just exit. > > > IMO, we don't need to do additional work for already-disabled > > > subscription on the caller sides. > > > It should be sufficient to fulfill the purpose of > > > DisableSubscriptionOnError or confirm it has been fulfilled. > > > > Hmmm - Yeah, it may be the workers might just exit soon after anyhow as you > > say so everything comes out in the wash, but still, I felt for this case when > > DisableSubscriptionOnError turned out to do nothing it would be better to exit > > via the existing logic. And that is easy to do if the function returns true/false. > > > > For example, changes like below seemed neater code to me. YMMV. > > > > BEFORE (SyncTableStartWrapper): > > if (MySubscription->disableonerr) > > { > > DisableSubscriptionOnError(); > > proc_exit(0); > > } > > AFTER > > if (MySubscription->disableonerr && DisableSubscriptionOnError()) > > proc_exit(0); > > > > BEFORE (ApplyLoopWrapper) > > if (MySubscription->disableonerr) > > { > > /* Disable the subscription */ > > DisableSubscriptionOnError(); > > return; > > } > > AFTER > > if (MySubscription->disableonerr && DisableSubscriptionOnError()) return; > Okay, so this return value works for better readability. > Fixed. > > > > ~~~ > > > > Here are a couple more comments: > > > > 1. src/backend/replication/logical/worker.c - DisableSubscriptionOnError, > > Refactor error handling > > > > (this comment assumes the above gets changed too) > I think those are independent. OK. I was only curious if the change #5 above might cause the error to be logged 2x, if the DisableSubscriptionOnError returns false. - firstly, when it logs errors within the function - secondly, by normal error mechanism when the caller re-throws it. But, if you are sure that won't happen then it is good news. > > > > +static void > > +DisableSubscriptionOnError(void) > > +{ > > + Relation rel; > > + bool nulls[Natts_pg_subscription]; > > + bool replaces[Natts_pg_subscription]; > > + Datum values[Natts_pg_subscription]; > > + HeapTuple tup; > > + Form_pg_subscription subform; > > + > > + /* Emit the error */ > > + EmitErrorReport(); > > + /* Abort any active transaction */ > > + AbortOutOfAnyTransaction(); > > + /* Reset the ErrorContext */ > > + FlushErrorState(); > > + > > + /* Disable the subscription in a fresh transaction */ > > + StartTransactionCommand(); > > > > If this DisableSubscriptionOnError function decides later that actually the > > 'disableonerr' flag is false (i.e. it's NOT going to disable the subscription after > > all) then IMO it make more sense that the error logging for that case should just > > do whatever it is doing now by the normal error processing mechanism. > > > > In other words, I thought perhaps the code to EmitErrorReport/FlushError state > > etc be moved to be BELOW the if > > (!subform->subdisableonerr) bail-out code? > > > > Please see what you think in my attached POC [1]. It seems neater to me, and > > tests are all OK. Maybe I am mistaken... > I had a concern that this order change of codes would have a negative > impact when we have another new error during the call of DisableSubscriptionOnError. > > With the debugger, I raised an error in this function before emitting the original error. > As a result, the original error that makes the apply worker go into the path of > DisableSubscriptionOnError (in my test, duplication error) has vanished. > In this sense, v19 looks safer, and the current order to handle error recovery first > looks better to me. > > FYI, after the 2nd debugger error, > the next new apply worker created quickly met the same type of error, > went into the same path, and disabled the subscription with the log. > But, it won't be advisable to let the possibility left. OK - thanks for checking it. Will it be better to put some comments about that? Something like -- BEFORE /* Emit the error */ EmitErrorReport(); /* Abort any active transaction */ AbortOutOfAnyTransaction(); /* Reset the ErrorContext */ FlushErrorState(); /* Disable the subscription in a fresh transaction */ StartTransactionCommand(); AFTER /* Disable the subscription in a fresh transaction */ AbortOutOfAnyTransaction(); StartTransactionCommand(); /* * Log the error that caused DisableSubscriptionOnError to be called. * We do this immediately so that it won't be lost if some other internal * error occurs in the following code, */ EmitErrorReport(); FlushErrorState(); ... > > > ------ > > [1] peter-v19-poc.diff - POC just to try some of my suggestions above to make > > sure all tests still pass ok. > Thanks ! I included you as co-author, because > you shared meaningful patches for me. > Thanks! ------ Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: