Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | c141f3fc-2cff-a8f5-08b5-0aafa173c949@oss.nttdata.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: Add Information during standby recovery conflicts
|
List | pgsql-hackers |
On 2020/10/30 10:29, Masahiko Sawada wrote: > , > > On Thu, 29 Oct 2020 at 00:16, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2020/10/27 9:41, Masahiko Sawada wrote: >>> On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand <bdrouvot@amazon.com> wrote: >>>> >>>> Hi, >>>> >>>> On 10/15/20 9:15 AM, Masahiko Sawada wrote: >>>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless youcan confirm the sender and know the content is safe. >>>>> >>>>> >>>>> >>>>> On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: >>>>>> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in >>>>>>>> ereport(..(errmsg("%s", _("hogehoge")))) results in >>>>>>>> fprintf((translated("%s")), translate("hogehoge")). >>>>>>>> >>>>>>>> So your change (errmsg("%s", gettext_noop("hogehoge")) results in >>>>>>>> >>>>>>>> fprintf((translated("%s")), DONT_translate("hogehoge")). >>>>>>>> >>>>>>>> which leads to a translation problem. >>>>>>>> >>>>>>>> (errmsg(gettext_noop("hogehoge")) >>>>>>> This seems equivalent to (errmsg("hogehoge")), right? >>>>>> Yes and no. However eventually the two works the same way, >>>>>> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of >>>>>> >>>>>> 1: char *msg = gettext_noop("hogehoge"); >>>>>> ... >>>>>> 2: .. (errmsg(msg)); >>>>>> >>>>>> That is, the line 1 only registers a message id "hogehoge" and doesn't >>>>>> translate. The line 2 tries to translate the content of msg and it >>>>>> finds the translation for the message id "hogehoge". >>>>> Understood. >>>>> >>>>>>> I think I could understand translation stuff. Given we only report the >>>>>>> const string returned from get_recovery_conflict_desc() without >>>>>>> placeholders, the patch needs to use errmsg_internal() instead while >>>>>>> not changing _() part. (errmsg(get_recovery_conflict_desc())) is not >>>>>>> good (warned by -Wformat-security). >>>>>> Ah, right. we get a complain if no value parameters added. We can >>>>>> silence it by adding a dummy parameter to errmsg, but I'm not sure >>>>>> which is preferable. >>>>> Okay, I'm going to use errmsg_internal() for now until a better idea comes. >>>>> >>>>> I've attached the updated patch that fixed the translation part. >>>> >>>> Thanks for reviewing and helping on this patch! >>>> >>>> The patch tester bot is currently failing due to: >>>> >>>> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used uninitialized in >>>> this function [-Werror=maybe-uninitialized]" >>>> >>>> I've attached a new version with the minor change to fix it. >>>> >>> >>> Thank you for updating the patch! >>> >>> I've looked at the patch and revised a bit the formatting etc. >>> >>> After some thoughts, I think it might be better to report the waiting >>> time as well. it would help users and there is no particular reason >>> for logging the report only once. It also helps make the patch clean >>> by reducing the variables such as recovery_conflict_logged. I’ve >>> implemented it in the v8 patch. >> >> I read v8 patch. Here are review comments. > > Thank you for your review. > >> >> When recovery conflict with buffer pin happens, log message is output >> every deadlock_timeout. Is this intentional behavior? If yes, IMO that's >> not good because lots of messages can be output. > > Agreed. > > if we were to log the recovery conflict only once in bufferpin > conflict case, we would log it multiple times only in lock conflict > case. So I guess it's better to do that in all conflict cases. Yes, I agree that this behavior basically should be consistent between all cases. > >> >> + if (log_recovery_conflict_waits) >> + waitStart = GetCurrentTimestamp(); >> >> What happens if log_recovery_conflict_waits is off at the beginning and >> then it's changed during waiting for the conflict? In this case, waitStart is >> zero, but log_recovery_conflict_waits is on. This may cause some problems? > > Hmm, I didn't see a path that happens to reload the config file during > waiting for buffer cleanup lock. Even if the startup process receives > SIGHUP during that, it calls HandleStartupProcInterrupts() at the next > convenient time. It could be the beginning of main apply loop or > inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in > the wait loop for taking a buffer cleanup. Yes, you're right. I seem to have read the code wrongly. > However, I think it’s > better to use (waitStart > 0) for safety when checking if we log the > recovery conflict instead of log_recovery_conflict_waits. > >> >> + if (report_waiting) >> + ts = GetCurrentTimestamp(); >> >> GetCurrentTimestamp() doesn't need to be called every cycle >> in the loop after "logged" is true and "new_status" is not NULL. > > Agreed > >> >> +extern const char *get_procsignal_reason_desc(ProcSignalReason reason); >> >> Is this garbage? > > Yes. > >> >> When log_lock_waits is enabled, both "still waiting for ..." and "acquired ..." >> messages are output. Like this, when log_recovery_conflict_waits is enabled, >> not only "still waiting ..." but also "resolved ..." message should be output? >> The latter message might not need to be output if the conflict is canceled >> due to max_standby_xxx_delay parameter. The latter message would be >> useful to know how long the recovery was waiting for the conflict. Thought? >> It's ok to implement this as a separate patch later, though. > > There was a discussion that the latter message without waiting time is > not necessarily needed because the canceled process will log > "canceling statement due to conflict with XXX" as you mentioned. I > agreed with that. But I agree that the latter message with waiting > time would help users, for instance when the startup process is > waiting for multiple processes and it takes a time to cancel all of > them. I agree that it's useful to output the wait time. But as I told in previous email, it's ok to focus on the current patch for now and then implement this as a separate patch later. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
pgsql-hackers by date: