Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | 6416172a-8c7d-84da-2600-f5f41c61db18@amazon.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add Information during standby recovery conflicts
|
List | pgsql-hackers |
Hi, On 10/30/20 4:25 AM, Fujii Masao wrote: > CAUTION: This email originated from outside of the organization. Do > not click links or open attachments unless you can confirm the sender > and know the content is safe. > > > > On 2020/10/30 10:29, Masahiko Sawada wrote: >> , >> >> On Thu, 29 Oct 2020 at 00:16, Fujii Masao >> <masao.fujii@oss.nttdata.com> wrote: >>> >>> >>> >>> On 2020/10/27 9:41, Masahiko Sawada wrote: >>>> On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand >>>> <bdrouvot@amazon.com> wrote: >>>>> >>>>> Hi, >>>>> >>>>> On 10/15/20 9:15 AM, Masahiko Sawada wrote: >>>>>> CAUTION: This email originated from outside of the organization. >>>>>> Do not click links or open attachments unless you can confirm the >>>>>> sender and know the content is safe. >>>>>> >>>>>> >>>>>> >>>>>> On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi >>>>>> <horikyota.ntt@gmail.com> wrote: >>>>>>> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada >>>>>>> <masahiko.sawada@2ndquadrant.com> wrote in >>>>>>>>> ereport(..(errmsg("%s", _("hogehoge")))) results in >>>>>>>>> fprintf((translated("%s")), translate("hogehoge")). >>>>>>>>> >>>>>>>>> So your change (errmsg("%s", gettext_noop("hogehoge")) results in >>>>>>>>> >>>>>>>>> fprintf((translated("%s")), DONT_translate("hogehoge")). >>>>>>>>> >>>>>>>>> which leads to a translation problem. >>>>>>>>> >>>>>>>>> (errmsg(gettext_noop("hogehoge")) >>>>>>>> This seems equivalent to (errmsg("hogehoge")), right? >>>>>>> Yes and no. However eventually the two works the same way, >>>>>>> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of >>>>>>> >>>>>>> 1: char *msg = gettext_noop("hogehoge"); >>>>>>> ... >>>>>>> 2: .. (errmsg(msg)); >>>>>>> >>>>>>> That is, the line 1 only registers a message id "hogehoge" and >>>>>>> doesn't >>>>>>> translate. The line 2 tries to translate the content of msg and it >>>>>>> finds the translation for the message id "hogehoge". >>>>>> Understood. >>>>>> >>>>>>>> I think I could understand translation stuff. Given we only >>>>>>>> report the >>>>>>>> const string returned from get_recovery_conflict_desc() without >>>>>>>> placeholders, the patch needs to use errmsg_internal() instead >>>>>>>> while >>>>>>>> not changing _() part. (errmsg(get_recovery_conflict_desc())) >>>>>>>> is not >>>>>>>> good (warned by -Wformat-security). >>>>>>> Ah, right. we get a complain if no value parameters added. We can >>>>>>> silence it by adding a dummy parameter to errmsg, but I'm not sure >>>>>>> which is preferable. >>>>>> Okay, I'm going to use errmsg_internal() for now until a better >>>>>> idea comes. >>>>>> >>>>>> I've attached the updated patch that fixed the translation part. >>>>> >>>>> Thanks for reviewing and helping on this patch! >>>>> >>>>> The patch tester bot is currently failing due to: >>>>> >>>>> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used >>>>> uninitialized in >>>>> this function [-Werror=maybe-uninitialized]" >>>>> >>>>> I've attached a new version with the minor change to fix it. >>>>> >>>> >>>> Thank you for updating the patch! >>>> >>>> I've looked at the patch and revised a bit the formatting etc. >>>> >>>> After some thoughts, I think it might be better to report the waiting >>>> time as well. it would help users and there is no particular reason >>>> for logging the report only once. It also helps make the patch clean >>>> by reducing the variables such as recovery_conflict_logged. I’ve >>>> implemented it in the v8 patch. >>> >>> I read v8 patch. Here are review comments. >> >> Thank you for your review. >> >>> >>> When recovery conflict with buffer pin happens, log message is output >>> every deadlock_timeout. Is this intentional behavior? If yes, IMO >>> that's >>> not good because lots of messages can be output. >> >> Agreed. >> >> if we were to log the recovery conflict only once in bufferpin >> conflict case, we would log it multiple times only in lock conflict >> case. So I guess it's better to do that in all conflict cases. > > Yes, I agree that this behavior basically should be consistent between > all cases. > >> >>> >>> + if (log_recovery_conflict_waits) >>> + waitStart = GetCurrentTimestamp(); >>> >>> What happens if log_recovery_conflict_waits is off at the beginning and >>> then it's changed during waiting for the conflict? In this case, >>> waitStart is >>> zero, but log_recovery_conflict_waits is on. This may cause some >>> problems? >> >> Hmm, I didn't see a path that happens to reload the config file during >> waiting for buffer cleanup lock. Even if the startup process receives >> SIGHUP during that, it calls HandleStartupProcInterrupts() at the next >> convenient time. It could be the beginning of main apply loop or >> inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in >> the wait loop for taking a buffer cleanup. > > Yes, you're right. I seem to have read the code wrongly. > >> However, I think it’s >> better to use (waitStart > 0) for safety when checking if we log the >> recovery conflict instead of log_recovery_conflict_waits. >> >>> >>> + if (report_waiting) >>> + ts = GetCurrentTimestamp(); >>> >>> GetCurrentTimestamp() doesn't need to be called every cycle >>> in the loop after "logged" is true and "new_status" is not NULL. >> >> Agreed >> >>> >>> +extern const char *get_procsignal_reason_desc(ProcSignalReason >>> reason); >>> >>> Is this garbage? >> >> Yes. >> >>> >>> When log_lock_waits is enabled, both "still waiting for ..." and >>> "acquired ..." >>> messages are output. Like this, when log_recovery_conflict_waits is >>> enabled, >>> not only "still waiting ..." but also "resolved ..." message should >>> be output? >>> The latter message might not need to be output if the conflict is >>> canceled >>> due to max_standby_xxx_delay parameter. The latter message would be >>> useful to know how long the recovery was waiting for the conflict. >>> Thought? >>> It's ok to implement this as a separate patch later, though. >> >> There was a discussion that the latter message without waiting time is >> not necessarily needed because the canceled process will log >> "canceling statement due to conflict with XXX" as you mentioned. I >> agreed with that. But I agree that the latter message with waiting >> time would help users, for instance when the startup process is >> waiting for multiple processes and it takes a time to cancel all of >> them. > > I agree that it's useful to output the wait time. > > But as I told in previous email, it's ok to focus on the current patch > for now and then implement this as a separate patch later. > Thanks for your work and thoughts on this patch! I've attached a new version that take your remarks (hope i did not miss some of them) into account (but still leave the last one for a separate patch later). Bertrand
Attachment
pgsql-hackers by date: