Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | CA+fd4k6xVsPLvV2YO9PKceCBqdxvxQJL7sPS1nmVA-iAk2n58A@mail.gmail.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add Information during standby recovery conflicts
|
List | pgsql-hackers |
, On Thu, 29 Oct 2020 at 00:16, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/10/27 9:41, Masahiko Sawada wrote: > > On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand <bdrouvot@amazon.com> wrote: > >> > >> Hi, > >> > >> On 10/15/20 9:15 AM, Masahiko Sawada wrote: > >>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless youcan confirm the sender and know the content is safe. > >>> > >>> > >>> > >>> On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > >>>> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > >>>>>> ereport(..(errmsg("%s", _("hogehoge")))) results in > >>>>>> fprintf((translated("%s")), translate("hogehoge")). > >>>>>> > >>>>>> So your change (errmsg("%s", gettext_noop("hogehoge")) results in > >>>>>> > >>>>>> fprintf((translated("%s")), DONT_translate("hogehoge")). > >>>>>> > >>>>>> which leads to a translation problem. > >>>>>> > >>>>>> (errmsg(gettext_noop("hogehoge")) > >>>>> This seems equivalent to (errmsg("hogehoge")), right? > >>>> Yes and no. However eventually the two works the same way, > >>>> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of > >>>> > >>>> 1: char *msg = gettext_noop("hogehoge"); > >>>> ... > >>>> 2: .. (errmsg(msg)); > >>>> > >>>> That is, the line 1 only registers a message id "hogehoge" and doesn't > >>>> translate. The line 2 tries to translate the content of msg and it > >>>> finds the translation for the message id "hogehoge". > >>> Understood. > >>> > >>>>> I think I could understand translation stuff. Given we only report the > >>>>> const string returned from get_recovery_conflict_desc() without > >>>>> placeholders, the patch needs to use errmsg_internal() instead while > >>>>> not changing _() part. (errmsg(get_recovery_conflict_desc())) is not > >>>>> good (warned by -Wformat-security). > >>>> Ah, right. we get a complain if no value parameters added. We can > >>>> silence it by adding a dummy parameter to errmsg, but I'm not sure > >>>> which is preferable. > >>> Okay, I'm going to use errmsg_internal() for now until a better idea comes. > >>> > >>> I've attached the updated patch that fixed the translation part. > >> > >> Thanks for reviewing and helping on this patch! > >> > >> The patch tester bot is currently failing due to: > >> > >> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used uninitialized in > >> this function [-Werror=maybe-uninitialized]" > >> > >> I've attached a new version with the minor change to fix it. > >> > > > > Thank you for updating the patch! > > > > I've looked at the patch and revised a bit the formatting etc. > > > > After some thoughts, I think it might be better to report the waiting > > time as well. it would help users and there is no particular reason > > for logging the report only once. It also helps make the patch clean > > by reducing the variables such as recovery_conflict_logged. I’ve > > implemented it in the v8 patch. > > I read v8 patch. Here are review comments. Thank you for your review. > > When recovery conflict with buffer pin happens, log message is output > every deadlock_timeout. Is this intentional behavior? If yes, IMO that's > not good because lots of messages can be output. Agreed. if we were to log the recovery conflict only once in bufferpin conflict case, we would log it multiple times only in lock conflict case. So I guess it's better to do that in all conflict cases. > > + if (log_recovery_conflict_waits) > + waitStart = GetCurrentTimestamp(); > > What happens if log_recovery_conflict_waits is off at the beginning and > then it's changed during waiting for the conflict? In this case, waitStart is > zero, but log_recovery_conflict_waits is on. This may cause some problems? Hmm, I didn't see a path that happens to reload the config file during waiting for buffer cleanup lock. Even if the startup process receives SIGHUP during that, it calls HandleStartupProcInterrupts() at the next convenient time. It could be the beginning of main apply loop or inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in the wait loop for taking a buffer cleanup. However, I think it’s better to use (waitStart > 0) for safety when checking if we log the recovery conflict instead of log_recovery_conflict_waits. > > + if (report_waiting) > + ts = GetCurrentTimestamp(); > > GetCurrentTimestamp() doesn't need to be called every cycle > in the loop after "logged" is true and "new_status" is not NULL. Agreed > > +extern const char *get_procsignal_reason_desc(ProcSignalReason reason); > > Is this garbage? Yes. > > When log_lock_waits is enabled, both "still waiting for ..." and "acquired ..." > messages are output. Like this, when log_recovery_conflict_waits is enabled, > not only "still waiting ..." but also "resolved ..." message should be output? > The latter message might not need to be output if the conflict is canceled > due to max_standby_xxx_delay parameter. The latter message would be > useful to know how long the recovery was waiting for the conflict. Thought? > It's ok to implement this as a separate patch later, though. There was a discussion that the latter message without waiting time is not necessarily needed because the canceled process will log "canceling statement due to conflict with XXX" as you mentioned. I agreed with that. But I agree that the latter message with waiting time would help users, for instance when the startup process is waiting for multiple processes and it takes a time to cancel all of them. > > + Controls whether a log message is produced when the startup process > + is waiting longer than longer than <varname>deadlock_timeout</varname>. > > Typo: "longer than longer than" should be "longer than". > > Probably the section about hot standby in high-availability.sgml would need > to be updated. Agreed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: