Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id CA+fd4k6xVsPLvV2YO9PKceCBqdxvxQJL7sPS1nmVA-iAk2n58A@mail.gmail.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add Information during standby recovery conflicts
List pgsql-hackers
,

On Thu, 29 Oct 2020 at 00:16, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/10/27 9:41, Masahiko Sawada wrote:
> > On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> >>
> >> Hi,
> >>
> >> On 10/15/20 9:15 AM, Masahiko Sawada wrote:
> >>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless
youcan confirm the sender and know the content is safe. 
> >>>
> >>>
> >>>
> >>> On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> >>>> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
> >>>>>> ereport(..(errmsg("%s", _("hogehoge")))) results in
> >>>>>> fprintf((translated("%s")), translate("hogehoge")).
> >>>>>>
> >>>>>> So your change (errmsg("%s", gettext_noop("hogehoge")) results in
> >>>>>>
> >>>>>> fprintf((translated("%s")), DONT_translate("hogehoge")).
> >>>>>>
> >>>>>> which leads to a translation problem.
> >>>>>>
> >>>>>> (errmsg(gettext_noop("hogehoge"))
> >>>>> This seems equivalent to (errmsg("hogehoge")), right?
> >>>> Yes and no.  However eventually the two works the same way,
> >>>> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of
> >>>>
> >>>> 1: char *msg = gettext_noop("hogehoge");
> >>>> ...
> >>>> 2: .. (errmsg(msg));
> >>>>
> >>>> That is, the line 1 only registers a message id "hogehoge" and doesn't
> >>>> translate. The line 2 tries to translate the content of msg and it
> >>>> finds the translation for the message id "hogehoge".
> >>> Understood.
> >>>
> >>>>> I think I could understand translation stuff. Given we only report the
> >>>>> const string returned from get_recovery_conflict_desc() without
> >>>>> placeholders, the patch needs to use errmsg_internal() instead while
> >>>>> not changing _() part. (errmsg(get_recovery_conflict_desc())) is not
> >>>>> good (warned by -Wformat-security).
> >>>> Ah, right. we get a complain if no value parameters added. We can
> >>>> silence it by adding a dummy parameter to errmsg, but I'm not sure
> >>>> which is preferable.
> >>> Okay, I'm going to use errmsg_internal() for now until a better idea comes.
> >>>
> >>> I've attached the updated patch that fixed the translation part.
> >>
> >> Thanks for reviewing and helping on this patch!
> >>
> >> The patch tester bot is currently failing due to:
> >>
> >> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used uninitialized in
> >> this function [-Werror=maybe-uninitialized]"
> >>
> >> I've attached a new version with the minor change to fix it.
> >>
> >
> > Thank you for updating the patch!
> >
> > I've looked at the patch and revised a bit the formatting etc.
> >
> > After some thoughts, I think it might be better to report the waiting
> > time as well. it would help users and there is no particular reason
> > for logging the report only once. It also helps make the patch clean
> > by reducing the variables such as recovery_conflict_logged. I’ve
> > implemented it in the v8 patch.
>
> I read v8 patch. Here are review comments.

Thank you for your review.

>
> When recovery conflict with buffer pin happens, log message is output
> every deadlock_timeout. Is this intentional behavior? If yes, IMO that's
> not good because lots of messages can be output.

Agreed.

if we were to log the recovery conflict only once in bufferpin
conflict case, we would log it multiple times only in lock conflict
case. So I guess it's better to do that in all conflict cases.

>
> +       if (log_recovery_conflict_waits)
> +               waitStart = GetCurrentTimestamp();
>
> What happens if log_recovery_conflict_waits is off at the beginning and
> then it's changed during waiting for the conflict? In this case, waitStart is
> zero, but log_recovery_conflict_waits is on. This may cause some problems?

Hmm, I didn't see a path that happens to reload the config file during
waiting for buffer cleanup lock. Even if the startup process receives
SIGHUP during that, it calls HandleStartupProcInterrupts() at the next
convenient time. It could be the beginning of main apply loop or
inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in
the wait loop for taking a buffer cleanup. However, I think it’s
better to use (waitStart > 0) for safety when checking if we log the
recovery conflict instead of log_recovery_conflict_waits.

>
> +                       if (report_waiting)
> +                               ts = GetCurrentTimestamp();
>
> GetCurrentTimestamp() doesn't need to be called every cycle
> in the loop after "logged" is true and "new_status" is not NULL.

Agreed

>
> +extern const char *get_procsignal_reason_desc(ProcSignalReason reason);
>
> Is this garbage?

Yes.

>
> When log_lock_waits is enabled, both "still waiting for ..." and "acquired ..."
> messages are output. Like this, when log_recovery_conflict_waits is enabled,
> not only "still waiting ..." but also "resolved ..." message should be output?
> The latter message might not need to be output if the conflict is canceled
> due to max_standby_xxx_delay parameter. The latter message would be
> useful to know how long the recovery was waiting for the conflict. Thought?
> It's ok to implement this as a separate patch later, though.

There was a discussion that the latter message without waiting time is
not necessarily needed because the canceled process will log
"canceling statement due to conflict with XXX" as you mentioned. I
agreed with that. But I agree that the latter message with waiting
time would help users, for instance when the startup process is
waiting for multiple processes and it takes a time to cancel all of
them.

>
> +        Controls whether a log message is produced when the startup process
> +        is waiting longer than longer than <varname>deadlock_timeout</varname>.
>
> Typo: "longer than longer than" should be "longer than".
>
> Probably the section about hot standby in high-availability.sgml would need
> to be updated.

Agreed.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Ashutosh Sharma
Date:
Subject: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8
Next
From: Masahiko Sawada
Date:
Subject: Re: Boundary value check in lazy_tid_reaped()