Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id CAD21AoC2K8=Du2hLhWGXwq31CH6K7pWxDYAiz4Tmzs5yyfMohA@mail.gmail.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Responses Re: Add Information during standby recovery conflicts  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
List pgsql-hackers
On Fri, Oct 30, 2020 at 6:02 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi,
>
> On 10/30/20 4:25 AM, Fujii Masao wrote:
> > CAUTION: This email originated from outside of the organization. Do
> > not click links or open attachments unless you can confirm the sender
> > and know the content is safe.
> >
> >
> >
> > On 2020/10/30 10:29, Masahiko Sawada wrote:
> >> ,
> >>
> >> On Thu, 29 Oct 2020 at 00:16, Fujii Masao
> >> <masao.fujii@oss.nttdata.com> wrote:
> >>>
> >>> I read v8 patch. Here are review comments.
> >>
> >> Thank you for your review.
> >>

Thank you for updating the patch!

Looking at the latest version patch
(v8-0002-Log-the-standby-recovery-conflict-waits.patch), I think it
doesn't address some comments from Fujii-san.

> >>>
> >>> When recovery conflict with buffer pin happens, log message is output
> >>> every deadlock_timeout. Is this intentional behavior? If yes, IMO
> >>> that's
> >>> not good because lots of messages can be output.
> >>
> >> Agreed.

I think the latest patch doesn't fix the above comment. Log message
for recovery conflict on buffer pin is logged every deadlock_timeout.

> >>
> >> if we were to log the recovery conflict only once in bufferpin
> >> conflict case, we would log it multiple times only in lock conflict
> >> case. So I guess it's better to do that in all conflict cases.
> >
> > Yes, I agree that this behavior basically should be consistent between
> > all cases.

The latest patch seems not to address this comment as well.

> >
> >>
> >>>
> >>> +       if (log_recovery_conflict_waits)
> >>> +               waitStart = GetCurrentTimestamp();
> >>>
> >>> What happens if log_recovery_conflict_waits is off at the beginning and
> >>> then it's changed during waiting for the conflict? In this case,
> >>> waitStart is
> >>> zero, but log_recovery_conflict_waits is on. This may cause some
> >>> problems?
> >>
> >> Hmm, I didn't see a path that happens to reload the config file during
> >> waiting for buffer cleanup lock. Even if the startup process receives
> >> SIGHUP during that, it calls HandleStartupProcInterrupts() at the next
> >> convenient time. It could be the beginning of main apply loop or
> >> inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in
> >> the wait loop for taking a buffer cleanup.
> >
> > Yes, you're right. I seem to have read the code wrongly.
> >
> >> However, I think it’s
> >> better to use (waitStart > 0) for safety when checking if we log the
> >> recovery conflict instead of log_recovery_conflict_waits.
> >>
> >>>
> >>> +                       if (report_waiting)
> >>> +                               ts = GetCurrentTimestamp();
> >>>
> >>> GetCurrentTimestamp() doesn't need to be called every cycle
> >>> in the loop after "logged" is true and "new_status" is not NULL.
> >>
> >> Agreed
> >>
> >>>
> >>> +extern const char *get_procsignal_reason_desc(ProcSignalReason
> >>> reason);
> >>>
> >>> Is this garbage?
> >>
> >> Yes.
> >>
> >>>
> >>> When log_lock_waits is enabled, both "still waiting for ..." and
> >>> "acquired ..."
> >>> messages are output. Like this, when log_recovery_conflict_waits is
> >>> enabled,
> >>> not only "still waiting ..." but also "resolved ..." message should
> >>> be output?
> >>> The latter message might not need to be output if the conflict is
> >>> canceled
> >>> due to max_standby_xxx_delay parameter. The latter message would be
> >>> useful to know how long the recovery was waiting for the conflict.
> >>> Thought?
> >>> It's ok to implement this as a separate patch later, though.
> >>
> >> There was a discussion that the latter message without waiting time is
> >> not necessarily needed because the canceled process will log
> >> "canceling statement due to conflict with XXX" as you mentioned. I
> >> agreed with that. But I agree that the latter message with waiting
> >> time would help users, for instance when the startup process is
> >> waiting for multiple processes and it takes a time to cancel all of
> >> them.
> >
> > I agree that it's useful to output the wait time.
> >
> > But as I told in previous email, it's ok to focus on the current patch
> > for now and then implement this as a separate patch later.

Agreed.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: "kuroda.hayato@fujitsu.com"
Date:
Subject: RE: pgbench: option delaying queries till connections establishment?
Next
From: "meission"
Date:
Subject: subscribing