Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id CAD21AoANgC3xiAbgw4emfORBoFR8U1uptkYshi8k8SbyzTK0NQ@mail.gmail.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Responses Re: Add Information during standby recovery conflicts  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
List pgsql-hackers
On Thu, Nov 26, 2020 at 12:49 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi,
>
> On 11/25/20 2:20 PM, Masahiko Sawada wrote:
> > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe. 
> >
> >
> >
> > On Fri, Nov 20, 2020 at 6:18 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
> >> Hi,
> >>
> >> On 11/17/20 4:44 PM, Fujii Masao wrote:
> >>> Thanks for updating the patch! Here are review comments.
> >>>
> >>> +        Controls whether a log message is produced when the startup
> >>> process
> >>> +        is waiting longer than <varname>deadlock_timeout</varname>
> >>> +        for recovery conflicts.
> >>>
> >>> But a log message can be produced also when the backend is waiting
> >>> for recovery conflict. Right? If yes, this description needs to be
> >>> corrected.
> >> Thanks for looking at it!
> >>
> >> I don't think so, only the startup process should write those new log
> >> messages.
> >>
> >> What makes you think that would not be the case?
> >>
> >>>
> >>> +        for recovery conflicts.  This is useful in determining if
> >>> recovery
> >>> +        conflicts prevents the recovery from applying WAL.
> >>>
> >>> "prevents" should be "prevent"?
> >> Indeed: fixed in the new attached patch.
> >>
> >>>
> >>> +       TimestampDifference(waitStart, GetCurrentTimestamp(), &secs,
> >>> &usecs);
> >>> +       msecs = secs * 1000 + usecs / 1000;
> >>>
> >>> GetCurrentTimestamp() is basically called before LogRecoveryConflict()
> >>> is called. So isn't it better to avoid calling GetCurrentTimestamp()
> >>> newly in
> >>> LogRecoveryConflict() and to reuse the timestamp that we got?
> >>> It's helpful to avoid the waste of cycles.
> >>>
> >> good catch! fixed in the new attached patch.
> >>
> >>> +               while (VirtualTransactionIdIsValid(*vxids))
> >>> +               {
> >>> +                       PGPROC *proc =
> >>> BackendIdGetProc(vxids->backendId);
> >>>
> >>> BackendIdGetProc() can return NULL if the backend is not active
> >>> at that moment. This case should be handled.
> >>>
> >> handled in the new attached patch.
> >>> +               case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
> >>> +                       reasonDesc = gettext_noop("recovery is still
> >>> waiting recovery conflict on buffer pin");
> >>>
> >>> It's natural to use "waiting for recovery" rather than "waiting
> >>> recovery"?
> >>>
> >> I would be tempted to say so, the new patch makes use of "waiting for".
> >>> +               /* Also, set deadlock timeout for logging purpose if
> >>> necessary */
> >>> +               if (log_recovery_conflict_waits)
> >>> +               {
> >>> +                       timeouts[cnt].id = STANDBY_TIMEOUT;
> >>> +                       timeouts[cnt].type = TMPARAM_AFTER;
> >>> +                       timeouts[cnt].delay_ms = DeadlockTimeout;
> >>> +                       cnt++;
> >>> +               }
> >>>
> >>> This needs to be executed only when the message has not been logged yet.
> >>> Right?
> >>>
> >> good catch: fixed in the new attached patch.
> >>
> > Thank you for updating the patch! Here are review comments on the
> > latest version patch.
> >
> > +       while (VirtualTransactionIdIsValid(*vxids))
> > +       {
> > +           PGPROC *proc = BackendIdGetProc(vxids->backendId);
> > +
> > +           if (proc)
> > +           {
> > +               if (nprocs == 0)
> > +                   appendStringInfo(&buf, "%d", proc->pid);
> > +               else
> > +                   appendStringInfo(&buf, ", %d", proc->pid);
> > +
> > +               nprocs++;
> > +               vxids++;
> > +           }
> > +       }
> >
> > We need to increment vxids even if *proc is null. Otherwise, the loop won't end.
>
> My bad, that's fixed.
>
> >
> > ---
> > +               TimestampTz cur_ts = GetCurrentTimestamp();;
> Fixed
> >
> > There is an extra semi-colon.
> >
> > ---
> >   int            max_standby_streaming_delay = 30 * 1000;
> > +bool       log_recovery_conflict_waits = false;
> > +bool       logged_lock_conflict = false;
> >
> >
> > +       if (log_recovery_conflict_waits && !logged_lock_conflict)
> > +       {
> > +           timeouts[cnt].id = STANDBY_TIMEOUT;
> > +           timeouts[cnt].type = TMPARAM_AFTER;
> > +           timeouts[cnt].delay_ms = DeadlockTimeout;
> > +           cnt++;
> > +       }
> >
> > Can we pass a bool indicating if a timeout may be needed for recovery
> > conflict logging from ProcSleep() to ResolveRecoveryConflictWithLock()
> > instead of using a static variable?
>
> Yeah that makes more sense, specially as we already have
> logged_recovery_conflict at our disposal.
>
> New patch version attached.
>

Thank you for updating the patch! The patch works fine and looks good
to me except for the following small comments:

+/*
+ * Log the recovery conflict.
+ *
+ * waitStart is the timestamp when the caller started to wait. This
function also
+ * reports the details about the conflicting process ids if *waitlist
is not NULL.
+ */
+void
+LogRecoveryConflict(ProcSignalReason reason, TimestampTz waitStart,
+                                       TimestampTz cur_ts,
VirtualTransactionId *waitlist)

I think it's better to explain cur_ts as well in the function comment.

Regarding the function arguments, 'waitStart' is camel case whereas
'cur_ts' is snake case and 'waitlist' is using only lower cases. I
think we should unify them.

---
-ResolveRecoveryConflictWithLock(LOCKTAG locktag)
+ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logged_recovery_conflict)

The function argument name 'logged_recovery_conflict' sounds a bit
redundant to me as this function is used only for recovery conflict
resolution. How about 'need_log' or something? Also it’s better to
explain it in the function comment.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: POC: postgres_fdw insert batching
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: POC: postgres_fdw insert batching