Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id 20201216.112235.1367853844658985673.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add Information during standby recovery conflicts
List pgsql-hackers
At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2020/12/15 12:04, Kyotaro Horiguchi wrote:
> > [40509:startup] DETAIL:  Conflicting processes: 41171, 41194.
...
> > I'm not sure, but it seems to me at least the period is unnecessary
> > here.
> 
> Since Error Message Style Guide in the docs says "Detail and hint
> messages:
> Use complete sentences, and end each with a period.", I think that a
> period
> is necessary here. No?

In the first place it is not a complete sentence. Might be better be
something like this if we strictly follow the style guide?

> Conflicting processes are 41171, 41194.
> Conflicting processes are: 41171, 41194.

> 
> > +            bool        maybe_log_conflict =
> > +            (standbyWaitStart != 0 && !logged_recovery_conflict);
> > odd indentation.
> 
> This is the result of pgindent run. I'm not sure why pgindent indents
> that way, but for now I'd like to follow pgindent.

Agreed.

> > +        /* Also, set the timer if necessary */
> > +        if (logging_timer)
> > +        {
> > +            timeouts[cnt].id = STANDBY_LOCK_TIMEOUT;
> > +            timeouts[cnt].type = TMPARAM_AFTER;
> > +            timeouts[cnt].delay_ms = DeadlockTimeout;
> > +            cnt++;
> > +        }
> > This doesn't consider spurious wakeup. I'm not sure it actually
> > happenes but we usually consider that.  That is since before this
> > patch, but ProcWaitForSignal()'s comment says that:
> > 
> >>   * As this uses the generic process latch the caller has to be robust
> >>   * against
> >>   * unrelated wakeups: Always check that the desired state has occurred,
> >>   * and
> >>   * wait again if not.
> > If we don't care of spurious wakeups, we should add such a comment.
> 
> If ProcWaitForSignal() wakes up because of the reason (e.g., SIGHUP)
> other than deadlock_timeout, ProcSleep() will call
> ResolveRecoveryConflictWithLock() and we sleep on ProcWaitForSignal()
> again since the recovery conflict has not been resolved yet. So we can
> say that we consider "spurious wakeups"?

So, the following seems to be spurious wakeups..

> However when I read the related code again, I found another issue in
> the patch. In ResolveRecoveryConflictWithLock(), if SIGHUP causes us
> to
> wake up out of ProcWaitForSignal() before deadlock_timeout is reached,
> we will use deadlock_timeout again when sleeping on
> ProcWaitForSignal().
> Instead, probably we should use the "deadlock_timeout - elapsed time"
> so that we can emit a log message as soon as deadlock_timeout passes
> since starting waiting on recovery conflict. Otherwise it may take at
> most
> "deadlock_timeout * 2" to log "still waiting ..." message.
> 
> To fix this issue, "deadlock_timeout - elapsed time" needs to be used
> as
> the timeout when enabling the timer at least in
> ResolveRecoveryConflictWithLock() and
> ResolveRecoveryConflictWithBufferPin().
> Also similar change needs to be applied to
> ResolveRecoveryConflictWithVirtualXIDs().
> 
> BTW, without applying the patch, *originally*
> ResolveRecoveryConflictWithBufferPin() seems to have this issue.
> It enables deadlock_timeout timer so as to request for hot-standbfy
> backends to check themselves for deadlocks. But if we wake up out of
> ProcWaitForSignal() before deadlock_timeout is reached, the subsequent
> call to ResolveRecoveryConflictWithBufferPin() also uses
> deadlock_timeout
> again instead of "deadlock_timeout - elapsed time". So the request for
> deadlock check can be delayed. Furthermore,
> if ResolveRecoveryConflictWithBufferPin() always wakes up out of
> ProcWaitForSignal() before deadlock_timeout is reached, the request
> for deadlock check may also never happen infinitely.
> 
> Maybe we should fix the original issue at first separately from the
> patch.

Yeah, it's not an issue of this patch.

> > +                bool        maybe_log_conflict;
> > +                bool        maybe_update_title;
> > Although it should be a matter of taste and I understand that the
> > "maybe" means that "that logging and changing of ps display may not
> > happen in this iteration" , that variables seem expressing
> > respectively "we should write log if the timeout for recovery conflict
> > expires" and "we should update title if 500ms elapsed". So they seem
> > *to me* better be just "log_conflict" and "update_title".
> > I feel the same with  "maybe_log_conflict" in ProcSleep().
> 
> I have no strong opinion about those names. So if other people also
> think so, I'm ok to rename them.

Shorter is better as far as it makes sense and not-too abbreviated.

> > + for recovery conflicts.  This is useful in determining if recovery
> > +        conflicts prevent the recovery from applying WAL.
> > (I'm not confident on this) Isn't the sentence better be in past or
> > present continuous tense?
> 
> Could you tell me why you think that's better?

To make sure, I mentioned about the "prevent". The reason for that is
it represents the status at the present or past.  I don't insist on
that if you don't think it's not better.

> for recovery conflicts.  This is useful in determining if recovery
>        conflicts are preventing the recovery from applying WAL.

> >> BTW, attached is the POC patch that implements the idea discussed
> >> upthread;
> >> if log_recovery_conflict_waits is enabled, the startup process reports
> >> the log also after the recovery conflict was resolved and the startup
> >> process
> >> finished waiting for it. This patch needs to be applied after
> >> v11-0002-Log-the-standby-recovery-conflict-waits.patch is applied.
> > Ah. I was just writing a comment about that. I haven't looked it
> > closer but it looks good to me.  By the way doesn't it contains a
> > simple fix of a comment for the base patch?
> 
> Yes, so the typo included in the base patch should be fixed when
> pushing it.

Understood. Thanks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: archive status ".ready" files may be created too early
Next
From: Amit Kapila
Date:
Subject: Re: Misleading comment in prologue of ReorderBufferQueueMessage