Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | 20201216.112235.1367853844658985673.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add Information during standby recovery conflicts
|
List | pgsql-hackers |
At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. ... > > I'm not sure, but it seems to me at least the period is unnecessary > > here. > > Since Error Message Style Guide in the docs says "Detail and hint > messages: > Use complete sentences, and end each with a period.", I think that a > period > is necessary here. No? In the first place it is not a complete sentence. Might be better be something like this if we strictly follow the style guide? > Conflicting processes are 41171, 41194. > Conflicting processes are: 41171, 41194. > > > + bool maybe_log_conflict = > > + (standbyWaitStart != 0 && !logged_recovery_conflict); > > odd indentation. > > This is the result of pgindent run. I'm not sure why pgindent indents > that way, but for now I'd like to follow pgindent. Agreed. > > + /* Also, set the timer if necessary */ > > + if (logging_timer) > > + { > > + timeouts[cnt].id = STANDBY_LOCK_TIMEOUT; > > + timeouts[cnt].type = TMPARAM_AFTER; > > + timeouts[cnt].delay_ms = DeadlockTimeout; > > + cnt++; > > + } > > This doesn't consider spurious wakeup. I'm not sure it actually > > happenes but we usually consider that. That is since before this > > patch, but ProcWaitForSignal()'s comment says that: > > > >> * As this uses the generic process latch the caller has to be robust > >> * against > >> * unrelated wakeups: Always check that the desired state has occurred, > >> * and > >> * wait again if not. > > If we don't care of spurious wakeups, we should add such a comment. > > If ProcWaitForSignal() wakes up because of the reason (e.g., SIGHUP) > other than deadlock_timeout, ProcSleep() will call > ResolveRecoveryConflictWithLock() and we sleep on ProcWaitForSignal() > again since the recovery conflict has not been resolved yet. So we can > say that we consider "spurious wakeups"? So, the following seems to be spurious wakeups.. > However when I read the related code again, I found another issue in > the patch. In ResolveRecoveryConflictWithLock(), if SIGHUP causes us > to > wake up out of ProcWaitForSignal() before deadlock_timeout is reached, > we will use deadlock_timeout again when sleeping on > ProcWaitForSignal(). > Instead, probably we should use the "deadlock_timeout - elapsed time" > so that we can emit a log message as soon as deadlock_timeout passes > since starting waiting on recovery conflict. Otherwise it may take at > most > "deadlock_timeout * 2" to log "still waiting ..." message. > > To fix this issue, "deadlock_timeout - elapsed time" needs to be used > as > the timeout when enabling the timer at least in > ResolveRecoveryConflictWithLock() and > ResolveRecoveryConflictWithBufferPin(). > Also similar change needs to be applied to > ResolveRecoveryConflictWithVirtualXIDs(). > > BTW, without applying the patch, *originally* > ResolveRecoveryConflictWithBufferPin() seems to have this issue. > It enables deadlock_timeout timer so as to request for hot-standbfy > backends to check themselves for deadlocks. But if we wake up out of > ProcWaitForSignal() before deadlock_timeout is reached, the subsequent > call to ResolveRecoveryConflictWithBufferPin() also uses > deadlock_timeout > again instead of "deadlock_timeout - elapsed time". So the request for > deadlock check can be delayed. Furthermore, > if ResolveRecoveryConflictWithBufferPin() always wakes up out of > ProcWaitForSignal() before deadlock_timeout is reached, the request > for deadlock check may also never happen infinitely. > > Maybe we should fix the original issue at first separately from the > patch. Yeah, it's not an issue of this patch. > > + bool maybe_log_conflict; > > + bool maybe_update_title; > > Although it should be a matter of taste and I understand that the > > "maybe" means that "that logging and changing of ps display may not > > happen in this iteration" , that variables seem expressing > > respectively "we should write log if the timeout for recovery conflict > > expires" and "we should update title if 500ms elapsed". So they seem > > *to me* better be just "log_conflict" and "update_title". > > I feel the same with "maybe_log_conflict" in ProcSleep(). > > I have no strong opinion about those names. So if other people also > think so, I'm ok to rename them. Shorter is better as far as it makes sense and not-too abbreviated. > > + for recovery conflicts. This is useful in determining if recovery > > + conflicts prevent the recovery from applying WAL. > > (I'm not confident on this) Isn't the sentence better be in past or > > present continuous tense? > > Could you tell me why you think that's better? To make sure, I mentioned about the "prevent". The reason for that is it represents the status at the present or past. I don't insist on that if you don't think it's not better. > for recovery conflicts. This is useful in determining if recovery > conflicts are preventing the recovery from applying WAL. > >> BTW, attached is the POC patch that implements the idea discussed > >> upthread; > >> if log_recovery_conflict_waits is enabled, the startup process reports > >> the log also after the recovery conflict was resolved and the startup > >> process > >> finished waiting for it. This patch needs to be applied after > >> v11-0002-Log-the-standby-recovery-conflict-waits.patch is applied. > > Ah. I was just writing a comment about that. I haven't looked it > > closer but it looks good to me. By the way doesn't it contains a > > simple fix of a comment for the base patch? > > Yes, so the typo included in the base patch should be fixed when > pushing it. Understood. Thanks. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: