Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Add Information during standby recovery conflicts |
Date | |
Msg-id | 29da248a-e21c-a3eb-a051-f1ec79b13d31@amazon.com Whole thread Raw |
In response to | Re: Add Information during standby recovery conflicts ("Drouvot, Bertrand" <bdrouvot@amazon.com>) |
Responses |
Re: Add Information during standby recovery conflicts
Re: Add Information during standby recovery conflicts |
List | pgsql-hackers |
On 8/28/20 4:14 PM, Drouvot, Bertrand wrote: > > On 8/28/20 7:03 AM, Masahiko Sawada wrote: >> CAUTION: This email originated from outside of the organization. Do >> not click links or open attachments unless you can confirm the sender >> and know the content is safe. >> >> >> >> On Thu, 27 Aug 2020 at 20:58, Drouvot, Bertrand <bdrouvot@amazon.com> >> wrote: >>> >>> On 8/27/20 10:16 AM, Masahiko Sawada wrote >>>> On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand >>>> <bdrouvot@amazon.com> wrote: >>>>> Hi, >>>>> >>>>> On 7/31/20 7:12 AM, Masahiko Sawada wrote: >>>>>> + tmpWaitlist = waitlist; >>>>>> + while (VirtualTransactionIdIsValid(*tmpWaitlist)) >>>>>> + { >>>>>> + tmpWaitlist++; >>>>>> + } >>>>>> + >>>>>> + num_waitlist_entries = (tmpWaitlist - waitlist); >>>>>> + >>>>>> + /* display wal record information */ >>>>>> + if (log_recovery_conflicts && >>>>>> (TimestampDifferenceExceeds(recovery_conflicts_log_time, >>>>>> GetCurrentTimestamp(), >>>>>> + DeadlockTimeout))) { >>>>>> + LogBlockedWalRecordInfo(num_waitlist_entries, reason); >>>>>> + recovery_conflicts_log_time = GetCurrentTimestamp(); >>>>>> + } >>>>>> >>>>>> recovery_conflicts_log_time is not initialized. And shouldn't we >>>>>> compare the current timestamp to the timestamp when the startup >>>>>> process started waiting? >>>>>> >>>>>> I think we should call LogBlockedWalRecordInfo() inside of the inner >>>>>> while loop rather than at the beginning of >>>>>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases, >>>>>> the >>>>>> startup process waits until 'ltime', then enters >>>>>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'. >>>>>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the >>>>>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in >>>>>> snapshot and tablespace conflict cases (i.g. >>>>>> ResolveRecoveryConflictWithSnapshot() and >>>>>> ResolveRecoveryConflictWithTablespace()), it enters >>>>>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for >>>>>> reaching ‘ltime’ inside of the inner while look. So the above >>>>>> condition could always be false. >>>>> That would make the information being displayed after >>>>> max_standby_streaming_delay is reached for the multiple cases you >>>>> just >>>>> described. >>>> Sorry, it should be deadlock_timeout, not max_standby_streaming_delay. >>>> Otherwise, the recovery conflict log message is printed when >>>> resolution, which seems not to achieve the original purpose. Am I >>>> missing something? >>> Ok, I understand where the confusion is coming from. >>> >>> Indeed the new version is now printing the recovery conflict log >>> message >>> during the conflict resolution (while the initial intention was to be >>> warned as soon as the replay had to wait). >>> >>> The advantage of the new version is that it would be consistent across >>> all the conflicts scenarios (if not, we would get messages during the >>> resolution or when the replay started waiting, depending of the >>> conflict >>> scenario). >>> >>> On the other hand, the cons of the new version is that we would miss >>> messages when no resolution is needed (replay wait duration < >>> max_standby_streaming_delay), but is that really annoying? (As no >>> cancellation would occur) >>> >>> Thinking about it, i like the new version (being warned during the >>> resolution) as we would get messages only when cancelation will occur >>> (which is what the user might want to avoid, so the extra info would be >>> useful). >>> >>> What do you think? >> Hmm, I think we print the reason why backends are canceled even of as >> now by ProcessInterrupts(). With this patch and related patches you >> proposed on another thread, the startup process reports virtual xids >> being interrupted, the reason, and LSN of blocked WAL, then processes >> will also report its virtual xid and reason. Therefore, the new >> information added by these patches is only the LSN of blocked WAL. > > That's completely right, let's come back to the original intention of > this patch (means, don't wait for the conflict resolution to log > messages). > > I'll submit a new version once updated. Please find attached the new patch. It provides the following outcomes depending on the conflict: 2020-10-04 09:08:51.923 UTC [30788] LOG: recovery is waiting recovery conflict on buffer pin OR 2020-10-04 09:52:25.832 UTC [1249] LOG: recovery is waiting recovery conflict on snapshot 2020-10-04 09:52:25.832 UTC [1249] DETAIL: Conflicting virtual transaction ids: 3/2, 2/4. OR 2020-10-04 09:11:51.717 UTC [30788] LOG: recovery is waiting recovery conflict on lock 2020-10-04 09:11:51.717 UTC [30788] DETAIL: Conflicting virtual transaction id: 2/5. OR 2020-10-04 09:13:04.104 UTC [30788] LOG: recovery is resolving recovery conflict on database Thanks Bertrand
Attachment
pgsql-hackers by date: