Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id 05621ec8-6e62-25db-14c4-50079304617d@amazon.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: Add Information during standby recovery conflicts  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
On 8/27/20 10:16 AM, Masahiko Sawada wrote
>
> On Mon, 10 Aug 2020 at 23:43, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> Hi,
>>
>> On 7/31/20 7:12 AM, Masahiko Sawada wrote:
>>> +   tmpWaitlist = waitlist;
>>> +   while (VirtualTransactionIdIsValid(*tmpWaitlist))
>>> +   {
>>> +       tmpWaitlist++;
>>> +   }
>>> +
>>> +   num_waitlist_entries = (tmpWaitlist - waitlist);
>>> +
>>> +   /* display wal record information */
>>> +   if (log_recovery_conflicts &&
>>> (TimestampDifferenceExceeds(recovery_conflicts_log_time,
>>> GetCurrentTimestamp(),
>>> +                                   DeadlockTimeout))) {
>>> +       LogBlockedWalRecordInfo(num_waitlist_entries, reason);
>>> +       recovery_conflicts_log_time = GetCurrentTimestamp();
>>> +   }
>>>
>>> recovery_conflicts_log_time is not initialized. And shouldn't we
>>> compare the current timestamp to the timestamp when the startup
>>> process started waiting?
>>>
>>> I think we should call LogBlockedWalRecordInfo() inside of the inner
>>> while loop rather than at the beginning of
>>> ResolveRecoveryConflictWithVirtualXIDs(). In lock conflict cases, the
>>> startup process waits until 'ltime', then enters
>>> ResolveRecoveryConflictWithVirtualXIDs() after reaching 'ltime'.
>>> Therefore, it makes sense to call LogBlockedWalRecordInfo() at the
>>> beginning of ResolveRecoveryConflictWithVirtualXIDs(). However, in
>>> snapshot and tablespace conflict cases (i.g.
>>> ResolveRecoveryConflictWithSnapshot() and
>>> ResolveRecoveryConflictWithTablespace()), it enters
>>> ResolveRecoveryConflictWithVirtualXIDs() without waits and waits for
>>> reaching ‘ltime’ inside of the inner while look. So the above
>>> condition could always be false.
>> That would make the information being displayed after
>> max_standby_streaming_delay is reached for the multiple cases you just
>> described.
> Sorry, it should be deadlock_timeout, not max_standby_streaming_delay.
> Otherwise, the recovery conflict log message is printed when
> resolution, which seems not to achieve the original purpose. Am I
> missing something?

Ok, I understand where the confusion is coming from.

Indeed the new version is now printing the recovery conflict log message 
during the conflict resolution (while the initial intention was to be 
warned as soon as the replay had to wait).

The advantage of the new version is that it would be consistent across 
all the conflicts scenarios (if not, we would get messages during the 
resolution or when the replay started waiting, depending of the conflict 
scenario).

On the other hand, the cons of the new version is that we would miss 
messages when no resolution is needed (replay wait duration < 
max_standby_streaming_delay), but is that really annoying? (As no 
cancellation would occur)

Thinking about it, i like the new version (being warned during the 
resolution) as we would get messages only when cancelation will occur 
(which is what the user might want to avoid, so the extra info would be 
useful).

What do you think?

Bertrand




pgsql-hackers by date:

Previous
From: Sachin Khanna
Date:
Subject: Please help for error ( file is required for XML support )
Next
From: Amit Kapila
Date:
Subject: Re: Parallel copy