Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id 48a1c817-cbde-33c4-2672-13ef37e1d946@amazon.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Add Information during standby recovery conflicts  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi,

On 11/25/20 2:20 PM, Masahiko Sawada wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 
>
>
>
> On Fri, Nov 20, 2020 at 6:18 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>> Hi,
>>
>> On 11/17/20 4:44 PM, Fujii Masao wrote:
>>> Thanks for updating the patch! Here are review comments.
>>>
>>> +        Controls whether a log message is produced when the startup
>>> process
>>> +        is waiting longer than <varname>deadlock_timeout</varname>
>>> +        for recovery conflicts.
>>>
>>> But a log message can be produced also when the backend is waiting
>>> for recovery conflict. Right? If yes, this description needs to be
>>> corrected.
>> Thanks for looking at it!
>>
>> I don't think so, only the startup process should write those new log
>> messages.
>>
>> What makes you think that would not be the case?
>>
>>>
>>> +        for recovery conflicts.  This is useful in determining if
>>> recovery
>>> +        conflicts prevents the recovery from applying WAL.
>>>
>>> "prevents" should be "prevent"?
>> Indeed: fixed in the new attached patch.
>>
>>>
>>> +       TimestampDifference(waitStart, GetCurrentTimestamp(), &secs,
>>> &usecs);
>>> +       msecs = secs * 1000 + usecs / 1000;
>>>
>>> GetCurrentTimestamp() is basically called before LogRecoveryConflict()
>>> is called. So isn't it better to avoid calling GetCurrentTimestamp()
>>> newly in
>>> LogRecoveryConflict() and to reuse the timestamp that we got?
>>> It's helpful to avoid the waste of cycles.
>>>
>> good catch! fixed in the new attached patch.
>>
>>> +               while (VirtualTransactionIdIsValid(*vxids))
>>> +               {
>>> +                       PGPROC *proc =
>>> BackendIdGetProc(vxids->backendId);
>>>
>>> BackendIdGetProc() can return NULL if the backend is not active
>>> at that moment. This case should be handled.
>>>
>> handled in the new attached patch.
>>> +               case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
>>> +                       reasonDesc = gettext_noop("recovery is still
>>> waiting recovery conflict on buffer pin");
>>>
>>> It's natural to use "waiting for recovery" rather than "waiting
>>> recovery"?
>>>
>> I would be tempted to say so, the new patch makes use of "waiting for".
>>> +               /* Also, set deadlock timeout for logging purpose if
>>> necessary */
>>> +               if (log_recovery_conflict_waits)
>>> +               {
>>> +                       timeouts[cnt].id = STANDBY_TIMEOUT;
>>> +                       timeouts[cnt].type = TMPARAM_AFTER;
>>> +                       timeouts[cnt].delay_ms = DeadlockTimeout;
>>> +                       cnt++;
>>> +               }
>>>
>>> This needs to be executed only when the message has not been logged yet.
>>> Right?
>>>
>> good catch: fixed in the new attached patch.
>>
> Thank you for updating the patch! Here are review comments on the
> latest version patch.
>
> +       while (VirtualTransactionIdIsValid(*vxids))
> +       {
> +           PGPROC *proc = BackendIdGetProc(vxids->backendId);
> +
> +           if (proc)
> +           {
> +               if (nprocs == 0)
> +                   appendStringInfo(&buf, "%d", proc->pid);
> +               else
> +                   appendStringInfo(&buf, ", %d", proc->pid);
> +
> +               nprocs++;
> +               vxids++;
> +           }
> +       }
>
> We need to increment vxids even if *proc is null. Otherwise, the loop won't end.

My bad, that's fixed.

>
> ---
> +               TimestampTz cur_ts = GetCurrentTimestamp();;
Fixed
>
> There is an extra semi-colon.
>
> ---
>   int            max_standby_streaming_delay = 30 * 1000;
> +bool       log_recovery_conflict_waits = false;
> +bool       logged_lock_conflict = false;
>
>
> +       if (log_recovery_conflict_waits && !logged_lock_conflict)
> +       {
> +           timeouts[cnt].id = STANDBY_TIMEOUT;
> +           timeouts[cnt].type = TMPARAM_AFTER;
> +           timeouts[cnt].delay_ms = DeadlockTimeout;
> +           cnt++;
> +       }
>
> Can we pass a bool indicating if a timeout may be needed for recovery
> conflict logging from ProcSleep() to ResolveRecoveryConflictWithLock()
> instead of using a static variable?

Yeah that makes more sense, specially as we already have 
logged_recovery_conflict at our disposal.

New patch version attached.

Thanks!

Bertrand


Attachment

pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: PoC: custom signal handler for extensions
Next
From: Stephen Frost
Date:
Subject: Re: A few new options for CHECKPOINT