Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id 6416172a-8c7d-84da-2600-f5f41c61db18@amazon.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Add Information during standby recovery conflicts
List pgsql-hackers
Hi,

On 10/30/20 4:25 AM, Fujii Masao wrote:
> CAUTION: This email originated from outside of the organization. Do 
> not click links or open attachments unless you can confirm the sender 
> and know the content is safe.
>
>
>
> On 2020/10/30 10:29, Masahiko Sawada wrote:
>> ,
>>
>> On Thu, 29 Oct 2020 at 00:16, Fujii Masao 
>> <masao.fujii@oss.nttdata.com> wrote:
>>>
>>>
>>>
>>> On 2020/10/27 9:41, Masahiko Sawada wrote:
>>>> On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand 
>>>> <bdrouvot@amazon.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 10/15/20 9:15 AM, Masahiko Sawada wrote:
>>>>>> CAUTION: This email originated from outside of the organization. 
>>>>>> Do not click links or open attachments unless you can confirm the 
>>>>>> sender and know the content is safe.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi 
>>>>>> <horikyota.ntt@gmail.com> wrote:
>>>>>>> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada 
>>>>>>> <masahiko.sawada@2ndquadrant.com> wrote in
>>>>>>>>> ereport(..(errmsg("%s", _("hogehoge")))) results in
>>>>>>>>> fprintf((translated("%s")), translate("hogehoge")).
>>>>>>>>>
>>>>>>>>> So your change (errmsg("%s", gettext_noop("hogehoge")) results in
>>>>>>>>>
>>>>>>>>> fprintf((translated("%s")), DONT_translate("hogehoge")).
>>>>>>>>>
>>>>>>>>> which leads to a translation problem.
>>>>>>>>>
>>>>>>>>> (errmsg(gettext_noop("hogehoge"))
>>>>>>>> This seems equivalent to (errmsg("hogehoge")), right?
>>>>>>> Yes and no.  However eventually the two works the same way,
>>>>>>> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of
>>>>>>>
>>>>>>> 1: char *msg = gettext_noop("hogehoge");
>>>>>>> ...
>>>>>>> 2: .. (errmsg(msg));
>>>>>>>
>>>>>>> That is, the line 1 only registers a message id "hogehoge" and 
>>>>>>> doesn't
>>>>>>> translate. The line 2 tries to translate the content of msg and it
>>>>>>> finds the translation for the message id "hogehoge".
>>>>>> Understood.
>>>>>>
>>>>>>>> I think I could understand translation stuff. Given we only 
>>>>>>>> report the
>>>>>>>> const string returned from get_recovery_conflict_desc() without
>>>>>>>> placeholders, the patch needs to use errmsg_internal() instead 
>>>>>>>> while
>>>>>>>> not changing _() part. (errmsg(get_recovery_conflict_desc())) 
>>>>>>>> is not
>>>>>>>> good (warned by -Wformat-security).
>>>>>>> Ah, right. we get a complain if no value parameters added. We can
>>>>>>> silence it by adding a dummy parameter to errmsg, but I'm not sure
>>>>>>> which is preferable.
>>>>>> Okay, I'm going to use errmsg_internal() for now until a better 
>>>>>> idea comes.
>>>>>>
>>>>>> I've attached the updated patch that fixed the translation part.
>>>>>
>>>>> Thanks for reviewing and helping on this patch!
>>>>>
>>>>> The patch tester bot is currently failing due to:
>>>>>
>>>>> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used 
>>>>> uninitialized in
>>>>> this function [-Werror=maybe-uninitialized]"
>>>>>
>>>>> I've attached a new version with the minor change to fix it.
>>>>>
>>>>
>>>> Thank you for updating the patch!
>>>>
>>>> I've looked at the patch and revised a bit the formatting etc.
>>>>
>>>> After some thoughts, I think it might be better to report the waiting
>>>> time as well. it would help users and there is no particular reason
>>>> for logging the report only once. It also helps make the patch clean
>>>> by reducing the variables such as recovery_conflict_logged. I’ve
>>>> implemented it in the v8 patch.
>>>
>>> I read v8 patch. Here are review comments.
>>
>> Thank you for your review.
>>
>>>
>>> When recovery conflict with buffer pin happens, log message is output
>>> every deadlock_timeout. Is this intentional behavior? If yes, IMO 
>>> that's
>>> not good because lots of messages can be output.
>>
>> Agreed.
>>
>> if we were to log the recovery conflict only once in bufferpin
>> conflict case, we would log it multiple times only in lock conflict
>> case. So I guess it's better to do that in all conflict cases.
>
> Yes, I agree that this behavior basically should be consistent between 
> all cases.
>
>>
>>>
>>> +       if (log_recovery_conflict_waits)
>>> +               waitStart = GetCurrentTimestamp();
>>>
>>> What happens if log_recovery_conflict_waits is off at the beginning and
>>> then it's changed during waiting for the conflict? In this case, 
>>> waitStart is
>>> zero, but log_recovery_conflict_waits is on. This may cause some 
>>> problems?
>>
>> Hmm, I didn't see a path that happens to reload the config file during
>> waiting for buffer cleanup lock. Even if the startup process receives
>> SIGHUP during that, it calls HandleStartupProcInterrupts() at the next
>> convenient time. It could be the beginning of main apply loop or
>> inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in
>> the wait loop for taking a buffer cleanup.
>
> Yes, you're right. I seem to have read the code wrongly.
>
>> However, I think it’s
>> better to use (waitStart > 0) for safety when checking if we log the
>> recovery conflict instead of log_recovery_conflict_waits.
>>
>>>
>>> +                       if (report_waiting)
>>> +                               ts = GetCurrentTimestamp();
>>>
>>> GetCurrentTimestamp() doesn't need to be called every cycle
>>> in the loop after "logged" is true and "new_status" is not NULL.
>>
>> Agreed
>>
>>>
>>> +extern const char *get_procsignal_reason_desc(ProcSignalReason 
>>> reason);
>>>
>>> Is this garbage?
>>
>> Yes.
>>
>>>
>>> When log_lock_waits is enabled, both "still waiting for ..." and 
>>> "acquired ..."
>>> messages are output. Like this, when log_recovery_conflict_waits is 
>>> enabled,
>>> not only "still waiting ..." but also "resolved ..." message should 
>>> be output?
>>> The latter message might not need to be output if the conflict is 
>>> canceled
>>> due to max_standby_xxx_delay parameter. The latter message would be
>>> useful to know how long the recovery was waiting for the conflict. 
>>> Thought?
>>> It's ok to implement this as a separate patch later, though.
>>
>> There was a discussion that the latter message without waiting time is
>> not necessarily needed because the canceled process will log
>> "canceling statement due to conflict with XXX" as you mentioned. I
>> agreed with that. But I agree that the latter message with waiting
>> time would help users, for instance when the startup process is
>> waiting for multiple processes and it takes a time to cancel all of
>> them.
>
> I agree that it's useful to output the wait time.
>
> But as I told in previous email, it's ok to focus on the current patch
> for now and then implement this as a separate patch later.
>
Thanks for your work and thoughts on this patch!

I've attached a new version that take your remarks (hope i did not miss 
some of them) into account (but still leave the last one for a separate 
patch later).

Bertrand


Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Ajin Cherian
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions