Re: Add Information during standby recovery conflicts - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Add Information during standby recovery conflicts
Date
Msg-id c141f3fc-2cff-a8f5-08b5-0aafa173c949@oss.nttdata.com
Whole thread Raw
In response to Re: Add Information during standby recovery conflicts  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: Add Information during standby recovery conflicts
List pgsql-hackers

On 2020/10/30 10:29, Masahiko Sawada wrote:
> ,
> 
> On Thu, 29 Oct 2020 at 00:16, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2020/10/27 9:41, Masahiko Sawada wrote:
>>> On Tue, 20 Oct 2020 at 22:02, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 10/15/20 9:15 AM, Masahiko Sawada wrote:
>>>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless
youcan confirm the sender and know the content is safe.
 
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 15 Oct 2020 at 14:52, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>>>>>> At Thu, 15 Oct 2020 14:28:57 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
>>>>>>>> ereport(..(errmsg("%s", _("hogehoge")))) results in
>>>>>>>> fprintf((translated("%s")), translate("hogehoge")).
>>>>>>>>
>>>>>>>> So your change (errmsg("%s", gettext_noop("hogehoge")) results in
>>>>>>>>
>>>>>>>> fprintf((translated("%s")), DONT_translate("hogehoge")).
>>>>>>>>
>>>>>>>> which leads to a translation problem.
>>>>>>>>
>>>>>>>> (errmsg(gettext_noop("hogehoge"))
>>>>>>> This seems equivalent to (errmsg("hogehoge")), right?
>>>>>> Yes and no.  However eventually the two works the same way,
>>>>>> "(errmsg(gettext_noop("hogehoge"))" is a shorthand of
>>>>>>
>>>>>> 1: char *msg = gettext_noop("hogehoge");
>>>>>> ...
>>>>>> 2: .. (errmsg(msg));
>>>>>>
>>>>>> That is, the line 1 only registers a message id "hogehoge" and doesn't
>>>>>> translate. The line 2 tries to translate the content of msg and it
>>>>>> finds the translation for the message id "hogehoge".
>>>>> Understood.
>>>>>
>>>>>>> I think I could understand translation stuff. Given we only report the
>>>>>>> const string returned from get_recovery_conflict_desc() without
>>>>>>> placeholders, the patch needs to use errmsg_internal() instead while
>>>>>>> not changing _() part. (errmsg(get_recovery_conflict_desc())) is not
>>>>>>> good (warned by -Wformat-security).
>>>>>> Ah, right. we get a complain if no value parameters added. We can
>>>>>> silence it by adding a dummy parameter to errmsg, but I'm not sure
>>>>>> which is preferable.
>>>>> Okay, I'm going to use errmsg_internal() for now until a better idea comes.
>>>>>
>>>>> I've attached the updated patch that fixed the translation part.
>>>>
>>>> Thanks for reviewing and helping on this patch!
>>>>
>>>> The patch tester bot is currently failing due to:
>>>>
>>>> "proc.c:1290:5: error: ‘standbyWaitStart’ may be used uninitialized in
>>>> this function [-Werror=maybe-uninitialized]"
>>>>
>>>> I've attached a new version with the minor change to fix it.
>>>>
>>>
>>> Thank you for updating the patch!
>>>
>>> I've looked at the patch and revised a bit the formatting etc.
>>>
>>> After some thoughts, I think it might be better to report the waiting
>>> time as well. it would help users and there is no particular reason
>>> for logging the report only once. It also helps make the patch clean
>>> by reducing the variables such as recovery_conflict_logged. I’ve
>>> implemented it in the v8 patch.
>>
>> I read v8 patch. Here are review comments.
> 
> Thank you for your review.
> 
>>
>> When recovery conflict with buffer pin happens, log message is output
>> every deadlock_timeout. Is this intentional behavior? If yes, IMO that's
>> not good because lots of messages can be output.
> 
> Agreed.
> 
> if we were to log the recovery conflict only once in bufferpin
> conflict case, we would log it multiple times only in lock conflict
> case. So I guess it's better to do that in all conflict cases.

Yes, I agree that this behavior basically should be consistent between all cases.

> 
>>
>> +       if (log_recovery_conflict_waits)
>> +               waitStart = GetCurrentTimestamp();
>>
>> What happens if log_recovery_conflict_waits is off at the beginning and
>> then it's changed during waiting for the conflict? In this case, waitStart is
>> zero, but log_recovery_conflict_waits is on. This may cause some problems?
> 
> Hmm, I didn't see a path that happens to reload the config file during
> waiting for buffer cleanup lock. Even if the startup process receives
> SIGHUP during that, it calls HandleStartupProcInterrupts() at the next
> convenient time. It could be the beginning of main apply loop or
> inside WaitForWALToBecomeAvailable() and so on but I didn’t see it in
> the wait loop for taking a buffer cleanup.

Yes, you're right. I seem to have read the code wrongly.

> However, I think it’s
> better to use (waitStart > 0) for safety when checking if we log the
> recovery conflict instead of log_recovery_conflict_waits.
> 
>>
>> +                       if (report_waiting)
>> +                               ts = GetCurrentTimestamp();
>>
>> GetCurrentTimestamp() doesn't need to be called every cycle
>> in the loop after "logged" is true and "new_status" is not NULL.
> 
> Agreed
> 
>>
>> +extern const char *get_procsignal_reason_desc(ProcSignalReason reason);
>>
>> Is this garbage?
> 
> Yes.
> 
>>
>> When log_lock_waits is enabled, both "still waiting for ..." and "acquired ..."
>> messages are output. Like this, when log_recovery_conflict_waits is enabled,
>> not only "still waiting ..." but also "resolved ..." message should be output?
>> The latter message might not need to be output if the conflict is canceled
>> due to max_standby_xxx_delay parameter. The latter message would be
>> useful to know how long the recovery was waiting for the conflict. Thought?
>> It's ok to implement this as a separate patch later, though.
> 
> There was a discussion that the latter message without waiting time is
> not necessarily needed because the canceled process will log
> "canceling statement due to conflict with XXX" as you mentioned. I
> agreed with that. But I agree that the latter message with waiting
> time would help users, for instance when the startup process is
> waiting for multiple processes and it takes a time to cancel all of
> them.

I agree that it's useful to output the wait time.

But as I told in previous email, it's ok to focus on the current patch
for now and then implement this as a separate patch later.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8