Re: Crash by targetted recovery - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Crash by targetted recovery
Date
Msg-id f7c69cf0-fcac-91b4-6fa2-f2560b6c49b5@oss.nttdata.com
Whole thread Raw
In response to Re: Crash by targetted recovery  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Crash by targetted recovery  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers

On 2020/02/28 12:13, Kyotaro Horiguchi wrote:
> At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
>>> Thank you for the comment.
>>> At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote in
>>>> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
>>>>>> I failed to understand why random access while reading from
>>>>>> stream is bad idea. Could you elaborate why?
>>>>> It seems to me the word "streaming" suggests that WAL record should be
>>>>> read sequentially. Random access, which means reading from arbitrary
>>>>> location, breaks a stream.  (But the patch doesn't try to stop wal
>>>>> sender if randAccess.)
>>>>>
>>>>>> Isn't it sufficient to set currentSource to 0 when disabling
>>>>>> StandbyMode?
>>>>> I thought that and it should work, but I hesitated to manipulate on
>>>>> currentSource in StartupXLOG. currentSource is basically a private
>>>>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
>>>>> think it's not good to modify it out of the the logic in
>>>>> WaitForWALToBecomeAvailable.
>>>>
>>>> If so, what about adding the following at the top of
>>>> WaitForWALToBecomeAvailable()?
>>>>
>>>>       if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>>>>            currentSource = 0;
>>> It works virtually the same way. I'm happy to do that if you don't
>>> agree to using randAccess. But I'd rather do that in the 'if
>>> (!InArchiveRecovery)' section.
>>
>> The approach using randAccess seems unsafe. Please imagine
>> the case where currentSource is changed to XLOG_FROM_ARCHIVE
>> because randAccess is true, while walreceiver is still running.
>> For example, this case can occur when the record at REDO
>> starting point is fetched with randAccess = true after walreceiver
>> is invoked to fetch the last checkpoint record. The situation
>> "currentSource != XLOG_FROM_STREAM while walreceiver is
>>   running" seems invalid. No?
> 
> When I mentioned an possibility of changing ReadRecord so that it
> modifies randAccess instead of currentSource, I thought that
> WaitForWALToBecomeAvailable should shutdown wal receiver as
> needed.
> 
> At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> me> location, breaks a stream.  (But the patch doesn't try to stop wal
> me> sender if randAccess.)

Sorry, I failed to notice that.

> And random access during StandbyMode ususally (always?) lets RecPtr go
> back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
> don't have a file in pg_wal and the REDO point is far back by more
> than a segment from the initial checkpoint record.

It works correctly. This is why WaitForWALToBecomeAvailable() uses
fetching_ckpt argument.

> If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
> re-connects to the primary for the past segment.

But this can lead to unnecessary restart of walreceiver. Since
fetching_ckpt ensures that the WAL file containing the REDO
starting record exists in pg_wal, there is no need to reconnect
to the primary when reading the REDO starting record.

Is there other case where we need to go back to XLOG_FROM_ARCHIVE
by random access?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: logical replication empty transactions
Next
From: Juan José Santamaría Flecha
Date:
Subject: Re: color by default