Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Date	February 9, 2015 11:29:50
Msg-id	CAHGQGwFSOJHdyOQdqynU66pA56eUgeOK+BwcBeoCDRCJnoH-Nw@mail.gmail.com
In response to	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code (Michael Paquier <michael.paquier@gmail.com>)
Responses	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
List	pgsql-hackers

On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
>> -                     * Wait for more WAL to arrive. Time out after 5 seconds,
>> -                     * like when polling the archive, to react to a trigger
>> -                     * file promptly.
>> +                     * Wait for more WAL to arrive. Time out after the amount of
>> +                     * time specified by wal_retrieve_retry_interval, like
>> +                     * when polling the archive, to react to a trigger file promptly.
>>                       */
>>                      WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>                                WL_LATCH_SET | WL_TIMEOUT,
>> -                              5000L);
>> +                              wal_retrieve_retry_interval * 1000L);
>>
>> This change can prevent the startup process from reacting promptly to
>> a trigger file. Imagine the case where a large interval is set
>> and the user wants to promote the standby by using the trigger file
>> instead of pg_ctl promote. I think that the sleep time should be 5s
>> if the interval is set to more than 5s. Thoughts?
>
> I disagree here. In some cases it is useful to speed up the check for
> WAL availability from a source for replication, but the
> opposite is true as well, as mentioned by Alexey at the beginning of
> the thread: reducing the number of requests when fetching WAL
> archives from external storage such as AWS. Hence a correct solution
> would be to check periodically for the trigger file with a maximum
> one-time wait of 5s to ensure backward-compatible behavior. We could
> reduce it to 1s or something like that as well.

You seem to have misunderstood the code in question, or I'm missing something.
The timeout of that WaitLatch call is just the interval at which the startup
process checks for the trigger file while waiting for more WAL to arrive via
streaming replication. It is not related to the retry time for restoring WAL
from the archive.
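
To make the cap suggested in the quoted review concrete, here is a minimal
sketch of how the latch timeout could be computed. The helper name and the
macro are hypothetical and not part of the patch; the interval is taken in
seconds, as in the quoted hunk, where it is multiplied by 1000L.

```c
#include <assert.h>

/* Hypothetical cap, mirroring the previously hardcoded 5000L timeout. */
#define MAX_TRIGGER_CHECK_MS 5000L

/*
 * Hypothetical helper (not in the patch): return the WaitLatch timeout in
 * milliseconds for a retry interval given in seconds, never sleeping longer
 * than MAX_TRIGGER_CHECK_MS so that a trigger file is still noticed promptly.
 */
long
trigger_check_timeout_ms(int retry_interval_s)
{
    assert(retry_interval_s >= 0);

    /* Compare in seconds first so a large interval cannot overflow. */
    if (retry_interval_s >= MAX_TRIGGER_CHECK_MS / 1000L)
        return MAX_TRIGGER_CHECK_MS;

    return retry_interval_s * 1000L;
}
```

Under that assumption, the WaitLatch call would pass
trigger_check_timeout_ms(wal_retrieve_retry_interval) rather than
wal_retrieve_retry_interval * 1000L, so an interval of 3600 would still wake
the startup process every 5s to look for the trigger file.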

Regards,

-- 
Fujii Masao
