Home > mailing lists

Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Date	February 10, 2015 13:33:07
Msg-id	CAB7nPqQ_Fm8a-OSEeE6n1nT-LUp7ty85Yc+vWW1v06rwQxC1Hg@mail.gmail.com Whole thread Raw
In response to	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
List	pgsql-hackers

Tree view

On Mon, Feb 9, 2015 at 8:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
>>> -                     * Wait for more WAL to arrive. Time out after 5 seconds,
>>> -                     * like when polling the archive, to react to a trigger
>>> -                     * file promptly.
>>> +                     * Wait for more WAL to arrive. Time out after
>>> the amount of
>>> +                     * time specified by wal_retrieve_retry_interval, like
>>> +                     * when polling the archive, to react to a
>>> trigger file promptly.
>>>                       */
>>>                      WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>>                                WL_LATCH_SET | WL_TIMEOUT,
>>> -                              5000L);
>>> +                              wal_retrieve_retry_interval * 1000L);
>>>
>>> This change can prevent the startup process from reacting to
>>> a trigger file. Imagine the case where the large interval is set
>>> and the user want to promote the standby by using the trigger file
>>> instead of pg_ctl promote. I think that the sleep time should be 5s
>>> if the interval is set to more than 5s. Thought?
>>
>> I disagree here. It is interesting to accelerate the check of WAL
>> availability from a source in some cases for replication, but the
>> opposite is true as well as mentioned by Alexey at the beginning of
>> the thread to reduce the number of requests when requesting WAL
>> archives from an external storage type AWS. Hence a correct solution
>> would be to check periodically for the trigger file with a maximum
>> one-time wait of 5s to ensure backward-compatible behavior. We could
>> reduce it to 1s or something like that as well.
>
> You seem to have misunderstood the code in question. Or I'm missing something.
> The timeout of the WaitLatch is just the interval to check for the trigger file
> while waiting for more WAL to arrive from streaming replication. Not related to
> the retry time to restore WAL from the archive.

[Re-reading the code...]
Aah.. Yes you are right. Sorry for the noise. Yes let's wait for a
maximum of 5s then. I also noticed in previous patch that the wait was
maximized to 5s. To begin with, a loop should have been used if it was
a sleep, but as now patch uses a latch this limit does not make much
sense... Patch updated is attached.
Regards,
--
Michael

Attachment

20150210_wal_source_check_v7.patch

pgsql-hackers by date:

From: Marc Balmer
Date: 10 February 2015, 13:32:29
Subject: Re: For cursors, there is FETCH and MOVE, why no TELL?

From: Andres Freund
Date: 10 February 2015, 13:43:29
Subject: Re: Assertion failure when streaming logical changes

Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code - Mailing list pgsql-hackers

Attachment

Previous

Next