Home > mailing lists

Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Date	January 16, 2015 17:44:48
Msg-id	20150116174441.GK16991@alap3.anarazel.de Whole thread Raw
In response to	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code (Michael Paquier <michael.paquier@gmail.com>)
Responses	Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
List	pgsql-hackers

Tree view

Hi,

On 2015-01-05 17:16:43 +0900, Michael Paquier wrote:
> +     <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
> +      <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
> +      <indexterm>
> +        <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        If <varname>restore_command</> returns nonzero exit status code, retry
> +        command after the interval of time specified by this parameter,
> +        measured in milliseconds if no unit is specified. Default value is
> +        <literal>5s</>.
> +       </para>
> +       <para>
> +        This command is particularly useful for warm standby configurations
> +        to leverage the amount of retries done to restore a WAL segment file
> +        depending on the activity of a master node.
> +       </para>

Don't really like this paragraph. What's "leveraging the amount of
retries done" supposed to mean?

> +# restore_command_retry_interval
> +#
> +# specifies an optional retry interval for restore_command command, if
> +# previous command returned nonzero exit status code. This can be useful
> +# to increase or decrease the number of calls of restore_command.

It's not really the number  but frequency, right?

> +        else if (strcmp(item->name, "restore_command_retry_interval") == 0)
> +        {
> +            const char *hintmsg;
> +
> +            if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
> +                           &hintmsg))
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                         errmsg("parameter \"%s\" requires a temporal value",
> +                                "restore_command_retry_interval"),
> +                         hintmsg ? errhint("%s", _(hintmsg)) : 0));

"temporal value" sounds odd to my ears. I'd rather keep in line with
what guc.c outputs in such a case.

> +                    now = GetCurrentTimestamp();
> +                    if (!TimestampDifferenceExceeds(last_fail_time, now,
> +                                                    restore_command_retry_interval))
>                      {
> -                        pg_usleep(1000000L * (5 - (now - last_fail_time)));
> -                        now = (pg_time_t) time(NULL);
> +                        long        secs;
> +                        int            microsecs;
> +
> +                        TimestampDifference(last_fail_time, now, &secs, µsecs);
> +                        pg_usleep(restore_command_retry_interval * 1000L - (1000000L * secs + 1L * microsecs));
> +                        now = GetCurrentTimestamp();
>                      }
>                      last_fail_time = now;
>                      currentSource = XLOG_FROM_ARCHIVE;

Am I missing something or does this handle/affect streaming failures
just the same? That might not be a bad idea, because the current
interval is far too long for e.g. a sync replication setup. But it
certainly makes the name a complete misnomer.

Not this patch's fault, but I'm getting a bit tired seing the above open
coded. How about adding a function that does the sleeping based on a
timestamptz and a ms interval?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

pgsql-hackers by date:

From: Jim Nasby
Date: 16 January 2015, 17:42:46
Subject: Re: proposal: row_to_array function

From: Pavel Stehule
Date: 16 January 2015, 18:02:52
Subject: Re: proposal: row_to_array function

Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code - Mailing list pgsql-hackers

Previous

Next