Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code |
Date | |
Msg-id | 20150116174441.GK16991@alap3.anarazel.de Whole thread Raw |
In response to | Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: Patch: add recovery_timeout option to control timeout
of restore_command nonzero status code
|
List | pgsql-hackers |
Hi, On 2015-01-05 17:16:43 +0900, Michael Paquier wrote: > + <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval"> > + <term><varname>restore_command_retry_interval</varname> (<type>integer</type>) > + <indexterm> > + <primary><varname>restore_command_retry_interval</> recovery parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + If <varname>restore_command</> returns nonzero exit status code, retry > + command after the interval of time specified by this parameter, > + measured in milliseconds if no unit is specified. Default value is > + <literal>5s</>. > + </para> > + <para> > + This command is particularly useful for warm standby configurations > + to leverage the amount of retries done to restore a WAL segment file > + depending on the activity of a master node. > + </para> Don't really like this paragraph. What's "leveraging the amount of retries done" supposed to mean? > +# restore_command_retry_interval > +# > +# specifies an optional retry interval for restore_command command, if > +# previous command returned nonzero exit status code. This can be useful > +# to increase or decrease the number of calls of restore_command. It's not really the number but frequency, right? > + else if (strcmp(item->name, "restore_command_retry_interval") == 0) > + { > + const char *hintmsg; > + > + if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS, > + &hintmsg)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("parameter \"%s\" requires a temporal value", > + "restore_command_retry_interval"), > + hintmsg ? errhint("%s", _(hintmsg)) : 0)); "temporal value" sounds odd to my ears. I'd rather keep in line with what guc.c outputs in such a case. > + now = GetCurrentTimestamp(); > + if (!TimestampDifferenceExceeds(last_fail_time, now, > + restore_command_retry_interval)) > { > - pg_usleep(1000000L * (5 - (now - last_fail_time))); > - now = (pg_time_t) time(NULL); > + long secs; > + int microsecs; > + > + TimestampDifference(last_fail_time, now, &secs, µsecs); > + pg_usleep(restore_command_retry_interval * 1000L - (1000000L * secs + 1L * microsecs)); > + now = GetCurrentTimestamp(); > } > last_fail_time = now; > currentSource = XLOG_FROM_ARCHIVE; Am I missing something or does this handle/affect streaming failures just the same? That might not be a bad idea, because the current interval is far too long for e.g. a sync replication setup. But it certainly makes the name a complete misnomer. Not this patch's fault, but I'm getting a bit tired seing the above open coded. How about adding a function that does the sleeping based on a timestamptz and a ms interval? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: