Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Support for N synchronous standby servers - take 2
Date
Msg-id 27732.1461769845@sss.pgh.pa.us
Whole thread Raw
In response to Re: Support for N synchronous standby servers - take 2  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> Sorry, I have attached an empty patch. This is another one that should
> be with content.

I started to review this, and in passing came across this gem in
syncrep_scanner.l:


/* * flex emits a yy_fatal_error() function that it calls in response to * critical errors like malloc failure, file
I/Oerrors, and detection of * internal inconsistency.  That function prints a message and calls exit(). * Mutate it to
insteadcall ereport(FATAL), which terminates this process. * * The process that causes this fatal error should be
terminated.* Otherwise it has to abandon the new setting value of * synchronous_standby_names and keep running with the
previousone * while the other processes switch to the new one. * This inconsistency of the setting that each process is
basedon * can cause a serious problem. Though it's basically not good idea to * use FATAL here because it can take down
thepostmaster, * we should do that in order to avoid such an inconsistency. */
 
#undef fprintf
#define fprintf(file, fmt, msg) syncrep_flex_fatal(fmt, msg)

static void
syncrep_flex_fatal(const char *fmt, const char *msg)
{ereport(FATAL, (errmsg_internal("%s", msg)));
}


This is the faultiest reasoning possible.  There are a hundred reasons why
a process might fail to absorb a GUC setting, and causing just one such
code path to FATAL out is not going to improve system stability one bit.

If you think it is absolutely imperative that all processes in the system
have identical synchronous_standby_names settings, then we need to make
it be PGC_POSTMASTER, not indulge in half-baked non-solutions like this.
But I'd like to know why that is so essential.  It looks to me like what
matters is only whether each individual walsender thinks its client is
a sync standby, and so inconsistent settings between different walsenders
don't really matter.  Which is a good thing, because if it's to remain
SIGHUP, you can't promise that they'll all absorb a new value at the same
instant anyway.

In short, I don't see any good reason not to make this be a plain ERROR
like it is in every other scanner in the backend.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: 9.6 and fsync=off
Next
From: Andres Freund
Date:
Subject: Re: pgindent