Re: pgsql: pg_ctl: Detect current standby state from pg_control - Mailing list pgsql-committers

From Tom Lane
Subject Re: pgsql: pg_ctl: Detect current standby state from pg_control
Date
Msg-id 27012.1474894431@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgsql: pg_ctl: Detect current standby state from pg_control  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: pgsql: pg_ctl: Detect current standby state from pg_control  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-committers
Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Sep 26, 2016 at 12:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Coverity thinks that this patch introduced a bunch of
>> null-pointer-dereference hazards, and AFAICS it is right.
>> The change in get_controlfile()'s API is completely broken
>> and needs to be undone.

> So you'd rather have a sleep(1) and complicate this code to replace it
> with a WaitLatch() when the code is used by the backend? I don't agree
> with that as the new API is cleaner as presented, though I agree that
> it is a change that caller does not get any result in case of a CRC
> mismatch, but who's going to use results that cannot be trusted
> anyway?

Well, they certainly won't be using any untrustworthy reports if
pg_controldata dumps core instead of printing the data, which is what
will happen in HEAD.  Although I don't understand your reference to
needing a sleep.  I think that it's 100% pointless for get_control_dbstate
to be worried about transient CRC failures.  If writes to pg_control
aren't atomic then we have problems enormously larger than whether
"pg_ctl promote" throws an error or not.

> It seems to me that the correct fix here is that
> pg_controldata.c should just exit like in the attached patch.

No.  Removing pg_controldata's designed-in ability to print information
from corrupted pg_control files was not part of the advertised impact of
this patch, and I will absolutely not hold still for it happening as an
accidental consequence of ill-advised refactoring.

It may well be that you need two different APIs for get_controlfile(),
maybe with a flag, or two different wrappers around some common code.
Or just go back to using #ifdef FRONTEND, and forget the silliness of
expecting pg_ctl not to throw an error for bad pg_control content.

BTW, now that get_controlfile has become a global symbol rather than
something private to pg_controldata.c, I suggest that it needs a less
generic name.  "control file" could apply to a lot of things.
Possibly "get_pg_control_contents" would be better.

            regards, tom lane


pgsql-committers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: pgsql: Refactor script execution state machine in pgbench.
Next
From: Tom Lane
Date:
Subject: pgsql: Document has_type_privilege().