Re: Different behaviour for pg_ctl --wait between pg9.5 and pg10 - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Different behaviour for pg_ctl --wait between pg9.5 and pg10
Date
Msg-id 20277.1521467295@sss.pgh.pa.us
In response to Different behaviour for pg_ctl --wait between pg9.5 and pg10  (Greg k <gregg.kay@gmail.com>)
List pgsql-bugs
Greg k <gregg.kay@gmail.com> writes:
> I have a script where after a point-in-time recovery I run
> "pg_ctl start -D /data -w -t 86400"
> and then try to connect as soon as pg_ctl finishes. With Postgres 9.5.9 (on
> Centos 7.4) I can connect at the end. But with Postgres 10.3 I get a
> connection error
> psql: FATAL:  the database system is starting up
> It seems with Postgres 10.3 the postmaster.pid file state goes from
> 'starting' to 'standby' to 'ready' but pg_ctl is saying the server is ready
> to accept connections even though the postmaster.pid file says 'standby'.

The problem from pg_ctl's standpoint is that it can't tell whether
"standby" is a short-lived state.  In PG 10 it assumes not, so it exits
once that state is reached.  The previous implementation couldn't
distinguish that state at all (because PQping doesn't) and would therefore
wait until the server accepted connections or it timed out.  That happened
to be good for your use-case, I guess, but a lot of other people did not
like it: when starting a non-hot standby server, pg_ctl would wait until
timeout and then report failure.  Even in the PITR case, there's no
certainty that the server will exit "standby" state in any prompt fashion
(i.e., before pg_ctl times out), so even under the old behavior you did
not really have a guarantee that the server would accept connections
after pg_ctl exited.

I'd suggest adding a wait-for-psql-to-connect loop after your pg_ctl
start when starting a PITR run.

            regards, tom lane
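
A minimal sketch of the wait-for-psql-to-connect loop suggested above. It is
not from the original message: the function name wait_for_connect, the
attempt count, the sleep interval, and the probe query are all illustrative
assumptions. The probe command is passed in as arguments, so any command
that exits 0 once the server accepts connections can be used.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): retry a probe command
# until it succeeds or the attempts run out.
# Usage: wait_for_connect MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_for_connect() {
    max_attempts=$1
    sleep_seconds=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The probe's output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$sleep_seconds"
    done
    # Every attempt failed.
    return 1
}
```

In a PITR script this might be used right after the pg_ctl call from the
report, assuming a database named "postgres" exists and the default
connection settings apply:

    pg_ctl start -D /data -w -t 86400
    wait_for_connect 600 1 psql -d postgres -c 'SELECT 1' || exit 1

While the server is still in recovery, psql fails with "the database system
is starting up" and exits nonzero, so the loop keeps retrying until a
connection succeeds or the attempts are exhausted.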

