Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running |
Date | |
Msg-id | 201011240514.oAO5Ejd22656@momjian.us Whole thread Raw |
In response to | Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: Re: [BUGS] BUG #5650: Postgres service showing
as stopped when in fact it is running
|
List | pgsql-hackers |
Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > Tom Lane wrote: > > >> Possibly the cleanest fix is to implement pg_ping as a libpq function. > > >> You do have to distinguish connection failures (ie connection refused) > > >> from errors that came back from the postmaster, and the easiest place to > > >> be doing that is inside libpq. > > > > > OK, so a new libpq function --- got it. Would we just pass the status > > > from the backend or can it be done without backend modifications? > > > > It would definitely be better to do it without backend mods, so that > > the functionality would work against back-branch postmasters. > > > > To my mind, the entire purpose of such a function is to classify the > > possible errors so that the caller doesn't have to. So I wouldn't > > consider that it ought to "pass back the status from the backend". > > I think what we basically want is a function that takes a conninfo > > string (or one of the variants of that) and returns an enum defined > > more or less like this: > > > > * failed to connect to postmaster > > * connected, but postmaster is not accepting sessions > > * postmaster is up and accepting sessions > > > > I'm not sure those are exactly the categories we want, but something > > close to that. In particular, I don't know if there's any value in > > subdividing the "not accepting sessions" status --- pg_ctl doesn't > > really care, but other use-cases might want to tell the difference > > between the various canAcceptConnections failure states. > > > > BTW, it is annoying that we can't definitively distinguish "postmaster > > is not running" from a connectivity problem, but I can't see a way > > around that. > > Agreed. I will research this. I have researched this and developed the attached patch. It implements PGping() and PGpingParams() in libpq, and has pg_ctl use it for pg_ctl -w server status detection. The new output for cases where .pgpass is not allowing for a connection is: $ pg_ctl -w -l /dev/null start waiting for server to start.... done server started However, could not connect, perhaps due to invalid authentication or misconfiguration. The code basically checks the connection status between PQconnectStart() and connectDBComplete() to see if the server is running but we failed to connect for some reason. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index a911c50..32c58a5 100644 *** /tmp/b2EvXa_libpq.sgml Tue Nov 23 17:41:50 2010 --- doc/src/sgml/libpq.sgml Tue Nov 23 17:36:32 2010 *************** int PQbackendPID(const PGconn *conn); *** 1511,1516 **** --- 1511,1584 ---- </listitem> </varlistentry> + <varlistentry id="libpq-pqpingparams"> + <term><function>PQpingParams</function><indexterm><primary>PQpingParams</></></term> + <listitem> + <para> + <function>PQpingParams</function> indicates the status of the + server. The currently recognized parameter key words are the + same as <function>PQconnectParams</>. + + <synopsis> + PGPing PQpingParams(const char **keywords, const char **values, int expand_dbname); + </synopsis> + + It returns one of the following values: + + <variablelist> + <varlistentry id="libpq-pqpingparams-pqaccess"> + <term><literal>PQACCESS</literal></term> + <listitem> + <para> + The server is running and allows access. + </para> + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqpingparams-pqreject"> + <term><literal>PQREJECT</literal></term> + <listitem> + <para> + The server is running but rejected a connection request. + </para> + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqpingparams-pqnoresponse"> + <term><literal>PQNORESPONSE</literal></term> + <listitem> + <para> + The server did not respond. + </para> + </listitem> + </varlistentry> + </variablelist> + + </para> + + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqping"> + <term><function>PQping</function><indexterm><primary>PQping</></></term> + <listitem> + <para> + Returns the status of the server. + + <synopsis> + PGPing PQping(const char *conninfo); + </synopsis> + </para> + + <para> + This function uses the same <literal>conninfo</literal> parameter + key words as <function>PQconnectdb</>. It returns the same + values as <function>PQpingParams</> above. + </para> + + </listitem> + </varlistentry> + <varlistentry id="libpq-pqconnectionneedspassword"> <term><function>PQconnectionNeedsPassword</function><indexterm><primary>PQconnectionNeedsPassword</></></term> <listitem> diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 14d36b5..7a5bb7a 100644 *** /tmp/3BxVxb_pg_ctl.c Tue Nov 23 17:41:50 2010 --- src/bin/pg_ctl/pg_ctl.c Tue Nov 23 17:25:19 2010 *************** static char **readfile(const char *path) *** 136,142 **** static int start_postmaster(void); static void read_post_opts(void); ! static bool test_postmaster_connection(bool); static bool postmaster_is_alive(pid_t pid); static char postopts_file[MAXPGPATH]; --- 136,142 ---- static int start_postmaster(void); static void read_post_opts(void); ! static PGPing test_postmaster_connection(bool); static bool postmaster_is_alive(pid_t pid); static char postopts_file[MAXPGPATH]; *************** start_postmaster(void) *** 400,410 **** * Note that the checkpoint parameter enables a Windows service control * manager checkpoint, it's got nothing to do with database checkpoints!! */ ! static bool test_postmaster_connection(bool do_checkpoint) { ! PGconn *conn; ! bool success = false; int i; char portstr[32]; char *p; --- 400,409 ---- * Note that the checkpoint parameter enables a Windows service control * manager checkpoint, it's got nothing to do with database checkpoints!! */ ! static PGPing test_postmaster_connection(bool do_checkpoint) { ! PGPing ret = PQACCESS; /* assume success for zero wait */ int i; char portstr[32]; char *p; *************** test_postmaster_connection(bool do_check *** 508,525 **** for (i = 0; i < wait_seconds; i++) { ! if ((conn = PQconnectdb(connstr)) != NULL && ! (PQstatus(conn) == CONNECTION_OK || ! PQconnectionNeedsPassword(conn))) ! { ! PQfinish(conn); ! success = true; ! break; ! } else { - PQfinish(conn); - #if defined(WIN32) if (do_checkpoint) { --- 507,516 ---- for (i = 0; i < wait_seconds; i++) { ! if ((ret = PQping(connstr)) != PQNORESPONSE) ! return ret; else { #if defined(WIN32) if (do_checkpoint) { *************** test_postmaster_connection(bool do_check *** 543,549 **** } } ! return success; } --- 534,541 ---- } } ! /* value of last call to PQping */ ! return ret; } *************** do_start(void) *** 746,754 **** if (do_wait) { print_msg(_("waiting for server to start...")); ! if (test_postmaster_connection(false) == false) { write_stderr(_("%s: could not start server\n" "Examine the log output.\n"), --- 738,748 ---- if (do_wait) { + int status; + print_msg(_("waiting for server to start...")); ! if ((status = test_postmaster_connection(false)) == PQNORESPONSE) { write_stderr(_("%s: could not start server\n" "Examine the log output.\n"), *************** do_start(void) *** 759,764 **** --- 753,761 ---- { print_msg(_(" done\n")); print_msg(_("server started\n")); + if (status == PQREJECT) + write_stderr(_("However, could not connect, perhaps due to invalid authentication or\n" + "misconfiguration.\n")); } } else diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt index ecbd54c..a6c73af 100644 *** /tmp/TyACxa_exports.txt Tue Nov 23 17:41:50 2010 --- src/interfaces/libpq/exports.txt Tue Nov 23 17:26:29 2010 *************** PQescapeLiteral 154 *** 157,159 **** --- 157,161 ---- PQescapeIdentifier 155 PQconnectdbParams 156 PQconnectStartParams 157 + PQping 158 + PQpingParams 159 diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 8f318a1..2149f96 100644 *** /tmp/fGWite_fe-connect.c Tue Nov 23 17:41:50 2010 --- src/interfaces/libpq/fe-connect.c Tue Nov 23 17:27:28 2010 *************** static bool connectOptions1(PGconn *conn *** 285,290 **** --- 285,291 ---- static bool connectOptions2(PGconn *conn); static int connectDBStart(PGconn *conn); static int connectDBComplete(PGconn *conn); + static PGPing internal_ping(PGconn *conn); static PGconn *makeEmptyPGconn(void); static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions); static void freePGconn(PGconn *conn); *************** PQconnectdbParams(const char **keywords, *** 375,380 **** --- 376,395 ---- } + PGPing + PQpingParams(const char **keywords, + const char **values, + int expand_dbname) + { + PGconn *conn = PQconnectStartParams(keywords, values, expand_dbname); + PGPing ret; + + ret = internal_ping(conn); + PQfinish(conn); + + return ret; + } + /* * PQconnectdb * *************** PQconnectdb(const char *conninfo) *** 408,413 **** --- 423,440 ---- return conn; } + PGPing + PQping(const char *conninfo) + { + PGconn *conn = PQconnectStart(conninfo); + PGPing ret; + + ret = internal_ping(conn); + PQfinish(conn); + + return ret; + } + /* * PQconnectStartParams * *************** error_return: *** 2491,2496 **** --- 2518,2549 ---- /* + * internal_ping + * Determine if a server is running and if we can connect to it. + */ + PGPing + internal_ping(PGconn *conn) + { + if (conn && conn->status != CONNECTION_BAD) + { + (void) connectDBComplete(conn); + + /* + * If the connection needs a password, we can consider the + * server as accepting connections. + */ + if (conn && (conn->status != CONNECTION_BAD || + PQconnectionNeedsPassword(conn))) + return PQACCESS; + else + return PQREJECT; + } + else + return PQNORESPONSE; + } + + + /* * makeEmptyPGconn * - create a PGconn data structure with (as yet) no interesting data */ diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h index 659d82d..d1a6dd4 100644 *** /tmp/VTULxb_libpq-fe.h Tue Nov 23 17:41:50 2010 --- src/interfaces/libpq/libpq-fe.h Tue Nov 23 17:31:25 2010 *************** typedef enum *** 107,112 **** --- 107,119 ---- PQERRORS_VERBOSE /* all the facts, ma'am */ } PGVerbosity; + typedef enum + { + PQACCESS, /* connected to server */ + PQREJECT, /* server rejected access */ + PQNORESPONSE /* server did not respond */ + } PGPing; + /* PGconn encapsulates a connection to the backend. * The contents of this struct are not supposed to be known to applications. */ *************** extern int PQendcopy(PGconn *conn); *** 403,408 **** --- 410,418 ---- extern int PQsetnonblocking(PGconn *conn, int arg); extern int PQisnonblocking(const PGconn *conn); extern int PQisthreadsafe(void); + extern PGPing PQping(const char *conninfo); + extern PGPing PQpingParams(const char **keywords, + const char **values, int expand_dbname); /* Force the write buffer to be written (or at least try) */ extern int PQflush(PGconn *conn);
pgsql-hackers by date: