Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running
Date
Msg-id 201011240514.oAO5Ejd22656@momjian.us
Whole thread Raw
In response to Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running
List pgsql-hackers
Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Tom Lane wrote:
> > >> Possibly the cleanest fix is to implement pg_ping as a libpq function.
> > >> You do have to distinguish connection failures (ie connection refused)
> > >> from errors that came back from the postmaster, and the easiest place to
> > >> be doing that is inside libpq.
> >
> > > OK, so a new libpq function --- got it.  Would we just pass the status
> > > from the backend or can it be done without backend modifications?
> >
> > It would definitely be better to do it without backend mods, so that
> > the functionality would work against back-branch postmasters.
> >
> > To my mind, the entire purpose of such a function is to classify the
> > possible errors so that the caller doesn't have to.  So I wouldn't
> > consider that it ought to "pass back the status from the backend".
> > I think what we basically want is a function that takes a conninfo
> > string (or one of the variants of that) and returns an enum defined
> > more or less like this:
> >
> >     * failed to connect to postmaster
> >     * connected, but postmaster is not accepting sessions
> >     * postmaster is up and accepting sessions
> >
> > I'm not sure those are exactly the categories we want, but something
> > close to that.  In particular, I don't know if there's any value in
> > subdividing the "not accepting sessions" status --- pg_ctl doesn't
> > really care, but other use-cases might want to tell the difference
> > between the various canAcceptConnections failure states.
> >
> > BTW, it is annoying that we can't definitively distinguish "postmaster
> > is not running" from a connectivity problem, but I can't see a way
> > around that.
>
> Agreed.  I will research this.

I have researched this and developed the attached patch.  It implements
PGping() and PGpingParams() in libpq, and has pg_ctl use it for pg_ctl
-w server status detection.

The new output for cases where .pgpass is not allowing for a connection
is:

    $ pg_ctl -w -l /dev/null start
    waiting for server to start.... done
    server started
    However, could not connect, perhaps due to invalid authentication or
    misconfiguration.

The code basically checks the connection status between PQconnectStart()
and connectDBComplete() to see if the server is running but we failed to
connect for some reason.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index a911c50..32c58a5 100644
*** /tmp/b2EvXa_libpq.sgml    Tue Nov 23 17:41:50 2010
--- doc/src/sgml/libpq.sgml    Tue Nov 23 17:36:32 2010
*************** int PQbackendPID(const PGconn *conn);
*** 1511,1516 ****
--- 1511,1584 ----
       </listitem>
      </varlistentry>

+     <varlistentry id="libpq-pqpingparams">
+      <term><function>PQpingParams</function><indexterm><primary>PQpingParams</></></term>
+      <listitem>
+       <para>
+        <function>PQpingParams</function> indicates the status of the
+        server.  The currently recognized parameter key words are the
+        same as <function>PQconnectParams</>.
+
+ <synopsis>
+ PGPing PQpingParams(const char **keywords, const char **values, int expand_dbname);
+ </synopsis>
+
+        It returns one of the following values:
+
+        <variablelist>
+         <varlistentry id="libpq-pqpingparams-pqaccess">
+          <term><literal>PQACCESS</literal></term>
+          <listitem>
+           <para>
+            The server is running and allows access.
+           </para>
+          </listitem>
+         </varlistentry>
+
+         <varlistentry id="libpq-pqpingparams-pqreject">
+          <term><literal>PQREJECT</literal></term>
+          <listitem>
+           <para>
+            The server is running but rejected a connection request.
+           </para>
+          </listitem>
+         </varlistentry>
+
+         <varlistentry id="libpq-pqpingparams-pqnoresponse">
+          <term><literal>PQNORESPONSE</literal></term>
+          <listitem>
+           <para>
+            The server did not respond.
+           </para>
+          </listitem>
+         </varlistentry>
+        </variablelist>
+
+       </para>
+
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="libpq-pqping">
+      <term><function>PQping</function><indexterm><primary>PQping</></></term>
+      <listitem>
+       <para>
+        Returns the status of the server.
+
+ <synopsis>
+ PGPing PQping(const char *conninfo);
+ </synopsis>
+       </para>
+
+       <para>
+        This function uses the same <literal>conninfo</literal> parameter
+        key words as <function>PQconnectdb</>.  It returns the same
+        values as <function>PQpingParams</> above.
+       </para>
+
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="libpq-pqconnectionneedspassword">
       <term><function>PQconnectionNeedsPassword</function><indexterm><primary>PQconnectionNeedsPassword</></></term>
       <listitem>
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 14d36b5..7a5bb7a 100644
*** /tmp/3BxVxb_pg_ctl.c    Tue Nov 23 17:41:50 2010
--- src/bin/pg_ctl/pg_ctl.c    Tue Nov 23 17:25:19 2010
*************** static char **readfile(const char *path)
*** 136,142 ****
  static int    start_postmaster(void);
  static void read_post_opts(void);

! static bool test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);

  static char postopts_file[MAXPGPATH];
--- 136,142 ----
  static int    start_postmaster(void);
  static void read_post_opts(void);

! static PGPing test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);

  static char postopts_file[MAXPGPATH];
*************** start_postmaster(void)
*** 400,410 ****
   * Note that the checkpoint parameter enables a Windows service control
   * manager checkpoint, it's got nothing to do with database checkpoints!!
   */
! static bool
  test_postmaster_connection(bool do_checkpoint)
  {
!     PGconn       *conn;
!     bool        success = false;
      int            i;
      char        portstr[32];
      char       *p;
--- 400,409 ----
   * Note that the checkpoint parameter enables a Windows service control
   * manager checkpoint, it's got nothing to do with database checkpoints!!
   */
! static PGPing
  test_postmaster_connection(bool do_checkpoint)
  {
!     PGPing        ret = PQACCESS;    /* assume success for zero wait */
      int            i;
      char        portstr[32];
      char       *p;
*************** test_postmaster_connection(bool do_check
*** 508,525 ****

      for (i = 0; i < wait_seconds; i++)
      {
!         if ((conn = PQconnectdb(connstr)) != NULL &&
!             (PQstatus(conn) == CONNECTION_OK ||
!              PQconnectionNeedsPassword(conn)))
!         {
!             PQfinish(conn);
!             success = true;
!             break;
!         }
          else
          {
-             PQfinish(conn);
-
  #if defined(WIN32)
              if (do_checkpoint)
              {
--- 507,516 ----

      for (i = 0; i < wait_seconds; i++)
      {
!         if ((ret = PQping(connstr)) != PQNORESPONSE)
!             return ret;
          else
          {
  #if defined(WIN32)
              if (do_checkpoint)
              {
*************** test_postmaster_connection(bool do_check
*** 543,549 ****
          }
      }

!     return success;
  }


--- 534,541 ----
          }
      }

!     /* value of last call to PQping */
!     return ret;
  }


*************** do_start(void)
*** 746,754 ****

      if (do_wait)
      {
          print_msg(_("waiting for server to start..."));

!         if (test_postmaster_connection(false) == false)
          {
              write_stderr(_("%s: could not start server\n"
                             "Examine the log output.\n"),
--- 738,748 ----

      if (do_wait)
      {
+         int status;
+
          print_msg(_("waiting for server to start..."));

!         if ((status = test_postmaster_connection(false)) == PQNORESPONSE)
          {
              write_stderr(_("%s: could not start server\n"
                             "Examine the log output.\n"),
*************** do_start(void)
*** 759,764 ****
--- 753,761 ----
          {
              print_msg(_(" done\n"));
              print_msg(_("server started\n"));
+             if (status == PQREJECT)
+                 write_stderr(_("However, could not connect, perhaps due to invalid authentication or\n"
+                                 "misconfiguration.\n"));
          }
      }
      else
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index ecbd54c..a6c73af 100644
*** /tmp/TyACxa_exports.txt    Tue Nov 23 17:41:50 2010
--- src/interfaces/libpq/exports.txt    Tue Nov 23 17:26:29 2010
*************** PQescapeLiteral           154
*** 157,159 ****
--- 157,161 ----
  PQescapeIdentifier        155
  PQconnectdbParams         156
  PQconnectStartParams      157
+ PQping                    158
+ PQpingParams              159
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8f318a1..2149f96 100644
*** /tmp/fGWite_fe-connect.c    Tue Nov 23 17:41:50 2010
--- src/interfaces/libpq/fe-connect.c    Tue Nov 23 17:27:28 2010
*************** static bool connectOptions1(PGconn *conn
*** 285,290 ****
--- 285,291 ----
  static bool connectOptions2(PGconn *conn);
  static int    connectDBStart(PGconn *conn);
  static int    connectDBComplete(PGconn *conn);
+ static PGPing internal_ping(PGconn *conn);
  static PGconn *makeEmptyPGconn(void);
  static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
  static void freePGconn(PGconn *conn);
*************** PQconnectdbParams(const char **keywords,
*** 375,380 ****
--- 376,395 ----

  }

+ PGPing
+ PQpingParams(const char **keywords,
+                   const char **values,
+                   int expand_dbname)
+ {
+     PGconn       *conn = PQconnectStartParams(keywords, values, expand_dbname);
+     PGPing        ret;
+
+     ret = internal_ping(conn);
+     PQfinish(conn);
+
+     return ret;
+ }
+
  /*
   *        PQconnectdb
   *
*************** PQconnectdb(const char *conninfo)
*** 408,413 ****
--- 423,440 ----
      return conn;
  }

+ PGPing
+ PQping(const char *conninfo)
+ {
+     PGconn       *conn = PQconnectStart(conninfo);
+     PGPing        ret;
+
+     ret = internal_ping(conn);
+     PQfinish(conn);
+
+     return ret;
+ }
+
  /*
   *        PQconnectStartParams
   *
*************** error_return:
*** 2491,2496 ****
--- 2518,2549 ----


  /*
+  * internal_ping
+  *    Determine if a server is running and if we can connect to it.
+  */
+ PGPing
+ internal_ping(PGconn *conn)
+ {
+     if (conn && conn->status != CONNECTION_BAD)
+     {
+         (void) connectDBComplete(conn);
+
+         /*
+          *    If the connection needs a password, we can consider the
+          *    server as accepting connections.
+          */
+         if (conn && (conn->status != CONNECTION_BAD ||
+             PQconnectionNeedsPassword(conn)))
+             return PQACCESS;
+         else
+             return PQREJECT;
+     }
+     else
+         return PQNORESPONSE;
+ }
+
+
+ /*
   * makeEmptyPGconn
   *     - create a PGconn data structure with (as yet) no interesting data
   */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 659d82d..d1a6dd4 100644
*** /tmp/VTULxb_libpq-fe.h    Tue Nov 23 17:41:50 2010
--- src/interfaces/libpq/libpq-fe.h    Tue Nov 23 17:31:25 2010
*************** typedef enum
*** 107,112 ****
--- 107,119 ----
      PQERRORS_VERBOSE            /* all the facts, ma'am */
  } PGVerbosity;

+ typedef enum
+ {
+     PQACCESS,                    /* connected to server */
+     PQREJECT,                    /* server rejected access */
+     PQNORESPONSE                /* server did not respond */
+ } PGPing;
+
  /* PGconn encapsulates a connection to the backend.
   * The contents of this struct are not supposed to be known to applications.
   */
*************** extern int    PQendcopy(PGconn *conn);
*** 403,408 ****
--- 410,418 ----
  extern int    PQsetnonblocking(PGconn *conn, int arg);
  extern int    PQisnonblocking(const PGconn *conn);
  extern int    PQisthreadsafe(void);
+ extern PGPing PQping(const char *conninfo);
+ extern PGPing PQpingParams(const char **keywords,
+                   const char **values, int expand_dbname);

  /* Force the write buffer to be written (or at least try) */
  extern int    PQflush(PGconn *conn);

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: profiling connection overhead
Next
From: Greg Smith
Date:
Subject: Re: Instrument checkpoint sync calls