Re: TODO item for pg_ctl and server detection - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: TODO item for pg_ctl and server detection
Date
Msg-id 201012281828.oBSIS1q23355@momjian.us
Whole thread Raw
In response to Re: TODO item for pg_ctl and server detection  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TODO item for pg_ctl and server detection
List pgsql-hackers
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > While I am working on pg_ctl, I saw this TODO item:
> >     Have the postmaster write a random number to a file on startup that
> >     pg_ctl checks against the contents of a pg_ping response on its initial
> >     connection (without login)
>
> >         This will protect against connecting to an old instance of the
> >     postmaster in a different or deleted subdirectory.
>
> >     http://archives.postgresql.org/pgsql-bugs/2009-10/msg00110.php
> >     http://archives.postgresql.org/pgsql-bugs/2009-10/msg00156.php
>
> > Based on our new PQPing(), do we ever want to implement this or should I
> > remove the TODO item?  It seems this would require a server connection,
> > which is something we didn't want to force pg_ctl -w to do in case
> > authentication is broken.
>
> Well, rereading that old thread makes me realize that what you just
> implemented is still pretty far short of what was discussed.  In
> particular, this implementation entirely fails to cope with the
> possibility that a Windows postmaster is using a specialized
> listen_addresses setting that has to be taken into account in order to
> get a TCP connection.  I wonder whether we should revert this patch and
> have another go at the idea of a separate postmaster.ports status file
> with a line for each active port.

I had forgotten about having to use TCP and needing to honor
listen_address restrictions.  We only need one valid listen_address so I
went ahead and added a line to the postmaster.pid file.

I am not sure what a separate file will buy us except additional files
to open/manage.

> The business with a magic number can't be implemented unless we actually
> add a new separate pg_ping protocol.  PQping() has removed a lot of the
> pressure to have that, namely all the authentication-failure problem
> cases.  I'm not sure that the case where you're looking at an inactive
> data directory but there's a live postmaster someplace else with the
> same port number is important enough to justify new protocol all by
> itself.

Yes, that was my calculus too.  I realized that we create session ids by
merging the process id and backend start time, so I went ahead and added
the postmaster start time epoch to the postmaster.pid file.  While there
is no way to pass back the postmaster start time from PQping, I added
code to pg_ctl to make sure the time in the postmaster.pid file is not
_before_ pg_ctl started running.  We only check PQping() after we have
started the postmaster ourselves, so it fits our needs.

Patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index cda7f64..86bc5a6 100644
*** /tmp/pgdiff.11857/wrhSFb_storage.sgml    Tue Dec 28 12:51:36 2010
--- doc/src/sgml/storage.sgml    Tue Dec 28 11:57:56 2010
*************** last started with</entry>
*** 117,124 ****
  <row>
   <entry><filename>postmaster.pid</></entry>
   <entry>A lock file recording the current postmaster process id (PID),
!  cluster data directory, port number, Unix domain socket directory,
!  and shared memory segment ID</entry>
  </row>

  </tbody>
--- 117,125 ----
  <row>
   <entry><filename>postmaster.pid</></entry>
   <entry>A lock file recording the current postmaster process id (PID),
!  postmaster start time, cluster data directory, port number, user-specified
!  Unix domain socket directory, first valid listen_address host, and
!  shared memory segment ID</entry>
  </row>

  </tbody>
diff --git a/src/backend/port/ipc_test.c b/src/backend/port/ipc_test.c
index a003dc9..461a7a6 100644
*** /tmp/pgdiff.11857/QXAQWd_ipc_test.c    Tue Dec 28 12:51:36 2010
--- src/backend/port/ipc_test.c    Tue Dec 28 09:38:50 2010
*************** on_exit_reset(void)
*** 104,110 ****
  }

  void
! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2)
  {
  }

--- 104,110 ----
  }

  void
! AddToLockFile(int target_line, const char *str)
  {
  }

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index d970eb2..ff77099 100644
*** /tmp/pgdiff.11857/KqzRVc_sysv_shmem.c    Tue Dec 28 12:51:36 2010
--- src/backend/port/sysv_shmem.c    Tue Dec 28 09:54:14 2010
*************** InternalIpcMemoryCreate(IpcMemoryKey mem
*** 198,206 ****
      /* Register on-exit routine to detach new segment before deleting */
      on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress));

!     /* Record key and ID in lockfile for data directory. */
!     RecordSharedMemoryInLockFile((unsigned long) memKey,
!                                  (unsigned long) shmid);

      return memAddress;
  }
--- 198,214 ----
      /* Register on-exit routine to detach new segment before deleting */
      on_shmem_exit(IpcMemoryDetach, PointerGetDatum(memAddress));

!     /*
!      * Append record key and ID in lockfile for data directory. Format
!      * to try to keep it the same length.
!      */
!     {
!         char line[32];
!
!         sprintf(line, "%9lu %9lu\n", (unsigned long) memKey,
!                                      (unsigned long) shmid);
!         AddToLockFile(LOCK_FILE_LINES, line);
!     }

      return memAddress;
  }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a46a323..c1e553a 100644
*** /tmp/pgdiff.11857/IjURra_postmaster.c    Tue Dec 28 12:51:36 2010
--- src/backend/postmaster/postmaster.c    Tue Dec 28 12:22:12 2010
*************** PostmasterMain(int argc, char *argv[])
*** 483,489 ****
      int            status;
      char       *userDoption = NULL;
      int            i;
!
      MyProcPid = PostmasterPid = getpid();

      MyStartTime = time(NULL);
--- 483,490 ----
      int            status;
      char       *userDoption = NULL;
      int            i;
!     bool        connection_line_output = false;
!
      MyProcPid = PostmasterPid = getpid();

      MyStartTime = time(NULL);
*************** PostmasterMain(int argc, char *argv[])
*** 860,869 ****
--- 861,882 ----
                                            UnixSocketDir,
                                            ListenSocket, MAXLISTEN);
              else
+             {
                  status = StreamServerPort(AF_UNSPEC, curhost,
                                            (unsigned short) PostPortNumber,
                                            UnixSocketDir,
                                            ListenSocket, MAXLISTEN);
+                 /* must supply a valid listen_address for PQping() */
+                 if (!connection_line_output)
+                 {
+                     char line[MAXPGPATH + 2];
+
+                     sprintf(line, "%s\n", curhost);
+                     AddToLockFile(LOCK_FILE_LINES - 1, line);
+                     connection_line_output = true;
+                 }
+             }
+
              if (status == STATUS_OK)
                  success++;
              else
*************** PostmasterMain(int argc, char *argv[])
*** 880,885 ****
--- 893,902 ----
          pfree(rawstring);
      }

+     /* Supply an empty listen_address line for PQping() */
+     if (!connection_line_output)
+         AddToLockFile(LOCK_FILE_LINES - 1, "\n");
+
  #ifdef USE_BONJOUR
      /* Register for Bonjour only if we opened TCP socket(s) */
      if (enable_bonjour && ListenSocket[0] != PGINVALID_SOCKET)
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index d74b5cc..6935786 100644
*** /tmp/pgdiff.11857/sjyEPe_miscinit.c    Tue Dec 28 12:51:36 2010
--- src/backend/utils/init/miscinit.c    Tue Dec 28 12:38:33 2010
***************
*** 46,52 ****


  #define DIRECTORY_LOCK_FILE        "postmaster.pid"
!
  ProcessingMode Mode = InitProcessing;

  /* Note: we rely on this to initialize as zeroes */
--- 46,65 ----


  #define DIRECTORY_LOCK_FILE        "postmaster.pid"
! /*
!  *    The lock file contents are:
!  *
!  * line #
!  *        1    pid
!  *        2    postmaster start time
!  *        3    data directory
!  *        4    port #
!  *        5    user-specified socket directory
!  *            (the lines below are added later)
!  *        6    first valid listen_address
!  *        7    shared memory key
!  */
!
  ProcessingMode Mode = InitProcessing;

  /* Note: we rely on this to initialize as zeroes */
*************** CreateLockFile(const char *filename, boo
*** 659,665 ****
                 bool isDDLock, const char *refName)
  {
      int            fd;
!     char        buffer[MAXPGPATH * 2 + 256];
      int            ntries;
      int            len;
      int            encoded_pid;
--- 672,678 ----
                 bool isDDLock, const char *refName)
  {
      int            fd;
!     char        buffer[MAXPGPATH * 3 + 256];
      int            ntries;
      int            len;
      int            encoded_pid;
*************** CreateLockFile(const char *filename, boo
*** 826,836 ****
          if (isDDLock)
          {
              char       *ptr = buffer;
!             unsigned long id1,
!                         id2;
              int lineno;

!             for (lineno = 1; lineno <= 4; lineno++)
              {
                  if ((ptr = strchr(ptr, '\n')) == NULL)
                  {
--- 839,848 ----
          if (isDDLock)
          {
              char       *ptr = buffer;
!             unsigned long id1, id2;
              int lineno;

!             for (lineno = 1; lineno <= LOCK_FILE_LINES - 1; lineno++)
              {
                  if ((ptr = strchr(ptr, '\n')) == NULL)
                  {
*************** CreateLockFile(const char *filename, boo
*** 874,882 ****
      /*
       * Successfully created the file, now fill it.
       */
!     snprintf(buffer, sizeof(buffer), "%d\n%s\n%d\n%s\n",
               amPostmaster ? (int) my_pid : -((int) my_pid),
!              DataDir, PostPortNumber, UnixSocketDir);
      errno = 0;
      if (write(fd, buffer, strlen(buffer)) != strlen(buffer))
      {
--- 886,895 ----
      /*
       * Successfully created the file, now fill it.
       */
!     snprintf(buffer, sizeof(buffer), "%d\n%ld\n%s\n%d\n%s\n",
               amPostmaster ? (int) my_pid : -((int) my_pid),
!              (long) MyStartTime, DataDir, PostPortNumber,
!              UnixSocketDir);
      errno = 0;
      if (write(fd, buffer, strlen(buffer)) != strlen(buffer))
      {
*************** TouchSocketLockFile(void)
*** 985,1008 ****
      }
  }

  /*
!  * Append information about a shared memory segment to the data directory
!  * lock file.
!  *
!  * This may be called multiple times in the life of a postmaster, if we
!  * delete and recreate shmem due to backend crash.    Therefore, be prepared
!  * to overwrite existing information.  (As of 7.1, a postmaster only creates
!  * one shm seg at a time; but for the purposes here, if we did have more than
!  * one then any one of them would do anyway.)
   */
  void
! RecordSharedMemoryInLockFile(unsigned long id1, unsigned long id2)
  {
      int            fd;
      int            len;
      int            lineno;
      char       *ptr;
!     char        buffer[MAXPGPATH * 2 + 256];

      fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0);
      if (fd < 0)
--- 998,1016 ----
      }
  }

+
  /*
!  * Add lines to the data directory lock file.  This erases all following
!  * lines, but that is OK because lines are added in order.
   */
  void
! AddToLockFile(int target_line, const char *str)
  {
      int            fd;
      int            len;
      int            lineno;
      char       *ptr;
!     char        buffer[MAXPGPATH * 3 + 256];

      fd = open(DIRECTORY_LOCK_FILE, O_RDWR | PG_BINARY, 0);
      if (fd < 0)
*************** RecordSharedMemoryInLockFile(unsigned lo
*** 1029,1035 ****
       * Skip over first four lines (PID, pgdata, portnum, socketdir).
       */
      ptr = buffer;
!     for (lineno = 1; lineno <= 4; lineno++)
      {
          if ((ptr = strchr(ptr, '\n')) == NULL)
          {
--- 1037,1043 ----
       * Skip over first four lines (PID, pgdata, portnum, socketdir).
       */
      ptr = buffer;
!     for (lineno = 1; lineno < target_line; lineno++)
      {
          if ((ptr = strchr(ptr, '\n')) == NULL)
          {
*************** RecordSharedMemoryInLockFile(unsigned lo
*** 1040,1050 ****
          ptr++;
      }

!     /*
!      * Append key information.    Format to try to keep it the same length
!      * always (trailing junk won't hurt, but might confuse humans).
!      */
!     sprintf(ptr, "%9lu %9lu\n", id1, id2);

      /*
       * And rewrite the data.  Since we write in a single kernel call, this
--- 1048,1054 ----
          ptr++;
      }

!     strlcat(buffer, str, sizeof(buffer));

      /*
       * And rewrite the data.  Since we write in a single kernel call, this
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 5e1b772..395a863 100644
*** /tmp/pgdiff.11857/8MG68b_pg_ctl.c    Tue Dec 28 12:51:36 2010
--- src/bin/pg_ctl/pg_ctl.c    Tue Dec 28 12:44:45 2010
***************
*** 22,27 ****
--- 22,28 ----

  #include <locale.h>
  #include <signal.h>
+ #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <unistd.h>
*************** static void read_post_opts(void);
*** 138,143 ****
--- 139,145 ----

  static PGPing test_postmaster_connection(bool);
  static bool postmaster_is_alive(pid_t pid);
+ static time_t start_time;

  static char postopts_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
*************** static PGPing
*** 404,416 ****
  test_postmaster_connection(bool do_checkpoint)
  {
      int            portnum = 0;
!     char        socket_dir[MAXPGPATH];
      char        connstr[MAXPGPATH + 256];
      PGPing        ret = PQPING_OK;    /* assume success for wait == zero */
      char      **optlines;
      int            i;

!     socket_dir[0] = '\0';
      connstr[0] = '\0';

      for (i = 0; i < wait_seconds; i++)
--- 406,418 ----
  test_postmaster_connection(bool do_checkpoint)
  {
      int            portnum = 0;
!     char        host_str[MAXPGPATH];
      char        connstr[MAXPGPATH + 256];
      PGPing        ret = PQPING_OK;    /* assume success for wait == zero */
      char      **optlines;
      int            i;

!     host_str[0] = '\0';
      connstr[0] = '\0';

      for (i = 0; i < wait_seconds; i++)
*************** test_postmaster_connection(bool do_check
*** 425,437 ****
               *        0    lock file created but status not written
               *        2    pre-9.1 server, shared memory not created
               *        3    pre-9.1 server, shared memory created
!              *        4    9.1+ server, shared memory not created
!              *        5    9.1+ server, shared memory created
               *
               *    For pre-9.1 Unix servers, we grab the port number from the
               *    shmem key (first value on line 3).  Pre-9.1 Win32 has no
!              *    written shmem key, so we fail.  9.1+ writes both the port
!              *    number and socket address in the file for us to use.
               *    (PG_VERSION could also have told us the major version.)
               */

--- 427,440 ----
               *        0    lock file created but status not written
               *        2    pre-9.1 server, shared memory not created
               *        3    pre-9.1 server, shared memory created
!              *        5    9.1+ server, listen host not created
!              *        6    9.1+ server, shared memory not created
!              *        7    9.1+ server, shared memory created
               *
               *    For pre-9.1 Unix servers, we grab the port number from the
               *    shmem key (first value on line 3).  Pre-9.1 Win32 has no
!              *    written shmem key, so we fail.  9.1+ writes connection
!              *    information in the file for us to use.
               *    (PG_VERSION could also have told us the major version.)
               */

*************** test_postmaster_connection(bool do_check
*** 439,445 ****
              if ((optlines = readfile(pid_file)) != NULL &&
                  optlines[0] != NULL &&
                  optlines[1] != NULL &&
!                 optlines[2] != NULL)
              {
                  /* A 3-line file? */
                  if (optlines[3] == NULL)
--- 442,451 ----
              if ((optlines = readfile(pid_file)) != NULL &&
                  optlines[0] != NULL &&
                  optlines[1] != NULL &&
!                 optlines[2] != NULL &&
!                 /* pre-9.1 server or listen_address line is present? */
!                 (optlines[3] == NULL ||
!                  optlines[5] != NULL))
              {
                  /* A 3-line file? */
                  if (optlines[3] == NULL)
*************** test_postmaster_connection(bool do_check
*** 459,489 ****
                          return PQPING_NO_ATTEMPT;
                      }
                  }
!                 else    /* 9.1+ server */
                  {
!                     portnum = atoi(optlines[2]);
!
!                     /* Get socket directory, if specified. */
!                     if (optlines[3][0] != '\n')
                      {
!                         /*
!                          *    While unix_socket_directory can accept relative
!                          *    directories, libpq's host must have a leading slash
!                          *    to indicate a socket directory.
!                          */
!                         if (optlines[3][0] != '/')
!                         {
!                             write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"),
!                                          progname);
!                             return PQPING_NO_ATTEMPT;
!                         }
!                         strlcpy(socket_dir, optlines[3], MAXPGPATH);
!                         /* remove newline */
!                         if (strchr(socket_dir, '\n') != NULL)
!                             *strchr(socket_dir, '\n') = '\0';
                      }
!                 }

                  /*
                   * We need to set connect_timeout otherwise on Windows the
                   * Service Control Manager (SCM) will probably timeout first.
--- 465,515 ----
                          return PQPING_NO_ATTEMPT;
                      }
                  }
!                 else
                  {
!                     /*
!                      *    Easy check to see if we are looking at the right
!                      *    data directory:  Is the postmaster older than this
!                      *    execution of pg_ctl?
!                      */
!                     if (atol(optlines[1]) < start_time)
                      {
!                         write_stderr(_("%s: this data directory is running an older postmaster\n"),
!                                      progname);
!                         return PQPING_NO_ATTEMPT;
                      }
!
!                     portnum = atoi(optlines[3]);

+                     /*
+                      *    Determine the proper host string to use.
+                      */
+ #ifdef HAVE_UNIX_SOCKETS
+                     /*
+                      *    Use socket directory, if specified.  We assume if we
+                      *    have unix sockets, the server does too because we
+                      *    just started the postmaster.
+                      */
+                     /*
+                      *    While unix_socket_directory can accept relative
+                      *    directories, libpq's host must have a leading slash
+                      *    to indicate a socket directory.
+                      */
+                     if (optlines[4][0] != '\n' && optlines[4][0] != '/')
+                     {
+                         write_stderr(_("%s: -w option cannot use a relative socket directory specification\n"),
+                                      progname);
+                         return PQPING_NO_ATTEMPT;
+                     }
+                     strlcpy(host_str, optlines[4], sizeof(host_str));
+ #else
+                     strlcpy(host_str, optlines[5], sizeof(host_str));
+ #endif
+                     /* remove newline */
+                     if (strchr(host_str, '\n') != NULL)
+                         *strchr(host_str, '\n') = '\0';
+                 }
+
                  /*
                   * We need to set connect_timeout otherwise on Windows the
                   * Service Control Manager (SCM) will probably timeout first.
*************** test_postmaster_connection(bool do_check
*** 491,499 ****
                  snprintf(connstr, sizeof(connstr),
                           "dbname=postgres port=%d connect_timeout=5", portnum);

!                 if (socket_dir[0] != '\0')
                      snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr),
!                         " host='%s'", socket_dir);
              }
          }

--- 517,525 ----
                  snprintf(connstr, sizeof(connstr),
                           "dbname=postgres port=%d connect_timeout=5", portnum);

!                 if (host_str[0] != '\0')
                      snprintf(connstr + strlen(connstr), sizeof(connstr) - strlen(connstr),
!                         " host='%s'", host_str);
              }
          }

*************** main(int argc, char **argv)
*** 1756,1761 ****
--- 1782,1788 ----

      progname = get_progname(argv[0]);
      set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_ctl"));
+     start_time = time(NULL);

      /*
       * save argv[0] so do_start() can look for the postmaster if necessary. we
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 032875e..72e1dc8 100644
*** /tmp/pgdiff.11857/4qcsPd_miscadmin.h    Tue Dec 28 12:51:36 2010
--- src/include/miscadmin.h    Tue Dec 28 11:13:38 2010
*************** extern PGDLLIMPORT bool process_shared_p
*** 348,358 ****
  extern char *shared_preload_libraries_string;
  extern char *local_preload_libraries_string;

  extern void CreateDataDirLockFile(bool amPostmaster);
  extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster);
  extern void TouchSocketLockFile(void);
! extern void RecordSharedMemoryInLockFile(unsigned long id1,
!                              unsigned long id2);
  extern void ValidatePgVersion(const char *path);
  extern void process_shared_preload_libraries(void);
  extern void process_local_preload_libraries(void);
--- 348,358 ----
  extern char *shared_preload_libraries_string;
  extern char *local_preload_libraries_string;

+ #define LOCK_FILE_LINES        7
  extern void CreateDataDirLockFile(bool amPostmaster);
  extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster);
  extern void TouchSocketLockFile(void);
! extern void AddToLockFile(int target_line, const char *str);
  extern void ValidatePgVersion(const char *path);
  extern void process_shared_preload_libraries(void);
  extern void process_local_preload_libraries(void);

pgsql-hackers by date:

Previous
From: Gurjeet Singh
Date:
Subject: Re: pg_primary_conninfo
Next
From: Tom Lane
Date:
Subject: Re: pg_primary_conninfo