Re: We should Axe /contrib/start-scripts - Mailing list pgsql-hackers

From Tom Lane
Subject Re: We should Axe /contrib/start-scripts
Date
Msg-id 24413.1251329569@sss.pgh.pa.us
Whole thread Raw
In response to Re: We should Axe /contrib/start-scripts  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: We should Axe /contrib/start-scripts
List pgsql-hackers
I wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Thanks Andrew, Alvaro, and Chander.  You've given me some thoughts to
>> toss around.  Of course, any of these is going to be somewhat more
>> complex than using [ pg_ctl -w ]

> Yeah.  I wonder if we shouldn't expend a bit more effort to make that
> way bulletproof.  As I mentioned the other day, if there were a way for
> pg_ctl to pass down its parent's PID then we could have the postmaster
> exclude that as a false match, and then using pg_ctl would be just as
> safe as invoking the postmaster directly.

> The two ways I can see to do that are to add a command line switch
> to the postmaster, or to pass the PID as an environment variable,
> say "PG_GRANDPARENT_PID".  The latter is a bit uglier but it would
> require touching much less code (and documentation).

Attached is a simple patch that uses the environment-variable approach.
This is a whole lot more self-contained than what would be needed to
pass the PID as an explicit switch, so I'm inclined to do it this way.
You could argue that a bad guy could confuse matters by intentionally
passing an existing postmaster's PID in this variable --- but someone
with the ability to launch the postmaster can probably also remove an
existing lockfile, so I don't think there's a credible security risk.

Any objections?

            regards, tom lane

Index: src/backend/utils/init/miscinit.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/init/miscinit.c,v
retrieving revision 1.176
diff -c -r1.176 miscinit.c
*** src/backend/utils/init/miscinit.c    12 Aug 2009 20:53:30 -0000    1.176
--- src/backend/utils/init/miscinit.c    26 Aug 2009 23:24:56 -0000
***************
*** 683,689 ****
      int            len;
      int            encoded_pid;
      pid_t        other_pid;
!     pid_t        my_pid = getpid();

      /*
       * We need a loop here because of race conditions.    But don't loop forever
--- 683,728 ----
      int            len;
      int            encoded_pid;
      pid_t        other_pid;
!     pid_t        my_pid,
!                 my_p_pid,
!                 my_gp_pid;
!     const char *envvar;
!
!     /*
!      * If the PID in the lockfile is our own PID or our parent's or
!      * grandparent's PID, then the file must be stale (probably left over from
!      * a previous system boot cycle).  We need to check this because of the
!      * likelihood that a reboot will assign exactly the same PID as we had in
!      * the previous reboot, or one that's only one or two counts larger and
!      * hence the lockfile's PID now refers to an ancestor shell process.  We
!      * allow pg_ctl to pass down its parent shell PID (our grandparent PID)
!      * via the environment variable PG_GRANDPARENT_PID; this is so that
!      * launching the postmaster via pg_ctl can be just as reliable as
!      * launching it directly.  There is no provision for detecting
!      * further-removed ancestor processes, but if the init script is written
!      * carefully then all but the immediate parent shell will be root-owned
!      * processes and so the kill test will fail with EPERM.  Note that we
!      * cannot get a false negative this way, because an existing postmaster
!      * would surely never launch a competing postmaster or pg_ctl process
!      * directly.
!      */
!     my_pid = getpid();
!
! #ifndef WIN32
!     my_p_pid = getppid();
! #else
!     /*
!      * Windows hasn't got getppid(), but doesn't need it since it's not
!      * using real kill() either...
!      */
!     my_p_pid = 0;
! #endif
!
!     envvar = getenv("PG_GRANDPARENT_PID");
!     if (envvar)
!         my_gp_pid = atoi(envvar);
!     else
!         my_gp_pid = 0;

      /*
       * We need a loop here because of race conditions.    But don't loop forever
***************
*** 745,761 ****
          /*
           * Check to see if the other process still exists
           *
!          * If the PID in the lockfile is our own PID or our parent's PID, then
!          * the file must be stale (probably left over from a previous system
!          * boot cycle).  We need this test because of the likelihood that a
!          * reboot will assign exactly the same PID as we had in the previous
!          * reboot.    Also, if there is just one more process launch in this
!          * reboot than in the previous one, the lockfile might mention our
!          * parent's PID.  We can reject that since we'd never be launched
!          * directly by a competing postmaster.    We can't detect grandparent
!          * processes unfortunately, but if the init script is written
!          * carefully then all but the immediate parent shell will be
!          * root-owned processes and so the kill test will fail with EPERM.
           *
           * We can treat the EPERM-error case as okay because that error
           * implies that the existing process has a different userid than we
--- 784,794 ----
          /*
           * Check to see if the other process still exists
           *
!          * Per discussion above, my_pid, my_p_pid, and my_gp_pid can be
!          * ignored as false matches.
!          *
!          * Normally kill() will fail with ESRCH if the given PID doesn't
!          * exist.
           *
           * We can treat the EPERM-error case as okay because that error
           * implies that the existing process has a different userid than we
***************
*** 772,789 ****
           * Unix socket file belonging to an instance of Postgres being run by
           * someone else, at least on machines where /tmp hasn't got a
           * stickybit.)
-          *
-          * Windows hasn't got getppid(), but doesn't need it since it's not
-          * using real kill() either...
-          *
-          * Normally kill() will fail with ESRCH if the given PID doesn't
-          * exist.
           */
!         if (other_pid != my_pid
! #ifndef WIN32
!             && other_pid != getppid()
! #endif
!             )
          {
              if (kill(other_pid, 0) == 0 ||
                  (errno != ESRCH && errno != EPERM))
--- 805,813 ----
           * Unix socket file belonging to an instance of Postgres being run by
           * someone else, at least on machines where /tmp hasn't got a
           * stickybit.)
           */
!         if (other_pid != my_pid && other_pid != my_p_pid &&
!             other_pid != my_gp_pid)
          {
              if (kill(other_pid, 0) == 0 ||
                  (errno != ESRCH && errno != EPERM))
Index: src/bin/pg_ctl/pg_ctl.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_ctl/pg_ctl.c,v
retrieving revision 1.111
diff -c -r1.111 pg_ctl.c
*** src/bin/pg_ctl/pg_ctl.c    11 Jun 2009 14:49:07 -0000    1.111
--- src/bin/pg_ctl/pg_ctl.c    26 Aug 2009 23:24:56 -0000
***************
*** 672,677 ****
--- 672,692 ----
          unlimit_core_size();
  #endif

+     /*
+      * If possible, tell the postmaster our parent shell's PID (see the
+      * comments in CreateLockFile() for motivation).  Windows hasn't got
+      * getppid() unfortunately.
+      */
+ #ifndef WIN32
+     {
+         static char env_var[32];
+
+         snprintf(env_var, sizeof(env_var), "PG_GRANDPARENT_PID=%d",
+                  (int) getppid());
+         putenv(env_var);
+     }
+ #endif
+
      exitcode = start_postmaster();
      if (exitcode != 0)
      {

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Logging configuration changes
Next
From: Bruce Momjian
Date:
Subject: Re: 8.5 release timetable, again