Thread: Lockfile restart failure is still there :-(

Lockfile restart failure is still there :-(

From
Tom Lane
Date:
Last fall I proposed a minor tweak to solve the problem of Postgres
not restarting after a system reboot, in cases where it looked at the
old lockfile and thought the old postmaster was still alive:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00935.php

However it turns out the bug is still there.  We eliminated one case,
which is where the PID shown in the lockfile now belongs to the
immediate parent process of the postmaster (ie the shell that's spawning
it).  But the PID might belong to an older process, for instance a
root-owned "su" that spawned the immediate parent shell.  I argued in
the above message that this wouldn't be a problem because the kill()
would fail against a non-postgres-owned process.  But I evidently didn't
read the code quite carefully enough: as CreateLockFile() is written,
it considers an EPERM error from kill() to be reason to treat the
lockfile as valid.

I was thinking at the time, and still think, it is reasonable to treat
EPERM as being a safe rather than unsafe case.  EPERM implies that the
process exists but does not belong to the postgres userid, and therefore
it could not possibly be a competing postmaster.  We can assume that any
postmaster successfully started in a particular data directory belongs
to the userid that owns that directory, because (a) we check that we are
not root, and (b) we check that the data directory has no group or world
permissions; therefore if we were not of its owner's userid we'd not
be able to do anything in it.

Can anyone see any holes in this reasoning?  Are there any cases where
an EPERM failure could occur against a process that is of our own userid?

I am strongly tempted to add a direct check in checkDataDir() that the
data directory actually does belong to our own uid, just for paranoia's
sake.  Someone might decide that they could relax the permission check
("hey, why not let the dbadmin group have write permission on $PGDATA")
without realizing they'd be weakening the startup safety interlock.

Comments?
        regards, tom lane


Re: Lockfile restart failure is still there :-(

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I am strongly tempted to add a direct check in checkDataDir() that the
> data directory actually does belong to our own uid, just for paranoia's
> sake.  Someone might decide that they could relax the permission check
> ("hey, why not let the dbadmin group have write permission on $PGDATA")
> without realizing they'd be weakening the startup safety interlock.

Personally I often find I want to do exactly the kind of thing you're
describing. Why does the whole directory have to be so restrictive? Why not
just verify that the lock file itself is owned by our userid? Someone could
always chown it of course but short of that if it's owned by our userid then
surely the process that created it had to be our userid as well?

-- 
greg



Re: Lockfile restart failure is still there :-(

From
Andrew Dunstan
Date:

Tom Lane wrote:

>But I evidently didn't
>read the code quite carefully enough: as CreateLockFile() is written,
>it considers an EPERM error from kill() to be reason to treat the
>lockfile as valid.
>  
>

I thought that was part of what you were going to address. There can
hardly be an objection now to fixing it.

>I am strongly tempted to add a direct check in checkDataDir() that the
>data directory actually does belong to our own uid, just for paranoia's
>sake.  Someone might decide that they could relax the permission check
>("hey, why not let the dbadmin group have write permission on $PGDATA")
>without realizing they'd be weakening the startup safety interlock.
>
>
>  
>

I assume that ACLs can't be used to get around the restrictions ...

cheers

andrew


Re: Lockfile restart failure is still there :-(

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I am strongly tempted to add a direct check in checkDataDir() that the
>> data directory actually does belong to our own uid, just for paranoia's
>> sake.  Someone might decide that they could relax the permission check
>> ("hey, why not let the dbadmin group have write permission on $PGDATA")
>> without realizing they'd be weakening the startup safety interlock.

> Personally I often find I want to do exactly the kind of thing you're
> describing. Why does the whole directory have to be so restrictive? Why not
> just verify that the lock file itself is owned by our userid?

We need to be able to write in the whole directory, not just the
lockfile.  But actually the point I am making above is in your favor:
after adding a check on ownership, it would be a matter of your
protection wishes what the directory protections need to be.  Right
now that check is an integral part of some non-obvious safety
considerations.
        regards, tom lane


Re: Lockfile restart failure is still there :-(

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> We need to be able to write in the whole directory, not just the
> lockfile.  But actually the point I am making above is in your favor:
> after adding a check on ownership, it would be a matter of your
> protection wishes what the directory protections need to be.  Right
> now that check is an integral part of some non-obvious safety
> considerations.

Ah, I see. So yes, I was annoyed at last once when I changed permissions on
the postgres data directory and got errors telling me not to do that.

-- 
greg



Re: Lockfile restart failure is still there :-(

From
Tom Lane
Date:
I wrote:
> I was thinking at the time, and still think, it is reasonable to treat
> EPERM as being a safe rather than unsafe case.

I remembered why we were rejecting that case originally: it protects us
against blowing away a Unix socket file belonging to a postmaster that's
running under someone else's userid.  Unlike the data directory
lockfile, we can't expect /tmp's ownership to tell us anything :-(.
However, we can still dispense with the EPERM check for the socket
lockfile too, because:

1. We create all these lockfiles with mode 0600.  If the lockfile were
made under another userid, we'd have bombed out earlier for fail to read
the lockfile.

2. On most modern platforms, /tmp has a directory stickybit and we'd not
be able to remove someone else's lockfile or socket file anyway.

So I think it's OK ...

I have applied the attached patch to HEAD and 8.0.x.  Can anyone comment
on whether it would be safe/reasonable to enable the data directory
ownership check on Windows?
        regards, tom lane

*** src/backend/postmaster/postmaster.c.orig    Fri Mar 11 20:38:55 2005
--- src/backend/postmaster/postmaster.c    Thu Mar 17 22:29:27 2005
***************
*** 953,959 ****
--- 953,982 ----     }      /*
+      * Check that the directory belongs to my userid; if not, reject.
+      *
+      * This check is an essential part of the interlock that prevents two
+      * postmasters from starting in the same directory (see CreateLockFile()).
+      * Do not remove or weaken it.
+      *
+      * XXX can we safely enable this check on Windows?
+      */
+ #if !defined(WIN32) && !defined(__CYGWIN__)
+     if (stat_buf.st_uid != geteuid())
+         ereport(FATAL,
+                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                  errmsg("data directory \"%s\" has wrong ownership",
+                         DataDir),
+                  errhint("The server must be started by the user that owns the data directory.")));
+ #endif
+ 
+     /*      * Check if the directory has group or world access.  If so, reject.
+      *
+      * It would be possible to allow weaker constraints (for example, allow
+      * group access) but we cannot make a general assumption that that is
+      * okay; for example there are platforms where nearly all users customarily
+      * belong to the same group.  Perhaps this test should be configurable.      *      * XXX temporarily suppress
checkwhen on Windows, because there may not      * be proper support for Unix-y file permissions.  Need to think of a
 
*** src/backend/utils/init/miscinit.c.orig    Fri Dec 31 17:46:19 2004
--- src/backend/utils/init/miscinit.c    Thu Mar 17 22:29:17 2005
***************
*** 505,510 ****
--- 505,513 ----     {         /*          * Try to create the lock file --- O_EXCL makes this atomic.
+          *
+          * Think not to make the file protection weaker than 0600.  See
+          * comments below.          */         fd = open(filename, O_RDWR | O_CREAT | O_EXCL, 0600);         if (fd
>=0)
 
***************
*** 564,569 ****
--- 567,587 ----          * then all but the immediate parent shell will be root-owned processes          * and so the
killtest will fail with EPERM.          *
 
+          * We can treat the EPERM-error case as okay because that error implies
+          * that the existing process has a different userid than we do, which
+          * means it cannot be a competing postmaster.  A postmaster cannot
+          * successfully attach to a data directory owned by a userid other
+          * than its own.  (This is now checked directly in checkDataDir(),
+          * but has been true for a long time because of the restriction that
+          * the data directory isn't group- or world-accessible.)  Also,
+          * since we create the lockfiles mode 600, we'd have failed above
+          * if the lockfile belonged to another userid --- which means that
+          * whatever process kill() is reporting about isn't the one that
+          * made the lockfile.  (NOTE: this last consideration is the only
+          * one that keeps us from blowing away a Unix socket file belonging
+          * to an instance of Postgres being run by someone else, at least
+          * on machines where /tmp hasn't got a stickybit.)
+          *          * Windows hasn't got getppid(), but doesn't need it since it's not          * using real kill()
either...         *
 
***************
*** 577,587 ****             )         {             if (kill(other_pid, 0) == 0 ||
!                 (errno != ESRCH #ifdef __BEOS__
!                  && errno != EINVAL #endif
!                  ))             {                 /* lockfile belongs to a live process */
ereport(FATAL,
--- 595,605 ----             )         {             if (kill(other_pid, 0) == 0 ||
!                 (errno != ESRCH && #ifdef __BEOS__
!                  errno != EINVAL && #endif
!                  errno != EPERM))             {                 /* lockfile belongs to a live process */
  ereport(FATAL,