Lockfile restart failure is still there :-( - Mailing list pgsql-hackers

From Tom Lane
Subject Lockfile restart failure is still there :-(
Msg-id 27501.1111094433@sss.pgh.pa.us
List pgsql-hackers
Last fall I proposed a minor tweak to solve the problem of Postgres
not restarting after a system reboot, in cases where it looked at the
old lockfile and thought the old postmaster was still alive:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00935.php

However, it turns out the bug is still there.  We eliminated one case,
which is where the PID shown in the lockfile now belongs to the
immediate parent process of the postmaster (i.e., the shell that's
spawning it).  But the PID might belong to an older process, for instance
a root-owned "su" that spawned the immediate parent shell.  I argued in
the above message that this wouldn't be a problem because the kill()
would fail against a non-postgres-owned process.  But I evidently didn't
read the code quite carefully enough: as CreateLockFile() is written,
it considers an EPERM error from kill() to be reason to treat the
lockfile as valid.

I was thinking at the time, and still think, it is reasonable to treat
EPERM as being a safe rather than unsafe case.  EPERM implies that the
process exists but does not belong to the postgres userid, and therefore
it could not possibly be a competing postmaster.  We can assume that any
postmaster successfully started in a particular data directory belongs
to the userid that owns that directory, because (a) we check that we are
not root, and (b) we check that the data directory has no group or world
permissions; therefore if we were not of its owner's userid we'd not
be able to do anything in it.
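
As a rough illustration of the distinction being drawn here (this is not
the actual CreateLockFile() code; the enum and both function names are
hypothetical), the outcome of kill(pid, 0) can be sorted into three cases.
Signal number 0 sends nothing and only performs the existence and
permission checks.  The sketch assumes the caller has already verified
that the PID read from the lockfile is positive.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of the PID recorded in an old lockfile. */
typedef enum
{
    PID_NOT_RUNNING,    /* kill() failed with ESRCH: no such process */
    PID_OTHER_USER,     /* kill() failed with EPERM: exists, not signalable by us */
    PID_POSSIBLE_RIVAL  /* kill() succeeded, or failed in an unexpected way */
} PidStatus;

/* Probe a process without signalling it (signal 0 = checks only). */
PidStatus
probe_pid(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return PID_POSSIBLE_RIVAL;
    if (errno == ESRCH)
        return PID_NOT_RUNNING;
    if (errno == EPERM)
        return PID_OTHER_USER;
    return PID_POSSIBLE_RIVAL;  /* unknown errno: stay conservative */
}

/*
 * Proposed rule from the message above: EPERM is a safe case, so only a
 * process we are allowed to signal keeps the old lockfile valid.
 */
bool
lockfile_is_stale(pid_t pid)
{
    PidStatus status = probe_pid(pid);

    return status == PID_NOT_RUNNING || status == PID_OTHER_USER;
}
```

Under the behavior described earlier, the PID_OTHER_USER case would
instead count as "lockfile valid", which is what blocks the restart when
the recycled PID belongs to a root-owned process.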

Can anyone see any holes in this reasoning?  Are there any cases where
an EPERM failure could occur against a process that is of our own userid?

I am strongly tempted to add a direct check in checkDataDir() that the
data directory actually does belong to our own uid, just for paranoia's
sake.  Someone might decide that they could relax the permission check
("hey, why not let the dbadmin group have write permission on $PGDATA")
without realizing they'd be weakening the startup safety interlock.

Comments?
        regards, tom lane

