
From: Tom Lane
Subject: Improving backend startup interlock
Msg-id: 12759.1033228893@sss.pgh.pa.us
List: pgsql-hackers
I have the beginnings of an idea about improving our interlock logic
for postmaster startup.  The existing method is pretty good, but we
have had multiple reports that it can fail during system boot if the
old postmaster wasn't given a chance to shut down cleanly: there's
a fair-sized chance that the old postmaster PID will have been assigned
to some other process, and that fools the interlock check.

I think we can improve matters by combining the existing checks for
old-postmaster-PID and old-shared-memory-segment into one cohesive
entity.  To do this, we must abandon the existing special case for
"private memory" when running a bootstrap or standalone backend.
Even a standalone backend will be required to get a shmem segment
just like a postmaster would.  This ensures that we can use both
parts of the safety check, even when the old holder of the data
directory interlock was a standalone backend.

Here's a sketch of the improved startup procedure (a rough C
transcription of the whole sequence follows the list):

1. Try to open and read the $PGDATA/postmaster.pid file.  If we fail
because it's not there, okay to continue, because old postmaster must
have shut down cleanly; skip to step 8.  If we fail for any other reason
(eg, permissions failure), complain and abort startup.  (Because we
write the postmaster.pid file mode 600, getting past this step
guarantees we are either the same UID as the old postmaster or root;
else we'd have failed to read the old file.  This fact justifies some
assumptions below.)

2. Extract old postmaster PID and old shared memory key from file.
(Both will now always be there, per above; abort if file contents are
not as expected.)  We do not bother with trying kill(PID, 0) anymore,
because it doesn't prove anything.

3. Try to attach to the old shared memory segment using the old key.
There are three possible outcomes:
   A: fail because it's not there.  Then we know the old postmaster
   (or standalone backend) is gone, and so are all its children.
   Okay to skip to step 7.

   B: fail for some other reason, eg permissions violation.  Because
   we know we are the same UID (or root) as before, this must
   indicate that the "old" shmem segment actually belongs to someone
   else; so we have a chance collision with someone else's shmem key.
   Ignore the shmem segment, skip to step 7.  (In short, we can treat
   all failures alike, which is a Good Thing.)

   C: attach succeeds.  Continue to step 4.

4. Examine header of old shmem segment to see if it contains the
right magic number *and* old postmaster PID.  If not, it isn't really
a Postgres shmem segment, so ignore it; detach and skip to step 7.

5. If old shmem segment still has other processes attached to it,
abort: these must be an old postmaster and/or old backends still
alive.  (We can check nattach > 1 in the SysV case, or just assume
they are there in the hugepages-segment case that Neil wants to add.)

6. Detach from and delete the old shmem segment.  (Deletion isn't
strictly necessary, but we should do it to avoid sucking resources.)

7. Delete the old postmaster.pid file.  If this fails for any reason,
abort.  (Either we've got permissions problems or a race condition
with someone else trying to start up.)

8. Create a shared memory segment.

9. Create a new postmaster.pid file and record my PID and segment
key.  If we fail to do this (with O_EXCL create), abort; someone else
must be trying to start up at the same time.  Be careful to create
the lockfile mode 600, per notes above.
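
To make the control flow concrete, here's a rough C transcription of
the whole sequence against the SysV shm API.  Take it as a sketch
under stated assumptions: PGShmemHeader, PGSHMEM_MAGIC, the
acquire_data_dir_lock() name, and the two-number lockfile format are
all illustrative, not the real postmaster structures, and error
handling is collapsed to exit(1).

/* Rough transcription of steps 1-9; assumptions noted above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define PGSHMEM_MAGIC 0x50474d53    /* assumed magic value */

typedef struct PGShmemHeader
{
    unsigned long magic;        /* marks the segment as ours */
    pid_t creatorPID;           /* PID of the creating postmaster */
} PGShmemHeader;

void
acquire_data_dir_lock(const char *pidpath, key_t newkey, size_t size)
{
    long oldpid, oldkey;
    FILE *fp = fopen(pidpath, "r");     /* step 1 */

    if (fp == NULL && errno != ENOENT)
    {
        perror(pidpath);                /* permissions etc: abort */
        exit(1);
    }
    if (fp != NULL)
    {
        /* Step 2: both numbers must now always be present. */
        if (fscanf(fp, "%ld %ld", &oldpid, &oldkey) != 2)
            exit(1);                    /* unexpected contents: abort */
        fclose(fp);

        /* Step 3: look up and attach the old segment by its key.
         * Any failure, whatever the errno, means "skip to step 7". */
        int shmid = shmget((key_t) oldkey, 0, 0);
        if (shmid >= 0)
        {
            PGShmemHeader *hdr = shmat(shmid, NULL, 0);

            if (hdr != (void *) -1)
            {
                /* Step 4: right magic *and* right creator PID? */
                if (hdr->magic == PGSHMEM_MAGIC &&
                    hdr->creatorPID == (pid_t) oldpid)
                {
                    /* Step 5: nattch > 1 means, besides ourselves,
                     * old processes are still attached. */
                    struct shmid_ds buf;

                    if (shmctl(shmid, IPC_STAT, &buf) == 0 &&
                        buf.shm_nattch > 1)
                    {
                        fprintf(stderr, "old processes still attached\n");
                        exit(1);
                    }
                    /* Step 6: detach and reclaim the dead segment. */
                    shmdt(hdr);
                    shmctl(shmid, IPC_RMID, NULL);
                }
                else
                    shmdt(hdr);         /* not a Postgres segment */
            }
        }

        /* Step 7: remove the stale lockfile. */
        if (unlink(pidpath) != 0)
            exit(1);                    /* permissions or a race */
    }

    /* Step 8: create our own segment. */
    if (shmget(newkey, size, IPC_CREAT | IPC_EXCL | 0600) < 0)
        exit(1);

    /* Step 9: record PID and key; O_EXCL and mode 600 per the notes. */
    int fd = open(pidpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        exit(1);                        /* concurrent starter beat us */
    char line[64];
    snprintf(line, sizeof(line), "%d %ld\n", (int) getpid(), (long) newkey);
    write(fd, line, strlen(line));
    close(fd);
}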
 


This is not quite ready for prime time yet, because it's not very
bulletproof against the scenario where two would-be postmasters are
starting concurrently.  The first one might get all the way through the
sequence before the second one arrives at step 7 --- in which case the
second one will be deleting the first one's lockfile.  Oops.  A possible
answer is to create a second lockfile that only exists for the duration
of the startup sequence, and use that to ensure that only one process is
trying this sequence at a time.  This reintroduces the same problem
we're trying to get away from (must rely on kill(PID, 0) to determine
validity of the lock file), but at least the window of vulnerability is
much smaller than before.  Does anyone see a better way?
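
For concreteness, here's one hypothetical shape for that second
lockfile; the grab_startup_lock() name and retry policy are mine, not
a settled design.  The idea is to take the file with O_EXCL for just
the duration of the startup sequence, and fall back on kill(PID, 0)
only when the file already exists:

/* Hypothetical transient startup lock: exists only while one
 * would-be postmaster runs the interlock sequence above, so the
 * kill(PID, 0) fallback has a far smaller window to be fooled. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <errno.h>

int
grab_startup_lock(const char *path)
{
    for (;;)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

        if (fd >= 0)
        {
            char buf[32];
            snprintf(buf, sizeof(buf), "%d\n", (int) getpid());
            write(fd, buf, strlen(buf));
            close(fd);
            return 0;           /* we hold the startup lock */
        }
        if (errno != EEXIST)
            return -1;          /* unexpected failure: abort startup */

        /* File exists: probe its owner.  This is the kill(PID, 0)
         * test we distrust, but here the lockfile normally lives
         * only for the few moments of someone's startup sequence. */
        long pid = 0;
        FILE *fp = fopen(path, "r");
        if (fp != NULL)
        {
            if (fscanf(fp, "%ld", &pid) != 1)
                pid = 0;
            fclose(fp);
        }
        if (pid > 0 && kill((pid_t) pid, 0) == 0)
            return -1;          /* someone else really is starting up */

        unlink(path);           /* stale leftover: remove and retry */
    }
}

The holder would unlink the file as soon as step 9 completes, so a
crash during those few moments is about the only way to leave a stale
startup lock behind.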

A more general objection is that this approach will hardwire, even more
solidly than before, the assumption that we are using a shared-memory
API that provides identifiable shmem segments (ie, something we can
record a key for and later try to attach to).  I think some people
wanted to get away from that.  But so far I've not seen any proposal
for an alternative startup interlock that doesn't require attachable
shared memory.
        regards, tom lane

