RE: How to shoot yourself in the foot: kill -9 postmaster - Mailing list pgsql-hackers
From | Hiroshi Inoue |
---|---|
Subject | RE: How to shoot yourself in the foot: kill -9 postmaster |
Date | |
Msg-id | EKEJJICOHDIEMGPNIFIJIEMBDMAA.Inoue@tpf.co.jp |
In response to | Re: How to shoot yourself in the foot: kill -9 postmaster (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> The interlock has to be tightly tied to the PGDATA directory, because
> what we're trying to protect is the files in and under that directory.
> It seems that something based on file(s) in that directory is the way
> to go.
>
> The best idea I've seen so far is Hiroshi's idea of having all the
> backends hold fcntl locks on the same file (probably postmaster.pid
> would do fine). Then the new postmaster can test whether any backends
> are still alive by trying to lock the old postmaster.pid file.
> Unfortunately, I read in the fcntl man page:
>
>     Locks are not inherited by a child process in a fork(2) system call.
>

Yes, flock() works well here but fcntl() doesn't.

> This makes the idea much less attractive than I originally thought:
> a new backend would not automatically inherit a lock on the
> postmaster.pid file from the postmaster, but would have to open/lock it
> for itself. That means there's a window where the new backend exists
> but would be invisible to a hypothetical new postmaster.
>
> We could work around this with the following, very ugly protocol:
>
> 1. Postmaster normally maintains fcntl read lock on its postmaster.pid
> file. Each spawned backend immediately opens and read-locks
> postmaster.pid, too, and holds that file open until it dies. (Thus
> wasting a kernel FD per backend, which is one of the less attractive
> things about this.) If the backend is unable to obtain read lock on
> postmaster.pid, then it complains and dies. We must use read locks
> here so that all these processes can hold them separately.
>
> 2. If a newly started postmaster sees a pre-existing postmaster.pid
> file, it tries to obtain a *write* lock on that file. If it fails,
> conclude that an old postmaster or backend is still alive; complain
> and quit. If it succeeds, sit for say 1 second before deleting the file
> and creating a new one. (The delay here is to allow any just-started
> old backends to fail to acquire read lock and quit. A possible
> objection is that we have no way to guarantee 1 second is enough, though
> it ought to be plenty if the lock acquisition is just after the fork.)
>

I have another idea. My main point is not to remove the existing pidfile.
For example:

1) A newly started postmaster tries to obtain a write lock on the first
   byte of the pidfile. If it fails, the postmaster quits.
2) The postmaster tries to obtain a write lock on the second byte of the
   pidfile. If it fails, the postmaster quits.
3) The postmaster releases the lock taken in 2).
4) Each backend obtains a read lock on the second byte of the pidfile.

Regards,
Hiroshi Inoue
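
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL's C code, and it assumes POSIX fcntl byte-range locks as exposed by fcntl.lockf. The names postmaster_start, backend_start, try_lock, FIRST_BYTE and SECOND_BYTE are hypothetical. Keeping the first-byte lock for the postmaster's whole lifetime is an assumption that the message implies but does not state.

```python
import fcntl
import os

FIRST_BYTE = 0   # assumed: write-locked by the postmaster while it lives
SECOND_BYTE = 1  # read-locked by every backend


def try_lock(fd, kind, offset):
    """Try a non-blocking fcntl lock on a single byte; True on success."""
    try:
        fcntl.lockf(fd, kind | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError:
        return False


def postmaster_start(path):
    """Steps 1-3. Returns an fd holding the first-byte lock, or None."""
    # The existing pidfile is reused, never removed.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # Step 1: fails if another postmaster still holds the first byte.
    if not try_lock(fd, fcntl.LOCK_EX, FIRST_BYTE):
        os.close(fd)
        return None
    # Step 2: fails if any old backend still read-locks the second byte.
    if not try_lock(fd, fcntl.LOCK_EX, SECOND_BYTE):
        os.close(fd)
        return None
    # Step 3: release the second byte so new backends can read-lock it.
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, SECOND_BYTE, os.SEEK_SET)
    return fd


def backend_start(path):
    """Step 4. Returns an fd holding a read lock on the second byte."""
    fd = os.open(path, os.O_RDWR)
    if not try_lock(fd, fcntl.LOCK_SH, SECOND_BYTE):
        os.close(fd)
        return None
    return fd
```

Two properties of fcntl locks matter when trying this out. They are owned per process, so the conflict in step 2 only shows up when the backend runs in a different process from the new postmaster. And a process loses all its locks on a file as soon as it closes any descriptor for that file, so a backend would have to keep its descriptor open until it exits.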