RE: How to shoot yourself in the foot: kill -9 postmaster - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject RE: How to shoot yourself in the foot: kill -9 postmaster
Msg-id EKEJJICOHDIEMGPNIFIJIEMBDMAA.Inoue@tpf.co.jp
In response to Re: How to shoot yourself in the foot: kill -9 postmaster  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> The interlock has to be tightly tied to the PGDATA directory, because
> what we're trying to protect is the files in and under that directory.
> It seems that something based on file(s) in that directory is the way
> to go.
> 
> The best idea I've seen so far is Hiroshi's idea of having all the
> backends hold fcntl locks on the same file (probably postmaster.pid
> would do fine).  Then the new postmaster can test whether any backends
> are still alive by trying to lock the old postmaster.pid file.
> Unfortunately, I read in the fcntl man page:
> 
>     Locks are not inherited by a child process in a fork(2) system call.
> 

Yes. flock() locks are shared with the child across fork(), so flock()
works well here, but fcntl() doesn't.
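
A minimal sketch of that difference, written in Python (whose fcntl module
wraps the same system calls) rather than C. The helper name
child_shares_lock is made up for illustration, and a temporary file stands
in for postmaster.pid. The parent takes an exclusive lock, forks, and the
child tries to take the same lock through the inherited descriptor:

```python
import fcntl
import os
import tempfile


def child_shares_lock(lock_call):
    """Lock a temp file in the parent, fork, and report whether the child
    can take the same exclusive lock through the inherited descriptor."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        lock_call(fd, fcntl.LOCK_EX)              # parent takes the lock
        pid = os.fork()
        if pid == 0:                              # child process
            try:
                lock_call(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                os._exit(0)                       # lock is shared with the parent
            except OSError:
                os._exit(1)                       # lock belongs to the parent only
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) == 0


# fcntl.flock wraps flock(2); fcntl.lockf wraps fcntl(2) record locking.
flock_shared = child_shares_lock(fcntl.flock)
fcntl_shared = child_shares_lock(fcntl.lockf)
```

With flock() the child's attempt succeeds, because the lock belongs to the
open file that parent and child share. With fcntl() the attempt is refused,
because the lock belongs to the parent process alone. This is what one
would expect on a local filesystem; network filesystems may behave
differently.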

> This makes the idea much less attractive than I originally thought:
> a new backend would not automatically inherit a lock on the
> postmaster.pid file from the postmaster, but would have to open/lock it
> for itself.  That means there's a window where the new backend exists
> but would be invisible to a hypothetical new postmaster.
> 
> We could work around this with the following, very ugly protocol:
> 
> 1. Postmaster normally maintains fcntl read lock on its postmaster.pid
> file.  Each spawned backend immediately opens and read-locks
> postmaster.pid, too, and holds that file open until it dies.  (Thus
> wasting a kernel FD per backend, which is one of the less attractive
> things about this.)  If the backend is unable to obtain read lock on
> postmaster.pid, then it complains and dies.  We must use read locks
> here so that all these processes can hold them separately.
> 
> 2. If a newly started postmaster sees a pre-existing postmaster.pid
> file, it tries to obtain a *write* lock on that file.  If it fails,
> conclude that an old postmaster or backend is still alive; complain
> and quit.  If it succeeds, sit for say 1 second before deleting the file
> and creating a new one.  (The delay here is to allow any just-started
> old backends to fail to acquire read lock and quit.  A possible
> objection is that we have no way to guarantee 1 second is enough, though
> it ought to be plenty if the lock acquisition is just after the fork.)
> 

I have another idea. My main point is to not remove the existing
pidfile. For example:

1) A newly started postmaster tries to obtain a write lock on the
   first byte of the pidfile. If it fails, the postmaster quits.
2) The postmaster tries to obtain a write lock on the second byte
   of the pidfile. If it fails, the postmaster quits.
3) The postmaster releases the lock of 2).
4) Each backend obtains a read lock on the second byte of the
   pidfile.
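
The four steps could be sketched roughly as follows. This is only an
illustration in Python, not proposed postmaster code: the names try_lock,
postmaster_start and backend_start are hypothetical, the pidfile lives in
a scratch directory, and the demo models "postmaster killed, backend still
running" simply as a live child process that holds the backend lock.

```python
import fcntl
import os
import signal
import tempfile

FIRST_BYTE, SECOND_BYTE = 0, 1      # byte offsets within the pidfile


def try_lock(fd, kind, offset):
    """Try a non-blocking fcntl lock on one byte; return True on success."""
    try:
        fcntl.lockf(fd, kind | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError:
        return False


def postmaster_start(path):
    """Steps 1-3: return an fd holding the first-byte lock, or None to quit."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    if not try_lock(fd, fcntl.LOCK_EX, FIRST_BYTE):     # 1) a postmaster is alive
        os.close(fd)
        return None
    if not try_lock(fd, fcntl.LOCK_EX, SECOND_BYTE):    # 2) a backend is alive
        os.close(fd)
        return None
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, SECOND_BYTE, os.SEEK_SET)  # 3) release
    return fd


def backend_start(path):
    """Step 4: the backend opens the pidfile itself and read-locks byte two."""
    fd = os.open(path, os.O_RDWR)
    fcntl.lockf(fd, fcntl.LOCK_SH, 1, SECOND_BYTE, os.SEEK_SET)
    return fd


def orphaned_backend_demo():
    """A backend outlives its postmaster; a new postmaster must refuse."""
    path = os.path.join(tempfile.mkdtemp(), "postmaster.pid")
    os.close(os.open(path, os.O_RDWR | os.O_CREAT, 0o600))  # existing pidfile
    r, w = os.pipe()
    backend = os.fork()
    if backend == 0:                     # child: the surviving backend
        try:
            backend_start(path)
            os.write(w, b"1")            # tell the parent the lock is held
            signal.pause()               # stay alive until killed
        finally:
            os._exit(1)
    os.close(w)
    os.read(r, 1)                        # wait until the backend holds its lock
    refused = postmaster_start(path) is None
    os.kill(backend, signal.SIGKILL)     # the backend finally goes away
    os.waitpid(backend, 0)
    started = postmaster_start(path) is not None
    return refused, started


refused, started = orphaned_backend_demo()
```

Since the pidfile is never removed, every process locks the same file, and
the backends' read locks on the second byte do not conflict with one
another. Note that with fcntl locks, closing any descriptor for the file
drops all of that process's locks on it, so a real implementation would
have to keep the descriptor open for the life of the process.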

Regards,
Hiroshi Inoue 

