Re: Posix Shared Mem patch - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Posix Shared Mem patch |
Date | |
Msg-id | 18162.1340761845@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Posix Shared Mem patch ("A.M." <agentm@themactionfaction.com>) |
Responses |
Re: Posix Shared Mem patch
Re: Posix Shared Mem patch |
List | pgsql-hackers |
"A.M." <agentm@themactionfaction.com> writes: > On 06/26/2012 07:30 PM, Tom Lane wrote: >>> I solved this via fcntl locking. >> No, you didn't, because fcntl locks aren't inherited by child processes. >> Too bad, because they'd be a great solution otherwise. > You claimed this last time and I replied: > http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php > "I address this race condition by ensuring that a lock-holding violator > is the postmaster or a postmaster child. If such as condition is > detected, the child exits immediately without touching the shared > memory. POSIX shmem is inherited via file descriptors." > This is possible because the locking API allows one to request which PID > violates the lock. The child expects the lock to be held and checks that > the PID is the parent. If the lock is not held, that means that the > postmaster is dead, so the child exits immediately. OK, I went back and re-read the original patch, and I now agree that something like this is possible --- but I don't like the way you did it. The dependence on particular PIDs seems both unnecessary and risky. The key concept here seems to be that the postmaster first stakes a claim on the data directory by exclusive-locking a lock file. If successful, it reduces that lock to shared mode (which can be done atomically, according to the SUS fcntl specification), and then holds the shared lock until it exits. Spawned children will not initially have a lock, but what they can do is attempt to acquire shared lock on the lock file. If fail, exit. If successful, *check to see that the parent postmaster is still alive* (ie, getppid() != 1). If so, the parent must have been continuously holding the lock, and the child has successfully joined the pool of shared lock holders. Otherwise bail out without having changed anything. It is the "parent is still alive" check, not any test on individual PIDs, that makes this work. There are two concrete reasons why I don't care for the GetPIDHoldingLock() way. Firstly, the fact that you can get a blocking PID from F_GETLK isn't an essential part of the concept of file locking IMO --- it's just an incidental part of this particular API. May I remind you that the reason we're stuck on SysV shmem in the first place is that we decided to depend on an incidental part of that API, namely nattch? I would like to not require file locking to have any semantics more specific than "a process can hold an exclusive or a shared lock on a file, which is auto-released at process exit". Secondly, in an NFS world I don't believe that the returned l_pid value can be trusted for anything. If it's a PID from a different machine then it might accidentally conflict with one on our machine, or not. Reflecting on this further, it seems to me that the main remaining failure modes are (1) file locking doesn't work, or (2) idiot DBA manually removes the lock file. Both of these could be ameliorated with some refinements to the basic idea. For (1), I suggest that we tweak the startup process (only) to attempt to acquire exclusive lock on the lock file. If it succeeds, we know that file locking is broken, and we can complain. (This wouldn't help for cases where cross-machine locking is broken, but I see no practical way to detect that.) For (2), the problem really is that the proposed patch conflates the PID file with the lock file, but people are conditioned to think that PID files are removable. I suggest that we create a separate, permanently present file that serves only as the lock file and doesn't ever get modified (it need have no content other than the string "Don't remove this!"). It'd be created by initdb, not by individual postmaster runs; indeed the postmaster should fail if it doesn't find the lock file already present. The postmaster PID file should still exist with its current contents, but it would serve mostly as documentation and as server-contact information for pg_ctl; it would not be part of the data directory locking mechanism. I wonder whether this design can be adapted to Windows? IIRC we do not have a bulletproof data directory lock scheme for Windows. It seems like this makes few enough demands on the lock mechanism that there ought to be suitable primitives available there too. regards, tom lane
pgsql-hackers by date: