Thread: On file locking
I've been looking at the PID file creation mechanism we currently use. It goes through a loop in an attempt to create the PID file, and if one is there it attempts to remove it if the PID it contains no longer exists (there are checks for shared memory usage as well).

This could be cleaned up rather dramatically if we were to use one of the file locking primitives supplied by the OS to grab an exclusive lock on the file. The upside is that, when the locking code is used, the postmaster would *know* whether or not there's another postmaster running. The price for that is that we'd have to eat a file descriptor (closing the file means losing the lock), and we'd still have to retain the old code anyway in the event that there is no suitable file locking mechanism to use on the platform in question.

The first question for the group is: is it worth doing that?

The second question for the group is: if we do indeed decide to do file locking in that manner, what *other* applications of the OS-level file locking mechanism will we have? Some of them allow you to lock sections of a file, for instance, while others apply a lock on the entire file. It's not clear to me that the former will be available on all the platforms we're interested in, so locking the entire file is probably the only thing we can really count on (and keep in mind that even if an API to lock sections of a file is available, it may well be that it's implemented by locking the entire file anyway).

What I had in mind was implementation of a file locking function that would take a file descriptor and a file range. If the underlying OS mechanism supported it, it would lock that range. The interesting case is when the underlying OS mechanism did *not* support it. Would it be more useful in that case to return an error indication? Would it be more useful to simply lock the entire file?
If no underlying file locking mechanism is available, it seems obvious to me that the function would have to always return an error.

Thoughts?

-- 
Kevin Brown kevin@sysexperts.com
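For concreteness, the proposed wrapper might look something like this. This is only a sketch of the idea in the message above, not an agreed design: the name `pg_lock_range` is made up, and it takes the "return an error when no mechanism exists" answer to Kevin's question.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to take an exclusive lock on the byte range [start, start+len)
 * of fd.  len == 0 means "from start to end of file", following the
 * fcntl() convention.  Returns 0 on success, -1 on failure (lock held
 * by someone else, or no usable locking primitive on this platform).
 */
int
pg_lock_range(int fd, off_t start, off_t len)
{
#ifdef F_SETLK
    struct flock fl;

    fl.l_type = F_WRLCK;        /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl) == 0 ? 0 : -1;
#else
    /* no suitable OS mechanism: always report failure, as proposed */
    return -1;
#endif
}
```

Note that fcntl() record locking always supports ranges where it exists at all, so the whole-file-only fallback case would matter mainly for platforms where only something like flock() is available.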
My problem is FreeBSD getting totally loaded, at which point it sends kills to various processes. This sometimes seems to end up with several actual postmasters running, and none of them working. Better existing-process detection would help that greatly, I'm sure.

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Kevin Brown
> Sent: Friday, 31 January 2003 11:24 AM
> To: PostgreSQL Development
> Subject: [HACKERS] On file locking
> [...]
> file descriptor (closing the file means losing the lock), and we'd
> still have to retain the old code anyway in the event that there is no
> suitable file locking mechanism to use on the platform in question.

What is the gain given the above statement? If what we currently do can cause issues (fail), then beefing it up where available may be useful -- but otherwise it's just additional code.

-- 
Rod Taylor <rbt@rbt.ca>
PGP Key: http://www.rbt.ca/rbtpub.asc
Kevin Brown <kevin@sysexperts.com> writes:
> This could be cleaned up rather dramatically if we were to use one of
> the file locking primitives supplied by the OS to grab an exclusive
> lock on the file, and the upside is that, when the locking code is
> used, the postmaster would *know* whether or not there's another
> postmaster running, but the price for that is that we'd have to eat a
> file descriptor (closing the file means losing the lock),

Yeah, I was just thinking about that this morning. Eating one file descriptor in the postmaster is absolutely no problem --- the postmaster doesn't have all that many files open anyhow. What I was wondering was whether it was worth eating an FD for every backend process, by holding open the file inherited from the postmaster. If we did that, we would have a reliable way of detecting that the old postmaster died but left surviving child backends. (As I mentioned in a nearby flamefest, the existing interlock for this situation strikes me as mighty fragile.)

But this only wins if a child process inheriting an open file also inherits copies of any locks held by the parent. If not, then the issue is moot. Anybody have any idea if file locks work that way? Is it portable??

> The second question for the group is: if we do indeed decide to do
> file locking in that manner, what *other* applications of the OS-level
> file locking mechanism will we have?

I can't see any use in partial-file locks for us, and would not want to design an internal API that expects them to work.

regards, tom lane
> This could be cleaned up rather dramatically if we were to use one of
> the file locking primitives supplied by the OS to grab an exclusive
> lock on the file, ...
> ...
> The first question for the group is: is it worth doing that?

In the past it has been proposed and declined -- there is some stuff in the archives. While it would be beneficial to installations using local data it would introduce new failure modes for installations using NFS.

Regards, Giles
On Friday 31 Jan 2003 9:56 am, you wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> But this only wins if a child process inheriting an open file also
> inherits copies of any locks held by the parent. If not, then the
> issue is moot. Anybody have any idea if file locks work that way?
> Is it portable??

In my experience of HP-UX and Linux, they do differ. How much, I don't remember.

I have a stupid proposal. Keep file locking aside. I think shared memory can be kept alive even after a process dies. Why not write a shared memory segment id to a file and let the postmaster check that segment? That would be much easier.

Besides, file locking is implemented using the setgid bit on most unices, and everybody is free to do what he/she thinks right with it.

May be stupid, but just a thought..

Shridhar
Tom Lane wrote:
> But this only wins if a child process inheriting an open file also
> inherits copies of any locks held by the parent. If not, then the
> issue is moot. Anybody have any idea if file locks work that way?
> Is it portable??

An alternate way might be to use semaphores, but I can't see how to do that using the standard PGSemaphores implementation: it appears to depend on cooperating processes inheriting a copy of the postmaster's heap. And since the POSIX semaphores default to unnamed ones, it appears this idea is also a dead end unless my impressions are dead wrong...

-- 
Kevin Brown kevin@sysexperts.com
> But this only wins if a child process inheriting an open file also
> inherits copies of any locks held by the parent. If not, then the
> issue is moot. Anybody have any idea if file locks work that way?
> Is it portable??

From the Red Hat 8.0 man page for fork(2):

    SYNOPSIS
        #include <sys/types.h>
        #include <unistd.h>

        pid_t fork(void);

    DESCRIPTION
        fork creates a child process that differs from the parent
        process only in its PID and PPID, and in the fact that resource
        utilizations are set to 0. File locks and pending signals are
                                   ^^^^^^^^^^
        not inherited.
        ^^^^^^^^^^^^^^

And from SunOS 5.8 flock:

    Locks are on files, not file descriptors. That is, file descriptors
    duplicated through dup(2) or fork(2) do not result in multiple
    instances of a lock, but rather multiple references to a single
    lock. If a process holding a lock on a file forks and the child
    explicitly unlocks the file, the parent will lose its lock. Locks
    are not inherited by a child process.

If I understand correctly it says that if the parent dies, the file is unlocked no matter if there are children still running?

-- 
Antti Haapala
Antti Haapala <antti.haapala@iki.fi> writes:
> And from SunOS 5.8 flock
>     Locks are on files, not file descriptors. That is, file
>     descriptors duplicated through dup(2) or fork(2) do not
>     result in multiple instances of a lock, but rather multiple
>     references to a single lock. If a process holding a lock on
>     a file forks and the child explicitly unlocks the file, the
>     parent will lose its lock. Locks are not inherited by a
>     child process.

That seems self-contradictory. If the fork results in multiple references to the open file, then I should think that if the parent dies but the child still holds the file open, then the lock still exists.

Seems that some experimentation is called for ...

regards, tom lane
On Fri, 31 Jan 2003, Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote:
> Besides file locking is implemented using setgid bit on most unices. And
> everybody is free to do what he/she thinks right with it.

I don't believe it's implemented with the setgid bit on most Unices. As I recall, it's certainly not on Xenix, SCO Unix, any of the BSDs, Linux, SunOS, Solaris, and Tru64 Unix.

(I'm talking about the flock system call, here.)

cjs
-- 
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson wrote:
> On Fri, 31 Jan 2003, Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote:
> > Besides file locking is implemented using setgid bit on most unices. And
> > everybody is free to do what he/she thinks right with it.
>
> I don't believe it's implemented with the setgid bit on most Unices. As
> I recall, it's certainly not on Xenix, SCO Unix, any of the BSDs, Linux,
> SunOS, Solaris, and Tru64 Unix.
>
> (I'm talking about the flock system call, here.)

Linux, at least, supports mandatory file locks. The Linux kernel documentation mentions that you're supposed to use fcntl() or lockf() (the latter being a library wrapper around the former) to actually lock the file but, when those operations are applied to a file that has the setgid bit set but without the group execute bit set, the kernel enforces it as a mandatory lock. That means that operations like open(), read(), and write() initiated by other processes on the same file will block (or return EAGAIN, if O_NONBLOCK was used to open it) if that's what the lock on the file calls for.

That same documentation mentions that locks acquired using flock() will *not* invoke the mandatory lock semantics even if on a file marked for it, so I guess flock() isn't implemented on top of fcntl() in Linux.

So if we wanted to make use of mandatory locks, we'd have to refrain from using flock().

-- 
Kevin Brown kevin@sysexperts.com
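The mode-bit dance described above can be sketched as a small helper. This is illustrative only (the function name is made up), and note that on Linux, actual enforcement also requires the filesystem to be mounted with the `mand` option; without it the bits are set but locks remain advisory.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Mark fd's file for Linux mandatory locking: set-group-ID bit on,
 * group-execute bit off.  fcntl()/lockf() locks on the file then
 * become mandatory, if the filesystem is mounted with "-o mand".
 * Returns 0 on success, -1 on error.
 */
int
mark_mandatory(int fd)
{
    struct stat st;
    mode_t mode;

    if (fstat(fd, &st) != 0)
        return -1;
    mode = st.st_mode & 07777;    /* permission bits only */
    mode |= S_ISGID;              /* setgid bit on ... */
    mode &= ~(mode_t) S_IXGRP;    /* ... group execute off */
    return fchmod(fd, mode);
}
```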
Kevin Brown <kevin@sysexperts.com> writes:
> So if we wanted to make use of mandatory locks, we'd have to refrain
> from using flock().

We have no need for mandatory locks; the advisory style will do fine. This is true because we have no desire to interoperate with any non-Postgres code ... everyone else is supposed to stay the heck out of $PGDATA.

regards, tom lane
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > So if we wanted to make use of mandatory locks, we'd have to refrain
> > from using flock().
>
> We have no need for mandatory locks; the advisory style will do fine.
> This is true because we have no desire to interoperate with any
> non-Postgres code ... everyone else is supposed to stay the heck out of
> $PGDATA.

True. But, of course, mandatory locks could be used to *make* everyone else stay out of $PGDATA. :-)

-- 
Kevin Brown kevin@sysexperts.com
On Fri, 31 Jan 2003, Tom Lane wrote:
> Antti Haapala <antti.haapala@iki.fi> writes:
> > And from SunOS 5.8 flock
> >     Locks are on files, not file descriptors. That is, file
> >     descriptors duplicated through dup(2) or fork(2) do not
> >     result in multiple instances of a lock, but rather multiple
> >     references to a single lock. If a process holding a lock on
> >     a file forks and the child explicitly unlocks the file, the
> >     parent will lose its lock. Locks are not inherited by a
> >     child process.
>
> That seems self-contradictory.

Yes. I note that in NetBSD, that paragraph of the manual page is identical except that the last sentence has been removed.

At any rate, it seems to me highly unlikely that, since the child has the *same* descriptor as the parent had, that the lock would disappear. The other option would be that the lock belongs to the process, in which case one would think that a child doing an unlock should not affect the parent, because it's a different process....

cjs
-- 
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> wrote:
> At any rate, it seems to me highly unlikely that, since the child has
> the *same* descriptor as the parent had, that the lock would
> disappear.

It depends on the lock function. After fork():

o with flock() the lock continues to be held, but will be unlocked if
  any child process explicitly unlocks it

o with fcntl() the lock is not inherited in the child

o with lockf() the standards and manual pages don't say

Boring reference material follows.

flock
=====

From the NetBSD manual page:

    NOTES
        Locks are on files, not file descriptors. That is, file
        descriptors duplicated through dup(2) or fork(2) do not result
        in multiple instances of a lock, but rather multiple references
        to a single lock. If a process holding a lock on a file forks
        and the child explicitly unlocks the file, the parent will lose
        its lock.

The Red Hat Linux 8.0 manual page has similar wording. (No standards to check here -- flock() is not standardised in POSIX, X/Open, Single Unix Standard, ...)

fcntl
=====

The NetBSD manual page notes that these locks are not inherited by child processes:

    Another minor semantic problem with this interface is that locks
    are not inherited by a child process created using the fork(2)
    function.

Ditto the Single Unix Standard versions 2 and 3.

lockf
=====

The standards and manual pages that I've checked don't discuss fork() in relation to lockf(), which seems a peculiar omission and makes me suspect that behaviour has varied historically. In practice I would expect lockf() semantics to be the same as fcntl().

Regards, Giles
Giles Lean <giles@nemeton.com.au> writes:
> Boring reference material follows.

Couldn't help noticing that you omitted HPUX ;-)

On HPUX 10.20, flock doesn't seem to exist (hasn't got a man page nor any mention in /usr/include). lockf says

    All locks for a process are released upon the first close of the
    file, even if the process still has the file opened, and all locks
    held by a process are released when the process terminates.

and

    When a file descriptor is closed, all locks on the file from the
    calling process are deleted, even if other file descriptors for
    that file (obtained through dup() or open(), for example) still
    exist.

which seems to imply (but doesn't actually say) that HPUX keeps track of exactly which process took out the lock, even if the file is held open by multiple processes.

This all doesn't look good for using file locks in the way I had in mind :-( ... but considering that all these man pages seem pretty vague, maybe some direct experimentation is called for.

regards, tom lane
On Sun, 2 Feb 2003, Tom Lane wrote:
> This all doesn't look good for using file locks in the way I had in
> mind :-( ... but considering that all these man pages seem pretty vague,
> maybe some direct experimentation is called for.

Definitely. I wonder about the NetBSD manpage quotes in the post you followed up to, given that last time I checked flock() was implemented, in the kernel, using fcntl(). Either that's changed, or the manpages are unclear or lying.

This has been my experience in the past; locking semantics are subtle and unclear enough that you really need to test for exactly what you want at build time on every system, and you've got to do this testing on the filesystem you intend to put the locks on. (So you don't, e.g., test a local filesystem but end up with data on an NFS filesystem with different locking semantics.) That's what procmail does. Given this, I'm not even sure the whole idea is worth pursuing.

(Though I guess I should find out what NetBSD is really doing, and fix the manual pages to correspond to reality.)

cjs
-- 
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson <cjs@cynic.net> wrote:
> On Sun, 2 Feb 2003, Tom Lane wrote:
> > This all doesn't look good for using file locks in the way I had in
> > mind :-( ... but considering that all these man pages seem pretty vague,
> > maybe some direct experimentation is called for.
>
> Definitely. I wonder about the NetBSD manpage quotes in the post you
> followed up to, given that last time I checked flock() was implemented,
> in the kernel, using fcntl(). Either that's changed, or the manpages
> are unclear or lying.

Using the same kernel code != same semantics. I think the NetBSD manual pages are trying to say that it's "safe" to have lockf(), fcntl(), and flock() locking playing together. That needn't be the case on all operating systems and the standards don't require it.

> This has been my experience in the past; locking semantics are subtle
> and unclear enough that you really need to test for exactly what you
> want at build time on every system, and you've got to do this testing
> on the filesystem you intend to put the locks on.

What he said ...

Giles
Tom Lane wrote:
> On HPUX 10.20, flock doesn't seem to exist (hasn't got a man page nor
> any mention in /usr/include).

Correct. Still isn't there in later releases.

> lockf says
>
>     All locks for a process are released upon the first close of the
>     file, even if the process still has the file opened, and all locks
>     held by a process are released when the process terminates.
>
> and
>
>     When a file descriptor is closed, all locks on the file from the
>     calling process are deleted, even if other file descriptors for
>     that file (obtained through dup() or open(), for example) still
>     exist.
>
> which seems to imply (but doesn't actually say) that HPUX keeps track of
> exactly which process took out the lock, even if the file is held open
> by multiple processes.

Having done some testing today, I now understand what the standards are trying to say when they talk about locks being "inherited". Or at least I think I understand: standards are tricky, locking is subtle, and I'm prepared to be corrected if I'm wrong!

All of these lock functions succeed when the same process asks for a lock that it already has. That is:

    fcntl(fd, ...);
    fcntl(fd, ...);    /* success -- no error returned */

For flock() only, the lock is inherited by a child process along with the file descriptor so the child can re-issue the flock() call and that will pass, too:

    flock(fd, ...);
    pid = fork();
    if (pid == 0)
        flock(fd, ...);    /* success -- no error returned */

For fcntl() and lockf() the locks are not inherited, and the call in a child fails:

    fcntl(fd, ...);
    pid = fork();
    if (pid == 0)
        fcntl(fd, ...);    /* will fail and return -1 */

In no case does just closing the file descriptor in the child lose the parent's lock. I rationalise this as follows:

1. flock() is using a "last close" semantic, so closing the file
   descriptor is documented not to lose the lock

2. lockf() and fcntl() use a "first close", but because the locks are
   not inherited by the child process the child can't unlock them

> This all doesn't look good for using file locks in the way I had in
> mind :-( ... but considering that all these man pages seem pretty vague,
> maybe some direct experimentation is called for.

I conjecture that Tom was looking for a facility to lock a file and have it stay locked if the postmaster or any child process was still running. flock() fits the bill, but it's not portable everywhere.

One additional warning: this stuff *is* potentially filesystem dependent, per the source code I looked at, which would call filesystem specific routines. I tested with HP-UX 11.00 (VxFS), NetBSD (FFS) and Linux (ext3).

I've put the rough and ready test code up for FTP, if anyone wants to check my working:

    ftp://ftp.nemeton.com.au/pub/pgsql/

Limitations in the testing: I only used whole file locking (no byte ranges) and didn't prove that a lock taken by flock() is still held after a child calls close() as it is documented to be.

Regards, Giles
> That same documentation mentions that locks acquired using flock()
> will *not* invoke the mandatory lock semantics even if on a file
> marked for it, so I guess flock() isn't implemented on top of fcntl()
> in Linux.

They're not. And there's another difference between fcntl and flock in Linux: although fork(2) states that file locks are not inherited, locks made by flock are inherited by children, and they keep the lock even when the parent process is killed with SIGKILL. Tested this.

Just see the man pages: there exist both flock(2) and fcntl(2).

-- 
Antti Haapala +358 50 369 3535 ICQ: #177673735
> All of these lock functions succeed when the same process asks for a
> lock that it already has. That is:
>
>     fcntl(fd, ...);
>     fcntl(fd, ...);    /* success -- no error returned */
>
> For flock() only, the lock is inherited by a child process along
> with the file descriptor so the child can re-issue the flock()
> call and that will pass, too:
>
>     flock(fd, ...);
>     pid = fork();
>     if (pid == 0)
>         flock(fd, ...);    /* success -- no error returned */

True...

> For fcntl() and lockf() the locks are not inherited, and the
> call in a child fails:
>
>     fcntl(fd, ...);
>     pid = fork();
>     if (pid == 0)
>         fcntl(fd, ...);    /* will fail and return -1 */
>
> In no case does just closing the file descriptor in the child lose
> the parent's lock. I rationalise this as follows:
>
> 1. flock() is using a "last close" semantic, so closing the file
>    descriptor is documented not to lose the lock

Yep.

> 2. lockf() and fcntl() use a "first close", but because the locks
>    are not inherited by the child process the child can't unlock
>    them

And at least old Linux system call manuals seem to reflect this (incorrectly) when they state that file locks are not inherited (should be "record locks obtained by fcntl").

> One additional warning: this stuff *is* potentially filesystem
> dependent, per the source code I looked at, which would call
> filesystem specific routines.
>
> I only used whole file locking (no byte ranges) and didn't prove that
> a lock taken by flock() is still held after a child calls close() as
> it is documented to be.

I tested this on Linux 2.4.x ext2 fs and it seems to follow the spec exactly. If a child is forked and it closes the file, the parent still has the lock until it's killed or it has also closed the file.

What about having two different lock files: one that would indicate that there are some child processes still running and another which would indicate that the postmaster process itself is running?
Using flock and fcntl semantics respectively (or flock semantics with children immediately closing their fds).

And of course locking is file system dependent -- just think of NFS on Linux, where virtually no locking semantics actually work :)

-- 
Antti Haapala
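The two-lock-file idea might be sketched like this. Everything here is illustrative (the file names and the function are not actual postmaster code), and it assumes the flock()/fcntl() semantics reported earlier in the thread: the flock() lock survives as long as any process keeps the inherited fd open, while the fcntl() lock vanishes the moment the locking process exits.

```c
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if both locks were acquired, -1 if another instance
 * (postmaster or leftover backend) appears to be alive.
 */
int
acquire_cluster_locks(const char *children_path, const char *pm_path)
{
    struct flock fl;
    int children_fd = open(children_path, O_RDWR | O_CREAT, 0600);
    int pm_fd = open(pm_path, O_RDWR | O_CREAT, 0600);

    if (children_fd < 0 || pm_fd < 0)
        return -1;

    /* flock() lock: shared with every backend through the inherited
       fd, so it persists while the postmaster *or any child* lives */
    if (flock(children_fd, LOCK_EX | LOCK_NB) != 0)
        return -1;

    /* fcntl() lock: per-process, released automatically when the
       postmaster itself exits */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    if (fcntl(pm_fd, F_SETLK, &fl) != 0)
        return -1;

    return 0;
}
```

A new postmaster finding the first lock held but the second free would then know the old postmaster died leaving backends behind, which is exactly the case Tom wanted to detect.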