Thread: On file locking

From
Kevin Brown
Date:
I've been looking at the PID file creation mechanism we currently use.
It goes through a loop in an attempt to create the PID file, and if
one is there it attempts to remove it if the PID it contains no longer
exists (there are checks for shared memory usage as well).

This could be cleaned up rather dramatically if we were to use one of
the file locking primitives supplied by the OS to grab an exclusive
lock on the file, and the upside is that, when the locking code is
used, the postmaster would *know* whether or not there's another
postmaster running, but the price for that is that we'd have to eat a
file descriptor (closing the file means losing the lock), and we'd
still have to retain the old code anyway in the event that there is no
suitable file locking mechanism to use on the platform in question.
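
A minimal sketch of the idea above, assuming a POSIX system with fcntl() record locks. The helper name lock_pid_file() and the 0600 file mode are illustrative assumptions, not existing PostgreSQL code.

```c
#define _DEFAULT_SOURCE     /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (creating if needed) the PID file and take an
 * exclusive advisory lock on the whole file.  On success the open descriptor
 * is returned and must be kept open, because closing it drops the lock.
 * Returns -1 if another process holds the lock (errno is EACCES or EAGAIN)
 * or on any other error.
 */
int
lock_pid_file(const char *path)
{
    struct flock fl;
    char        buf[32];
    int         fd, len, save_errno;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;        /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    if (fcntl(fd, F_SETLK, &fl) == 0)
    {
        /* We own the lock, so overwriting stale contents is safe. */
        len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
        if (ftruncate(fd, 0) == 0 && write(fd, buf, len) == len)
            return fd;
    }
    save_errno = errno;
    close(fd);
    errno = save_errno;
    return -1;
}
```

Note that the file is only truncated after the lock is held, so a losing process never clobbers the winner's PID. With fcntl() locks, closing any other descriptor for the same file in the same process also drops the lock, so the file should be opened exactly once.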

The first question for the group is: is it worth doing that?

The second question for the group is: if we do indeed decide to do
file locking in that manner, what *other* applications of the OS-level
file locking mechanism will we have?  Some of them allow you to lock
sections of a file, for instance, while others apply a lock on the
entire file.  It's not clear to me that the former will be available
on all the platforms we're interested in, so locking the entire file
is probably the only thing we can really count on (and keep in mind
that even if an API to lock sections of a file is available, it may
well be that it's implemented by locking the entire file anyway).

What I had in mind was implementation of a file locking function that
would take a file descriptor and a file range.  If the underlying OS
mechanism supported it, it would lock that range.  The interesting
case is when the underlying OS mechanism did *not* support it.  Would
it be more useful in that case to return an error indication?  Would
it be more useful to simply lock the entire file?  If no underlying
file locking mechanism is available, it seems obvious to me that the
function would have to always return an error.
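
One possible shape for such a function, as a sketch only. The LOCK_VIA_* switches, the function name, and the whole_file_ok flag are all hypothetical; the flag simply lets the caller choose between the two behaviours asked about above when ranges are unsupported.

```c
#define _DEFAULT_SOURCE     /* expose POSIX/BSD declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical build-time switches; a configure script would pick one. */
#if !defined(LOCK_VIA_FCNTL) && !defined(LOCK_VIA_FLOCK) && !defined(LOCK_VIA_NONE)
#define LOCK_VIA_FCNTL 1
#endif

/*
 * Try to take an exclusive, non-blocking lock on len bytes starting at
 * start; len == 0 means "through end of file".  If only whole-file locks
 * exist, whole_file_ok decides between widening the lock and failing.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
lock_file_range(int fd, off_t start, off_t len, int whole_file_ok)
{
#if defined(LOCK_VIA_FCNTL)
    struct flock fl;

    assert(fd >= 0);
    (void) whole_file_ok;       /* ranges are supported, no fallback needed */
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
#elif defined(LOCK_VIA_FLOCK)
    assert(fd >= 0);
    if ((start != 0 || len != 0) && !whole_file_ok)
    {
        errno = ENOTSUP;        /* caller asked for a real range lock */
        return -1;
    }
    return flock(fd, LOCK_EX | LOCK_NB);    /* always the whole file */
#else
    (void) fd; (void) start; (void) len; (void) whole_file_ok;
    errno = ENOSYS;             /* no locking mechanism on this platform */
    return -1;
#endif
}
```

A caller that only needs mutual exclusion would pass whole_file_ok = 1; a caller that depends on two ranges being lockable independently would pass 0 and handle the error.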


Thoughts?



-- 
Kevin Brown                          kevin@sysexperts.com


Re: On file locking

From
"Christopher Kings-Lynne"
Date:
My problem is FreeBSD getting totally loaded, at which point it sends kills
to various processes.  This sometimes seems to end up with several actual
postmasters running, and none of them working.

Better existing process detection would help that greatly I'm sure.

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Kevin Brown
> Sent: Friday, 31 January 2003 11:24 AM
> To: PostgreSQL Development
> Subject: [HACKERS] On file locking
>
>
> I've been looking at the PID file creation mechanism we currently use.
> It goes through a loop in an attempt to create the PID file, and if
> one is there it attempts to remove it if the PID it contains no longer
> exists (there are checks for shared memory usage as well).
>
> This could be cleaned up rather dramatically if we were to use one of
> the file locking primitives supplied by the OS to grab an exclusive
> lock on the file, and the upside is that, when the locking code is
> used, the postmaster would *know* whether or not there's another
> postmaster running, but the price for that is that we'd have to eat a
> file descriptor (closing the file means losing the lock), and we'd
> still have to retain the old code anyway in the event that there is no
> suitable file locking mechanism to use on the platform in question.
>
> The first question for the group is: is it worth doing that?
>
> The second question for the group is: if we do indeed decide to do
> file locking in that manner, what *other* applications of the OS-level
> file locking mechanism will we have?  Some of them allow you to lock
> sections of a file, for instance, while others apply a lock on the
> entire file.  It's not clear to me that the former will be available
> on all the platforms we're interested in, so locking the entire file
> is probably the only thing we can really count on (and keep in mind
> that even if an API to lock sections of a file is available, it may
> well be that it's implemented by locking the entire file anyway).
>
> What I had in mind was implementation of a file locking function that
> would take a file descriptor and a file range.  If the underlying OS
> mechanism supported it, it would lock that range.  The interesting
> case is when the underlying OS mechanism did *not* support it.  Would
> it be more useful in that case to return an error indication?  Would
> it be more useful to simply lock the entire file?  If no underlying
> file locking mechanism is available, it seems obvious to me that the
> function would have to always return an error.
>
>
> Thoughts?
>
>
>
> --
> Kevin Brown                          kevin@sysexperts.com
>



Re: On file locking

From
Rod Taylor
Date:
> file descriptor (closing the file means losing the lock), and we'd
> still have to retain the old code anyway in the event that there is no
> suitable file locking mechanism to use on the platform in question.

What is the gain given the above statement?  If what we currently do can
cause issues (fail), then beefing it up where available may be useful --
but otherwise it's just additional code.
--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: On file locking

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> This could be cleaned up rather dramatically if we were to use one of
> the file locking primitives supplied by the OS to grab an exclusive
> lock on the file, and the upside is that, when the locking code is
> used, the postmaster would *know* whether or not there's another
> postmaster running, but the price for that is that we'd have to eat a
> file descriptor (closing the file means losing the lock),

Yeah, I was just thinking about that this morning.  Eating one file
descriptor in the postmaster is absolutely no problem --- the postmaster
doesn't have all that many files open anyhow.  What I was wondering was
whether it was worth eating an FD for every backend process, by holding
open the file inherited from the postmaster.  If we did that, we would
have a reliable way of detecting that the old postmaster died but left
surviving child backends.  (As I mentioned in a nearby flamefest, the
existing interlock for this situation strikes me as mighty fragile.)

But this only wins if a child process inheriting an open file also
inherits copies of any locks held by the parent.  If not, then the
issue is moot.  Anybody have any idea if file locks work that way?
Is it portable??

> The second question for the group is: if we do indeed decide to do
> file locking in that manner, what *other* applications of the OS-level
> file locking mechanism will we have?

I can't see any use in partial-file locks for us, and would not want
to design an internal API that expects them to work.
        regards, tom lane


Re: On file locking

From
Giles Lean
Date:
> This could be cleaned up rather dramatically if we were to use one of
> the file locking primitives supplied by the OS to grab an exclusive
> lock on the file, ...
> ...
> The first question for the group is: is it worth doing that?

In the past it has been proposed and declined -- there is some stuff
in the archives.  While it would be beneficial to installations using
local data it would introduce new failure modes for installations
using NFS.

Regards,

Giles



Re: On file locking

From
"Shridhar Daithankar"
Date:
On Friday 31 Jan 2003 9:56 am, you wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> But this only wins if a child process inheriting an open file also
> inherits copies of any locks held by the parent.  If not, then the
> issue is moot.  Anybody have any idea if file locks work that way?
> Is it portable??

In my experience of HP-UX and linux, they do differ. How much, I don't
remember.

I have a stupid proposal. Keep file lock aside. I think shared memory can be
kept alive even after process dies. Why not write a shared memory segment id
to a file and let postmaster check that segment. That would be much easier.

Besides file locking is implemented using setgid  bit on most unices. And
everybody is free to do what he/she thinks right with it.

May be stupid but just a thought..
Shridhar



Re: On file locking

From
Kevin Brown
Date:
Tom Lane wrote:
> But this only wins if a child process inheriting an open file also
> inherits copies of any locks held by the parent.  If not, then the
> issue is moot.  Anybody have any idea if file locks work that way?
> Is it portable??

An alternate way might be to use semaphores, but I can't see how to do
that using the standard PGSemaphores implementation: it appears to
depend on cooperating processes inheriting a copy of the postmaster's
heap.

And since the POSIX semaphores default to unnamed ones, it appears
this idea is also a dead end unless my impressions are dead wrong...



-- 
Kevin Brown                          kevin@sysexperts.com


Re: On file locking

From
Antti Haapala
Date:
> But this only wins if a child process inheriting an open file also
> inherits copies of any locks held by the parent.  If not, then the
> issue is moot.  Anybody have any idea if file locks work that way?
> Is it portable??

From the Red Hat 8.0 man page for fork(2):

SYNOPSIS
      #include <sys/types.h>
      #include <unistd.h>

      pid_t fork(void);

DESCRIPTION
      fork creates a child process that differs from the parent process only
      in its PID and PPID, and in the fact that resource utilizations are set
      to 0.  File locks and pending signals are not inherited.
             ^^^^^^^^^^                     ^^^^^^^^^^^^^^^^^^
 

And from SunOS 5.8 flock

     Locks are on files, not file  descriptors.   That  is,  file
     descriptors  duplicated  through  dup(2)  or  fork(2) do not
     result in multiple instances of a lock, but rather  multiple
     references to a single lock.  If a process holding a lock on
     a file forks and the child explicitly unlocks the file,  the
     parent  will  lose  its  lock.  Locks are not inherited by a
     child process.

If I understand correctly, it says that if the parent dies, the file is
unlocked no matter whether there are children still running?

-- 
Antti Haapala



Re: On file locking

From
Tom Lane
Date:
Antti Haapala <antti.haapala@iki.fi> writes:
> And from SunOS 5.8 flock
>      Locks are on files, not file  descriptors.   That  is,  file
>      descriptors  duplicated  through  dup(2)  or  fork(2) do not
>      result in multiple instances of a lock, but rather  multiple
>      references to a single lock.  If a process holding a lock on
>      a file forks and the child explicitly unlocks the file,  the
>      parent  will  lose  its  lock.  Locks are not inherited by a
>      child process.

That seems self-contradictory.  If the fork results in multiple
references to the open file, then I should think that if the parent
dies but the child still holds the file open, then the lock still
exists.  Seems that some experimentation is called for ...
        regards, tom lane


Re: On file locking

From
Curt Sampson
Date:
On Fri, 31 Jan 2003, Shridhar Daithankar<shridhar_daithankar@persistent.co.in> wrote:

> Besides file locking is implemented using setgid  bit on most unices. And
> everybody is free to do what he/she thinks right with it.

I don't believe it's implemented with the setgid bit on most Unices. As
I recall, it's certainly not on Xenix, SCO Unix, any of the BSDs, Linux,
SunOS, Solaris, and Tru64 Unix.

(I'm talking about the flock system call, here.)

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC


Re: On file locking

From
Kevin Brown
Date:
Curt Sampson wrote:
> On Fri, 31 Jan 2003, Shridhar Daithankar<shridhar_daithankar@persistent.co.in> wrote:
> 
> > Besides file locking is implemented using setgid  bit on most unices. And
> > everybody is free to do what he/she thinks right with it.
> 
> I don't believe it's implemented with the setgid bit on most Unices. As
> I recall, it's certainly not on Xenix, SCO Unix, any of the BSDs, Linux,
> SunOS, Solaris, and Tru64 Unix.
> 
> (I'm talking about the flock system call, here.)

Linux, at least, supports mandatory file locks.  The Linux kernel
documentation mentions that you're supposed to use fcntl() or lockf()
(the latter being a library wrapper around the former) to actually
lock the file but, when those operations are applied to a file that
has the setgid bit set but without the group execute bit set, the
kernel enforces it as a mandatory lock.  That means that operations
like open(), read(), and write() initiated by other processes on the
same file will block (or return EAGAIN, if O_NONBLOCK was used to open
it) if that's what the lock on the file calls for.

That same documentation mentions that locks acquired using flock()
will *not* invoke the mandatory lock semantics even if on a file
marked for it, so I guess flock() isn't implemented on top of fcntl()
in Linux.

So if we wanted to make use of mandatory locks, we'd have to refrain
from using flock().




-- 
Kevin Brown                          kevin@sysexperts.com


Re: On file locking

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> So if we wanted to make use of mandatory locks, we'd have to refrain
> from using flock().

We have no need for mandatory locks; the advisory style will do fine.
This is true because we have no desire to interoperate with any
non-Postgres code ... everyone else is supposed to stay the heck out of
$PGDATA.
        regards, tom lane


Re: On file locking

From
Kevin Brown
Date:
Tom Lane wrote:
> Kevin Brown <kevin@sysexperts.com> writes:
> > So if we wanted to make use of mandatory locks, we'd have to refrain
> > from using flock().
> 
> We have no need for mandatory locks; the advisory style will do fine.
> This is true because we have no desire to interoperate with any
> non-Postgres code ... everyone else is supposed to stay the heck out of
> $PGDATA.

True.  But, of course, mandatory locks could be used to *make*
everyone else stay out of $PGDATA.  :-)


-- 
Kevin Brown                          kevin@sysexperts.com


Re: On file locking

From
Curt Sampson
Date:
On Fri, 31 Jan 2003, Tom Lane wrote:

> Antti Haapala <antti.haapala@iki.fi> writes:
> > And from SunOS 5.8 flock
> >      Locks are on files, not file  descriptors.   That  is,  file
> >      descriptors  duplicated  through  dup(2)  or  fork(2) do not
> >      result in multiple instances of a lock, but rather  multiple
> >      references to a single lock.  If a process holding a lock on
> >      a file forks and the child explicitly unlocks the file,  the
> >      parent  will  lose  its  lock.  Locks are not inherited by a
> >      child process.
>
> That seems self-contradictory.

Yes. I note that in NetBSD, that paragraph of the manual page is
identical except that the last sentence has been removed.

At any rate, it seems to me highly unlikely that, since the child has
the *same* descriptor as the parent had, that the lock would disappear.

The other option would be that the lock belongs to the process, in which
case one would think that a child doing an unlock should not affect the
parent, because it's a different process....

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC


Re: On file locking

From
Giles Lean
Date:
Curt Sampson <cjs@cynic.net> wrote:

> At any rate, it seems to me highly unlikely that, since the child has
> the *same* descriptor as the parent had, that the lock would
> disappear.

It depends on the lock function.  After fork():
   o with flock() the lock continues to be held, but will be unlocked
     if any child process explicitly unlocks it
   o with fcntl() the lock is not inherited in the child
   o with lockf() the standards and manual pages don't say

Boring reference material follows.

flock
===== 

From the NetBSD manual page:

NOTES
     Locks are on files, not file descriptors.  That is, file descriptors
     duplicated through dup(2) or fork(2) do not result in multiple
     instances of a lock, but rather multiple references to a single lock.
     If a process holding a lock on a file forks and the child explicitly
     unlocks the file, the parent will lose its lock.

The Red Hat Linux 8.0 manual page has similar wording.  (No standards
to check here -- flock() is not standardised in POSIX, X/Open, Single
Unix Standard, ...)

fcntl
=====

The NetBSD manual page notes that these locks are not inherited by
child processes:
    Another minor semantic problem with this interface is that locks
    are not inherited by a child process created using the fork(2)
    function.

Ditto the Single Unix Standard versions 2 and 3.

lockf()
=======

The standards and manual pages that I've checked don't discuss
fork() in relation to lockf(), which seems a peculiar omission
and makes me suspect that behaviour has varied historically.

In practice I would expect lockf() semantics to be the same as
fcntl().

Regards,

Giles 








Re: On file locking

From
Tom Lane
Date:
Giles Lean <giles@nemeton.com.au> writes:
> Boring reference material follows.

Couldn't help noticing that you omitted HPUX ;-)

On HPUX 10.20, flock doesn't seem to exist (hasn't got a man page nor
any mention in /usr/include).  lockf says
    All locks for a process are released upon
    the first close of the file, even if the process still has the file
    opened, and all locks held by a process are released when the process
    terminates.

and
    When a file descriptor is closed, all locks on the file from the
    calling process are deleted, even if other file descriptors for that
    file (obtained through dup() or open(), for example) still exist.

which seems to imply (but doesn't actually say) that HPUX keeps track of
exactly which process took out the lock, even if the file is held open
by multiple processes.

This all doesn't look good for using file locks in the way I had in
mind :-( ... but considering that all these man pages seem pretty vague,
maybe some direct experimentation is called for.
        regards, tom lane


Re: On file locking

From
Curt Sampson
Date:
On Sun, 2 Feb 2003, Tom Lane wrote:

> This all doesn't look good for using file locks in the way I had in
> mind :-( ... but considering that all these man pages seem pretty vague,
> maybe some direct experimentation is called for.

Definitely. I wonder about the NetBSD manpage quotes in the post you
followed up to, given that last time I checked flock() was implemented,
in the kernel, using fcntl(). Either that's changed, or the manpages
are unclear or lying.

This has been my experience in the past; locking semantics are subtle
and unclear enough that you really need to test for exactly what you
want at build time on every system, and you've got to do this testing
on the filesystem you intend to put the locks on. (So you don't, e.g.,
test a local filesystem but end up with data on an NFS filesystem with
different locking semantics.) That's what procmail does.
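
A sketch of what such a probe might look like, assuming a platform that has flock() and mkstemp(). The function name and the temporary file name pattern are made up for illustration, and this checks only one property (whether a second open of the same file is actually excluded); a real probe would test every semantic the caller relies on.

```c
#define _DEFAULT_SOURCE     /* expose mkstemp()/flock() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical probe: does flock() give real exclusion between two
 * independent opens of a file created in dir?
 * Returns 1 if the second lock attempt is refused, 0 if it is (wrongly)
 * granted, and -1 if the test could not be carried out.
 */
int
flock_excludes_in_dir(const char *dir)
{
    char        path[1024];
    int         fd1, fd2, result = -1;

    assert(dir != NULL);
    if (snprintf(path, sizeof(path), "%s/locktestXXXXXX", dir) >= (int) sizeof(path))
        return -1;              /* directory name too long for our buffer */
    fd1 = mkstemp(path);        /* creates the file on the target filesystem */
    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_RDWR);
    if (fd2 >= 0 && flock(fd1, LOCK_EX | LOCK_NB) == 0)
    {
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            result = 0;         /* second lock granted: no exclusion here */
        else if (errno == EWOULDBLOCK)
            result = 1;         /* refused, as hoped */
    }
    if (fd2 >= 0)
        close(fd2);
    close(fd1);
    unlink(path);
    return result;
}
```

The important part is the dir argument: the probe has to run against the directory that will actually hold the lock file, not against wherever the build happens to run.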

Given this, I'm not even sure the whole idea is worth pursuing. (Though
I guess I should find out what NetBSD is really doing, and fix the
manual pages to correspond to reality.)

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC


Re: On file locking

From
Giles Lean
Date:
Curt Sampson <cjs@cynic.net> wrote:

> On Sun, 2 Feb 2003, Tom Lane wrote:
> 
> > This all doesn't look good for using file locks in the way I had in
> > mind :-( ... but considering that all these man pages seem pretty vague,
> > maybe some direct experimentation is called for.
> 
> Definitely. I wonder about the NetBSD manpage quotes in the post you
> followed up to, given that last time I checked flock() was implemented,
> in the kernel, using fcntl(). Either that's changed, or the manpages
> are unclear or lying.

Using the same kernel code != same semantics.

I think the NetBSD manual pages are trying to say that it's "safe" to
have lockf(), fcntl(), and flock() locking playing together.  That
needn't be the case on all operating systems and the standards don't
require it.

> This has been my experience in the past; locking semantics are subtle
> and unclear enough that you really need to test for exactly what you
> want at build time on every system, and you've got to do this testing
> on the filesystem you intend to put the locks on.

What he said ...

Giles


Re: On file locking

From
Giles Lean
Date:
Tom Lane wrote:

> On HPUX 10.20, flock doesn't seem to exist (hasn't got a man page nor
> any mention in /usr/include).

Correct.  Still isn't there in later releases.

>  lockf says
> 
>      All locks for a process are released upon
>      the first close of the file, even if the process still has the file
>      opened, and all locks held by a process are released when the process
>      terminates.
> 
> and
> 
>      When a file descriptor is closed, all locks on the file from the
>      calling process are deleted, even if other file descriptors for that
>      file (obtained through dup() or open(), for example) still exist.
> 
> which seems to imply (but doesn't actually say) that HPUX keeps track of
> exactly which process took out the lock, even if the file is held open
> by multiple processes.

Having done some testing today, I now understand what the standards
are trying to say when they talk about locks being "inherited". Or at
least I think I understand: standards are tricky, locking is subtle,
and I'm prepared to be corrected if I'm wrong!

All of these lock functions succeed when the same process asks for a
lock that it already has.  That is:
    fcntl(fd, ...);
    fcntl(fd, ...);  /* success -- no error returned */

For flock() only, the lock is inherited by a child process along
with the file descriptor so the child can re-issue the flock()
call and that will pass, too:
    flock(fd, ...);
    pid = fork();
    if (pid == 0)
        flock(fd, ...);  /* success -- no error returned */

For fcntl() and lockf() the locks are not inherited, and the
call in a child fails:
    fcntl(fd, ...);
    pid = fork();
    if (pid == 0)
        fcntl(fd, ...);  /* will fail and return -1 */

In no case does just closing the file descriptor in the child lose
the parent's lock.  I rationalise this as follows:

1. flock() is using a "last close" semantic, so closing the file
   descriptor is documented not to lose the lock

2. lockf() and fcntl() use a "first close", but because the locks
   are not inherited by the child process the child can't unlock
   them
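
The fragments above can be turned into a small runnable experiment, sketched here under the assumption of a system that has both flock() and fcntl() locks. The function names are hypothetical, and the results described are those reported in this message, not something every platform or filesystem guarantees.

```c
#define _DEFAULT_SOURCE     /* expose POSIX/BSD declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive, non-blocking whole-file lock with flock() or fcntl(). */
static int
try_lock(int fd, int use_flock)
{
    struct flock fl;

    if (use_flock)
        return flock(fd, LOCK_EX | LOCK_NB);
    memset(&fl, 0, sizeof(fl));     /* l_start = l_len = 0: whole file */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Parent locks the file, forks, and the child re-issues the same lock call
 * on the inherited descriptor.  Returns 1 if the child's call succeeds,
 * 0 if it fails, -1 if the experiment could not be set up.
 */
int
child_can_relock(const char *path, int use_flock)
{
    int         fd, status;
    pid_t       pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (try_lock(fd, use_flock) < 0)
    {
        close(fd);
        return -1;
    }
    pid = fork();
    if (pid == 0)
        _exit(try_lock(fd, use_flock) == 0 ? 0 : 1);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    {
        close(fd);
        return -1;
    }
    close(fd);
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

With the behaviour described here, child_can_relock(path, 1) should return 1 (flock) and child_can_relock(path, 0) should return 0 (fcntl), because in the fcntl case the child is just another process contending for the parent's lock.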
 

> This all doesn't look good for using file locks in the way I had in
> mind :-( ... but considering that all these man pages seem pretty vague,
> maybe some direct experimentation is called for.

I conjecture that Tom was looking for a facility to lock a file and
have it stay locked if the postmaster or any child process was still
running.  flock() fits the bill, but it's not portable everywhere.

One additional warning: this stuff *is* potentially filesystem
dependent, per the source code I looked at, which would call
filesystem specific routines.

I tested with HP-UX 11.00 (VxFS), NetBSD (FFS) and Linux (ext3).  I've
put the rough and ready test code up for FTP, if anyone wants to check
my working:
   ftp://ftp.nemeton.com.au/pub/pgsql/

Limitations in the testing:

I only used whole file locking (no byte ranges) and didn't prove that
a lock taken by flock() is still held after a child calls close() as
it is documented to be.

Regards,

Giles


Re: On file locking

From
Antti Haapala
Date:
> That same documentation mentions that locks acquired using flock()
> will *not* invoke the mandatory lock semantics even if on a file
> marked for it, so I guess flock() isn't implemented on top of fcntl()
> in Linux.

They're not. And there's another difference between fcntl and flock in
Linux: although fork(2) states that file locks are not inherited, locks
made by flock are inherited to children and they keep the lock even when
the parent process is killed with SIGKILL. Tested this.
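
A sketch of how this can be checked, assuming Linux-style flock() and fcntl() semantics on a local filesystem. All names are hypothetical: one process plays the postmaster, its child plays a backend that keeps the inherited descriptor open, and the caller probes the lock after the "postmaster" has killed itself with SIGKILL.

```c
#define _DEFAULT_SOURCE     /* expose POSIX/BSD declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive, non-blocking whole-file lock with flock() or fcntl(). */
static int
try_lock(int fd, int use_flock)
{
    struct flock fl;

    if (use_flock)
        return flock(fd, LOCK_EX | LOCK_NB);
    memset(&fl, 0, sizeof(fl));     /* l_start = l_len = 0: whole file */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Returns 1 if the lock is still held after the locking process died but
 * its child lives on, 0 if the lock vanished with it, -1 on setup error.
 */
int
lock_survives_parent(const char *path, int use_flock)
{
    int         hold[2], fd, status, result;
    pid_t       parent;
    char        c;

    assert(path != NULL);
    if (pipe(hold) < 0)
        return -1;
    parent = fork();
    if (parent == 0)
    {
        /* Plays the postmaster: lock, fork a backend, die abruptly. */
        fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0 || try_lock(fd, use_flock) < 0)
            _exit(1);
        switch (fork())
        {
            case -1:
                _exit(1);
            case 0:
                /* Plays a backend: keep fd open until the pipe hits EOF. */
                close(hold[1]);
                _exit(read(hold[0], &c, 1) == 0 ? 0 : 1);
        }
        kill(getpid(), SIGKILL);
        _exit(1);                   /* not reached */
    }
    close(hold[0]);
    if (parent < 0 || waitpid(parent, &status, 0) != parent ||
        !WIFSIGNALED(status))
    {
        close(hold[1]);
        return -1;
    }
    /* The "postmaster" is gone; probe through an independent open. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        result = -1;
    else if (try_lock(fd, use_flock) == 0)
        result = 0;                 /* lock died with the parent */
    else
        result = (errno == EWOULDBLOCK || errno == EAGAIN ||
                  errno == EACCES) ? 1 : -1;
    if (fd >= 0)
        close(fd);
    close(hold[1]);                 /* EOF on the pipe lets the backend exit */
    return result;
}
```

Under the semantics reported above, the flock() variant should return 1 and the fcntl() variant 0. The orphaned "backend" exits asynchronously, so back-to-back runs are best given separate files.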

Just see man syscall: there exist both flock(2) and fcntl(2).



-- 
Antti Haapala
+358 50 369 3535
ICQ: #177673735



Re: On file locking

From
Antti Haapala
Date:
> All of these lock functions succeed when the same process asks for a
> lock that it already has.  That is:
>
>      fcntl(fd, ...);
>      fcntl(fd, ...);  /* success -- no error returned */
>
> For flock() only, the lock is inherited by a child process along
> with the file descriptor so the child can re-issue the flock()
> call and that will pass, too:
>
>      flock(fd, ...);
>      pid = fork();
>      if (pid == 0)
>          flock(fd, ...);  /* success -- no error returned */

True...

> For fcntl() and lockf() the locks are not inherited, and the
> call in a child fails:
>
>      fcntl(fd, ...);
>      pid = fork();
>      if (pid == 0)
>          fcntl(fd, ...);  /* will fail and return -1 */
>
> In no case does just closing the file descriptor in the child lose
> the parent's lock.  I rationalise this as follows:
>
> 1. flock() is using a "last close" semantic, so closing the file
>    descriptor is documented not to lose the lock

Yep.

> 2. lockf() and fcntl() use a "first close", but because the locks
>    are not inherited by the child process the child can't unlock
>    them

And at least old Linux system call manuals seem to reflect this
(incorrectly) when they state that file locks are not inherited (should
be "record locks obtained by fcntl").

> One additional warning: this stuff *is* potentially filesystem
> dependent, per the source code I looked at, which would call
> filesystem specific routines.
>
> I only used whole file locking (no byte ranges) and didn't prove that
> a lock taken by flock() is still held after a child calls close() as
> it is documented to be.

I tested this on Linux 2.4.x ext2 fs and it seems to follow the spec
exactly. If child is forked and it closes the file, parent still has the
lock until it's killed or it has also closed the file.
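
This check can be reproduced with something like the sketch below, again assuming a system with both flock() and fcntl() and a local filesystem. The names are hypothetical. A first child closes the inherited descriptor; a second child then probes through an independent open() to see whether the parent's lock is still in force.

```c
#define _DEFAULT_SOURCE     /* expose POSIX/BSD declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take an exclusive, non-blocking whole-file lock with flock() or fcntl(). */
static int
try_lock(int fd, int use_flock)
{
    struct flock fl;

    if (use_flock)
        return flock(fd, LOCK_EX | LOCK_NB);
    memset(&fl, 0, sizeof(fl));     /* l_start = l_len = 0: whole file */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Returns 1 if the parent still holds its lock after a child has closed
 * the inherited descriptor, 0 if the lock was lost, -1 on setup error.
 */
int
parent_keeps_lock(const char *path, int use_flock)
{
    int         fd, probe_fd, status;
    pid_t       pid;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (try_lock(fd, use_flock) < 0)
        goto fail;

    /* First child: just close the inherited descriptor and exit. */
    pid = fork();
    if (pid == 0)
    {
        close(fd);
        _exit(0);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        goto fail;

    /* Second child: try to lock through an independent open of the file. */
    pid = fork();
    if (pid == 0)
    {
        close(fd);
        probe_fd = open(path, O_RDWR);
        if (probe_fd < 0)
            _exit(2);
        _exit(try_lock(probe_fd, use_flock) == 0 ? 0 : 1);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        goto fail;
    close(fd);
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 1 ? 1 : 0;

fail:
    close(fd);
    return -1;
}
```

If the behaviour matches what is described here, both variants return 1: a child's close() does not cost the parent its lock with either interface.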

What about having two different lock files: one that would indicate that
there are some child processes still running and another which would
indicate that there's postmaster's parent process running? - Using flock
and fcntl semantics respectively (or flock semantics with children
immediately closing their fds).
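
A rough sketch of the two-file idea, assuming the Linux behaviour tested above (fcntl() locks die with the process that took them, flock() locks live as long as any process holds the inherited descriptor). The file roles, function names, and state names are all hypothetical.

```c
#define _DEFAULT_SOURCE     /* expose POSIX/BSD declarations under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

enum dir_state
{
    DIR_ERROR = -1,
    DIR_FREE = 0,               /* nobody is using the directory */
    DIR_POSTMASTER_ALIVE = 1,   /* the fcntl() lock is still held */
    DIR_BACKENDS_ALIVE = 2      /* postmaster gone, children remain */
};

/* Called by a starting postmaster; both descriptors must then stay open. */
int
claim_data_dir(const char *pm_lock, const char *be_lock, int *pm_fd, int *be_fd)
{
    struct flock fl;

    assert(pm_fd != NULL && be_fd != NULL);
    *pm_fd = open(pm_lock, O_RDWR | O_CREAT, 0600);
    *be_fd = open(be_lock, O_RDWR | O_CREAT, 0600);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (*pm_fd >= 0 && *be_fd >= 0 &&
        fcntl(*pm_fd, F_SETLK, &fl) == 0 &&         /* dies with this process */
        flock(*be_fd, LOCK_EX | LOCK_NB) == 0)      /* shared with forked children */
        return 0;
    if (*pm_fd >= 0)
        close(*pm_fd);
    if (*be_fd >= 0)
        close(*be_fd);
    return -1;
}

/* Called from an unrelated process to classify the directory. */
enum dir_state
probe_data_dir(const char *pm_lock, const char *be_lock)
{
    struct flock fl;
    enum dir_state state = DIR_FREE;
    int         fd;

    fd = open(pm_lock, O_RDWR);
    if (fd >= 0)
    {
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &fl) < 0)
            state = DIR_ERROR;
        else if (fl.l_type != F_UNLCK)
            state = DIR_POSTMASTER_ALIVE;
        close(fd);
    }
    else if (errno != ENOENT)
        return DIR_ERROR;
    if (state != DIR_FREE)
        return state;

    fd = open(be_lock, O_RDWR);
    if (fd < 0)
        return errno == ENOENT ? DIR_FREE : DIR_ERROR;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        flock(fd, LOCK_UN);
    else
        state = (errno == EWOULDBLOCK) ? DIR_BACKENDS_ALIVE : DIR_ERROR;
    close(fd);
    return state;
}
```

probe_data_dir() must not be called from the process that holds the fcntl() lock, since closing its temporary descriptor would release that lock. Whether any of this holds on a given platform or on NFS is exactly what would need testing first.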

And of course locking is file system dependent, just think NFS on Linux
where virtually no locking semantics actually work :)

-- 
Antti Haapala