Re: Proposal to add a QNX 6.5 port to PostgreSQL - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Proposal to add a QNX 6.5 port to PostgreSQL
Date
Msg-id CA+TgmobA_Cs2PZJs1PjTEQG_+Ns1kQ1XwRmvBsgqp3iSWZR_+g@mail.gmail.com
In response to Re: Proposal to add a QNX 6.5 port to PostgreSQL  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Proposal to add a QNX 6.5 port to PostgreSQL
List pgsql-hackers
On Fri, Jul 25, 2014 at 6:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * QNX lacks System V shared memory: I created "src/backend/port/posix_shmem.c" which replaces System V calls
>> (shmget, shmat, shmdt, ...) with POSIX calls (shm_open, mmap, munmap, shm_unlink)
>
> This isn't really acceptable for production usage; if it were, we'd have
> done it already.  The POSIX APIs lack any way to tell how many processes
> are attached to a shmem segment, which is *necessary* functionality for
> us (it's a critical part of the interlock against starting multiple
> postmasters in one data directory).

I think it would be good to spend some energy figuring out what to do
about this.  The Linux developers, for reasons I have not been able to
understand, appear to hate System V shared memory, and rumors have
circulated here that they would like to get rid of it altogether.  And
quite apart from that, even using a few bytes of System V shared
memory is apparently inconvenient for people who run many copies of
PostgreSQL on the same machine or who run in environments where it's
not available, such as FreeBSD jails for which it hasn't been
specifically enabled.[1]

Now, in fairness, all of the alternative systems have their own share
of problems.  POSIX shared memory isn't available everywhere, and the
anonymous mmap we're now using doesn't work in EXEC_BACKEND builds,
can't be used for dynamic shared memory, and apparently performs
poorly on BSD systems.[1]  In spite of that, I think that having an
option to use POSIX shared memory would make a reasonable number of
PostgreSQL users happier than they are today; and maybe even attract a
few new ones.

In our last discussion on this topic, we talked about using file locks
as a substitute for nattch.  You concluded that fcntl was totally
broken for this purpose because of the possibility of some other piece
of code accidentally opening and closing the lock file.[2]  lockf
appears to have the same problem, but flock might not, at least on
some systems.  The semantics as described in my copy of the Linux man
pages are that a child created by fork() inherits a copy of the
filehandle pointing to the same lock, and that the lock is released
when either ANY process with a copy of that filehandle makes an
explicit unlock request or ALL copies of the filehandle are closed.
That seems like it'd be OK for our purposes, though the Linux guys
seem to think the semantics might be different on other platforms, and
note that it won't work over NFS.
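
To make the flock semantics concrete, here is a small sketch of the
behavior described above: the parent takes the lock, forks, and closes
its own copy of the descriptor, and the lock stays held until the child
(which inherited a copy) exits.  The names lock_is_held and demo_flock
and the temp file path are made up for illustration, and I'm assuming
Linux-style flock behavior; other platforms may differ.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* 1 if someone holds the flock on path, 0 if it is free, -1 on error. */
int lock_is_held(const char *path)
{
    int fd = open(path, O_RDWR);    /* fresh open file description */
    int held;

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        held = 0;
    else
        held = (errno == EWOULDBLOCK) ? 1 : -1;
    close(fd);                      /* also drops the lock if we got it */
    return held;
}

/* Hypothetical demo of lock inheritance across fork(). */
int demo_flock(int *held_while_child_alive, int *held_after_child_exit)
{
    char path[] = "/tmp/flock_demo_XXXXXX";
    int go[2];
    int fd = mkstemp(path);
    pid_t pid;
    char c;

    if (fd < 0 || pipe(go) < 0 || flock(fd, LOCK_EX | LOCK_NB) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: keeps the inherited fd open until the parent closes go[1]. */
        close(go[1]);
        if (read(go[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(go[0]);
    close(fd);                      /* parent drops its copy of the fd */
    *held_while_child_alive = lock_is_held(path);
    close(go[1]);                   /* lets the child see EOF and exit */
    waitpid(pid, NULL, 0);
    *held_after_child_exit = lock_is_held(path);
    unlink(path);
    return 0;
}
```

Under those assumptions the first probe should find the lock held and
the second should find it free, which is the nattch-like property we
are after.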

Another thing that strikes me is that lsof works on just about every
platform I've ever used, and it tells you who has got a certain file
open.  Of course it has to use different methods to do that on
different platforms, but at least on Linux, /proc/self/fd/N is a
symlink to the file you've got open, and shared memory segments are
files in /dev/shm.  So maybe at least on particular platforms where we
care enough, we could install operating-system-specific code to
provide an interlock using a mechanism of this type.  Not sure if that
will fly, but it's a thought.
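
As a rough illustration of the lsof-style approach on Linux, the sketch
below checks whether one process has a given file open by walking
/proc/PID/fd and comparing device and inode numbers.  The names
pid_has_file_open and demo_proc_fd are hypothetical.  A real interlock
would have to scan every PID and would need permission to read other
processes' fd directories, so treat this only as a demonstration of the
mechanism.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if pid has path open, 0 if not, -1 if we cannot tell.  Linux only. */
int pid_has_file_open(pid_t pid, const char *path)
{
    char dirname[64], entry[512];
    struct stat want, got;
    struct dirent *de;
    DIR *dir;
    int found = 0;

    if (stat(path, &want) < 0)
        return -1;
    snprintf(dirname, sizeof(dirname), "/proc/%ld/fd", (long) pid);
    dir = opendir(dirname);
    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] == '.')
            continue;
        snprintf(entry, sizeof(entry), "%s/%s", dirname, de->d_name);
        /* stat() follows the /proc symlink to the file that is open */
        if (stat(entry, &got) == 0 &&
            got.st_dev == want.st_dev && got.st_ino == want.st_ino) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}

/* Hypothetical demo: check ourselves before and after closing a file. */
int demo_proc_fd(int *while_open, int *after_close)
{
    char path[] = "/tmp/procfd_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    *while_open = pid_has_file_open(getpid(), path);
    close(fd);
    *after_close = pid_has_file_open(getpid(), path);
    unlink(path);
    return 0;
}
```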

Yet another idea is to somehow use System V semaphores, which are
distinct from System V shared memory.  semop() has a SEM_UNDO flag which
causes whatever operation you perform to be reversed at process exit.
So you could have each new postgres process increment the semaphore
value in such a way that it would be decremented on exit, although I'm
not sure how to avoid a race if the postmaster dies before a new child
has a chance to increment the semaphore.
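
Here is a sketch of the SEM_UNDO behavior, using the System V semaphore
calls that semop() belongs to.  A child increments the semaphore with
SEM_UNDO and then exits without decrementing it; the kernel reverses the
increment.  The names demo_semun and demo_sem_undo are made up, and the
sketch does nothing about the startup race mentioned above.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same layout as the union semun that semctl() expects; Linux leaves
 * that definition to the caller, so a private name avoids clashes. */
union demo_semun { int val; struct semid_ds *buf; unsigned short *array; };

/* Hypothetical demo: read the semaphore while the child lives and after. */
int demo_sem_undo(int *while_alive, int *after_exit)
{
    int to_parent[2], to_child[2];
    union demo_semun arg;
    struct sembuf up;
    int semid, ok;
    pid_t pid;
    char c = 'x';

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;
    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) < 0 ||
        pipe(to_parent) < 0 || pipe(to_child) < 0 || (pid = fork()) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        close(to_parent[0]);
        close(to_child[1]);
        up.sem_num = 0;
        up.sem_op = 1;              /* "attach": count ourselves in */
        up.sem_flg = SEM_UNDO;      /* kernel reverses this at exit */
        if (semop(semid, &up, 1) < 0 || write(to_parent[1], &c, 1) != 1)
            _exit(1);
        if (read(to_child[0], &c, 1) < 0)   /* wait for parent's EOF */
            _exit(1);
        _exit(0);                   /* note: no explicit decrement */
    }
    close(to_parent[1]);
    close(to_child[0]);
    ok = (read(to_parent[0], &c, 1) == 1);  /* child has incremented */
    *while_alive = semctl(semid, 0, GETVAL);
    close(to_child[1]);             /* let the child exit */
    waitpid(pid, NULL, 0);
    *after_exit = semctl(semid, 0, GETVAL);
    semctl(semid, 0, IPC_RMID);
    return ok ? 0 : -1;
}
```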

Finally, how about named pipes?  Linux says that trying to open a
named pipe for write in nonblocking mode when there are no readers will
fail with ENXIO, and
attempting to write to an already-open pipe with no remaining readers
will cause SIGPIPE.  So: create a permanent named pipe in the data
directory that all PostgreSQL processes keep open.  When the
postmaster starts, it opens the pipe for read, then for write, then
closes it for read.  It then tries to write to the pipe.  If this
fails to result in SIGPIPE, then somebody else has got the thing open;
so the new postmaster should die at once.  But if it does get a SIGPIPE
then there are as of that moment no other readers.

I'm not sure if any of this helps QNX or not, but maybe if we figure
out which of these mechanisms (or others) might be acceptable we can
cross-check that against what QNX supports.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] See comments on
http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html
[2] http://www.postgresql.org/message-id/18958.1340764854@sss.pgh.pa.us


