Thread: Re: POSIX shared memory support

Re: POSIX shared memory support

From
Stephen Frost
Date:
Chris, et al,

(commit-fest consensus discussion)
* Chris Marcellino wrote:
> In case you haven't had enough, here is another version of the code
> to make Postgres use POSIX shared memory. Along with the issues that
> have already been addressed, this version ensures that orphaned
> backends are not in the database when restarting Postgres by using a
> single 1-byte SysV segment to see who is attached to the segment
> using shmctl/IPC_STAT/shm_nattch.

This really feels like a deal-breaker to me.  My first reaction to this
patch, honestly, is that it's being justified for all the wrong reasons.
Changing to POSIX shm seems like a reasonable goal in general, provided
it can do what we need, but doing it to work around silly defaults
doesn't really work for me.  If the real issue you have is with the SysV
limits then I'd suggest you bring that up with the kernel/distribution
folks to get them to use something more sane.

Looking around a bit, it looks like it's already being addressed in some
places, for example Solaris 10 apparently uses 1/4 of memory, while
CentOS 5 uses 4GB.  SUSE also uses a larger default, from what I
understand.  Supporting this effort to get it raised on various
platforms and distributions seems like a much better approach.

Additionally, it strikes me that there *is* a limit on POSIX shared
memory too, generally half of RAM on the systems I've looked at, but
there's no guarantee that'll always be the default or that half of RAM
will always be enough for us.  So, even with this change, the problem
isn't completely 'solved'.

Finding a way for POSIX shm to do what we need, including Tom's
concerns, without depending on SysV shm as a crutch workaround, would
make this change much more reasonable and could be justified as moving
to a well-defined POSIX standard, and means we may be able to support
platforms which either are new and don't implement SysV but just POSIX,
or cases where SysV is being actively deprecated.  Neither of which is
possible if we're stuck with using it in some cases.
Thanks,
    Stephen

Re: POSIX shared memory support

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Finding a way for POSIX shm to do what we need, including Tom's
> concerns, without depending on SysV shm as a crutch workaround, would
> make this change much more reasonable and could be justified as moving
> to a well-defined POSIX standard, and means we may be able to support
> platforms which either are new and don't implement SysV but just POSIX,
> or cases where SysV is being actively deprecated.  Neither of which is
> possible if we're stuck with using it in some cases.

Yeah, I would be far more interested in this patch if it avoided needing
SysV shmem at all.  The problem is to find an adequate substitute for
the nattch-based interlock against live children of a dead postmaster.
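
For readers unfamiliar with the interlock being discussed, here is a
minimal sketch of the idea, not the actual PostgreSQL code.  The kernel
keeps an attach count (shm_nattch) for every SysV segment, and it stays
non-zero for as long as any process, including an orphaned child, is
still attached.  The function names below are hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Return the number of processes currently attached to the SysV segment
 * identified by shmid, or -1 if the segment cannot be examined.
 */
long
attached_count(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
        return -1;
    return (long) buf.shm_nattch;
}

/*
 * Hypothetical startup check: a new postmaster would proceed only if
 * nobody is still attached to the previous postmaster's segment.
 * Returns 1 if it is safe to proceed, 0 otherwise.
 */
int
segment_is_orphan_free(int shmid)
{
    return attached_count(shmid) == 0;
}
```

The attraction is that the kernel maintains the count: a child inherits
the attachment across fork(), and the count drops only when that child
detaches or exits, however the postmaster itself died.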

It's possible that file locking could be used instead, but that has its
own set of portability and reliability issues to address.  For example:
ISTR that on some NFS configurations, file locking silently doesn't
work, or might silently fail after it worked before, if the lock server
daemon should happen to crash.  And I don't even know what's available
on Windows.  So it'd need some research to make a credible proposal
along those lines.
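
A rough sketch of what a file-locking interlock might look like, using
POSIX fcntl() record locks.  This is only an illustration under stated
assumptions: the function names are hypothetical, every backend is
assumed to take its own shared lock (fcntl locks are not inherited
across fork()), and it does nothing about the NFS concerns above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply a whole-file POSIX record lock of the given type without blocking. */
static int
set_lock(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Each backend would call this itself after fork().  Returns 0 on success. */
int
hold_shared_lock(int fd)
{
    return set_lock(fd, F_RDLCK);
}

/*
 * A starting postmaster would call this.  Returns 1 if the exclusive lock
 * was obtained (no other process holds a lock), 0 if some other process
 * still holds one, and -1 on any other error.
 */
int
try_exclusive_lock(int fd)
{
    if (set_lock(fd, F_WRLCK) == 0)
        return 1;
    if (errno == EACCES || errno == EAGAIN)
        return 0;
    return -1;
}
```

The descriptor must be open for both reading and writing (O_RDWR) for
both lock types to be accepted.
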
        regards, tom lane


Re: POSIX shared memory support

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Yeah, I would be far more interested in this patch if it avoided needing
> SysV shmem at all.  The problem is to find an adequate substitute for
> the nattch-based interlock against live children of a dead postmaster.

Right, I had an idea about that but didn't really want to clutter the
response to the general idea with it.  At least on Linux (I don't know
if it's the case elsewhere..), creating a POSIX shm ends up creating an
actual 'file' in /dev/shm/, which you might be able to count the
hard-links to in order to get an idea of the number of processes using
it?  It was just a thought that struck me, not sure if it's at all
possible.
Thanks,
    Stephen

Re: POSIX shared memory support

From
James Mansion
Date:
Tom Lane wrote:
> Yeah, I would be far more interested in this patch if it avoided needing
> SysV shmem at all.  The problem is to find an adequate substitute for
> the nattch-based interlock against live children of a dead postmaster.
>
>   
(confused) Why can't you use mmap of /dev/zero and inherit the fd into 
child processes?
(simple enough to do something similar on Win32, even if the mechanism 
isn't identical)
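
A minimal sketch of the approach suggested above, assuming a Unix-like
system where a MAP_SHARED mapping of /dev/zero is supported.  The
function names are hypothetical.  Note that this only shows the memory
being shared with a forked child; it does not give a later, unrelated
postmaster any way to discover whether children are still attached.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map "size" bytes of zero-filled memory that forked children will share. */
void *
create_inherited_shmem(size_t size)
{
    int   fd = open("/dev/zero", O_RDWR);
    void *addr;

    if (fd < 0)
        return NULL;
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return (addr == MAP_FAILED) ? NULL : addr;
}

/*
 * Demonstration: the child stores 42 in the shared region and exits;
 * the parent returns what it then reads there, or -1 on failure.
 */
int
child_write_is_visible(void)
{
    int  *shared = create_inherited_shmem(sizeof(int));
    pid_t pid;
    int   result = -1;

    if (shared == NULL)
        return -1;
    *shared = 0;
    pid = fork();
    if (pid == 0)
    {
        *shared = 42;
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *shared;
    munmap(shared, sizeof(int));
    return result;
}
```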



Re: POSIX shared memory support

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Right, I had an idea about that but didn't really want to clutter the
> response to the general idea with it.  At least on Linux (I don't know
> if it's the case elsewhere..), creating a POSIX shm ends up creating an
> actual 'file' in /dev/shm/, which you might be able to count the
> hard-links to in order to get an idea of the number of processes using
> it?  It was just a thought that struck me, not sure if it's at all
> possible.

That's not gonna work on anything but Linux, AFAIK.
        regards, tom lane


Re: POSIX shared memory support

From
Magnus Hagander
Date:
James Mansion wrote:
> Tom Lane wrote:
> > Yeah, I would be far more interested in this patch if it avoided
> > needing SysV shmem at all.  The problem is to find an adequate
> > substitute for the nattch-based interlock against live children of
> > a dead postmaster.
> >
> >   
> (confused) Why can't you use mmap of /dev/zero and inherit the fd
> into child processes?
> (simple enough to do something similar on Win32, even if the
> mechanism isn't identical)

This is what we do on win32 today. We don't use the sysv emulation
layer anymore.

//Magnus


Re: POSIX shared memory support

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> James Mansion wrote:
>> (confused) Why can't you use mmap of /dev/zero and inherit the fd
>> into child processes?

> This is what we do on win32 today. We don't use the sysv emulation
> layer anymore.

Did we ever find an interlock that makes the win32 implementation
safe against the postmaster-dead-children-still-alive scenario?
        regards, tom lane


Re: POSIX shared memory support

From
Magnus Hagander
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > James Mansion wrote:
> >> (confused) Why can't you use mmap of /dev/zero and inherit the fd
> >> into child processes?
> 
> > This is what we do on win32 today. We don't use the sysv emulation
> > layer anymore.
> 
> Did we ever find an interlock that makes the win32 implementation
> safe against the postmaster-dead-children-still-alive scenario?

Yes. I don't remember the details offhand (and I'm at the airport right
now), but the code that I put in there passed all the checks that we
could think of, including the one that the old SysV-emulating code
failed.

//Magnus


Re: POSIX shared memory support

From
Bruce Momjian
Date:
The original patch author:
       Chris Marcellino <cmarcellino@apple.com>

was not CC'ed as part of this email thread.  That was a mistake.  Chris,
the email thread discussing your patch is here:
      http://archives.postgresql.org/pgsql-hackers/2008-03/msg01262.php

Please read the discussion --- the bottom line is that there isn't much
support for the patch.  Magnus was able to do the POSIX usage without
relying on SysV shared memory, but I just talked to him via IM and he
said it used a Win32-specific feature that isn't portable to Unix.

I am holding this patch for the next commit fest in hopes you can adjust
it, but if not the patch will be rejected at that time.


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +