Thread: Re: POSIX shared memory support
Chris, et al,

(commit-fest consensus discussion)

* Chris Marcellino wrote:
> In case you haven't had enough, here is another version of the code
> to make Postgres use POSIX shared memory. Along with the issues that
> have already been addressed, this version ensures that orphaned
> backends are not in the database when restarting Postgres by using a
> single 1 byte SysV segment to see who is attached to the segment
> using shmctl/IPC_STAT/nattch.

This really feels like a deal-breaker to me. My first reaction to this
patch, honestly, is that it's being justified for all the wrong reasons.
Changing to POSIX shm seems like a reasonable goal in general, provided
it can do what we need, but doing it to work around silly defaults
doesn't really work for me. If the real issue you have is with the SysV
limits then I'd suggest you bring that up with the kernel/distribution
folks to get them to use something more sane.

Looking around a bit, it looks like it's already being addressed in some
places, for example Solaris 10 apparently uses 1/4th of memory, while
CentOS 5 uses 4GB. SUSE also uses a larger default, from what I
understand. Supporting this effort to get it raised on various
platforms and distributions seems like a much better approach.

Additionally, it strikes me that there *is* a limit on POSIX shared
memory too, generally half of RAM on the systems I've looked at, but
there's no guarantee that'll always be the default or that half of RAM
will always be enough for us. So, even with this change, the problem
isn't completely 'solved'.

Finding a way for POSIX shm to do what we need, including Tom's
concerns, without depending on SysV shm as a crutch workaround, would
make this change much more reasonable and could be justified as moving
to a well defined POSIX standard. It would also mean we may be able to
support platforms which either are new and don't implement SysV but just
POSIX, or cases where SysV is being actively deprecated. Neither of
which is possible if we're stuck with using SysV in some cases.

Thanks,

Stephen
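For readers unfamiliar with the nattch check mentioned in the quoted
patch summary, here is a minimal sketch of how an attach count can be
read with shmctl()/IPC_STAT. It is an illustration only, not code from
the patch; the helper names attached_count and demo_attached_count are
hypothetical.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: return the number of processes currently
 * attached to the SysV segment "shmid", or -1 if shmctl() fails.
 */
long attached_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long) ds.shm_nattch;
}

/*
 * Hypothetical demo: create a private 1-byte segment, attach to it
 * once, read the attach count, then detach and remove the segment.
 * Returns the count seen while attached, or -1 on error.
 */
long demo_attached_count(void)
{
    int   shmid = shmget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    void *addr;
    long  n;

    if (shmid < 0)
        return -1;

    addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1)
    {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    n = attached_count(shmid);

    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);
    return n;
}
```

A starting postmaster could, under this scheme, refuse to start when the
count for the old segment is nonzero, since that would indicate that
orphaned backends are still attached.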
Stephen Frost <sfrost@snowman.net> writes:
> Finding a way for POSIX shm to do what we need, including Tom's
> concerns, without depending on SysV shm as a crutch workaround, would
> make this change much more reasonable and could be justified as moving
> to a well defined POSIX standard, and means we may be able to support
> platforms which either are new and don't implement SysV but just POSIX,
> or cases where SysV is being actively deprecated. Neither of which is
> possible if we're stuck with using it in some cases.

Yeah, I would be far more interested in this patch if it avoided needing
SysV shmem at all. The problem is to find an adequate substitute for
the nattch-based interlock against live children of a dead postmaster.

It's possible that file locking could be used instead, but that has its
own set of portability and reliability issues to address. For example:
ISTR that on some NFS configurations, file locking silently doesn't
work, or might silently fail after it worked before, if the lock server
daemon should happen to crash. And I don't even know what's available
on Windows. So it'd need some research to make a credible proposal
along those lines.

regards, tom lane
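To make the file-locking idea concrete, here is a rough sketch of one
way such an interlock might look with fcntl() advisory locks. It is not
a proposal from the thread. It assumes each child takes its own shared
lock (fcntl() locks are not inherited across fork()), and that a
starting postmaster probes with a non-blocking exclusive lock. The names
lock_whole_file and demo_lock_interlock and the /tmp path are
hypothetical, and the NFS caveats above still apply.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking whole-file lock of the given type. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;           /* F_RDLCK or F_WRLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Hypothetical demo. A child ("backend") holds a shared lock; the parent
 * ("new postmaster") probes with an exclusive lock. Returns 1 if the
 * probe is refused while the child lives and granted after it exits.
 */
int demo_lock_interlock(void)
{
    char  path[] = "/tmp/interlock_demo_XXXXXX";
    int   ready[2], release[2];
    int   fd, refused, granted;
    char  status = 'x';
    pid_t pid;

    fd = mkstemp(path);
    if (fd < 0)
        return 0;
    if (pipe(ready) < 0 || pipe(release) < 0 || (pid = fork()) < 0)
    {
        close(fd);
        unlink(path);
        return 0;
    }

    if (pid == 0)
    {
        /* Child: take a shared lock, report, then wait to be released. */
        char c = (lock_whole_file(fd, F_RDLCK) == 0) ? 'k' : 'x';

        if (write(ready[1], &c, 1) != 1)
            _exit(1);
        if (read(release[0], &c, 1) != 1)
            _exit(1);
        _exit(0);               /* exiting drops the child's lock */
    }

    /* Parent: wait until the child reports that it holds its lock. */
    if (read(ready[0], &status, 1) != 1)
        status = 'x';
    refused = (lock_whole_file(fd, F_WRLCK) == -1);

    /* Let the child exit, then probe again. */
    if (write(release[1], "g", 1) != 1)
        status = 'x';
    waitpid(pid, NULL, 0);
    granted = (lock_whole_file(fd, F_WRLCK) == 0);

    close(ready[0]);
    close(ready[1]);
    close(release[0]);
    close(release[1]);
    close(fd);
    unlink(path);
    return status == 'k' && refused && granted;
}
```

One known wrinkle with fcntl() locks is that closing any descriptor for
the file releases all of that process's locks on it, so a real
implementation would have to be careful about stray open/close calls.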
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Yeah, I would be far more interested in this patch if it avoided needing
> SysV shmem at all. The problem is to find an adequate substitute for
> the nattch-based interlock against live children of a dead postmaster.

Right, I had an idea about that but didn't really want to clutter the
response to the general idea with it. At least on Linux (I don't know
if it's the case elsewhere..), creating a POSIX shm ends up creating an
actual 'file' in /dev/shm/, which you might be able to count the
hard-links to in order to get an idea of the number of processes using
it? It was just a thought that struck me, not sure if it's at all
possible.

Thanks,

Stephen
Tom Lane wrote:
> Yeah, I would be far more interested in this patch if it avoided needing
> SysV shmem at all. The problem is to find an adequate substitute for
> the nattch-based interlock against live children of a dead postmaster.

(confused) Why can't you use mmap of /dev/zero and inherit the fd into
child processes?

(simple enough to do something similar on Win32, even if the mechanism
isn't identical)
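The mmap-of-/dev/zero suggestion can be sketched as below. This is a
minimal illustration under the assumption of a Unix-like system where a
MAP_SHARED mapping of /dev/zero is shared with children created by
fork(); the function name demo_dev_zero_shared is hypothetical. Note
that it is the mapping itself that the child inherits, so the sketch
closes the fd right after mmap(). The sketch shows only the sharing; it
does not by itself provide an attach count or any other interlock.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: map one int from /dev/zero with MAP_SHARED, fork,
 * let the child store a value, and return what the parent then reads.
 * Returns -1 on any setup failure.
 */
int demo_dev_zero_shared(void)
{
    int   fd = open("/dev/zero", O_RDWR);
    int  *shared;
    int   result;
    pid_t pid;

    if (fd < 0)
        return -1;

    shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (shared == MAP_FAILED)
        return -1;

    *shared = 0;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0)
    {
        /* Child: write through the inherited shared mapping. */
        *shared = 42;
        _exit(0);
    }

    /* Parent: wait for the child, then read the child's store. */
    waitpid(pid, NULL, 0);
    result = *shared;
    munmap(shared, sizeof(int));
    return result;
}
```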
Stephen Frost <sfrost@snowman.net> writes:
> Right, I had an idea about that but didn't really want to clutter the
> response to the general idea with it. At least on Linux (I don't know
> if it's the case elsewhere..), creating a POSIX shm ends up creating an
> actual 'file' in /dev/shm/, which you might be able to count the
> hard-links to in order to get an idea of the number of processes using
> it? It was just a thought that struck me, not sure if it's at all
> possible.

That's not gonna work on anything but Linux, AFAIK.

regards, tom lane
James Mansion wrote:
> Tom Lane wrote:
> > Yeah, I would be far more interested in this patch if it avoided
> > needing SysV shmem at all. The problem is to find an adequate
> > substitute for the nattch-based interlock against live children of
> > a dead postmaster.
>
> (confused) Why can't you use mmap of /dev/zero and inherit the fd
> into child processes?
> (simple enough to do something similar on Win32, even if the
> mechanism isn't identical)

This is what we do on win32 today. We don't use the sysv emulation
layer anymore.

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> James Mansion wrote:
>> (confused) Why can't you use mmap of /dev/zero and inherit the fd
>> into child processes?

> This is what we do on win32 today. We don't use the sysv emulation
> layer anymore.

Did we ever find an interlock that makes the win32 implementation safe
against the postmaster-dead-children-still-alive scenario?

regards, tom lane
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > James Mansion wrote:
> >> (confused) Why can't you use mmap of /dev/zero and inherit the fd
> >> into child processes?
>
> > This is what we do on win32 today. We don't use the sysv emulation
> > layer anymore.
>
> Did we ever find an interlock that makes the win32 implementation
> safe against the postmaster-dead-children-still-alive scenario?

Yes. I don't remember the details offhand (and I'm at the airport right
now), but the code that I put in there passed all the checks that we
could think of, including the one that the old, sysv-emulating code
didn't pass.

//Magnus
The original patch author, Chris Marcellino <cmarcellino@apple.com>, was
not CC'ed on this email thread. That was a mistake.

Chris, the email thread discussing your patch is here:

	http://archives.postgresql.org/pgsql-hackers/2008-03/msg01262.php

Please read the discussion --- the bottom line is that there isn't much
support for the patch. Magnus was able to avoid relying on SysV shared
memory on Win32, but I just talked to him via IM and he said it used a
Win32-specific feature that isn't portable to Unix.

I am holding this patch for the next commit fest in hopes you can adjust
it, but if not the patch will be rejected at that time.

---------------------------------------------------------------------------

Stephen Frost wrote:
> Chris, et al,
>
> (commit-fest consensus discussion)
> * Chris Marcellino wrote:
> > In case you haven't had enough, here is another version of the code
> > to make Postgres use POSIX shared memory. Along with the issues that
> > have already been addressed, this version ensures that orphaned
> > backends are not in the database when restarting Postgres by using a
> > single 1 byte SysV segment to see who is attached to the segment
> > using shmctl/IPC_STAT/nattch.
>
> This really feels like a deal-breaker to me. My first reaction to this
> patch, honestly, is that it's being justified for all the wrong reasons.
> Changing to POSIX shm seems like a reasonable goal in general, provided
> it can do what we need, but doing it to work around silly defaults
> doesn't really work for me. If the real issue you have is with the SysV
> limits then I'd suggest you bring that up with the kernel/distribution
> folks to get them to use something more sane.
>
> Looking around a bit, it looks like it's already being addressed in some
> places, for example Solaris 10 apparently uses 1/4th of memory, while
> CentOS 5 uses 4GB. SUSE also uses a larger default, from what I
> understand. Supporting this effort to get it raised on various
> platforms and distributions seems like a much better approach.
>
> Additionally, it strikes me that there *is* a limit on POSIX shared
> memory too, generally half of RAM on the systems I've looked at, but
> there's no guarantee that'll always be the default or that half of RAM
> will always be enough for us. So, even with this change, the problem
> isn't completely 'solved'.
>
> Finding a way for POSIX shm to do what we need, including Tom's
> concerns, without depending on SysV shm as a crutch workaround, would
> make this change much more reasonable and could be justified as moving
> to a well defined POSIX standard, and means we may be able to support
> platforms which either are new and don't implement SysV but just POSIX,
> or cases where SysV is being actively deprecated. Neither of which is
> possible if we're stuck with using it in some cases.
>
> Thanks,
>
> Stephen

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +