Thread: Feature: POSIX Shared memory support, round 2

Feature: POSIX Shared memory support, round 2

From: Chris Marcellino
As discussed earlier, using POSIX shared memory can solve a few issues:
> On Mac OS X and other BSD's, the default System V shared memory
> limits are often very low and require adjustment for acceptable
> performance. Particularly, when Postgres is included as part of
> larger end-user friendly software products, these kernel settings
> are often difficult to change for 2 reasons:
>
> 1. The (arbitrarily) limited resources must be shared by all
> programs that use System V shared memory. For example on my Mac OS
> X computer, I have Postgres running a standalone database, but also
> as part of Apple Remote Desktop. Without manual adjustment, running
> both simultaneously causes one of them to fail. Correcting this in
> any robust way is challenging to automate for consumer-style (i.e.
> Mac) installers.
>
> 2. On these BSD's, this System V shared memory is wired down and
> cannot be swapped out for any reason. If Postgres is running as
> part of another software program or is a lower priority, other
> programs cannot use the potentially limited memory. This places the
> user or developer in a tricky position of having to minimize
> overall system impact, while permitting enough shared memory for
> Postgres to perform well.

Also, the SysV code is complex, since it needs to deal with the real
possibility that a shmid will collide with one belonging to another
program or postmaster.

Here is a new patch that uses the POSIX APIs. It encodes the
canonical path (see 'man realpath') of the database's data directory
into the shared memory segment name using a strong hash function to
make it fit in the shared memory segment name in all cases,
without risk of key collision.
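
A minimal sketch of the naming scheme described above, not code from the
patch. The helper make_segment_name and the "/PG." prefix are hypothetical
names chosen for illustration. FNV-1a is used only as a compact stand-in:
it is not a strong hash, and the hash function the patch actually uses is
not shown in this message.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* FNV-1a, 64-bit: a simple stand-in, NOT the patch's hash function. */
static uint64_t
fnv1a_64(const char *s)
{
    uint64_t    h = 14695981039346656037ULL;    /* FNV offset basis */

    for (; *s != '\0'; s++)
    {
        h ^= (unsigned char) *s;
        h *= 1099511628211ULL;                  /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical helper: write "/PG.<16 hex digits>" into out, derived from
 * the canonical form of data_dir.  Returns 0 on success, or -1 if
 * realpath() cannot resolve the path.
 */
static int
make_segment_name(const char *data_dir, char *out, size_t out_size)
{
    char        canonical[PATH_MAX];

    assert(out_size >= 21);     /* "/PG." + 16 hex digits + NUL */

    /* Resolve symlinks, "." and ".." so equivalent paths hash alike. */
    if (realpath(data_dir, canonical) == NULL)
        return -1;

    snprintf(out, out_size, "/PG.%016llx",
             (unsigned long long) fnv1a_64(canonical));
    return 0;
}
```

Because the name has a fixed length regardless of how long the data
directory path is, it can be kept under whatever name length limit the
platform imposes on shm_open().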

I have taken a new, simpler approach to handling databases that have
been killed with kill -9 or have crashed. It is described in the
comments, but essentially, since any collision in the shared memory key
must come from orphaned backends or a crashed postmaster belonging to
the current data directory, the colliding segment can be freed. A
two-character identifier field is prepended to the data directory hash
and is incremented after freeing an orphan, so that the new postmaster
need not wait for the backends to die. This approach works equally well
on Windows and on Unixen. The comments also describe some of the
portability concerns (which have been handled). Please see the code
(PGSharedMemoryCreate and its helpers) for more information on this
point.
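
The orphan handling described above might look roughly like the following
sketch. It is not taken from the patch: the helper create_fresh_segment,
the two-decimal-digit encoding of the identifier field, and the limit of
100 attempts are all assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_GENERATIONS 100     /* assumed: two decimal digits, 00..99 */

/*
 * Hypothetical helper: create a brand-new segment named "/<NN><dir_hash>".
 * If the name already exists, it is assumed to be an orphan left by the
 * same data directory: unlink it and retry with the next identifier.
 * Returns an open descriptor and leaves the name used in 'name', or
 * returns -1 with errno set.
 */
static int
create_fresh_segment(const char *dir_hash, char *name, size_t name_size)
{
    assert(name_size > 3);

    for (int gen = 0; gen < MAX_GENERATIONS; gen++)
    {
        int         fd;

        snprintf(name, name_size, "/%02d%s", gen, dir_hash);

        /* O_EXCL makes creation fail instead of attaching to old data. */
        fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;

        /*
         * Free the orphaned name.  Processes that still have the old
         * segment mapped keep their mapping until they unmap or exit.
         */
        shm_unlink(name);
    }

    errno = EEXIST;
    return -1;
}
```

A caller would then size the new segment with ftruncate() and map it with
mmap(). The old segment is never reattached, so nothing here waits for
orphaned backends to exit.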

To build/test this, place the attached file in src/backend/port/ and
change the symbolic link pg_shmem.c to point to this file. If this
gets used on BSDs, keep in mind that shared memory is no longer
drawn from the SysV pool, so the SysV settings (SHMMAX, etc.) can be
set to their default values to recover the memory that was wired down
for the SysV pool.
I don't have access to any Linux machines to test this.

Thanks for your feedback,
Chris Marcellino



Re: Feature: POSIX Shared memory support, round 2

From: Tom Lane
Chris Marcellino <cmarcellino@apple.com> writes:
> Here is a new patch that uses the POSIX APIs. It encodes the
> canonical path (see 'man realpath') of the database's data directory
> into the shared memory segment name using a strong hash function to
> make it fit in the shared memory segment name in all cases,
> without risk of key collision.

I find this patch utterly unreadable, because of your cavalier disregard
for making the comments match the truth.  You have copied-and-pasted the
original SysV code and fixed some small fraction of the comments, and I
cannot tell which ones still reflect reality --- but I can tell that a
lot of them don't.

Also, I don't see where this implements any sort of detection of live
backends attached to an existing segment, so I don't think you have
responded to that objection.  Magnus' idea for Windows was to use a
segment set up to automatically go away as soon as the last attacher
died, but AFAICT that isn't how this works.

            regards, tom lane
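
For context on the live-backend detection raised above: with System V
shared memory, the kernel tracks how many processes are attached to a
segment, and that count can be read with shmctl(IPC_STAT). The sketch
below shows only that mechanism; the helper name sysv_attach_count is
hypothetical. The POSIX shm_open() interface has no direct equivalent of
this counter.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: return the number of processes currently attached
 * to the SysV segment 'shmid', or -1 if the segment cannot be queried.
 */
static long
sysv_attach_count(int shmid)
{
    struct shmid_ds info;

    assert(shmid >= 0);

    if (shmctl(shmid, IPC_STAT, &info) < 0)
        return -1;

    /* shm_nattch is maintained by the kernel on attach and detach. */
    return (long) info.shm_nattch;
}
```

A nonzero count on a leftover segment means some process still has it
attached.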

Re: Feature: POSIX Shared memory support, round 2

From: Chris Marcellino
That is strange, because the majority of the comments are new. Much
of the code and comments are reused from the SysV code because, you
know, this is an enhancement. The comments that are left serve a
purpose.
In PGSharedMemoryCreate, this implementation avoids the need to tell
whether live backends are attached to an existing segment, since existing
segments are never reattached to; the old segments are cleared when the
live orphan backends die.
I would love to hear some specific, less sweeping, comments about how
the code is actually written and functions. Otherwise, I'll try to
refactor this and return once again.

Thank you,
Chris Marcellino


On Feb 9, 2007, at 6:40 AM, Tom Lane wrote:

> Chris Marcellino <cmarcellino@apple.com> writes:
>> Here is a new patch that uses the POSIX APIs. It encodes the
>> canonical path (see 'man realpath') of the database's data directory
>> into the shared memory segment name using a strong hash function to
>> make it fit in the shared memory segment name in all cases,
>> without risk of key collision.
>
> I find this patch utterly unreadable, because of your cavalier disregard
> for making the comments match the truth.  You have copied-and-pasted the
> original SysV code and fixed some small fraction of the comments, and I
> cannot tell which ones still reflect reality --- but I can tell that a
> lot of them don't.
>
> Also, I don't see where this implements any sort of detection of live
> backends attached to an existing segment, so I don't think you have
> responded to that objection.  Magnus' idea for Windows was to use a
> segment set up to automatically go away as soon as the last attacher
> died, but AFAICT that isn't how this works.
>
>             regards, tom lane