Re: dynamic shared memory - Mailing list pgsql-hackers

From Noah Misch
Subject Re: dynamic shared memory
Msg-id 20130901132400.GA100090@tornado.leadboat.com
In response to Re: dynamic shared memory  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: dynamic shared memory
List pgsql-hackers
On Sat, Aug 31, 2013 at 08:27:14AM -0400, Robert Haas wrote:
> On Fri, Aug 30, 2013 at 11:45 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> I shared your opinion that preferred_address is never going to be
> >> reliable, although FWIW Noah thinks it can be made reliable with a
> >> large-enough hammer.
> >
> > I think we need to have the arguments for that on list then. Those are
> > pretty damn fundamental design decisions.

I somewhat disfavor having a vague "preferred_address" parameter.  mmap()'s
first argument is specified that way, but mmap()'s specification caters to an
open-ended range of implementations and clients.  A PostgreSQL backend
interface can be more rigid.  If we choose to support fixed-address callers,
let those receive either the requested address or an ereport(ERROR).  If the
caller does not care, make no effort to provide a consistent address.  (Better
still, under --enable-cassert, try to force the address to differ across
processes.)
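
A minimal sketch of the "requested address or error" contract, assuming a POSIX system with anonymous mmap(). The name map_at_exactly() is hypothetical and is not PostgreSQL API. It passes the address as a plain hint (no MAP_FIXED, so nothing already mapped gets clobbered) and treats any other placement as failure; a backend caller would turn the NULL return into ereport(ERROR).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Hypothetical helper: map "len" bytes at exactly "wanted", or fail.
 * If "wanted" is NULL the caller does not care and any address is accepted.
 * Returns the mapping, or NULL when the exact address was not available.
 */
static void *
map_at_exactly(void *wanted, size_t len)
{
    void *p;

    assert(len > 0);

    /* Without MAP_FIXED the address is only a hint to the kernel. */
    p = mmap(wanted, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* The kernel chose somewhere else: undo it and report failure. */
    if (wanted != NULL && p != wanted)
    {
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

The point of the shape is that there is no "best effort" result: a fixed-address caller never has to handle a mapping that landed somewhere unexpected.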

[quotations reordered]
> >> But even if it isn't reliable, there doesn't seem to be all that much
> >> value in forbidding access to that part of the OS-provided API.  In

That's also valid, though.  Even if no core code exploits the flexibility,
3rd-party code might do so.

> >> the world where it's not reliable, it may still be convenient to map
> >> things at the same address when you can, so that pointers can be
> >> used.  Of course you'd have to have some fallback strategy for when
> >> you don't get the same mapping, and maybe that's painful enough that
> >> there's no point after all.  Or maybe it's worth having one code path
> >> for relativized pointers and another for non-relativized pointers.
> >
> > It seems likely to me that will end up with untested code in that
> > case. Or even unsupported platforms.

I agree.  It would take an exceptional use case to justify such parallel code
paths; I don't expect that to ever happen for core code.

> > I for one cannot see how you even remotely could make that work a) on
> > windows (check the troubles we have to go through to get s_b
> > consistently placed, and that's directly after startup) b) 32bit systems.
> 
> Noah?

The difficulty depends on whether processes other than the segment's creator
will attach anytime or only as they start.  Attachment at startup is enough
for parallel query, but it's not enough for something like lock table
expansion.  I'll focus on the attach-anytime case since it's more general.

On a system supporting MAP_FIXED, implement this by having the postmaster
reserve address space under a PROT_NONE mapping, then carving out from that
mapping for each fixed-address dynamic segment.  The size of the reservation
would be controlled by a GUC; one might set it to several times anticipated
peak usage.  (The overhead of doing that depends on the kernel.)  Windows
permits the same technique with its own primitives.
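
A minimal sketch of the reserve-then-carve technique, assuming a POSIX system with MAP_FIXED and anonymous mappings. All names (AddressReservation, reserve_address_space, carve_segment) are hypothetical, and the carving is a simple bump pointer with no way to return space. Passing fd = -1 gives an anonymous mapping so the sketch runs standalone; a real attach-anytime segment would presumably pass a descriptor for a shared object (for example from shm_open()) so that other processes, which inherited the reservation at fork, can map the same object at the same address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0         /* not available everywhere; optional */
#endif

/* Hypothetical bookkeeping for the postmaster's reservation. */
typedef struct AddressReservation
{
    char   *base;               /* start of the PROT_NONE region */
    size_t  size;               /* total reserved bytes */
    size_t  used;               /* bytes already carved out */
} AddressReservation;

static size_t
page_round_up(size_t len)
{
    size_t ps = (size_t) sysconf(_SC_PAGESIZE);

    return (len + ps - 1) / ps * ps;
}

/* Reserve address space only; PROT_NONE means nothing is usable yet. */
static int
reserve_address_space(AddressReservation *r, size_t size)
{
    void *p = mmap(NULL, page_round_up(size), PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->size = page_round_up(size);
    r->used = 0;
    return 0;
}

/*
 * Carve "len" bytes out of the reservation.  MAP_FIXED is safe here
 * because the target range lies inside a mapping we own.
 */
static void *
carve_segment(AddressReservation *r, int fd, size_t len)
{
    int   flags = MAP_SHARED | MAP_FIXED;
    void *p;

    len = page_round_up(len);
    if (len == 0 || len > r->size - r->used)
        return NULL;            /* reservation exhausted */
    assert(r->used % (size_t) sysconf(_SC_PAGESIZE) == 0);

    if (fd < 0)
        flags |= MAP_ANONYMOUS; /* standalone demo; see lead-in */
    p = mmap(r->base + r->used, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    r->used += len;
    return p;
}
```

The reservation size plays the role of the GUC mentioned above: once it is exhausted, carve_segment() fails instead of falling back to an arbitrary address.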

A system where mmap() accepts only a zero address in practice (HP-UX,
according to Gnulib, although HP docs suggest it has improved over time)
requires a different technique.  For those systems, expand the regular shared
memory segment and carve from that to make "dynamic" segments.  This amounts
to adding ShmemFree() to supplement ShmemAlloc().  If a core platform had to
use this implementation, its disadvantages would be sufficient to discard the
whole idea of reliable fixed addresses.  But I find it acceptable if it's a
crutch for older kernels, rare hardware, etc.
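
To illustrate what "adding ShmemFree()" might involve, here is a toy first-fit allocator over a fixed arena. It is only a sketch under stated assumptions: the names (Arena, arena_alloc, arena_free) are hypothetical, there is no locking (real shared memory would need it), and the bookkeeping is an implicit list of size-prefixed blocks rather than anything taken from PostgreSQL's ShmemAlloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT   ((size_t) 16)
#define ALIGN_UP(n) (((n) + ALIGNMENT - 1) & ~(ALIGNMENT - 1))

/* Every block starts with a header; blocks tile the arena end to end. */
typedef struct BlockHeader
{
    size_t size;                /* payload bytes after the header */
    int    is_free;
} BlockHeader;

#define HDR_SIZE ALIGN_UP(sizeof(BlockHeader))

typedef struct Arena
{
    char   *base;
    size_t  size;
} Arena;

/* "mem" must be ALIGNMENT-aligned and larger than one header. */
static void
arena_init(Arena *a, void *mem, size_t size)
{
    BlockHeader *h = mem;

    assert((uintptr_t) mem % ALIGNMENT == 0 && size > HDR_SIZE);
    a->base = mem;
    a->size = size & ~(ALIGNMENT - 1);
    h->size = a->size - HDR_SIZE;
    h->is_free = 1;
}

/* First fit; split the block when the remainder can hold a header. */
static void *
arena_alloc(Arena *a, size_t n)
{
    char *end = a->base + a->size;
    char *p;

    n = ALIGN_UP(n);
    for (p = a->base; p < end; p += HDR_SIZE + ((BlockHeader *) p)->size)
    {
        BlockHeader *h = (BlockHeader *) p;

        if (!h->is_free || h->size < n)
            continue;
        if (h->size >= n + HDR_SIZE + ALIGNMENT)
        {
            BlockHeader *rest = (BlockHeader *) (p + HDR_SIZE + n);

            rest->size = h->size - n - HDR_SIZE;
            rest->is_free = 1;
            h->size = n;
        }
        h->is_free = 0;
        return p + HDR_SIZE;
    }
    return NULL;
}

/* Mark the block free, then merge adjacent free blocks in one sweep. */
static void
arena_free(Arena *a, void *ptr)
{
    char *end = a->base + a->size;
    char *p = a->base;

    ((BlockHeader *) ((char *) ptr - HDR_SIZE))->is_free = 1;
    while (p < end)
    {
        BlockHeader *h = (BlockHeader *) p;
        char        *next = p + HDR_SIZE + h->size;

        if (h->is_free && next < end && ((BlockHeader *) next)->is_free)
            h->size += HDR_SIZE + ((BlockHeader *) next)->size;
        else
            p = next;
    }
}
```

Even this toy shows the disadvantages alluded to above: the arena has a fixed size chosen up front, and fragmentation can make a request fail although enough total space is free.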

I don't foresee fundamental differences on 32-bit.  All the allocation
maximums scale down, but that's the usual story for 32-bit.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


