Re: dynamic shared memory - Mailing list pgsql-hackers

From Robert Haas
Subject Re: dynamic shared memory
Date
Msg-id CA+Tgmoa-8gett_hkNsQ9r04rVW-hoAzTobQr3SCGKocRo=7tUg@mail.gmail.com
In response to Re: dynamic shared memory  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: dynamic shared memory  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Fri, Aug 30, 2013 at 11:45 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> The way I've designed it, no.  If what we expect to be the control
>> segment doesn't exist or doesn't conform to our expectations, we just
>> assume that it's not really the control segment after all - e.g.
>> someone rebooted, clearing all the segments, and then an unrelated
>> process (malicious, perhaps, or just a completely different cluster)
>> reused the same name.  This is similar to what we do for the main
>> shared memory segment.
>
> The case I am mostly wondering about is some process crashing and
> overwriting random memory. We need to be pretty sure that we'll never
> fail partially through cleaning up old segments because they are
> corrupted or because we died halfway through our last cleanup attempt.

Right.  I had those considerations in mind and I believe I have nailed
the hatch shut pretty tight.  The cleanup code is designed never to
die with an error.  Of course it might, but it would have to be
something like an out of memory failure or similar that isn't really
what we're concerned about here.  You are welcome to look for holes,
but these issues are where most of my brainpower went during
development.
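
To make the "never die with an error" idea concrete, here is a minimal sketch, not the actual patch. The header layout, the `MAGIC` value, and the names `sane_handles` and `cleanup` are all made up for illustration. The point is only that every field read from a possibly-corrupt control segment is checked before it is trusted, and that one failed removal never stops the rest.

```python
import struct

# Hypothetical control-segment layout (an assumption, not PostgreSQL's format):
#   magic (uint32) | max_items (uint32) | nitems (uint32) | handles (uint32 each)
HEADER = struct.Struct("<III")
HANDLE = struct.Struct("<I")
MAGIC = 0x9A503D32  # arbitrary illustrative value


def sane_handles(buf):
    """Return the segment handles listed in buf, or [] if buf does not look
    like a control segment. Malformed input yields [], never an exception."""
    if len(buf) < HEADER.size:
        return []
    magic, max_items, nitems = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        return []  # not ours: someone else may have reused the name
    if nitems > max_items:
        return []  # counters are inconsistent, treat as corrupted
    if HEADER.size + max_items * HANDLE.size > len(buf):
        return []  # claimed capacity does not fit inside the segment
    return [
        HANDLE.unpack_from(buf, HEADER.size + i * HANDLE.size)[0]
        for i in range(nitems)
    ]


def cleanup(buf, destroy):
    """Try to destroy every listed segment; a failure on one is skipped."""
    destroyed = 0
    for handle in sane_handles(buf):
        try:
            destroy(handle)
            destroyed += 1
        except OSError:
            pass  # already gone or not removable: keep going
    return destroyed
```

A control segment that fails any check is simply treated as "not ours", which matches the behavior described earlier in the thread for a reused name.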

>> That's true, but that decision has not been uncontroversial - e.g. the
>> NetBSD guys don't like it, because they have a big performance
>> difference between those two types of memory.  We have to balance the
>> possible harm of one more setting against the benefit of letting
>> people do what they want without needing to recompile or modify code.
>
> But then, it made them fix the issue afaik :P

Pah.  :-)

>> You can look at it while the server's running.
>
> That's what debuggers are for.

Tough crowd.  I like it.  YMMV.

>> I would never advocate deliberately trying to circumvent a
>> carefully-considered OS-level policy decision about resource
>> utilization, but I don't think that's the dynamic here.  I think if we
>> insist on predetermining the dynamic shared memory implementation
>> based on the OS, we'll just be inconveniencing people needlessly, or
>> flat-out making things not work. [...]
>
> But using file-backed memory will *suck* performancewise. Why should we
> ever want to offer that to a user? That's what I was arguing about
> primarily.

I see.  There might be additional writeback traffic, but it might not
be that bad in common cases.  After all, the data's pretty hot.

>> If we're SURE
>> that a Linux user will prefer "posix" to "sysv" or "mmap" or "none" in
>> 100% of cases, and that a NetBSD user will always prefer "sysv" over
>> "mmap" or "none" in 100% of cases, then, OK, sure, let's bake it in.
>> But I'm not that sure.
>
> I think posix shmem will be preferred to sysv shmem if present, in just
> about any relevant case. I don't know of any system with lower limits on
> posix shmem than on sysv.

OK, how about this....  SysV doesn't allow extending segments, but
mmap does.  The thing here is that you're saying "remove mmap and keep
sysv" but Noah suggested to me that we remove sysv and keep mmap.
This suggests to me that the picture is not so black and white as you
think it is.
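
For what it's worth, here is a rough sketch of what "mmap allows extending" looks like with a file-backed mapping. This is an illustration only: the helper names `grow_mapping` and `demo` are hypothetical, and it takes the simple route of enlarging the file and mapping it again rather than resizing in place, so the new mapping may land at a different address.

```python
import mmap
import os
import tempfile


def grow_mapping(fd, old_map, new_size):
    """Extend the backing file and return a new, larger mapping of it."""
    old_map.flush()
    old_map.close()
    os.ftruncate(fd, new_size)      # enlarge the backing file
    return mmap.mmap(fd, new_size)  # map it again at the new size


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4096)
        m = mmap.mmap(fd, 4096)
        m[0:5] = b"hello"
        m = grow_mapping(fd, m, 8192)
        m[8187:8192] = b"world"     # write into the newly added space
        result = (len(m), bytes(m[0:5]), bytes(m[8187:8192]))
        m.close()
        return result
    finally:
        os.close(fd)
        os.unlink(path)
```

The old contents survive the remap because they live in the file; a System V segment has its size fixed at creation, so there is no equivalent step.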

>> I share your opinion that preferred_address is never going to be
>> reliable, although FWIW Noah thinks it can be made reliable with a
>> large-enough hammer.
>
> I think we need to have the arguments for that on list then. Those are
> pretty damn fundamental design decisions.
> I for one cannot see how you even remotely could make that work a) on
> windows (check the troubles we have to go through to get s_b
> consistently placed, and that's directly after startup) b) 32bit systems.

Noah?

>> But even if it isn't reliable, there doesn't seem to be all that much
>> value in forbidding access to that part of the OS-provided API.  In
>> the world where it's not reliable, it may still be convenient to map
>> things at the same address when you can, so that pointers can be
>> used.  Of course you'd have to have some fallback strategy for when
>> you don't get the same mapping, and maybe that's painful enough that
>> there's no point after all.  Or maybe it's worth having one code path
>> for relativized pointers and another for non-relativized pointers.
>
> It seems likely to me that will end up with untested code in that
> case. Or even unsupported platforms.

Maybe.  I think for the amount of code we're talking about here, it's
not worth getting excited about.
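
To illustrate what I mean by relativized pointers, here is a small sketch under made-up assumptions: the node layout and the names `write_list` and `read_list` are hypothetical. Links are stored as offsets from the start of the segment rather than as addresses, so they stay valid no matter where each process happens to map the segment.

```python
import struct

# Hypothetical node layout: value (int64), offset of next node (int64).
# Offset 0 is reserved to mean "no next node".
NODE = struct.Struct("<qq")


def write_list(seg, values, start=8):
    """Lay out a linked list inside seg; links are offsets from the segment start."""
    off = start
    for i, v in enumerate(values):
        nxt = off + NODE.size if i + 1 < len(values) else 0
        NODE.pack_into(seg, off, v, nxt)
        off += NODE.size
    return start if values else 0


def read_list(seg, head):
    """Follow offsets relative to whatever base this mapping happens to have."""
    out = []
    off = head
    while off != 0:
        value, off = NODE.unpack_from(seg, off)
        out.append(value)
    return out
```

If you build the list in one buffer and read it back from a byte-for-byte copy (standing in for the same segment mapped at another address), `read_list` returns the same values. The non-relativized path would skip the offset arithmetic but only works when every process gets the same mapping address.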

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


