Re: MMAP Buffers - Mailing list pgsql-hackers

From Greg Stark
Subject Re: MMAP Buffers
Date
Msg-id BANLkTinZyTLJcq3q_9RUOn8c-0kT70=urw@mail.gmail.com
In response to Re: MMAP Buffers  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: MMAP Buffers  (Radosław Smogura <rsmogura@softperience.eu>)
Re: MMAP Buffers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> The OP says that this patch maintains the WAL-before-data rule without any explanation of how it accomplishes that
> seemingly quite amazing feat.  I assume I'm going to have to read this patch at some point to refute this assertion, and
> I think that sucks. I am pretty nearly 100% confident that this approach is utterly doomed, and I don't want to spend a
> lot of time on it unless someone can provide me with a compelling explanation of why my confidence is misplaced.

FWIW, he did explain how he did that. Or at least I think he did --
it's possible I read what I expected because what he came up with is
something I've recently been thinking about.

What he did, I gather, is treat the mmapped buffers as a read-only
copy of the data. To actually make any modifications he copies it into
shared buffers and treats them like normal. When the buffers get
flushed from memory they get written and then the pointers get
repointed back at the mmapped copy. Effectively this means the shared
buffers get extended to include all of the filesystem cache instead of
having to evict buffers from shared buffers just because you want to
read another one that's already in filesystem cache.
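
To make the repointing concrete, here is a rough sketch in C of how I read
the scheme. This is not code from the patch: DemoBuffer and the demo_*
functions are names invented for illustration, it handles only the first page
of one file, and it assumes the kernel keeps a MAP_SHARED mapping coherent
with pwrite() on the same file (true with a unified page cache such as
Linux's).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PAGE_SZ 8192       /* assumed page size for this sketch */

/* Hypothetical descriptor; this is not PostgreSQL's BufferDesc. */
typedef struct DemoBuffer
{
    int         fd;             /* backing file */
    const char *mapped;         /* read-only mmap view of the page */
    char       *copy;           /* writable copy, NULL while nobody has modified it */
} DemoBuffer;

/* Map the first page of fd read-only. Returns 0 on success, -1 on failure. */
int
demo_open(DemoBuffer *buf, int fd)
{
    void *p = mmap(NULL, DEMO_PAGE_SZ, PROT_READ, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return -1;
    buf->fd = fd;
    buf->mapped = p;
    buf->copy = NULL;
    return 0;
}

/* Readers follow whichever pointer is current. */
const char *
demo_page(const DemoBuffer *buf)
{
    return buf->copy ? buf->copy : buf->mapped;
}

/* Writers get a private copy; the mapping itself is never written through. */
char *
demo_writable(DemoBuffer *buf)
{
    if (buf->copy == NULL)
    {
        buf->copy = malloc(DEMO_PAGE_SZ);
        if (buf->copy == NULL)
            return NULL;
        memcpy(buf->copy, buf->mapped, DEMO_PAGE_SZ);
    }
    return buf->copy;
}

/* Flush: write the copy out, then repoint readers at the mmapped view. */
int
demo_flush(DemoBuffer *buf)
{
    if (buf->copy == NULL)
        return 0;               /* nothing was modified */
    if (pwrite(buf->fd, buf->copy, DEMO_PAGE_SZ, 0) != DEMO_PAGE_SZ)
        return -1;
    free(buf->copy);
    buf->copy = NULL;           /* demo_page() now returns buf->mapped again */
    return 0;
}
```

Since the page reaches the file only through the ordinary write path in
demo_flush(), that is where a WAL flush would still have to happen first, the
same as for a normal shared buffer.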

It doesn't save the copying between filesystem cache and shared
buffers for buffers that are actually being written to. But it does
save some amount of other copies on read-only traffic and it can even
save some I/O. It does require a function call before each buffer
modification: the pattern is currently <lock buffer>, <mutate
buffer>, <mark buffer dirty>, and from what he describes he needs to add a
<prepare buffer for mutation> between the lock and the mutate.
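
As a sketch of what that extra step would look like at a call site, something
like the following. Everything here is a stand-in invented for illustration
(DemoBuf, the demo_* functions, a plain flag instead of a real buffer lock, an
in-memory array instead of an mmapped page); none of it is the patch's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_PAGE_SZ 8192       /* assumed page size for this sketch */

/* Hypothetical buffer; every field is a stand-in for the real thing. */
typedef struct DemoBuf
{
    bool        locked;                 /* stand-in for the buffer content lock */
    bool        dirty;
    bool        writable;               /* true once page points at local[] */
    const char *page;                   /* what readers dereference */
    const char *mapped;                 /* stand-in for the read-only mmapped page */
    char        local[DEMO_PAGE_SZ];    /* stand-in for a shared-buffers slot */
} DemoBuf;

void
demo_lock(DemoBuf *b)
{
    assert(!b->locked);
    b->locked = true;
}

void
demo_unlock(DemoBuf *b)
{
    assert(b->locked);
    b->locked = false;
}

/* The new step: copy the read-only page into a writable slot, once. */
char *
demo_prepare_for_mutation(DemoBuf *b)
{
    assert(b->locked);
    if (!b->writable)
    {
        memcpy(b->local, b->mapped, DEMO_PAGE_SZ);
        b->page = b->local;
        b->writable = true;
    }
    return b->local;
}

void
demo_mark_dirty(DemoBuf *b)
{
    assert(b->locked && b->writable);
    b->dirty = true;
}

/* A call site: lock, prepare, mutate, mark dirty, unlock. */
void
demo_set_byte(DemoBuf *b, size_t off, char value)
{
    demo_lock(b);
    char *page = demo_prepare_for_mutation(b);  /* added between lock and mutate */
    page[off] = value;
    demo_mark_dirty(b);
    demo_unlock(b);
}
```

The point of the sketch is that any call site that mutated the page through a
pointer obtained before the prepare step would be writing into (or faulting
on) the read-only mapping, so every such site has to be found and changed.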

I think it's an interesting experiment and it's good to know how to
solve some of the subproblems. Notably, how do you extend files or
drop them atomically across processes? And how do you deal with
getting the mappings to be the same across all the processes or deal
with them being different? But I don't think it's a great long-term
direction. It just seems clunky to have to copy things from mmapped
buffers to local buffers and back. Perhaps the performance testing
will show that clunkiness is well worth it but we'll need to see that
for a wide variety of workloads to judge that.

--
greg

