Thread: Re: [HACKERS] Safe/Fast I/O ...

Re: [HACKERS] Safe/Fast I/O ...

From: Garrett Wollman
Date:

[Please forgive me for the way this post is put together; I'm not
actually on your mailing-list, but was just perusing the archives.]

Michal Mosiewicz <mimo@interdata.com.pl> writes:

> The main reason for using memory mapping is that you don't have to create
> unnecessary buffers. Normally, for every operation you have to create
> some in-memory buffer, copy the data there, do some operations, and put
> the data back into the file. With memory mapping you can avoid creating
> unnecessary buffers, and moreover you can call system functions
> less frequently. There are also additional savings (less memory
> copying, reusing memory if several processes map the same file).

Additionally, if your operating system is at all reasonable, using
memory mapping allows you to take advantage of all the work that has
gone into tuning your VM system.  If you map a large file, and then
access it in some way that shows reasonable locality, the VM system will
probably be able to do a better job of page replacement on a
system-wide basis than you could do with a cache built into your
application.  (A good system will also provide other benefits, such as
pre-faulting and extended read ahead.)

Of course, it does have one significant drawback: memory-mapped regions
do not automatically extend when their underlying files do.  So, for
interacting with a structure that shows effectively linear access and
growth, asynchronous I/O is more likely to be a benefit, since AIO can
extend a file asynchronously, whereas other mechanisms will block
while the file is being extended.  (Depending on the system, this may
not be true for multi-threaded programs.)
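
To make the drawback concrete, here is a minimal sketch of what a
program has to do by hand when a mapped file grows, assuming POSIX
ftruncate() and mmap().  The names grow_and_remap and grow_demo and the
/tmp path are hypothetical, chosen only for illustration.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and ftruncate() */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: grow the file behind fd to new_len bytes and
 * return a fresh mapping of the whole file (MAP_FAILED on error).
 * The old mapping, if any, is dropped only after the new one exists. */
void *grow_and_remap(int fd, void *old_base, size_t old_len, size_t new_len)
{
    assert(new_len >= old_len);

    /* The mapping does not follow the file, so extend the file first;
     * touching mapped pages past end of file would raise SIGBUS. */
    if (ftruncate(fd, (off_t) new_len) != 0)
        return MAP_FAILED;

    void *new_base = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
    if (new_base == MAP_FAILED)
        return MAP_FAILED;

    if (old_base != NULL && old_len > 0)
        munmap(old_base, old_len);
    return new_base;
}

/* Small self-check: map one page, grow to two, verify both pages. */
int grow_demo(void)
{
    char path[] = "/tmp/mmap_grow_XXXXXX";   /* assumed scratch location */
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);            /* file stays alive while fd or mapping exists */

    char *one = grow_and_remap(fd, NULL, 0, page);
    if (one == MAP_FAILED) {
        close(fd);
        return -1;
    }
    one[0] = 'a';

    char *two = grow_and_remap(fd, one, page, 2 * page);
    if (two == MAP_FAILED) {
        munmap(one, page);
        close(fd);
        return -1;
    }
    two[page] = 'b';         /* first byte of the newly added page */

    int ok = (two[0] == 'a' && two[page] == 'b');
    munmap(two, 2 * page);
    close(fd);
    return ok ? 0 : -1;
}
```

Note that the base address may change on every remap, so anything
holding pointers into the old mapping has to be fixed up as well.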

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick

Memory mapping (Was: Safe/Fast I/O ...)

From: Michal Mosiewicz
Date:

Having a spare two hours, I was just looking at the current postgres
code. I was trying to estimate how memory mapping would fit into the
current postgres guts.

Finally I've found more evidence that memory mapping would do a lot for
current performance, but I must admit that the current storage manager is
pretty read/write oriented. It would be easier to integrate memory
mapping into the buffer manager. Actually, the buffer manager's role is
to map some parts of files into memory buffers. However, it takes a lot
to get through several layers (smgr and finally md).

I noticed that one of the very important features of mmapping is that you
can sync the buffer (even some part of it), not the whole file. So if
there were some kind of page-level locking, it would be absolutely
necessary to make sure that only committed pages are synced and we don't
overload the I/O with unfinished things.
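
A minimal sketch of syncing only part of a mapping, assuming POSIX
msync() and a base address that came straight from mmap().  The names
sync_range and sync_demo and the /tmp path are hypothetical; nothing
here is taken from the postgres source.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and ftruncate() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: flush only the pages that cover
 * [offset, offset + len) of a mapping that starts at base.
 * msync() needs a page-aligned address, so round the start down. */
int sync_range(void *base, size_t offset, size_t len)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = offset & ~(page - 1);

    assert(len > 0);
    return msync((char *) base + start, (offset - start) + len, MS_SYNC);
}

/* Small self-check: map four pages, dirty a few bytes in the third
 * page, and flush just that page. Returns 0 on success. */
int sync_demo(void)
{
    char path[] = "/tmp/mmap_sync_XXXXXX";   /* assumed scratch location */
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t) (4 * page)) != 0) {
        close(fd);
        return -1;
    }
    char *base = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    memcpy(base + 2 * page + 10, "committed", 9);
    int rc = sync_range(base, 2 * page + 10, 9);

    munmap(base, 4 * page);
    return rc;
}
```

The kernel is still free to write other dirty pages whenever it likes;
msync() only guarantees that the named range has reached the file.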

Also, I think that there is no need to create buffers in shared memory.
I have just tested that if you map files with the MAP_SHARED attribute
set, then each process is working on exactly the same copy of memory.
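
A test of this kind could look roughly like the sketch below, assuming
POSIX fork() and mmap().  The name shared_map_demo, the message text and
the /tmp path are hypothetical.  The child opens and maps the file on
its own, so the parent only sees the write if both mappings are backed
by the same pages.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and ftruncate() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAP_LEN 4096
#define MESSAGE "hello from child"

/* Returns 0 if a write made through the child's own MAP_SHARED mapping
 * is visible through the parent's mapping of the same file. */
int shared_map_demo(void)
{
    char path[] = "/tmp/mmap_shared_XXXXXX";  /* assumed scratch location */
    assert(strlen(MESSAGE) < MAP_LEN);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, MAP_LEN) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    char *mine = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);
    if (mine == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mine, MAP_LEN);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child: create an independent mapping of the same file. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(1);
        char *theirs = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                            MAP_SHARED, cfd, 0);
        if (theirs == MAP_FAILED)
            _exit(1);
        strcpy(theirs, MESSAGE);
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);

    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
             strcmp(mine, MESSAGE) == 0;
    munmap(mine, MAP_LEN);
    return ok ? 0 : -1;
}
```

Sharing the pages does not give any ordering or locking between the
processes; that still has to come from somewhere else.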

I have also noticed more interesting things; maybe somebody could
clarify them, since I'm not so familiar with mmapping. The first thing I
was wondering about was how we would deal with open descriptor limits if
we use direct buffer-to-file mappings. While buffers are currently
isolated from files, it's possible to close some descriptors without
throwing away buffers. However, it seems (I tried it) that memory mapping
works even after a file descriptor is closed. So, is it possible to cross
the limit of open files by using memory mapping? Or maybe the descriptor
remains open until the munmap call? Or maybe it's just a Linux feature?

Mike

--
WWW: http://www.lodz.pdi.net/~mimo  tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz  *  Bugaj 66 m.54 *  95-200 Pabianice  *  POLAND

Re: [HACKERS] Memory mapping (Was: Safe/Fast I/O ...)

From: ocie@paracel.com
Date:

Michal Mosiewicz wrote:
>
> Having a spare two hours, I was just looking at the current postgres
> code. I was trying to estimate how memory mapping would fit into the
> current postgres guts.
>
> Finally I've found more evidence that memory mapping would do a lot for
> current performance, but I must admit that the current storage manager is
> pretty read/write oriented. It would be easier to integrate memory
> mapping into the buffer manager. Actually, the buffer manager's role is
> to map some parts of files into memory buffers. However, it takes a lot
> to get through several layers (smgr and finally md).
>
> I noticed that one of the very important features of mmapping is that you
> can sync the buffer (even some part of it), not the whole file. So if
> there were some kind of page-level locking, it would be absolutely
> necessary to make sure that only committed pages are synced and we don't
> overload the I/O with unfinished things.
>
> Also, I think that there is no need to create buffers in shared memory.
> I have just tested that if you map files with the MAP_SHARED attribute
> set, then each process is working on exactly the same copy of memory.

This means that the processes can share the memory, but these pages
must be explicitly mapped in the other process before it can get to
them and must be explicitly unmapped from all processes before the
memory is freed up.

It seems like there are basically two ways we could use this.

1) mmap in all files that might be used and just access them directly.

2) mmap in pages from files as they are needed and munmap the pages
out when they are no longer needed.

#1 seems easier, but it does limit us to 2 GB databases on 32-bit
machines.

#2 could be done by having a sort of mmap helper.  As soon as process
A knows that it will need (might need?) a given page from a given
file, it communicates this to another process B, which attempts to
create a shared mmap for that page.  When process A actually needs to
use the page, it uses the real mmap, which should be fast if process B
has already mapped this page into memory.

Other processes could make use of this mapping (following proper
locking etiquette), each making their request to B, which simply
increments a counter on that mapping for each request after the first
one.  When a process is done with one of these mappings, it unmaps the
page itself, and then tells B that it is done with the page.  When B
sees that the count on this page has gone to zero, it can either
remove its own map, or retain it in some sort of cache in case it is
requested again in the near future.  Either way, when B figures the
page is no longer being used, it unmaps the page itself.
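
The counting that B does could be sketched as follows.  This covers only
the bookkeeping, not the mmap calls or the messages between A and B, and
every name in it (page_ref, page_acquire, page_release, the integer
file_id, the table size) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MAPPINGS 64      /* arbitrary table size for this sketch */

/* One entry per (file, page) pair that B currently has mapped. */
struct page_ref {
    int  file_id;            /* assumed small integer naming the file */
    long page_no;
    int  count;              /* 0 means the slot is free */
};

static struct page_ref ref_table[MAX_MAPPINGS];

static struct page_ref *find_slot(int file_id, long page_no)
{
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (ref_table[i].count > 0 &&
            ref_table[i].file_id == file_id &&
            ref_table[i].page_no == page_no)
            return &ref_table[i];
    }
    return NULL;
}

/* A process asks B for a page. Returns the new count, or -1 if the
 * table is full. A result of 1 means B has to create the mapping now;
 * anything larger means the mapping already exists. */
int page_acquire(int file_id, long page_no)
{
    struct page_ref *slot = find_slot(file_id, page_no);
    if (slot != NULL)
        return ++slot->count;

    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (ref_table[i].count == 0) {
            ref_table[i].file_id = file_id;
            ref_table[i].page_no = page_no;
            ref_table[i].count = 1;
            return 1;
        }
    }
    return -1;
}

/* A process tells B it is done with a page. Returns the remaining
 * count, or -1 if the page was not registered. A result of 0 means B
 * may unmap the page or keep it cached. */
int page_release(int file_id, long page_no)
{
    struct page_ref *slot = find_slot(file_id, page_no);
    if (slot == NULL)
        return -1;
    assert(slot->count > 0);
    return --slot->count;
}
```

A real helper would also have to cope with a process that dies without
releasing its pages, which this sketch ignores.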

This mapping might get synced by the OS at unknown intervals, but
processes can sync the pages themselves, say at the end of a
transaction.

Ocie

Re: [HACKERS] Memory mapping (Was: Safe/Fast I/O ...)

From: "Matthew N. Dodd"
Date:

On Wed, 15 Apr 1998, Michal Mosiewicz wrote:
> isolated from files, it's possible to close some descriptors without
> throwing away buffers. However, it seems (I tried it) that memory mapping
> works even after a file descriptor is closed. So, is it possible to cross
> the limit of open files by using memory mapping? Or maybe the descriptor
> remains open until the munmap call? Or maybe it's just a Linux feature?

Nope, that's how it works.

A good friend of mine used this in some modifications to INN (probably in
INN -current right now).

Sending an article involved opening the file, mmapping it, closing the fd,
writing the mapped area and munmap-ing.

It's pretty slick.

Be careful of the file changing under you.
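
The sequence described above (open, mmap, close the fd, write the mapped
area, munmap) might look like this minimal sketch, assuming POSIX calls
only.  It is not the INN code; send_file_mapped, send_demo and the /tmp
path are hypothetical names for illustration.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write the whole file at path to out_fd through a read-only mapping.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_file_mapped(const char *path, int out_fd)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t) st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid without the fd */
    if (data == MAP_FAILED)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(out_fd, data + done, len - done);
        if (n < 0) {
            munmap(data, len);
            return -1;
        }
        done += (size_t) n;
    }
    munmap(data, len);
    return (ssize_t) done;
}

/* Small self-check: send a short file into a pipe and read it back. */
int send_demo(void)
{
    char path[] = "/tmp/mmap_send_XXXXXX";   /* assumed scratch location */
    const char msg[] = "article body\n";
    const ssize_t msg_len = (ssize_t) (sizeof msg - 1);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, msg, (size_t) msg_len) != msg_len) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    int pipefd[2];
    if (pipe(pipefd) != 0) {
        unlink(path);
        return -1;
    }
    ssize_t sent = send_file_mapped(path, pipefd[1]);
    close(pipefd[1]);

    char buf[64] = {0};
    ssize_t got = read(pipefd[0], buf, sizeof buf - 1);
    close(pipefd[0]);
    unlink(path);

    return (sent == msg_len && got == msg_len &&
            memcmp(buf, msg, (size_t) msg_len) == 0) ? 0 : -1;
}
```

The length is taken once from fstat(), so if another process truncates
the file while it is mapped, reading past the new end can fault; that is
the "file changing under you" case.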

/*
   Matthew N. Dodd        | A memory retaining a love you had for life
   winter@jurai.net        | As cruel as it seems nothing ever seems to
   http://www.jurai.net/~winter | go right - FLA M 3.1:53
*/


Re: [HACKERS] Memory mapping (Was: Safe/Fast I/O ...)

From: Bruce Momjian
Date:

>
> Having a spare two hours, I was just looking at the current postgres
> code. I was trying to estimate how memory mapping would fit into the
> current postgres guts.
>
> Finally I've found more evidence that memory mapping would do a lot for
> current performance, but I must admit that the current storage manager is
> pretty read/write oriented. It would be easier to integrate memory
> mapping into the buffer manager. Actually, the buffer manager's role is
> to map some parts of files into memory buffers. However, it takes a lot
> to get through several layers (smgr and finally md).
>
> I noticed that one of the very important features of mmapping is that you
> can sync the buffer (even some part of it), not the whole file. So if
> there were some kind of page-level locking, it would be absolutely
> necessary to make sure that only committed pages are synced and we don't
> overload the I/O with unfinished things.

We really don't need to worry about it.  Our goal is to control flushing
of pg_log to disk.  If we control that, we don't care if the non-pg_log
pages go to disk.  In a crash, any non-synced pg_log transactions are
rolled back.

We are spoiled because we have just one compact central file to worry
about sync-ing.

>
> Also, I think that there is no need to create buffers in shared memory.
> I have just tested that if you map files with the MAP_SHARED attribute
> set, then each process is working on exactly the same copy of memory.
>
> I have also noticed more interesting things; maybe somebody could
> clarify them, since I'm not so familiar with mmapping. The first thing I
> was wondering about was how we would deal with open descriptor limits if
> we use direct buffer-to-file mappings. While buffers are currently
> isolated from files, it's possible to close some descriptors without
> throwing away buffers. However, it seems (I tried it) that memory mapping
> works even after a file descriptor is closed. So, is it possible to cross
> the limit of open files by using memory mapping? Or maybe the descriptor
> remains open until the munmap call? Or maybe it's just a Linux feature?

Not sure about this, but the open file limit is not very often a
restriction for us.  It is a per-backend issue, and I can't imagine cases
where a backend has more than 64 file descriptors open.  If so, you can
usually increase the kernel limits.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)