Re: [HACKERS] mmap and MAP_ANON - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] mmap and MAP_ANON
Msg-id 9805141839.AA19284@hawk.illustra.com
In response to Re: [HACKERS] mmap and MAP_ANON  (Michal Mosiewicz <mimo@interdata.com.pl>)
List pgsql-hackers
Michal Mosiewicz asks:
> Why a lot of people investigate how to replace shared memory with
> mmapping anonymously but there is no discussion on replacing
> reads/writes with memory mapping of heap files.
>
> This way we would save not only on having better system cache
> utilisation but also we would have less memory copying. For me it seems
> like a more robust solution. I suggested it few months ago.
>
> If it's a bad idea, I wonder why?

Unfortunately, it is probably a bad idea.

The postgres buffer cache is a shared pool of pages containing an assortment
of blocks from all the different tables in use by all the different backends.

That is, if backend 'a' is reading table 'ta', and backend 'b' is reading
table 'tb' then the buffer cache will have blocks from both table 'ta'
and table 'tb' in it.

The benefit occurs when backend 'x' starts reading either table 'ta' or 'tb'.
Rather than have to go to disk, it finds the pages already loaded in the
shared buffer cache. Likewise, if backend 'a' should modify a page in table
'ta', the change is then visible to all the other backends (ignoring locks
for this discussion) without any explicit communication between the backends.
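
To make the sharing concrete, here is a toy model in Python. It is only a
sketch: the class and method names are made up for illustration, a dict
stands in for the shared memory segment, and it is not how the postgres
buffer manager is actually written.

```python
# Toy model of a shared buffer cache: one pool, keyed by (table, block).
# All names are hypothetical; a plain dict stands in for shared memory.

class SharedBufferPool:
    def __init__(self):
        self.pages = {}        # (table, block) -> bytearray page image
        self.disk_reads = 0    # number of simulated trips to disk

    def read_page(self, table, block):
        key = (table, block)
        if key not in self.pages:
            # Miss: pretend to read the block from disk into the pool.
            self.disk_reads += 1
            self.pages[key] = bytearray(f"{table}:{block}".encode())
        return self.pages[key]


pool = SharedBufferPool()

# Backend 'a' reads table 'ta'; backend 'b' reads table 'tb'.
page_a = pool.read_page("ta", 0)
pool.read_page("tb", 0)

# Backend 'x' now reads 'ta': the page is already in the pool.
page_x = pool.read_page("ta", 0)

# Backend 'a' modifies the page in place; 'x' sees the change
# without any message passing between the two.
page_a[0:2] = b"TA"
```

The third read is served from the pool rather than from disk, and the write
by 'a' is visible through the page that 'x' holds because both refer to the
same buffer.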

If we started creating a separate mmapped region for each table, several
problems would occur:

 - each time a backend wants to use a table it will have to somehow find out
   if it is already mapped, and then either map it (for the first time), or
   attach to an existing mapping created by another backend. This implies
   that the backends need to communicate with all the other backends to let
   them know what mappings they are using.

 - if two backends are using the same table, and the table is too big to
   map the whole thing, then each backend needs a "window" into the table.
   This becomes difficult if the two backends are using different parts of
   the table (i.e., the first page and the last page).

 - there is a finite amount of memory available on the system for postgres
   to use. This will have to be split among all the open tables used by
   all the backends. If you have 50 backends each using 10 tables, each with
   3 indexes, you now need 2,000 mappings in the system. Assuming that there
   are 2001 pages available for mapping, how do you decide which table gets
   to map 2 pages? How do you get all the backends to agree about this?
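
The arithmetic in the last point can be checked with a few lines of Python.
The variable names are mine, and the count assumes one mapping per backend
per table or index, which is how I read the worst case:

```python
# Mapping count for the example above (one mapping per backend per relation).
backends = 50
tables_per_backend = 10
indexes_per_table = 3

# Each table brings its own heap file plus its indexes.
mappings = backends * tables_per_backend * (1 + indexes_per_table)

pages_available = 2001
pages_each, leftover = divmod(pages_available, mappings)

print(mappings, pages_each, leftover)   # prints: 2000 1 1
```

Every mapping gets exactly one page, and the single leftover page is the
one the backends would have to agree on.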

Essentially, mapping tables separately creates a requirement for a huge
amount of communication and synchronization among the backends. And, even
if this were not prohibitive, it ends up fragmenting the available memory
for buffers so badly that the caching becomes ineffective.
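
The "window" problem is easy to demonstrate. The sketch below uses Python's
mmap module on a throwaway temp file standing in for a heap file; the file
layout and the names are assumptions for illustration only. Two readers
that want the first and the last page of the same file end up with two
separate mappings:

```python
import mmap
import os
import tempfile

PAGE = mmap.ALLOCATIONGRANULARITY   # window offsets must be multiples of this
NPAGES = 4                          # pretend the table is 4 pages long

# Build a stand-in "heap file": page i is filled with the byte value i.
fd, path = tempfile.mkstemp()
try:
    for i in range(NPAGES):
        os.write(fd, bytes([i]) * PAGE)

    # Backend 1 maps a one-page window over the first page of the table.
    first = mmap.mmap(fd, PAGE, access=mmap.ACCESS_READ, offset=0)
    # Backend 2 wants the last page, so it needs a different window.
    last = mmap.mmap(fd, PAGE, access=mmap.ACCESS_READ,
                     offset=(NPAGES - 1) * PAGE)

    first_byte = first[0]   # content of page 0
    last_byte = last[0]     # content of page NPAGES - 1

    first.close()
    last.close()
finally:
    os.close(fd)
    os.remove(path)
```

Neither window is of any use to the other reader, so the memory behind each
one is effectively private to the backend that asked for it.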

So, unless you are going to map whole tables and those tables are needed by
_all_ the active backends, the idea of mmapping separate tables is unworkable.

That said, there are tables that meet these criteria, for instance the
transaction logs and anchors. Here mmapping might indeed be useful, but even
so it would take some thought and a fair amount of work to gain any benefit.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)
