Re: [HACKERS] mmap and MAP_ANON - Mailing list pgsql-hackers
From | dg@illustra.com (David Gould) |
---|---|
Subject | Re: [HACKERS] mmap and MAP_ANON |
Date | |
Msg-id | 9805141839.AA19284@hawk.illustra.com |
In response to | Re: [HACKERS] mmap and MAP_ANON (Michal Mosiewicz <mimo@interdata.com.pl>) |
List | pgsql-hackers |
Michal Mosiewicz asks:
> Why a lot of people investigate how to replace shared memory with
> mmapping anonymously but there is no discussion on replacing
> reads/writes with memory mapping of heap files.
>
> This way we would save not only on having better system cache
> utilisation but also we would have less memory copying. For me it seems
> like a more robust solution. I suggested it few months ago.
>
> If it's a bad idea, I wonder why?

Unfortunately, it is probably a bad idea. The postgres buffer cache is a
shared pool of pages containing an assortment of blocks from all the
different tables in use by all the different backends.

That is, if backend 'a' is reading table 'ta', and backend 'b' is reading
table 'tb', then the buffer cache will have blocks from both table 'ta'
and table 'tb' in it. The benefit occurs when backend 'x' starts reading
either table 'ta' or 'tb'. Rather than having to go to disk, it finds the
pages already loaded in the shared buffer cache. Likewise, if backend 'a'
should modify a page in table 'ta', the change is then visible to all the
other backends (ignoring locks for this discussion) without any explicit
communication between the backends.

If we started creating a separate mmapped region for each table, several
problems would occur:

 - each time a backend wants to use a table it will have to somehow find
   out if it is already mapped, and then either map it (for the first
   time), or attach to an existing mapping created by another backend.
   This implies that the backends need to communicate with all the other
   backends to let them know what mappings they are using.

 - if two backends are using the same table, and the table is too big to
   map the whole thing, then each backend needs a "window" into the
   table. This becomes difficult if the two backends are using different
   parts of the table (i.e., the first page and the last page).

 - there is a finite amount of memory available on the system for
   postgres to use. This will have to be split among all the open tables
   used by all the backends. If you have 50 backends each using 10
   tables, each with 3 indexes, you now need 2,000 mappings in the
   system. Assuming that there are 2001 pages available for mapping, how
   do you decide which table gets to map 2 pages? How do you get all the
   backends to agree about this?

Essentially, mapping tables separately creates a requirement for a huge
amount of communication and synchronization among the backends. And, even
if this were not prohibitive, it ends up fragmenting the available memory
for buffers so badly that the caching becomes ineffective.

So, unless you are going to map whole tables and those tables are needed
by _all_ the active backends, the idea of mmapping separate tables is
unworkable.

That said, there are tables that meet this criterion, for instance the
transaction logs and anchors. Here mmapping might indeed be useful, but
even so it would take some thought and a fair amount of work to gain any
benefit.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
 and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)
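
As a rough illustration of the two points in the message above, here is a
minimal Python sketch. The first part redoes the mapping arithmetic (50
backends, 10 tables each, 3 indexes per table, 2001 mappable pages). The
second part is a toy shared pool showing how a third backend can find pages
that other backends already loaded. All names here (mappings_needed,
pages_per_mapping, SharedBufferPool) are hypothetical and are not postgres
code; the LRU eviction is only an assumed stand-in, not the real buffer
manager's policy.

```python
from collections import OrderedDict


def mappings_needed(backends, tables_per_backend, indexes_per_table):
    """Count one mapping per table plus one per index, for every backend."""
    return backends * tables_per_backend * (1 + indexes_per_table)


def pages_per_mapping(pages_available, mappings):
    """Split pages evenly across mappings.

    Returns (pages each mapping gets, number of pages left over).
    """
    return divmod(pages_available, mappings)


class SharedBufferPool:
    """Toy shared cache keyed by (table, block); hypothetical, LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, table, block):
        key = (table, block)
        if key in self.pages:
            # Page was already loaded, possibly by a different backend.
            self.pages.move_to_end(key)
            self.hits += 1
            return self.pages[key]
        # Stand-in for a disk read.
        self.misses += 1
        page = f"{table}:{block}"
        self.pages[key] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        return page


needed = mappings_needed(50, 10, 3)
share, leftover = pages_per_mapping(2001, needed)
print(f"{needed} mappings, {share} page(s) each, {leftover} page(s) left over")

pool = SharedBufferPool(capacity=8)
for block in range(4):
    pool.read("ta", block)  # backend 'a' reads table 'ta'
    pool.read("tb", block)  # backend 'b' reads table 'tb'
pool.read("ta", 0)          # backend 'x' finds the page already cached
print(f"hits={pool.hits} misses={pool.misses}")
```

With these numbers every mapping gets a single page and only one page is
left over to hand out, which is the "which table gets to map 2 pages?"
problem. In the shared pool, by contrast, no per-table split has to be
agreed on at all: whichever blocks are in use occupy the pages.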