Re: [HACKERS] mmap and MAP_ANON - Mailing list pgsql-hackers
From | ocie@paracel.com |
---|---|
Subject | Re: [HACKERS] mmap and MAP_ANON |
Date | |
Msg-id | 9805131838.AA05684@dolomite.paracel.com |
In response to | Re: [HACKERS] mmap and MAP_ANON (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote:
>
> "Göran Thyni" <goran@bildbasen.se> writes:
> > Linux can only MAP_SHARED if the file is a *real* file,
> > devices or trick like MAP_ANON does only work with MAP_PRIVATE.
>
> Well, this makes some sense: MAP_SHARED implies that the shared memory
> will also be accessible to independently started processes, and
> to do that you have to have an openable filename to refer to the
> data segment by.
>
> MAP_PRIVATE will *not* work for our purposes: according to my copy
> of mmap(2):
>
> : If MAP_PRIVATE is set in flags:
> :   o Modification to the mapped region by the calling process is
> :     not visible to other processes which have mapped the same
> :     region using either MAP_PRIVATE or MAP_SHARED.
> :     Modifications are not visible to descendant processes that
> :     have inherited the mapped region across a fork().
>
> so privately mapped segments are useless for interprocess communication,
> even after we get rid of exec().
>
> mmaping /dev/zero, as has been suggested earlier in this thread,
> seems like a really bad idea to me. Would that not imply that
> any process anywhere in the system that also decides to mmap /dev/zero
> would get its hands on the Postgres shared memory segment? You
> can't restrict permissions on /dev/zero to prevent it.
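For reference, the MAP_PRIVATE behaviour described in the quoted man page text can be checked with a small sketch like the one below. It assumes /dev/zero can be mapped privately on the system at hand, and the function name private_write_seen_by_parent is made up for this illustration. The child writes into the mapping and the parent then looks at the same byte; with a private mapping the parent should still see zero.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map one page of /dev/zero
 * with MAP_PRIVATE, let a forked child store 1 in the first byte, and
 * return the value the parent sees there afterwards.
 * Returns -1 on any error.
 */
int private_write_seen_by_parent(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    char *ma;
    pid_t pid;
    int fd, seen;

    if (pagesize <= 0)
        return -1;
    fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return -1;

    ma = mmap(NULL, (size_t) pagesize, PROT_READ | PROT_WRITE,
              MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (ma == MAP_FAILED)
        return -1;

    pid = fork();
    if (pid == -1) {
        munmap(ma, (size_t) pagesize);
        return -1;
    }
    if (pid == 0) {
        ma[0] = 1;              /* lands in the child's private copy only */
        _exit(0);
    }

    waitpid(pid, NULL, 0);      /* child has finished its write by now */
    seen = ma[0];
    munmap(ma, (size_t) pagesize);
    return seen;
}
```

Calling private_write_seen_by_parent() from a small main() and printing the result should give 0 wherever the private mapping behaves as the man page describes.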
On some systems, a mapping of /dev/zero can be shared with child processes, as in this example:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* memset() */

int main()
{
  int fd;
  caddr_t ma;
  int i;
  int pagesize = sysconf(_SC_PAGESIZE);

  fd=open("/dev/zero",O_RDWR);
  if (fd==-1) {
    perror("open");
    exit(1);
  }

  /* one page, shared between this process and anything it forks */
  ma=mmap((caddr_t) 0, pagesize, (PROT_READ|PROT_WRITE), MAP_SHARED, fd, 0);
  if (ma == (caddr_t) MAP_FAILED) {
    perror("mmap");
    exit(1);
  }
  memset(ma,0,pagesize);

  i=fork();
  if (i==-1) {
    perror("fork");
    exit(1);
  }
  if (i==0) { /* child: writes byte 0, then prints both bytes */
    ((char*)ma)[0]=1;
    sleep(1);
    printf("child %d %d\n",((char*)ma)[0],((char*)ma)[1]);
    sleep(1);
    return 0;
  } else { /* parent: writes byte 1, then prints both bytes */
    ((char*)ma)[1]=1;
    sleep(1);
    printf("parent %d %d\n",((char*)ma)[0],((char*)ma)[1]);
  }
  wait(NULL);
  munmap(ma,pagesize);   /* unmap exactly the length that was mapped */
  return 0;
}

This works on Solaris and, as expected, both the parent and the child are able to write into the memory and each sees the other's changes (the memory is truly shared between processes).

We can certainly map a real file, and this might even give us some interesting crash recovery options. The nice thing about doing away with the exec is that the memory mapped in the parent process is available at the same address region in every process, so we don't have to do funky pointer tricks.

The only problem I see with mmap is that we don't know exactly when a page will be written to disk. I.e., if you make two writes, the page might get sync'ed between them, thus storing an inconsistent intermediate state to the disk. Perhaps with proper transaction control, this is not a problem.

The question is whether the individual database files should be mapped into memory, or whether one "pgmem" file should be mapped, with pages from different files read into it. The first option would allow different backend processes to map different pages of different files as they are needed.
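As a rough sketch of the first option, the code below maps a single page of an ordinary file with MAP_SHARED, changes one byte, and calls msync() to force the page out at a known point. The helper names update_page and read_page_byte are invented for this illustration and nothing here is taken from the Postgres sources. Note that msync() only guarantees the page has been written by the time it returns; as far as I know it does not stop the kernel from writing the page earlier, so the intermediate-state concern above still stands.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: map page number `pageno` of the file at `path`
 * with MAP_SHARED, store `value` in its first byte, and push the page
 * to the file with msync() before unmapping.
 * Returns 0 on success, -1 on error.
 */
int update_page(const char *path, long pageno, char value)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    off_t offset = (off_t) pageno * pagesize;  /* must be page aligned */
    struct stat st;
    char *page;
    int fd, rc;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;

    /* Grow the file if it does not yet cover the page we want to map. */
    if (fstat(fd, &st) == -1 ||
        (st.st_size < offset + pagesize &&
         ftruncate(fd, offset + pagesize) == -1)) {
        close(fd);
        return -1;
    }

    page = mmap(NULL, (size_t) pagesize, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, offset);
    close(fd);                  /* the mapping outlives the descriptor */
    if (page == MAP_FAILED)
        return -1;

    page[0] = value;
    /* MS_SYNC blocks until the modified page has been written out. */
    rc = msync(page, (size_t) pagesize, MS_SYNC);
    munmap(page, (size_t) pagesize);
    return rc;
}

/*
 * Hypothetical helper: read the first byte of page `pageno` through
 * plain read(), to check what actually reached the file.
 * Returns the byte value, or -1 on error.
 */
int read_page_byte(const char *path, long pageno)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned char c;
    int fd, ok;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ok = lseek(fd, (off_t) pageno * pagesize, SEEK_SET) != (off_t) -1
         && read(fd, &c, 1) == 1;
    close(fd);
    return ok ? c : -1;
}
```

For example, update_page("/tmp/pgmem.dat", 2, 42) followed by read_page_byte("/tmp/pgmem.dat", 2) should hand back 42; the path is only a placeholder.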
The postmaster could "pre-map" pages on behalf of the backend processes as a sort of intelligent read-ahead mechanism.

I'll try to write this separately from Postgres just to see how it works.

Ocie