Thread: Re: [HACKERS] Safe/Fast I/O ...
[Please forgive me for the way this post is put together; I'm not actually on your mailing list, but was just perusing the archives.]

Michal Mosiewicz <mimo@interdata.com.pl> writes:

> The main reason for using memory mapping is that you don't have to create
> unnecessary buffers. Normally, for every operation you have to create
> some in-memory buffer, copy the data there, do some operations, and put
> the data back into the file. With memory mapping you may avoid creating
> unnecessary buffers, and moreover you may call your system functions
> less frequently. There are also additional savings. (Less memory
> copying, reusing memory if several processes map the same file)

Additionally, if your operating system is at all reasonable, using memory mapping allows you to take advantage of all the work that has gone into tuning your VM system. If you map a large file and then access it in some way that shows reasonable locality, the VM system will probably be able to do a better job of page replacement on a system-wide basis than you could do with a cache built into your application. (A good system will also provide other benefits, such as pre-faulting and extended read-ahead.)

Of course, it does have one significant drawback: memory-mapped regions do not automatically extend when their underlying files do. So, for interacting with a structure that shows effectively linear access and growth, asynchronous I/O is more likely to be a benefit, since AIO can extend a file asynchronously, whereas other mechanisms will block while the file is being extended. (Depending on the system, this may not be true for multi-threaded programs.)

-GAWollman

--
Garrett A. Wollman    | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu   | O Siem / The fires of freedom
Opinions not those of | Dance in the burning flame
MIT, LCS, CRS, or NSA |            - Susan Aglukark and Chad Irschick
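A minimal sketch of the kind of access hint Wollman alludes to, assuming a POSIX system with madvise() (some systems spell it posix_madvise); the file name and the checksum loop are only illustrative:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("relation.dat", O_RDONLY);    /* hypothetical data file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only; page replacement is now the
       VM system's problem, not the application's. */
    char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Declare a sequential scan so the kernel can read ahead
       aggressively and drop the pages behind us. */
    madvise(base, st.st_size, MADV_SEQUENTIAL);

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += (unsigned char) base[i];
    printf("checksum: %ld\n", sum);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}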
While having a spare two hours, I was just looking at the current code of Postgres, trying to estimate how memory mapping would fit into the current Postgres guts.

Finally, I've found more evidence that memory mapping would do a lot for current performance, but I must admit that the current storage manager is pretty read/write oriented. It would be easier to integrate memory mapping into the buffer manager. Actually, the buffer manager's role is to map some parts of files into memory buffers. However, it takes a lot to get through several layers (smgr and finally md).

I noticed that one of the very important features of mmaping is that you can sync the buffer (even some part of it), not the whole file. So if there were some kind of page-level locking, it would be absolutely necessary to make sure that only committed pages are synced and we don't overload the I/O with unfinished things.

Also, I think that there is no need to create buffers in shared memory. I have just tested that if you map files with the MAP_SHARED attribute set, then each process is working on exactly the same copy of memory.

I have also noticed more interesting things; maybe somebody can clarify, since I'm not so literate with mmaping. The first thing I was wondering about was how we would deal with open-descriptor limits if we use direct buffer-to-file mappings. While buffers are currently isolated from files, it's possible to close some descriptors without throwing away buffers. However, it seems (I tried it) that memory mapping works even after a file descriptor is closed. So, is it possible to exceed the limit of open files by using memory mapping? Or does the descriptor remain open until the munmap call? Or maybe it's just a Linux feature?

Mike

--
WWW: http://www.lodz.pdi.net/~mimo  tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz * Bugaj 66 m.54 * 95-200 Pabianice * POLAND
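To illustrate the partial-sync point, here is a minimal sketch, assuming an 8K block size and a hypothetical relation file; msync() accepts any page-aligned subrange of a mapping:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE 8192          /* assumed Postgres block size */
#define NPAGES    16

int main(void)
{
    int fd = open("table.dat", O_RDWR);        /* hypothetical relation file */
    if (fd < 0) { perror("open"); return 1; }

    char *base = mmap(NULL, NPAGES * PAGE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Dirty page 3 only. */
    memset(base + 3 * PAGE_SIZE, 0xAB, PAGE_SIZE);

    /* Force just that one (committed) page to disk; the address passed
       to msync must be page-aligned.  The other, possibly uncommitted,
       pages stay dirty until the VM decides to write them. */
    if (msync(base + 3 * PAGE_SIZE, PAGE_SIZE, MS_SYNC) < 0)
        perror("msync");

    munmap(base, NPAGES * PAGE_SIZE);
    close(fd);
    return 0;
}

Only the page handed to msync() is forced out; the rest of the file is written back whenever the VM system chooses, which is exactly the control a page-level commit policy would need.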
Michal Mosiewicz wrote:
>
> While having a spare two hours, I was just looking at the current code
> of Postgres, trying to estimate how memory mapping would fit into the
> current Postgres guts.
>
> Finally, I've found more evidence that memory mapping would do a lot
> for current performance, but I must admit that the current storage
> manager is pretty read/write oriented. It would be easier to integrate
> memory mapping into the buffer manager. Actually, the buffer manager's
> role is to map some parts of files into memory buffers. However, it
> takes a lot to get through several layers (smgr and finally md).
>
> I noticed that one of the very important features of mmaping is that
> you can sync the buffer (even some part of it), not the whole file. So
> if there were some kind of page-level locking, it would be absolutely
> necessary to make sure that only committed pages are synced and we
> don't overload the I/O with unfinished things.
>
> Also, I think that there is no need to create buffers in shared memory.
> I have just tested that if you map files with the MAP_SHARED attribute
> set, then each process is working on exactly the same copy of memory.

This means that the processes can share the memory, but these pages must be explicitly mapped in the other process before it can get to them, and must be explicitly unmapped from all processes before the memory is freed up.

It seems like there are basically two ways we could use this:

1) mmap in all files that might be used and just access them directly.

2) mmap in pages from files as they are needed and munmap the pages out when they are no longer needed.

#1 seems easier, but it does limit us to 2GB databases on 32-bit machines.

#2 could be done by having a sort of mmap helper. As soon as process A knows that it will need (might need?) a given page from a given file, it communicates this to another process B, which attempts to create a shared mmap for that page. When process A actually needs to use the page, it performs the real mmap, which should be fast if process B has already mapped this page into memory. Other processes could make use of this mapping (following proper locking etiquette), each making their request to B, which simply increments a counter on that mapping for each request after the first one. When a process is done with one of these mappings, it unmaps the page itself and then tells B that it is done with the page. When B sees that the count on this page has gone to zero, it can either remove its own map or retain it in some sort of cache in case it is requested again in the near future. Either way, when B figures the page is no longer being used, it unmaps the page itself.

These mappings might get synced by the OS at unknown intervals, but processes can sync the pages themselves, say at the end of a transaction.

Ocie
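A sketch of the bookkeeping such a helper B might keep, under stated assumptions: an 8K page size, a fixed-size table, and made-up names (helper_request/helper_release); the IPC between the backends and B is elided entirely:

/* Refcounted table of (file, page) -> mapping, kept by the helper. */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

#define PAGE_SIZE 8192          /* assumed block size */
#define MAX_MAPS  256

typedef struct {
    int   fd;                   /* file containing the page */
    off_t pageno;               /* which page of that file */
    void *addr;                 /* where the helper mapped it */
    int   refcount;             /* how many backends asked for it */
} PageMap;

static PageMap maps[MAX_MAPS];

/* Called when a backend tells B it (might) need a page. */
void *helper_request(int fd, off_t pageno)
{
    for (int i = 0; i < MAX_MAPS; i++)
        if (maps[i].refcount > 0 &&
            maps[i].fd == fd && maps[i].pageno == pageno) {
            maps[i].refcount++;          /* already mapped: just count it */
            return maps[i].addr;
        }

    for (int i = 0; i < MAX_MAPS; i++)
        if (maps[i].refcount == 0) {     /* free slot: create the mapping */
            void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, pageno * PAGE_SIZE);
            if (p == MAP_FAILED)
                return NULL;
            maps[i] = (PageMap){ fd, pageno, p, 1 };
            return p;
        }
    return NULL;                         /* table full */
}

/* Called when a backend tells B it is done with a page. */
void helper_release(int fd, off_t pageno)
{
    for (int i = 0; i < MAX_MAPS; i++)
        if (maps[i].refcount > 0 &&
            maps[i].fd == fd && maps[i].pageno == pageno) {
            if (--maps[i].refcount == 0) {
                /* Could keep the mapping cached here instead; we drop it. */
                munmap(maps[i].addr, PAGE_SIZE);
                maps[i].addr = NULL;
            }
            return;
        }
}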
On Wed, 15 Apr 1998, Michal Mosiewicz wrote:
> isolated from files, it's possible to close some descriptors without
> throwing away buffers. However, it seems (I tried it) that memory
> mapping works even after a file descriptor is closed. So, is it
> possible to exceed the limit of open files by using memory mapping? Or
> does the descriptor remain open until the munmap call? Or maybe it's
> just a Linux feature?

Nope, that's how it works. A good friend of mine used this in some modifications to INN (probably in INN-current right now). Sending an article involved opening the file, mmapping it, closing the fd, writing the mapped area, and munmap-ing. It's pretty slick.

Be careful of the file changing under you.

/* Matthew N. Dodd             | A memory retaining a love you had for life
   winter@jurai.net            | As cruel as it seems nothing ever seems to
   http://www.jurai.net/~winter | go right - FLA M 3.1:53 */
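The pattern Matthew describes, as a minimal sketch; a mapping holds its own reference to the file (standard POSIX mmap behavior), so closing the descriptor costs nothing. The article file name is made up, and the output goes to stdout rather than a client socket:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("article.txt", O_RDONLY);    /* hypothetical article */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The mapping keeps the file alive, so the descriptor can be
       closed immediately without losing the pages. */
    close(fd);

    /* Write the article straight out of the mapping (in INN's case,
       to the client socket). */
    write(STDOUT_FILENO, p, st.st_size);

    munmap(p, st.st_size);
    return 0;
}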
> While having a spare two hours, I was just looking at the current code
> of Postgres, trying to estimate how memory mapping would fit into the
> current Postgres guts.
>
> Finally, I've found more evidence that memory mapping would do a lot
> for current performance, but I must admit that the current storage
> manager is pretty read/write oriented. It would be easier to integrate
> memory mapping into the buffer manager. Actually, the buffer manager's
> role is to map some parts of files into memory buffers. However, it
> takes a lot to get through several layers (smgr and finally md).
>
> I noticed that one of the very important features of mmaping is that
> you can sync the buffer (even some part of it), not the whole file. So
> if there were some kind of page-level locking, it would be absolutely
> necessary to make sure that only committed pages are synced and we
> don't overload the I/O with unfinished things.

We really don't need to worry about it. Our goal is to control the flushing of pg_log to disk. If we control that, we don't care if the non-pg_log pages go to disk. In a crash, any non-synced pg_log transactions are rolled back. We are spoiled because we have just one compact central file to worry about syncing.

> Also, I think that there is no need to create buffers in shared memory.
> I have just tested that if you map files with the MAP_SHARED attribute
> set, then each process is working on exactly the same copy of memory.
>
> I have also noticed more interesting things; maybe somebody can
> clarify, since I'm not so literate with mmaping. The first thing I was
> wondering about was how we would deal with open-descriptor limits if
> we use direct buffer-to-file mappings. While buffers are currently
> isolated from files, it's possible to close some descriptors without
> throwing away buffers. However, it seems (I tried it) that memory
> mapping works even after a file descriptor is closed. So, is it
> possible to exceed the limit of open files by using memory mapping? Or
> does the descriptor remain open until the munmap call? Or maybe it's
> just a Linux feature?

Not sure about this, but the open-file limit is rarely a restriction for us. It is a per-backend issue, and I can't imagine cases where a backend has more than 64 file descriptors open. If one does, you can usually increase the kernel limits.

--
Bruce Momjian                    | 830 Blythe Avenue
maillist@candle.pha.pa.us        | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive,  | (610) 353-9879(w)
+ Christ can be your backup.     | (610) 853-3000(h)
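A sketch of the ordering Bruce describes, with everything about pg_log's actual layout assumed for illustration (the status slot, file names, and block size are all made up); the point is that only the pg_log page needs a controlled sync:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE 8192          /* assumed block size */

int main(void)
{
    int datafd = open("table.dat", O_RDWR);   /* hypothetical heap file */
    int logfd  = open("pg_log", O_RDWR);      /* transaction status file */
    if (datafd < 0 || logfd < 0) return 1;

    char *data = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, datafd, 0);
    char *xlog = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, logfd, 0);
    if (data == MAP_FAILED || xlog == MAP_FAILED) return 1;

    /* 1. Scribble on the data page; no explicit sync.  If we crash
       before step 3, the transaction is not marked committed in
       pg_log and gets rolled back. */
    memcpy(data, "new tuple", 9);

    /* 2. Mark the transaction committed in pg_log ... */
    xlog[42] = 1;               /* hypothetical status slot for xact 42 */

    /* 3. ... and force only that pg_log page to disk.  This is the
       one flush we actually have to control. */
    msync(xlog, PAGE_SIZE, MS_SYNC);

    munmap(data, PAGE_SIZE);
    munmap(xlog, PAGE_SIZE);
    close(datafd);
    close(logfd);
    return 0;
}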