Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES - Mailing list pgsql-hackers
From | Janardhana Reddy |
---|---|
Subject | Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES |
Date | |
Msg-id | 3BB83DF0.8946973@mediaring.com.sg |
In response to | Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES
Re: PERFORMANCE IMPROVEMENT by mapping WAL FILES |
List | pgsql-hackers |
I have just completed functional testing of the WAL using mmap, and it is
working fine. I tested by commenting out the "CreateCheckPoint" functionality
so that when I kill postgres and restart, it redoes all the records from the
WAL log file, which is updated using mmap. I just need to clean up the code
and do some stress testing. By the end of this week I should be able to
complete the stress test and generate the patch file.

As Tom Lane mentioned, I see the problem in portability to all platforms.
What I propose is to use mmap only for WAL on some platforms like Linux,
FreeBSD, etc. For other platforms we can use the existing method, slightly
modifying the write() routine to write only the modified part of the page.

Regards
jana

>
>
> OK, I have talked to Tom Lane about this on the phone and we have a few
> ideas.
>
> Historically, we have avoided mmap() because of portability problems,
> and because using mmap() to write to large tables could consume lots of
> address space with little benefit. However, I perhaps can see WAL as
> being a good use of mmap.
>
> First, there is the issue of using mmap(). For OS's that have the
> mmap() MAP_SHARED flag, different backends could mmap the same file and
> each see the changes. However, keep in mind we still have to fsync()
> WAL, so we need to use msync().
>
> So, looking at the benefits of using mmap(), we have overhead of
> different backends having to mmap something that now sits quite easily
> in shared memory. Now, I can see mmap reducing the copy from user to
> kernel, but there are other ways to fix that. We could modify the
> write() routines to write() 8k on first WAL page write and later write
> only the modified part of the page to the kernel buffers. The old
> kernel buffer is probably still around so it is unlikely to require a
> read from the file system to read in the rest of the page. This reduces
> the write from 8k to something probably less than 4k which is better
> than we can do with mmap.
>
> I will add a TODO item to this effect.
>
> As far as reducing the write to disk from 8k to 4k, if we have to
> fsync/msync, we have to wait for the disk to spin to the proper location
> and at that point writing 4k or 8k doesn't seem like much of a win.
>
> In summary, I think it would be nice to reduce the 8k transfer from user
> to kernel on secondary page writes to only the modified part of the
> page. I am uncertain if mmap() or anything else will help the physical
> write to the disk.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
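
For readers following the thread, here is a rough sketch of the two approaches being compared: writing a record through a MAP_SHARED mapping followed by msync(), and pushing only the modified bytes of an 8k page to the kernel with pwrite() followed by fsync(). This is not the patch discussed above and not PostgreSQL code. The names wal_map, wal_append_mmap, wal_write_partial and WAL_PAGE_SIZE are hypothetical, and the sketch assumes a POSIX system that provides mmap(), msync() and pwrite().

```c
#define _XOPEN_SOURCE 700   /* expose pwrite(), pread(), ftruncate() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define WAL_PAGE_SIZE 8192  /* hypothetical name; 8k page as in the thread */

/* Size the file to len bytes and map it MAP_SHARED so that every process
 * mapping the same file sees the changes. Returns NULL on failure. */
char *wal_map(int fd, size_t len)
{
    void *p;

    if (ftruncate(fd, (off_t) len) != 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* mmap variant: copy the record into the mapping, then msync() the
 * touched range. Returns 0 on success, -1 on error. */
int wal_append_mmap(char *map, size_t map_len, size_t off,
                    const void *rec, size_t len)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t start;

    if (pagesz <= 0 || off > map_len || len > map_len - off)
        return -1;
    if (len == 0)
        return 0;
    memcpy(map + off, rec, len);
    /* msync() wants a page-aligned start address, so round down. */
    start = off - (off % (size_t) pagesz);
    return msync(map + start, off + len - start, MS_SYNC);
}

/* write() variant: hand only the modified byte range of one page to the
 * kernel, then fsync(). page points at the in-memory copy of the page,
 * page_start is the page's offset in the file. */
int wal_write_partial(int fd, off_t page_start, const char *page,
                      size_t mod_off, size_t mod_len)
{
    ssize_t n;

    assert(mod_off + mod_len <= WAL_PAGE_SIZE);
    n = pwrite(fd, page + mod_off, mod_len, page_start + (off_t) mod_off);
    if (n < 0 || (size_t) n != mod_len)
        return -1;
    return fsync(fd);
}
```

In both variants the sync call is still on the commit path, which is the point made in the quoted message: the sketch only changes how many bytes cross from user space to the kernel, not how long the disk takes.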