Thread: mmap for zeroing WAL log
[ redirected to pgsql-hackers instead of -patches ]

Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> On Sat, 24 Feb 2001, Bruce Momjian wrote:
>> I am confused why mmap() is better than writing to a real file.

> It isn't, except that it allows to initialise the logfile in
> one syscall, without first allocating and zeroing (and hence
> dirtying) 16Mb of memory.

Uh, the existing code does not zero 16Mb of memory... it zeroes 8K and then writes that block repeatedly. It's possible that the overhead of a syscall for each 8K block is significant, but on the other hand writing a block at a time is a heavily used and heavily optimized path in all Unixen. It's at least as plausible that the mmap-as-source-of-zeroes path will be slower!

I think this is worth looking into, but I'm very far from being sold on it...

regards, tom lane
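[ Editor's sketch, not part of the original message. ] The block-at-a-time approach described above can be illustrated roughly as follows. This is a minimal sketch, not the actual PostgreSQL code: the function name zero_fill_by_blocks, the constants BLOCK_SIZE and SEGMENT_SIZE, and the error handling are all assumptions made for illustration. It zeroes one 8K buffer and issues one write() per block.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE   8192                 /* assumed block size: 8K */
#define SEGMENT_SIZE (16 * 1024 * 1024)   /* assumed log segment size: 16MB */

/*
 * Hypothetical helper: create `path` and fill it with `total` zero bytes
 * by writing the same zeroed 8K buffer repeatedly.
 * Returns 0 on success, -1 on any error.
 */
static int
zero_fill_by_blocks(const char *path, size_t total)
{
    char    zbuf[BLOCK_SIZE];
    size_t  written;
    int     fd;

    /* This sketch only handles sizes that are a whole number of blocks. */
    assert(total % BLOCK_SIZE == 0);

    /* Only 8K of memory is ever zeroed, not the whole segment. */
    memset(zbuf, 0, sizeof(zbuf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* One write() syscall per block. */
    for (written = 0; written < total; written += BLOCK_SIZE)
    {
        /* A short write is treated as a failure to keep the sketch simple. */
        if (write(fd, zbuf, BLOCK_SIZE) != (ssize_t) BLOCK_SIZE)
        {
            close(fd);
            unlink(path);
            return -1;
        }
    }

    /* Data reaches disk only when explicitly flushed. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

With these assumed sizes the loop makes 2048 write() calls per segment, which is the per-syscall overhead under discussion.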

Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> I had assumed that the overhead would come from synchronous
> metadata incurring writes of at least the inode, block bitmap
> and probably an indirect block for each syscall.

No Unix that I've ever heard of forces metadata to disk after each "write" call; anyone who tried it would have abysmal performance. That's what fsync and the syncer daemon are for.

regards, tom lane

On Sat, 24 Feb 2001, Tom Lane wrote:

> >> I am confused why mmap() is better than writing to a real file.
>
> > It isn't, except that it allows to initialise the logfile in
> > one syscall, without first allocating and zeroing (and hence
> > dirtying) 16Mb of memory.
>
> Uh, the existing code does not zero 16Mb of memory... it zeroes
> 8K and then writes that block repeatedly.

See the "one syscall" bit above.

> It's possible that the overhead of a syscall for each 8K block is
> significant,

I had assumed that the overhead would come from synchronous metadata incurring writes of at least the inode, block bitmap and probably an indirect block for each syscall.

> but on the other hand writing a block at a time is a heavily used and
> heavily optimized path in all Unixen. It's at least as plausible that
> the mmap-as-source-of-zeroes path will be slower!

Results:

On Linux/ext2, it appears good for a gain of 3-5% for log creations (via a fairly minimal test program).

On FreeBSD 4.1-RELEASE/ffs (with all of sync/async/softupdates) it is a couple of percent worse in elapsed time, but consumes around a third more system CPU time (12sec vs 9sec on one test system).

I am awaiting numbers from reiserfs but, for now, it looks like I am far from vindicated.

Matthew.
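[ Editor's sketch, not part of the original message. ] The "mmap-as-source-of-zeroes" idea discussed above might look roughly like the following. The patch itself is not shown in this thread, so this is an assumption about its general shape rather than a copy of it: the name zero_fill_by_mmap, the use of a private read-only mapping of /dev/zero, and the error handling are all illustrative. The mapping is only read, so its pages are never dirtied, and the whole segment is handed to write() in one call (with a loop only to cover short writes).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` and fill it with `total` zero bytes,
 * using a read-only mapping of /dev/zero as the source buffer.
 * Returns 0 on success, -1 on any error.
 */
static int
zero_fill_by_mmap(const char *path, size_t total)
{
    int     zfd;
    int     fd;
    char   *zeroes;
    size_t  done = 0;
    int     rc = -1;

    assert(total > 0);          /* mmap() rejects a zero-length mapping */

    /* Map `total` bytes of zeroes; reading them does not dirty memory. */
    zfd = open("/dev/zero", O_RDONLY);
    if (zfd < 0)
        return -1;
    zeroes = mmap(NULL, total, PROT_READ, MAP_PRIVATE, zfd, 0);
    close(zfd);                 /* the mapping stays valid after close */
    if (zeroes == MAP_FAILED)
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0)
    {
        /* Normally a single write(); loop only if the kernel writes less. */
        while (done < total)
        {
            ssize_t n = write(fd, zeroes + done, total - done);

            if (n <= 0)
                break;
            done += (size_t) n;
        }
        if (done == total && fsync(fd) == 0)
            rc = 0;
        if (close(fd) != 0)
            rc = -1;
    }

    munmap(zeroes, total);
    return rc;
}
```

Whether this beats the block-at-a-time loop depends on the kernel and filesystem, as the numbers reported above suggest.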

On Tue, 27 Feb 2001, Tom Lane wrote:

> Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> > I had assumed that the overhead would come from synchronous
> > metadata incurring writes of at least the inode, block bitmap
> > and probably an indirect block for each syscall.
>
> No Unix that I've ever heard of forces metadata to disk after each
> "write" call; anyone who tried it would have abysmal performance.
> That's what fsync and the syncer daemon are for.

My understanding was that that's exactly what ffs' synchronous metadata writes do.

Am I missing something here? Do they just schedule I/O, but return without waiting for its completion?

Matthew.