Re: Filesystem benchmarking for pg 8.3.3 server - Mailing list pgsql-performance
From | Ron Mayer |
---|---|
Subject | Re: Filesystem benchmarking for pg 8.3.3 server |
Date | |
Msg-id | 48A36090.6070309@cheapcomplexdevices.com |
In response to | Re: Filesystem benchmarking for pg 8.3.3 server ("Scott Marlowe" <scott.marlowe@gmail.com>) |
Responses | Re: Filesystem benchmarking for pg 8.3.3 server |
List | pgsql-performance |

Scott Marlowe wrote:
> IDE came up corrupted every single time.

Greg Smith wrote:
> you've drank the kool-aid ... completely
> ridiculous ...unsafe fsync ... md0 RAID-1
> array (aren't there issues with md and the barriers?)

Alright - I'll eat my words. Or mostly.

I still haven't found IDE drives that lie; but from the testing I've done today, I'm starting to think that:

  1a) ext3 fsync() seems to lie badly.
  1b) but ext3 can be tricked not to lie (but not in the way you might think).
  2a) md raid1 fsync() sometimes doesn't actually sync
  2b) I can't trick it not to.
  3a) some IDE drives don't even pretend to support letting you know when their cache is flushed
  3b) but the kernel will happily tell you about any such devices, including md raid ones.

In more detail. I tested on a number of systems and disks, including new (this year) and old (1997) IDE drives, and ext3 with and without the "barrier=1" mount option.

First off - some IDE drives don't even support the relatively recent ATA command that apparently lets the software know when a cache flush is complete. Apparently on those you will get messages in your system logs:

  %dmesg | grep 'disabling barriers'
  JBD: barrier-based sync failed on md1 - disabling barriers
  JBD: barrier-based sync failed on hda3 - disabling barriers

and

  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT

will not show you anything on those devices. IMHO that's cool; and it doesn't count as a lying IDE drive, since it didn't claim to support this.

Second of all - ext3 fsync() appears to me to be *extremely* stupid. It only seems to do the correct flushing (and waiting for the drive's cache to be flushed) when a file's inode has changed.

For example, in the test program below, it will happily do a real fsync (i.e. the program takes a couple of seconds to run) so long as the fchmod() statements are in there. It will *NOT* wait on my system if I comment those fchmod()'s out.
Sadly, I get the same behavior with and without the ext3 barrier=1 mount option. :(

==========================================================
/*
** based on http://article.gmane.org/gmane.linux.file-systems/21373
** http://thread.gmane.org/gmane.linux.kernel/646040
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char *argv[]) {
  if (argc<2) {
    printf("usage: fs <filename>\n");
    exit(1);
  }
  int fd = open (argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
  if (fd < 0) {
    perror("open");
    exit(1);
  }
  int i;
  for (i=0;i<100;i++) {
    char byte = 0;  /* initialized so the written value is defined */
    pwrite (fd, &byte, 1, 0);
    /* these two calls dirty the inode; without them fsync() did not wait on my system */
    fchmod (fd, 0644);
    fchmod (fd, 0664);
    fsync (fd);
  }
  close (fd);
  return 0;
}
==========================================================

Since it does indeed wait when the inode's touched, I think it suggests that it's not the hard drive that's lying, but rather ext3.

So I take back what I said about linux and write barriers being sane. They're not.

But AFAICT, all the (6 different) IDE drives I've seen work as advertised, and the kernel happily spews boot messages when it finds one that doesn't support knowing when a cache flush finished.
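
For anyone who wants to repeat the comparison without editing and rebuilding the test program, here is a rough sketch that times the same pwrite()/fsync() loop with and without the inode change. The function name time_fsync_loop and its touch_inode flag are made up for illustration; they are not part of the original test program, and the numbers it produces only mean something relative to each other on the same filesystem and drive.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for pwrite, fchmod, fsync, clock_gettime */
#endif
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

/*
 * Hypothetical helper (not from the original post).
 * Runs `iterations` rounds of pwrite() + fsync() on `path` and returns the
 * elapsed wall-clock time in seconds, or -1.0 if any call fails.
 * If touch_inode is nonzero, the file mode is toggled before each fsync(),
 * mirroring the two fchmod() calls in the test program above.
 */
double time_fsync_loop(const char *path, int iterations, int touch_inode)
{
    struct timespec start, end;
    char byte = 'x';
    int i;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1.0;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        close(fd);
        return -1.0;
    }

    for (i = 0; i < iterations; i++) {
        /* rewrite the same byte so only the data (not the size) changes */
        if (pwrite(fd, &byte, 1, 0) != 1) {
            close(fd);
            return -1.0;
        }
        /* optionally dirty the inode, as the fchmod() pair does above */
        if (touch_inode &&
            (fchmod(fd, 0644) != 0 || fchmod(fd, 0664) != 0)) {
            close(fd);
            return -1.0;
        }
        if (fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        close(fd);
        return -1.0;
    }
    close(fd);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it twice on the same file, once with touch_inode set to 1 and once with 0, and printing both results should show whether fsync() is only waiting in the first case. If both runs finish almost instantly, the flush is probably not reaching the platters in either case.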