Re: Filesystem benchmarking for pg 8.3.3 server - Mailing list pgsql-performance
From | Greg Smith |
---|---|
Subject | Re: Filesystem benchmarking for pg 8.3.3 server |
Date | |
Msg-id | Pine.GSO.4.64.0808140825330.26747@westnet.com |
In response to | Re: Filesystem benchmarking for pg 8.3.3 server (Ron Mayer <rm_pg@cheapcomplexdevices.com>) |
Responses |
Re: Filesystem benchmarking for pg 8.3.3 server
Re: Filesystem benchmarking for pg 8.3.3 server |
List | pgsql-performance |
On Wed, 13 Aug 2008, Ron Mayer wrote:

> First off - some IDE drives don't even support the relatively recent ATA
> command that apparently lets the software know when a cache flush is
> complete.

Right, so this is one reason you can't assume barriers will be available. And barriers don't work in any case if you go through the device mapper, as some LVM and software RAID configurations do; see http://lwn.net/Articles/283161/

> Second of all - ext3 fsync() appears to me to be *extremely* stupid.
> It only seems to correctly do the correct flushing (and waiting) for a
> drive's cache to be flushed when a file's inode has changed.

This is bad, but the way PostgreSQL uses fsync seems to work fine--if it didn't, we'd all see unnaturally high write rates all the time.

> So I take back what I said about linux and write barriers
> being sane. They're not.

Right. Where Linux seems to be right now is that there's an occasional problem people run into where ext3 volumes can get corrupted if there are out-of-order writes to the journal:

http://en.wikipedia.org/wiki/Ext3#No_checksumming_in_journal
http://archives.free.net.ph/message/20070518.134838.52e26369.en.html

(By the way: I just fixed the ext3 Wikipedia article to reflect the current state of things and dumped a bunch of reference links in there, including some that are not listed here. I prefer to keep my notes about interesting topics in Wikipedia instead of having my own copies whenever possible.)

There are two ways to get around this issue with ext3. One is to disable write caching, changing your default mount options to "data=journal". In the PostgreSQL case, the way the WAL is used seems to keep corruption at bay even with the default "data=ordered" case, but after reading up on this again I'm thinking I may want to switch to "journal" anyway in the future (and retrofit some older installs with that change).
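For anyone auditing older installs for the "data=ordered" versus "data=journal" question above, here is a minimal sketch of one way to check which journal mode each ext3 volume is mounted with. It is not part of any PostgreSQL or ext3 tooling. The function name `ext3_journal_modes` and the sample devices and mount points are made up for illustration, and the code assumes the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options). It treats a missing data= option as "ordered", since that is the default mentioned above.

```python
def ext3_journal_modes(mounts_text):
    """Map each ext3 mount point to its data= journal mode.

    mounts_text is text in /proc/mounts format:
    device, mount point, filesystem type, options, dump, pass.
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext3.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mode = "ordered"  # assumption: no explicit data= option means the default
        for option in fields[3].split(","):
            if option.startswith("data="):
                mode = option[len("data="):]
        modes[fields[1]] = mode
    return modes


# Hypothetical sample input; devices and mount points are illustrative only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sdb1 /pgdata ext3 rw,noatime,data=journal 0 0
proc /proc proc rw 0 0
"""

for mount_point, mode in ext3_journal_modes(SAMPLE_MOUNTS).items():
    print(mount_point, mode)
```

On a live Linux system the same function could be fed the contents of /proc/mounts instead of the sample string. It only reports the mount option; it says nothing about whether the drive's own write cache is enabled.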
I also avoid using Linux LVM whenever possible for databases just on general principle; one less flaky thing in the way.

The other way, barriers, is just plain scary unless you know your disk hardware does the right thing and the planets align just right, and even then it seems buggy. I personally just ignore the fact that they exist on ext3, and maybe one day ext4 will get this right.

By the way: there is a great ext3 "torture test" program that came out a few months ago that's useful for checking general filesystem corruption in this context. I keep meaning to try it; if you've got some cycles to spare working in this area, check it out:

http://uwsg.indiana.edu/hypermail/linux/kernel/0805.2/1470.html

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD