Thread: Re: Best filesystem for PostgreSQL Database Cluster under Linux
After a long battle with technology, "Pete de Zwart" <dezwart@froob.net>, an earthling, wrote:
> Greetings to one and all,
>
> I've been trying to find some information on selecting an optimal
> filesystem setup for a volume that will only contain a PostgreSQL Database
> Cluster under Linux. Searching through the mailing list archive showed some
> promising statistics on the various filesystems available to Linux, ranging
> from ext2 through reiserfs and xfs.
>
> I have come to understand that PostgreSQL's Write Ahead Logging
> (WAL) performs a lot of the journal functionality provided by the
> majority of contemporary filesystems and that having both WAL and
> filesystem journalling can degrade performance.
>
> Could anyone point me in the right direction so that I can read
> up some more on this issue to discern which filesystem to choose and
> how to tune both the FS and PostgreSQL so that they can complement
> each other? I've attempted to find this information via the FAQ,
> Google and the mailing list archives but have been out of luck for the
> moment.

Your understanding of the impact of filesystem journalling isn't
entirely correct. In the cases of interest, journalling is done on
metadata, not on the contents of files, with the result that there
isn't really that much overlap between the two forms of "journalling"
that are taking place.

I did some benchmarking last year that compared, on a write-heavy
load, ext3, XFS, and JFS.

I found that ext3 was materially (if memory serves, 15%) slower than
the others, and that there was a persistent _slight_ (a couple
percent) advantage to JFS over XFS.

This _isn't_ highly material, particularly considering that I was
working with a 100% write load, whereas "real world" work is likely to
have more of a mixture.

If you have reason to consider one filesystem or another better
supported by your distribution vendor, THAT is a much more important
reason to pick a particular filesystem than 'raw speed.'
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/fs.html
Rules of the Evil Overlord #138. "The passageways to and within my
domain will be well-lit with fluorescent lighting. Regrettably, the
spooky atmosphere will be lost, but my security patrols will be more
effective." <http://www.eviloverlord.com/>

Thanks for the info. I managed to pull out some archived posts to this
list from the PostgreSQL web site about this issue which have helped a
bit.

Unfortunately, the FS was chosen before considering its impact on I/O
for PostgreSQL. As the Cluster is sitting on its own 200GB IDE drive
for the moment and the system is partially live, it's not feasible to
change the underlying file system without great pain and suffering.

In the great fsync debates that I've seen, the pervasive opinion about
journalling file systems under Linux and PostgreSQL is to use the
filesystem mount option data=writeback, assuming that fsync in
PostgreSQL will handle coherency of the file data and the FS will
handle metadata.

This is all academic to a point, as tuning the FS will get only a small
improvement on I/O compared to the improvement potential of moving to
SCSI/FCAL, that and getting more memory.

Regards,

Pete de Zwart.

"Christopher Browne" <cbbrowne@acm.org> wrote in message news:m3zmzgayzl.fsf@knuth.knuth.cbbrowne.com...
> Your understanding of the impact of filesystem journalling isn't
> entirely correct. In the cases of interest, journalling is done on
> metadata, not on the contents of files, with the result that there
> isn't really that much overlap between the two forms of "journalling"
> that are taking place.
>
> I did some benchmarking last year that compared, on a write-heavy
> load, ext3, XFS, and JFS.
>
> I found that ext3 was materially (if memory serves, 15%) slower than
> the others, and that there was a persistent _slight_ (a couple
> percent) advantage to JFS over XFS.
>
> This _isn't_ highly material, particularly considering that I was
> working with a 100% Write load, whereas "real world" work is likely to
> have more of a mixture.
>
> If you have reason to consider one filesystem or another better
> supported by your distribution vendor, THAT is a much more important
> reason to pick a particular filesystem than 'raw speed.'
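
On the data=writeback option mentioned in Pete's message: a quick way to see which journalling mode a volume is actually mounted with is to read its options out of /proc/mounts. The following is only a minimal sketch. The function name, the device /dev/hdb1 and the mount point /var/lib/pgsql are made up for illustration, and some kernels do not list the default data= mode explicitly, in which case the function returns None.

```python
def journal_data_mode(mounts_text, mount_point):
    """Return the data= mode listed for mount_point, or None if not shown.

    mounts_text is text in /proc/mounts format:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            mode = None  # a later entry for the same mount point wins
            for option in fields[3].split(","):
                if option.startswith("data="):
                    mode = option.split("=", 1)[1]
    return mode


# Hypothetical sample; on a real system pass open("/proc/mounts").read().
SAMPLE = (
    "/dev/hda1 / ext3 rw 0 0\n"
    "/dev/hdb1 /var/lib/pgsql ext3 rw,data=writeback 0 0\n"
)
print(journal_data_mode(SAMPLE, "/var/lib/pgsql"))  # prints: writeback
```

The option itself is normally set in the options field of the volume's /etc/fstab entry.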

On Wed, 12 Jan 2005 07:25:43 +1100, Pete de Zwart <dezwart@froob.net> wrote:
[snip]
> improvement on I/O compared to the improvement potential of moving to
> SCSI/FCAL, that and getting more memory.
>

I would like to ask the question that continues to loom large over all
DBAs: SCSI, FCAL or SATA, which works best?

Most FCAL loops have a speed limit of either 1Gbps or 2Gbps. This is
only 100MB/sec or 200MB/sec. U320 SCSI can handle 320MB/sec, and the
AMCC (formerly 3Ware) SATA RAID cards show throughput over 400MB/sec
with good IOs/sec on PCI-X.

I am not prepared to stand by whilst someone makes a sideways claim
that SCSI or FCAL is implicitly going to give better performance than
anything else. It will depend on your data set, how you configure your
drives, and how good your controller is. We have a Compaq Smart Array
controller with a 3-drive RAID 5 that can't break 10MB/sec write on a
Bonnie++ benchmark. This is virtually the slowest system in our
datacenter, even though it has a modern controller and 10k disks,
whilst our PATA systems manage much better throughput. (Yes, I know
that MB/sec is not the only speed measure; it also does badly on
IO/sec.)

Alex Turner
NetEconomist
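
The Gbps-to-MB/sec figures quoted for FCAL above can be reproduced with a little arithmetic. This is a rough sketch, not a measurement: it assumes the serial link uses 8b/10b encoding (10 bits on the wire per payload byte), which is an assumption added here rather than something stated in the post, and the function name is made up for illustration. These are interface ceilings only and say nothing about what a given controller or drive set will actually deliver.

```python
def serial_link_mb_per_sec(gbit_per_sec, wire_bits_per_byte=10):
    """Approximate payload MB/sec of a serial link.

    Assumes 8b/10b encoding, i.e. 10 wire bits per payload byte.
    """
    return gbit_per_sec * 1000 / wire_bits_per_byte


ceilings = {
    "FCAL 1Gbps": serial_link_mb_per_sec(1),
    "FCAL 2Gbps": serial_link_mb_per_sec(2),
    "U320 SCSI": 320.0,  # parallel bus, already quoted in MB/sec
}

for name, mb_per_sec in ceilings.items():
    print(f"{name}: {mb_per_sec:.0f} MB/sec")
```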