Re: Raid 10 chunksize - Mailing list pgsql-performance

From Greg Smith
Subject Re: Raid 10 chunksize
Date
Msg-id alpine.GSO.2.01.0904030556470.4011@westnet.com
In response to Re: Raid 10 chunksize  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
On Thu, 2 Apr 2009, Scott Carey wrote:

> The big one is this quote from the Linux kernel list:
> " Right now, if you want a reliable database on Linux, you _cannot_
> properly depend on fsync() or fdatasync().  Considering how much Linux
> is used for critical databases, using these functions, this amazes me.
> "

Things aren't as bad as that out-of-context quote makes them seem.  There
are two main problem situations here:

1) You cannot trust Linux to flush data out of a hard drive's write cache
and onto the platters.
Solution:  turn off the write cache.  Given the generally poor state of
targeted fsync on Linux (quoting from a downthread comment by David Lang:
"in data=ordered mode, the default for most distros, ext3 can end up
having to write all pending data when you do a fsync on one file"), those
fsyncs were likely to blow out the drive cache anyway.

2) There are no hard guarantees about write ordering at the disk level; if
you write blocks ABC and then fsync, you might actually get, say, only B
written before power goes out.  I don't believe the PostgreSQL WAL design
will be corrupted by this particular situation, because until that fsync
comes back saying all 3 are done, none of them are relied upon.
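
The "nothing is relied upon until fsync returns" rule can be illustrated with a minimal sketch. This is not PostgreSQL's WAL code; the function name `append_and_sync` and the idea of returning a "safe up to here" byte offset are assumptions made for the example. It also assumes the drive's write cache is off (or honest), as in point 1.

```python
import os

def append_and_sync(path, blocks):
    """Append blocks to path and return the file size once fsync has returned.

    Hypothetical helper: the caller must treat none of the blocks as durable
    until this function returns, because the writes may reach the disk in any
    order (or only partly) if power is lost before fsync completes.
    """
    with open(path, "ab") as f:
        for block in blocks:
            f.write(block)        # e.g. blocks A, B, C: ordering on disk is not guaranteed
        f.flush()                 # push Python's userspace buffer to the OS
        os.fsync(f.fileno())      # ask the OS to push the data to stable storage
        return os.fstat(f.fileno()).st_size
```

A caller would only advance its own "committed up to" marker to the returned offset; if the process dies before the function returns, the marker still points at the previous, fully synced position.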

> Interestingly, postgres would be safer on linux if it used
> sync_file_range instead of fsync() but that has other drawbacks and
> limitations

I have thought about whether it would be possible to add a Linux-specific
improvement here into the code path that does something custom in this
area for Windows/Mac OS X when you use wal_sync_method=fsync_writethrough.
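
For readers unfamiliar with what a write-through sync looks like, here is a rough Python sketch of the idea, not the PostgreSQL implementation. The function name `sync_writethrough` is hypothetical. It assumes that Mac OS X exposes a full flush through `fcntl` with `F_FULLFSYNC`, and that Python's `os.fsync` maps to `_commit()` on Windows; on Linux it simply falls back to a plain fsync, which is exactly the gap discussed above.

```python
import os

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None

def sync_writethrough(fd):
    """Flush fd as far down as this sketch knows how; return the method used."""
    full = getattr(fcntl, "F_FULLFSYNC", None) if fcntl else None
    if full is not None:
        try:
            # Mac OS X: also ask the drive to empty its write cache.
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            pass  # filesystem does not support it; fall through to fsync
    # Everywhere else: ordinary fsync, with no promise about the drive cache.
    os.fsync(fd)
    return "fsync"
```

The return value is only there so a caller can log which path was taken; a Linux-specific branch would slot in before the final fallback.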

We really should update the documentation in this area before 8.4 ships.
I'm looking into moving the "Tuning PostgreSQL WAL Synchronization" paper
I wrote onto the wiki and then fleshing it out with all this
filesystem-specific trivia.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
