Re: O_DSYNC broken on MacOS X? - Mailing list pgsql-hackers

From Darren Duncan
Subject Re: O_DSYNC broken on MacOS X?
Msg-id 4CA54F79.6030508@darrenduncan.net
In response to Re: O_DSYNC broken on MacOS X?  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: O_DSYNC broken on MacOS X?
List pgsql-hackers
Greg Smith wrote:
> You didn't quote the next part of that, which says "fsync() is not 
> sufficient to guarantee that your data is on stable
> storage and on MacOS X we provide a fcntl(), called F_FULLFSYNC, to ask 
> the drive to flush all buffered data to stable storage."  That's exactly 
> what turning on fsync_writethrough does in PostgreSQL.  See 
> http://archives.postgresql.org/pgsql-hackers/2005-04/msg00390.php as the 
> first post on this topic that ultimately led to that behavior being 
> implemented.
> 
>  From the perspective of the database, whether or not the behavior is 
> standards compliant isn't the issue.  Whether pages make it to physical 
> disk or not when fsync is called, or when O_DSYNC writes are done on 
> platforms that support them, is the important part.  If the OS 
> doesn't do that, it is doing nothing useful from the perspective of the 
> database's expectations.  And that's not true on Darwin unless you 
> specify F_FULLFSYNC, which doesn't happen by default in PostgreSQL.  It 
> only does that when you switch wal_sync_method=fsync_writethrough.
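
To make the distinction above concrete, here is a minimal sketch in C of a
"sync that really reaches the disk" wrapper.  It is not PostgreSQL's actual
code: the helper names durable_sync() and write_durably() are hypothetical,
and the fallback to plain fsync() when F_FULLFSYNC fails is an assumption of
this sketch.  On platforms that do not define F_FULLFSYNC it simply calls
fsync().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not PostgreSQL source).
 * Where F_FULLFSYNC is defined (Darwin), ask the drive to flush its
 * write cache; otherwise, or if that request fails, use fsync().
 */
static int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) != -1)
        return 0;
    /* Assumption: some filesystems reject F_FULLFSYNC, so fall back. */
#endif
    return fsync(fd);
}

/*
 * Hypothetical helper: write a string to a file and report success
 * only after durable_sync() has returned successfully.
 * Returns 0 on success, -1 on any failure.
 */
static int write_durably(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    ssize_t written = write(fd, data, len);
    int rc = (written == (ssize_t) len) ? durable_sync(fd) : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A caller would treat a commit as durable only when write_durably() returns 0.
The extra cost of the F_FULLFSYNC path on Darwin is the performance drop
discussed in the rest of this thread.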

Greg Smith also wrote:
> The main downside to switching the default on either OS X or Windows is
> developers using those platforms for test deployments will suffer greatly from a
> performance drop for data they don't really care about. As those two in
> particular are much more likely to be client development platforms, too, that's
> a scary thing to consider.

I think that, bottom line, Postgres should default to whatever the safest 
and most reliable behavior is on each platform, because data integrity is the 
most important thing: a commit that returns must actually have written its data 
to disk.  If performance is worse, then so what?  Code that does nothing has the 
best performance of all, and is also generally useless.

Whenever there is a tradeoff to be made between reliability and speed, users 
should have to explicitly choose the less reliable option, which would 
demonstrate they know what they're doing.  Let the testers explicitly choose a 
faster and less reliable option for the data they don't care about; otherwise, 
by default, users who don't know better should get the safest option for the 
data they likely care about.  That is a DBMS priority.

This matter reminds me of a discussion on the SQLite list years ago about 
whether pragma synchronous=normal or synchronous=full should be the default, and 
thankfully 'full' won.

-- Darren Duncan

