Re: possible new option for wal_sync_method - Mailing list pgsql-hackers

From Andres Freund
Subject Re: possible new option for wal_sync_method
Date
Msg-id 201202161932.09708.andres@anarazel.de
In response to possible new option for wal_sync_method  (Dan Scales <scales@vmware.com>)
Responses Re: possible new option for wal_sync_method
List pgsql-hackers
Hi,

On Thursday, February 16, 2012 06:18:23 PM Dan Scales wrote:
> When running Postgres on a single ext3 filesystem on Linux, we find that
> the attached simple patch gives significant performance benefit (7-8% in
> numbers below).  The patch adds a new option for wal_sync_method, which
> is "open_direct".  With this option, the WAL is always opened with
> O_DIRECT (but not O_SYNC or O_DSYNC).  For Linux, the use of only
> O_DIRECT should be correct.  All WAL logs are fully allocated before
> being used, and the WAL buffers are 8K-aligned, so all direct writes are
> guaranteed to complete before returning.  (See
> http://lwn.net/Articles/348739/)
I don't think that behaviour is safe in the face of write caches in the I/O
path. Linux takes care to issue flush/barrier requests when necessary if
you issue an fsync/fdatasync, but to my knowledge it does not when only
O_DIRECT is used (that would suck performance-wise).
I think that behaviour is safe if you have no externally visible write caching
enabled, but that's not exactly easy knowledge to obtain or document.


Why should there otherwise be any performance difference between
O_DIRECT|O_SYNC and O_DIRECT in the WAL write case? There is no metadata that
needs to be written, and I have a hard time imagining that the check for
whether there is metadata is that expensive.

I guess a more interesting case would be comparing O_DIRECT|O_SYNC with 
O_DIRECT + fdatasync() or even O_DIRECT +  
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | 
SYNC_FILE_RANGE_WAIT_AFTER)
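
To make the first of those comparisons concrete, here is a rough sketch (not
taken from the patch) of the two write paths. The helper name
wal_write_blocks, the WAL_BLOCK constant and the fallback to buffered I/O when
O_DIRECT is rejected are all assumptions made for illustration; O_DSYNC stands
in for O_SYNC since no metadata is involved. A real WAL segment would already
be preallocated, whereas this sketch creates the file if it is missing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: degrade to buffered I/O if unavailable */
#endif

#define WAL_BLOCK 8192          /* hypothetical block size, matching the 8K alignment above */

/*
 * Hypothetical helper: overwrite the first nblocks blocks of path.
 *   extra_flags  0, O_DSYNC or O_SYNC; OR-ed with O_DIRECT at open time
 *   flush_after  if nonzero, call fdatasync() once after all writes
 * Returns 0 on success, -1 on failure.
 */
int wal_write_blocks(const char *path, int extra_flags, int flush_after, int nblocks)
{
    void *buf = NULL;
    int   fd, i, rc = -1;

    /* O_DIRECT requires an aligned buffer, offset and length. */
    if (posix_memalign(&buf, WAL_BLOCK, WAL_BLOCK) != 0)
        return -1;
    assert((uintptr_t) buf % WAL_BLOCK == 0);
    memset(buf, 'x', WAL_BLOCK);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | extra_flags, 0600);
    if (fd < 0 && errno == EINVAL)      /* some filesystems reject O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0600);
    if (fd < 0)
        goto out;

    for (i = 0; i < nblocks; i++)
        if (pwrite(fd, buf, WAL_BLOCK, (off_t) i * WAL_BLOCK) != WAL_BLOCK)
            goto out_close;

    /* Explicit flush; this is the call expected to reach the device cache. */
    if (flush_after && fdatasync(fd) != 0)
        goto out_close;

    rc = 0;

out_close:
    if (close(fd) != 0)
        rc = -1;
out:
    free(buf);
    return rc;
}
```

Timing wal_write_blocks(path, O_DSYNC, 0, n) against
wal_write_blocks(path, 0, 1, n) on the same file would give the first
comparison; wal_write_blocks(path, 0, 0, n) corresponds to the proposed
open_direct behaviour.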

Any special reason you did that comparison on ext3? Especially with
data=ordered, its behaviour regarding syncs is pretty insane performance-wise.
Ext4 would be a bit more interesting...

Andres

