Re: We really ought to do something about O_DIRECT and data=journalled on ext4 - Mailing list pgsql-hackers

From Greg Smith
Subject Re: We really ought to do something about O_DIRECT and data=journalled on ext4
Msg-id 4CF6D0A5.8080501@2ndquadrant.com
In response to Re: We really ought to do something about O_DIRECT and data=journalled on ext4  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> I think the best answer is to get out of the business of using
> O_DIRECT by default, especially seeing that available evidence
> suggests it might not be a performance win anyway.
>   

I was concerned that open_datasync might be doing a better job of 
forcing data out of drive write caches.  But the tests I've done on 
RHEL6 so far suggest that's not true; the write guarantees seem to be 
the same as when using fdatasync.  And there's certainly one performance
regression possible going from fdatasync to open_datasync: the case
where you're overflowing wal_buffers before you actually commit.
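
To make that regression concrete, here is a minimal sketch in Python, not PostgreSQL's actual code. It assumes the 8 kB default WAL block size, and the function names are made up for illustration. With O_DSYNC every write() waits for stable storage, so flushing N buffers ahead of the commit costs N synchronous waits; with fdatasync the same writes go to the OS cache and only the single sync at commit time waits.

```python
import os
import tempfile

# Assumption: 8 kB blocks, PostgreSQL's default WAL block size.
BLOCK = b"\0" * 8192

def write_with_open_datasync(path, nblocks):
    """Write nblocks through a descriptor opened with O_DSYNC.

    Each write() must reach stable storage before it returns, so the
    caller waits once per block. Returns the number of synchronous waits.
    """
    # O_DSYNC is looked up defensively; it is absent on some platforms.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, BLOCK)
    finally:
        os.close(fd)
    return nblocks

def write_with_fdatasync(path, nblocks):
    """Write nblocks into the OS cache, then sync once at 'commit'.

    Returns the number of synchronous waits, which is always 1.
    """
    # Fall back to fsync where fdatasync does not exist.
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, BLOCK)
        sync(fd)
    finally:
        os.close(fd)
    return 1

if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    print(write_with_open_datasync(os.path.join(scratch, "dsync"), 16))
    print(write_with_fdatasync(os.path.join(scratch, "fdsync"), 16))
```

Timing the two functions on a real drive would show the actual cost; the sketch only counts how often the writer has to wait.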

Below is a test of the troublesome behavior, run on the same RHEL6 system
whose test_fsync performance results I posted at
http://archives.postgresql.org/message-id/4CE2EBF8.4040602@2ndquadrant.com

This confirms that the problem here is the kernel now defining O_DSYNC
behavior as being available, but not actually supporting it when running
the filesystem in journaled mode.  That's clearly a kernel bug and no
fault of PostgreSQL; it's just never been exposed in a default
configuration before.  The Red Hat bugzilla report seems a bit unclear
about what's going on here, so it may be worth updating that to note the
underlying cause.
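
A quick way to check a given mount for this, sketched in Python under some assumptions: the helper name probe_open_flags is made up, and the flag combinations tried (O_DSYNC alone, and O_DSYNC together with O_DIRECT as the thread subject suggests) are my guess at what is worth probing, not a statement of exactly what the server passes to open().

```python
import errno
import os
import tempfile

def probe_open_flags(directory, flags):
    """Open a scratch file in `directory` with extra open(2) flags.

    Returns None if the open succeeds, otherwise the symbolic errno
    name (for example "EINVAL") reported by the kernel.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        fd = os.open(path, os.O_WRONLY | flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    else:
        os.close(fd)
        return None
    finally:
        # Always clean up the scratch file.
        os.unlink(path)

if __name__ == "__main__":
    # Flags are looked up defensively; not every platform defines them.
    dsync = getattr(os, "O_DSYNC", 0)
    direct = getattr(os, "O_DIRECT", 0)
    target = os.environ.get("PGDATA", tempfile.gettempdir())
    for label, flags in (("O_DSYNC", dsync), ("O_DSYNC|O_DIRECT", dsync | direct)):
        print(label, "->", probe_open_flags(target, flags) or "ok")
```

Run against a directory on the filesystem in question; an EINVAL result for a combination would match the "Invalid argument" in the PANIC below.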

Regardless, I'm now leaning heavily toward the idea of avoiding 
open_datasync by default given this bug, and backpatching that change to 
at least 8.4.  I'll do some more database-level performance tests here 
just as a final sanity check on that.  My gut feel is now that we'll 
eventually be taking something like Marti's patch, adding some more 
documentation around it, and applying that to HEAD as well as some 
number of back branches.

$ mount | head -n 1
/dev/sda7 on / type ext4 (rw)
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 17:20:16 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
$ psql -c "show wal_sync_method"
 wal_sync_method
-----------------
 open_datasync

[Edit /etc/fstab, change mount options to be "data=journal" and reboot]

$ mount | grep journal
/dev/sda7 on / type ext4 (rw,data=journal)
$ cat postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:50 EST
PANIC:  could not open file "pg_xlog/000000010000000000000001" (log file 
0, segment 1): Invalid argument
LOG:  startup process (PID 2690) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
$ pg_ctl stop

$ vi $PGDATA/postgresql.conf
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:40 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
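
The grep output above is easy to misread, since the only difference between the failing and working configurations is the leading "#". A small Python sketch of the same check follows; the function name is made up, and the parsing is deliberately simplified (it ignores include files and quoting subtleties).

```python
import re

# Matches an active (uncommented) setting line; "=" is optional in
# postgresql.conf, so plain whitespace is accepted as a separator too.
_SETTING = re.compile(r"^\s*wal_sync_method(?:\s*=\s*|\s+)'?([A-Za-z_]+)'?")

def configured_wal_sync_method(conf_text):
    """Return the explicitly configured wal_sync_method, or None.

    None means every occurrence is commented out, so the server falls
    back to its platform default. If the setting appears more than
    once, the last active line wins.
    """
    value = None
    for line in conf_text.splitlines():
        match = _SETTING.match(line)  # lines starting with '#' never match
        if match:
            value = match.group(1)
    return value

if __name__ == "__main__":
    commented = "#wal_sync_method = fdatasync        # the default is the first option"
    active = "wal_sync_method = fdatasync        # the default is the first option"
    print(configured_wal_sync_method(commented))
    print(configured_wal_sync_method(active))
```

A None result corresponds to the first two runs above, where the server picked open_datasync on its own.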

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
