Re: We really ought to do something about O_DIRECT and data=journalled on ext4 - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Re: We really ought to do something about O_DIRECT and data=journalled on ext4 |
Date | |
Msg-id | 4CF6D0A5.8080501@2ndquadrant.com |
In response to | Re: We really ought to do something about O_DIRECT and data=journalled on ext4 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: We really ought to do something about O_DIRECT and data=journalled on ext4 |
List | pgsql-hackers |
Tom Lane wrote:
> I think the best answer is to get out of the business of using
> O_DIRECT by default, especially seeing that available evidence
> suggests it might not be a performance win anyway.

I was concerned that open_datasync might be doing a better job of forcing data out of drive write caches. But the tests I've done on RHEL6 so far suggest that's not true; the write guarantees seem to be the same as when using fdatasync. And there's certainly one performance regression possible going from fdatasync to open_datasync: the case where you're overflowing wal_buffers before you actually commit.

Below is a test of the troublesome behavior on the same RHEL6 system I gave test_fsync performance test results from at http://archives.postgresql.org/message-id/4CE2EBF8.4040602@2ndquadrant.com

This confirms that the problem here is the kernel now defining O_DSYNC behavior as being available, but not actually supporting it when running the filesystem in journaled mode. That's clearly a kernel bug and no fault of PostgreSQL; it's just never been exposed in a default configuration before. The Red Hat bugzilla report seems a bit unclear about what's going on here, so it may be worth updating that to note the underlying cause.

Regardless, I'm now leaning heavily toward the idea of avoiding open_datasync by default given this bug, and backpatching that change to at least 8.4. I'll do some more database-level performance tests here just as a final sanity check on that. My gut feel is now that we'll eventually be taking something like Marti's patch, adding some more documentation around it, and applying that to HEAD as well as some number of back branches.
$ mount | head -n 1
/dev/sda7 on / type ext4 (rw)
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 17:20:16 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
$ psql -c "show wal_sync_method"
 wal_sync_method
-----------------
 open_datasync

[Edit /etc/fstab, change mount options to be "data=journal" and reboot]

$ mount | grep journal
/dev/sda7 on / type ext4 (rw,data=journal)
$ cat postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:50 EST
PANIC:  could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): Invalid argument
LOG:  startup process (PID 2690) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
$ pg_ctl stop

$ vi $PGDATA/postgresql.conf
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
wal_sync_method = fdatasync        # the default is the first option
$ pg_ctl start
server starting
LOG:  database system was shut down at 2010-12-01 12:14:40 EST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
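
For anyone who wants to check a given mount for this behavior outside the server, here is a rough Python sketch of the probe-and-fall-back idea. It is not PostgreSQL code and not Marti's patch. It assumes the failing open is the one combining O_DSYNC with O_DIRECT (as the thread subject suggests), and the function names open_wal_segment and flush are made up for illustration.

```python
import errno
import os


def open_wal_segment(path):
    """Open path for durable writes; return (fd, method).

    method is "open_datasync" when the O_DSYNC (+ O_DIRECT) open works,
    or "fdatasync" when the kernel rejects those flags with EINVAL.
    """
    # O_DIRECT is Linux-specific; use 0 where the platform does not define it.
    direct = getattr(os, "O_DIRECT", 0)
    try:
        fd = os.open(path, os.O_RDWR | os.O_DSYNC | direct)
        return fd, "open_datasync"
    except OSError as exc:
        # EINVAL ("Invalid argument") is the error shown in the PANIC above.
        if exc.errno != errno.EINVAL:
            raise
    # Fall back to a plain open; the caller must then sync explicitly.
    fd = os.open(path, os.O_RDWR)
    return fd, "fdatasync"


def flush(fd, method):
    """Force written data to stable storage when the open mode does not."""
    if method == "fdatasync":
        os.fdatasync(fd)
```

Run against an existing file on the filesystem in question, the returned method shows which path the kernel accepted. The sketch only opens the file; real writes through an O_DIRECT descriptor also need suitably aligned buffers.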