Thread: Filesystem Direct I/O and WAL sync option
All, I'm very curious to know whether we can expect or guarantee any data consistency with WAL sync=OFF when the file system is mounted in Direct I/O mode (meaning every write() system call issued by PG really writes to disk before returning)... So what data consistency may we expect: - none? - on a per-checkpoint basis? - full?... Thanks a lot for any info! Rgds, -Dimitri
Dimitri wrote: > I'm very curious to know if we may expect or guarantee any data > consistency with WAL sync=OFF but using file system mounted in Direct > I/O mode (means every write() system call called by PG really writes > to disk before return)... You'd have to turn that mode on on the data drives as well to get consistency, because fsync=off disables checkpoint fsyncs of the data files as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Yes, the disk drives also have their cache disabled, or the cache is on the controllers and battery protected (in the case of higher-end storage) - but is that enough to expect data consistency?... (I was surprised about the checkpoint sync, but doesn't it always call write() anyway? Because in that case it should work without fsync)... On 7/3/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > [...]
"Dimitri" <dimitrik.fr@gmail.com> writes: > Yes, disk drives are also having cache disabled or having cache on > controllers and battery protected (in case of more high-level > storage) - but is it enough to expect data consistency?... (I was > surprised about checkpoint sync, but does it always calls write() > anyway? because in this way it should work without fsync)... Well if everything is mounted in sync mode then I suppose you have the same guarantee as if fsync were called after every single write. If that's true then surely that's at least as good. I'm curious how it performs though. Actually it seems like in that configuration fsync should be basically zero-cost. In other words, you should be able to leave fsync=on and get the same performance (whatever that is) and not have to worry about any risks. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Yes Gregory, that's why I'm asking, because I'm jumping from 1800 transactions/sec to 2800 transactions/sec! And that's a more than significant performance increase :)) Rgds, -Dimitri On 7/4/07, Gregory Stark <stark@enterprisedb.com> wrote: > [...]
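Gregory's point that a synchronously mounted filesystem should give the same guarantee as calling fsync after every write can be sketched at the system-call level. This is a minimal illustration, not PostgreSQL's actual code: the function names are hypothetical, and the O_SYNC open flag is used here only as a stand-in for a sync-mounted filesystem. It is not the same thing as Solaris UFS 'forcedirectio', whose exact durability behavior is not established in this thread.

```python
import os
import tempfile


def write_all(fd, data):
    """Call write() until every byte of data has been accepted."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_with_fsync(path, blocks):
    """Buffered writes, each followed by an explicit fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in blocks:
            write_all(fd, block)
            os.fsync(fd)  # returns once the OS reports the data as flushed
    finally:
        os.close(fd)


def write_with_o_sync(path, blocks):
    """Writes on a descriptor opened with O_SYNC and no explicit fsync().

    Assumption: O_SYNC stands in for a sync-mounted filesystem. On
    platforms without O_SYNC the flag falls back to 0 (plain writes).
    """
    o_sync = getattr(os, "O_SYNC", 0)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | o_sync, 0o600)
    try:
        for block in blocks:
            write_all(fd, block)  # each write() is itself synchronous
    finally:
        os.close(fd)


if __name__ == "__main__":
    blocks = [b"x" * 8192 for _ in range(4)]
    with tempfile.TemporaryDirectory() as d:
        write_with_fsync(os.path.join(d, "fsync.dat"), blocks)
        write_with_o_sync(os.path.join(d, "osync.dat"), blocks)
```

Both paths leave identical file contents; the only difference is where the wait for stable storage happens. Whether the device really persists the data before acknowledging it depends on drive and controller caches, which is the question raised in the replies that follow.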
"Dimitri" <dimitrik.fr@gmail.com> writes: > Yes Gregory, that's why I'm asking, because from 1800 transactions/sec > I'm jumping to 2800 transactions/sec! and it's more than important > performance level increase :)) Wow. That's kind of suspicious though. Does the new configuration take advantage of the lack of the filesystem cache by increasing the size of shared_buffers? Even then I wouldn't expect such a big boost unless you got very lucky with the size of your working set compared to the two sizes of shared_buffers. It seems likely that somehow this change is not providing the same guarantees as fsync. Perhaps fsync is actually implementing IDE write barriers and the sync mode is just flushing buffers to the hard drive cache and then returning. What transaction rate do you get if you just have a single connection streaming inserts in autocommit mode? What kind of transaction rate do you get with both sync mode on and fsync=on in Postgres? And did you say this is with a battery-backed cache? In theory fsync=on/off shouldn't make much difference at all with a battery-backed cache. Stranger and stranger. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory, thanks for the good questions! :)) They shed more light on my throughput here :)) The OS is Solaris 9 (the customer is still not ready to upgrade to Solaris 10), and I think the main "sync" issue comes from the old UFS implementation... UFS mounted with the 'forcedirectio' option uses different "sync" logic and also accepts concurrent writes to the same file, which gives a higher performance level here. I really did not expect such a big gain, so I had not thought to rerun the same test with direct I/O on and fsync=on too. To my big surprise, it also reached 2800 tps, the same as with fsync=off!!! So the initial question is no longer valid :)) Also, my tests are run just to validate the server + storage capabilities, and honestly it's a real pity to see them used under an old Solaris version :)) But well, at least we know what kind of performance to expect currently, and can think about migration before the end of this year... Seeing at least 10,000 random writes/sec on the storage subsystem during a live database test was very pleasing to the customer and made them feel comfortable about their production... Thanks a lot for all your help! Best regards! -Dimitri On 7/4/07, Gregory Stark <stark@enterprisedb.com> wrote: > [...]
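Some context on why the single-connection rate Gregory asks about is a useful check: one connection committing in autocommit mode has to wait for one WAL flush per commit, so on a plain rotating disk with no write cache its commit rate is bounded by roughly one flush per platter rotation. The back-of-envelope sketch below works through that bound, along with the relative gain reported in this thread. The function names are hypothetical, and the 7200 rpm figure is an assumption for illustration only; the thread does not say which drives were used, and a battery-backed controller cache removes this bound entirely.

```python
def max_commits_per_second(rpm):
    """Rough upper bound on commits/sec for one connection when every
    commit waits for a platter rotation (no write cache, one flush per
    commit)."""
    return rpm / 60.0


def relative_gain(before_tps, after_tps):
    """Fractional throughput increase going from before_tps to after_tps."""
    return (after_tps - before_tps) / before_tps


if __name__ == "__main__":
    # Assumed 7200 rpm disk, purely illustrative.
    print(max_commits_per_second(7200))          # 120.0
    # Figures reported in the thread: 1800 tps -> 2800 tps.
    print(f"{relative_gain(1800, 2800):.1%}")    # 55.6%
```

If a single connection on cache-less disks sustains far more commits per second than this bound, some layer is probably acknowledging writes before they are durable.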
On Tue, Jul 03, 2007 at 04:06:29PM +0100, Heikki Linnakangas wrote: > Dimitri wrote: > >I'm very curious to know if we may expect or guarantee any data > >consistency with WAL sync=OFF but using file system mounted in Direct > >I/O mode (means every write() system call called by PG really writes > >to disk before return)... > > You'd have to turn that mode on on the data drives as well to get > consistency, because fsync=off disables checkpoint fsyncs of the data > files as well. BTW, it might be worth trying the different wal_sync_methods. IIRC, Jonah's seen some good results from open_datasync. -- Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On 7/9/07, Jim C. Nasby <decibel@decibel.org> wrote: > BTW, it might be worth trying the different wal_sync_methods. IIRC, > Jonah's seen some good results from open_datasync. On Linux, using ext3, reiser, or jfs, I've seen open_sync perform quite a bit better than fsync/fdatasync in most of my tests. But I haven't done significant testing with direct I/O lately. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com Iselin, New Jersey 08830 | http://www.enterprisedb.com/
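The wal_sync_method values discussed here map onto different OS-level ways of forcing a write to stable storage, and which one is fastest depends on the platform and filesystem. The following is a rough sketch of how one might compare them outside PostgreSQL. It assumes a POSIX-like system; the SYNC_METHODS table and flushes_per_second function are hypothetical names for this illustration, not PostgreSQL code, and the fsync_writethrough method is left out.

```python
import os
import tempfile
import time

# Method name -> (extra open() flags, function called after each write or None).
# Fallbacks apply where the platform lacks a flag or call.
SYNC_METHODS = {
    "fsync": (0, os.fsync),
    "fdatasync": (0, getattr(os, "fdatasync", os.fsync)),
    "open_sync": (getattr(os, "O_SYNC", 0), None),
    "open_datasync": (getattr(os, "O_DSYNC", 0), None),
}


def flushes_per_second(method, directory, count=50, block_size=8192):
    """Rewrite one block `count` times using `method`; return flushes/sec."""
    flags, sync_after_write = SYNC_METHODS[method]
    block = b"\0" * block_size  # 8 kB, matching PostgreSQL's default block size
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    fd = os.open(path, os.O_WRONLY | flags)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block each time
            os.write(fd, block)
            if sync_after_write is not None:
                sync_after_write(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    # Point `directory` at the filesystem under test; a temp directory may
    # live on tmpfs, where the numbers say nothing about real disks.
    with tempfile.TemporaryDirectory() as d:
        for name in SYNC_METHODS:
            print(f"{name:14s} {flushes_per_second(name, d):10.0f} flushes/sec")
```

The absolute numbers matter less than the ranking on the filesystem that will actually hold the WAL. Implausibly high rates on cache-less disks suggest the flush is not reaching stable storage.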
Yes, I tried all the WAL sync methods, but there was no difference... However, there was a huge difference when I ran the same tests under Solaris 10: the 'fdatasync' option gave the best performance level. At the same time, direct I/O made no difference on Solaris 10 :) So the main rule is that there is no universal rule :) Just adapt the system options to your workload... Direct I/O will generally speed up write operations by avoiding buffer flushing overhead and by allowing concurrent writes (breaking the POSIX limitation of a single writer per file at a time). But at the same time it may slow down your read operations, and you may need a 64-bit PG version to use a big cache and still keep the same performance level on SELECT queries. And again, there are other file systems, like QFS for example, which may give you the best of both worlds: direct writes and buffered reads at the same time! etc. etc. etc. :) Rgds, -Dimitri On 7/9/07, Jonah H. Harris <jonah.harris@gmail.com> wrote: > [...]