Thread: fsync = true beneficial on ext3?
I'm curious what the consensus is, if any, on use of fsync on ext3 filesystems with postgresql 7.3.4 or later. I did some recent performance tests demonstrating a 45%-70% performance improvement for simple inserts with fsync off on one particular system. Does fsync = true buy me any additional recoverability beyond ext3's journal recovery? If we write something without sync'ing, presumably it's immediately journaled? So even if the DB crashes prior to fsync'ing, are we fully recoverable? I've been running a few pgsql clusters on ext3 with fsync = false, suffered numerous OS crashes, and have yet to lose any data or see any corruption from any of those crashes. Have I just been lucky? TIA. Ed
"Ed L." <pgsql@bluepolka.net> writes: > If we write something without sync'ing, presumably it's immediately > journaled? I was under the impression that ext3 journals only filesystem metadata, not the contents of files. > I've been running a few pgsql clusters on ext3 with fsync = > false, suffered numerous OS crashes, and have yet to lose any data or see > any corruption from any of those crashes. Have I just been lucky? Doesn't sound very safe to me. regards, tom lane
On Sun, 08 Feb 2004 14:02:26 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > If we write something without sync'ing, presumably it's immediately > > journaled? > I was under the impression that ext3 journals only filesystem metadata, > not the contents of files. by default, it journals everything, but you can set it to journal metadata only, i think with the mount option data=writeback. do a "man mount" and look for ext3 options for details on the data= option. richard -- Richard Welty rwelty@averillpark.net Averill Park Networking 518-573-7592 Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security
On Sunday February 8 2004 12:02, Tom Lane wrote: > "Ed L." <pgsql@bluepolka.net> writes: > > If we write something without sync'ing, presumably it's immediately > > journaled? > > I was under the impression that ext3 journals only filesystem metadata, > not the contents of files. Ah, didn't know how that worked. So I gather there is really no kernel-level substitute for fsync = true when it comes to guaranteeing data is flushed to disk at commit time, I guess? In Linux, does pgsql's fsync call at commit time obviate the need for bdflush to do any flushing for that particular data? I'm wondering if there are bdflush adjustments to be made to improve disk write efficiency given we can count on fsync = true to guarantee that. Also, with fsync = true and WAL using fdatasync, and assuming all is on the same disk (which I know is not optimal), is there a particular ext3 mode (data=writeback?) that gives better performance while maintaining best recoverability?
FYI - Ext3 has 3 modes:

data=ordered (default): metadata is journaled (at write time data is written before metadata, i.e. ordered)
data=journal: data and metadata are journaled
data=writeback: metadata journaled (no ordering at write time)

The default will not help to protect database integrity if fsync is false (as only metadata is journaled).

Will data=journal mode help? I am uncertain. A casual reading of these definitions suggests that it *might* - anyone know for sure?

regards

Mark

Richard Welty wrote:
> by default, it journals everything, but you can set it to journal metadata
> only, i think with the mount option data=writeback. do a "man mount"
> and look for ext3 options for details on the data= option.
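To see which of the three modes a given mount is using, one option is to read its option string (for example from /proc/mounts) and look for data=. The helper below is only a sketch: the function name and the sample option strings are made up for illustration, and it assumes that a missing data= option means the default, data=ordered, as described above.

```python
def ext3_data_mode(mount_options):
    """Return the ext3 journaling mode named in a comma-separated option string.

    Hypothetical helper: falls back to "ordered" (the ext3 default) when no
    data= option is present.
    """
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "data":
            return value
    return "ordered"


# Made-up option strings, shaped like the fourth field of /proc/mounts.
explicit_mode = ext3_data_mode("rw,noatime,data=journal")
default_mode = ext3_data_mode("rw,noatime")
```

Whatever the mode reports, it only describes what the filesystem journals; it says nothing about when the database's own writes reach the disk.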
On Mon, Feb 09, 2004 at 03:13:08PM +1300, Mark Kirkwood wrote: > FYI - Ext3 has 3 modes : > > data=ordered(default) : metadata is journaled (at write time data is > written before metadata - i.e ordered) > data=journal: data and metadata are journaled > data=writeback: metadata journaled (no ordering at write time) Thanks for that. > The default will not help to protect database integrity if fsync is > false (as only metadata is journaled) > > Will data=journal mode help? I am uncertain. A casual reading of these > definitions suggests that it *might* - anyone know for sure? My problem is that journalling works on a per-file basis, i.e., the data for a file is written before that file's metadata. However, the fsync is used for the WAL segments and if you can't guarantee the WAL will hit the disk before the data segments (different files), you're stuffed I think. Or maybe WAL is not that sensitive to that kind of reordering. Maybe it only depends on the WAL being consistent. Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > (... have gone from d-i being barely usable even by its developers > anywhere, to being about 20% done. Sweet. And the last 80% usually takes > 20% of the time, too, right?) -- Anthony Towns, debian-devel-announce
Martijn van Oosterhout <kleptog@svana.org> writes: > My problem is that journalling works on a per-file basis. ie, the data for a > file is written before that file's metadata. However, the fsync is used for > the WAL segments and if you can't guarantee the WAL will hit the disk before > the data segments (different files), you're stuffed I think. > Or maybe WAL is not that sensitive to that kind of reordering. Maybe it only > depends on the WAL being consistent. The entire *point* of WAL is that WAL entries must hit disk before any of the data-file changes they describe (that's why it's called write AHEAD log). Without this you can't use WAL replay to ensure the data files are brought to a fully consistent state. So yes, we do have to have cross-file write ordering guarantees. fsync is a pretty blunt tool for enforcing cross-file write ordering, but it's the only one available... regards, tom lane
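The ordering rule described above can be sketched in a few lines. This is an illustration only, not PostgreSQL's implementation: the file names, the record format, and the functions durable_append and apply_change are all invented here. The one point it shows is that the log record is fsync'ed before the data file is touched.

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path and fsync before returning (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # do not return until the kernel reports the write as flushed
    finally:
        os.close(fd)


def apply_change(wal_path, data_path, record):
    # Step 1: the log record describing the change is made durable first.
    durable_append(wal_path, b"LOG " + record + b"\n")
    # Step 2: only then is the data file modified. It is not fsync'ed here;
    # in this sketch the log is what a replay after a crash would rely on.
    with open(data_path, "ab") as data_file:
        data_file.write(record + b"\n")


workdir = tempfile.mkdtemp()
wal_path = os.path.join(workdir, "wal.log")      # made-up file name
data_path = os.path.join(workdir, "table.dat")   # made-up file name
apply_change(wal_path, data_path, b"insert row 1")
```

Without the fsync in step 1, the kernel is free to flush the two files in either order, which is exactly the cross-file reordering the log cannot tolerate.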
Ed L. wrote: > > I'm curious what the consensus is, if any, on use of fsync on ext3 > filesystems with postgresql 7.3.4 or later. I did some recent performance > tests demonstrating a 45%-70% performance improvement for simple inserts > with fsync off on one particular system. Does fsync = true buy me any > additional recoverability beyond ext3's journal recovery? Yes, it does. Without fsync, you can't be sure the data has been pushed to the disk drive in case of an OS crash or power failure. > If we write something without sync'ing, presumably it's immediately > journaled? So even if the DB crashes prior to fsync'ing, are we fully > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > false, suffered numerous OS crashes, and have yet to lose any data or see > any corruption from any of those crashes. Have I just been lucky? The fsync makes sure it hits the drive, rather than staying in the kernel cache during an OS failure. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
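A minimal sketch of the layers involved, assuming an ordinary file on a POSIX-like system: a write first lands in the process's own buffer, then in the kernel cache, and only fsync asks the kernel to push it to the storage device. The function name write_durably and the file name are made up for the example.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload to path and return only after fsync (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(payload)      # data sits in the process's userspace buffer
        f.flush()             # data moves into the kernel cache
        os.fsync(f.fileno())  # ask the kernel to push it to the storage device
    return len(payload)


demo_path = os.path.join(tempfile.mkdtemp(), "commit.dat")  # made-up file name
written = write_durably(demo_path, b"committed row\n")
```

Dropping the os.fsync line leaves the data in the kernel cache at return time, which is the state an OS crash or power failure can wipe out.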
On Sun, 8 Feb 2004, Ed L. wrote: > > I'm curious what the consensus is, if any, on use of fsync on ext3 > filesystems with postgresql 7.3.4 or later. I did some recent performance > tests demonstrating a 45%-70% performance improvement for simple inserts > with fsync off on one particular system. Does fsync = true buy me any > additional recoverability beyond ext3's journal recovery? > > If we write something without sync'ing, presumably it's immediately > journaled? So even if the DB crashes prior to fsync'ing, are we fully > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > false, suffered numerous OS crashes, and have yet to lose any data or see > any corruption from any of those crashes. Have I just been lucky? With all the other posts on this topic, I just want to point out that it's all theory until you build your machine, set it up, initiate a hundred or so parallel transactions, and pull the plug in the middle. Without pulling the plug, you just don't know for sure. And you need to do it a few times, in case your machine "got lucky" once and might fail on subsequent power fails.
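One way to score such a plug-pull run, sketched under the assumption that each client records (on a different machine) the ids of transactions the server acknowledged as committed, and that the ids present after recovery can be read back. The function name and the sample ids are invented for illustration.

```python
def find_lost_commits(acknowledged_ids, recovered_ids):
    """Return ids that were acknowledged as committed but are missing after recovery."""
    return sorted(set(acknowledged_ids) - set(recovered_ids))


# Made-up example data: commit 4 was acknowledged but did not survive.
acknowledged = [1, 2, 3, 4, 5]
recovered = [1, 2, 3, 5]
lost = find_lost_commits(acknowledged, recovered)
```

Any non-empty result means an acknowledged commit was lost; an empty result on one run says nothing about the next one, hence the advice to repeat the test.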
Actually, I don't think even that is a valid test. The absence of a failure doesn't mean one can't occur in this case. Doesn't matter if you try the test 1 or 10,000 times; the test will only be conclusive if you actually see a failure. On Mon, Feb 09, 2004 at 10:19:15AM -0700, scott.marlowe wrote: > On Sun, 8 Feb 2004, Ed L. wrote: > > > > > I'm curious what the consensus is, if any, on use of fsync on ext3 > > filesystems with postgresql 7.3.4 or later. I did some recent performance > > tests demonstrating a 45%-70% performance improvement for simple inserts > > with fsync off on one particular system. Does fsync = true buy me any > > additional recoverability beyond ext3's journal recovery? > > > > If we write something without sync'ing, presumably it's immediately > > journaled? So even if the DB crashes prior to fsync'ing, are we fully > > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > > false, suffered numerous OS crashes, and have yet to lose any data or see > > any corruption from any of those crashes. Have I just been lucky? > > With all the other posts on this topic, I just want to point out that it's > all theory until you build your machine, set it up, initiate a hundred or > so parallel transactions, and pull the plug in the middle. > > Without pulling the plug, you just don't know for sure. And you need to > do it a few times, in case your machine "got lucky" once and might fail on > subsequent power fails. > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Jim C. Nasby, Database Consultant jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
Sounds like "fsync = true" is the consensus for any circumstances where data loss is intolerable. Thx.
Would a battery backed Card do the trick? On Tuesday 10 February 2004 00:42, Bruce Momjian wrote: > Ed L. wrote: > > I'm curious what the consensus is, if any, on use of fsync on ext3 > > filesystems with postgresql 7.3.4 or later. I did some recent > > performance tests demonstrating a 45%-70% performance improvement for > > simple inserts with fsync off on one particular system. Does fsync = > > true buy me any additional recoverability beyond ext3's journal recovery? > > Yes, it does. Without fsync, you can't be sure the data has been pushed > to the disk drive in case of an OS crash or power failure. > > > If we write something without sync'ing, presumably it's immediately > > journaled? So even if the DB crashes prior to fsync'ing, are we fully > > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > > false, suffered numerous OS crashes, and have yet to lose any data or see > > any corruption from any of those crashes. Have I just been lucky? > > The fsync makes sure it hits the drive, rather than staying in the > kernel cache during an OS failure.
JM wrote: > Would a battery backed Card do the trick? No, because the fsync causes the data to hit the card. Without the fsync, the data could remain only in the kernel cache. > On Tuesday 10 February 2004 00:42, Bruce Momjian wrote: > > Ed L. wrote: > > > I'm curious what the consensus is, if any, on use of fsync on ext3 > > > filesystems with postgresql 7.3.4 or later. I did some recent > > > performance tests demonstrating a 45%-70% performance improvement for > > > simple inserts with fsync off on one particular system. Does fsync = > > > true buy me any additional recoverability beyond ext3's journal recovery? > > > > Yes, it does. Without fsync, you can't be sure the data has been pushed > > to the disk drive in case of an OS crash or power failure. > > > > > If we write something without sync'ing, presumably it's immediately > > > journaled? So even if the DB crashes prior to fsync'ing, are we fully > > > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > > > false, suffered numerous OS crashes, and have yet to lose any data or see > > > any corruption from any of those crashes. Have I just been lucky? > > > > The fsync makes sure it hits the drive, rather than staying in the > > kernel cache during an OS failure. > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Yep, it does. We use the lsi megaraid in our postgresql box, and it has passed all the power plug pull tests we've thrown at it. On Tue, 10 Feb 2004, JM wrote: > Would a battery backed Card do the trick? > > On Tuesday 10 February 2004 00:42, Bruce Momjian wrote: > > Ed L. wrote: > > > I'm curious what the consensus is, if any, on use of fsync on ext3 > > > filesystems with postgresql 7.3.4 or later. I did some recent > > > performance tests demonstrating a 45%-70% performance improvement for > > > simple inserts with fsync off on one particular system. Does fsync = > > > true buy me any additional recoverability beyond ext3's journal recovery? > > > > Yes, it does. Without fsync, you can't be sure the data has been pushed > > to the disk drive in case of an OS crash or power failure. > > > > > If we write something without sync'ing, presumably it's immediately > > > journaled? So even if the DB crashes prior to fsync'ing, are we fully > > > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > > > false, suffered numerous OS crashes, and have yet to lose any data or see > > > any corruption from any of those crashes. Have I just been lucky? > > > > The fsync makes sure it hits the drive, rather than staying in the > > kernel cache during an OS failure.
I can see you never took statistics... On Mon, 9 Feb 2004, Jim C. Nasby wrote: > Actually, I don't think even that is a valid test. The absence of a > failure doesn't mean one can't occur in this case. Doesn't matter if you > try the test 1 or 10,000 times; the test will only be conclusive if you > actually see a failure. > > On Mon, Feb 09, 2004 at 10:19:15AM -0700, scott.marlowe wrote: > > On Sun, 8 Feb 2004, Ed L. wrote: > > > > > > > > I'm curious what the consensus is, if any, on use of fsync on ext3 > > > filesystems with postgresql 7.3.4 or later. I did some recent performance > > > tests demonstrating a 45%-70% performance improvement for simple inserts > > > with fsync off on one particular system. Does fsync = true buy me any > > > additional recoverability beyond ext3's journal recovery? > > > > > > If we write something without sync'ing, presumably it's immediately > > > journaled? So even if the DB crashes prior to fsync'ing, are we fully > > > recoverable? I've been running a few pgsql clusters on ext3 with fsync = > > > false, suffered numerous OS crashes, and have yet to lose any data or see > > > any corruption from any of those crashes. Have I just been lucky? > > > > With all the other posts on this topic, I just want to point out that it's > > all theory until you build your machine, set it up, initiate a hundred or > > so parallel transactions, and pull the plug in the middle. > > > > Without pulling the plug, you just don't know for sure. And you need to > > do it a few times, in case your machine "got lucky" once and might fail on > > subsequent power fails.
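Both sides of this exchange can be put in numbers. A run of clean trials never proves that failure is impossible, but it does bound how likely a failure can be. The sketch below assumes the trials are independent and equally likely to fail, which real plug-pull tests may not be; the function name is made up. It solves (1 - p)^n = 1 - confidence for p, which for 95% confidence is roughly 3/n.

```python
def failure_rate_upper_bound(clean_trials, confidence=0.95):
    """Largest per-trial failure probability still consistent, at the given
    confidence, with seeing zero failures in clean_trials independent trials."""
    if clean_trials <= 0:
        raise ValueError("clean_trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_trials)


# Bounds for the trial counts mentioned in the thread, plus two in between.
bounds = {n: failure_rate_upper_bound(n) for n in (1, 10, 100, 10000)}
```

One clean test rules out almost nothing, while 10,000 clean tests push the bound below about 3 in 10,000 per trial. The bound shrinks with every clean run but never reaches zero.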
Bruce Momjian <pgman@candle.pha.pa.us> writes: > JM wrote: > > Would a battery backed Card do the trick? > > No because the fsync causes the data to hit the card. Without the > fsync, the data could remain only in the kernel cache. A battery backed card for the transaction logs wouldn't make it safe to run without fsync, but it would make the fsyncs basically free. -- greg
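Whether fsyncs are "basically free" on a given box can be checked by timing them. This is a rough sketch only: the function name, payload size, and round count are arbitrary choices made here, and the result depends heavily on the filesystem, the controller, and any write cache in the path.

```python
import os
import tempfile
import time


def average_fsync_seconds(path, rounds=50, payload=b"x" * 512):
    """Average wall-clock time of one small write followed by fsync (hypothetical probe)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.write(fd, payload)
            os.fsync(fd)  # each round waits for the device (or its cache) to acknowledge
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / rounds


probe_path = os.path.join(tempfile.mkdtemp(), "fsync_probe.dat")  # made-up file name
avg_seconds = average_fsync_seconds(probe_path, rounds=20)
```

Running the probe on the volume that holds the transaction logs, with and without the battery-backed cache enabled, gives a direct comparison of what each commit-time fsync costs.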