Thread: fsync = true beneficial on ext3?

fsync = true beneficial on ext3?

From: "Ed L."

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later.  I did some recent performance
tests demonstrating a 45%-70% performance improvement for simple inserts
with fsync off on one particular system.  Does fsync = true buy me any
additional recoverability beyond ext3's journal recovery?

If we write something without sync'ing, presumably it's immediately
journaled?  So even if the DB crashes prior to fsync'ing, are we fully
recoverable?  I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes.  Have I just been lucky?

TIA.

Ed


Re: fsync = true beneficial on ext3?

From: Tom Lane

"Ed L." <pgsql@bluepolka.net> writes:
> If we write something without sync'ing, presumably it's immediately
> journaled?

I was under the impression that ext3 journals only filesystem metadata,
not the contents of files.

> I've been running a few pgsql clusters on ext3 with fsync =
> false, suffered numerous OS crashes, and have yet to lose any data or see
> any corruption from any of those crashes.  Have I just been lucky?

Doesn't sound very safe to me.

            regards, tom lane

Re: fsync = true beneficial on ext3?

From: Richard Welty

On Sun, 08 Feb 2004 14:02:26 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Ed L." <pgsql@bluepolka.net> writes:
> > If we write something without sync'ing, presumably it's immediately
> > journaled?

> I was under the impression that ext3 journals only filesystem metadata,
> not the contents of files.

by default, it journals everything, but you can set it to journal metadata
only, i think with the mount option data=writeback. do a "man mount"
and look for ext3 options for details on the data= option.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
    Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security

Re: fsync = true beneficial on ext3?

From: "Ed L."

On Sunday February 8 2004 12:02, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > If we write something without sync'ing, presumably it's immediately
> > journaled?
>
> I was under the impression that ext3 journals only filesystem metadata,
> not the contents of files.

Ah, didn't know how that worked.  So I gather there is really no
kernel-level substitute for fsync = true when it comes to guaranteeing data
is flushed to disk at commit time?

In linux, does pgsql's fsync call at commit time obviate the need for
bdflush to do any flushing for that particular data?  I'm wondering if
there are bdflush adjustments to be made to improve disk write efficiency
given we can count on fsync = true to guarantee that.

Also, with fsync = true and wal using fdatasync, and assuming all is on the
same disk (which I know is not optimal), is there a particular ext3 mode
(data=writeback?) that gives better performance while maintaining best
recoverability?


Re: fsync = true beneficial on ext3?

From: Mark Kirkwood

FYI - Ext3 has 3 modes:

data=ordered (default): metadata is journaled (at write time data is
written before metadata - i.e. ordered)
data=journal: data and metadata are journaled
data=writeback: metadata is journaled (no ordering at write time)
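
As an illustration of the three modes listed above, here is a minimal Python sketch that reports which data= mode an ext3 mount is using by parsing text in /proc/mounts format. The function name ext3_data_mode and the sample mount table are hypothetical, and treating a missing data= option as "ordered" is an assumption based on the default named in the list.

```python
def ext3_data_mode(mounts_text, mount_point):
    """Return the data= journaling mode for an ext3 mount_point, or None.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mpoint, fstype, options = fields[:4]
        if mpoint != mount_point or fstype != "ext3":
            continue
        for opt in options.split(","):
            if opt.startswith("data="):
                return opt[len("data="):]
        # Assumption: no explicit data= option means the default mode.
        return "ordered"
    return None  # mount point not found, or not ext3


if __name__ == "__main__":
    # Hypothetical sample; on a real system, read /proc/mounts instead.
    sample = (
        "/dev/sda1 / ext3 rw,data=ordered 0 0\n"
        "/dev/sdb1 /var/lib/pgsql ext3 rw,noatime,data=journal 0 0\n"
    )
    print(ext3_data_mode(sample, "/var/lib/pgsql"))
```

Whatever mode is reported, it only describes how the filesystem journals its own writes; it does not replace the fsync discussed in this thread.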

The default will not help to protect database integrity if fsync is
false (as only metadata is journaled)

Will data=journal mode help? I am uncertain. A casual reading of these
definitions suggests that it *might* - anyone know for sure?

regards

Mark


Richard Welty wrote:

>by default, it journals everything, but you can set it to journal metadata
>only, i think with the mount option data=writeback. do a "man mount"
>and look for ext3 options for details on the data= option.


Re: fsync = true beneficial on ext3?

From: Martijn van Oosterhout

On Mon, Feb 09, 2004 at 03:13:08PM +1300, Mark Kirkwood wrote:
> FYI - Ext3 has 3 modes:
>
> data=ordered (default): metadata is journaled (at write time data is
> written before metadata - i.e. ordered)
> data=journal: data and metadata are journaled
> data=writeback: metadata is journaled (no ordering at write time)

Thanks for that.

> The default will not help to protect database integrity if fsync is
> false (as only metadata is journaled)
>
> Will data=journal mode help? I am uncertain. A casual reading of these
> definitions suggests that it *might* - anyone know for sure?

My problem is that journalling works on a per-file basis, i.e., the data for a
file is written before that file's metadata. However, the fsync is used for
the WAL segments and if you can't guarantee the WAL will hit the disk before
the data segments (different files), you're stuffed, I think.

Or maybe WAL is not that sensitive to that kind of reordering. Maybe it only
depends on the WAL being consistent.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> (... have gone from d-i being barely usable even by its developers
> anywhere, to being about 20% done. Sweet. And the last 80% usually takes
> 20% of the time, too, right?) -- Anthony Towns, debian-devel-announce


Re: fsync = true beneficial on ext3?

From: Tom Lane

Martijn van Oosterhout <kleptog@svana.org> writes:
> My problem is that journalling works on a per-file basis, i.e., the data for a
> file is written before that file's metadata. However, the fsync is used for
> the WAL segments and if you can't guarantee the WAL will hit the disk before
> the data segments (different files), you're stuffed, I think.

> Or maybe WAL is not that sensitive to that kind of reordering. Maybe it only
> depends on the WAL being consistent.

The entire *point* of WAL is that WAL entries must hit disk before any
of the data-file changes they describe (that's why it's called write
AHEAD log).  Without this you can't use WAL replay to ensure the data
files are brought to a fully consistent state.  So yes, we do have to
have cross-file write ordering guarantees.  fsync is a pretty blunt tool
for enforcing cross-file write ordering, but it's the only one
available...

            regards, tom lane
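
To illustrate the ordering rule described above, here is a minimal Python sketch of "log first, then data". It is not PostgreSQL's actual code: the helper names durable_append and apply_change, the record format, and the file names wal.log and table.dat are all hypothetical, and it assumes os.fsync on the platform really pushes data to stable storage.

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path and fsync before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # block until the kernel reports the data is flushed
    finally:
        os.close(fd)


def apply_change(wal_path, data_path, record):
    """Log the change, force the log to disk, then touch the data file."""
    # Step 1: the log record must be durable before the data file changes.
    durable_append(wal_path, b"LOG " + record + b"\n")
    # Step 2: this write may linger in the kernel cache; after a crash it
    # could be redone from the log record that is already on disk.
    with open(data_path, "ab") as f:
        f.write(record + b"\n")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    wal = os.path.join(workdir, "wal.log")      # hypothetical file names
    data = os.path.join(workdir, "table.dat")
    apply_change(wal, data, b"insert row 1")
    with open(wal, "rb") as f:
        print(f.read())
```

The fsync in step 1 is what enforces the cross-file ordering: without it, the kernel is free to write the data file before the log.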

Re: fsync = true beneficial on ext3?

From: Bruce Momjian

Ed L. wrote:
>
> I'm curious what the consensus is, if any, on use of fsync on ext3
> filesystems with postgresql 7.3.4 or later.  I did some recent performance
> tests demonstrating a 45%-70% performance improvement for simple inserts
> with fsync off on one particular system.  Does fsync = true buy me any
> additional recoverability beyond ext3's journal recovery?

Yes, it does.  Without fsync, you can't be sure the data has been pushed
to the disk drive in case of an OS crash or power failure.

> If we write something without sync'ing, presumably it's immediately
> journaled?  So even if the DB crashes prior to fsync'ing, are we fully
> recoverable?  I've been running a few pgsql clusters on ext3 with fsync =
> false, suffered numerous OS crashes, and have yet to lose any data or see
> any corruption from any of those crashes.  Have I just been lucky?

The fsync makes sure it hits the drive, rather than staying in the
kernel cache during an OS failure.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: fsync = true beneficial on ext3?

From: "scott.marlowe"

On Sun, 8 Feb 2004, Ed L. wrote:

>
> I'm curious what the consensus is, if any, on use of fsync on ext3
> filesystems with postgresql 7.3.4 or later.  I did some recent performance
> tests demonstrating a 45%-70% performance improvement for simple inserts
> with fsync off on one particular system.  Does fsync = true buy me any
> additional recoverability beyond ext3's journal recovery?
>
> If we write something without sync'ing, presumably it's immediately
> journaled?  So even if the DB crashes prior to fsync'ing, are we fully
> recoverable?  I've been running a few pgsql clusters on ext3 with fsync =
> false, suffered numerous OS crashes, and have yet to lose any data or see
> any corruption from any of those crashes.  Have I just been lucky?

With all the other posts on this topic, I just want to point out that it's
all theory until you build your machine, set it up, initiate a hundred or
so parallel transactions, and pull the plug in the middle.

Without pulling the plug, you just don't know for sure.  And you need to
do it a few times, in case your machine "got lucky" once and might fail on
subsequent power fails.


Re: fsync = true beneficial on ext3?

From: "Jim C. Nasby"

Actually, I don't think even that is a valid test. The absence of a
failure doesn't mean one can't occur in this case. Doesn't matter if you
try the test 1 or 10,000 times; the test will only be conclusive if you
actually see a failure.
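
One way to put rough numbers on this point, as a sketch that assumes each plug-pull trial independently corrupts data with some fixed, unknown probability p (an assumption introduced here for illustration, not something stated in the thread):

```latex
P(\text{no failure in } n \text{ trials}) = (1 - p)^n
% Requiring (1 - p)^n \ge 0.05 gives a 95% upper bound on p:
p \le 1 - 0.05^{1/n} \approx \frac{3}{n}
% Example: n = 10 clean trials still allow p up to about 0.26.
```

Under that assumption, a run of clean trials narrows the possible failure rate but never shows it to be zero, while a single observed failure is conclusive.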

On Mon, Feb 09, 2004 at 10:19:15AM -0700, scott.marlowe wrote:
> With all the other posts on this topic, I just want to point out that it's
> all theory until you build your machine, set it up, initiate a hundred or
> so parallel transactions, and pull the plug in the middle.
>
> Without pulling the plug, you just don't know for sure.  And you need to
> do it a few times, in case your machine "got lucky" once and might fail on
> subsequent power fails.

--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: fsync = true beneficial on ext3?

From: "Ed L."

Sounds like "fsync = true" is the consensus for any circumstances where data
loss is intolerable.

Thx.


Re: fsync = true beneficial on ext3?

From: JM

Would a battery backed Card do the trick?




On Tuesday 10 February 2004 00:42, Bruce Momjian wrote:
> Ed L. wrote:
> > I'm curious what the consensus is, if any, on use of fsync on ext3
> > filesystems with postgresql 7.3.4 or later.  I did some recent
> > performance tests demonstrating a 45%-70% performance improvement for
> > simple inserts with fsync off on one particular system.  Does fsync =
> > true buy me any additional recoverability beyond ext3's journal recovery?
>
> Yes, it does.  Without fsync, you can't be sure the data has been pushed
> to the disk drive in case of an OS crash or power failure.
>
> > If we write something without sync'ing, presumably it's immediately
> > journaled?  So even if the DB crashes prior to fsync'ing, are we fully
> > recoverable?  I've been running a few pgsql clusters on ext3 with fsync =
> > false, suffered numerous OS crashes, and have yet to lose any data or see
> > any corruption from any of those crashes.  Have I just been lucky?
>
> The fsync makes sure it hits the drive, rather than staying in the
> kernel cache during an OS failure.


Re: fsync = true beneficial on ext3?

From: Bruce Momjian

JM wrote:
> Would a battery backed Card do the trick?

No, because the fsync causes the data to hit the card.  Without the
fsync, the data could remain only in the kernel cache.



Re: fsync = true beneficial on ext3?

From: "scott.marlowe"

Yep, it does.  We use the lsi megaraid in our postgresql box and it
has passed all the power plug pull tests we've thrown at it.

On Tue, 10 Feb 2004, JM wrote:

> Would a battery backed Card do the trick?


Re: fsync = true beneficial on ext3?

From: "scott.marlowe"

I can see you never took statistics...

On Mon, 9 Feb 2004, Jim C. Nasby wrote:

> Actually, I don't think even that is a valid test. The absence of a
> failure doesn't mean one can't occur in this case. Doesn't matter if you
> try the test 1 or 10,000 times; the test will only be conclusive if you
> actually see a failure.


Re: fsync = true beneficial on ext3?

From: Greg Stark

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> JM wrote:
> > Would a battery backed Card do the trick?
>
> No, because the fsync causes the data to hit the card.  Without the
> fsync, the data could remain only in the kernel cache.

A battery backed card for the transaction logs wouldn't make it safe to run
without fsync, but it would make the fsyncs basically free.

--
greg
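
A rough way to see what each fsync costs on a given storage setup is to time the same small appends with and without it. This is a minimal sketch, not a PostgreSQL benchmark: the helper name time_writes, the record size, and the write count are hypothetical choices, and the numbers depend entirely on the disk, controller, and filesystem underneath.

```python
import os
import tempfile
import time


def time_writes(n_writes, use_fsync, payload=b"x" * 128):
    """Return seconds taken to append n_writes records to a temp file."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, payload)
            if use_fsync:
                os.fsync(fd)  # force each record out of the kernel cache
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    n = 200
    print("no fsync:   %.4f s" % time_writes(n, use_fsync=False))
    print("with fsync: %.4f s" % time_writes(n, use_fsync=True))
```

On hardware where the two timings come out close, such as a controller with a battery-backed write cache, the fsync is nearly free, which is the case described above.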