Thread: mount -o async - is it safe?

mount -o async - is it safe?

From
Shane Wright
Date:
Hi,

We've recently set up our database (7.4.9) with our new hosting provider.
We have two database servers running RHEL 4 in a cluster; one active and
one hot-spare.  They share a [fibre-channel connected] SAN partition; the
active server has it mounted.


Now my question is this; the provider has, by default, mounted it with -o
sync; so all reads/writes are synchronous.  This doesn't result in the
greatest of performance, and indeed remounting -o async is significantly
faster.
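
To confirm which mode a partition is actually mounted in, one option is to inspect the options column of /proc/mounts. The following is a minimal sketch, assuming the usual Linux layout of six whitespace-separated fields with the options in the fourth; the device names and the /var/lib/pgsql mount point in the sample are hypothetical and not taken from this thread.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not listed."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry overrides an earlier one
    return found


def is_sync_mounted(mounts_text, mount_point):
    """True only if the mount point is listed with the 'sync' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "sync" in options


# Hypothetical sample text; on a real host, read /proc/mounts instead.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb1 /var/lib/pgsql ext3 rw,sync 0 0
"""

print(is_sync_mounted(SAMPLE, "/var/lib/pgsql"))
print(is_sync_mounted(SAMPLE, "/"))
```

An async mount normally shows no explicit flag here, since async is the default; only the presence of "sync" is meaningful.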

They tell me this is so MySQL databases don't get corrupted in the event of
a crash, which is fine...

But as Postgres uses fsync() to force committed transactions to disk, then
this shouldn't be necessary, right?

(I know this is based on the assumption the SAN doesn't lie about its syncs,
but then surely it would lie to the kernel with -o sync anyway?)


If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
journalling  keeps the filesystem clean [assuming other mount options left at
defaults], and then everything should be ok with either a server crash, power
failure, storage failure, whatever.  right?


I've googled and come up with some info; the most relevant of
which is here:
http://archives.postgresql.org/pgsql-general/2003-11/msg01515.php
http://archives.postgresql.org/pgsql-general/2003-11/msg01592.php


If anyone can confirm either way that'd be great - or even just point me in
the direction of enough firm info to work it out myself ;)

Thanks,

Shane

Re: mount -o async - is it safe?

From
Martijn van Oosterhout
Date:
On Thu, Jan 19, 2006 at 09:42:59AM +0000, Shane Wright wrote:
> Now my question is this; the provider has, by default, mounted it with -o
> sync; so all reads/writes are synchronous.  This doesn't result in the
> greatest of performance, and indeed remounting -o async is significantly
> faster.
>
> They tell me this is so MySQL databases don't get corrupted in the event of
> a crash, which is fine...
>
> But as Postgres uses fsync() to force committed transactions to disk, then
> this shouldn't be necessary, right?

That depends. As long as the data is appropriately sync()ed when
PostgreSQL asks, it should be fine. However, from reading the manpage
it's not clear if fsync() still works when mounted -o async.

If -o async means "all I/O is asynchronous except stuff explicitly
fsync()ed" you're fine. Otherwise...

The usual advice is to stick the WAL on a properly synced partition and
stick the rest somewhere else. Note, I have no experience with this,
it's just what I've heard.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: mount -o async - is it safe?

From
Doug McNaught
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> That depends. As long as the data is appropriately sync()ed when
> PostgreSQL asks, it should be fine. However, from reading the manpage
> it's not clear if fsync() still works when mounted -o async.
>
> If -o async means "all I/O is asynchronous except stuff explicitly
> fsync()ed" you're fine. Otherwise...

That's the way it works.  Async is the default setting for most
filesystems, but fsync() is always honored, at least as far as
non-lying hardware will allow.  :)
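
The pattern described here (ordinary buffered writes, followed by an explicit fsync() at the point where durability matters) can be sketched as follows. This is an illustration only, not PostgreSQL's code; the function name durable_write and the file name are hypothetical.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)  # may only reach the page cache
            view = view[written:]
        os.fsync(fd)  # blocks until the kernel reports the data as flushed
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "record.bin")  # hypothetical file name
    durable_write(target, b"committed transaction\n")
    with open(target, "rb") as fh:
        result = fh.read()

print(result)
```

On an async mount only the os.fsync() call waits for storage; whether the data is then truly on stable media still depends on the hardware reporting honestly.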

> The usual advice is to stick the WAL on a properly synced partition and
> stick the rest somewhere else. Note, I have no experience with this,
> it's just what I've heard.

This might not be optimal, as having every write synchronous actually
results in more synced writes than are strictly necessary.

-Doug

Re: mount -o async - is it safe?

From
Shane Wright
Date:
Hi,

thanks :)

> > If -o async means "all I/O is asynchronous except stuff explicitly
> > fsync()ed" you're fine. Otherwise...
>
> That's the way it works.  Async is the default setting for most
> filesystems, but fsync() is always honored, at least as far as
> non-lying hardware will allow.  :)

That sounds good :)

ext3's journalling should take care of the rest I guess - does that sound ok?
I think I have read in various places that PostgreSQL doesn't need any
directory-level operations to keep the WAL up to date, so provided the ext3
partition remains mountable, the database should be fine.

> > The usual advice is to stick the WAL on a properly synced partition and
> > stick the rest somewhere else. Note, I have no experience with this,
> > it's just what I've heard.
>
> This might not be optimal, as having every write synchronous actually
> results in more synced writes than are strictly necessary.

Actually I thought that *all* the database had to have fsync() work correctly;
not only for integrity on failed transactions, but to maintain integrity during
checkpointing as well.  But I could well be wrong!

thanks,

Shane

Re: mount -o async - is it safe?

From
Doug McNaught
Date:
Shane Wright <shane.wright@edigitalresearch.com> writes:

> Actually I thought that *all* the database had to have fsync() work correctly;
> not only for integrity on failed transactions, but to maintain integrity during
> checkpointing as well.  But I could well be wrong!

I think you're right, but what I was thinking of is the scenario where
WAL writes are done in small increments, then committed with fsync()
once a full page has been written.  With a sync mount this would
result in the equivalent of fsync() for every small write, which would
hurt a lot.
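
The difference can be sketched by writing one page's worth of data in small pieces two ways: with every write synchronous, and with buffered writes followed by a single fsync(). This is a rough illustration that uses the per-file O_SYNC flag as a stand-in for a sync mount, which is an assumption of the sketch; the 64-byte chunk size and the file names are hypothetical, and timings will vary by system.

```python
import os
import tempfile
import time

# O_SYNC is missing on some platforms; falling back to 0 makes it a no-op there.
O_SYNC = getattr(os, "O_SYNC", 0)


def write_all(fd, data):
    """Keep calling os.write() until every byte of data has been accepted."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def write_chunks(path, chunks, sync_every_write):
    """Write chunks to path and return the elapsed time in seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync_every_write:
        flags |= O_SYNC  # each write waits for storage
    fd = os.open(path, flags, 0o600)
    start = time.perf_counter()
    try:
        for chunk in chunks:
            write_all(fd, chunk)
        if not sync_every_write:
            os.fsync(fd)  # one flush for the whole batch
    finally:
        os.close(fd)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    chunks = [b"x" * 64] * 128  # 8 KiB in total, written 64 bytes at a time
    sync_path = os.path.join(tmp, "every_write_synced.bin")
    batch_path = os.path.join(tmp, "single_fsync.bin")
    t_sync = write_chunks(sync_path, chunks, True)
    t_batch = write_chunks(batch_path, chunks, False)
    with open(sync_path, "rb") as a, open(batch_path, "rb") as b:
        same_contents = a.read() == b.read()
    total_bytes = os.path.getsize(batch_path)

print("synchronous writes: %.6f s, single fsync: %.6f s" % (t_sync, t_batch))
```

Both files end up with identical contents; only the number of waits on storage differs.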

I dimly recall this sort of thing being discussed in the past, but I
don't know offhand whether PG does its WAL writes in small chunks or
page-at-a-time.

-Doug

Re: mount -o async - is it safe?

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 09:34:00AM -0500, Doug McNaught wrote:
> Shane Wright <shane.wright@edigitalresearch.com> writes:
>
> > Actually I thought that *all* the database had to have fsync() work correctly;
> > not only for integrity on failed transactions, but to maintain integrity during
> > checkpointing as well.  But I could well be wrong!

You're correct; if the OS or drives lie about fsync'ing the base tables
during a checkpoint you can end up with a corrupted database. The only
'upside' here is that checkpoints don't happen as often, so the risk is
slightly less, but it's still there.
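
Why a lying fsync() is dangerous at checkpoint time can be seen in a toy sketch of the ordering involved: the data file is flushed first, and the log is only discarded afterwards. This is a deliberately simplified model, not PostgreSQL's checkpoint implementation; the function name, file names, and page layout are all hypothetical.

```python
import os
import tempfile


def checkpoint(data_path, wal_path, dirty_pages):
    """Flush dirty pages to the data file, fsync it, and only then empty the log."""
    fd = os.open(data_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for offset, payload in sorted(dirty_pages.items()):
            os.pwrite(fd, payload, offset)
        # If this call returns before the data is really on stable storage,
        # the truncation below throws away the only other copy of the changes.
        os.fsync(fd)
    finally:
        os.close(fd)

    wal_fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.fsync(wal_fd)
    finally:
        os.close(wal_fd)


with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "table.dat")  # hypothetical names
    wal_file = os.path.join(tmp, "wal.log")
    with open(wal_file, "wb") as fh:
        fh.write(b"pending changes\n")
    checkpoint(data_file, wal_file, {0: b"AAAA", 4: b"BBBB"})
    with open(data_file, "rb") as fh:
        data_after = fh.read()
    wal_size_after = os.path.getsize(wal_file)

print(data_after, wal_size_after)
```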

And all the debate about filesystem options is pointless unless they
have also turned off any unsafe write caching by the drives.

> I dimly recall this sort of thing being discussed in the past, but I
> don't know offhand whether PG does its WAL writes in small chunks or
> page-at-a-time.

It's done in pages, but remember that every commit requires an fsync of
WAL.
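
The "one fsync per commit" rule can be sketched with a toy append-only log. The class below is hypothetical and far simpler than PostgreSQL's WAL (no pages, no checksums, no group commit); it only shows that records are buffered as they are appended and flushed once at commit.

```python
import os
import tempfile


class ToyWal:
    """Append-only log that calls fsync() once per commit, not once per record."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.fsync_calls = 0

    def append(self, record):
        os.write(self.fd, record + b"\n")  # buffered by the OS on an async mount

    def commit(self):
        os.fsync(self.fd)  # the commit is durable only after this returns
        self.fsync_calls += 1

    def close(self):
        os.close(self.fd)


with tempfile.TemporaryDirectory() as tmp:
    wal_path = os.path.join(tmp, "toy_wal.log")  # hypothetical file name
    wal = ToyWal(wal_path)
    for txn in range(3):
        wal.append(b"txn %d: update row" % txn)
        wal.append(b"txn %d: commit" % txn)
        wal.commit()
    wal.close()
    fsync_calls = wal.fsync_calls
    with open(wal_path, "rb") as fh:
        record_count = len(fh.read().splitlines())

print(fsync_calls, record_count)
```

Three transactions of two records each produce six log records but only three fsync() calls; on a sync mount each of the six writes would wait for storage as well.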
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: mount -o async - is it safe?

From
Tom Lane
Date:
Shane Wright <shane.wright@edigitalresearch.com> writes:
> If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
> journalling  keeps the filesystem clean [assuming other mount options left at
> defaults], and then everything should be ok with either a server crash, power
> failure, storage failure, whatever.  right?

I checked around with some of Red Hat's kernel folk, and the bottom line
seems to be that it's OK as long as you trust the hardware:

:> Question is, can fsync(2) be trusted to behave properly, ie, not return
:> until all writes are down to disk, if the SAN is mounted -o async ?
:
: async is the default, which is the whole point of having things like
: fsync, fdatasync, O_DIRECT, etc.  You can trust fsync as far as you can
: trust the hardware.  The call will not return until the SAN says the
: data has been written.
:
: In reality, the SAN is probably buffering these writes (possibly into
: SRAM or battery-backed RAM), and the disks are probably buffering them
: again, but you've got redundant power supplies and UPSs, right?

            regards, tom lane

Re: mount -o async - is it safe?

From
Shane Wright
Date:
Hi Tom,

> > If we turn sync off, surely PostgreSQL keeps the data consistent, ext3
> > journalling  keeps the filesystem clean [assuming other mount options
> > left at defaults], and then everything should be ok with either a server
> > crash, power failure, storage failure, whatever.  right?
>
> I checked around with some of Red Hat's kernel folk, and the bottom line
> seems to be that it's OK as long as you trust the hardware:

fabulous, thanks :)

> :> Question is, can fsync(2) be trusted to behave properly, ie, not return
> :> until all writes are down to disk, if the SAN is mounted -o async ?
> :
> : async is the default, which is the whole point of having things like
> : fsync, fdatasync, O_DIRECT, etc.  You can trust fsync as far as you can
> : trust the hardware.  The call will not return until the SAN says the
> : data has been written.
> :
> : In reality, the SAN is probably buffering these writes (possibly into
> : SRAM or battery-backed RAM), and the disks are probably buffering them
> : again, but you've got redundant power supplies and UPSs, right?

that sounds true (and it has) - but presumably this is the case whether we
mount -o sync or not?   I.e. if it's going to buffer, then it's going to do so
whether it's postgres or the kernel sync'ing the writes?

(specifically that the SAN likely buffers anyway - IMO having to trust the
hardware to some degree is a given ;)

Cheers

Shane