Re: File Systems Compared - Mailing list pgsql-performance

From Bruno Wolff III
Subject Re: File Systems Compared
Date
Msg-id 20061215164439.GA27926@wolff.to
In response to Re: File Systems Compared  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
Responses Re: File Systems Compared
List pgsql-performance
On Thu, Dec 14, 2006 at 13:21:11 -0800,
  Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> Bruno Wolff III wrote:
> > On Thu, Dec 14, 2006 at 01:39:00 -0500,
> >   Jim Nasby <decibel@decibel.org> wrote:
> >> On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
> >>> This appears to be changing under Linux. Recent kernels have write
> >>> barriers implemented using cache flush commands (which
> >>> some drives ignore,  so you need to be careful).
>
> Is it true that some drives ignore this; or is it mostly
> an urban legend that was started by testers that didn't
> have kernels with write barrier support.   I'd be especially
> interested in knowing if there are any currently available
> drives which ignore those commands.

I saw posts claiming this, but no specific drives mentioned. I did see one
post that claimed that the cache flush command was mandated (not optional)
by the spec.

> >>> In very recent kernels, software raid using raid 1 will also
> >>> handle write barriers. To get this feature, you are supposed to
> >>> mount ext3 file systems with the barrier=1 option. For other file
> >>> systems, the parameter may need to be different.
>
> With XFS the default is apparently to enable write barrier
> support unless you explicitly disable it with the nobarrier mount option.
> It also will warn you in the system log if the underlying device
> doesn't have write barrier support.

I think there might be a similar patch for ext3 going into 2.6.19. I haven't
checked a 2.6.19 kernel to make sure though.
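
For checking what a given box is actually mounted with, something like the following rough Python sketch could help. It only looks for the options mentioned in this thread (barrier=1 for ext3, nobarrier for XFS) plus barrier=0, which I am assuming is the ext3 spelling for turning barriers off. The function names are made up for illustration, and whether the option shows up in /proc/mounts at all may depend on the kernel and file system, so "unspecified" just means the file system default applies.

```python
def barrier_setting(mount_options):
    """Classify a comma-separated mount option string.

    Returns "on", "off", or "unspecified" (file system default applies).
    Only the option spellings discussed in the thread are recognized;
    barrier=0 is an assumption about the ext3 "off" spelling.
    """
    for opt in mount_options.split(","):
        if opt == "barrier=1":
            return "on"
        if opt in ("barrier=0", "nobarrier"):
            return "off"
    return "unspecified"


def scan_mounts(text):
    """Parse /proc/mounts-style text and report ext3 and XFS mounts.

    Each line is expected to look like:
        device mountpoint fstype options dump pass
    Returns {mountpoint: (fstype, barrier_setting)}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ("ext3", "xfs"):
            result[mountpoint] = (fstype, barrier_setting(options))
    return result


# Hypothetical sample input; on a real system, read /proc/mounts instead.
SAMPLE = (
    "/dev/md0 / ext3 rw,barrier=1 0 0\n"
    "/dev/sdb1 /data xfs rw,nobarrier 0 0\n"
    "proc /proc proc rw 0 0\n"
)

for mountpoint, (fstype, state) in scan_mounts(SAMPLE).items():
    print(mountpoint, fstype, state)
```

On a live system you would pass the contents of /proc/mounts to scan_mounts() instead of the sample string. This says nothing about whether the underlying device honors the flush; for that, the system log warning mentioned above is the thing to look for.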

>
> SGI recommends that you use the "nobarrier" mount option if you do
> have a persistent (battery backed) write cache on your raid device.
>
>   http://oss.sgi.com/projects/xfs/faq.html#wcache
>
>
> >> But would that actually provide a meaningful benefit? When you
> >> COMMIT, the WAL data must hit non-volatile storage of some kind,
> >> which without a BBU or something similar, means hitting the platter.
> >> So I don't see how enabling the disk cache will help, unless of
> >> course it's ignoring fsync.
>
> With write barriers, fsync() waits for the physical disk; but I believe
> the background writes from write() done by pdflush don't have to; so
> it's kinda like only disabling the cache for WAL files and the filesystem's
> journal, but having it enabled for the rest of your write activity (the
> tables except at checkpoints?  the log file?).

Not exactly. Whenever you commit the file system log or fsync the WAL file,
all previously written blocks will be flushed to the disk platter before
any new write requests are honored. So journalling semantics will work
properly.
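
To make the application side of this concrete, here is a minimal Python sketch of the write-then-fsync pattern being discussed. It is not PostgreSQL's code; the function name and the record contents are invented for illustration. Whether the fsync() call really waits for the platter depends on the kernel, the file system's barrier setting, and the drive, which is the whole point of this thread.

```python
import os
import tempfile


def append_and_sync(path, record):
    """Append record (bytes) to path and return only after fsync() returns.

    Hypothetical helper: a WAL-style append followed by an explicit fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data to stable storage.
        # With barriers working, this is where the cache flush happens.
        os.fsync(fd)
    finally:
        os.close(fd)


# Usage against a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    log_path = os.path.join(tmpdir, "example.log")
    append_and_sync(log_path, b"commit 1\n")
    append_and_sync(log_path, b"commit 2\n")
    with open(log_path, "rb") as f:
        print(f.read())
```

The plain os.write() calls are the part that can sit in the drive cache; the os.fsync() call is the point where, with barriers working, everything written earlier has to reach the disk.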

> > Note the use case for this is more for hobbyists or development boxes. You can
> > only use it on software raid (md) 1, which rules out most "real" systems.
> >
>
> Ugh.  Looking for where that's documented; and hoping it is or will soon
> work on software 1+0 as well.

I saw a comment somewhere that raid 0 posed some problems, and the suggestion
was to handle the barrier at a different level (though I don't know how you
could). So I don't believe 1+0 or 5 are currently supported or will be in the
near term.

The other feature I would like is to be able to use write barriers with
encrypted file systems. I haven't found anything on whether or not there
are near term plans by anyone to support that.
