Re: On Linux Filesystems - Mailing list pgsql-performance

From Bruce Momjian
Subject Re: On Linux Filesystems
Date
Msg-id 200308120416.h7C4Gfn11426@candle.pha.pa.us
In response to On Linux Filesystems  (Christopher Browne <cbbrowne@acm.org>)
List pgsql-performance
Here is a post from 2002 discussing ext2 corruption after a power failure:


http://groups.google.com/groups?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&selm=alvrj5%249in%241%40usc.edu&rnum=9

---------------------------------------------------------------------------

pgman wrote:
>
> As I remember, there were clear cases that ext2 would fail to recover,
> and it was known to be a limitation of the file system implementation.
> Some of the ext2 developers were in the room at Red Hat when I said
> that, so if it was incorrect, they would hopefully have spoken up.  I
> addressed the comments directly to them.
>
> To be recoverable, you have to be careful how you sync metadata to
> disk.  All the journalling file systems, and the BSD UFS do that.  I am
> told ext2 does not.  I don't know much more than that.
>
> As I remember, years ago ext2 was faster than UFS, but that was only
> because ext2 didn't guarantee failure recovery.  Now, with UFS soft
> updates, they have similar performance characteristics, but UFS is still
> crash-safe.
>
> However, I just tried google and couldn't find any documented evidence
> that ext2 isn't crash-safe, so maybe I am wrong.
>
> ---------------------------------------------------------------------------
>
> Christopher Browne wrote:
> > Bruce Momjian commented:
> >
> >  "Uh, the ext2 developers say it isn't 100% reliable" ... "I mentioned
> >  it while I was visiting Red Hat, and they didn't refute it."
> >
> > 1.  Nobody has gone through any formal proofs, and there are few
> > systems _anywhere_ that are 100% reliable.  NASA has occasionally lost
> > spacecraft to software bugs, so nobody will be making such rash claims
> > about ext2.
> >
> > 2.  Several projects have taken on the task of introducing journalled
> > filesystems, most notably ext3 (sponsored by RHAT via Stephen Tweedie)
> > and ReiserFS (oft sponsored by SuSE).  (I leave off JFS/XFS since they
> > existed long before they had any relationship with Linux.)
> >
> > Participants in such projects certainly have interest in presenting
> > the notion that they provide improved reliability over ext2.
> >
> > 3.  There is no "apologist" for ext2 that will (stupidly and
> > futilely) claim it to be flawless.  Nor is there substantial interest
> > in improving it; the sort of people that would be interested in that
> > sort of thing are working on the other FSes.
> >
> > This also means that there's no one interested in going into the
> > guaranteed-to-be-unsung effort involved in trying to prove ext2 to be
> > "formally reliable."
> >
> > 4.  It would be silly to minimize the impact of commercial interest.
> > RHAT has been paying for the development of a would-be ext2 successor.
> > For them to refute your comments wouldn't be in their interests.
> >
> > Note that these are "warm and fuzzy" comments, the whole lot.  The
> > 80-some thousand lines of code involved in ext2, ext3, reiserfs, and
> > jfs are no more amenable to absolute mathematical proof of reliability
> > than the corresponding BSD FFS code.
> >
> > 5.  Such efforts would be futile, anyways.  Disks are mechanical
> > devices, and, as such, suffer from substantial reliability issues
> > irrespective of the reliability of the software.  I have lost sleep on
> > too many occasions due to failures of:
> >  a) Disk drives,
> >  b) Disk controllers [the worst Oracle failure I encountered resulted
> >     from this], and
> >  c) OS memory management.
> >
> > I used ReiserFS back in its "bleeding edge" days, and find myself a
> > lot more worried about losing data to flakey disk controllers.
> >
> > It frankly seems insulting to focus on ext2 in this way when:
> >
> >  a) There aren't _hard_ conclusions to point to, just soft ones;
> >
> >  b) The reasons for you hearing vaguely negative things about ext2
> >     are much more likely political than they are technical.
> >
> > I wish there were more "hard and fast" conclusions to draw, to be able
> > to conclusively say that one or another Linux filesystem was
> > unambiguously preferable for use with PostgreSQL.  There are no
> > conclusive metrics, either in terms of speed or of some notion of
> > "reliability."  I'd expect ReiserFS to be the poorest choice, and for
> > XFS to be the best, but I only have fuzzy reasons, as opposed to
> > metrics.
> >
> > The absence of measurable metrics of the sort is _NOT_ a proof that
> > (say) FreeBSD is conclusively preferable, whatever your own
> > preferences (I'll try to avoid characterizing it as "prejudices," as
> > that would be unkind) may be.  That would represent a quite separate
> > debate, and one that doesn't belong here, certainly not on a thread
> > where the underlying question was "Which Linux FS is preferred?"
> >
> > If the OSDB TPC-like benchmarks can get "packaged" up well enough to
> > easily run and rerun them, there's hope of getting better answers,
> > perhaps even including performance metrics for *BSD.  That, not
> > Linux-baiting, is the answer...
> > --
> > select 'cbbrowne' || '@' || 'acm.org';
> > http://www.ntlug.org/~cbbrowne/sap.html
> > (eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
> >                       ; will pick up previous value to var set!-ed,
> >                       ; the unassigned object.
> > -- from BBN-CL's cl-parser.scm
> >
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
