Re: On Linux Filesystems - Mailing list pgsql-performance

From Bruce Momjian
Subject Re: On Linux Filesystems
Msg-id 200308120407.h7C47C910495@candle.pha.pa.us
In response to On Linux Filesystems  (Christopher Browne <cbbrowne@acm.org>)
List pgsql-performance
As I remember, there were clear cases where ext2 would fail to recover,
and it was known to be a limitation of the file system implementation.
Some of the ext2 developers were in the room at Red Hat when I said
that, so if it was incorrect, they would hopefully have spoken up.  I
addressed the comments directly to them.

To be recoverable, you have to be careful how you sync metadata to
disk.  All the journalling file systems, and the BSD UFS do that.  I am
told ext2 does not.  I don't know much more than that.
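
As a rough illustration of what "being careful how you sync metadata"
looks like from the application side, here is a minimal Python sketch.
It is not how ext2, UFS, or any journalling file system works
internally; it only shows the ordering an application can ask for: data
first, then the rename, then the directory entry.  The helper name
durable_write is made up for this example, and it assumes a POSIX
system where fsync() on a directory descriptor is permitted (as on
Linux).

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data`, flushing file data and directory entry.

    Hypothetical helper for illustration only; assumes a POSIX system
    where a directory can be opened read-only and passed to fsync().
    """
    directory = os.path.dirname(os.path.abspath(path))

    # 1. Write the new contents to a temporary file in the same directory,
    #    so the later rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # 2. Ask the kernel to push the file's data to stable storage
            #    before the new name becomes visible.
            os.fsync(f.fileno())
        # 3. Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # 4. Flush the directory so the rename (a metadata change) is also
    #    requested to reach the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Whether that ordering actually survives a crash still depends on the
file system and the hardware underneath, which is the point of this
thread.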

As I remember, years ago ext2 was faster than UFS, but only because
ext2 didn't guarantee failure recovery.  Now, with UFS soft updates,
they have similar performance characteristics, but UFS is still
crash-safe.

However, I just tried Google and couldn't find any documented evidence
that ext2 isn't crash-safe, so maybe I am wrong.

---------------------------------------------------------------------------

Christopher Browne wrote:
> Bruce Momjian commented:
>
>  "Uh, the ext2 developers say it isn't 100% reliable" ... "I mentioned
>  it while I was visiting Red Hat, and they didn't refute it."
>
> 1.  Nobody has gone through any formal proofs, and there are few
> systems _anywhere_ that are 100% reliable.  NASA has occasionally lost
> spacecraft to software bugs, so nobody will be making such rash claims
> about ext2.
>
> 2.  Several projects have taken on the task of introducing journalled
> filesystems, most notably ext3 (sponsored by RHAT via Stephen Tweedie)
> and ReiserFS (oft sponsored by SuSE).  (I leave off JFS/XFS since they
> existed long before they had any relationship with Linux.)
>
> Participants in such projects certainly have interest in presenting
> the notion that they provide improved reliability over ext2.
>
> 3.  There is no "apologist" for ext2 who will (stupidly and
> futilely) claim it to be flawless.  Nor is there substantial interest
> in improving it; the sort of people who would be interested in that
> sort of thing are working on the other FSes.
>
> This also means that there's no one interested in going into the
> guaranteed-to-be-unsung effort involved in trying to prove ext2 to be
> "formally reliable."
>
> 4.  It would be silly to minimize the impact of commercial interest.
> RHAT has been paying for the development of a would-be ext2 successor.
> For them to refute your comments wouldn't be in their interests.
>
> Note that these are "warm and fuzzy" comments, the whole lot.  The
> 80-some thousand lines of code involved in ext2, ext3, reiserfs, and
> jfs are no more amenable to absolute mathematical proof of reliability
> than the corresponding BSD FFS code.
>
> 5. Such efforts would be futile, anyways.  Disks are mechanical
> devices, and, as such, suffer from substantial reliability issues
> irrespective of the reliability of the software.  I have lost sleep on
> too many occasions due to failures of:
>  a) Disk drives,
>  b) Disk controllers [the worst Oracle failure I encountered resulted
>     from this], and
>  c) OS memory management.
>
> I used ReiserFS back in its "bleeding edge" days, and find myself a
> lot more worried about losing data to flakey disk controllers.
>
> It frankly seems insulting to focus on ext2 in this way when:
>
>  a) There aren't _hard_ conclusions to point to, just soft ones;
>
>  b) The reasons for you hearing vaguely negative things about ext2
>     are much more likely political than they are technical.
>
> I wish there were more "hard and fast" conclusions to draw, to be able
> to conclusively say that one or another Linux filesystem was
> unambiguously preferable for use with PostgreSQL.  There are no
> conclusive metrics, either in terms of speed or of some notion of
> "reliability."  I'd expect ReiserFS to be the poorest choice, and for
> XFS to be the best, but I only have fuzzy reasons, as opposed to
> metrics.
>
> The absence of measurable metrics of the sort is _NOT_ a proof that
> (say) FreeBSD is conclusively preferable, whatever your own
> preferences (I'll try to avoid characterizing it as "prejudices," as
> that would be unkind) may be.  That would represent a quite separate
> debate, and one that doesn't belong here, certainly not on a thread
> where the underlying question was "Which Linux FS is preferred?"
>
> If the OSDB TPC-like benchmarks can get "packaged" up well enough to
> easily run and rerun them, there's hope of getting better answers,
> perhaps even including performance metrics for *BSD.  That, not
> Linux-baiting, is the answer...
> --
> select 'cbbrowne' || '@' || 'acm.org';
> http://www.ntlug.org/~cbbrowne/sap.html
> (eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
>                       ; will pick up previous value to var set!-ed,
>                       ; the unassigned object.
> -- from BBN-CL's cl-parser.scm

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
