Thread: On Linux Filesystems

On Linux Filesystems

From
Christopher Browne
Date:
Bruce Momjian commented:

 "Uh, the ext2 developers say it isn't 100% reliable" ... "I mentioned
 it while I was visiting Red Hat, and they didn't refute it."

1.  Nobody has gone through any formal proofs, and there are few
systems _anywhere_ that are 100% reliable.  NASA has occasionally lost
spacecraft to software bugs, so nobody will be making such rash claims
about ext2.

2.  Several projects have taken on the task of introducing journalled
filesystems, most notably ext3 (sponsored by RHAT via Stephen Tweedie)
and ReiserFS (oft sponsored by SuSE).  (I leave off JFS/XFS since they
existed long before they had any relationship with Linux.)

Participants in such projects certainly have interest in presenting
the notion that they provide improved reliability over ext2.

3.  There is no "apologist" for ext2 that will (stupidly and
futilely) claim it to be flawless.  Nor is there substantial interest
in improving it; the sort of people that would be interested in that
sort of thing are working on the other FSes.

This also means that there's no one interested in going into the
guaranteed-to-be-unsung effort involved in trying to prove ext2 to be
"formally reliable."

4.  It would be silly to minimize the impact of commercial interest.
RHAT has been paying for the development of a would-be ext2 successor.
For them to refute your comments wouldn't be in their interests.

Note that these are "warm and fuzzy" comments, the whole lot.  The
80-some thousand lines of code involved in ext2, ext3, reiserfs, and
jfs are no more amenable to absolute mathematical proof of reliability
than the corresponding BSD FFS code.

5.  Such efforts would be futile, anyways.  Disks are mechanical
devices, and, as such, suffer from substantial reliability issues
irrespective of the reliability of the software.  I have lost sleep on
too many occasions due to failures of:
 a) Disk drives,
 b) Disk controllers [the worst Oracle failure I encountered resulted
    from this], and
 c) OS memory management.

I used ReiserFS back in its "bleeding edge" days, and find myself a
lot more worried about losing data to flakey disk controllers.

It frankly seems insulting to focus on ext2 in this way when:

 a) There aren't _hard_ conclusions to point to, just soft ones;

 b) The reasons for you hearing vaguely negative things about ext2
    are much more likely political than they are technical.

I wish there were more "hard and fast" conclusions to draw, to be able
to conclusively say that one or another Linux filesystem was
unambiguously preferable for use with PostgreSQL.  There are no
conclusive metrics, either in terms of speed or of some notion of
"reliability."  I'd expect ReiserFS to be the poorest choice, and for
XFS to be the best, but I only have fuzzy reasons, as opposed to
metrics.

The absence of measurable metrics of the sort is _NOT_ a proof that
(say) FreeBSD is conclusively preferable, whatever your own
preferences (I'll try to avoid characterizing it as "prejudices," as
that would be unkind) may be.  That would represent a quite separate
debate, and one that doesn't belong here, certainly not on a thread
where the underlying question was "Which Linux FS is preferred?"

If the OSDB TPC-like benchmarks can get "packaged" up well enough to
easily run and rerun them, there's hope of getting better answers,
perhaps even including performance metrics for *BSD.  That, not
Linux-baiting, is the answer...
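
As a rough illustration of the kind of easily rerun measurement meant
here, the sketch below times fsync'd 8 kB writes in a given directory.
It is not OSDB and not a TPC-like workload; the function name
fsync_write_rate and its parameters are made up for this example, and
a single number like this says nothing about reliability.

```python
import os
import tempfile
import time


def fsync_write_rate(directory, writes=200, size=8192):
    """Return fsync'd writes per second for size-byte blocks in directory.

    Hypothetical helper for illustration only; 8192 bytes is used because
    it matches PostgreSQL's default block size.
    """
    block = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push this block to stable storage
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.unlink(path)
    return writes / elapsed
```

Running it against directories on different filesystems (on otherwise
identical hardware) would give one comparable figure per filesystem;
drives with write caching enabled can make the result look far better
than the disk can really deliver.
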
--
select 'cbbrowne' || '@' || 'acm.org';
http://www.ntlug.org/~cbbrowne/sap.html
(eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
                      ; will pick up previous value to var set!-ed,
                      ; the unassigned object.
-- from BBN-CL's cl-parser.scm

Re: On Linux Filesystems

From
Bruce Momjian
Date:
As I remember, there were clear cases that ext2 would fail to recover,
and it was known to be a limitation of the file system implementation.
Some of the ext2 developers were in the room at Red Hat when I said
that, so if it was incorrect, they would hopefully have spoken up.  I
addressed the comments directly to them.

To be recoverable, you have to be careful how you sync metadata to
disk.  All the journalling file systems, and the BSD UFS, do that.  I am
told ext2 does not.  I don't know much more than that.
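
The ordering above happens inside the filesystem, but the same concern
is visible from the application side.  The sketch below is a minimal
illustration, assuming a POSIX system: it flushes the file contents and
then the containing directory, so that both the data and the name
pointing at it have been pushed to disk.  The helper durable_write is
hypothetical; it is not how ext2, UFS, or PostgreSQL do it.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data (bytes) to path, flushing the file and its directory.

    Hypothetical helper for illustration; assumes a POSIX system where a
    directory can be opened read-only and passed to fsync().
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename below stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush the file's contents to disk
        os.replace(tmp_path, path)  # atomically put the new file in place
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory so the new directory entry is on disk as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Whether the data then survives a crash still depends on the filesystem
and the hardware honouring the flush, which is the point under
discussion in this thread.
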

As I remember years ago, ext2 was faster than UFS, but that was true
because ext2 didn't guarantee failure recovery.  Now, with UFS soft
updates, they have similar performance characteristics, but UFS is still
crash-safe.

However, I just tried google and couldn't find any documented evidence
that ext2 isn't crash-safe, so maybe I am wrong.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: On Linux Filesystems

From
Bruce Momjian
Date:
Here is one talking about ext2 corruption from power failure from 2002:


http://groups.google.com/groups?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&selm=alvrj5%249in%241%40usc.edu&rnum=9

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: On Linux Filesystems

From
Andrew Sullivan
Date:
On Mon, Aug 11, 2003 at 10:58:18PM -0400, Christopher Browne wrote:
> 1.  Nobody has gone through any formal proofs, and there are few
> systems _anywhere_ that are 100% reliable.

I think the problem is that ext2 is known to be not perfectly crash
safe.  That is, fsck on reboot after a crash can cause, in some
extreme cases, recently-fsynced data to end up in lost+found/.  The
data may or may not be recoverable from there.

I don't think anyone would object to such a characterisation of ext2.
It was not designed, ever, for perfect data safety -- it was designed
as a reasonably good compromise for most cases.  _Every_ filesystem
entails some compromises.  This happens to be the one entailed by
ext2.

For production use with valuable data, for my money (or, more
precisely, my time when a system panics for no good reason), it is
always worth the additional speed penalty to use something like
metadata journalling.  Maybe others have more time to spare.

> perhaps even including performance metrics for *BSD.  That, not
> Linux-baiting, is the answer...

I didn't see anyone Linux-baiting.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: On Linux Filesystems

From
"scott.marlowe"
Date:
On Tue, 12 Aug 2003, Andrew Sullivan wrote:

> On Mon, Aug 11, 2003 at 10:58:18PM -0400, Christopher Browne wrote:
> > 1.  Nobody has gone through any formal proofs, and there are few
> > systems _anywhere_ that are 100% reliable.
>
> I think the problem is that ext2 is known to be not perfectly crash
> safe.  That is, fsck on reboot after a crash can cause, in some
> extreme cases, recently-fsynced data to end up in lost+found/.  The
> data may or may not be recoverable from there.
>
> I don't think anyone would object to such a characterisation of ext2.
> It was not designed, ever, for perfect data safety -- it was designed
> as a reasonably good compromise for most cases.  _Every_ filesystem
> entails some compromises.  This happens to be the one entailed by
> ext2.
>
> For production use with valuable data, for my money (or, more
> precisely, my time when a system panics for no good reason), it is
> always worth the additional speed penalty to use something like
> metadata journalling.  Maybe others have more time to spare.

I think the issue here is that if you are running with the async mount option,
then it is quite likely that your volume will be corrupted if there are
writes going on and power fails.

I'm pretty sure that as long as the partition is mounted sync, this isn't
a problem.
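
One way to check which case applies is to look at the mount options of
the filesystem holding the data directory.  The sketch below is only an
illustration: it parses text in /proc/mounts format and returns the
entry for the most specific mount point containing a path.  The
function name mount_options and the sample paths are made up, and it
assumes that a synchronous mount shows "sync" among its options while
the default async mode is simply not listed.

```python
import os


def mount_options(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount holding path.

    mounts_text is expected in /proc/mounts format:
        device mount_point fs_type options dump pass
    Hypothetical helper; it ignores the octal escapes (such as \\040 for a
    space) that /proc/mounts uses in unusual mount point names.
    Returns None if no line matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        # A mount point holds the path if it is a path prefix of it.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest match; on ties, the later line wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options)
    return best


# Example use on a Linux system (the data directory path is an assumption):
#   with open("/proc/mounts") as f:
#       entry = mount_options("/var/lib/pgsql/data", f.read())
#   is_sync = entry is not None and "sync" in entry[2]
```
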

I have seen reports where ext3, not ext2, caused the data corruption
problem (old kernels, 2.4.4 and before, I believe).  I.e. the addition of
journaling caused data loss.

Given that possibility, it may well have been at one time that ext2 was a
safer bet than ext3.

> > perhaps even including performance metrics for *BSD.  That, not
> > Linux-baiting, is the answer...
>
> I didn't see anyone Linux-baiting.

No more than the typical, light hearted stuff we toss back and forth.  I
certainly wasn't upset by any of it.


Re: On Linux Filesystems

From
Josh Berkus
Date:
People:

> On Mon, Aug 11, 2003 at 10:58:18PM -0400, Christopher Browne wrote:
> > 1.  Nobody has gone through any formal proofs, and there are few
> > systems _anywhere_ that are 100% reliable.
>
> I think the problem is that ext2 is known to be not perfectly crash
> safe.  That is, fsck on reboot after a crash can cause, in some
> extreme cases, recently-fsynced data to end up in lost+found/.  The
> data may or may not be recoverable from there.

Aside from that, as recently as eighteen months ago I had to manually fsck an
ext2 system after an unexpected power-out.   After my interactive session the
system recovered and no data was lost.  However, the client lost 3.5 hours of
work time ... 2.5 hours for me to get to the site, and 1 hour to recover the
server (mostly waiting time).

So it's a tradeoff with loss of performance vs. recovery time.   In a server
room with redundant backup power supplies, "clean room" security and
fail-over services, I can certainly imagine that data journalling would not
be needed.   That is, however, the minority ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: On Linux Filesystems

From
Andrew Sullivan
Date:
On Tue, Aug 12, 2003 at 09:36:21AM -0700, Josh Berkus wrote:
> So it's a tradeoff with loss of performance vs. recovery time.   In
> a server room with redundant backup power supplies, "clean room"
> security and fail-over services, I can certainly imagine that data
> journalling would not be needed.

You can have all the redundant power, high availability hardware, and
ultra-Robocop security going, and still have crashes: so far as I
know, _nobody_ makes perfectly reliable hardware, and the harder you
push it, the more likely you are to find trouble.  And certainly,
when you have a surprise outage because the CPU where the kernel
happened to be burned itself up, an extra hour or two offline while
you do fsck is liable to make you cry out variations of those four
letters more than once. :-/

A
--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: On Linux Filesystems

From
Toni Schlichting
Date:
Christopher, I appreciate your comments. In the end it comes down to personal
experience with one or the other file system. From that I can tell you that
I have had good experiences with UFS, EXT2, and XFS. I had a catastrophic
experience with ReiserFS (not during operation, but you are the loser when
it fails, because the recovery methods are likely to be insufficient).

So in the end, if somebody runs technical equipment, regardless of whether
it's a computer or a chemical fab, it can fail, and you need to make up your
mind about contingency.

This is due even before you start operating the equipment.

So don't waste too much time on thinking about the perfect file system.
Instead, evaluate the potential damage that can result from failure. Develop
a Backup&Recovery strategy and test it, test it and test it again, so that
you can do it blindly when it's due.
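
One small piece of such a test can be automated: after a trial restore,
compare the restored files against the originals.  The sketch below is
an illustration under assumptions, not a complete recovery procedure.
The function names are made up, and it only makes sense for a copy
taken while nothing was writing to the files (for example, with the
database shut down).

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path relative to root to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def restore_mismatches(original_root, restored_root):
    """Return sorted relative paths that are missing or differ after a restore."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    return sorted(
        p for p in original.keys() | restored.keys()
        if original.get(p) != restored.get(p)
    )
```

An empty list from restore_mismatches means every file came back with
identical contents; anything else names the files to investigate.
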

Ciao, Toni

>I wish there were more "hard and fast" conclusions to draw, to be able
>to conclusively say that one or another Linux filesystem was
>unambiguously preferable for use with PostgreSQL.  There are not
>conclusive metrics, either in terms of speed or of some notion of
>"reliability."  I'd expect ReiserFS to be the poorest choice, and for
>XFS to be the best, but I only have fuzzy reasons, as opposed to
>metrics.
>
>The absence of measurable metrics of the sort is _NOT_ a proof that
>(say) FreeBSD is conclusively preferable, whatever your own
>preferences (I'll try to avoid characterizing it as "prejudices," as
>that would be unkind) may be.  That would represent a quite separate
>debate, and one that doesn't belong here, certainly not on a thread
>where the underlying question was "Which Linux FS is preferred?"
>
>If the OSDB TPC-like benchmarks can get "packaged" up well enough to
>easily run and rerun them, there's hope of getting better answers,
>perhaps even including performance metrics for *BSD.  That, not
>Linux-baiting, is the answer...
>
>