Re: Postgresql Performance on an HP DL385 and - Mailing list pgsql-performance

From mark@mark.mielke.cc
Subject Re: Postgresql Performance on an HP DL385 and
Msg-id 20060815193951.GA13695@mark.mielke.cc
In response to Re: Postgresql Performance on an HP DL385 and  (Michael Stone <mstone+postgres@mathom.us>)
Responses Re: Postgresql Performance on an HP DL385 and
List pgsql-performance
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 02:33:27PM -0400, mark@mark.mielke.cc wrote:
> >>>Are 'we' sure that such a setup can't lose any data?
> >>Yes. If you check the archives, you can even find the last time this was
> >>discussed...
> >I looked last night (coincidence actually) and didn't find proof that
> >you cannot lose data.
> You aren't going to find proof, any more than you'll find proof that you
> won't lose data if you do use a journalling fs. (Because there isn't
> any.) Unfortunately, many people misunderstand what a metadata
> journal does for you, and overstate its importance in this type of
> application.

Yes, many people do. :-)

> >How do you deal with the file system structure being updated before the
> >data blocks are (re-)written?
> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; if they didn't make it to
> disk, the transaction would not have been reported as committed. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.

No. This is not true. Updating the file system structure (inodes, indirect
blocks) touches a different part of the disk than the actual data. If
the file system structure is modified, say, to extend a file so that it
can hold more data, but the data itself is not written, then upon
recovery after a crash, with a system such as ext2, or ext3 with
writeback, or xfs, it is possible that the end of the file, even the
postgres log file, will contain a random block of data from the disk.
If this random block of data happens to look like a valid xlog block,
it may be played back, and the database corrupted.
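
The failure mode described above can be sketched with a toy model. This is
only an illustration under stated assumptions: the "disk", the "inode" list,
and the names make_disk, extend_file and read_file are hypothetical and do
not correspond to ext2, ext3, xfs, or any PostgreSQL code. It shows a file
being extended in metadata while the data block write is lost in a crash, so
that the file ends in whatever the block held before.

```python
# Toy model (not a real file system): a "disk" of fixed-size blocks plus
# an "inode" that lists which blocks belong to a file, in order.
BLOCK_SIZE = 16


def make_disk(num_blocks):
    """Return a disk whose blocks are all zero-filled."""
    return [bytes(BLOCK_SIZE) for _ in range(num_blocks)]


def read_file(disk, inode):
    """Read the file by following the block list in the inode."""
    return b"".join(disk[block] for block in inode)


def extend_file(disk, inode, free_block, data, crash_before_data=False):
    """Append one block to the file: metadata first, then data."""
    inode.append(free_block)  # the metadata update reaches the disk
    if crash_before_data:
        return  # simulated power loss: the data block is never written
    disk[free_block] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]


disk = make_disk(4)
# Block 3 still holds the contents of a file that was deleted earlier.
disk[3] = b"OLD XLOG RECORD!"

inode = []
extend_file(disk, inode, 0, b"new record 1")
extend_file(disk, inode, 3, b"new record 2", crash_before_data=True)

contents = read_file(disk, inode)
print(contents)
```

After the simulated crash the file is two blocks long according to its
metadata, but the second block holds the old contents of block 3 rather than
the record that was being appended.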

If the file system is used only for xlog data, the chance that such a
leftover block looks like a valid xlog block increases, does it not?
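
To make the question concrete, here is a minimal sketch of why a leftover
block from an older log is a different case from a damaged one. The record
layout (an 8-byte sequence number and a CRC32 in front of the payload) and
the names make_record, crc_ok and replayable are assumptions made up for the
illustration; this is not PostgreSQL's actual xlog format, and the sketch
says nothing about which checks PostgreSQL itself performs.

```python
import struct
import zlib

# Hypothetical header: 8-byte sequence number followed by a 4-byte CRC32.
HEADER = struct.Struct("<QI")


def make_record(seq, payload):
    """Build a record whose CRC covers the sequence number and payload."""
    crc = zlib.crc32(struct.pack("<Q", seq) + payload)
    return HEADER.pack(seq, crc) + payload


def crc_ok(record):
    """Return True if the stored CRC matches the record contents."""
    seq, crc = HEADER.unpack_from(record)
    payload = record[HEADER.size:]
    return zlib.crc32(struct.pack("<Q", seq) + payload) == crc


def replayable(record, expected_seq):
    """Stricter check: valid CRC and the sequence number we expect next."""
    seq, _ = HEADER.unpack_from(record)
    return crc_ok(record) and seq == expected_seq


# A record left over from an older log: internally consistent.
stale = make_record(seq=7, payload=b"UPDATE from an old log")
# The same bytes with one payload bit flipped: a damaged block.
damaged = stale[:-1] + bytes([stale[-1] ^ 0x01])

print(crc_ok(damaged))                      # checksum catches the damage
print(crc_ok(stale))                        # checksum alone accepts stale data
print(replayable(stale, expected_seq=42))   # position check rejects it
```

In this sketch a checksum alone rejects the damaged block but accepts the
stale record, because the stale record was once written correctly. Only a
check that ties the record to its expected position in the log rejects it.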

> >>The bottom line is that the only reason you need a metadata journalling
> >>filesystem is to save the fsck time when you come up. On a little
> >>partition like xlog, that's not an issue.
> >fsck isn't only about time to fix. fsck is needed, because the file system
> >is broken.
> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.

This is also wrong. fsck is needed because the file system is broken.

It takes time, because it doesn't have a journal to help it, therefore it
must look through the entire file system and guess what the problems are.
There are classes of problems, such as the one I describe above, for which
fsck *cannot* guess how to solve the problem. There is not enough
information available for it to deduce that anything is wrong at all.

The probability is low, for sure - but then, the chance of a file system
failure is already low.

Nobody has yet confirmed to me that betting on ext2 + postgresql xlog is
reliable.

Telling me that journalling is misunderstood doesn't prove to me that you
understand it.

I don't mean to be offensive, but I won't accept what you say, as it does
not make sense with my understanding of how file systems work. :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/

