Re: Postgresql Performance on an HP DL385 and - Mailing list pgsql-performance
| From | mark@mark.mielke.cc |
|---|---|
| Subject | Re: Postgresql Performance on an HP DL385 and |
| Date | |
| Msg-id | 20060815193951.GA13695@mark.mielke.cc Whole thread Raw |
| In response to | Re: Postgresql Performance on an HP DL385 and (Michael Stone <mstone+postgres@mathom.us>) |
| Responses |
Re: Postgresql Performance on an HP DL385 and
|
| List | pgsql-performance |
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 02:33:27PM -0400, mark@mark.mielke.cc wrote:
> >>>Are 'we' sure that such a setup can't lose any data?
> >>Yes. If you check the archives, you can even find the last time this was
> >>discussed...
> >I looked last night (coincidence actually) and didn't find proof that
> >you cannot lose data.
> You aren't going to find proof, any more than you'll find proof that you
> won't lose data if you do lose a journalling fs. (Because there isn't
> any.) Unfortunately, many people misunderstand the what a metadata
> journal does for you, and overstate its importance in this type of
> application.
Yes, many people do. :-)
> >How do you deal with the file system structure being updated before the
> >data blocks are (re-)written?
> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; if they didn't make it to
> disk, the transaction would not have been reported as commited. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.
No. This is not true. Updating the file system structure (inodes, indirect
blocks) touches a separate part of the disk than the actual data. If
the file system structure is modified, say, to extend a file to allow
it to contain more data, but the data itself is not written, then upon
a restore, with a system such as ext2, or ext3 with writeback, or xfs,
it is possible that the end of the file, even the postgres log file,
will contain a random block of data from the disk. If this random block
of data happens to look like a valid xlog block, it may be played back,
and the database corrupted.
If the file system is only used for xlog data, the chance that it looks
like a valid block increases, would it not?
> >>The bottom line is that the only reason you need a metadata journalling
> >>filesystem is to save the fsck time when you come up. On a little
> >>partition like xlog, that's not an issue.
> >fsck isn't only about time to fix. fsck is needed, because the file system
> >is broken.
> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.
This is also wrong. fsck is needed because the file system is broken.
It takes time, because it doesn't have a journal to help it, therefore it
must look through the entire file system and guess what the problems are.
There are classes of problems such as I describe above, for which fsck
*cannot* guess how to solve the problem. There is not enough information
available for it to deduce that anything is wrong at all.
The probability is low, for sure - but then, the chance of a file system
failure is already low.
Betting on ext2 + postgresql xlog has not been confirmed to me as reliable.
Telling me that journalling is misunderstood doesn't prove to me that you
understand it.
I don't mean to be offensive, but I won't accept what you say, as it does
not make sense with my understanding of how file systems work. :-)
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
pgsql-performance by date: