Re: Postgresql Performance on an HP DL385 and - Mailing list pgsql-performance

From: Jim C. Nasby
Subject: Re: Postgresql Performance on an HP DL385 and
Msg-id: 20060815191505.GH21363@pervasive.com
In response to: Re: Postgresql Performance on an HP DL385 and (Michael Stone <mstone+postgres@mathom.us>)
List: pgsql-performance

On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 02:33:27PM -0400, mark@mark.mielke.cc wrote:
> >On Tue, Aug 15, 2006 at 01:26:46PM -0400, Michael Stone wrote:
> >>On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote:
> >>>Are 'we' sure that such a setup can't lose any data?
> >>Yes. If you check the archives, you can even find the last time this was
> >>discussed...
> >
> >I looked last night (coincidence actually) and didn't find proof that
> >you cannot lose data.
>
> You aren't going to find proof, any more than you'll find proof that you
> won't lose data if you do use a journalling fs. (Because there isn't
> any.) Unfortunately, many people misunderstand what a metadata
> journal does for you, and overstate its importance in this type of
> application.
>
> >How do you deal with the file system structure being updated before the
> >data blocks are (re-)written?
>
> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; and if they didn't make it to
> disk, the transaction would never have been reported as committed. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.
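
The guarantee Mike describes can be sketched in a few lines of Python. This is a hypothetical illustration, not PostgreSQL's actual WAL code: the point is only that a commit is acknowledged strictly after the log record has been fsync'd.

```python
import os
import tempfile

def commit(wal_path: str, record: bytes) -> bool:
    """Append one log record; report success only after fsync.

    Hypothetical sketch -- 'wal_path' and 'record' are illustrative
    names, not PostgreSQL internals.
    """
    fd = os.open(wal_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)  # force the record to stable storage
    finally:
        os.close(fd)
    # Only now may the client be told "committed". If a crash hits
    # before the fsync completes, replay skips the record -- but the
    # client was never told it succeeded, so no acknowledged commit
    # is lost.
    return True

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "xlog")
    assert commit(path, b"BEGIN;INSERT;COMMIT\n")
```

The ordering (durable first, acknowledged second) is the whole contract; no metadata journal is involved.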

So what causes files to get 'lost' and get stuck in lost+found?
AFAIK that's because the file was written before the metadata. Now, if
fsync'ing a file also ensures that all the metadata is written, then
we're probably fine... if not, then we're at risk every time we create a
new file (every WAL segment if archiving is on, and every time a
relation passes a 1GB boundary).
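
That risk is exactly why a careful application fsyncs the parent directory after creating a file: on Linux, fsync() of the file itself does not guarantee the new directory entry is durable. A minimal Python sketch (the helper name is hypothetical; fsync on a directory fd behaves this way on Linux, though POSIX leaves it unspecified):

```python
import os
import tempfile

def create_durably(path: str, data: bytes) -> None:
    """Create a file and make both its contents and its directory
    entry durable. Without the directory fsync, a crash can leave the
    file's blocks allocated but the name unreachable -- the sort of
    thing fsck drops into lost+found.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # data blocks + inode
    finally:
        os.close(fd)
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)  # the directory entry -- i.e. the metadata
    finally:
        os.close(dirfd)

with tempfile.TemporaryDirectory() as d:
    create_durably(os.path.join(d, "000000010000000000000001"), b"segment")
```

So a new WAL segment or relation file is only as safe as the directory entry that names it.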

FWIW, the way that FreeBSD avoids the need to fsck a dirty filesystem
before use, without a journal, is "soft updates": metadata writes are
ordered so that the on-disk structures are always consistent enough to
use. There's still a need to fsck a dirty filesystem to reclaim leaked
blocks, but it can now be done in the background, with the filesystem
mounted and in use.

> >>The bottom line is that the only reason you need a metadata journalling
> >>filesystem is to save the fsck time when you come up. On a little
> >>partition like xlog, that's not an issue.
> >
> >fsck isn't only about time to fix. fsck is needed, because the file system
> >is broken.
>
> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.
>
> Mike Stone
>
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
