
Re: SCSI vs. IDE performance test - Mailing list pgsql-general

From	Allen Landsidel
Subject	Re: SCSI vs. IDE performance test
Date	October 27, 2003 19:40:38
Msg-id	6.0.0.22.0.20031027183334.0245bb08@pop.hotpop.com
In response to	Re: SCSI vs. IDE performance test (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: SCSI vs. IDE performance test
List	pgsql-general


Tom, this discussion brings up something that's been bugging me about the
recommendations for getting more performance out of PG, in particular the
one that suggests you put your WAL files on a different physical drive from
the database.

Consider the following scenario:
Database on drive1
WAL on drive2

1. PG write of some sort occurs.
2. PG writes out the WAL.
3. PG writes out the data.
4. PG updates the WAL to reflect data actually written.
5. System crashes/reboots/whatever.

With the DB and the WAL on different drives, it seems possible to me that
drive2 could've fsync()'d or otherwise properly written all of the data
out, but drive1 could have failed somewhere along the way and not actually
written the data to the DB.

The next time PG is brought up, the WAL would indicate that the transaction
was a success, but the data wouldn't actually be there.
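
The scenario above can be sketched as a toy model. This is only an
illustration of the sequence as described in this message, not of how
PostgreSQL actually performs recovery; the Drive class, run_scenario, and
the "intent"/"applied" record strings are hypothetical names made up for
the example.

```python
# Toy model of the five-step scenario; NOT PostgreSQL's real recovery logic.
# Drive and run_scenario are hypothetical names used only for illustration.

class Drive:
    """A drive with a volatile write cache and durable storage."""

    def __init__(self, name):
        self.name = name
        self.cache = {}    # written, but not yet on stable storage
        self.durable = {}  # contents that survive a crash

    def write(self, key, value):
        self.cache[key] = value

    def fsync(self):
        # Move everything in the cache to durable storage.
        self.durable.update(self.cache)
        self.cache.clear()

    def crash(self):
        # Anything still sitting in the cache is lost.
        self.cache.clear()


def run_scenario(data_drive_flushes):
    drive1 = Drive("database")
    drive2 = Drive("wal")

    # Step 2: write out the WAL record and flush it.
    drive2.write("txn1", "intent: row=42")
    drive2.fsync()

    # Step 3: write out the data; drive1 may or may not reach stable storage.
    drive1.write("row", 42)
    if data_drive_flushes:
        drive1.fsync()

    # Step 4: update the WAL to say the data was written.
    drive2.write("txn1", "applied: row=42")
    drive2.fsync()

    # Step 5: the system crashes.
    drive1.crash()
    drive2.crash()

    wal_says_applied = drive2.durable.get("txn1", "").startswith("applied")
    data_present = drive1.durable.get("row") == 42
    return wal_says_applied, data_present
```

In this model, run_scenario(False) returns (True, False), which is the
state I am worried about: the WAL claims success but the data is missing.
run_scenario(True) returns (True, True).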

In the case of using only one drive, the rollback (from a FS perspective)
couldn't possibly occur in such a way as to leave step 4 as a success but
step 3 as a failure. Worst case, the data would be written out but the WAL
wouldn't have been updated (rolled back, say, by the FS), and thus PG would
roll back the data itself, or use whatever mechanism it uses to ensure the
data is consistent with the WAL.

Am I smoking something here or is this a real, if rare in practice, risk
that occurs when you have the WAL on a different drive than the data is on?

At 17:39 10/27/2003, Tom Lane wrote:
>"Rick Gigger" <rick@alpinenetworking.com> writes:
> > It seems to me file system journaling should fix the whole problem by
> > giving you a record of what was actually committed to disk and what
> > was not.
>
>Nope, a journaling FS has exactly the same problem Postgres does
>(because the underlying "WAL" concept is the same: write the log entries
>before you change the files they describe).  If the drive lies about
>write order, the FS can be screwed just as badly.  Now the FS code might
>have a low-level way to force write order that Postgres doesn't have
>access to ... but simply uttering the magic incantation "journaling file
>system" will not make this problem disappear.
>
>                         regards, tom lane
