Re: SCSI vs. IDE performance test - Mailing list pgsql-general

From Allen Landsidel
Subject Re: SCSI vs. IDE performance test
Date
Msg-id 6.0.0.22.0.20031027183334.0245bb08@pop.hotpop.com
In response to Re: SCSI vs. IDE performance test  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom, this discussion brings up something that's been bugging me about the
recommendations for getting more performance out of PG, in particular the
one that suggests you put your WAL files on a different physical drive from
the database.

Consider the following scenario:
Database on drive1
WAL on drive2

1. PG write of some sort occurs.
2. PG writes out the WAL.
3. PG writes out the data.
4. PG updates the WAL to reflect data actually written.
5. System crashes/reboots/whatever.

With the DB and the WAL on different drives, it seems possible to me that
drive2 could've fsync()'d or otherwise properly written all of the data
out, but drive1 could have failed somewhere along the way and not actually
written the data to the DB.

The next time PG is brought up, the WAL would indicate that the transaction
was a success, but the data wouldn't actually be there.

In the case of using only one drive, the rollback (from a FS perspective)
couldn't possibly occur in such a way as to leave step 4 as a success but
step 3 as a failure -- worst case, the data would be written out but the
WAL wouldn't have been updated (rolled back, say, by the FS), and thus PG
would roll back the data itself, or use whatever mechanism it has to ensure
the data is consistent with the WAL.

Am I smoking something here or is this a real, if rare in practice, risk
that occurs when you have the WAL on a different drive than the data is on?
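
The scenario in steps 1-5 above can be sketched as a toy model. This is only
an illustration of the general write-ahead idea (log first, data second),
not PostgreSQL's actual implementation. The file names, the JSON record
format, and the helpers `update`, `recover` and `demo` are all hypothetical.
The sketch also assumes that recovery replays logged records onto the data
file, which is how write-ahead logs are generally used but is not something
stated in this message, and it leaves out step 4 entirely.

```python
import json
import os
import tempfile


def durable_append(path, record):
    """Append one JSON line, then fsync so the OS hands it to the drive."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def write_data(path, table):
    """Rewrite the toy 'database' file and fsync it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f)
        f.flush()
        os.fsync(f.fileno())


def read_data(path):
    """Load the toy 'database'; an absent file means an empty table."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def update(wal_path, data_path, key, value, crash_before_data=False):
    # Step 2: the log record is written (to "drive2") first.
    durable_append(wal_path, {"key": key, "value": value})
    if crash_before_data:
        return  # Step 5 arrives early: the write to "drive1" never happens.
    # Step 3: only now is the data file changed.
    table = read_data(data_path)
    table[key] = value
    write_data(data_path, table)


def recover(wal_path, data_path):
    """Replay every logged record onto the data file (safe to repeat)."""
    table = read_data(data_path)
    if os.path.exists(wal_path):
        with open(wal_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                table[record["key"]] = record["value"]
    write_data(data_path, table)
    return table


def demo():
    with tempfile.TemporaryDirectory() as d:
        wal = os.path.join(d, "drive2_wal.log")     # stands in for drive2
        data = os.path.join(d, "drive1_data.json")  # stands in for drive1
        update(wal, data, "a", 1)
        update(wal, data, "b", 2, crash_before_data=True)
        before = read_data(data)    # state of "drive1" right after the crash
        after = recover(wal, data)  # state after replaying the log
        return before, after
```

Calling `demo()` returns the data file's contents just after the simulated
crash and again after replay, so the two can be compared. Note that
`os.fsync` only asks the OS to push data to the device; a drive that lies
about completed writes, the case Tom describes below, is outside what this
sketch can model.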


At 17:39 10/27/2003, Tom Lane wrote:
>"Rick Gigger" <rick@alpinenetworking.com> writes:
> > It seems to me file system journaling should fix the whole problem by
> > giving you a record of what was actually committed to disk and what
> > was not.
>
>Nope, a journaling FS has exactly the same problem Postgres does
>(because the underlying "WAL" concept is the same: write the log entries
>before you change the files they describe).  If the drive lies about
>write order, the FS can be screwed just as badly.  Now the FS code might
>have a low-level way to force write order that Postgres doesn't have
>access to ... but simply uttering the magic incantation "journaling file
>system" will not make this problem disappear.
>
>                         regards, tom lane

