Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id 200507071527.j67FRgx10146@candle.pha.pa.us
In response to Re: Checkpoint cost, looks like it is WAL/CRC  ("Zeugswetter Andreas DAZ SD" <ZeugswetterA@spardat.at>)
List pgsql-hackers
Zeugswetter Andreas DAZ SD wrote:
> 
> >> Are you sure about that? That would probably be the normal case, but
> >> are you promised that the hardware will write all of the sectors of a
> >> block in order?
> > 
> > I don't think you can possibly assume that.  If the block 
> > crosses a cylinder boundary then it's certainly an unsafe 
> > assumption, and even within a cylinder (no seek required) I'm 
> > pretty sure that disk drives have understood "write the next 
> > sector that passes under the heads" for decades.
> 
> A lot of hardware exists that guards against partial writes
> of single I/O requests (a persistent write cache for an HP RAID
> controller for Intel servers costs ~$500 extra).
> 
> But, the OS usually has 4k (some 8k) filesystem buffer size,
> and since we do not use direct io for datafiles, the OS might decide 
> to schedule two 4k writes differently for one 8k page.
> 
> If you do not build pg to match your fs buffer size you cannot
> guard against partial writes with hardware :-(
> 
> We could alleviate that problem with direct io for datafiles.

Now that is an interesting analysis.  I thought people who used
battery-backed drive cache wouldn't have partial page write problems, but
I now see it is certainly possible.
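
To make the failure mode concrete, here is a rough Python sketch of what Andreas describes: one 8k page handed to the OS as two 4k filesystem buffers, with a crash after only the first one reaches disk. The names (flush_page, is_torn) and the all-"A"/all-"B" page contents are purely illustrative assumptions, not anything in the PostgreSQL source, and zlib's CRC-32 merely stands in for whatever checksum is actually used.

```python
import zlib

PAGE_SIZE = 8192   # PostgreSQL's default page size
FS_BLOCK = 4096    # assumed OS filesystem buffer size (some systems use 8k)

def flush_page(disk, new_page, blocks_flushed):
    """Copy new_page over disk one filesystem block at a time, stopping
    after blocks_flushed blocks to mimic a crash part-way through."""
    for i in range(blocks_flushed):
        start = i * FS_BLOCK
        disk[start:start + FS_BLOCK] = new_page[start:start + FS_BLOCK]

def is_torn(disk, old_page, new_page):
    """Treat the page as torn if its CRC matches neither the old image
    nor the new image (hypothetical check, for illustration only)."""
    crc = zlib.crc32(bytes(disk))
    return crc not in (zlib.crc32(old_page), zlib.crc32(new_page))

old_page = b"A" * PAGE_SIZE
new_page = b"B" * PAGE_SIZE

# Crash after the first 4k buffer is written: half new, half old.
disk = bytearray(old_page)
flush_page(disk, new_page, blocks_flushed=1)
torn = is_torn(disk, old_page, new_page)

# Both 4k buffers written: the page is the complete new image.
intact = bytearray(old_page)
flush_page(intact, new_page, blocks_flushed=PAGE_SIZE // FS_BLOCK)
```

In this toy model the controller's write cache never gets a chance to help, because each 4k request is itself written completely; the tear happens between the two requests, above the hardware.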

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

