Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Zeugswetter Andreas DAZ SD
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id E1539E0ED7043848906A8FF995BDA57945BA2A@m0143.s-mxs.net
In response to Checkpoint cost, looks like it is WAL/CRC  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Checkpoint cost, looks like it is WAL/CRC
List pgsql-hackers
>> Are you sure about that? That would probably be the normal case, but
>> are you promised that the hardware will write all of the sectors of a
>> block in order?
>
> I don't think you can possibly assume that.  If the block
> crosses a cylinder boundary then it's certainly an unsafe
> assumption, and even within a cylinder (no seek required) I'm
> pretty sure that disk drives have understood "write the next
> sector that passes under the heads" for decades.

A lot of hardware exists that guards against partial writes
of a single I/O request (a persistent write cache for an HP RAID
controller for Intel servers costs about $500 extra).

But the OS usually has a 4k (sometimes 8k) filesystem buffer size,
and since we do not use direct I/O for datafiles, the OS might
schedule the two 4k writes that make up one 8k page separately.

If you do not build pg with a block size that matches your fs buffer
size, you cannot guard against partial writes with hardware :-(
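
To make the failure mode concrete, here is a rough sketch (a toy
simulation, not PostgreSQL code). It assumes an 8k page and a 4k fs
buffer size, and models a crash after the OS has flushed only some of
the fs blocks belonging to one page. The names split_into_fs_blocks and
simulate_crash are made up for illustration.

```python
PAGE_SIZE = 8192   # assumed pg block size (8k)
FS_BLOCK = 4096    # assumed filesystem buffer size (4k)


def split_into_fs_blocks(page, fs_block=FS_BLOCK):
    """Split one page into the fs-buffer-sized pieces the OS schedules."""
    return [page[i:i + fs_block] for i in range(0, len(page), fs_block)]


def simulate_crash(old_page, new_page, blocks_written, fs_block=FS_BLOCK):
    """Return the on-disk image if only the fs blocks whose indexes are in
    blocks_written reached the disk before the crash."""
    old_blocks = split_into_fs_blocks(old_page, fs_block)
    new_blocks = split_into_fs_blocks(new_page, fs_block)
    return b"".join(
        new_blocks[i] if i in blocks_written else old_blocks[i]
        for i in range(len(old_blocks))
    )


old_page = bytes([0xAA]) * PAGE_SIZE
new_page = bytes([0xBB]) * PAGE_SIZE

# The OS happened to flush the second 4k half first, then the machine died.
torn = simulate_crash(old_page, new_page, {1})
is_torn = torn != old_page and torn != new_page
print("torn page:", is_torn)

# With the fs buffer size equal to the page size there is only one unit to
# write, so in this model the page is either entirely old or entirely new.
whole = simulate_crash(old_page, new_page, {0}, fs_block=PAGE_SIZE)
print("matches new page:", whole == new_page)
```

In this model, hardware that makes each single I/O request atomic only
helps in the second case, where the whole page is one request.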

We could alleviate that problem with direct I/O for datafiles.

Andreas

