On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On 12/29/11, Ants Aasma <ants.aasma@eesti.ee> wrote:
>> Unless I'm missing something, double-writes are needed for all writes,
>> not only the first page after a checkpoint. Consider this sequence of
>> events:
>>
>> 1. Checkpoint
>> 2. Double-write of page A (DW buffer write, sync, heap write)
>> 3. Sync of heap, releasing DW buffer for new writes.
>> ... some time goes by
>> 4. Regular write of page A
>> 5. OS writes one part of page A
>> 6. Crash!
>>
>> Now recovery comes along, page A is broken in the heap with no
>> double-write buffer backup nor anything to recover it by in the WAL.
>
> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
> really a regular write as it is the first one after a checkpoint?

I think you nailed it.

> But it doesn't seem safe to me replace a page from the DW buffer and
> then apply WAL to that replaced page which preceded the age of the
> page in the buffer.

That's what LSNs are for.

If we write the page to the double-write buffer just once per
checkpoint, recovery can restore the double-written versions of the
pages and then begin WAL replay, which will restore all the subsequent
changes made to the page. Recovery may also need to do additional
double-writes if it encounters pages for which we wrote WAL but
never flushed the buffer, because a crash during recovery can also
create torn pages. When we reach a restartpoint, we fsync everything
down to disk and then nuke the double-write buffer. Similarly, in
normal running, we can nuke the double-write buffer at checkpoint
time, once the fsyncs are complete.
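
To make the ordering concrete, here is a minimal Python sketch of the
recovery sequence described above: restore the double-written page
images first, then replay WAL, using each page's LSN to skip records
the restored image already contains. It is a toy model under stated
assumptions, not PostgreSQL code: the names Page, WalRecord and
recover are hypothetical, pages are modelled as small dictionaries,
and the extra double-writes that recovery itself may need are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    lsn: int = 0                      # LSN of the last change reflected in the page
    data: dict = field(default_factory=dict)
    torn: bool = False                # True if a crash interrupted the heap write


@dataclass
class WalRecord:
    lsn: int
    page_id: str
    key: str
    value: int


def recover(heap, dw_buffer, wal):
    """Restore double-written images, then replay WAL guarded by page LSNs.

    Returns the list of LSNs that were actually applied during replay.
    """
    # Step 1: overwrite heap pages with their intact double-written images.
    for page_id, image in dw_buffer.items():
        heap[page_id] = Page(lsn=image.lsn, data=dict(image.data))

    # Step 2: replay WAL in LSN order; skip records the page already contains.
    applied = []
    for rec in sorted(wal, key=lambda r: r.lsn):
        page = heap.setdefault(rec.page_id, Page())
        if rec.lsn <= page.lsn:
            continue                  # change is already in the restored image
        page.data[rec.key] = rec.value
        page.lsn = rec.lsn
        applied.append(rec.lsn)
    return applied


# WAL written since the last checkpoint (hypothetical LSNs).
wal = [
    WalRecord(101, "A", "x", 1),
    WalRecord(102, "A", "y", 2),
    WalRecord(103, "A", "x", 3),
]

# Page A was double-written once, at its first flush after the checkpoint,
# when it already contained the changes up to LSN 102.
dw_buffer = {"A": Page(lsn=102, data={"x": 1, "y": 2})}

# A later plain write of page A (carrying LSN 103) was torn by the crash.
heap = {"A": Page(lsn=103, data={"x": 3}, torn=True)}

applied = recover(heap, dw_buffer, wal)
print(applied, heap["A"])
```

In this run the restored image is newer than WAL records 101 and 102,
so the LSN comparison skips them and only record 103 is reapplied.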
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company