Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id 87ackytyyd.fsf@stark.xeocode.com
In response to Re: Checkpoint cost, looks like it is WAL/CRC  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Checkpoint cost, looks like it is WAL/CRC  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Zeugswetter Andreas DAZ SD" <ZeugswetterA@spardat.at> writes:
> > Only workable solution would imho be to write the LSN to each 512
> > byte block (not that I am propagating that idea). 
> 
> We're not doing anything like that, as it would create an impossible
> space-management problem (or are you happy with limiting tuples to
> 500 bytes?).  What we *could* do is calculate a page-level CRC and
> store it in the page header just before writing out.  Torn pages
> would then manifest as a wrong CRC on read.  No correction ability,
> but at least a reliable detection ability.

On read, at the same time as you check the CRC you can copy the bytes to a
fresh page, skipping the LSNs. Likewise, when writing out the page you have to
calculate the CRC anyway; in that same pass you can copy the bytes to a
temporary buffer, adding the LSNs, and write that buffer to disk.
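
A rough sketch of that single pass, in Python for brevity. Nothing here is
PostgreSQL code: the 512-byte sector, the 8-byte LSN stamp at the start of each
sector, the 16-sector page, the use of zlib's CRC-32, and the names
encode_page and decode_page are all assumptions made for illustration. Where
the CRC is stored (for instance in the page header) is left to the caller.

```python
import struct
import zlib

SECTOR_SIZE = 512                     # assumed disk sector size
LSN_SIZE = 8                          # assumed width of the per-sector LSN stamp
CHUNK_SIZE = SECTOR_SIZE - LSN_SIZE   # payload bytes carried by each sector
SECTORS_PER_PAGE = 16                 # assumed: 16 * 512 = 8192-byte disk image
PAYLOAD_SIZE = SECTORS_PER_PAGE * CHUNK_SIZE
IMAGE_SIZE = SECTORS_PER_PAGE * SECTOR_SIZE


def encode_page(payload, lsn):
    """Write path: compute the CRC and interleave LSN stamps in one loop.

    Returns (disk_image, crc); the CRC covers the payload bytes only.
    """
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly PAYLOAD_SIZE bytes")
    stamp = struct.pack("<Q", lsn)
    image = bytearray(IMAGE_SIZE)
    crc = 0
    for i in range(SECTORS_PER_PAGE):
        chunk = payload[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        crc = zlib.crc32(chunk, crc)          # running CRC over the payload
        start = i * SECTOR_SIZE
        image[start:start + LSN_SIZE] = stamp
        image[start + LSN_SIZE:start + SECTOR_SIZE] = chunk
    return bytes(image), crc


def decode_page(image):
    """Read path: strip the LSN stamps and recompute the CRC in one loop.

    Returns (payload, lsns_seen, crc). More than one distinct LSN, or a CRC
    that differs from the stored one, suggests a torn page.
    """
    if len(image) != IMAGE_SIZE:
        raise ValueError("image must be exactly IMAGE_SIZE bytes")
    payload = bytearray(PAYLOAD_SIZE)
    lsns_seen = set()
    crc = 0
    for i in range(SECTORS_PER_PAGE):
        start = i * SECTOR_SIZE
        (lsn,) = struct.unpack_from("<Q", image, start)
        lsns_seen.add(lsn)
        chunk = image[start + LSN_SIZE:start + SECTOR_SIZE]
        crc = zlib.crc32(chunk, crc)
        payload[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] = chunk
    return bytes(payload), lsns_seen, crc
```

Under these assumptions the in-memory page holds only the 8064 payload bytes,
so tuple layout never sees the stamps; the cost is that the in-memory page and
the on-disk image no longer have the same size.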

This would be "zero-copy", in the sense of costing no extra pass over the
data, if you're already scanning the bytes to calculate the CRC, since you can
add and remove LSNs at the same time. It does require an extra buffer to store
the page in before writing, and that entails some amount of cache thrashing.
But maybe you could reuse the same buffer over and over again for every
read/write.
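
The buffer reuse could look something like the following sketch for the write
side. As before, this is illustrative Python rather than backend code, and the
sector size, stamp width, page size, CRC-32 from zlib, and the name
stamp_into_scratch are all assumptions. The scratch buffer is allocated once;
each call overwrites it in place.

```python
import struct
import zlib

SECTOR_SIZE = 512                     # assumed disk sector size
LSN_SIZE = 8                          # assumed width of the per-sector LSN stamp
CHUNK_SIZE = SECTOR_SIZE - LSN_SIZE   # payload bytes carried by each sector
SECTORS_PER_PAGE = 16                 # assumed: 8192-byte disk image
PAYLOAD_SIZE = SECTORS_PER_PAGE * CHUNK_SIZE

# One scratch buffer, allocated once and reused for every page written.
_scratch = bytearray(SECTORS_PER_PAGE * SECTOR_SIZE)


def stamp_into_scratch(payload, lsn):
    """Fill the shared scratch buffer with the stamped image of `payload`.

    Returns (view_of_scratch, crc). The view is only valid until the next
    call, which overwrites the same memory.
    """
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must be exactly PAYLOAD_SIZE bytes")
    src = memoryview(payload)         # slicing a memoryview does not copy
    dst = memoryview(_scratch)
    crc = 0
    for i in range(SECTORS_PER_PAGE):
        chunk = src[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        crc = zlib.crc32(chunk, crc)  # running CRC over the payload
        start = i * SECTOR_SIZE
        struct.pack_into("<Q", _scratch, start, lsn)
        dst[start + LSN_SIZE:start + SECTOR_SIZE] = chunk
    return dst, crc
```

The caller has to finish writing the returned image to disk before stamping
the next page, since the next call reuses the same bytes.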

-- 
greg


