Re: Block-level CRC checks - Mailing list pgsql-hackers

From Joshua D. Drake
Subject Re: Block-level CRC checks
Msg-id 1259604977.26322.5.camel@jd-desktop.iso-8859-1.charter.com
In response to Re: Block-level CRC checks  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Mon, 2009-11-30 at 13:21 +0000, Simon Riggs wrote:
> On Fri, 2008-10-17 at 12:26 -0300, Alvaro Herrera wrote:
> > So this discussion died with no solution arising to the
> > hint-bit-setting-invalidates-the-CRC problem.
> > 
> > Apparently the only solution in sight is to WAL-log hint bits.  Simon
> > opines it would be horrible from a performance standpoint to WAL-log
> > every hint bit set, and I think we all agree with that.  So we need to
> > find an alternative mechanism to WAL log hint bits.
> 
> It occurred to me that maybe we don't need to WAL-log the CRC checks.
> 
> Proposal
> 
> * We reserve enough space on a disk block for a CRC check. When a dirty
> block is written to disk we calculate and annotate the CRC value, though
> this is *not* WAL logged.
> 
> * In normal running we re-check the CRC when we read the block back into
> shared_buffers.
> 
> * In recovery we will overwrite the last image of a block from WAL, so
> we ignore the block CRC check, since the WAL record was already CRC
> checked. If full_page_writes = off, we ignore and zero the block's CRC
> for any block touched during recovery. We do those things because the
> block CRC in the WAL is likely to be different to that on disk, due to
> hints.
> 
> * We also re-check the CRC on a block immediately before we dirty the
> block (for any reason). This minimises the possibility of in-memory data
> corruption for blocks.
> 
> So in the typical case all blocks moving from disk <-> memory and from
> clean -> dirty are CRC checked. So in the case where we have
> full_page_writes = on then we have a good CRC every time. In the
> full_page_writes = off case we are exposed only on the blocks that
> changed during last checkpoint cycle and only if we crash. That seems
> good because most databases are up 99% of the time, so any corruptions
> are likely to occur in normal running, not as a result of crashes.
> 
> This would be a run-time option.
> 
> Like it?
> 
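For readers following the thread, the write-time/read-time part of the proposal quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL code: the 8192-byte page size, the position of the reserved CRC field (CRC_OFFSET), the use of zlib's CRC-32, and the function names are all hypothetical. The in_recovery flag simply skips the check, standing in for the case where the block will be overwritten from a WAL image that was already CRC checked.

```python
import struct
import zlib

PAGE_SIZE = 8192   # assumed block size
CRC_OFFSET = 0     # hypothetical position of the reserved CRC field
CRC_LEN = 4        # a 32-bit CRC


def page_crc(page: bytes) -> int:
    """CRC-32 of the page, treating the reserved CRC field as zero."""
    body = page[:CRC_OFFSET] + b"\x00" * CRC_LEN + page[CRC_OFFSET + CRC_LEN:]
    return zlib.crc32(body) & 0xFFFFFFFF


def stamp_for_write(page: bytes) -> bytes:
    """Annotate a dirty page with its CRC just before writing it out.

    Nothing is WAL-logged here; the CRC is recomputed from whatever the
    page contains at write time, hint bits included.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("unexpected page size")
    crc = struct.pack("<I", page_crc(page))
    return page[:CRC_OFFSET] + crc + page[CRC_OFFSET + CRC_LEN:]


def verify_on_read(page: bytes, in_recovery: bool = False) -> bool:
    """Re-check the stored CRC when a page is read back into memory."""
    if in_recovery:
        # Simplification: the on-disk CRC is ignored during recovery.
        return True
    (stored,) = struct.unpack_from("<I", page, CRC_OFFSET)
    return stored == page_crc(page)
```

Because the CRC is computed only when the block is written, a hint bit set in memory after the read does not invalidate anything: the next write simply stamps a fresh value.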

Just FYI, Alvaro is out of town and almost entirely without email
access. It may take him another week or so to get back to this.

Joshua D. Drake



> -- 
>  Simon Riggs           www.2ndQuadrant.com
> 
> 


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander


