Re: How much do the hint bits help? - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: How much do the hint bits help?
Msg-id 1293031905.1193.28292.camel@ebony
In response to Re: How much do the hint bits help?  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: How much do the hint bits help?
Re: How much do the hint bits help?
List pgsql-hackers
On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
> On 22.12.2010 16:52, Simon Riggs wrote:
> > On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:
> >> On 22.12.2010 15:59, Simon Riggs wrote:
> >>> On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:
> >>>> My gut feeling is that a reasonable compromise is to set hint bits like
> >>>> we do today, but don't mark the page as dirty when only hint bits are
> >>>> set. That way you get the benefit of hint bits for tuples that are
> >>>> frequently accessed and stay in buffer cache. But you don't spend any
> >>>> extra I/O to set them. I'd really like to see a worst-case scenario
> >>>> benchmark of a patch that does that.
> >>>
> >>> That sounds great, but still prevents block checksums and that is a very
> >>> valuable feature for robustness.
> >>
> >> It does? The problem with block checksums is that if you modify a page
> >> and don't have a corresponding WAL record for it, like a hint bit
> >> update, you can have a torn page so that the checksum doesn't match.
> >> Refraining from dirtying the page when a hint bit is updated avoids the
> >> problem. With that change, we only ever write pages to disk that have a
> >> WAL record associated with it, with full-page images as necessary to
> >> avoid torn pages.
> >
> > Which then leads to a block CRC not matching the block in memory.
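
The torn-page problem described above can be sketched as follows. This is a minimal illustration in Python, not PostgreSQL code. It assumes an 8 KB page, 512-byte sectors that are each written atomically, and zlib.crc32 standing in for the block checksum. The checksum location (first 4 bytes) and the hint-bit offset and mask are hypothetical. A hint bit is set and the checksum recomputed, but the write is cut short after the first sector, so the on-disk page carries the new checksum over the old contents. Because the hint-bit change had no WAL record, there is no full-page image to restore the page from.

```python
import zlib

PAGE_SIZE = 8192    # default PostgreSQL block size
SECTOR_SIZE = 512   # assumed atomic write unit of the disk
HINT_BYTE = 5000    # hypothetical offset of a tuple's hint-bit byte
HINT_MASK = 0x01    # hypothetical hint-bit flag


def page_checksum(page) -> int:
    """CRC-32 over the page, treating the 4-byte checksum field as zero."""
    return zlib.crc32(bytes(4) + bytes(page[4:]))


def stamp_checksum(page: bytearray) -> None:
    """Store the checksum in the (hypothetical) first 4 bytes of the page."""
    page[0:4] = page_checksum(page).to_bytes(4, "little")


def verify(page) -> bool:
    return int.from_bytes(page[0:4], "little") == page_checksum(page)


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """What is on disk if we crash after only the first sectors of `new` land."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


# Page as it currently sits on disk, with a valid checksum.
old = bytearray(PAGE_SIZE)
old[100:105] = b"tuple"
stamp_checksum(old)

# In the buffer cache: set a hint bit, then checksum the page for write-out.
new = bytearray(old)
new[HINT_BYTE] |= HINT_MASK
stamp_checksum(new)

# Crash after sector 0 (which holds the new checksum) is written, but before
# the sector holding the hint bit is.
on_disk = torn_write(bytes(old), bytes(new), sectors_written=1)

print(verify(old), verify(new), verify(on_disk))  # prints: True True False
```

Writing all 16 sectors, or restoring a full-page image from WAL after the crash, gives a page that verifies again; a hint-bit-only change has neither safety net.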

> Do you envision that the CRC is calculated at every update, or only when 
> a page is written out from the buffer cache? 

At every update, so there is a clear assertion that the CRC matches the
block.

> If the former, you could 
> recalculate the CRC at a hint bit update too. If the latter, the hint 
> bits are included in the page image that you checksum just like any 
> other data.

If we didn't have hint bits, we wouldn't need to recalculate the CRC
each time one was updated...
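
A minimal sketch of the "CRC at every update" model, in Python for illustration only: zlib.crc32 stands in for the block checksum, and the class name, offsets and mask are hypothetical. Because the CRC is recomputed on every change, verify() holds at any moment, so the page can be checked while it sits in the buffer cache. The price is that every hint-bit update costs a full-page CRC, just like a WAL-logged change.

```python
import zlib

PAGE_SIZE = 8192


class CheckedPage:
    """A buffer whose CRC is kept current on every modification, so that
    verify() is a valid assertion at any time, not only at write-out."""

    def __init__(self) -> None:
        self.data = bytearray(PAGE_SIZE)
        self.crc = zlib.crc32(self.data)
        self.crc_recalcs = 0

    def _recompute(self) -> None:
        # Full-page CRC: the cost scales with PAGE_SIZE, however small the change.
        self.crc = zlib.crc32(self.data)
        self.crc_recalcs += 1

    def write(self, offset: int, payload: bytes) -> None:
        """A WAL-logged change such as an insert, update or delete."""
        self.data[offset:offset + len(payload)] = payload
        self._recompute()

    def set_hint_bit(self, offset: int, mask: int) -> None:
        """Not WAL-logged, but it still changes the page bytes, so the CRC
        must be recomputed for verify() to stay true."""
        if self.data[offset] & mask:
            return  # already set: the page is unchanged, no CRC work needed
        self.data[offset] |= mask
        self._recompute()

    def verify(self) -> bool:
        return zlib.crc32(self.data) == self.crc


page = CheckedPage()
page.write(100, b"tuple-1")
page.write(200, b"tuple-2")
# Later readers set a hint bit on each tuple (hypothetical offsets and mask).
page.set_hint_bit(96, 0x01)
page.set_hint_bit(196, 0x01)
page.set_hint_bit(96, 0x01)  # second visit: bit already set, no-op
print(page.crc_recalcs, page.verify())  # prints: 4 True
```

Here half of the four full-page CRC computations are caused by hint bits rather than by WAL-logged changes.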

> > So what you suggest works only if we restrict CRC checking to blocks
> > incoming to the buffer cache, but leaves us unable to do CRC checks on
> > blocks once in the buffer cache. Since many blocks stay in cache almost
> > constantly, we're left with the situation that the most heavily used
> > parts of the database seldom get CRC checked.
> 
> There's plenty of stuff in memory that's not covered by an 
> application-level CRC. That's what ECC RAM is for. 

http://www.google.com/research/pubs/archive/35162.pdf

Google's research shows that over 8% of DIMMs see memory errors each
year, and some of those errors are uncorrectable even with ECC.

With the large amounts of RAM that are now common, the incidence of this
type of error is much higher than it was in previous years, so our
perception of what is needed to protect databases is out of date.

We have data under our care, and the amount of RAM we use makes us much
more likely to encounter this kind of error.

> Updating the CRC at 
> every update to a page seems really expensive, but it's an orthogonal 
> issue to hint bits.

Clearly, the frequency with which we set hint bits affects how often
we can sensibly update CRCs. It shouldn't be up to us to decide how much
protection a user wants to give their data.

There might be two or three settings that make sense, but clearly we
need to be able to limit how often hint bits are set if we are to have
a usable CRC check. So there is a very strong connection between turning
this optimisation off and gaining CRC checking as a feature.
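
To make the connection concrete, here is a minimal sketch of the alternative discussed earlier in the thread: hint bits are set without dirtying the buffer, and the CRC is computed only when a dirty page is written out. It is Python for illustration, with zlib.crc32 standing in for the checksum and hypothetical names, offsets and masks. It shows why an in-memory check of such a page is not usable: a hint-bit update and a genuine RAM bit flip leave byte-for-byte identical pages, so the CRC check cannot tell them apart.

```python
import zlib

PAGE_SIZE = 8192


class Buffer:
    """Hint bits are set in memory but do not dirty the buffer;
    the CRC is computed only when a dirty page is written out."""

    def __init__(self, disk_image: bytes) -> None:
        self.data = bytearray(disk_image)
        self.crc = zlib.crc32(self.data)  # CRC of the image as read from disk
        self.dirty = False

    def wal_logged_change(self, offset: int, payload: bytes) -> None:
        self.data[offset:offset + len(payload)] = payload
        self.dirty = True  # covered by WAL, so safe to write out

    def set_hint_bit(self, offset: int, mask: int) -> None:
        self.data[offset] |= mask  # no dirty flag, no CRC update

    def evict(self):
        """Return the image written to disk, or None if the page is clean."""
        if not self.dirty:
            return None  # any hint bits set since read-in are discarded
        self.crc = zlib.crc32(self.data)  # hint bits included like any other data
        self.dirty = False
        return bytes(self.data)

    def verify_in_memory(self) -> bool:
        return zlib.crc32(self.data) == self.crc


# Case 1: a reader sets a hint bit on a clean cached page.
a = Buffer(bytes(PAGE_SIZE))
a.set_hint_bit(96, 0x01)  # hypothetical offset and mask

# Case 2: a RAM error flips the same bit in another clean cached page.
b = Buffer(bytes(PAGE_SIZE))
b.data[96] ^= 0x01

print(a.verify_in_memory(), b.verify_in_memory(), a.data == b.data)
# prints: False False True
print(a.evict())  # prints: None
```

Case 1 also shows the trade-off of that approach: a page whose only change is a hint bit is never written, so the hint bit is lost when the buffer is evicted.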

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

