Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: corrupt pages detected by enabling checksums
Date	May 9, 2013 00:34:35
Msg-id	1368059668.20500.31.camel@sussancws0025
In response to	Re: corrupt pages detected by enabling checksums (Jim Nasby <jim@nasby.net>)
Responses	Re: corrupt pages detected by enabling checksums
List	pgsql-hackers

On Wed, 2013-05-08 at 17:56 -0500, Jim Nasby wrote:
> Apologies if this is a stupid question, but is this mostly an issue
> due to torn pages? IOW, if we had a way to ensure we never see torn
> pages, would that mean an invalid CRC on a WAL page indicated there
> really was corruption on that page?
> 
> Maybe it's worth putting (yet more) thought into the torn page
> issue... :/

Sort of. For data, a page is the logically-atomic unit that is expected
to be intact. For WAL, a record is the logically-atomic unit that is
expected to be intact.

So it might be better to say that the issue for the WAL is "torn
records". A record might be larger than a page (it can hold up to three
full-page images in one record), but is often much smaller.

We use a CRC to validate that the WAL record is fully intact. The
concern is that, if it fails the CRC check, we *assume* that it's
because it wasn't completely flushed yet (i.e. a "torn record"). Based
on that assumption, neither that record nor any later record contains
committed transactions, so we can safely consider that the end of the
WAL (as of the crash) and bring the system up.

The problem is that the assumption is not always true: a CRC failure
could also indicate real corruption of WAL records that have been
previously flushed successfully, and may contain committed transactions.
That can mean we end recovery and bring the system up way too early,
discarding committed transactions and corrupting the database.
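
To make the ambiguity concrete, here is a minimal sketch in Python. It is
a toy log format, not PostgreSQL's actual WAL layout: the header struct,
append_record() and replay() are all made up for illustration. It shows
that replay stops in the same way whether the last record was torn by a
crash or an older, already-flushed record was damaged later.

```python
import struct
import zlib

# Toy header (an assumption, not the real WAL record header):
# payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one length- and CRC-prefixed record to the toy log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return payloads up to the first record that is short or fails its CRC."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # treated as end of log, whatever the real cause was
        records.append(payload)
        pos = start + length
    return records


log = bytearray()
for payload in (b"commit txn 1", b"commit txn 2", b"commit txn 3"):
    append_record(log, payload)

# Case 1: crash mid-flush, so the last record is only partly on disk.
torn = bytes(log[:-4])

# Case 2: one bit flips in the second record, which was flushed long ago.
damaged = bytearray(log)
damaged[HEADER.size + len(b"commit txn 1") + HEADER.size] ^= 0x01
corrupt = bytes(damaged)

after_torn = replay(torn)        # stops before txn 3, which never committed
after_corrupt = replay(corrupt)  # stops before txn 2, losing txn 2 and txn 3
```

In the second case replay gives no sign that anything is wrong: it just
reports an earlier end of log, even though two committed records were
flushed successfully.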

Unfortunately, it seems that doing any kind of validation to determine
that we have a valid end-of-the-WAL inherently requires some kind of
separate durable write somewhere. It would be a tiny amount of data (an
LSN and maybe some extra crosscheck information), so I could imagine
that would be just fine given the right hardware; but if it means an
extra synchronous write to ordinary disk, that would be pretty bad.
Ideas welcome.
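
As a rough sketch of what such a crosscheck could look like (purely
hypothetical, nothing like this exists in the code being discussed):
assume a separately stored durable_flush_lsn that is only ever updated
after the WAL up to that point has been flushed. The names WalEnd and
classify_wal_end() are invented for this example. Recovery could then
compare the position where CRC validation stopped against that value.

```python
from enum import Enum


class WalEnd(Enum):
    CLEAN_END = "plausible end of WAL (possibly a torn record)"
    CORRUPTION = "CRC failure before the known flush point"


def classify_wal_end(first_bad_lsn: int, durable_flush_lsn: int) -> WalEnd:
    """Classify where CRC validation stopped.

    first_bad_lsn: position of the first record that failed validation.
    durable_flush_lsn: hypothetical, separately persisted position up to
    which WAL is known to have been flushed (it may lag the true flush
    point, but must never be ahead of it).
    """
    if first_bad_lsn < durable_flush_lsn:
        # Everything before durable_flush_lsn was flushed successfully,
        # so a failure here cannot be explained by a torn record.
        return WalEnd.CORRUPTION
    return WalEnd.CLEAN_END


# Example: validation stopped at 100 but WAL was known flushed up to 160.
verdict = classify_wal_end(100, 160)
```

The cost is in keeping durable_flush_lsn current, which is the separate
durable write described above; the check itself is trivial.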

Regards,
	Jeff Davis
