Re: Enabling Checksums - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Enabling Checksums
Date
Msg-id 5139A377.1040905@vmware.com
In response to Re: Enabling Checksums  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Enabling Checksums  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On 08.03.2013 05:31, Bruce Momjian wrote:
> Also, don't all modern storage drives have built-in checksums, and
> report problems to the system administrator?  Does smartctl help report
> storage corruption?
>
> Let me take a guess at answering this --- we have several layers in a
> database server:
>
>     1 storage
>     2 storage controller
>     3 file system
>     4 RAM
>     5 CPU
>
> My guess is that storage checksums only cover layer 1, while our patch
> covers layers 1-3, and probably not 4-5 because we only compute the
> checksum on write.

There is a thing called "Data Integrity Field" (DIF) and/or "Data Integrity 
Extensions" (DIX) that allows storing a checksum with each disk sector and 
verifying the checksum in each layer. The basic idea is that instead of 
512-byte sectors, the drive is formatted to use 520-byte sectors, with 
the extra 8 bytes used for the checksum and some other metadata. That 
gets around the problem we have in PostgreSQL, and that filesystems 
have, which is that you need to store the checksum somewhere alongside 
the data.
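To make the sector layout concrete, here is a minimal Python sketch of how those 8 extra bytes could be filled in. It assumes the T10 DIF "Type 1" format, where the 8 bytes hold a 2-byte guard tag (a CRC-16 with polynomial 0x8BB7), a 2-byte application tag and a 4-byte reference tag carrying the low 32 bits of the sector's LBA. The helper names are made up for illustration; real implementations do this in the kernel or the HBA, not in Python.

```python
import struct

SECTOR_SIZE = 512  # data bytes per sector
PI_SIZE = 8        # protection-information bytes per sector (520 - 512)


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte tuple: guard CRC (2), application tag (2), reference tag (4)."""
    if len(data) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 data bytes")
    guard = crc16_t10dif(data)
    ref_tag = lba & 0xFFFFFFFF  # assumed Type 1: low 32 bits of the LBA
    return struct.pack(">HHI", guard, app_tag, ref_tag)  # big-endian fields


def format_sector(data: bytes, lba: int) -> bytes:
    """Return a 520-byte sector: 512 data bytes followed by 8 PI bytes."""
    return data + make_pi(data, lba)


if __name__ == "__main__":
    print(hex(crc16_t10dif(b"123456789")))  # 0xd0db, the standard check value
    print(len(format_sector(bytes(SECTOR_SIZE), lba=42)))  # 520
```

The bitwise CRC loop is the slowest possible formulation, chosen only because it's easy to check against the standard test vector.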

When a write I/O request is made in the OS, the OS calculates the 
checksum and passes it through the controller to the drive. The drive 
verifies the checksum, and aborts the I/O request if it doesn't match. 
On a read, the checksum is read from the drive along with the actual 
data, passed through the controller, and the OS verifies it. This covers 
layers 1-2 or 1-3.
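A toy model of that write/read path, under the same Type 1 assumption: `Drive`, `os_write`, `os_read` and the `in_transit` hook are hypothetical stand-ins, with `in_transit` playing the part of the controller. It shows the two checkpoints described above: the drive rejects a write whose protection info doesn't match the data, and the OS rejects a read that doesn't match.

```python
import struct


class IntegrityError(Exception):
    """Raised when a layer finds that data and its protection info disagree."""


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF guard: poly 0x8BB7, init 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def pi_matches(data: bytes, pi: bytes, lba: int) -> bool:
    """Check the guard CRC and the reference tag (assumed: low 32 bits of the LBA)."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(data) and ref_tag == (lba & 0xFFFFFFFF)


def flip_bit(buf: bytes, index: int = 0) -> bytes:
    """Simulate a transfer error by flipping the low bit of one byte."""
    corrupted = bytearray(buf)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


def passthrough(buf: bytes) -> bytes:
    """A healthy controller path: the bytes arrive unchanged."""
    return buf


class Drive:
    """Hypothetical DIF-capable drive storing (data, pi) per LBA."""

    def __init__(self) -> None:
        self.sectors = {}

    def write(self, lba: int, data: bytes, pi: bytes) -> None:
        # The drive re-verifies before committing; a mismatch aborts the request.
        if not pi_matches(data, pi, lba):
            raise IntegrityError(f"drive aborted write to LBA {lba}")
        self.sectors[lba] = (data, pi)

    def read(self, lba: int):
        # Data and PI come back together.
        return self.sectors[lba]


def os_write(drive: Drive, lba: int, data: bytes, in_transit=passthrough) -> None:
    # The OS computes the PI; `in_transit` models what the controller delivers.
    pi = struct.pack(">HHI", crc16_t10dif(data), 0, lba & 0xFFFFFFFF)
    drive.write(lba, in_transit(data), pi)


def os_read(drive: Drive, lba: int, in_transit=passthrough) -> bytes:
    data, pi = drive.read(lba)
    data = in_transit(data)
    # The OS verifies on the way back up.
    if not pi_matches(data, pi, lba):
        raise IntegrityError(f"OS detected corruption reading LBA {lba}")
    return data
```

For example, `os_write(drive, 8, bytes(512), in_transit=flip_bit)` raises `IntegrityError` and leaves nothing stored at LBA 8; a CRC always catches a single flipped bit, so this particular corruption can't slip through.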

Now, this requires every component to support it. I'm not an expert 
on these things, but I'd guess that's a tall order today, and I don't 
know which hardware vendors and kernel versions support it. But things 
usually keep improving, and hopefully in a few years you'll be able to 
easily buy a hardware stack that supports DIF all the way through.

In theory, the OS could also expose the DIF field to the application, so 
that you get end-to-end protection from the application to the disk. 
This means that the application somehow gets access to those extra bytes 
in each sector, and you have to calculate and verify the checksum in the 
application. There are no standard APIs for that yet, though.
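Since there's no standard API, the following is purely hypothetical, but it shows roughly what the application side of the work would look like: split an 8 KB page (PostgreSQL's default block size) into sixteen 512-byte sectors, attach the protection bytes to each, and verify them again after reading. It again assumes the Type 1 layout; `protect_page` and `verify_page` are invented names, and the reference-tag check illustrates catching a sector that ends up at the wrong LBA.

```python
from __future__ import annotations

import struct

SECTOR = 512
PAGE = 8192  # PostgreSQL's default block size, i.e. 16 sectors


class IntegrityError(Exception):
    """Raised when the application's own verification fails."""


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF guard: poly 0x8BB7, init 0, MSB-first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect_page(page: bytes, start_lba: int) -> list[bytes]:
    """Split a page into 512-byte sectors and append an 8-byte PI tuple to each."""
    if len(page) % SECTOR:
        raise ValueError("page size must be a multiple of 512")
    sectors = []
    for offset in range(0, len(page), SECTOR):
        chunk = page[offset:offset + SECTOR]
        lba = start_lba + offset // SECTOR
        pi = struct.pack(">HHI", crc16_t10dif(chunk), 0, lba & 0xFFFFFFFF)
        sectors.append(chunk + pi)
    return sectors


def verify_page(sectors: list[bytes], start_lba: int) -> bytes:
    """Check each sector's guard CRC and reference tag, then reassemble the page."""
    parts = []
    for n, sector in enumerate(sectors):
        chunk, pi = sector[:SECTOR], sector[SECTOR:]
        guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
        lba = start_lba + n
        if guard != crc16_t10dif(chunk):
            raise IntegrityError(f"checksum mismatch in sector at LBA {lba}")
        if ref_tag != (lba & 0xFFFFFFFF):
            # Intact data, but in the wrong place (e.g. a misdirected write).
            raise IntegrityError(f"LBA {lba} holds the sector for LBA {ref_tag}")
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    page = bytes(range(256)) * (PAGE // 256)
    sectors = protect_page(page, start_lba=100)
    assert verify_page(sectors, start_lba=100) == page
    sectors[0], sectors[1] = sectors[1], sectors[0]  # simulate a misplaced sector
    try:
        verify_page(sectors, start_lba=100)
    except IntegrityError as err:
        print(err)  # LBA 100 holds the sector for LBA 101
```

In the swap case the two sectors happen to hold identical bytes, so the guard CRC alone can't tell them apart; only the reference tag notices.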

See https://www.kernel.org/doc/Documentation/block/data-integrity.txt.

- Heikki
