Re: Enabling Checksums - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Enabling Checksums
Date	March 4, 2013 16:00:01
Msg-id	1362412800.26602.32.camel@jdavis
In response to	Re: Enabling Checksums (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses	Re: Enabling Checksums Re: Enabling Checksums
List	pgsql-hackers

On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
> On 04.03.2013 09:11, Simon Riggs wrote:
> > Are there objectors?
> 
> FWIW, I still think that checksumming belongs in the filesystem, not 
> PostgreSQL.

Doing checksums in the filesystem has some downsides. One is that you
need to use a copy-on-write filesystem like btrfs or zfs, which (by
design) will fragment the heap on random writes. If we're going to start
pushing people toward those systems, we will probably need to spend some
effort to mitigate this problem (aside: my patch to remove
PD_ALL_VISIBLE might get some new wind behind it).
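
To make the fragmentation point concrete, here is a toy model (not how btrfs or zfs actually allocate; both have far more elaborate allocators and defragmentation). It assumes a hypothetical copy-on-write allocator that never overwrites a block in place and always writes the new version of an updated page to the next free location. A table that starts out laid down sequentially becomes progressively scattered, so a scan in logical block order turns into a series of jumps.

```python
import random


def simulate_cow(num_blocks: int, num_updates: int, seed: int = 0) -> list[int]:
    """Toy copy-on-write model (hypothetical, not btrfs/zfs internals).

    Returns the physical location of each logical block after
    `num_updates` random single-page updates.
    """
    rng = random.Random(seed)
    # Initially the heap is written sequentially: logical i -> physical i.
    physical = list(range(num_blocks))
    # COW never overwrites in place; new versions go to fresh space.
    next_free = num_blocks
    for _ in range(num_updates):
        blk = rng.randrange(num_blocks)
        physical[blk] = next_free  # the updated page is relocated
        next_free += 1
    return physical


def count_seeks(physical: list[int]) -> int:
    """Count non-contiguous jumps made by a scan in logical block order."""
    return sum(1 for a, b in zip(physical, physical[1:]) if b != a + 1)


if __name__ == "__main__":
    for updates in (0, 100, 1000):
        print(updates, count_seeks(simulate_cow(1000, updates)))
```

With no updates the scan is perfectly sequential (zero jumps); the jump count grows as random updates relocate pages, which is the "sequential scan is not so sequential" effect.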

There are also other issues, like what fraction of our users can freely
move to btrfs, and when. If it doesn't happen to be already there, you
need root to get it there, which has never been a requirement before.

I don't fundamentally disagree. We probably need to perform reasonably
well on btrfs in COW mode[1] regardless, because a lot of people will be
using it a few years from now. But there are a lot of unknowns here, and
I'm concerned about tying checksums to a series of things that will be
resolved a few years from now, if ever.

[1] Interestingly, you can turn off COW mode on btrfs, but you lose
checksums if you do.

>  If you go ahead with this anyway, at the very least I'd like 
> to see some sort of a comparison with e.g. btrfs. How do performance, 
> error-detection rate, and behavior on error compare? Any other metrics 
> that are relevant here?

I suspect it will be hard to get an apples-to-apples comparison here
because of the heap fragmentation, which means that a sequential scan is
not so sequential. That may be acceptable for some workloads but not for
others, so it would get tricky to compare. And any performance numbers
from an experimental filesystem are somewhat suspect anyway.

Also, it's a little more challenging to test corruption on a filesystem,
because you need to find the location of the file you want to corrupt,
and corrupt it out from underneath the filesystem.
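
For contrast, when the checksum lives in the database's own pages, a corruption test only needs the relation file and a page offset. The sketch below assumes a hypothetical page layout (a 4-byte checksum at the start of each 8192-byte page) and uses CRC-32 from `zlib` purely as a stand-in; it is not PostgreSQL's checksum algorithm or page format. Testing a filesystem-level checksum would instead require locating the block on the underlying device and modifying it beneath the filesystem, which this sketch does not attempt.

```python
import os
import tempfile
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size
HEADER = 4        # hypothetical layout: first 4 bytes of each page = checksum


def write_pages(path: str, num_pages: int) -> None:
    """Write pages whose header holds a CRC-32 (stand-in) of the body."""
    with open(path, "wb") as f:
        for i in range(num_pages):
            body = bytes([i % 256]) * (PAGE_SIZE - HEADER)
            f.write(zlib.crc32(body).to_bytes(HEADER, "little") + body)


def corrupt_byte(path: str, page: int, offset: int) -> None:
    """Flip one byte at a known page/offset, as a test harness would."""
    with open(path, "r+b") as f:
        f.seek(page * PAGE_SIZE + offset)
        b = f.read(1)
        f.seek(page * PAGE_SIZE + offset)
        f.write(bytes([b[0] ^ 0xFF]))


def bad_pages(path: str) -> list[int]:
    """Return the page numbers whose stored checksum does not match."""
    bad = []
    with open(path, "rb") as f:
        page = 0
        while chunk := f.read(PAGE_SIZE):
            stored = int.from_bytes(chunk[:HEADER], "little")
            if zlib.crc32(chunk[HEADER:]) != stored:
                bad.append(page)
            page += 1
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relation")  # hypothetical file name
        write_pages(path, 4)
        corrupt_byte(path, 2, 100)
        print(bad_pages(path))
```

Because the harness knows exactly which file and offset it damaged, it can assert that exactly that page is reported.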

Greg may have more comments on this matter.

Regards,
Jeff Davis