Re: better page-level checksums - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: better page-level checksums
Date
Msg-id CAJ7c6TOKTzaf_JL4bMb=b4OV0e9ftmisQ4qBrSFhDVhgAMYwRA@mail.gmail.com
In response to Re: better page-level checksums  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: better page-level checksums
List pgsql-hackers
Hi Robert,

> I don't think that a separate fork is a good option for reasons that I
> articulated previously: I think it will be significantly more complex
> to implement and add extra I/O.
>
> I am not completely opposed to the idea of making the algorithm
> pluggable but I'm not very excited about it either. Making the
> algorithm pluggable probably wouldn't be super-hard, but allowing a
> checksum of arbitrary size rather than one of a short list of fixed
> sizes might complicate efforts to ensure this doesn't degrade
> performance. And I'm not sure what the benefit is, either. This isn't
> like archive modules or custom backup targets where the feature
> proposes to interact with things outside the server and we don't know
> what's happening on the other side and so need to offer an interface
> that can accommodate what the user wants to do. Nor is it like a
> custom background worker or a custom data type which lives fully
> inside the database but the desired behavior could be anything. It's
> not even like column compression where I think that the same small set
> of strategies are probably fine for everybody but some people think
> that customizing the behavior by datatype would be a good idea. All
> it's doing is taking a fixed size block of data and checksumming it. I
> don't see that as being something where there's a lot of interesting
> things to experiment with from an extension point of view.

I see your point. Makes sense.

So, to clarify, what we are trying to achieve here is to reduce the
probability of the case where a page gets corrupted but its checksum
is accidentally the same as it was before the corruption, correct? And
we also assume that neither the file system nor the hardware caught
this corruption.

If that's the case, I would say that using something like SHA256 would
be overkill, not only because of the consumed disk space but also
because SHA256 is expensive. Allowing the user to choose from 16-bit,
32-bit and maybe 64-bit checksums should be enough. I would also
suggest that, no matter how we do it, if the user chooses 16-bit
checksums, performance and disk consumption should remain as they
currently are.
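
For reference, the current page header (PageHeaderData in
src/include/storage/bufpage.h) reserves only 16 bits for the checksum,
which is why the 16-bit case can stay exactly as it is today, while
anything wider would have to find room elsewhere on the page (comments
abridged):

typedef struct PageHeaderData
{
    PageXLogRecPtr pd_lsn;      /* LSN of the last change to this page */
    uint16      pd_checksum;    /* page checksum -- only 16 bits today */
    uint16      pd_flags;       /* flag bits */
    LocationIndex pd_lower;     /* offset to start of free space */
    LocationIndex pd_upper;     /* offset to end of free space */
    LocationIndex pd_special;   /* offset to start of special space */
    uint16      pd_pagesize_version;
    TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
    ItemIdData  pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
} PageHeaderData;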

Regarding the particular choice of a hash function, I would suggest the
MurmurHash family [1]. This is basically the industry standard (it's
good, it's fast, and relatively simple), and we already have
murmurhash32() in core. We also have hash_bytes_extended() to get
64-bit checksums, but I have no strong opinion on whether this
particular hash function should be used for pages or not. I believe
some benchmarking is appropriate.
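
For what it's worth, a rough and untested sketch of a 64-bit page
checksum built on hash_bytes_extended() could look like the following,
mimicking the pattern of pg_checksum_page(). The function name and the
idea of seeding with the block number are just placeholders for
illustration; a real implementation would also have to work on a copy
of the page rather than scribbling on a shared buffer:

#include "postgres.h"
#include "common/hashfn.h"
#include "storage/bufpage.h"

static uint64
page_checksum64(char *page, BlockNumber blkno)
{
    PageHeader  phdr = (PageHeader) page;
    uint16      save_checksum = phdr->pd_checksum;
    uint64      result;

    /* Exclude the stored 16-bit checksum itself, as pg_checksum_page() does. */
    phdr->pd_checksum = 0;

    /* Seed with the block number so pages written to the wrong place are caught. */
    result = hash_bytes_extended((const unsigned char *) page, BLCKSZ,
                                 (uint64) blkno);

    phdr->pd_checksum = save_checksum;
    return result;
}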

There is also a 128-bit version of MurmurHash. Personally I doubt it
would be of much value in practice, but it wouldn't hurt to support it
either while we are at it (probably not in the MVP, though). And if we
are going to choose this path, I see no reason not to support SHA256
as well, for the paranoid users.
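
And just to show that the SHA256 option wouldn't require anything
exotic either, a naive (untested) sketch could reuse the existing
pg_cryptohash API from common/cryptohash.h. The function name and the
error handling below are made up purely for illustration:

#include "postgres.h"
#include "common/cryptohash.h"
#include "common/sha2.h"
#include "storage/bufpage.h"

/*
 * Compute a SHA-256 digest of one page. The digest buffer must have
 * room for PG_SHA256_DIGEST_LENGTH (32) bytes. Returns false on failure.
 */
static bool
page_checksum_sha256(const char *page, uint8 *digest)
{
    pg_cryptohash_ctx *ctx = pg_cryptohash_create(PG_SHA256);
    bool        ok;

    if (ctx == NULL)
        return false;

    ok = pg_cryptohash_init(ctx) == 0 &&
        pg_cryptohash_update(ctx, (const uint8 *) page, BLCKSZ) == 0 &&
        pg_cryptohash_final(ctx, digest, PG_SHA256_DIGEST_LENGTH) == 0;

    pg_cryptohash_free(ctx);
    return ok;
}

Whether the extra 32 bytes per page and the extra CPU cycles are worth
it is something those paranoid users would have to decide for
themselves.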

[1]: https://en.wikipedia.org/wiki/MurmurHash

-- 
Best regards,
Aleksander Alekseev


