Re: better page-level checksums - Mailing list pgsql-hackers
From | Aleksander Alekseev |
---|---|
Subject | Re: better page-level checksums |
Date | |
Msg-id | CAJ7c6TOKTzaf_JL4bMb=b4OV0e9ftmisQ4qBrSFhDVhgAMYwRA@mail.gmail.com |
In response to | Re: better page-level checksums (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: better page-level checksums |
List | pgsql-hackers |
Hi Robert,

> I don't think that a separate fork is a good option for reasons that I
> articulated previously: I think it will be significantly more complex
> to implement and add extra I/O.
>
> I am not completely opposed to the idea of making the algorithm
> pluggable but I'm not very excited about it either. Making the
> algorithm pluggable probably wouldn't be super-hard, but allowing a
> checksum of arbitrary size rather than one of a short list of fixed
> sizes might complicate efforts to ensure this doesn't degrade
> performance. And I'm not sure what the benefit is, either. This isn't
> like archive modules or custom backup targets where the feature
> proposes to interact with things outside the server and we don't know
> what's happening on the other side and so need to offer an interface
> that can accommodate what the user wants to do. Nor is it like a
> custom background worker or a custom data type which lives fully
> inside the database but the desired behavior could be anything. It's
> not even like column compression where I think that the same small set
> of strategies are probably fine for everybody but some people think
> that customizing the behavior by datatype would be a good idea. All
> it's doing is taking a fixed size block of data and checksumming it. I
> don't see that as being something where there's a lot of interesting
> things to experiment with from an extension point of view.

I see your point. Makes sense.

So, to clarify, what we are trying to achieve here is to reduce the
probability of the event where a page gets corrupted but the checksum is
accidentally the same as it was before the corruption, correct? And we
also assume that neither the file system nor the hardware caught the
corruption.

If that's the case, I would say that using something like SHA256 would
be overkill, not only because of the consumed disk space but also
because SHA256 is expensive. Allowing the user to choose between 16-bit,
32-bit and maybe 64-bit checksums should be enough. I would also suggest
that, no matter how we do it, if the user chooses 16-bit checksums the
performance and the disk consumption should remain as they currently are.

Regarding the particular choice of a hash function, I would suggest the
MurmurHash family [1]. It is basically the industry standard (it's good,
it's fast, and relatively simple), and we already have murmurhash32() in
core. We also have hash_bytes_extended() to get 64-bit checksums, but I
have no strong opinion on whether this particular hash function should
be used for pages or not. I believe some benchmarking is appropriate.

There is also a 128-bit version of MurmurHash. Personally, I doubt it
would be of much value in practice, but it would not hurt to support it
either, while we are at it (probably not in the MVP, though). And if we
are going to choose this path, I see no reason not to support SHA256 as
well, for the paranoid users.

[1]: https://en.wikipedia.org/wiki/MurmurHash

--
Best regards,
Aleksander Alekseev
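[Editor's note: the message above discusses width-selectable page checksums only in the abstract. The standalone C sketch below is added for illustration and is not from the thread. CHECKSUM_OFFSET, PageChecksumWidth, page_checksum() and placeholder_hash64() are hypothetical names, the 64-bit FNV-1a is merely a stand-in for whatever hash would actually be chosen (hash_bytes_extended(), a 64-bit MurmurHash, SHA256, ...), and a real implementation would work with PageHeaderData/pd_checksum rather than a raw byte offset.]

```c
/*
 * Rough sketch of a width-selectable page checksum. Illustrative only;
 * none of these names exist in PostgreSQL, and the placeholder hash is
 * not a proposal for the actual algorithm.
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLCKSZ 8192
#define CHECKSUM_OFFSET 8		/* hypothetical location of the checksum field */

typedef enum
{
	PAGE_CSUM_16,
	PAGE_CSUM_32,
	PAGE_CSUM_64
} PageChecksumWidth;

/* Placeholder 64-bit hash (FNV-1a); stands in for the real choice. */
static uint64_t
placeholder_hash64(const unsigned char *data, size_t len, uint64_t seed)
{
	uint64_t	h = 14695981039346656037ULL ^ seed;

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Checksum the page with the checksum field itself zeroed out, mixing in
 * the block number as a seed so that identical pages stored at different
 * block numbers produce different checksums, then fold the result down to
 * the requested width.
 */
static uint64_t
page_checksum(const char *page, uint32_t blkno, PageChecksumWidth width)
{
	unsigned char copy[BLCKSZ];
	uint64_t	h;

	memcpy(copy, page, BLCKSZ);
	memset(copy + CHECKSUM_OFFSET, 0, sizeof(uint64_t));	/* exclude the field */

	h = placeholder_hash64(copy, BLCKSZ, blkno);

	switch (width)
	{
		case PAGE_CSUM_16:
			return (uint16_t) (h ^ (h >> 16) ^ (h >> 32) ^ (h >> 48));
		case PAGE_CSUM_32:
			return (uint32_t) (h ^ (h >> 32));
		case PAGE_CSUM_64:
			return h;
	}
	return h;					/* unreachable */
}

int
main(void)
{
	char		page[BLCKSZ] = {0};

	page[100] = 42;				/* pretend there is some tuple data */
	printf("16-bit: %04llx\n",
		   (unsigned long long) page_checksum(page, 7, PAGE_CSUM_16));
	printf("64-bit: %016llx\n",
		   (unsigned long long) page_checksum(page, 7, PAGE_CSUM_64));
	return 0;
}
```

The two details carried over from the existing pg_checksum_page() are that the checksum field is excluded from the hashed bytes and that the block number is mixed in, so transposed pages do not validate against each other; everything else (the hash, the field width, the folding) is the part under discussion in the thread.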