Re: better page-level checksums - Mailing list pgsql-hackers
From | Aleksander Alekseev |
---|---|
Subject | Re: better page-level checksums |
Date | |
Msg-id | CAJ7c6TOKTzaf_JL4bMb=b4OV0e9ftmisQ4qBrSFhDVhgAMYwRA@mail.gmail.com |
In response to | Re: better page-level checksums (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: better page-level checksums |
List | pgsql-hackers |
Hi Robert,

> I don't think that a separate fork is a good option for reasons that I
> articulated previously: I think it will be significantly more complex
> to implement and add extra I/O.
>
> I am not completely opposed to the idea of making the algorithm
> pluggable but I'm not very excited about it either. Making the
> algorithm pluggable probably wouldn't be super-hard, but allowing a
> checksum of arbitrary size rather than one of a short list of fixed
> sizes might complicate efforts to ensure this doesn't degrade
> performance. And I'm not sure what the benefit is, either. This isn't
> like archive modules or custom backup targets where the feature
> proposes to interact with things outside the server and we don't know
> what's happening on the other side and so need to offer an interface
> that can accommodate what the user wants to do. Nor is it like a
> custom background worker or a custom data type which lives fully
> inside the database but the desired behavior could be anything. It's
> not even like column compression where I think that the same small set
> of strategies are probably fine for everybody but some people think
> that customizing the behavior by datatype would be a good idea. All
> it's doing is taking a fixed size block of data and checksumming it. I
> don't see that as being something where there's a lot of interesting
> things to experiment with from an extension point of view.

I see your point. Makes sense.

So, to clarify, what we are trying to achieve here is to reduce the
probability of the event where a page gets corrupted but the checksum is
accidentally the same as it was before the corruption, correct? And we
also assume that neither the file system nor the hardware caught the
corruption.

If that's the case, I would say that using something like SHA256 would
be overkill, not only because of the consumed disk space but also
because SHA256 is expensive. Allowing the user to choose between 16-bit,
32-bit and maybe 64-bit checksums should be enough. I would also suggest
that, no matter how we do it, if the user chooses 16-bit checksums the
performance and the disk consumption should remain as they currently are.

Regarding the particular choice of a hash function, I would suggest the
MurmurHash family [1]. It is basically the industry standard (it's good,
it's fast, and relatively simple), and we already have murmurhash32() in
core. We also have hash_bytes_extended() to get 64-bit checksums, but I
have no strong opinion on whether this particular hash function should
be used for pages or not. I believe some benchmarking is appropriate.

There is also a 128-bit version of MurmurHash. Personally, I doubt it
would be of much value in practice, but it would not hurt to support it
either, while we are at it (probably not in the MVP, though). And if we
are going to choose this path, I see no reason not to support SHA256 as
well, for the paranoid users.

[1]: https://en.wikipedia.org/wiki/MurmurHash

--
Best regards,
Aleksander Alekseev
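[Editor's note: the message above discusses width-selectable page checksums only in the abstract. The standalone C sketch below is added for illustration and is not from the thread. CHECKSUM_OFFSET, PageChecksumWidth, page_checksum() and placeholder_hash64() are hypothetical names, the 64-bit FNV-1a is merely a stand-in for whatever hash would actually be chosen (hash_bytes_extended(), a 64-bit MurmurHash, SHA256, ...), and a real implementation would work with PageHeaderData/pd_checksum rather than a raw byte offset.]

```c
/*
 * Rough sketch of a width-selectable page checksum. Illustrative only;
 * none of these names exist in PostgreSQL, and the placeholder hash is
 * not a proposal for the actual algorithm.
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLCKSZ 8192
#define CHECKSUM_OFFSET 8		/* hypothetical location of the checksum field */

typedef enum
{
	PAGE_CSUM_16,
	PAGE_CSUM_32,
	PAGE_CSUM_64
} PageChecksumWidth;

/* Placeholder 64-bit hash (FNV-1a); stands in for the real choice. */
static uint64_t
placeholder_hash64(const unsigned char *data, size_t len, uint64_t seed)
{
	uint64_t	h = 14695981039346656037ULL ^ seed;

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Checksum the page with the checksum field itself zeroed out, mixing in
 * the block number as a seed so that identical pages stored at different
 * block numbers produce different checksums, then fold the result down to
 * the requested width.
 */
static uint64_t
page_checksum(const char *page, uint32_t blkno, PageChecksumWidth width)
{
	unsigned char copy[BLCKSZ];
	uint64_t	h;

	memcpy(copy, page, BLCKSZ);
	memset(copy + CHECKSUM_OFFSET, 0, sizeof(uint64_t));	/* exclude the field */

	h = placeholder_hash64(copy, BLCKSZ, blkno);

	switch (width)
	{
		case PAGE_CSUM_16:
			return (uint16_t) (h ^ (h >> 16) ^ (h >> 32) ^ (h >> 48));
		case PAGE_CSUM_32:
			return (uint32_t) (h ^ (h >> 32));
		case PAGE_CSUM_64:
			return h;
	}
	return h;					/* unreachable */
}

int
main(void)
{
	char		page[BLCKSZ] = {0};

	page[100] = 42;				/* pretend there is some tuple data */
	printf("16-bit: %04llx\n",
		   (unsigned long long) page_checksum(page, 7, PAGE_CSUM_16));
	printf("64-bit: %016llx\n",
		   (unsigned long long) page_checksum(page, 7, PAGE_CSUM_64));
	return 0;
}
```

The two details carried over from the existing pg_checksum_page() are that the checksum field is excluded from the hashed bytes and that the block number is mixed in, so transposed pages do not validate against each other; everything else (the hash, the field width, the folding) is the part under discussion in the thread.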