Re: better page-level checksums - Mailing list pgsql-hackers
| From | Peter Geoghegan |
|---|---|
| Subject | Re: better page-level checksums |
| Date | |
| Msg-id | CAH2-WzkHDD33osG3CT0wrX9s_7MvRPSQOgkm7Hdj1F9prDUudA@mail.gmail.com |
| In response to | Re: better page-level checksums (Robert Haas <robertmhaas@gmail.com>) |
| List | pgsql-hackers |
On Tue, Jun 14, 2022 at 10:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm, but on the other hand, if you imagine a scenario in which the
> "storage system extra blob" is actually a nonce for TDE, you need to
> be able to find it before you've decrypted the rest of the page. If
> pd_checksum gives you the offset of that data, you need to exclude it
> from what gets encrypted, which means that you need to encrypt three
> separate non-contiguous areas of the page whose combined size is
> unlikely to be a multiple of the encryption algorithm's block size.
> That kind of sucks (and putting it at the end of the page makes it
> way better).

I don't have a great understanding of how that cost will be felt in detail right now, because I don't know enough about the project and the requirements for TDE in general.

> That said, I certainly agree that finding the special space needs to
> be fast. The question in my mind is HOW fast it needs to be, and what
> techniques we might be able to use to dodge the problem. For instance,
> suppose that, during the startup sequence, we look at the control
> file, figure out the size of the 'storage system extra blob', and
> based on that each AM figures out the byte-offset of its special space
> and caches that in a global variable. Then, instead of
> PageGetSpecialSpace(page) it does PageGetBtreeSpecialSpace(page) or
> whatever, where the implementation is ((char*) page) +
> the_aforementioned_global_variable. Is that going to be too slow?

Who knows? For now the important point is that there is a tension between the requirements of TDE and the requirements of access methods (especially index access methods). It's possible that this will turn out not to be much of a problem. But the burden of proof is yours.

Making a big change to the on-disk format like this (a change that affects every access method) should be held to an exceptionally high standard. There are bound to be tacit or even explicit assumptions made by access methods that you risk breaking here. The reality is that all of the access method code evolved in an environment where the special space size was constant and generic for a given BLCKSZ. I don't have much sympathy for any suggestion that code written 20 years ago should have known not to make these assumptions. I have a lot more sympathy for the idea that it's a general problem with our infrastructure (particularly the code in bufpage.c and the delicate assumptions made by its callers) -- a problem that is worth addressing with a broad solution that enables lots of different work. We don't necessarily get another shot at this if we get it wrong now.

> There's going to have to be some compromise here. On the one hand
> you're going to have people who want to be able to do run-time
> conversions between page formats even at the cost of extra runtime
> overhead on top of what the basic feature necessarily implies. On the
> other hand you're going to have people who don't think any overhead at
> all is acceptable, even if it's purely nominal and only visible on a
> microbenchmark. Such arguments can easily become holy wars.

How many times has a big change to the on-disk format of this magnitude taken place, post-pg_upgrade? I would argue that this would be the first, since it is the moral equivalent of extending the size of the generic page header. For all I know the overhead will be perfectly fine, and everybody wins.
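To make the quoted cached-offset idea concrete, here is a rough sketch under the assumption that the "storage system extra blob" sits at the very end of the page. The names (storage_extra_size, CacheBtreeSpecialOffset, PageGetBtreeSpecialPointer) are hypothetical illustrations, not existing PostgreSQL APIs:

```c
#include <stddef.h>

/* Normally supplied by PostgreSQL headers; defined here to keep the sketch self-contained */
#define BLCKSZ 8192
#define MAXALIGN(LEN) (((size_t) (LEN) + 7) & ~((size_t) 7))

typedef struct BTPageOpaqueData
{
    /* ... btree-specific per-page fields ... */
    unsigned int btpo_flags;
} BTPageOpaqueData;

/* Hypothetical: size of the storage-system blob, read once from the control file */
static size_t storage_extra_size = 0;

/* Hypothetical: byte offset of the btree special space, cached at startup */
static size_t btree_special_offset = 0;

static void
CacheBtreeSpecialOffset(size_t extra_size_from_control_file)
{
    storage_extra_size = MAXALIGN(extra_size_from_control_file);

    /*
     * With the blob stored at the end of the page, the AM's special space
     * simply moves down by that (MAXALIGN'd) amount.
     */
    btree_special_offset = BLCKSZ - storage_extra_size
        - MAXALIGN(sizeof(BTPageOpaqueData));
}

/* Per-AM replacement for the generic PageGetSpecialPointer() */
static inline BTPageOpaqueData *
PageGetBtreeSpecialPointer(char *page)
{
    /* Just one addition of a cached global, as in the quoted proposal */
    return (BTPageOpaqueData *) (page + btree_special_offset);
}
```

Whether a single extra addition like this is ever visible, even on a microbenchmark, is exactly the open question in the quoted paragraph.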
I just want to be adamant that we're making the right trade-offs, and maximizing the benefit from any new cost imposed on access method code.

--
Peter Geoghegan