Re: better page-level checksums - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: better page-level checksums |
Date | |
Msg-id | CA+TgmoaYfUuWpMG0ynC-z2ofTpFdA6a6QCyrzV3iCmkVjwoYpA@mail.gmail.com |
In response to | Re: better page-level checksums (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses | Re: better page-level checksums |
List | pgsql-hackers |
On Mon, Jun 13, 2022 at 5:14 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> It's not that I disagree with (or dislike the idea of) increasing the
> resilience of checksums, I just want to be very careful that we don't
> trade (potentially significant) runtime performance for features
> people might not use. This thread seems very related to the 'storing
> an explicit nonce'-thread, which also wants to reclaim space from a
> page that is currently used by AMs, while AMs would lose access to
> certain information on pages and certain optimizations that they could
> do before. I'm very hesitant to let just any modification to the page
> format go through because someone needs extra metadata attached to a
> page.

Right. So, to be clear, I think there is an opportunity to store ONE extra blob of data in the page. It might be an extended checksum, or it might be a nonce for cryptographic authentication, but it can't be both. I think this is OK, because in earlier discussions of TDE, it seems that if you're using encryption and also want to verify page integrity, you'll use an encryption system that produces some kind of verifier, and you'll store that into this space in the page instead of using an enhanced-checksum feature. In other words, I'm imagining creating a space at the end of each page for some sort of enhanced security or data integrity feature, and you can either choose not to use one (in which case things work as they do today), or you can choose an extended checksums feature, or maybe in the future you can choose some form of TDE that involves storing a nonce or a page verifier in the page. But you just get one.

Now, the logical question to ask is: well, if there's only one opportunity to store an extra blob of data on every page, is this the best way to use it? What if someone comes along with another feature that also wants to store a blob of data on every page, and they can't do it because this proposal got there first?
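To make the "you just get one" idea concrete, here is a minimal sketch in C. It is not taken from any patch: the enum, the function names, and the byte sizes are all hypothetical, and it assumes the default 8192-byte block size. It only shows that a single cluster-wide choice fixes both the size of the reserved area and where it starts at the end of the page.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ 8192             /* assumes PostgreSQL's default block size */

/* Hypothetical: exactly one cluster-wide choice for the reserved space. */
typedef enum PageFeature
{
    PAGE_FEATURE_NONE,          /* pages look as they do today */
    PAGE_FEATURE_EXT_CHECKSUM,  /* an extended checksum */
    PAGE_FEATURE_AUTH_TAG       /* a nonce or verifier for some future TDE */
} PageFeature;

/* Bytes reserved at the end of every page; the sizes are illustrative only. */
static size_t
page_feature_size(PageFeature f)
{
    switch (f)
    {
        case PAGE_FEATURE_NONE:
            return 0;
        case PAGE_FEATURE_EXT_CHECKSUM:
            return 8;
        case PAGE_FEATURE_AUTH_TAG:
            return 16;
    }
    return 0;
}

/* Offset within the page at which the reserved area begins. */
static size_t
reserved_area_offset(PageFeature f)
{
    size_t      sz = page_feature_size(f);

    assert(sz < BLCKSZ);
    return BLCKSZ - sz;
}
```

Under these assumptions, adding another integrity or security feature means adding one more enum value and size, not carving a second region out of every page.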
My answer is: well, if that additional feature is something that provides encryption or tamper-resistance or data integrity or security in any form, then it can just be added as a new option for how you use this blob of space, and users who prefer the new thing to the existing options can pick it. If it's something else, then .... what is it, exactly? It seems to me that the kinds of things that require space in *every* page of the cluster are really the things that fall into this category.

For example, Stephen mused earlier that maybe while we're at it we could find a way to include an XID epoch in every page. Maybe so, but we wouldn't actually want that in *every* page. We would only want it in the heap pages. And as far as I can see that's pretty generally how things go. There are plenty of projects that might want extra space in each page *for a certain AM* and I don't see any reason why what I propose to do here would rule that out. I think this and that could both be done, and doing this might even make doing that easier by putting in place some useful infrastructure. What I don't think we can get away with is having multiple systems that are each taking a bite out of every page for every AM -- but I think that's OK, because I don't think there's a lot of need for multiple such systems.

> That reminds me, there's one more item to be put on the compatibility
> checklist: Currently, the FSM code assumes it can use all space on a
> page (except the page header) for its total of 3 levels of FSM data.
> Mixing page formats would break how it currently works, as changing
> the space that is available on a page will change the fanout level of
> each leaf in the tree, which our current code can't handle. To change
> the page format of one page in the FSM would thus either require a
> rewrite of the whole FSM fork, or extra metadata attached to the
> relation that details where the format changes. A similar issue exists
> with the VM fork.
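As a rough illustration of the fanout concern in the quoted text above, the sketch below models an FSM page as a binary tree packed into whatever space follows the page header. The constants and function names are assumptions made for this example (an 8192-byte block, a 24-byte header, 4 bytes of fixed FSM overhead, and a non-leaf node count that stays fixed); it is loosely modeled on how the FSM sizes its pages, not copied from the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ            8192  /* assumed default block size */
#define PAGE_HEADER_SIZE  24    /* assumed aligned page header size */
#define FSM_PAGE_OVERHEAD 4     /* assumed fixed bytes before the node array */

/*
 * Leaf slots on one FSM page when 'reserved' bytes are taken away from
 * the page.  Each node is one byte; the non-leaf count is assumed fixed.
 */
static size_t
fsm_leaf_slots(size_t reserved)
{
    size_t      nodes = BLCKSZ - PAGE_HEADER_SIZE - FSM_PAGE_OVERHEAD - reserved;
    size_t      nonleaf = BLCKSZ / 2 - 1;

    assert(nodes > nonleaf);
    return nodes - nonleaf;
}

/* Which bottom-level FSM page tracks a given heap block. */
static size_t
fsm_leaf_page_for(size_t heap_blk, size_t reserved)
{
    return heap_blk / fsm_leaf_slots(reserved);
}
```

With these numbers, reserving 16 bytes drops the leaf slots per page from 4069 to 4053, so heap block 4060 moves from the first bottom-level FSM page to the second. That shift in the mapping is why a per-page format change inside an existing FSM fork would not work without a rewrite or extra metadata.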
I agree with all of this except I think that "mixing page formats" is a thing we can't do.

> That being said, I think that it could be possible to reuse
> pd_checksum as an extra area indicator between pd_upper and
> pd_special, so that we'd get [pageheader][pd_linp...] pd_lower [hole]
> pd_upper [datas] pd_storage_ext [blackbox] pd_special [special area].
> This should require limited rework in current AMs, especially if we
> provide a global MAX_STORAGE_EXT_SIZE that AMs can use to get some
> upper limit on how much overhead the storage uses per page.

This is an interesting alternative. It's unclear to me that it makes anything better if the [blackbox] area is before the special area vs. afterward. And either way, if that area is fixed-size across the cluster, you don't really need to use pd_checksum to find it, because you can just know where it is. A possible advantage of this approach is that it might make it simpler to cope with a scenario where some pages in the cluster have this blackbox space and others don't. I wasn't really thinking that on-line page format conversions were likely to be practical, but certainly the chances are better if we've got an explicit pointer to the extra space vs. just knowing where it has to be.

> Alternatively, we could claim some space on a page using a special
> line pointer at the start of the page referring to storage data, while
> having the same limitation on size.

That sounds messy.

> One last option is we recognise that there are two storage locations
> of pages that have different data requirements -- on-disk that
> requires checksums, and in-memory that requires LSNs. Currently, those
> fields are both stored on the page in distinct fields, but we could
> (_could_) update the code to drop LSN when we store the page, and drop
> the checksum when we load the page (at the cost of redo speed when
> recovering from an unclean shutdown). That would provide an extra 64
> bits on the page without breaking storage, assuming AMs don't already
> misuse pd_lsn.

It seems wrong to me to say that we don't need the LSN for a page stored on disk. Recovery relies on it.

--
Robert Haas
EDB: http://www.enterprisedb.com