Re: storing an explicit nonce - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: storing an explicit nonce |
Msg-id | CA+TgmoZ5bK89mG7KO8Bp0qHS2w11j_-YHC=nV0Bs9oGqpqk6=w@mail.gmail.com |
In response to | Re: storing an explicit nonce (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Tue, May 25, 2021 at 7:58 PM Stephen Frost <sfrost@snowman.net> wrote:
> The simple thought I had was masking them out, yes.  No, you can't
> re-encrypt a different page with the same nonce.  (Re-encrypting the
> exact same page with the same nonce, however, just yields the same
> cryptotext and therefore is fine).

In the interest of not being viewed as too much of a naysayer, let me first reiterate that I am generally in favor of TDE going forward and am not looking to throw up unnecessary obstacles in the way of making that happen.

That said, I don't see how this particular idea can work. When we want to write a page out to disk, we need to identify which bits in the page are hint bits, so that we can avoid including them in what is encrypted, which seems complicated and expensive. But even worse, when we then read a page back off of disk, we'd need to decrypt everything except for the hint bits, but how do we know which bits are hint bits if the page isn't decrypted yet? We can't annotate an 8kB page that might be full with enough extra information to say where the non-encrypted parts are and still have the result be guaranteed to fit within 8kB.

Also, it's not just hint bits per se, but anything that would cause us to use MarkBufferDirtyHint(). For a btree index, per _bt_check_unique and _bt_killitems, that includes the entire line pointer array, because of how ItemIdMarkDead() is used. Even apart from the problem of how decryption would know which things we encrypted and which things we didn't, I really have a hard time believing that it's OK to exclude the entire line pointer array in every btree page from encryption from a security perspective. Among other potential problems, that's leaking all the information an attacker could possibly want to have about where their known plaintext might occur in the page.
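To make the quoted point about nonce reuse concrete, here is a minimal sketch. It assumes a stream-style mode such as CTR, which this message does not actually specify, and it stands in a SHA-256-based toy keystream for AES so that it runs with the standard library alone. The names `keystream`, `toy_encrypt`, and `xor_bytes` are hypothetical, and none of this is taken from the TDE patches.

```python
import hashlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream built from SHA-256. Illustration only, not AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, page: bytes) -> bytes:
    """Encrypt by XORing the page with the keystream for (key, nonce)."""
    return xor_bytes(page, keystream(key, nonce, len(page)))


key = bytes(range(32))
nonce = bytes(16)          # the SAME nonce is reused below, on purpose

page_v1 = bytes(PAGE_SIZE)
changed = bytearray(page_v1)
changed[100] ^= 0x01       # flip one bit, standing in for a hint-bit update
page_v2 = bytes(changed)

c1 = toy_encrypt(key, nonce, page_v1)
c1_again = toy_encrypt(key, nonce, page_v1)
c2 = toy_encrypt(key, nonce, page_v2)

# XOR of the two ciphertexts equals XOR of the two plaintexts:
# the keystream cancels out, so the attacker sees exactly which bits changed.
leak = xor_bytes(c1, c2)
changed_offsets = [i for i, b in enumerate(leak) if b != 0]
```

Under those assumptions, `c1 == c1_again` (the same page under the same nonce gives the same ciphertext, which is harmless), while `changed_offsets` pinpoints the single modified byte without any knowledge of the key.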
However, I believe that if we store the nonce in the page explicitly, as proposed here, rather than trying to derive it from the LSN, then we don't need to worry about this kind of masking, which I think is better from both a security perspective and a performance perspective.

There is one thing I'm not quite sure about, though. I had previously imagined that each page would have a nonce and we could just do nonce++ each time we write the page. But that doesn't quite work if the standby can do more writes of the same page than the master. One vague idea I have for fixing this is: let each page's 16-byte nonce consist of 8 random bytes and an 8-byte counter that will be incremented on every write. But, the first time a standby writes each page, force a "key rotation" where the 8-byte random value is replaced with a new one, different from the one the master is using for that page. Detecting this is a bit expensive, because it probably means we need to store the TLI that last wrote each page on every page too, but maybe it could be made to work; we're talking about a feature that is expensive by nature. However, I'm a little worried about the cryptographic properties of this approach. It would often mean that an attacker who has full filesystem access can get multiple encrypted images of the same data, each encrypted with a different nonce. I don't know whether that's a hazard or not, but it feels like the sort of thing that, if I were a cryptographer, I would be pleased to have.

Another idea might be, instead of doing nonce++ every time we write the page, to do nonce=random(). That's eventually going to repeat a value, but it's extremely likely to take a *super* long time if there are enough bits. A potentially rather large problem, though, is that generating random numbers in large quantities isn't very cheap. Anybody got a better idea?

I really like your (Stephen's) idea of including something in the special space that permits integrity checking. One thing that is quite nice about that is we could do it first, as an independent patch, before we did TDE. It would be an independently useful feature, and it would mean that if there are any problems with the code that injects stuff into the special space, we could try to track those down in a non-TDE context. That's really good, because in a TDE context, the pages are going to be garbled and unreadable (we hope, anyway). If we have a problem that we can reproduce with just an integrity-checking token shoved into every page, you can look at the page and try to understand what went wrong. So I really like this direction both from the point of view of improving integrity checking, and also from the point of view of being able to debug problems.

Now, one downside of this approach is that if we have the ability to turn integrity-checking tokens on and off, and separately we can turn encryption on and off, then we can't simplify down to two cases as Andres was advocating above; you have to cater to a variety of possible values of how-much-stuff-we-squeezed-into-the-special-space. At that point you kind of end up with the approach the draft patches were already taking, which Andres was worried would be expensive.

I am not entirely certain, however, that I understand what the proposal is here exactly for integrity verification. I Googled "AES-GCM using/storing tags" but it didn't help me that much, because I don't really know the subject area. A really simple integrity verifier for a page would be to store the db OID, ts OID, relfilenode, and block number in the page, and check them on read, preventing blocks from moving around without us noticing. But I gather that perhaps the idea here is to store something like hash(db_oid||ts_oid||relfilenode||block||block_contents) in each page, basically a beefed-up checksum that is too wide to fake easily. It's probably more complicated than that, though: I admit to having limited knowledge of modern cryptography.
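For what it's worth, the hash(db_oid||ts_oid||relfilenode||block||block_contents) idea above might look roughly like the sketch below. It swaps in a keyed HMAC-SHA256 for the bare hash, on the assumption that the token should not be recomputable by someone who can rewrite the file; that substitution is a guess, not something stated in the thread, and it is not the AES-GCM tag mechanism either. The function names, the 32-bit big-endian field encoding, and the idea that the tag lives outside `contents` (for example in the special space) are all hypothetical.

```python
import hashlib
import hmac
import struct


def page_tag(key: bytes, db_oid: int, ts_oid: int, relfilenode: int,
             block_no: int, contents: bytes) -> bytes:
    """Bind the page contents to their location with a keyed 32-byte tag."""
    # Assumed encoding: four unsigned 32-bit big-endian integers.
    location = struct.pack(">IIII", db_oid, ts_oid, relfilenode, block_no)
    return hmac.new(key, location + contents, hashlib.sha256).digest()


def verify_page(key: bytes, db_oid: int, ts_oid: int, relfilenode: int,
                block_no: int, contents: bytes, stored_tag: bytes) -> bool:
    """Recompute the tag on read and compare in constant time."""
    expected = page_tag(key, db_oid, ts_oid, relfilenode, block_no, contents)
    return hmac.compare_digest(expected, stored_tag)


key = b"\x01" * 32                      # placeholder key, illustration only
contents = b"tuple data" + bytes(8150)  # page body, excluding the tag itself
tag = page_tag(key, 5, 1663, 16384, 7, contents)

ok_in_place = verify_page(key, 5, 1663, 16384, 7, contents, tag)
ok_if_moved = verify_page(key, 5, 1663, 16384, 8, contents, tag)
ok_if_edited = verify_page(key, 5, 1663, 16384, 7, b"X" + contents[1:], tag)
```

In this sketch the page verifies only at its original location with its original contents: moving it to block 8 or altering a byte makes verification fail. Note that this covers the "blocks moving around" case but, on its own, does nothing about an old copy of the same block being put back in place.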
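Going back to the two nonce ideas earlier in this message, here is a minimal sketch of both, under stated assumptions. For the 8-random-bytes-plus-8-byte-counter layout, it assumes the page also records the TLI that last wrote it and that the counter keeps counting across a rotation (the message does not say whether it should reset). For nonce=random(), it uses the standard birthday approximation to estimate how soon a repeat becomes likely. `PageNonce`, `next_nonce_for_write`, and `random_nonce_collision_probability` are hypothetical names.

```python
import math
import os
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class PageNonce:
    random_part: bytes  # 8 random bytes
    counter: int        # 8-byte write counter
    last_tli: int       # timeline that last wrote the page (assumed stored per page)

    def to_bytes(self) -> bytes:
        """The 16-byte nonce: random part followed by the big-endian counter."""
        return self.random_part + struct.pack(">Q", self.counter)


def new_page_nonce(tli: int) -> PageNonce:
    return PageNonce(os.urandom(8), 0, tli)


def next_nonce_for_write(nonce: PageNonce, current_tli: int) -> PageNonce:
    """Return the nonce to use for the next write of this page."""
    counter = (nonce.counter + 1) % 2**64
    if nonce.last_tli == current_tli:
        return PageNonce(nonce.random_part, counter, current_tli)
    # First write by a different timeline: force the "rotation" of the random part.
    fresh = os.urandom(8)
    while fresh == nonce.random_part:
        fresh = os.urandom(8)
    return PageNonce(fresh, counter, current_tli)


def random_nonce_collision_probability(writes: int, bits: int = 128) -> float:
    """Birthday approximation: P(repeat) ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-(writes * (writes - 1)) / 2 ** (bits + 1))


n0 = new_page_nonce(tli=1)
n1 = next_nonce_for_write(n0, current_tli=1)  # same timeline: counter bump only
n2 = next_nonce_for_write(n1, current_tli=2)  # new timeline: random part replaced

p_128 = random_nonce_collision_probability(2**32, bits=128)
p_64 = random_nonce_collision_probability(2**32, bits=64)
```

With fully random 128-bit nonces, 2^32 page writes give a repeat probability on the order of 10^-20, whereas with only 64 random bits the same number of writes puts it near 40%, which is roughly what "if there are enough bits" has to mean here.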
--
Robert Haas
EDB: http://www.enterprisedb.com