Re: Transparent column encryption - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: Transparent column encryption |
Date | |
Msg-id | c3d59b70-cf22-a4bf-4613-91d040b988bd@enterprisedb.com Whole thread Raw |
In response to | Re: Transparent column encryption (Jacob Champion <jchampion@timescale.com>) |
Responses |
Re: Transparent column encryption
Re: Transparent column encryption Re: Transparent column encryption |
List | pgsql-hackers |
On 15.07.22 19:47, Jacob Champion wrote: >> The CEK key >> material is in turn encrypted by an assymmetric key called the column >> master key (CMK). > > I'm not yet understanding why the CMK is asymmetric. I'm not totally sure either. I started to build it that way because other systems were doing it that way, too. But I have been thinking about adding a symmetric alternative for the CMKs as well (probably AESKW). I think there are a couple of reasons why asymmetric keys are possibly useful for CMKs: Some other products make use of secure enclaves to do computations on (otherwise) encrypted values on the server. I don't fully know how that works, but I suspect that asymmetric keys can play a role in that. (I don't have any immediate plans for that in my patch. It seems to be a dying technology at the moment.) Asymmetric keys gives you some more options for how you set up the keys at the beginning. For example, you create the asymmetric key pair on the host where your client program that wants access to the encrypted data will run. You put the private key in an appropriate location for run time. You send the public key to another host. On that other host, you create the CEK, encrypt it with the CMK, and then upload it into the server (CREATE COLUMN ENCRYPTION KEY). Then you can wipe that second host. That way, you can be even more sure that the unencrypted CEK isn't left anywhere. I'm not sure whether this method is very useful in practice, but it's interesting. In any case, as I mentioned above, this particular aspect is up for discussion. Also note that if you use a KMS (cmklookup "run" method), the actual algorithm doesn't even matter (depending on details of the KMS setup), since you just tell the KMS "decrypt this", and the KMS knows by itself what algorithm to use. Maybe there should be a way to specify "unknown" in the ckdcmkalg field. >> +#define PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256 130 >> +#define PG_CEK_AEAD_AES_192_CBC_HMAC_SHA_384 131 >> +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_384 132 >> +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_512 133 > > It looks like these ciphersuites were abandoned by the IETF. Are there > existing implementations of them that have been audited/analyzed? Are > they safe (and do we know that the claims made in the draft are > correct)? How do they compare to other constructions like AES-GCM-SIV > and XChacha20-Poly1305? The short answer is, these same algorithms are used in equivalent products (see MS SQL Server, MongoDB). They even reference the same exact draft document. Besides that, here is my analysis for why these are good choices: You can't use any of the counter modes, because since the encryption happens on the client, there is no way to coordinate to avoid nonce reuse. So among mainstream modes, you are basically left with AES-CBC with a random IV. In that case, even if you happen to reuse an IV, the possible damage is very contained. And then, if you want to use AEAD, you combine that with some MAC, and HMAC is just as good as any for that. The referenced draft document doesn't really contain any additional cryptographic insights, it's just a guide on a particular way to put these two together. So altogether I think this is a pretty solid choice. >> +-- \gencr >> +-- (This just tests the parameter passing; there is no encryption here.) >> +CREATE TABLE test_gencr (a int, b text); >> +INSERT INTO test_gencr VALUES (1, 'one') \gencr >> +SELECT * FROM test_gencr WHERE a = 1 \gencr >> + a | b >> +---+----- >> + 1 | one >> +(1 row) >> + >> +INSERT INTO test_gencr VALUES ($1, $2) \gencr 2 'two' >> +SELECT * FROM test_gencr WHERE a IN ($1, $2) \gencr 2 3 >> + a | b >> +---+----- >> + 2 | two >> +(1 row) > I'd expect \gencr to error out without sending plaintext. I know that > under the hood this is just setting up a prepared statement, but if I'm > using \gencr, presumably I really do want to be encrypting my data. > Would it be a problem to always set force-column-encryption for the > parameters we're given here? Any unencrypted columns could be provided > directly. Yeah, this needs a bit of refinement. You don't want something named "encr" but it only encrypts some of the time. We could possibly do what you suggest and make it set the force-encryption flag, or maybe rename it or add another command that just uses prepared statements and doesn't promise anything about encryption from its name. This also ties in with how pg_dump will eventually work. I think by default pg_dump will just dump things encrypted and set it up so that COPY writes it back encrypted. But there should probably be a mode that dumps out plaintext and then uses one of these commands to load the plaintext back in. What these psql commands need to do also depends on what pg_dump needs them to do. >> + <para> >> + Null values are not encrypted by transparent column encryption; null values >> + sent by the client are visible as null values in the database. If the fact >> + that a value is null needs to be hidden from the server, this information >> + needs to be encoded into a nonnull value in the client somehow. >> + </para> > > This is a major gap, IMO. Especially with the switch to authenticated > ciphers, because it means you can't sign your NULL values. And having > each client or user that's out there solve this with a magic in-band > value seems like a recipe for pain. > > Since we're requiring "canonical" use of text format, and the docs say > there are no embedded or trailing nulls allowed in text values, could we > steal the use of a single zero byte to mean NULL? One additional > complication would be that the client would have to double-check that > we're not writing a NULL into a NOT NULL column, and complain if it > reads one during decryption. Another complication would be that the > client would need to complain if it got a plaintext NULL. You're already alluding to some of the complications. Also consider that null values could arise from, say, outer joins. So you could be in a situation where encrypted and unencrypted null values coexist. And of course the server doesn't know about the encrypted null values. So how do you maintain semantics, like for aggregate functions, primary keys, anything that treats null values specially? How do clients deal with a mix of encrypted and unencrypted null values, how do they know which one is real. What if the client needs to send a null value back as a parameter? All of this would create enormous complications, if they can be solved at all. I think a way to look at this is that this column encryption feature isn't suitable for disguising the existence or absence of data, it can only disguise the particular data that you know exists. >> + <para> >> + The <quote>associated data</quote> in these algorithms consists of 4 >> + bytes: The ASCII letters <literal>P</literal> and <literal>G</literal> >> + (byte values 80 and 71), followed by the algorithm ID as a 16-bit unsigned >> + integer in network byte order. >> + </para> > > Is this AD intended as a placeholder for the future, or does it serve a > particular purpose? It has been recommended that you include the identity of the encryption algorithm in the AD. This protects the client from having to decrypt stuff that wasn't meant to be decrypted (in that way).
pgsql-hackers by date: