Re: Transparent column encryption - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: Transparent column encryption
Date
Msg-id CAAWbhmj_zvS77fr7Ar1X3iM7uXuu9Fa_-d13qM4W9mwtWmyReA@mail.gmail.com
Whole thread Raw
In response to Re: Transparent column encryption  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers
On Mon, Jul 18, 2022 at 3:53 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> Some other products make use of secure enclaves to do computations on
> (otherwise) encrypted values on the server.  I don't fully know how that
> works, but I suspect that asymmetric keys can play a role in that.  (I
> don't have any immediate plans for that in my patch.  It seems to be a
> dying technology at the moment.)
>
> Asymmetric keys gives you some more options for how you set up the keys
> at the beginning.  For example, you create the asymmetric key pair on
> the host where your client program that wants access to the encrypted
> data will run.  You put the private key in an appropriate location for
> run time.  You send the public key to another host.  On that other host,
> you create the CEK, encrypt it with the CMK, and then upload it into the
> server (CREATE COLUMN ENCRYPTION KEY).  Then you can wipe that second
> host.  That way, you can be even more sure that the unencrypted CEK
> isn't left anywhere.  I'm not sure whether this method is very useful in
> practice, but it's interesting.

As long as it's clear to people trying this that the "public" key
cannot actually be made public, I suppose. That needs to be documented
IMO. I like your idea of providing a symmetric option as well.

> In any case, as I mentioned above, this particular aspect is up for
> discussion.
>
> Also note that if you use a KMS (cmklookup "run" method), the actual
> algorithm doesn't even matter (depending on details of the KMS setup),
> since you just tell the KMS "decrypt this", and the KMS knows by itself
> what algorithm to use.  Maybe there should be a way to specify "unknown"
> in the ckdcmkalg field.

+1, an officially client-defined method would probably be useful.

> The short answer is, these same algorithms are used in equivalent
> products (see MS SQL Server, MongoDB).  They even reference the same
> exact draft document.
>
> Besides that, here is my analysis for why these are good choices: You
> can't use any of the counter modes, because since the encryption happens
> on the client, there is no way to coordinate to avoid nonce reuse.  So
> among mainstream modes, you are basically left with AES-CBC with a
> random IV.  In that case, even if you happen to reuse an IV, the
> possible damage is very contained.)

I think both AES-GCM-SIV and XChaCha20-Poly1305 are designed to handle
the nonce problem as well. In any case, if I were deploying this, I'd
want to know the characteristics/limits of our chosen suites (e.g. how
much data can be encrypted per key) so that I could plan rotations
correctly. Something like the table in [1]?

> > Since we're requiring "canonical" use of text format, and the docs say
> > there are no embedded or trailing nulls allowed in text values, could we
> > steal the use of a single zero byte to mean NULL? One additional
> > complication would be that the client would have to double-check that
> > we're not writing a NULL into a NOT NULL column, and complain if it
> > reads one during decryption. Another complication would be that the
> > client would need to complain if it got a plaintext NULL.
>
> You're already alluding to some of the complications.  Also consider
> that null values could arise from, say, outer joins.  So you could be in
> a situation where encrypted and unencrypted null values coexist.

(I realize I'm about to wade into the pool of what NULL means in SQL,
the subject of which I've stayed mostly, gleefully, ignorant.)

To be honest that sounds pretty useful. Any unencrypted null must have
come from the server computation; it's a clear claim by the server
that no such rows exist. (If the encrypted column is itself NOT NULL
then there's no ambiguity to begin with, I think.) That wouldn't be
transparent behavior anymore, so it may (understandably) be a non-goal
for the patch, but it really does sound useful.

And it might be a little extreme, but if I as a user decided that I
wanted in-band encrypted null, it wouldn't be particularly surprising
to me if such a column couldn't be included in an outer join. Just
like I can't join on a randomized encrypted column, or add two
encrypted NUMERICs to each other. In fact I might even want the server
to enforce NOT NULL transparently on the underlying pg_encrypted_*
column, to make sure that I didn't accidentally push an unencrypted
NULL by mistake.

> And of
> course the server doesn't know about the encrypted null values.  So how
> do you maintain semantics, like for aggregate functions, primary keys,
> anything that treats null values specially?

Could you elaborate? Any special cases seem like they'd be important
to document regardless of whether or not we support in-band null
encryption. For example, do you plan to support encrypted primary
keys, null or not? That seems like it'd be particularly difficult
during CEK rotation.

> How do clients deal with a
> mix of encrypted and unencrypted null values, how do they know which one
> is real.

That one seems straightforward -- a bare null in an encrypted column
is an assertion by the server. An encrypted null had to have come from
the client side originally.

> What if the client needs to send a null value back as a
> parameter?

Couldn't the client just encrypt it, same as any other column? Or am I
missing what you mean by "parameter" here?

> All of this would create enormous complications, if they can
> be solved at all.

That could be. But I'm wondering if the complications exist
regardless, and the null example is just making them more obvious.

> It has been recommended that you include the identity of the encryption
> algorithm in the AD.  This protects the client from having to decrypt
> stuff that wasn't meant to be decrypted (in that way).

Do you have a link? I'd like to read up on that -- I naively assumed
that the suite wouldn't be able to decrypt another AEAD cipher without
complaining.

--Jacob

[1] https://doc.libsodium.org/secret-key_cryptography/aead



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Fix gcc warning in sync.c (usr/src/backend/storage/sync/sync.c)
Next
From: Jacob Champion
Date:
Subject: Re: Transparent column encryption