Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Joe Conway
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id c878de71-a0c3-96b2-3e11-9ac2c35357c3@joeconway.com
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 7/13/19 5:58 PM, Tomas Vondra wrote:
> On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>>[2] also says provides additional support for AES 256. It also mentions
>>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>>
>>"Currently, the following pairs of encryption modes are supported:
>>
>>    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>>    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>>    Adiantum for both contents and filenames
>>
>>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>>
>>AES-128-CBC was added only for low-powered embedded devices with crypto
>>accelerators such as CAAM or CESA that do not support XTS."
>>---
>>[2] also states this, which again makes me think in terms of table being
>>the moral equivalent to a file:
>>
>>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>>at the block device level. This allows it to encrypt different files
>>with different keys and to have unencrypted files on the same
>>filesystem. This is useful for multi-user systems where each user’s
>>data-at-rest needs to be cryptographically isolated from the others.
>>However, except for filenames, fscrypt does not encrypt filesystem
>>metadata."

<snip>

>>[5] has this to say which seems independent of mode:
>>
>>"When encrypting data with a symmetric block cipher, which uses blocks
>>of n bits, some security concerns begin to appear when the amount of
>>data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
>>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>>blocks). This means a limit of more than 250 millions of terabytes,
>>which is sufficiently large not to be a problem. That's precisely why
>>AES was defined with 128-bit blocks, instead of the more common (at that
>>time) 64-bit blocks: so that data size is practically unlimited."
>>
>
> FWIW I was a bit confused at first, because the copy paste mangled the
> formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

Yeah, sorry about that.

>>But goes on to say:
>>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>>reach that number of bits the probability of a collision will grow
>>quickly and you will be way over 50% probability of a collision by the
>>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>>with the same key. In the case of AES that works out to 64GB"
>>
>>It is hard to say if that recommendation is per key or per key+IV.
>
> Hmm, yeah. The question is what collisions they have in mind? Presumably
> it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
> IV, so per key+IV.

Seems likely.

>>But I did find that files in an encrypted file system are encrypted with
>>derived keys from a master key, and I view this as analogous to what we
>>are doing.
>>
>
> My understanding always was that we'd do something like that, i.e. we'd
> have a master key (or perhaps multiple of them, for various users), but
> the data would be encrypted with secondary (generated) keys, and those
> secondary keys would be encrypted by the master key. At least that's
> what was proposed at the beginning of this thread by Insung Moon.

In my email I linked the wrong page for [2]. The correct one is here:
[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html

Following that, I think we could end up with three tiers:

1. A master key encryption key (KEK): this is the key supplied by the
   database admin using something akin to ssl_passphrase_command

2. A master data encryption key (MDEK): this is a key generated using a
   cryptographically secure pseudo-random number generator. It is
   encrypted using the KEK, probably with Key Wrap [KW],
   or maybe better Key Wrap with Padding [KWP].

3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
    table specific keys.

3b. WAL data encryption keys (WDEK): Similarly, use MDEK and HKDF to
    generate new keys when needed for WAL (based on the other info, we
    need to change WAL keys every 68 GB, unless I read that wrong).

I believe that would allow us to have multiple keys, but they are
derived securely from the one MDEK using available info, similar to the
way we intend to use LSN to derive the IVs -- perhaps table.oid for
tables and something else for WAL.
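
To make tier 3 concrete, here is a minimal sketch of deriving per-table
and WAL keys from one MDEK with HKDF [HKDF], using only the Python
standard library. The labels b"TDEK" and b"WDEK", the 8-byte identifier
encoding, the example OIDs, and the helper name derive_key are all
assumptions made for illustration; none of them come from the proposal.
Wrapping the MDEK with the KEK (tier 2) is left out because it needs an
AES implementation.

```python
import hashlib
import hmac
import secrets

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 step 1: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC default: HashLen zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 step 2: expand PRK into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key material too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key(mdek: bytes, label: bytes, ident: int, length: int = 32) -> bytes:
    """Hypothetical helper: derive one data key from the MDEK.

    `label` separates key purposes (table vs. WAL); `ident` would be
    something like table.oid, or a WAL key sequence number.
    """
    info = label + ident.to_bytes(8, "big")
    return hkdf_expand(hkdf_extract(b"", mdek), info, length)


# The MDEK comes from a CSPRNG, as in tier 2 of the scheme above.
mdek = secrets.token_bytes(32)

tdek_a = derive_key(mdek, b"TDEK", 16384)  # made-up table OID
tdek_b = derive_key(mdek, b"TDEK", 16385)  # a different table
wdek_1 = derive_key(mdek, b"WDEK", 1)      # first WAL key
```

Since derivation is deterministic, only the wrapped MDEK would need to
be stored; the table and WAL keys can be recomputed on demand from the
MDEK and the identifier.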

We also need to figure out how/when to generate new WDEK. Maybe every
checkpoint, also meaning we would have to force a checkpoint every 68GB?
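
For reference, the 64GB in the quoted recommendation and the 68 GB used
here look like the same number in binary and decimal units. A quick
check of n*2^(n/4) bits for n = 128, with a hypothetical rotation test
(needs_new_wdek is a made-up name, not an existing function):

```python
AES_BLOCK_BITS = 128  # n: AES block size, the same for AES-128/192/256

# n * 2^(n/2) bits: the birthday bound from the first quote.
birthday_bound_bytes = AES_BLOCK_BITS * 2 ** (AES_BLOCK_BITS // 2) // 8

# n * 2^(n/4) bits: the more conservative per-key limit from the second quote.
conservative_bytes = AES_BLOCK_BITS * 2 ** (AES_BLOCK_BITS // 4) // 8

print(conservative_bytes)                    # 68719476736
print(conservative_bytes / 2**30)            # 64.0  (GiB)
print(round(conservative_bytes / 10**9, 1))  # 68.7  (decimal GB)


def needs_new_wdek(bytes_encrypted_with_current_key: int) -> bool:
    """Hypothetical check: has the current WAL key reached the limit?"""
    return bytes_encrypted_with_current_key >= conservative_bytes
```

The birthday bound itself works out to 2^68 bytes, roughly 295 million
decimal terabytes, which is consistent with the "more than 250 millions
of terabytes" in the first quote.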

[HKDF]: https://tools.ietf.org/html/rfc5869
[KW]: https://tools.ietf.org/html/rfc3394
[KWP]: https://tools.ietf.org/html/rfc5649


> But AFAICS the 2-tier key scheme is primarily motivated by operational
> reasons, i.e. effort to rotate the master key etc. So I would not expect
> to find recommendations to use multiple keys in sources primarily
> dealing with cryptography.

It does in [2]


> One extra thing we should consider is authenticated encryption. We can't
> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> as that does not provide integrity protection (i.e. can't detect when
> the ciphertext was corrupted due to disk failure or intentionally). And
> we can't quite rely on checksums, because that checksums the plaintext
> and is stored encrypted.

I agree that authenticated encryption would be a good goal. I'm not sure
we need to require it for the first version, although it would mean
another option for the encryption type. That may be another good reason
to allow both AES 128 and AES 256 CTR/CBC in the first version, as it
will hopefully ensure that when we add different modes later it will be
less painful.

We could check the CRC prior to encryption and throw an ERROR if it is
not correct. After decryption we can check it again -- if it no longer
matches we would know there was a corruption or change of the
ciphertext, no?

Hmm, I guess the entire page of ciphertext could be faked including CRC,
so this would only really cover corruption, not an intentional change if
it were done properly.
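
A toy illustration of that last point, under loud assumptions: the
"cipher" below is a SHA-256-based XOR keystream standing in for a
CTR-style mode (it is not AES and not secure), and the page layout
(data followed by a big-endian CRC-32) is invented for the example; it
is not PostgreSQL's actual page format or checksum algorithm. It shows
that a random bit flip is caught, while someone who knows which
plaintext bytes they want to change can patch the ciphertext and the
encrypted CRC together without the key, because CRC-32 is affine over
XOR.

```python
import hashlib
import zlib


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in for a CTR-mode keystream (illustration only, NOT secure)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_page(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Checksum-then-encrypt: append CRC-32 of the plaintext, then encrypt."""
    plaintext = data + zlib.crc32(data).to_bytes(4, "big")
    return xor(plaintext, toy_keystream(key, iv, len(plaintext)))


def decrypt_and_check(key: bytes, iv: bytes, ciphertext: bytes):
    """Decrypt and report whether the stored CRC matches the data."""
    plaintext = xor(ciphertext, toy_keystream(key, iv, len(ciphertext)))
    data, stored = plaintext[:-4], int.from_bytes(plaintext[-4:], "big")
    return data, zlib.crc32(data) == stored


key, iv = b"k" * 32, b"i" * 16
page = b"balance=0000100;" * 4
ct = encrypt_page(key, iv, page)

# Accidental corruption: a single flipped bit fails the CRC check.
corrupted = bytes([ct[0] ^ 0x01]) + ct[1:]
_, ok_corrupted = decrypt_and_check(key, iv, corrupted)

# Deliberate change without the key: XOR in the wanted plaintext
# difference and fix up the CRC using
#   crc(a ^ b) == crc(a) ^ crc(b) ^ crc(zeros)   (equal lengths).
delta = bytearray(len(page))
delta[12] = ord("1") ^ ord("9")  # "0000100" -> "0000900" in the first record
delta = bytes(delta)
crc_fix = zlib.crc32(delta) ^ zlib.crc32(bytes(len(page)))
forged = xor(ct, delta + crc_fix.to_bytes(4, "big"))
forged_data, ok_forged = decrypt_and_check(key, iv, forged)

print(ok_corrupted)      # False
print(ok_forged)         # True
print(forged_data[:16])  # b'balance=0000900;'
```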

> Which seems pretty annoying, because then the checksums won't verify
> data as sent to the storage system, and verify checksums would require
> access to all keys (how do you do that in offline mode?).

Given the scheme above I don't see why that would be an issue. The keys
are all accessible via the MDEK, which is in turn available via the KEK.

> But the main issue with checksum-then-encrypt is it's essentially
> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
> in which case we'll need to store the MAC somewhere (probably in the
> same place as the nonce/IV/key/... for each page).


Yeah, that's why I think maybe this is a v2 feature.
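
For whenever that happens, a minimal sketch of the MAC half of
Encrypt-then-MAC, assuming HMAC-SHA256 as the MAC, a fixed 16-byte IV,
and an 8-byte page identifier. The function names, the separate
mac_key, and the choice to cover the page identifier and IV are
assumptions for illustration. The ciphertext is treated as opaque
bytes, so the sketch does not depend on which AES mode is used.

```python
import hashlib
import hmac

IV_LEN = 16   # assumed fixed IV length, so the MAC input is unambiguous
TAG_LEN = 32  # HMAC-SHA256 output size


def mac_page(mac_key: bytes, page_id: int, iv: bytes, ciphertext: bytes) -> bytes:
    """Compute the tag over the ciphertext (never the plaintext)."""
    if len(iv) != IV_LEN:
        raise ValueError("unexpected IV length")
    header = page_id.to_bytes(8, "big") + iv
    return hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()


def verify_page(mac_key: bytes, page_id: int, iv: bytes,
                ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag before any decryption is attempted."""
    expected = mac_page(mac_key, page_id, iv, ciphertext)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


mac_key = b"m" * 32                  # would be derived separately from the DEK
iv = bytes(IV_LEN)
ciphertext = b"\x8f" * 64            # placeholder for an encrypted page
tag = mac_page(mac_key, 42, iv, ciphertext)

tampered = b"\x8e" + ciphertext[1:]
print(verify_page(mac_key, 42, iv, ciphertext, tag))  # True
print(verify_page(mac_key, 42, iv, tampered, tag))    # False
print(verify_page(mac_key, 43, iv, ciphertext, tag))  # False
```

Because the tag is computed over what is actually written to storage,
verifying it needs the MAC key but no decryption. The extra TAG_LEN
bytes per page have to be stored somewhere, as noted above.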


> I've also stumbled upon [2], which is a nice doctoral thesis about disk
> encryption - in particular chapter 4 is a nice overview of the threat
> model and use cases. That guy also had a nice talk at FOSDEM 2018 about
> data dm-integrity etc. [3]
>
> [1] https://www.cosic.esat.kuleuven.be/school-iot/slides/AuthenticatedEncryptionII.pdf
> [2] https://is.muni.cz/th/vesfr/final.pdf
> [3] https://ftp.fau.de/fosdem/2018/Janson/cryptsetup.mp4

Awesome links -- thanks!

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



