Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Date
Msg-id 20190715014759.nxmi5bdqsnewl6jf@development
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Joe Conway <mail@joeconway.com>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Sun, Jul 14, 2019 at 12:13:45PM -0400, Joe Conway wrote:
>On 7/13/19 5:58 PM, Tomas Vondra wrote:
>> On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>>>[2] also says it provides additional support for AES 256. It also mentions
>>>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>>>
>>>"Currently, the following pairs of encryption modes are supported:
>>>
>>>    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>>>    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>>>    Adiantum for both contents and filenames
>>>
>>>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>>>
>>>AES-128-CBC was added only for low-powered embedded devices with crypto
>>>accelerators such as CAAM or CESA that do not support XTS."
>>>---
>>>[2] also states this, which again makes me think in terms of table being
>>>the moral equivalent to a file:
>>>
>>>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>>>at the block device level. This allows it to encrypt different files
>>>with different keys and to have unencrypted files on the same
>>>filesystem. This is useful for multi-user systems where each user’s
>>>data-at-rest needs to be cryptographically isolated from the others.
>>>However, except for filenames, fscrypt does not encrypt filesystem
>>>metadata."
>
><snip>
>
>>>[5] has this to say which seems independent of mode:
>>>
>>>"When encrypting data with a symmetric block cipher, which uses blocks
>>>of n bits, some security concerns begin to appear when the amount of
>>>data encrypted with a single key comes close to 2^(n/2) blocks, i.e. n*2^(n/2)
>>>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>>>blocks). This means a limit of more than 250 million terabytes,
>>>which is sufficiently large not to be a problem. That's precisely why
>>>AES was defined with 128-bit blocks, instead of the more common (at that
>>>time) 64-bit blocks: so that data size is practically unlimited."
>>>
>>
>> FWIW I was a bit confused at first, because the copy paste mangled the
>> formulas a bit - it should have been 2^(n/2) and n*2^(n/2).
>
>Yeah, sorry about that.
>
>>>But goes on to say:
>>>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>>>reach that number of bits the probability of a collision will grow
>>>quickly and you will be way over 50% probability of a collision by the
>>>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>>>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>>>with the same key. In the case of AES that works out to 64GB"
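The arithmetic behind the two quoted bounds is easy to check. A minimal Python sketch, assuming n = 128 (the AES block size) and reading the quote's "64GB" as binary gibibytes. It also shows that the 64GB figure here and the 68 GB figure used later in the thread are the same 2^36 bytes, expressed in binary and decimal units:

```python
# Birthday-bound arithmetic for a 128-bit block cipher such as AES.
N_BITS = 128  # AES block size in bits (same for AES-128, AES-192 and AES-256)

# Rough collision bound: 2^(n/2) blocks, i.e. n * 2^(n/2) bits.
birthday_blocks = 2 ** (N_BITS // 2)
birthday_bytes = birthday_blocks * (N_BITS // 8)

# Conservative recommendation quoted above: at most n * 2^(n/4) bits per key.
safe_bits = N_BITS * 2 ** (N_BITS // 4)
safe_bytes = safe_bits // 8

TB = 10 ** 12   # decimal terabyte
GB = 10 ** 9    # decimal gigabyte
GIB = 2 ** 30   # binary gibibyte

print(f"birthday bound: {birthday_bytes / TB / 1e6:.0f} million TB")
print(f"conservative limit: {safe_bytes // GIB} GiB = {safe_bytes / GB:.1f} GB")
```

This prints a birthday bound of roughly 295 million TB (consistent with "more than 250 million terabytes") and a conservative limit of 64 GiB = 68.7 GB.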
>>>
>>>It is hard to say if that recommendation is per key or per key+IV.
>>
>> Hmm, yeah. The question is what collisions they have in mind? Presumably
>> it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
>> IV, so per key+IV.
>
>Seems likely.
>
>>>But I did find that files in an encrypted file system are encrypted with
>>>derived keys from a master key, and I view this as analogous to what we
>>>are doing.
>>>
>>
>> My understanding always was that we'd do something like that, i.e. we'd
>> have a master key (or perhaps multiple of them, for various users), but
>> the data would be encrypted with secondary (generated) keys, and those
>> secondary keys would be encrypted by the master key. At least that's
>> what was proposed at the beginning of this thread by Insung Moon.
>
>In my email I linked the wrong page for [2]. The correct one is here:
>[2] https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html
>
>Following that, I think we could end up with three tiers:
>
>1. A master key encryption key (KEK): this is the key supplied by the
>   database admin using something akin to ssl_passphrase_command
>
>2. A master data encryption key (MDEK): this is a generated key using a
>   cryptographically secure pseudo-random number generator. It is
>   encrypted using the KEK, probably with Key Wrap (KW) [KW],
>   or maybe better Key Wrap with Padding (KWP) [KWP].
>
>3a. Per-table data encryption keys (TDEK): use MDEK and HKDF to generate
>    table-specific keys.
>
>3b. WAL data encryption keys (WDEK): similarly, use MDEK and HKDF to
>    generate new keys when needed for WAL (based on the other info, we
>    need to change WAL keys every 68 GB, unless I read that wrong).
>
>I believe that would allow us to have multiple keys, all derived
>securely from the one MDEK using available info, similar to the
>way we intend to use the LSN to derive the IVs -- perhaps table.oid for
>tables and something else for WAL.
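A minimal sketch of tiers 3a/3b, assuming HKDF-SHA256 per RFC 5869, implemented with Python's standard hmac module. The "pg-tdek"/"pg-wdek" labels, the big-endian OID/generation encoding, and the helper names table_key() and wal_key() are hypothetical and not part of any agreed design. Wrapping the MDEK with the KEK (KW/KWP) needs AES and is left out:

```python
import hashlib
import hmac
import os
import struct

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 HKDF-Extract: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default: HashLen zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF-Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key material too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


# Hypothetical context encodings: the "info" input is what separates keys.
def table_key(mdek: bytes, table_oid: int, key_len: int = 32) -> bytes:
    """Tier 3a: per-table key derived from the MDEK and the table OID."""
    return hkdf(mdek, b"", b"pg-tdek" + struct.pack(">I", table_oid), key_len)


def wal_key(mdek: bytes, generation: int, key_len: int = 32) -> bytes:
    """Tier 3b: WAL key for a given key generation."""
    return hkdf(mdek, b"", b"pg-wdek" + struct.pack(">Q", generation), key_len)


if __name__ == "__main__":
    mdek = os.urandom(32)  # tier 2: random MDEK (would be stored wrapped by the KEK)
    print(table_key(mdek, 16384).hex())
    print(wal_key(mdek, 0).hex())
```

Because the info parameter provides domain separation, one MDEK yields unrelated keys per table OID and per WAL key generation, so only the wrapped MDEK has to be persisted.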
>
>We also need to figure out how/when to generate new WDEK. Maybe every
>checkpoint, also meaning we would have to force a checkpoint every 68GB?
>

I think that very much depends on what exactly the 68GB refers to - key
or key+IV? If key+IV, then I suppose we can use the LSN as IV and we would
not need to switch keys (or force checkpoints) at all. And even if we do
need new keys, it's not clear to me why we would need to force checkpoints.
Surely we can just write a WAL message about switching to the new key, or
something like that?
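To illustrate logging a key switch instead of forcing a checkpoint, here is a rough Python sketch. It assumes the 64 GiB (about 68 GB) budget applies per key, which is exactly the open question above. The KEY_SWITCH record, the byte-counting policy, and derive_wdek() are all hypothetical; for brevity the key derivation uses HMAC-SHA256 directly as a PRF rather than full HKDF:

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass, field

# Assumed per-key budget: n * 2^(n/4) bits for n = 128, i.e. 2^36 bytes (64 GiB).
WAL_KEY_BYTE_LIMIT = 2 ** 36


def derive_wdek(mdek: bytes, generation: int) -> bytes:
    """Derive the WAL key for one generation (HMAC-SHA256 as a PRF; label is hypothetical)."""
    return hmac.new(mdek, b"wdek" + struct.pack(">Q", generation), hashlib.sha256).digest()


@dataclass
class WalKeyState:
    mdek: bytes
    limit: int = WAL_KEY_BYTE_LIMIT
    generation: int = 0
    bytes_used: int = 0
    records: list = field(default_factory=list)  # stand-in for the WAL stream

    def current_key(self) -> bytes:
        return derive_wdek(self.mdek, self.generation)

    def write(self, payload_len: int) -> None:
        if payload_len > self.limit:
            raise ValueError("single record larger than the per-key limit")
        if self.bytes_used + payload_len > self.limit:
            # Log a key-switch record instead of forcing a checkpoint. It carries
            # only the generation number, not key material, so recovery can
            # re-derive the new key from the MDEK when it replays this record.
            self.generation += 1
            self.bytes_used = 0
            self.records.append(("KEY_SWITCH", self.generation))
        # A real implementation would encrypt the payload with current_key() here.
        self.bytes_used += payload_len
        self.records.append(("DATA", self.generation, payload_len))
```

With a tiny limit, for example WalKeyState(mdek=b"\x01" * 32, limit=100), two 60-byte writes produce a DATA record under generation 0, a KEY_SWITCH to generation 1, and a DATA record under generation 1.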

>[HKDF]: https://tools.ietf.org/html/rfc5869
>[KW]: https://tools.ietf.org/html/rfc3394
>[KWP]: https://tools.ietf.org/html/rfc5649
>
>
>> But AFAICS the 2-tier key scheme is primarily motivated by operational
>> reasons, i.e. effort to rotate the master key etc. So I would not expect
>> to find recommendations to use multiple keys in sources primarily
>> dealing with cryptography.
>
>It does in [2]: fscrypt encrypts files with keys derived from a master key.
>
>
>> One extra thing we should consider is authenticated encryption. We can't
>> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> as that does not provide integrity protection (i.e. we can't detect when
>> the ciphertext was corrupted, whether by disk failure or intentionally). And
>> we can't quite rely on checksums, because the checksum is computed over the
>> plaintext and is stored encrypted.
>
>I agree that authenticated encryption would be a good goal. I'm not sure
>we need to require it for the first version, although it would mean
>another option for the encryption type. That may be another good reason
>to allow both AES 128 and AES 256 CTR/CBC in the first version, as it
>will hopefully ensure that when we add different modes later it will be
>less painful.
>
>We could check the CRC prior to encryption and throw an ERROR if it is
>not correct. After decryption we can check it again -- if it no longer
>matches we would know there was a corruption or change of the
>ciphertext, no?
>
>Hmm, I guess the entire page of ciphertext could be faked including CRC,
>so this would only really cover corruption, not an intentional change if
>it were done properly.
>

I don't think any of the schemes discussed here provides protection
against this sort of replay attack (i.e. replacing a page with an older
copy of the page). That would probably require having some global
checksum or something like that.

>> Which seems pretty annoying, because then the checksums won't verify
>> data as sent to the storage system, and verifying checksums would require
>> access to all keys (how do you do that in offline mode?).
>
>Given the scheme above I don't see why that would be an issue. The keys
>are all accessible via the MDEK, which is in turn available via the KEK.
>

I just don't know how the offline tools will access the KMS to get the
keys. Maybe that's not an issue, but even then I think it's kinda
against the idea of checksums that they would not checksum what was sent
to the storage system.

>> But the main issue with checksum-then-encrypt is it's essentially
>> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
>> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
>> in which case we'll need to store the MAC somewhere (probably in the
>> same place as the nonce/IV/key/... for each page).
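A minimal Encrypt-then-MAC sketch, assuming HMAC-SHA256 as the MAC, a fixed 16-byte IV, and a MAC key separate from the encryption key. The function names are hypothetical, and since the Python standard library has no AES, random bytes stand in for an encrypted page. The tag would be stored alongside the page's IV/nonce metadata. Note that it does not stop the replay of an older page copy discussed above, because the old copy still carries a valid tag:

```python
import hashlib
import hmac
import os

IV_LEN = 16  # assumed fixed IV/nonce size, so IV || ciphertext is unambiguous


def mac_page(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: authenticate the IV and the *ciphertext*, not the plaintext."""
    if len(iv) != IV_LEN:
        raise ValueError("unexpected IV length")
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()


def verify_page(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag before decrypting; a mismatch means corruption or tampering."""
    return hmac.compare_digest(mac_page(mac_key, iv, ciphertext), tag)


if __name__ == "__main__":
    mac_key = os.urandom(32)        # independent of the encryption key
    iv = os.urandom(IV_LEN)
    ciphertext = os.urandom(8192)   # stand-in for an encrypted 8kB page
    tag = mac_page(mac_key, iv, ciphertext)
    print(verify_page(mac_key, iv, ciphertext, tag))  # True
    tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
    print(verify_page(mac_key, iv, tampered, tag))    # False
```

Because the tag covers the ciphertext, it can be verified without decrypting, which also fits the concern above about checksums not covering what was actually sent to storage.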
>
>
>Yeah, that's why I think maybe this is a v2 feature.
>

Maybe - as long as we design it with enough flexibility to enable it
later, that might work. That depends on where we store the metadata,
etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


