Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) - Mailing list pgsql-hackers

On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
>On 7/13/19 9:38 AM, Joe Conway wrote:
>> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>>> > Our first implementation will encrypt the entire cluster.  We can later
>>>> > consider encryption per table or tablespace.  It is unclear if
>>>> > encrypting different parts of the system with different keys is useful
>>>> > or feasible.  (This is separate from key rotation.)
>>>>
>>>> I still object strongly to using a single key for the entire database. I
>>>> think we can use a single key for WAL, but we need some way to split the
>>>> heap so that multiple keys are used. If not by tablespace, then some
>>>> other method.
>>>
>>> What do you base this on?
>
>Ok, so here we go. See links below. I skimmed through the entire thread
>and FWIW it was exhausting.
>
>To some extent this degenerated into a general search for relevant
>information:
>
>---
>[1] and [2] show that at least some file system encryption uses a
>different key per file.
>---
>[2] also shows that file system encryption uses a KDF (key derivation
>function) which we may want to use ourselves. The analogy would be
>per-table derived key instead of per file derived key. Note that KDF is
>a safe way to derive a key and it is not the same as a "related key"
>which was mentioned on another email as an attack vector.
>---
>[2] also provides additional support for AES 256. It also mentions
>CBC versus XTS -- I came across this elsewhere and it bears discussion:
>
>"Currently, the following pairs of encryption modes are supported:
>
>    AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>    AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>    Adiantum for both contents and filenames
>
>If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>
>AES-128-CBC was added only for low-powered embedded devices with crypto
>accelerators such as CAAM or CESA that do not support XTS."
>---
>[2] also states this, which again makes me think in terms of table being
>the moral equivalent to a file:
>
>"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
>at the block device level. This allows it to encrypt different files
>with different keys and to have unencrypted files on the same
>filesystem. This is useful for multi-user systems where each user’s
>data-at-rest needs to be cryptographically isolated from the others.
>However, except for filenames, fscrypt does not encrypt filesystem
>metadata."
>---
>[3] suggests 68 GB per key and unique IV in GCM mode.
>---
>[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
>directly to our proposal to use CTR for WAL.
>---
>[5] has this to say which seems independent of mode:
>
>"When encrypting data with a symmetric block cipher, which uses blocks
>of n bits, some security concerns begin to appear when the amount of
>data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
>bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
>blocks). This means a limit of more than 250 millions of terabytes,
>which is sufficiently large not to be a problem. That's precisely why
>AES was defined with 128-bit blocks, instead of the more common (at that
>time) 64-bit blocks: so that data size is practically unlimited."
>

FWIW I was a bit confused at first, because the copy paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

>But goes on to say:
>"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
>reach that number of bits the probability of a collision will grow
>quickly and you will be way over 50% probability of a collision by the
>time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
>collision negligible I recommend encrypting no more than n*2^(n/4) bits
>with the same key. In the case of AES that works out to 64GB"
>
>It is hard to say if that recommendation is per key or per key+IV.

Hmm, yeah. The question is what collisions they have in mind? Presumably
it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
IV, so per key+IV.
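
As a quick sanity check of those numbers, here is a small Python sketch.
It is an illustration only: it uses the standard birthday approximation
p ~ 1 - exp(-k^2 / 2^(n+1)) for k random n-bit blocks, and the function
name is made up for this example.

```python
import math


def birthday_collision_probability(blocks: int, block_bits: int = 128) -> float:
    """Approximate chance that any two of `blocks` random n-bit values collide."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(blocks * blocks) / 2 ** (block_bits + 1))


n = 128  # AES block size in bits (same for AES-128/192/256)

bound_bits = n * 2 ** (n // 2)         # n*2^(n/2) bits
conservative_bits = n * 2 ** (n // 4)  # n*2^(n/4) bits

bound_bytes = bound_bits // 8                # 2^68 bytes
conservative_bytes = conservative_bits // 8  # 2^36 bytes = 64 GiB

print("n*2^(n/2):", bound_bytes / 10**12 / 10**6, "million TB")
print("n*2^(n/4):", conservative_bytes / 2**30, "GiB =",
      conservative_bytes / 10**9, "GB")

for label, blocks in [("2^(n/4) blocks", 2 ** (n // 4)),
                      ("2^(n/2) blocks", 2 ** (n // 2)),
                      ("2*2^(n/2) blocks", 2 * 2 ** (n // 2))]:
    print(label, "-> collision probability ~",
          birthday_collision_probability(blocks, n))
```

Note that 2^36 bytes is 64 GiB, i.e. about 68.7 GB in decimal units, so
the "64GB" here and the "68 GB" figures quoted for [3] and [4] may well
be the same limit expressed in different units.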

>---
>[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
>imply a single key is used although at one point it says "transparent
>data encryption master key, also known as the transparent data
>encryption protector". The term "master key" indicates that they likely
>use derived keys under the covers.
>---
>[7] is a generally useful read about how many of the things we have been
>discussing are done in SQL Server
>---
>[8] was referenced by Sehrope. In addition to support for AES 256 for
>long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
>but not "future".
>---
>[9] IETF RFC for KDF
>---
>[10] IETF RFC for Key wrapping -- this is probably how we should wrap
>the master key with the Key Encryption Key (KEK) -- i.e. the outer key
>provided by the user or command on postmaster start
>---
>
>Based on all of that I cannot find a requirement that we use more than
>one key per database.
>
>But I did find that files in an encrypted file system are encrypted with
>derived keys from a master key, and I view this as analogous to what we
>are doing.
>

My understanding always was that we'd do something like that, i.e. we'd
have a master key (or perhaps multiple of them, for various users), but
the data would be encrypted with secondary (generated) keys, and those
secondary keys would be encrypted by the master key. At least that's
what was proposed at the beginning of this thread by Insung Moon.

But AFAICS the 2-tier key scheme is primarily motivated by operational
reasons, i.e. effort to rotate the master key etc. So I would not expect
to find recommendations to use multiple keys in sources primarily
dealing with cryptography.
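
For comparison, the derived-key variant Joe describes (per-table keys
derived from a master key with a KDF) could look roughly like the sketch
below. It assumes HKDF with SHA-256 as specified in RFC 5869, which is
only one possible KDF, and derive_table_key, the "table-key:" label and
the use of the table OID as context are hypothetical choices made for
this example. This is derivation, not the generate-and-wrap scheme
described above.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a PRK."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key material is too long for HKDF")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_table_key(master_key: bytes, table_oid: int,
                     salt: bytes = b"") -> bytes:
    """Hypothetical helper: derive a 256-bit key bound to one table OID."""
    prk = hkdf_extract(salt, master_key)
    info = b"table-key:" + table_oid.to_bytes(4, "big")
    return hkdf_expand(prk, info, 32)


# Example: two tables get independent-looking keys from one master key.
master = bytes(range(32))  # placeholder master key, for illustration only
key_a = derive_table_key(master, 16384)
key_b = derive_table_key(master, 16385)
print(key_a.hex())
print(key_b.hex())
```

With this approach nothing per-table has to be stored, but rotating the
master key changes every derived key, which is exactly the operational
cost the wrapped-key scheme avoids.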

One extra thing we should consider is authenticated encryption. We can't
just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
as that does not provide integrity protection (i.e. can't detect when
the ciphertext was corrupted due to disk failure or intentionally). And
we can't quite rely on checksums, because the checksum is computed over
the plaintext and is itself stored encrypted.

Which seems pretty annoying, because then the checksums won't verify
data as sent to the storage system, and verifying checksums would
require access to all keys (how do you do that in offline mode?).

But the main issue with checksum-then-encrypt is it's essentially
"MAC-then-Encrypt" and that does not provide Authenticated Encryption
security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
in which case we'll need to store the MAC somewhere (probably in the
same place as the nonce/IV/key/... for each page).
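
A minimal sketch of the Encrypt-then-MAC ordering for a single page, in
Python. Everything here is an assumption made for illustration: the
names seal_page/open_page are made up, separate encryption and MAC keys
are assumed, and because the Python standard library has no AES the
"cipher" is a toy keystream built from HMAC-SHA256 in counter mode. It
only stands in for AES-CTR/XTS and must not be used as a real cipher.

```python
import hashlib
import hmac
import os


def _toy_keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for a real cipher: HMAC-SHA256 in counter mode (NOT AES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal_page(enc_key: bytes, mac_key: bytes, nonce: bytes,
              plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt first, then MAC the nonce and ciphertext."""
    stream = _toy_keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def open_page(enc_key: bytes, mac_key: bytes, nonce: bytes,
              ciphertext: bytes, tag: bytes) -> bytes:
    """Verify the MAC over the ciphertext before decrypting anything."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("page failed integrity check")
    stream = _toy_keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Example round trip for one 8kB page.
enc_key, mac_key = os.urandom(32), os.urandom(32)
nonce = os.urandom(16)
page = os.urandom(8192)
ciphertext, tag = seal_page(enc_key, mac_key, nonce, page)
assert open_page(enc_key, mac_key, nonce, ciphertext, tag) == page
```

Since the tag covers the ciphertext, checking it needs only the MAC
key, so an offline verification tool could in principle validate pages
as stored on disk without access to the data encryption keys.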

I've also stumbled upon [2], which is a nice doctoral thesis about disk
encryption - in particular chapter 4 is a nice overview of the threat
model and use cases. That guy also had a nice talk at FOSDEM 2018 about
data dm-integrity etc. [3]

[1] https://www.cosic.esat.kuleuven.be/school-iot/slides/AuthenticatedEncryptionII.pdf

[2] https://is.muni.cz/th/vesfr/final.pdf

[3] https://ftp.fau.de/fosdem/2018/Janson/cryptsetup.mp4


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


