Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Sehrope Sarkuni
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id CAH7T-are4DWvunDWknRcUGGLgv8H2FgPmQ8TTajCoztEadK+iA@mail.gmail.com
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

Some more thoughts on CBC vs CTR modes. There are a number of
advantages to using CTR mode for page encryption.

CTR encryption modes can be fully parallelized, whereas CBC can only
be parallelized for decryption. While both can use AES-specific
hardware such as AES-NI, CTR modes can go a step further and use
vectorized instructions.

On an i7-8559U (with AES-NI) I get roughly a 4x speed improvement for
CTR-based modes vs CBC when run on 8K of data:

# openssl speed -evp ${cipher}
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc  1024361.51k  1521249.60k  1562033.41k  1571663.87k  1574537.90k  1575512.75k
aes-128-ctr   696866.85k  2214441.86k  4364903.85k  5896221.35k  6559735.81k  6619594.75k
aes-128-gcm   642758.92k  1638619.09k  3212068.27k  5085193.22k  6366035.97k  6474006.53k
aes-256-cbc   940906.25k  1114628.44k  1131255.13k  1138385.92k  1140258.13k  1143592.28k
aes-256-ctr   582161.82k  1896409.32k  3216926.12k  4249708.20k  4680299.86k  4706375.00k
aes-256-gcm   553513.89k  1532556.16k  2705510.57k  3931744.94k  4615812.44k  4673093.63k

For relation data where the encryption is going to be per page,
there's flexibility in how the CTR nonce (IV + counter) is generated.
With an 8K page, the counter only needs 512 distinct values per page
(8192 bytes per page / 16 bytes per AES block), which fits in 9 bits.
Rounding that up to 16 bits allows for larger pages and still uses
only two bytes of the 16-byte nonce while ensuring that it'd be unique
per AES block. The remaining 14 bytes would be populated with some
other data that is guaranteed unique per page write, which allows
encryption via the same per-relation-file derived key. From what I
gather, the LSN is a candidate, though it'd have to be stored in
plaintext for decryption.
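
A minimal sketch of that layout in Python, for illustration only. The
name counter_block, the byte order, and the choice to put the 14 unique
bytes first and the 16-bit counter last are my assumptions, not a
settled format:

```python
import struct

AES_BLOCK_SIZE = 16   # bytes per AES block
PAGE_SIZE = 8192      # 8K page, as in the text above


def counter_block(page_unique: bytes, block_no: int) -> bytes:
    """Build the 16-byte CTR input for one AES block of a page.

    page_unique: 14 bytes that must never repeat under a given key
                 (hypothetical; e.g. something derived from the LSN).
    block_no:    0-based index of the AES block within the page.
    """
    if len(page_unique) != 14:
        raise ValueError("page_unique must be exactly 14 bytes")
    if not 0 <= block_no < 1 << 16:
        raise ValueError("block_no must fit in 16 bits")
    # 14 unique bytes followed by a big-endian 16-bit block counter.
    return page_unique + struct.pack(">H", block_no)


blocks_per_page = PAGE_SIZE // AES_BLOCK_SIZE      # AES blocks in one page
bits_needed = (blocks_per_page - 1).bit_length()   # bits to index them
```

With 16 bits for the counter, the same layout would cover pages up to
65536 * 16 bytes without touching the 14 unique bytes.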

What's important is that writing any two pages (either to different
locations or the same page back again) never reuses the same nonce
with the same key. Using the same nonce with a different key is fine.
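
To illustrate why reuse matters: in CTR mode the ciphertext is the
plaintext XORed with a keystream determined by (key, nonce). The sketch
below uses HMAC-SHA256 as a stand-in keystream generator purely so it
runs with the Python standard library. It is not AES-CTR, and
toy_keystream is a made-up helper, but the XOR property it shows is the
same:

```python
import hashlib
import hmac


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for an AES-CTR keystream (NOT AES): HMAC-SHA256 over
    nonce || counter, concatenated until `length` bytes are produced."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key, nonce = b"k" * 32, b"n" * 14
p1 = b"first page contents "
p2 = b"second page content "
c1 = xor(p1, toy_keystream(key, nonce, len(p1)))
c2 = xor(p2, toy_keystream(key, nonce, len(p2)))

# Same key + same nonce: the keystream cancels out, so an observer
# holding only the two ciphertexts learns p1 XOR p2.
leaked = xor(c1, c2)

# Same nonce under a different key yields an unrelated keystream.
c3 = xor(p2, toy_keystream(b"K" * 32, nonce, len(p2)))
```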

With any of these schemes the same inputs will generate the same
outputs. With CTR mode for WAL this would be an issue if the same key
and deterministic nonce (e.g. LSN + offset) are reused in multiple
places. That does not have to be within the same cluster either. For
example, if two replicas are promoted from the same backup with the
same master key, they would generate the same WAL CTR stream, reusing
the key/nonce pair. Ditto for starting off with a master key and
deriving per-relation keys in a cloned installation off some
deterministic attribute such as the OID.

This can be avoided by deriving new keys per file (not just per
relation) from a random salt. The salt would be stored out of band and
combined with the master key to derive the specific key used for that
CTR stream. If there's a desire to support multiple ciphers or key
sizes, that choice could be stored alongside the salt. Perhaps the
same location, or the lack of it, could indicate "not encrypted" as
well.
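
As a rough sketch of the salt-plus-master-key derivation: I haven't
named a KDF above, so HKDF with SHA-256 (RFC 5869) is an assumption
here, as are the 16-byte salt, the info label, and the function names.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()          # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                    # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def new_file_key(master_key: bytes) -> tuple:
    """Generate a fresh random salt and derive that file's key from it.
    The salt is what would be stored out of band for the file."""
    salt = os.urandom(16)
    # "relation-file-key" is a hypothetical context label.
    return salt, hkdf_sha256(master_key, salt, b"relation-file-key")


master = b"\x01" * 32                 # placeholder master key
salt_a, key_a = new_file_key(master)
salt_b, key_b = new_file_key(master)  # a clone's new file gets its own salt
```

Two installations sharing a master key still end up with different
per-file keys, because each new file draws its own random salt.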

Per-file salts and derived keys would facilitate re-keying a table
piecemeal, file by file, by generating a new salt/derived key,
encrypting a copy of the decrypted file, and doing an atomic rename.
The file's contents would change but its length and any references to
pages or byte offsets would stay valid. (I think this would work for
CBC modes too as there's nothing CTR-specific about it.)
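
The copy-and-rename step could look something like the following. This
is only a sketch: decrypt_page and encrypt_page are hypothetical
callbacks bound to the old and new derived keys, and it ignores
everything a real server would have to coordinate (concurrent writers,
shared buffers, fsync of the directory).

```python
import os
import tempfile

PAGE_SIZE = 8192


def rekey_file(path, decrypt_page, encrypt_page, page_size=PAGE_SIZE):
    """Rewrite `path` page by page under a new key, then swap it in.

    decrypt_page(page_no, data) and encrypt_page(page_no, data) are
    hypothetical callbacks; each must return data of the same length.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            page_no = 0
            while True:
                page = src.read(page_size)
                if not page:
                    break
                dst.write(encrypt_page(page_no, decrypt_page(page_no, page)))
                page_no += 1
            dst.flush()
            os.fsync(dst.fileno())
        os.replace(tmp_path, path)   # atomic rename on POSIX
    except BaseException:
        # Leave the original file untouched if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because pages are rewritten one for one, the file length and all page
numbers and byte offsets are unchanged after the swap.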

What I'm not sure of is how to handle randomizing the relation file IV
in a cloned database. Until the key for a relation file or segment is
rotated, it'd have the same deterministic IV generated as its source,
as the LSN would continue from the same point. One idea is that with
128 bits for the IV, one could have 64 bits for the LSN, 16 bits for
the AES-block counter, and the remaining 48 bits randomized, though
you'd need to store those 48 bits somewhere per page (basically it's a
salt per page). That'd give some protection against the clone's new
data being encrypted with the same stream as the parent's. Another
option would be to track ranges of LSNs and have a centralized list of
48-bit randomized salts. That would remove the need for an additional
salt per page, though you'd have to do a lookup on that shared list to
figure out which to use.
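
A possible packing of that 128-bit IV, as a sketch. Only the field
widths come from the paragraph above; the field order, byte order, the
name build_iv, and the example LSN value are assumptions.

```python
import os
import struct


def build_iv(lsn: int, block_no: int, page_salt: bytes) -> bytes:
    """Pack a 128-bit IV: 64-bit LSN | 16-bit AES-block counter | 48-bit salt."""
    if len(page_salt) != 6:
        raise ValueError("page_salt must be 48 bits (6 bytes)")
    # ">QH" = big-endian unsigned 64-bit + unsigned 16-bit, 10 bytes total.
    return struct.pack(">QH", lsn, block_no) + page_salt


# The salt would be stored per page, or looked up from a shared list
# keyed by LSN range (both options from the text above).
salt = os.urandom(6)
iv = build_iv(0x16B3748, 0, salt)   # 0x16B3748 is an arbitrary example LSN
```

A clone that continues from the same LSN but draws fresh salts would
produce different IVs for its new writes.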

CTR mode is definitely more complicated than a pure random-IV + CBC
but with any deterministic generation of IVs for CBC mode you're going
to have some of these same problems as well.

Regarding CRCs, CTR mode has the advantage of not destroying the rest
of the stream when replacing the CRC bytes. With CBC mode any change
would cascade and corrupt the rest of the data downstream from that
block. With CTR mode you can overwrite the CRC's location with the CRC
or a truncated MAC of the encrypted data, as each byte is encrypted
separately. At decryption time you simply ignore the decrypted output
of those bytes and zero them out again. A CRC of encrypted data (but
not a partial MAC) could be checked offline without access to the key.
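
The CRC-over-ciphertext idea can be sketched as follows. zlib's CRC-32
and the slot offset are stand-ins chosen so the example runs with the
standard library; they are not PostgreSQL's actual page checksum or
header layout, and the function names are made up.

```python
import zlib

CRC_OFFSET = 8   # hypothetical location of the CRC slot within the page
CRC_SIZE = 4     # CRC-32 is 4 bytes


def stamp_crc(encrypted_page: bytes) -> bytes:
    """Overwrite the CRC slot of an already-encrypted page with a CRC-32
    computed over the ciphertext with that slot zeroed."""
    page = bytearray(encrypted_page)
    page[CRC_OFFSET:CRC_OFFSET + CRC_SIZE] = bytes(CRC_SIZE)
    crc = zlib.crc32(bytes(page))
    page[CRC_OFFSET:CRC_OFFSET + CRC_SIZE] = crc.to_bytes(CRC_SIZE, "big")
    return bytes(page)


def verify_crc(stored_page: bytes) -> bool:
    """Check a stored page offline; no encryption key is needed."""
    page = bytearray(stored_page)
    stored = int.from_bytes(page[CRC_OFFSET:CRC_OFFSET + CRC_SIZE], "big")
    page[CRC_OFFSET:CRC_OFFSET + CRC_SIZE] = bytes(CRC_SIZE)
    return zlib.crc32(bytes(page)) == stored
```

Only the slot bytes differ from the original ciphertext, so with CTR
every other byte still decrypts normally and the slot's decrypted
output is simply discarded and zeroed.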

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


