Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Date
Msg-id CAD21AoD3_kEyE2V72OVnBrG+47tD1JpC1dp8q4R-XzvWm3UiBA@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  ("Jonathan S. Katz" <jkatz@postgresql.org>)
List pgsql-hackers
On Fri, Jul 26, 2019 at 2:18 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Jul 18, 2019 at 12:04:25PM +0900, Masahiko Sawada wrote:
> > I've re-considered the design of TDE feature based on the discussion
> > so far. The one of the main open question is the granular of
> > encryption objects: cluster encryption or more-granular-than-cluster
> > encryption. The followings describe about the new TDE design when we
> > choose table-level encryption or something-new-group-level encryption.
> >
> > General
> > ========
> > We will use AES and support both AES-128 and AES-256. User can specify
> > the new initdb option something like --aes-128 or --aes-256 to enable
> > encryption and must specify --encryption-key-passphrase-command along
> > with. (I guess we also require openssl library.) If these options are
> > specified, we write the key length to the control file and derive the
> > KEK and generate MDEK during initdb. wal_log_hints will be enabled
> > automatically in encryption mode, like we do for checksum mode,
>
> Agreed.  pg_control will store the none/AES128/AES256 indicator.
>
> > Key Management
> > ==============
> > We will use 3-tier key architecture as Joe proposed.
> >
> >   1. A master key encryption key (KEK): this is the ley supplied by the
> >      database admin using something akin to ssl_passphrase_command
> >
> >   2. A master data encryption key (MDEK): this is a generated key using a
> >      cryptographically secure pseudo-random number generator. It is
> >      encrypted using the KEK, probably with Key Wrap (KW):
> >      or maybe better Key Wrap with Padding (KWP):
> >
> >   3a. Per table data encryption keys (TDEK): use MDEK and HKDF to generate
> >       table specific keys.
>
> What is the value of a per-table encryption key?  How is HKDF derived?

Per-table encryption key is derived from MDEK with salt and its OID as
info. I think we can store salts for each encryption keys into the
separate file so that off-line tool also can read it.

> Are we still unclear if the 68GB limit is per encryption key or per
> encryption key/IV combination?

I think that 68GB refers to key+IV but I'll research that.

>
> >   3b. WAL data encryption keys (WDEK):  Similarly use MDEK and a HKDF to
> >       generate new keys when needed for WAL.
> >
> > We store MDEK to the plain file (say global/pgkey) after encrypted
> > with the KEK. I might want to store the hash of passphrase of the KEK
> > in order to verify the correctness of the given passphrase. However we
> > don't need to store TDEK and WDEK as we can derive them as needed. The
> > key file can be read by both backend processes and front-end tools.
>
> Yes, we need to verify the pass phrase.
>
> > When postmaster startup, it reads the key file and decrypts MDEK and
> > derive WDEK using key id for WDEK. WDEK is loaded to the key hash map
> > (keyid -> key) on the shared memory. Also we derive TDEK as needed
> > when reading tables or indexes and add it to the key hash map as well
> > if not exists.
> >
> > Buffer Encryption
> > ==============
> > We will use AES-CBC for buffer encryption. We will add key id (4byte)
>
> I think we might want to use CTR for this, and will post after this.
>
> > to after the pd_lsn(8byte) in PageHeaderData and we will not encrypt
> > first 16 byte of each pages so the LSN and key id can be used. We can
> > store an invalid key id to tell us that the table is not encrypted.
> > There two benefits of storing key id to the page header: offline tools
> > can get key id (and know the table is encrypted or not) and it's
> > helpful for online rekey in the future.
>
> I don't remember anyone suggesting different keys for different tables.
> How would this even be managed by the user?

I think it's still unclear whether we implement one key for whole
database cluster or different keys for different table as the first
version. I'm evaluating the performance overhead of the latter that
you concerned and will share it.

I prefer tablespace-level or something-new-group-level than
table-level but if we choose the latter we can create a new group of
tables that are encrypted with the same key. That is user create a
group and then associate tables to that group. Tablespace-level is
implemented in the patch I submitted before. Or it's just idea but
another idea could be to allow users to create encryption key object
first and then specify which tables are encrypted with which
encryption key in DDL. For example, user creates an encryption keys
with name by SQL function and creates an encrypted table by CREATE
TABLE ... WITH (encryption_key = 'mykey');.

>
> > I've considered to store IV and key id to a new fork but I felt that
> > it is complex because we will always need to have the fork on the
> > shared buffer when any pages of its main fork is written to the disk.
> > If almost buffers of the shared buffers are dirtied and theirs new
> > forks are not  loaded to the shared buffer, we might need to load the
> > new fork and write the page to the disk and then evict some pages,
> > over and over.
> >
> > We will use (page lsn, page number) to create a nonce. IVs are created
> > by encrypting the nonce with its TDEK.
>
> Agreed.
>
> > WAL Encryption
> > =============
> > We will use AES-CTR for WAL encryption and encrypt each WAL pages with WDEK.
> >
> > We will use WAL segment number to create a nonce. Similar to buffer
> > encryption, IVs are created using by the nonce and WDEK.
>
> Yes.  If there is concern about collision of table/index and WAL IVs, we
> can add a constant to the two uses, as Joe Conway mentioned.
>
> > If we want to support enabling or disabling encryption after initdb we
> > might want to have key id in the WAL page header.
> >
> > Front-end Tool Support
> > ==================
> > We will add --encryption-key-passphrase-command option to the
> > front-end tools that read database files or WAL segment files directly.
> > They can get KEK via --encryption-key-passphrase-command and get MDEK
> > by reading the key file. Also they can know the key length by checking
> > the control file. Since they can derive TDEK using by key id stored in
> > the page header they can decrypt database files. Similarly, they also
> > can decrypt WAL as they can know the key id of WDEK.
> >
> > Master Key Rotation
> > ================
> > We will support new command-line tool that rotates the master key
> > offline. It accepts --old-encryption-key-passphrase-command option and
> > --new-encryption-key-passphrase-command to get old KEK and new KEK
> > respectively. It decrypt MDEK with the old key and encrypt it with
> > the new key.
>
> That handles changing the passphrase, but what about rotating the
> encryption key?  Don't we want to support that, at least in offline
> mode?

Yeah, supporting rotating the encryption key is a good idea. Agreed.

After more thoughts, it's a just idea but I wonder if the first
implementation step of TDE for v13 could be key management module.
That is, (in 3-tier case) PostgreSQL gets KEK by passphrase command or
directly, and creates MDEK. User can create an encryption key with
name using by SQL function, and the key manager derives DEK and store
its salt to the disk. Also we have an internal interface to get an
encryption key.

The good point is not only to develop incrementally but also that if
PostgreSQL is able to manage (symmetric) encryption keys inside
database cluster and has interfaces to get and add keys, pgcrypt also
will be able to use it. That way, we will provide column-level TDE
first by combination of pgcrypt, triggers and views while keeping
encryption keys truly secret. After that we can add other level TDE
using the key management module. We would then be able to focus on how
to encrypt buffer and WAL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: Fetching timeline during recovery
Next
From: Bruce Momjian
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)