Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
List pgsql-hackers
On Mon, Jun 17, 2019 at 11:02 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Jun 17, 2019 at 08:39:27AM -0400, Joe Conway wrote:
> >On 6/17/19 8:29 AM, Masahiko Sawada wrote:
> >> From perspective of  cryptographic, I think the fine grained TDE would
> >> be better solution. Therefore if we eventually want the fine grained
> >> TDE I wonder if it might be better to develop the table/tablespace TDE
> >> first while keeping it simple as much as possible in v1, and then we
> >> can provide the functionality to encrypt other data in database
> >> cluster to satisfy the encrypting-everything requirement. I guess that
> >> it's easier to incrementally add encryption target objects rather than
> >> making it fine grained while not changing encryption target objects.
> >>
> >> FWIW I'm writing a draft patch of per tablespace TDE and will submit
> >> it in this month. We can more discuss the complexity of the proposed
> >> TDE using it.
> >
> >+1
> >
> >Looking forward to it.
> >
>
> Yep. In particular, I'm interested in those aspects:
>

Attached is a draft patch set for per-tablespace transparent
data-at-rest encryption. The patch set does not yet provide full
functionality. It includes:

* Per tablespace encryption
* Encryption and decryption of buffer data on disk I/O
* 2-tier key hierarchy and key rotation
* Temporary file encryption (based on the patch Antonin proposed)
* System catalog encryption
* Generic key management API and test module
* Simple TAP test

but doesn't include the following yet (I'm working on them):

* WAL encryption
* Replication support
* pg_upgrade support
* Documentation
* README

and doesn't support:

* SLRU data encryption
* other system file encryption (pg_twophase, pg_subtrans, backup_label etc)
* Server log encryption

Before explaining the details of the patches, let me share my thoughts
on the following points.

> (1) What's the proposed minimum viable product, and how do we expect to
> extend it with the more elaborate features. I don't expect perfect
> specification, but we should have some idea so that we don't paint
> ourselves in the corner.

I think the minimum viable product should support the following features.

* Fine-grained control over which objects are encrypted (not a single
key for the whole database cluster).
* Encrypting and decrypting tables (including system catalogs), indexes,
TOAST tables, WAL and temporary files on disk I/O.
* Passing a password, passphrase or encryption key to the postgres
server without the risk of it being written to files.
* Front-end programs shipped in the PostgreSQL source tree keep working
as much as possible.
* Key rotation

I think the following features could be added later.

* SLRU and other data encryption. I think we can use another
encryption key for this data.
* Support for other encryption algorithms. I don't have a concrete plan
yet, but it should not be hard to support other symmetric-key algorithms.
* Faster key rotation. It can be done by having a 2-tier key hierarchy.
* Integration with external key management services. The patch
implements this, but I'm sure there are other ways to integrate with
external key management services.

>
> (2) How does it affect recovery, backups and replication (both physical
> and logical)? That is, which other parts need to know the encryption keys
> to function properly?

If we encrypt whole 8kB WAL blocks (the cluster-wide encryption case),
it is not hard, because we just encrypt each block with a single key
before writing it to disk. On the other hand, if we encrypt only some
WAL records it could be hard: it requires changes around the WAL
assembly code so that it can obtain the encryption keys and encrypt WAL
data before inserting it into the WAL buffer. Since WAL is encrypted,
recovery needs to obtain all the encryption keys and decrypt the
encrypted WAL.

For streaming replication, since WAL senders basically don't need to
know the actual contents of WAL (although xlogreader needs to read the
WAL header for validation), they send WAL data in its encrypted state,
and the WAL receiver decrypts it. Therefore the encryption keys must
also be replicated. On the other hand, logical replication (and logical
decoding) needs to decrypt WAL data when decoding. Since logical
decoding is performed on the PostgreSQL server side, it's not hard to
obtain all the encryption keys. It can send change sets either
unencrypted or, if it encrypts them again, encrypted. We would change
the xlogreader code so that it can decrypt WAL. So I think logical
replication will be able to get WAL data in unencrypted form without
any special operation.

For backups, a physical backup must stay encrypted even if we take it
with pg_basebackup; otherwise we cannot protect the data against a
malicious backup operator. The encryption keys must also be backed up
together with it. Because this is data-at-rest encryption, logical
backups can be taken in unencrypted form. I think we would need nothing
special for backups.

>
> (3) What does it mean for external tools (pg_waldump, pg_upgrade,
> pg_rewind etc.)?

I think that this definitely affects at least pg_waldump, pg_upgrade,
pg_checksums and pg_rewind. By either changing the WAL format or giving
encryption keys to these programs, we can support pg_waldump and
pg_rewind even for an encrypted database. I prefer the former because
passing encryption keys to front-end programs carries a risk of key
leakage. It would also affect external tools that read or write
database files and WAL directly. For instance pg_rman, which is a
recovery management tool, reads database files and takes a backup
without the hole in each page. Such programs would need the encryption
keys.

Here are the details of the patches.

Usage
======
To enable the TDE feature, please specify the --with-openssl configure
option. Also, please set the kmgr_plugin_library GUC parameter in
postgresql.conf, which specifies the library for the key management
program. The patch includes contrib/kmgr_file, which is a test
program for key management and stores the master key on the local disk.
So for test purposes you can set kmgr_plugin_library = 'kmgr_file'.

After starting the postgres server, you can create an encrypted
tablespace by specifying the 'encryption' option, like this:

CREATE TABLESPACE enctblspc LOCATION '/path/to/tblsp' WITH (encryption = on);

And then the tables, indexes and TOAST tables created on the
tablespace will be encrypted at rest.

System catalogs on pg_default and global are not encrypted. If you want
to encrypt system catalogs, you need to create the database on an
encrypted tablespace. While copying the database files from the source
database, we encrypt or re-encrypt each system catalog.
You can enable or disable encryption of a table by moving it between
an encrypted tablespace and a non-encrypted tablespace.

Changes
=======

0001-Add-encryption-module-supporting-AES-256-by-using-op.patch

This patch is mostly based on the patch Antonin proposed[1], but I
modified some of its contents. This patch adds encryption and
decryption functions using openssl. It currently supports AES-256-XTS
for buffer data encryption and AES-256-CTR for WAL encryption.

0002-Add-kmgr-plugin-APIs.patch

This patch adds new generic key management APIs: startup, get,
generate, isexist and remove. Kmgr plugin programs can define these
primitive functions to manage the master key, which could be located on
an external server. The plugin program is specified by the
'kmgr_plugin_library' GUC parameter, and is loaded when the postmaster
starts up.
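
To make the shape of those five callbacks concrete, here is a rough
model in Python. This is only an illustration: the real plugin is a C
shared library, and the argument and return types below are my
assumptions, since the signatures are not spelled out in this mail.
The class name InMemoryKmgr is hypothetical.

```python
import os
from typing import Dict


class InMemoryKmgr:
    """Hypothetical model of a kmgr plugin keeping master keys in memory.

    The method names mirror the five callbacks listed above; the
    signatures are assumptions made for illustration only.
    """

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}
        self.started = False

    def startup(self) -> None:
        # Would run once, when the postmaster loads the plugin.
        self.started = True

    def generate(self, key_id: str) -> bytes:
        # Create and store a new random 256-bit master key under key_id.
        if key_id in self._keys:
            raise KeyError(f"master key {key_id!r} already exists")
        self._keys[key_id] = os.urandom(32)
        return self._keys[key_id]

    def get(self, key_id: str) -> bytes:
        # Return the master key stored under key_id.
        return self._keys[key_id]

    def isexist(self, key_id: str) -> bool:
        # Report whether a master key is stored under key_id.
        return key_id in self._keys

    def remove(self, key_id: str) -> None:
        # Forget the master key stored under key_id.
        del self._keys[key_id]
```

A plugin backed by an external server would keep the same five entry
points and replace the dictionary with calls to that server.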

0003-Add-key-management-module-for-transparent-data-encry.patch

This patch adds the key management module, which is responsible for
tablespace key management. All tablespace keys are persisted to a
file on disk, called the keyring file, and loaded into a hash table in
shared memory when the postmaster starts up. The tablespace keys in
shared memory are held unencrypted. Whenever an encrypted tablespace
is created or dropped, the keyring file is modified.

The master key identifier is used as the lookup key for the master key.
It consists of the system identifier and a sequence number starting from
0, like 'pg_master_key-6707524-0000'. The sequence number is incremented
on every key rotation.
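
As a small illustration of that identifier format, here is a sketch in
Python. The helper names are hypothetical, and the four-digit zero
padding is inferred from the single example above rather than taken
from the patch.

```python
def master_key_id(system_identifier: int, seqno: int) -> str:
    """Build an id like 'pg_master_key-6707524-0000'.

    The 4-digit zero padding is an assumption based on the example id.
    """
    return f"pg_master_key-{system_identifier}-{seqno:04d}"


def next_master_key_id(current: str) -> str:
    """Return the id that a key rotation would move to (sequence + 1)."""
    prefix, seq = current.rsplit("-", 1)
    return f"{prefix}-{int(seq) + 1:04d}"
```

For example, master_key_id(6707524, 0) gives the id quoted above, and
next_master_key_id() on it gives the same id with the sequence number
0001.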

On key rotation, we generate a new master key id in PostgreSQL core
and ask the kmgr plugin to generate a new master key identified by the
new master key id. We then update all tablespace keys in the keyring
file by re-encrypting them with the new master key.
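
The point of the 2-tier hierarchy is that rotation only re-wraps the
tablespace keys and never touches the table data. A minimal sketch of
that step in Python follows. Note that toy_wrap() is NOT a real cipher:
it is an XOR pad used as a stand-in for whatever key-wrapping the patch
actually uses, and the function names and the dict-based keyring are
assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict


def toy_wrap(master_key: bytes, tablespace_key: bytes) -> bytes:
    """XOR a 32-byte key with a pad derived from the master key.

    NOT secure; a stand-in for a real key-wrap cipher. Applying it a
    second time with the same master key unwraps the key again.
    """
    pad = hmac.new(master_key, b"keyring-wrap", hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(tablespace_key, pad))


def rotate(keyring: Dict[int, bytes], old_master: bytes,
           new_master: bytes) -> Dict[int, bytes]:
    """Re-wrap every tablespace key under the new master key.

    keyring maps a tablespace OID to its wrapped tablespace key. The
    tablespace keys themselves do not change, so no table data has to
    be re-encrypted.
    """
    rotated = {}
    for spcoid, wrapped in keyring.items():
        plain = toy_wrap(old_master, wrapped)  # unwrap with the old master key
        rotated[spcoid] = toy_wrap(new_master, plain)  # wrap with the new one
    return rotated
```

The cost of a rotation is therefore proportional to the number of
encrypted tablespaces, not to the amount of encrypted data.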

0004-Add-facility-to-give-process-local-encryption-key.patch

This patch adds functionality to get a process-local temporary key,
which is intended for temporary file encryption.

0005-Encrypt-and-decrypt-data-on-encrypted-tablespace-whe.patch

This patch supports buffer encryption: it encrypts and decrypts
database data on disk I/O. It adds new smgr callbacks smgrencrypt and
smgrdecrypt, and mdencrypt and mddecrypt. Please note that the patch
currently supports only heap and nbtree; I'm working on supporting
other access methods. Basically, when bufmgr reads or writes a buffer
through the shared buffers, the access methods don't need to care
about buffer encryption. However, when an access method itself writes
a buffer directly to disk, it needs to call smgrencrypt.

0006-Encrypt-buffile.patch

This is the patch proposed by Antonin. I haven't looked at it in
detail yet, but I will.

0007-Make-Reorderbuffer-encrypt-spilled-out-file.patch

Same as above.

0008-Support-tablespace-encryption.patch

This patch adds the 'encryption' option to tablespaces.

0009-Add-kmgr-plugin-test-module-kmgr_file.patch

This patch adds a test module for the kmgr plugin. It generates a
random master key string and stores it on the local disk. Since it
stores the master key without encryption, it is for test purposes only.
It also has a TAP test for TDE.

Feedback and comments are very welcome.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
