Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id 20190815151913.GC25063@momjian.us
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
List pgsql-hackers
On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > I can work on it right away but don't know where to start.
> >
> > I think the big open question is whether there will be acceptance of an
> > all-cluster encryption feature.  I guess if no one objects, we can move
> > forward.
> >
> 
> I still feel that we need to have per table/tablespace keys although
> it might not be in the first implementation. I think the security of both
> table/tablespace level and cluster level would be almost the same but
> the former would have an advantage in terms of operation and
> performance.

I assume you are talking about my option #1.  I can see if you only need
a few tables encrypted, e.g., credit card numbers, it can be excessive
to encrypt the entire cluster.  (I think you would need to encrypt
pg_statistic too.)

The tricky part will be WAL --- if we encrypt all of WAL, the per-table
overhead might be minimal compared to the WAL encryption overhead.  The
better solution would be to add a flag to WAL records to indicate
encrypted entries, but you would then leak both when an encryption change
happens and the WAL record length.  (FYI, numeric values have different
lengths, as do character strings.)  I assume we would still use a single
key for all tables/indexes, and one for WAL, plus key rotation
requirements.

I personally would like to see full cluster implemented first to find
out exactly what the overhead is.  As I stated earlier, the overhead of
determining which things to encrypt, in code complexity, user
interface, and processing, might not be worth it.

I can see why you would think that encrypting less would be easier than
encrypting more, but security boundaries are hard to construct, and
anything that requires a user API, even more so.

> > > At least it should be clear how [2] will retrieve the master key because [1]
> > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > mentioned in [3] seems viable, although I think [1] uses an approach that is
> > > more convenient if the passphrase should be read from console.)
> 
> I think that we can also provide a way to pass the encryption key directly
> to the postmaster rather than using a passphrase. Since it's common for
> users to store keys in a KMS, it would be useful if we could do that.

Why would it not be simpler to have the cluster_passphrase_command run
whatever command-line program it wants?  If you don't want to use a
shell command, create an executable and call that.
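
As a rough sketch of what I mean, assuming the server just runs the configured command through the shell and takes its standard output (minus the trailing newline) as the passphrase, the logic could look like this.  The helper name run_passphrase_command is made up for illustration, and the real server-side handling is not something this thread has specified.

```python
import subprocess


def run_passphrase_command(command: str) -> str:
    """Run a shell command and return its stdout as the passphrase.

    Hypothetical helper: how the server would really invoke
    cluster_passphrase_command is an assumption, not taken from the patch.
    """
    result = subprocess.run(
        command,
        shell=True,           # the setting is an arbitrary shell string
        capture_output=True,
        text=True,
        check=True,           # non-zero exit raises CalledProcessError
    )
    # Strip only the trailing newline that tools such as echo append.
    passphrase = result.stdout.rstrip("\r\n")
    if not passphrase:
        raise ValueError("passphrase command produced no output")
    return passphrase


if __name__ == "__main__":
    print(run_passphrase_command('echo "secret password"'))
```

Anything that can write a passphrase or key to standard output, including a small executable that talks to a KMS, would fit behind the same interface.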

> > > Rotation of
> > > the master key is another thing that both versions of the feature should do in
> > > the same way. And of course, the frontend applications need a consistent approach
> > > too.
> >
> > I don't see the value of an external library for key storage.
> 
> I think the big benefit is that PostgreSQL can seamlessly work with
> external services such as KMS. For instance, during key rotation,
> PostgreSQL can register a new key with the KMS and use it, and it can
> remove keys when they are no longer necessary. That is, it enables
> PostgreSQL not only to get keys from the KMS but also to register and
> remove them. We can also decrypt the MDEK in the KMS instead of in
> PostgreSQL, which is safer. In addition, once someone creates the
> plugin library for an external service, individual projects don't need
> to create it themselves.

I think the big win for an external library is when you don't want the
overhead of calling an external program.  For example, we certainly
would not want to call an external program while processing a query.  Do
we have any such requirements for encryption, especially since we only
are going to allow offline mode for encryption mode changes and key
rotation in the first version?

> BTW I've created a PoC patch for the cluster encryption feature. The attached
> patch set completes some items on the TODO list, and some of them can be
> used even for finer-granularity encryption. Anyway, the implemented
> components are as follows:

Nice, thanks.

> * Initialization stuff (initdb support). initdb has new command line
> options: --enc-cipher and --cluster-passphrase-command. --enc-cipher
> option accepts either aes-128 or aes-256 values while
> --cluster-passphrase-command accepts an arbitrary command. ControlFile
> has an integer indicating cluster encryption support, 'off', 'aes-128'
> or 'aes-256'.

Nice.  If we get agreement we want to do this for PG 13, we can start
applying these patches.

> * 3-tier encryption keys. During initdb we create the KEK and MDEK and
> write the metadata file (global/pg_kmgr). When the postmaster starts up,
> it reads the kmgr file, verifies the passphrase using HMAC, unwraps the
> MDEK, and derives the TDEK and WDEK from the MDEK. Currently the MDEK, TDEK
> and WDEK are stored in shared memory as this is still a PoC, but we could
> also keep them in process-local memory.

Uh, I thought we were going to have the TDEK and WDEK be created
separately, rather than derived from a single key, so we could do key
rotation on them independently, which might help with promoting standby
servers.

For example, someone could create a standby, rotate the TDEK right away,
then, once the standby is promoted, they can rotate the WDEK and have a
server that never reuses keys from the old primary.  Is that not a
use case worth worrying about?  Maybe we need to discuss that more.
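
To make the difference concrete, here is a minimal sketch contrasting the two designs.  It assumes HKDF with SHA-256 and the labels "TDEK" and "WDEK" for the derived case; the labels, key sizes, and function names are all illustrative assumptions and are not taken from the PoC patch.

```python
import hashlib
import hmac
import os


def hkdf_sha256(key: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with SHA-256 and no salt."""
    # Extract: with no salt, RFC 5869 uses a string of HashLen zero bytes.
    prk = hmac.new(b"\x00" * 32, key, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(mdek: bytes) -> tuple[bytes, bytes]:
    """Derived design: both keys are a pure function of the MDEK."""
    return hkdf_sha256(mdek, b"TDEK"), hkdf_sha256(mdek, b"WDEK")


def rotate_independent(keys: dict[str, bytes], name: str) -> dict[str, bytes]:
    """Separate-key design: replace one key and leave the others alone."""
    rotated = dict(keys)
    rotated[name] = os.urandom(32)
    return rotated


if __name__ == "__main__":
    # Derived: the only way to get a new TDEK is a new MDEK, and that
    # changes the WDEK too.
    old_tdek, old_wdek = derive_keys(os.urandom(32))
    new_tdek, new_wdek = derive_keys(os.urandom(32))
    print("derived: WDEK changed along with TDEK:", new_wdek != old_wdek)

    # Separate: rotating the TDEK leaves the WDEK untouched.
    keys = {"TDEK": os.urandom(32), "WDEK": os.urandom(32)}
    rotated = rotate_independent(keys, "TDEK")
    print("separate: WDEK unchanged:", rotated["WDEK"] == keys["WDEK"])
```

With derived keys, the TDEK and WDEK can only ever change together; with separate keys, each one can be rotated on its own schedule.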

Oh, here's an even better reason to use separate, non-derived keys for
TDEK and WDEK.  How would you rotate keys for a primary server and its
standbys?  If the TDEK and WDEK are derived from the same key, you could
not modify the TDEK independently of the WDEK.  However, if they are
decoupled, you could shut down and rotate the TDEK of each standby, then
switch-over to a standby and rotate the TDEK on the old primary.  Once
you have rotated all the TDEK keys, you could shut down all servers and
quickly rotate the WDEK.  (The WDEK has to be the same for streaming
replication to work.)
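
The procedure above can be sketched as a small simulation.  This is only a model of the ordering of steps under the assumption that each server has its own TDEK and all servers share one WDEK; the Server class and rolling_rotation function are hypothetical and do not correspond to any PostgreSQL tool.

```python
import os
from dataclasses import dataclass, field


def new_key() -> bytes:
    """Generate a fresh random 256-bit key."""
    return os.urandom(32)


@dataclass
class Server:
    name: str
    wdek: bytes                                   # shared by all servers
    tdek: bytes = field(default_factory=new_key)  # local to each server
    running: bool = True

    def rotate_tdek(self) -> None:
        # Rotation is modeled as offline-only: stop, re-key, restart.
        self.running = False
        self.tdek = new_key()
        self.running = True


def rolling_rotation(primary: Server, standbys: list[Server]):
    """Rotate every TDEK one server at a time, then the shared WDEK."""
    if not standbys:
        raise ValueError("need at least one standby to switch over to")
    # 1. Shut down and rotate the TDEK of each standby in turn.
    for standby in standbys:
        standby.rotate_tdek()
    # 2. Switch over to a standby, then rotate the old primary's TDEK.
    new_primary, others = standbys[0], standbys[1:]
    primary.rotate_tdek()
    servers = [new_primary, primary, *others]
    # 3. Shut down all servers and give them the same new WDEK.
    shared_wdek = new_key()
    for server in servers:
        server.running = False
    for server in servers:
        server.wdek = shared_wdek
    for server in servers:
        server.running = True
    return new_primary, [primary, *others]
```

Only step 3 needs every server down at once, which is why the WDEK rotation has to be quick.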

> * All cryptographic functions are implemented using OpenSSL. Since
> HKDF and key wrap were introduced in OpenSSL 1.1.0, it requires
> 1.1.0 or higher.

Sure.

> * Buffer encryption. All table and index data except for vm and fsm
> is transparently encrypted.

Nice.

> Missing features so far are as follows:
> 
> * WAL encryption
> * Temporary file encryption
> * Command-line tool to change passphrase (KEK key rotation)

I think we need the command-line tool to also rotate TDEK and WDEK, if
we go in that direction.

> * Front-end tool support (pg_waldump, pg_rewind)
> * Documentation
> * Regression tests
> 
> Since some of the above items are already implemented in other patches, we
> can use them.
> 
> We can create a database cluster with cluster encryption enabled as follows:
> 
> $ initdb -D data --enc-cipher=aes-128 \
>     --cluster-passphrase-command='echo "secret password"'
> $ pg_controldata | grep encryption
> Data encryption cipher:               aes-128
> $ pg_ctl start

Nice!

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


