Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Date
Msg-id 20190706160514.b67q4f7abcxfdahk@momjian.us
Whole thread Raw
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
List pgsql-hackers
On Fri, Jul  5, 2019 at 10:24:39PM +0200, Tomas Vondra wrote:
> I agree this is a pretty crucial challenge, and those requirements seem
> in direct conflict. Users use encryption to protect privacy of the data,
> but we need access to some of the data to implement some of the
> important tasks of a RDBMS.
> 
> And it's not just about things like recovery or replication. How do you
> do ANALYZE on encrypted data? Sure, if a user runs it in a session that
> has the right key, that's fine. But what about autovacuum/autoanalyze?

There might be a way to defer ANALYZE and autovacuum/autoanalyze, but
what about VACUUM FREEZE?  We can defer that too, but not the clog
truncation that is eventually the product of the freeze.

What about referential integrity constraints that need to check primary
keys in the encrypted tables?  I also don't see a way of delaying that,
and if you can't do referential integrity into the encrypted tables, it
reduces the value of having encrypted data in the same database rather
than in another database or cluster?

I still feel we have not clearly described what the options are:

1.  Encrypt everything

2.  Encrypt only some tables (for performance reasons), and use only one
key, or use multiple keys to allow for key rotation.  All keys are
always unlocked.

3.  Encrypt only some tables with different keys, and the keys are not
always unlocked.

As Tomas already stated, using tablespaces to distinguish encrypted from
non-encrypted tables doesn't make sense since the file system used for
the storage is immaterial to the encryptions status. An easier way would
be to just add a bit to WAL that would indicate if the rest of the WAL
record is encrypted, though I doubt the performance boost is worth the
complexity.

I see the attraction of #3, but operationally it is unclear how we can
decouple data that is not always accessible, as outlined above.  We
could probably work around the WAL issues, but it is going to be much
more overhead.  It is also unclear how the user supplies the keys ---
are they done at boot time, and if so, how are later keys unlocked, or
does the client provide it?  If the client provides it, isn't it better
to do client-side encryption, or have the client use pgcrypto with some
key management around it like pgcryptokey?  This presentation shows how
to use triggers to implement transparent encryption at the column level:

    https://momjian.us/main/writings/crypto_hw_use.pdf#page=77

Structurally, I understand the desire to push key control out to users
in #3, but it then becomes very complex to construct a system where the
data is tightly coupled in a Postgres cluster.  The pgcrypto method
above works because it decouples row control, like xmin/xmax and WAL
replay, which is not encrypted, with the row payload, which is
encrypted.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Optimize partial TOAST decompression
Next
From: Steven Pousty
Date:
Subject: Switching PL/Python to Python 3 by default in PostgreSQL 12