Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id 20190613150725.2xmdaywxjf3empwf@momjian.us
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
> On Thu, Jun 13, 2019 at 3:48 AM Bruce Momjian <bruce@momjian.us> wrote:
> > The big question is how many people will be mixing encrypted and
> > unencrypted data in the same cluster, and care about performance?  Just
> > because someone might care is not enough of a justification.  They can
> > certainly create separate encrypted and non-encrypted clusters. Can we
> > implement level 6 and then implement levels 3-5 later if desired?
> 
> I guess most users are interested in performance. Users don't want to
> sacrifice performance for security and vice versa. Fine grained
> control would allow us to seek a compromise point.

Well, what does that add to the argument?  Yes, everyone cares about
performance, but the issue here is the magnitude of the performance
impact vs. the complexity.  Also, by definition, users will trade
performance for security because encrypting data will slow down the
database.  The open question is how much, and whether that overhead is
reasonable given the complexity.

What I don't want to do is design a system that is more complex than
required, one so complex that we might never get it done.

> > How would you configure the WAL to know which key to use if we did #5?
> > Wouldn't system tables and statistics, and perhaps referential integrity
> > allow for information leakage?
> 
> We use something like a map between tablespace oid and encryption
> key as a separate file (maybe stored in $PGDATA/global), called
> keyring. Using the keyring we can obtain encryption key by tablespace
> oid. For WAL, we add a flag to XLogRecord which indicates whether the
> WAL record is encrypted, and we already have relfilenode in the header
> data of WAL. So we can obtain the tablespace oid from the part and
> obtain the corresponding encryption key.

OK.
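
To make the lookup described in the quoted text concrete, here is a
minimal sketch in Python.  Everything in it is an assumption made for
illustration: the Keyring class, the "encrypted" flag on the record,
and the simplified field names are hypothetical and are not existing
PostgreSQL structures or an agreed-upon design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RelFileNode:
    """Simplified stand-in for the relfilenode carried in WAL header data."""
    spc_oid: int      # tablespace OID
    db_oid: int       # database OID
    rel_number: int   # relation file number


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical WAL record view; 'encrypted' is the proposed new flag."""
    encrypted: bool
    relfilenode: RelFileNode
    payload: bytes


class Keyring:
    """Hypothetical in-memory view of the proposed keyring file:
    a map from tablespace OID to that tablespace's encryption key."""

    def __init__(self, keys: Dict[int, bytes]) -> None:
        self._keys = dict(keys)

    def key_for_tablespace(self, spc_oid: int) -> bytes:
        try:
            return self._keys[spc_oid]
        except KeyError:
            raise KeyError(f"no encryption key for tablespace {spc_oid}") from None


def key_for_record(keyring: Keyring, record: WalRecord) -> Optional[bytes]:
    """Return the key needed to read this record, or None if it is plaintext."""
    if not record.encrypted:
        return None
    # The tablespace OID comes from the relfilenode already in the record.
    return keyring.key_for_tablespace(record.relfilenode.spc_oid)
```

Note that this only works if the relfilenode stays readable without a
key, which is part of what makes the per-tablespace design more
involved.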

> > > 2. Encryption Objects.
> > > Indexes, WAL and TOAST table pertaining to encrypted tables, and
> > > temporary files must also be encrypted but we need to discuss whether
> > > we encrypt non-user data as well such as SLRU data, vm and fsm, and
> > > perhaps even other files such as 2PC state files, backend_label etc.
> > > Encrypting everything is required by some use cases but it's also true
> > > that there are users who wish to encrypt database while minimizing
> > > performance overheads.
> >
> > I don't think we need to encrypt the "status" files like SLRU data, vm
> > and fsm.
> 
> I agree.

Good.

> > Good point about pg_waldump.  I am a little worried we might open a
> > security hole making a new API so they work, so maybe we should avoid
> > it.
> 
> Yeah, in principle, since the data key of a 2-tier key architecture
> should not go outside the database, I think we should not give data
> keys to utility commands. So rearranging the WAL format seems to be a
> better solution, but is there any reason why the main data is placed
> at the end of the WAL record? I wonder if we can assemble WAL records
> in the following order and encrypt only 3 and 4.
> 
> 1. Header data (XLogRecord and other headers)
> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
> 3. Block data (Tuple data, FPI)
> 4. Sub data (e.g tuple data for logical decoding)

Yes, that does sound like a reasonable idea.  It is similar to us not
encrypting the clog --- there is little value.  However, if we only
encrypt the cluster, we don't need to expose the relfilenode and we can
just encrypt the entire WAL --- I like that simplicity.  We might find
that the complexity of encrypting only certain tablespaces makes the
system slower than just encrypting the entire cluster.
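
For reference, the four-part ordering from the quoted list, with only
parts 3 and 4 encrypted, could be sketched roughly as below.  This is
an illustration under stated assumptions, not the real WAL format: the
length prefix, the function names, and the nonce are all made up, and
toy_keystream_xor is a placeholder built from SHA-256 that is NOT a
real cipher and must not be used for actual encryption.

```python
import hashlib
import struct


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder for a real cipher (NOT secure): XOR with a SHA-256
    counter-mode keystream.  Applying it twice restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + struct.pack("<Q", counter)).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def assemble_record(key: bytes, nonce: bytes, header: bytes,
                    main_data: bytes, block_data: bytes, sub_data: bytes) -> bytes:
    """Lay out parts 1-4 in order; parts 1 and 2 stay in plaintext,
    parts 3 and 4 are encrypted together."""
    lengths = struct.pack("<IIII", len(header), len(main_data),
                          len(block_data), len(sub_data))
    secret = toy_keystream_xor(key, nonce, block_data + sub_data)
    return lengths + header + main_data + secret


def read_plaintext_parts(record: bytes):
    """What a tool without the data key could still read: parts 1 and 2."""
    h_len, m_len, _, _ = struct.unpack_from("<IIII", record, 0)
    off = 16
    return record[off:off + h_len], record[off + h_len:off + h_len + m_len]


def read_encrypted_parts(key: bytes, nonce: bytes, record: bytes):
    """What only the server, holding the data key, could read: parts 3 and 4."""
    h_len, m_len, b_len, s_len = struct.unpack_from("<IIII", record, 0)
    off = 16 + h_len + m_len
    plain = toy_keystream_xor(key, nonce, record[off:off + b_len + s_len])
    return plain[:b_len], plain[b_len:]
```

With a layout like this, a utility such as pg_waldump would only need
read_plaintext_parts and never see a data key.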

> > > Also, for system catalog encryption, it could be a hard part. System
> > > catalogs are initially created at initdb time and created by copying
> > > from template1 when CREATE DATABASE. Therefore we would need to either
> > > modify initdb so that it's aware of encryption keys and KMS or modify
> > > database creation so that it copies database file while encrypting
> > > them.
> >
> > I assume initdb will use the same API that you would use to start the
> > server itself, e.g., type in a password, or contact a key server.
> 
> I realized that in XTS encryption mode, since we craft the tweak using
> relfilenode, the tweaks for system catalogs in a new database would
> change. So we might need to re-encrypt system catalogs on CREATE
> DATABASE after all. I suspect that even the cluster-wide encryption
> has the same problem.

Yes, this is why I want to just do cluster-wide encryption at this
stage.
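
The tweak problem can be shown with a small sketch.  It assumes the
tweak is built from the tablespace OID, database OID, relation file
number, and block number; that exact recipe is an assumption, as are
all names and OID values below.  toy_tweaked_xor is NOT XTS and is not
secure; it only stands in for a cipher whose output depends on the
tweak.

```python
import hashlib
import struct


def make_tweak(spc_oid: int, db_oid: int, rel_number: int, block_no: int) -> bytes:
    """Hypothetical tweak: the page's physical identity packed into 16 bytes."""
    return struct.pack("<IIII", spc_oid, db_oid, rel_number, block_no)


def toy_tweaked_xor(key: bytes, tweak: bytes, page: bytes) -> bytes:
    """Placeholder for a tweakable cipher (NOT XTS, NOT secure): XOR with a
    SHA-256 keystream keyed by (key, tweak).  Applying it twice with the
    same key and tweak restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(page):
        stream += hashlib.sha256(key + tweak + struct.pack("<Q", counter)).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(page, stream))


def copy_page_to_new_database(key: bytes, page_ct: bytes,
                              old_tweak: bytes, new_tweak: bytes) -> bytes:
    """Re-encrypt a copied page: decrypt under the template's tweak,
    then encrypt under the tweak of the new database."""
    plain = toy_tweaked_xor(key, old_tweak, page_ct)
    return toy_tweaked_xor(key, new_tweak, plain)


# Example values only; the OIDs are arbitrary.
key = b"\x01" * 32
page = b"catalog page contents " * 8
template_tweak = make_tweak(1663, 1, 1259, 0)
new_db_tweak = make_tweak(1663, 16384, 1259, 0)

ciphertext = toy_tweaked_xor(key, template_tweak, page)
# A byte-for-byte file copy keeps the old ciphertext, which no longer
# decrypts under the new database's tweak.
copied_is_readable = toy_tweaked_xor(key, new_db_tweak, ciphertext) == page
fixed = copy_page_to_new_database(key, ciphertext, template_tweak, new_db_tweak)
fixed_is_readable = toy_tweaked_xor(key, new_db_tweak, fixed) == page
```

Here copied_is_readable comes out false and fixed_is_readable true,
which is the re-encryption cost on CREATE DATABASE mentioned above.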

In addition, while the 8k blocks would use a block cipher, the WAL would
likely use a stream cipher, and it will be very hard to use multiple
stream ciphers in a single WAL file.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
