From Masahiko Sawada
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id CAD21AoA9+Sf9WOu-TaEJG2L3BQjv6bh8g0cKs0=kjnGzk1QuJg@mail.gmail.com
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Thu, Jun 13, 2019 at 3:48 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Jun  5, 2019 at 11:54:04AM +0900, Masahiko Sawada wrote:
> > On Fri, May 10, 2019 at 2:42 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > I think we need to step back and see what we want to do.  There are six
> > > levels of possible encryption:
> > >
> > > 1.  client-side column encryption
> > > 2.  server-side column encryption
> > > 3.  table-level
> > > 4.  database-level
> > > 5.  tablespace-level
> > > 6.  cluster-level
> > >
> > > 1 & 2 encrypt the data in the WAL automatically, and option 6 is
> > > encrypting the entire WAL.  This leaves 3-5 as cases where there will be
> > > mismatch between the object-level encryption and WAL.  I don't think it
> > > is very valuable to use these options so reencryption will be easier.
> > > In many cases, taking any object offline might cause the application to
> > > fail, and having multiple encrypted data keys active will allow key
> > > replacement to be done on an as-needed basis.
> > >
> >
> > Summarizing the design discussion so far and the discussion I had at
> > PGCon, there are several basic design items here. Each of them is
> > loosely related and there are trade-offs.
> >
> > 1. Encryption Levels.
> > As Bruce suggested there are 6 levels.  The fine grained control will
> > help to suppress performance overheads of tables that we don't
> > actually need to encrypt. Even in terms of security it might help
> > since we don't give the key to users who don't or cannot access
> > encrypted tables. But whichever level we choose, we can protect
> > data from attacks that bypass PostgreSQL's ACLs, such as reading
> > database files directly, as long as we encrypt data inside the
> > database. I think we have already reached consensus on the threats
> > we want to protect against.
>
> I think level 6 is an obvious must-have.  I think the big question is
> whether we gain enough by implementing levels 3-5 compared to the
> complexity of the code and user interface.
>
> The big question is how many people will be mixing encrypted and
> unencrypted data in the same cluster, and care about performance?  Just
> because someone might care is not enough of a justification.  They can
> certainly create separate encrypted and non-encrypted clusters. Can we
> implement level 6 and then implement levels 3-5 later if desired?
>

I guess most users are interested in performance. Users don't want to
sacrifice performance for security and vice versa. Fine grained
control would allow us to seek a compromise point.

From another point of view, some of our clients want each system to
use its own key, distinct from the keys other systems use, for
security reasons in a multi-tenant environment. Also, when a key
leaks they need to re-encrypt the database data. This feature would
help in such situations: we can use a different key for each system
and re-encrypt data without taking the database down.

> > Among these levels, the tablespace level would be somewhat different
> > from others because it corresponds to physical directories rather than
> > database objects. So in principle it's possible that tables are
> > created on an encrypted tablespace while indexes are created on
> > a non-encrypted tablespace, which does not make sense though. But
> > having fewer encryption keys would be better for a simple architecture.
>
> How would you configure the WAL to know which key to use if we did #5?
> Wouldn't system tables and statistics, and perhaps referential integrity
> allow for information leakage?

We would keep something like a map from tablespace OID to encryption
key in a separate file (maybe stored in $PGDATA/global), called the
keyring. Using the keyring we can obtain the encryption key for a
given tablespace OID. For WAL, we add a flag to XLogRecord which
indicates whether the WAL record is encrypted, and we already have
the relfilenode in the header data of WAL. So we can take the
tablespace OID from the relfilenode and obtain the corresponding
encryption key.
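
To make the lookup concrete, here is a minimal sketch of what I have
in mind. It is only an illustration under assumptions: the struct and
function names, the flag bit, and the 32-byte key length are all
hypothetical and do not exist in PostgreSQL today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Everything here is hypothetical; it is not existing PostgreSQL code. */

#define SKETCH_KEY_LEN        32    /* assumed 256-bit data keys */
#define SKETCH_XLR_ENCRYPTED  0x01  /* assumed "record is encrypted" flag bit */

/* One keyring entry: tablespace OID -> data encryption key. */
typedef struct KeyringEntry
{
    uint32_t spcoid;
    uint8_t  key[SKETCH_KEY_LEN];
} KeyringEntry;

/* The keyring as loaded from its file (file format not shown). */
typedef struct Keyring
{
    const KeyringEntry *entries;
    size_t              nentries;
} Keyring;

/* Simplified stand-in for the relfilenode found in WAL header data. */
typedef struct RelFileNodeSketch
{
    uint32_t spcNode;   /* tablespace OID */
    uint32_t dbNode;    /* database OID */
    uint32_t relNode;   /* relation file node */
} RelFileNodeSketch;

/* Return the key for a tablespace, or NULL if it has no key. */
const uint8_t *
keyring_lookup(const Keyring *kr, uint32_t spcoid)
{
    assert(kr != NULL);
    for (size_t i = 0; i < kr->nentries; i++)
    {
        if (kr->entries[i].spcoid == spcoid)
            return kr->entries[i].key;
    }
    return NULL;
}

/*
 * Pick the key for a WAL record: no key unless the encrypted flag is
 * set, otherwise the key of the tablespace named by the relfilenode.
 */
const uint8_t *
wal_record_key(const Keyring *kr, uint8_t flags,
               const RelFileNodeSketch *rnode)
{
    if ((flags & SKETCH_XLR_ENCRYPTED) == 0)
        return NULL;
    return keyring_lookup(kr, rnode->spcNode);
}
```

A linear scan is enough for a sketch; a real keyring would presumably
use a hash table and would need to protect the keys in memory.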

>
> > 2. Encryption Objects.
> > Indexes, WAL and TOAST table pertaining to encrypted tables, and
> > temporary files must also be encrypted but we need to discuss whether
> > we encrypt non-user data as well such as SLRU data, vm and fsm, and
> > perhaps even other files such as 2PC state files, backup_label etc.
> > Encrypting everything is required by some use cases but it's also true
> > that there are users who wish to encrypt database while minimizing
> > performance overheads.
>
> I don't think we need to encrypt the "status" files like SLRU data, vm
> and fsm.

I agree.

>
> > 3. Encryption keys.
> > The encryption level determines the number of encryption keys we
> > use. Cluster-level encryption would use a single encryption key
> > and could more easily encrypt everything, including non-user data
> > such as xact WAL and SLRU data, with the same key. On the other
> > hand, table-level encryption, for instance, would use multiple keys
> > and could encrypt tables with different encryption keys. One
> > advantage of having multiple keys in a database is that encrypted
> > database objects can be re-encrypted on an as-needed basis. For
> > instance, in a multi-tenant architecture, stopping the database
> > cluster would affect all services, but with multiple keys we can
> > re-encrypt data one by one while minimizing the downtime of each
> > service. Even in terms of security, having multiple keys helps
> > diversify risk.
>
> I agree we need a 2 tier key hierarchy.   See my pgcryptokey extension
> as an example:
>
>         http://momjian.us/download/pgcryptokey/

Thanks.

>
> > Apart from the above discussion, there are assorted concerns about
> > the fine-grained design. For WAL encryption, as a result of the
> > discussion so far I'm going to use the same encryption for WAL as
> > that used for tables. Given that approach, utility commands that
> > read WAL (pg_waldump and pg_rewind) would need to be able to get
> > arbitrary encryption keys. pg_waldump might even require the
> > encryption key for WAL of a table that has already been dropped.
> > As I discussed at PGCon[3], rearranging the WAL format would solve
> > this issue, but it doesn't resolve the fundamental issue.
>
> Good point about pg_waldump.  I am a little worried we might open a
> security hole making a new API so they work, so maybe we should avoid
> it.

Yeah, in principle the data key in a 2-tier key architecture should
not leave the database, so I think we should not hand data keys to
utility commands. Rearranging the WAL format therefore seems to be
the better solution, but is there any reason why the main data is
placed at the end of a WAL record? I wonder if we can assemble WAL
records in the following order and encrypt only 3 and 4.

1. Header data (XLogRecord and other headers)
2. Main data (xl_heap_insert, xl_heap_update etc + related data)
3. Block data (Tuple data, FPI)
4. Sub data (e.g. tuple data for logical decoding)
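
As a rough sketch of that ordering, assuming the four parts are laid
out contiguously in the order listed, the encrypted span would be a
single byte range at the tail of the record. The struct and function
names below are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a record laid out in the order above. */
typedef struct WalRecordLayout
{
    size_t header_len;  /* 1. XLogRecord and other headers (plaintext) */
    size_t main_len;    /* 2. main data (plaintext) */
    size_t block_len;   /* 3. block data: tuple data, FPI (encrypted) */
    size_t sub_len;     /* 4. sub data (encrypted) */
} WalRecordLayout;

/* A contiguous byte range within the record. */
typedef struct ByteRange
{
    size_t offset;
    size_t length;
} ByteRange;

/*
 * Because parts 3 and 4 are adjacent and come last, the encrypted
 * region starts right after the main data and runs to the end of the
 * record. Parts 1 and 2 stay readable without any key.
 */
ByteRange
wal_encrypted_range(const WalRecordLayout *layout)
{
    ByteRange r;

    assert(layout != NULL);
    r.offset = layout->header_len + layout->main_len;
    r.length = layout->block_len + layout->sub_len;
    return r;
}

/* Total record length, for checking that the range reaches the end. */
size_t
wal_record_total_len(const WalRecordLayout *layout)
{
    assert(layout != NULL);
    return layout->header_len + layout->main_len +
           layout->block_len + layout->sub_len;
}
```

With this layout a tool like pg_waldump could read the first
header_len + main_len bytes of every record without a data key.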

>
> > Also, system catalog encryption could be a hard part. System
> > catalogs are initially created at initdb time and created by copying
> > from template1 at CREATE DATABASE. Therefore we would need to either
> > modify initdb so that it's aware of encryption keys and KMS or modify
> > database creation so that it copies database files while encrypting
> > them.
>
> I assume initdb will use the same API that you would use to start the
> server itself, e.g., type in a password, or contact a key server.

I realized that in XTS encryption mode, since we craft the tweak from
the relfilenode, the tweaks for the system catalogs in a new database
would differ from those of the template they are copied from. So we
might need to re-encrypt the system catalogs at CREATE DATABASE after
all. I suspect that even cluster-wide encryption has the same
problem.
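
To show why the tweak changes, here is a small sketch. The tweak
layout (tablespace OID, database OID, relation file node and block
number packed big-endian into 16 bytes) is purely my assumption for
illustration; the actual tweak construction has not been decided.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TWEAK_LEN 16     /* XTS tweaks are 128 bits */

/* Store a 32-bit value big-endian so the tweak is platform independent. */
static void
put_u32_be(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t) (v >> 24);
    dst[1] = (uint8_t) (v >> 16);
    dst[2] = (uint8_t) (v >> 8);
    dst[3] = (uint8_t) v;
}

/*
 * Hypothetical tweak layout: spcNode | dbNode | relNode | block number.
 * Any scheme that includes the database OID has the property shown
 * below.
 */
void
make_tweak_sketch(uint8_t tweak[SKETCH_TWEAK_LEN],
                  uint32_t spcNode, uint32_t dbNode,
                  uint32_t relNode, uint32_t blkno)
{
    assert(tweak != NULL);
    put_u32_be(tweak, spcNode);
    put_u32_be(tweak + 4, dbNode);
    put_u32_be(tweak + 8, relNode);
    put_u32_be(tweak + 12, blkno);
}

/* Returns 1 if two tweaks are identical, 0 otherwise. */
int
tweaks_equal(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a, b, SKETCH_TWEAK_LEN) == 0;
}
```

For the same catalog block, the copy in the template database and the
copy in the new database differ only in dbNode, so their tweaks
differ and a plain file copy of the ciphertext would not decrypt
correctly in the new database.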


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


