[Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) - Mailing list pgsql-hackers

From Moon, Insung
Subject [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date
Msg-id 031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp
Whole thread Raw
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)  (Antonin Houska <ah@cybertec.at>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Aleksander Alekseev <a.alekseev@postgrespro.ru>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Nico Williams <nico@cryptonector.com>)
List pgsql-hackers
Hello Hackers,

This propose a way to develop "Table-level" Transparent Data Encryption (TDE) and Key Management Service (KMS) support
in
PostgreSQL.


Issues on data encryption of PostgreSQL
==========
Currently, in PostgreSQL, data encryption can be using pgcrypto Tool.
However, it is inconvenient to use pgcrypto to encrypts data in some cases.

There are two significant inconveniences.

First, if we use pgcrypto to encrypt/decrypt data, we must call pgcrypto functions everywhere we encrypt/decrypt.
Second, we must modify application program code much if we want to do database migration to PostgreSQL from other
databasesthat is
 
using TDE.

To resolved these inconveniences, many users want to support TDE.
There have also been a few proposals, comments, and questions to support TDE in the PostgreSQL community.

However, currently PostgreSQL does not support TDE, so in development community, there are discussions whether it's
necessaryto
 
support TDE or not.

In these discussions, there were requirements necessary to support TDE in PostgreSQL.

1) The performance overhead of encryption and decryption database data must be minimized
2) Need to support WAL encryption.
3) Need to support Key Management Service.

Therefore, I'd like to propose the new design of TDE that deals with both above requirements.
Since this feature will become very large, I'd like to hear opinions from community before starting making the patch.

First, my proposal is table-level TDE which is that user can specify tables begin encrypted. 
Indexes, TOAST table and WAL associated with the table that enables TDE are also encrypted.

Moreover, I want to support encryption for large object as well.
But I haven't found a good way for it so far. So I'd like to remain it as future TODO.

My proposal has five characteristics features of "table-level TDE".

1) Buffer-level data encryption and decryption
2) Per-table encryption
3) 2-tier encryption key management
4) Working with external key management services(KMS)
5) WAL encryption

Here are more details for each items.


1. Buffer-level data encryption and decryption
==================
Transparent data encryption and decryption accompany by storage operation 
With ordinally way like using pgcrypto, the biggest problem with encrypted data is the performance overhead of
decryptingthe data
 
each time the run to queries.

My proposal is to encrypt and decrypt data when performing DISK I/O operation to minimize performance overhead.
Therefore, the data in the shared memory layer is unencrypted so that performance overhead can minimize.

With this design, data encryption/decryption implementations can be developed by modifying the codes of the storage and
buffer
manager modules, 
which are responsible for performing DISK I/O operation.


2. Per-table encryption
==================
User can enable TDE per table as they want.
I introduce new storage parameter "encryption_enabled" which enables TDE at table-level.

    // Generate  the encryption table
       CREATE TABLE foo WITH ( ENCRYPTION_ENABLED = ON );

    // Change to the non-encryption table
       ALTER TABLE foo SET ( ENCRYPTION_ENABLED = OFF );

This approach minimizes the overhead for tables that do not require encryption options.
For tables that enable TDE, the corresponding table key will be generated with random values, and it's stored into the
newsystem
 
catalog after being encrypted by the master key.

BTW, I want to support CBC mode encryption[3]. However, I'm not sure how to use the IV in CBC mode for this proposal. 
I'd like to hear opinions by security engineer.


3. 2-tier encryption key management
==================
when it comes time to change cryptographic keys, there is a performance overhead to decryption and re-encryption to all
data.

To solve this problem we employee 2-tier encryption. 
2-tier encryption is All table keys can be stored in the database cluster after being encrypted by the master key, And
masterkeys
 
must be stored at external of PostgreSQL.

Therefore, without master key, it is impossible to decrypt the table key. Thus, It is impossible to decrypt the
databasedata.
 

When changing the key, it's not necessary to re-encrypt for all data.
We use the new master key only to decrypt and re-encrypt the table key, these operations for minimizing the performance
overhead.

For table keys, all TDE-enabled tables have different table keys. 
And for master key, all database have different master keys. Table keys are encrypted by the master key of its own
database.
For WAL encryption, we have another cryptographic key. WAL-key is also encrypted by a master key, but it is shared
acrossthe
 
database cluster.


4. Working with external key management services(KMS)
==================
A key management service is an integrated approach for generating, fetching and managing encryption keys for key
control.
 
They may cover all aspects of security from the secure generation of keys, secure storing keys, and secure fetching
keysup to
 
encryption key handling.
Also, various types of KMSs are provided by many companies, and users can choose them.

Therefore I would like to manage the master key using KMS.
Also, my proposal is to create callback APIs(generate_key, fetch_key, store_key) in the form of a plug-in so that users
canuse many
 
types of KMS as they want.

In KMIP protocol and most KMS manage keys by string IDs. We can get keys by key ID from KMS. 
So in my proposal, all master keys are distinguished by its ID, called "master key ID". 
The master key ID is made, for example, using the database oid and a sequence number, like <OID>_<SeqNo>. And they are
managedin
 
PostgreSQL.
    
When database startup, all master key ID is loaded to shared memory, and they are protected by LWLock.

When it comes time to rotate the master keys, run this query.

    ALTER SYSTEM ROTATION MASTER KEY;

In this query, the master key is rotated with the following step.
1. Generate new master key,
2. Change master key IDs and emit corresponding WAL
3. Re-encrypt all table keys on its database

Also during checkpoint, master key IDs on shared memory become a permanent condition.


5. WAL encryption
==================
If we encrypt all WAL records, performance overhead can be significant.
Therefore, this proposes a method to encrypt only WAL record excluding WAL header when writing WAL on the WAL buffer,
insteadof
 
encrypting a whole WAL record.
WAL encryption key is generated separately when the TDE-enabled table is created the first time. We use 2-tier
encryptionfor WAL
 
encryption as well.
So, when it comes time to rotate the WAL encryption key, run this query.

    ALTER SYSTEM ROTATION WAL KEY;

Next, I will explain how to encrypt WAL.

To do this operation, I add a flag to WAL header which indicates whether the subsequent WAL data is encrypted or not.

Then, when we write WAL for encryption table we write "encrypted" WAL on WAL buffer layer.

In recovery, we read WAL header and check the flag of encryption, and judges whether WAL must be decrypted.
In the case of PITR, we use WAL key ID in the backup file.

With this approach, the performance overhead of writing and reading the WAL for unencrypted tables would be almost the
sameas
 
before.


==================
I'd like to discuss the design before starting making any change of code.
After a more discussion I want to make a PoC.
Feedback and suggestion are very welcome.

Finally, thank you initial design input for Masahiko Sawada.

Thank you.

[1] What does TDE mean?
    > https://en.wikipedia.org/wiki/Transparent_Data_Encryption

[2] What does KMS mean?
    > https://en.wikipedia.org/wiki/Key_management#Key_Management_System

[3] What does CBC-Mode mean?
    > https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation
    
[4] Recently discussed mail
    https://www.postgresql.org/message-id/CA%2BCSw_tb3bk5i7if6inZFc3yyf%2B9HEVNTy51QFBoeUk7UE_V%3Dw%40mail.gmail.com


Regards.
Moon.
----------------------------------------
Moon, Insung
NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
----------------------------------------





pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: [GSoC] github repo and initial work
Next
From: Magnus Hagander
Date:
Subject: Re: pg_replication_slot_advance to return NULL instead of 0/0 if slotnot advanced