Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers
| From | Masahiko Sawada | 
|---|---|
| Subject | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) | 
| Date | |
| Msg-id | CAD21AoCqx=95bfjdvuPJQpxgi6X-TZXchr6mQSpAHuri7dty1w@mail.gmail.com Whole thread Raw | 
| In response to | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) (Bruce Momjian <bruce@momjian.us>) | 
| Responses | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) | 
| List | pgsql-hackers | 
On Fri, May 10, 2019 at 2:42 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, May 9, 2019 at 05:49:12PM +0900, Masahiko Sawada wrote: > > In terms of keys, one advantage could be that we have less keys with > > per-tablespace keys. > > > > Let me briefly explain the current design I'm thinking. The design > > employees 2-tier key architecture. That is, a database cluster has one > > master key and per-tablespace keys which are encrypted with the master > > key before storing to disk. Each tablespace keys are generated > > randomly inside database when CREATE TABLESPACE. The all encrypted > > tablespace keys are stored together with the master key ID to the file > > (say, $PGDATA/base/pg_tblsp_keys). That way, the startup process can > > easily get all tablespace keys and the master key ID before starting > > recovery, and therefore can get the all decrypted tablespace keys. The > > reason why it doesn't store per-tablespace keys in a column of > > pg_tabelspace is that we also encrypt pg_tablespace with the > > tablespace key. We could take a way to not encrypt only pg_tablespace, > > however it instead would require to scan pg_tablespace before > > recovery, and eventually we would need to not encrypt pg_attribute > > that should be encrypted. > > > > During the recovery I'm also thinking the idea you suggested; wrapper > > WAL records have tablespace OID that is the lookup key for tablespace > > key and the startup process can get the tablespace key. > > > > Given that the above design less data keys is better. Obviously > > per-tablespace keys are less than per-table keys. And even if we > > employee per-tablespace keys we can allow user to specify per-table > > encryption by using the same encryption key within the tablespace. > > > > FYI one advantage of per-tablespace encryption from user perspective > > would be less conversion when database migration. Using > > default_tablespace parameter we need less modification of create table > > DDL. > > I think we need to step back and see what we want to do. There are six > levels of possible encryption: > > 1. client-side column encryption > 2. server-side column encryption > 3. table-level > 4. database-level > 5. tablespace-level > 6. cluster-level > > 1 & 2 encrypt the data in the WAL automatically, and option 6 is > encrypting the entire WAL. This leaves 3-5 as cases where there will be > mismatch between the object-level encryption and WAL. I don't think it > is very valuable to use these options so reencryption will be easier. > In many cases, taking any object offline might cause the application to > fail, and having multiple encrypted data keys active will allow key > replacement to be done on an as-needed basis. > Summarizing the design discussion so far and the discussion I had at PGCon, there are several basic design items here. Each of them is loosely related and there are trade-off. 1. Encryption Levels. As Bruce suggested there are 6 levels. The fine grained control will help to suppress performance overheads of tables that we don't actually need to encrypt. Even in terms of security it might help since we don't give the key users who don't or cannot access to encrypted tables. But whichever we choose the level, we can protect data from attack bypassing PostgresSQL's ACL such as reading database file directly, as long as we encrypt data inside database. Threats we want to protect by has already gotten consensus so far, I think. Among these levels, the tablespace level would be somewhat different from others because it corresponds to physical directories rather than database objects. So in principles it's possible that tables are created on an encrypted tablespace while indexes are created on non-encrypted tablespace, which does not make sense though. But having less encryption keys would be better for simple architecture. 2. Encryption Objects. Indexes, WAL and TOAST table pertaining to encrypted tables, and temporary files must also be encrypted but we need to discuss whether we encrypt non-user data as well such as SLRU data, vm and fsm, and perhaps even other files such as 2PC state files, backend_label etc. Encryption everything is required by some use case but it's also true that there are users who wish to encrypt database while minimizing performance overheads. 3. Encryption keys. Encryption levels would be relevant with the number of encryption keys we use. The database cluster levels would use single encryption key and can encrypt everything easier including non-user data such as xact WALs and SRLU data with the same key. On the other hand, for instance the table level would use multiple keys and can encrypt tables with different encryption keys. One advantage of having multiple keys in database would be that it can re-encrypt encrypted database object as-needed basis. For instance in multi tenant architecture, the stopping database cluster would affect all services but we can re-encrypt data one by one while minimizing downtime of each services if we use multiple keys. Even in terms of security, having multiple keys helps the diversification of risk. 4. Key rotation and 2 tier key hierarchy. Another design point would be key rotation and using 2 tier key hierarchy. Periodic key rotation is very important and is required by some security standard. For instance PCI DSS 3.6.4 states "Cryptographic key changes for keys that have reached the end of their cryptoperiod (for example, after a defined period of time has passed and/or after a certain amount of cipher-text has been produced by a given key), as defined by the associated application vendor or key owner, and based on industry best practices and guidelines"[1]. A cryptoperiod is the time span during which a specific cryptographic key is authorized for use[2] (sometimes represents the number of transactions). It is defined based on multiple factors such as key length, key strength, algorithms. So it depends on users systems, key rotation can be required even every a few months. 2 tier key hierarchy is a technique for faster key rotation; it uses 2 types of keys: master key and data key. The master key is stored outside database whereas the data key is stored inside database. The data key is used to encrypt actual database data and is encrypted with the master key before storing to the disk. And the master key must be stored outside database. When key rotation, we rotate the master key and re-encrypt only data key rather than database data. Therefore since we don't need to access and modify database data the key rotation will complete in a second. Without it, we will end up with re-encrypting database data, which could take a long time depending on the encryption levels. Because the data key must not be go outside database we might need to provide a mechanism of key rotation in database side, for example providing pg_rotate_encryption_key() SQL function or ALTER SYSTEM ROTATE KEY SQL command so that PostgreSQL rotate data keys with the new master key So it might also require PostgreSQL to get the new master key from an external location within the command, and therefore it's relevant with key management. 5. Key management. Encryption key (the master key in 2 tier key hierarchy) must be taken from an external location. If only getting, it might be enough to get it using shell command like curl. However it's more secure if we can seamlessly integrate PostgreSQL with key management services. And DBAs would no longer need to care key lifecycles. In addition to that, dedicated programs or knowledge will not be necessary in individual user systems. Furthermore, integration with KMS might be helpful even for column-level encryption; we could combine pgcrypto with it to provide per column TDE. Apart from the above discussion, there are random concerns about the design regarding to the fine grained design. For WAL encryption, as a result of discussion so far I'm going to use the same encryption for WAL encryption as that used for tables. Given that approach, it would be required to make utility commands that read WAL (pg_waldump and pg_rewind) be able to get arbitrary encryption keys. pg_waldump might require even an encryption keys of WAL of which table has already been dropped. As I discussed at PGCon[3], by rearranging WAL format would solve this issue but it doesn't resolve fundamental issue. Also, for system catalog encryption, it could be a hard part. System catalogs are initially created at initdb time and created by copying from template1 when CREATE DATABASE. Therefore we would need to either modify initdb so that it's aware of encryption keys and KMS or modify database creation so that it copies database file while encrypting them. There are two proposals so far: the cluster wide encryption and per tablespace encryption. It's good if we pick a good thing from each proposals in order to provide an useful TDE feature to users. And whatever design the community accepts I'd like to contribute to do that. Comments and feedback are very welcome! [1] https://www.pcisecuritystandards.org/document_library?category=pcidss&document=pci_dss [2] https://en.wikipedia.org/wiki/Cryptoperiod [3] https://www.slideshare.net/masahikosawada98/transparent-data-encryption-in-postgresql/28 Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
pgsql-hackers by date: