Re: Transparent Data Encryption (TDE) and encrypted files - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Transparent Data Encryption (TDE) and encrypted files |
Date | |
Msg-id | 20191004213100.i3ws54hpjsbalorc@development |
In response to | Re: Transparent Data Encryption (TDE) and encrypted files (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: Transparent Data Encryption (TDE) and encrypted files
|
List | pgsql-hackers |
On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:
>On Fri, Oct 4, 2019 at 09:18:58AM -0400, Robert Haas wrote:
>> I think everyone would agree that if you have no information about a
>> database other than the contents of pg_clog, that's not a meaningful
>> information leak. You would be able to tell which transactions
>> committed and which transactions aborted, but since you know nothing
>> about the data inside those transactions, it's of no use to you.
>> However, in that situation, you probably wouldn't be attacking the
>> database in the first place. Most likely you have some knowledge about
>> what it contains. Maybe there's a stream of sensor data that flows
>> into the database, and you can see that stream. By watching pg_clog,
>> you can see when a particular bit of data is rejected. That could be
>> valuable.
>
>It is certainly true that seeing activity in _any_ cluster file could
>leak information. However, even if we encrypted all the cluster files,
>bad actors could still get information by analyzing the file sizes and
>size changes of relation files, and the speed of WAL creation, and even
>monitor WAL for write activity (WAL file byte changes). I would think
>that would leak more information than clog.
>

Yes, those information leaks seem unavoidable.

>I am not sure how you could secure against that information leak. While
>file system encryption might do that at the storage layer, it doesn't do
>anything at the mounted file system layer.
>

That's because FDE is only meant to protect against a passive attacker,
essentially someone stealing the device. It's useless when someone gains
access to a mounted disk, so these information leaks are irrelevant.

(I'm only talking about encryption at the block device level. I'm not
sure about details e.g. for the encryption built into ext4, etc.)

>The current approach is to encrypt anything that contains user data,
>which includes heap, index, and WAL files. I think replication slots
>and logical replication might also fall into that category, which is why
>I started this thread.
>

Yes, I think those bits have to be encrypted too.

BTW I'm not sure why you list replication slots and logical replication
independently, those are mostly the same thing I think. For physical
slots we probably don't need to encrypt anything, but for logical slots
we may spill decoded data to files (so those will contain user data).

>I can see some saying that all cluster files should be encrypted, and I
>can respect that argument. However, as outlined in the diagram linked
>to from the blog entry:
>
> https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019
>
>I feel that TDE, since it has limited value, and can't really avoid all
>information leakage, should strive to find the intersection of ease of
>implementation, security, and compliance. If people don't think that
>limited file encryption is secure, I get it. However, encrypting most
>or all files I think would lead us into such a "difficult to implement"
>scope that I would not longer be able to work on this feature. I think
>the code complexity, fragility, potential unreliability, and even
>overhead of trying to encrypt most/all files would lead TDE to be
>greatly delayed or never implemented. I just couldn't recommend it.
>Now, I might be totally wrong, and encryption of everything might be
>just fine, but I have to pick my projects, and such an undertaking seems
>far too risky for me.
>

I agree some trade-offs will be needed to make the implementation
possible at all (irrespective of the exact design). But I think those
trade-offs need to be conscious, based on technical arguments for why
it's OK to consider a particular information leak acceptable, etc. For
example, a leak may be fine when we assume the attacker only gets a
single static copy of the data directory, but not when the attacker can
observe changes made by a running instance.

In a way, my concern is somewhat the opposite of yours: that we'll end
up with a feature (which necessarily adds complexity) that nevertheless
does not provide sufficient security for various use cases. And I don't
know where exactly the middle ground is, TBH.

>Just for some detail, we have solved the block-level encryption problem
>by using CTR mode in most cases, but there is still a requirement for a
>nonce for every encryption operation. You can use derived keys too, but
>you need to set up those keys for every write to encrypt files. Maybe
>it is possible to set up a write API that handles this transparently in
>the code, but I don't know how to do that cleanly, and I doubt if the
>value of encrypting everything is worth it.
>
>As far as encrypting the log file, I can see us adding documentation to
>warn about that, and even issue a server log message if encryption is
>enabled and syslog is not being used. (I don't know how to test if
>syslog is being shipped to a remote server.)
>

Not sure. I wonder if it's possible to set up syslog so that it encrypts
the data on storage, and if that would be a suitable solution e.g. for
PCI DSS purposes. (It seems at least rsyslogd supports that.)

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
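
[Editorial illustration, not part of the original message.] The quoted
remark about CTR mode needing a nonce for every encryption operation, and
about derived keys, can be sketched roughly as below. This is only a toy
under stated assumptions: the names derive_key, block_nonce, keystream and
xor_block are hypothetical, the keystream uses HMAC-SHA256 from the Python
standard library as a stand-in for AES, and nothing here reflects the
actual PostgreSQL TDE patches.

```python
import hashlib
import hmac
import struct

PAGE_SIZE = 8192  # PostgreSQL's default block size


def derive_key(master_key: bytes, context: bytes) -> bytes:
    """Derive a per-purpose key (e.g. heap vs. WAL) from one master key.

    Assumption: plain HMAC-SHA256 is used as the key-derivation function.
    """
    return hmac.new(master_key, context, hashlib.sha256).digest()


def block_nonce(file_id: int, block_no: int) -> bytes:
    """Build a 12-byte nonce that is unique per (file, block) pair."""
    return struct.pack(">IQ", file_id, block_no)


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: HMAC(key, nonce || counter). NOT AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        counter_block = nonce + struct.pack(">I", counter)
        out += hmac.new(key, counter_block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_block(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: in counter mode both are the same XOR."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


# Usage: every write needs both the right key and the right nonce.
master = b"\x01" * 32
heap_key = derive_key(master, b"heap")
page = bytes(range(256)) * (PAGE_SIZE // 256)
encrypted = xor_block(heap_key, block_nonce(1, 0), page)
decrypted = xor_block(heap_key, block_nonce(1, 0), encrypted)
```

Note that a nonce built only from the file and block number repeats when
the same block is rewritten, which reuses the keystream. A real design
would need some component that changes on every write, and plumbing that
through every write path is the kind of API burden the quoted paragraph
describes.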