Re: Transparent Data Encryption (TDE) and encrypted files - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Transparent Data Encryption (TDE) and encrypted files
Msg-id 20191004213100.i3ws54hpjsbalorc@development
In response to Re: Transparent Data Encryption (TDE) and encrypted files  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Transparent Data Encryption (TDE) and encrypted files
List pgsql-hackers
On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:
>On Fri, Oct  4, 2019 at 09:18:58AM -0400, Robert Haas wrote:
>> I think everyone would agree that if you have no information about a
>> database other than the contents of pg_clog, that's not a meaningful
>> information leak. You would be able to tell which transactions
>> committed and which transactions aborted, but since you know nothing
>> about the data inside those transactions, it's of no use to you.
>> However, in that situation, you probably wouldn't be attacking the
>> database in the first place. Most likely you have some knowledge about
>> what it contains. Maybe there's a stream of sensor data that flows
>> into the database, and you can see that stream.  By watching pg_clog,
>> you can see when a particular bit of data is rejected. That could be
>> valuable.
>
>It is certainly true that seeing activity in _any_ cluster file could
>leak information.  However, even if we encrypted all the cluster files,
>bad actors could still get information by analyzing the file sizes and
>size changes of relation files, and the speed of WAL creation, and even
>monitor WAL for write activity (WAL file byte changes).  I would think
>that would leak more information than clog.
>

Yes, those information leaks seem unavoidable. 

>I am not sure how you could secure against that information leak.  While
>file system encryption might do that at the storage layer, it doesn't do
>anything at the mounted file system layer.
>

That's because FDE (full-disk encryption) is only meant to protect against
a passive attacker, essentially someone stealing the device. It's useless
when someone gains access to the mounted disk, so these information leaks
are irrelevant for it.

(I'm only talking about encryption at the block device level. I'm not
sure about the details of, e.g., the encryption built into ext4.)

>The current approach is to encrypt anything that contains user data,
>which includes heap, index, and WAL files.  I think replication slots
>and logical replication might also fall into that category, which is why
>I started this thread.
>

Yes, I think those bits have to be encrypted too.

BTW I'm not sure why you list replication slots and logical replication
separately; I think those are mostly the same thing. For physical
slots we probably don't need to encrypt anything, but logical slots
may spill decoded data to files (so those files will contain user data).

>I can see some saying that all cluster files should be encrypted, and I
>can respect that argument.  However, as outlined in the diagram linked
>to from the blog entry:
>
>    https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019
>
>I feel that TDE, since it has limited value, and can't really avoid all
>information leakage, should strive to find the intersection of ease of
>implementation, security, and compliance.  If people don't think that
>limited file encryption is secure, I get it.  However, encrypting most
>or all files I think would lead us into such a "difficult to implement"
>scope that I would no longer be able to work on this feature.  I think
>the code complexity, fragility, potential unreliability, and even
>overhead of trying to encrypt most/all files would lead TDE to be
>greatly delayed or never implemented.  I just couldn't recommend it.
>Now, I might be totally wrong, and encryption of everything might be
>just fine, but I have to pick my projects, and such an undertaking seems
>far too risky for me.
>

I agree some trade-offs will be needed to make the implementation
possible at all (irrespective of the exact design). But I think those
trade-offs need to be conscious, backed by technical arguments for why
it's OK to consider a particular information leak acceptable. For
example, a leak may be fine when assuming the attacker only gets a single
static copy of the data directory, but not when the attacker can observe
changes made by a running instance.
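To make the distinction between a single static copy and an observed running instance a bit more concrete: with a counter-mode cipher, an attacker who sees two versions of the same block encrypted under the same key and nonce learns the XOR of the two plaintexts, and hence exactly which bytes changed, without knowing the key. The sketch below demonstrates this. It uses HMAC-SHA256 in counter mode as a stand-in keystream because Python's standard library has no AES; that substitution, the helper names (`keystream`, `ctr_xor`, `xor`) and the sample data are assumptions for illustration, not how any TDE patch works.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR keystream: HMAC-SHA256(key, nonce || counter), concatenated.

    Stand-in for AES-CTR (an assumption); it shows the mode, not a vetted cipher.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt; in CTR mode both are the same XOR with the keystream."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32
nonce = b"\x00" * 12  # the same nonce is reused for both versions of the block

old_page = b"balance=00100"
new_page = b"balance=00900"

c_old = ctr_xor(key, nonce, old_page)
c_new = ctr_xor(key, nonce, new_page)

# An observer holding both ciphertexts (but not the key) gets the XOR of the
# plaintexts, because the identical keystreams cancel out. Zero bytes mark
# positions that did not change.
leak = xor(c_old, c_new)
print([i for i, byte in enumerate(leak) if byte != 0])  # prints [10]

# With a fresh nonce for the rewrite the keystreams differ, so the ciphertext
# XOR no longer reveals the plaintext XOR.
c_new_fresh = ctr_xor(key, b"\x00" * 11 + b"\x01", new_page)
print(xor(c_old, c_new_fresh) == xor(old_page, new_page))
```

A single static copy only ever contains one of these ciphertexts, which is why the same nonce handling can look fine under one threat model and leak under the other.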

In a way, my concern is somewhat the opposite of yours: that we'll end
up with a feature (which necessarily adds complexity) that nevertheless
does not provide sufficient security for various use cases.

And I don't know where exactly the middle ground is, TBH.

>Just for some detail, we have solved the block-level encryption problem
>by using CTR mode in most cases, but there is still a requirement for a
>nonce for every encryption operation.  You can use derived keys too, but
>you need to set up those keys for every write to encrypt files.  Maybe
>it is possible to set up a write API that handles this transparently in
>the code, but I don't know how to do that cleanly, and I doubt if the
>value of encrypting everything is worth it.
>
>As far as encrypting the log file, I can see us adding documentation to
>warn about that, and even issue a server log message if encryption is
>enabled and syslog is not being used.  (I don't know how to test if
>syslog is being shipped to a remote server.)
>

Not sure. I wonder if it's possible to set up syslog so that it encrypts
the data on storage, and if that would be a suitable solution, e.g., for
PCI DSS purposes. (It seems at least rsyslogd supports that.)
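Going back to Bruce's point about needing a nonce for every encryption operation: one way to picture a "write API that handles this transparently" is a wrapper that derives a per-file key and draws a fresh nonce on each write. The sketch below is hypothetical throughout. The helper names (`derive_file_key`, `encrypt_write`, `decrypt_read`), the file identifier, the HMAC-based key derivation (a simplified HKDF-style step), the random 16-byte nonce stored in front of each record, and the HMAC-SHA256 counter-mode keystream standing in for AES-CTR are all assumptions for illustration, not anything from an actual patch.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16  # hypothetical: a random nonce stored in front of every record


def derive_file_key(master_key: bytes, file_id: str) -> bytes:
    """Derive a per-file key with HMAC-SHA256 (simplified, HKDF-style; an assumption)."""
    return hmac.new(master_key, b"file-key:" + file_id.encode(), hashlib.sha256).digest()


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR keystream (HMAC-SHA256 over nonce || counter), stand-in for AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_write(master_key: bytes, file_id: str, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext; a fresh random nonce is drawn on every call."""
    key = derive_file_key(master_key, file_id)
    nonce = os.urandom(NONCE_LEN)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def decrypt_read(master_key: bytes, file_id: str, record: bytes) -> bytes:
    """Split off the stored nonce and undo the keystream XOR."""
    key = derive_file_key(master_key, file_id)
    nonce, ciphertext = record[:NONCE_LEN], record[NONCE_LEN:]
    ks = keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


master = os.urandom(32)
file_id = "example-file-1"  # hypothetical identifier
data = b"decoded change: INSERT INTO t VALUES (42)"

r1 = encrypt_write(master, file_id, data)
r2 = encrypt_write(master, file_id, data)

assert decrypt_read(master, file_id, r1) == data
# Same file and same data, but each write used its own nonce, so the stored
# records differ; each record is NONCE_LEN bytes longer than its plaintext.
print(r1 != r2, len(r1) - len(data))
```

Storing the nonce next to the data is straightforward for files written once, such as the logical decoding spill files mentioned earlier, but it costs extra space on every write, which is presumably harder to accommodate in fixed-size pages.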


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


