Re: Moving forward with TDE - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Moving forward with TDE |
Msg-id | CAOuzzgqvzgWiQfc97dFUE6q3G0vtZO4nZCo-OUCRmG-gU3+KxA@mail.gmail.com |
In response to | Re: Moving forward with TDE (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Moving forward with TDE |
List | pgsql-hackers |
Greetings,
On Mon, Mar 27, 2023 at 18:17 Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
> Greetings,
>
> On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> > Agreed, though the latest efforts include an option for *authenticated*
> > encryption as well as unauthenticated. That makes it much more
> > difficult to make undetected changes to the data that's protected by
> > the authenticated encryption being used.
>
> I thought some more about this. GCM-style authentication of encrypted
> data has value because it assumes the two end points are secure but that
> a malicious actor could modify data during transfer. In the Postgres
> case, it seems the two end points and the transfer are all in the same
> place. Therefore, it is unclear to me the value of using GCM-style
> authentication because if the GCM-level can be modified, so can the end
> points, and the encryption key exposed.
>
>
> What are the two end points you are referring to and why don’t you feel there
> is an opportunity between them for a malicious actor to attack the system?
Uh, TLS can use GCM and in this case you assume the sender and receiver
are secure, no?
TLS does use GCM, pretty much exclusively as far as I can recall. So do a lot of other things, though.
> There are simpler cases to consider than an online attack on a single
> independent system where an attacker having access to modify the data in
> transit between PG and the storage would imply the attacker also having access
> to read keys out of PG’s memory.
I consider the operating system and its processes as much more of a
single entity than TLS over a network.
This may be the case sometimes, but there’s absolutely no shortage of other cases, and it’s almost the rule these days that there is some kind of network between the OS processes and the storage: a SAN, an iSCSI network, and NFS are all quite common.
> As specific examples, consider:
>
> An attack against the database system where the database server is shut down,
> or a backup, and the encryption key isn’t available on the system.
>
> The backup system itself, not running as the PG user (an option supported by PG
> and at least pgbackrest) being compromised, thus allowing for injection of
> changes into a backup or into a restore.
I then question why we are not adding encryption to pg_basebackup or
pgbackrest rather than the database system.
pgBackRest has encryption and authentication of backups, but that doesn’t actually address the attack vector that I outlined. If the backup user is compromised, then they can change the data before it gets to the storage, and they also have access to whatever key is used to encrypt and authenticate the backup, so they can trivially manipulate the data.
Encryption of backups by the backup tool serves to protect the data after it leaves the backup system and is stored in cloud storage or in whatever format the repository takes. This is beneficial, particularly when the data itself offers no protection, but it is simply not the same.
> The beginning of this discussion also very clearly had individuals voicing
> strong opinions that unauthenticated encryption methods were not acceptable as
> an end-state for PG due to the clear issue of there then being no protection
> against modification of data.
What were the _technical_ reasons for those objections?
I believe they were largely the ones I’m bringing up here and outlined above. I don’t mean to pretend that any of this is of my own independent construction; I don’t believe it is, and my apologies if it came across that way.
> The approach we are working towards provides
> both the unauthenticated option, which clearly has value to a large number of
> our collective user base considering the number of commercial implementations
> which have now arisen, and the authenticated solution which goes further and
> provides the level clearly expected of the PG community. This gets us a win-win
> situation.
>
> > There's clearly user demand for it as there's a number of organizations
> > who have forks which are providing it in one shape or another. This
> > kind of splintering of the community is actually an actively bad thing
> > for the project and is part of what killed Unix, by at least some pretty
> > reputable accounts, in my view.
>
> Yes, the number of commercial implementations of this is a concern. Of
> course, it is also possible that those commercial implementations are
> meeting checkbox requirements rather than technical ones, and the
> community has been hostile to check box-only features.
>
>
> I’ve grown weary of this argument as the other major piece of work it was
> routinely applied to was RLS and yet that has certainly been seen broadly as a
> beneficial feature with users clearly leveraging it and in more than some
> “checkbox” way.
RLS had to overcome that objection, and I think it did, and was better
for doing that.
Beyond it being called a checkbox, what were the arguments against it? I don’t object to being challenged to point out the use cases, but I feel that at least some very clear and straightforward ones are outlined in what has been said above. I also don’t believe those are the only ones, but I don’t think I could enumerate every use case for RLS either, even after seeing it used for quite a few years. I do seriously question the level of effort expected of features that are claimed to be “checkbox” and tossed on this list almost exclusively for that reason, given the success of the ones that have been accepted and are in active use by our users today.
> We, as a community, are clearly losing value by lack of this capability, if by
> no other measure than simply the numerous users of the commercial
> implementations feeling that they simply can’t use PG without this feature, for
> whatever their reasoning.
That is true, but I go back to my concern over useful feature vs. check
box.
While it’s easy to label something as a checkbox feature, I don’t feel we have been fair to our users in doing so, as it has historically held back features which our users are demanding and end up getting from commercial providers until we ultimately implement them anyway. This particular argument simply doesn’t seem to hold the value that its proponents claim, for us at least, and we have clear counter-examples which we can point to; I hope we learn from those.
Thanks!
Stephen