Re: Transparent column encryption - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: Transparent column encryption
Date
Msg-id 4e00941c1028e6760da23e8f0ed42f2873bdf923.camel@vmware.com
Whole thread Raw
In response to Re: Transparent column encryption  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Transparent column encryption  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Wed, 2021-12-08 at 02:58 +0100, Tomas Vondra wrote:
> 
> On 12/8/21 00:26, Jacob Champion wrote:
> > On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote:
> > > IMO it's impossible to solve this attack within TCE, because it requires 
> > > ensuring consistency at the row level, but TCE obviously works at column 
> > > level only.
> > 
> > I was under the impression that clients already had to be modified to
> > figure out how to encrypt the data? If part of that process ends up
> > including enforcement of encryption for a specific column set, then the
> > addition of AEAD data could hypothetically be part of that hand-
> > waviness.
> 
> I think "transparency" here means the client just uses the regular
> prepared-statement API without having to explicitly encrypt/decrypt any
> data. The problem is we can't easily tie this to other columns in the
> table, because the client may not even know what values are in those
> columns.

The way I originally described my request -- "I'd like to be able to
tie an encrypted value to other column (or external) data" -- was not
very clear.

With my proposed model -- where the DBA (and the server) are completely
untrusted, and the DBA needs to be prevented from using the encrypted
value -- I don't think there's a useful way for the client to use
associated data that comes from the server. The client has to know what
the AD should be beforehand, because otherwise the DBA can make it so
the server returns whatever is correct.

> Imagine you do this
> 
>   UPDATE t SET encrypted_column = $1 WHERE another_column = $2;
> 
> but you want to ensure the encrypted value belongs to a particular row
> (which may or may not be identified by the another_column value). How
> would the client do that? Should it fetch the value or what?
> 
> Similarly, what if the client just does
> 
>   SELECT encrypted_column FROM t;
> 
> How would it verify the values belong to the row, without having all the
> data for the row (or just the required columns)?

So with my (hopefully more clear) model above, it wouldn't. The client
would already have the AD, and somehow tell libpq what that data was
for the query.

The rabbit hole I led you down is one where we use the rest of the row
as AD, to try to freeze pieces of it in place. That might(?) have some
useful security properties (if the client defines its use and doesn't
defer to the server). But it's not what I intended to propose and I'd
have to think about that case some more.

In my credit card example, I'm imagining something like (forgive the
contrived syntax):

    SELECT address, :{aead(users.credit_card, 'user@example.com')}
      FROM users WHERE email = 'user@example.com';

    UPDATE users
       SET :{aead(users.credit_card, 'user@example.com')} = '1234-...'
     WHERE email = 'user@example.com';

The client explicitly links a table's column to its AD for the duration
of the query. This approach can't scale to

    SELECT credit_card FROM users;

because in this case the AD for each row is different, but I'd argue
that's ideal for this particular case. The client doesn't need to (and
probably shouldn't) grab everyone's credit card details all at once, so
there's no reason to optimize for it.

> > Unless "transparent" means that the client completely defers to the
> > server on whether to encrypt or not, and silently goes along with it if
> > the server tells it not to encrypt?
> I think that's probably a valid concern - a "bad DBA" could alter the
> table definition to not contain the "ENCRYPTED" bits, and then peek at
> the plaintext values.
> 
> But it's not clear to me how exactly would the AEAD prevent this?
> Wouldn't that be also specified on the server, somehow? In which case
> the DBA could just tweak that too, no?
>
> In other words, this issue seems mostly orthogonal to the AEAD, and the
> right solution would be to allow the client to define which columns have
> to be encrypted (in which case altering the server definition would not
> be enough).

Right, exactly. When I mentioned AEAD I had assumed that "allow the
client to define which columns have to be encrypted" was already
planned or in the works; I just misunderstood pieces of Peter's email.
It's that piece where a client would probably have to add details
around AEAD and its use.

> > That would only protect against a
> > _completely_ passive DBA, like someone reading unencrypted backups,
> > etc. And that still has a lot of value, certainly. But it seems like
> > this prototype is very close to a system where the client can reliably
> > secure data even if the server isn't trustworthy, if that's a use case
> > you're interested in.
> 
> Right. IMHO the "passive attacker" is a perfectly fine model for use
> cases that would be fine with e.g. pgcrypto if there was no risk of
> leaking plaintext values to logs, system catalogs, etc.
> 
> If we can improve it to provide (at least some) protection against
> active attackers, that'd be a nice bonus.

I agree that resistance against offline attacks is a useful step
forward (it seems to be a strict improvement over pgcrypto). I have a
feeling that end users will *expect* some protection against online
attacks too, since an evil DBA is going to be well-positioned to do
exactly that.

> > > It's probably possible to get something like this (row-level AEAD) by 
> > > encrypting enriched data, i.e. not just the card number, but {user ID, 
> > > card number} or something like that, and verify that in the webapp. The 
> > > problem of course is that the "user ID" is just another column in the 
> > > table, and there's nothing preventing the DBA from modifying that too.
> > 
> > Right. That's why the client has to be able to choose AD according to
> > the application. In my previous example, the victim's email address can
> > be copied by the DBA, but they wouldn't be able to authenticate as that
> > user and couldn't convince the client to use the plaintext on their
> > behalf.
> 
> Well, yeah. But I'm not sure how to make that work easily, because the
> client may not have the data :-(
> 
> I was thinking about using a composite data type combining the data with
> the extra bits - that'd not be all that transparent as it'd require the
> client to build this manually and then also cross-check it after loading
> the data. So the user would be responsible for having all the data.
> 
> But doing that automatically/transparently seems hard, because how would
> you deal e.g. with SELECT queries reading data through a view or CTE?
> 
> How would you declare this, either at the client or server?

I'll do some more thinking on the case you're talking about here, where
pieces of the row are transparently tied together.

> Do any other databases have this capability? How do they do it?

BigQuery advertises AEAD support. I don't think their model is the same
as ours, though; from the docs it looks like it's essentially pgcrypto,
where you tell the server to encrypt stuff for you.

--Jacob

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Fix typos - "an" instead of "a"
Next
From: "David G. Johnston"
Date:
Subject: Re: Fix typos - "an" instead of "a"