Re: Transparent column encryption - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Transparent column encryption
Date
Msg-id b12ab6ca-5a87-e79f-f03b-02b42c42e6bd@enterprisedb.com
Whole thread Raw
In response to Re: Transparent column encryption  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 30.03.23 20:35, Stephen Frost wrote:
> I do feel that column encryption is a useful capability and there's
> large parts of this approach that I agree with, but I dislike the idea
> of having our clients be able to depend on what gets returned for
> non-encrypted columns while not being able to trust what encrypted
> column results are and then trying to say it's 'transparent'.  To that
> end, it seems like just saying they get back a bytea and making it clear
> that they have to provide the validation would be clear, while keeping
> much of the rest.

[Note that the word "transparent" has been removed from the feature 
name.  I just didn't change the email thread name.]

These thoughts are reasonable, but I think there is a tradeoff to be 
made between having featureful data validation and enhanced security. 
If you want your database system to validate your data, you have to send 
it in plain text.  If you want to have your database system not see the 
plain text, then it cannot validate it.  But there is still utility in it.

You can't really depend on what gets returned even in the non-encrypted 
case, unless you cryptographically sign the schema against modification 
or something like that.  So realistically, a client that cares strongly 
about the data it receives has to do some kind of client-side validation 
anyway.

Note also that higher-level client libraries like JDBC effectively do 
client-side data validation, for example when you call 
ResultSet.getInt() etc.

This is also one of the reasons why the user facing type declaration 
exists.  You could just make all encrypted columns of type "opaque" or 
something and not make any promises about what's inside.  But client 
APIs sort or rely on the application being able to ask the result set 
for what's inside a column value.  If we just say, we don't know, then 
applications (or driver APIs) will have to be changed to accommodate 
that, but the intention was to not require that.  So instead we say, 
it's supposed to be int, and then if it's sometimes actually not int, 
then your application throws an exception you can deal with.  This is 
arguably a better developer experience, even if it concerns the data 
type purist.

But do you have a different idea about how it should work?

> Expanding out from that I'd imagine, pie-in-the-sky
> and in some far off land, having our data type in/out validation
> functions moved to the common library and then adding client-side
> validation of the data going in/out of the encrypted columns would allow
> application developers to be able to trust what we're returning (as long
> as they're using libpq- and we'd have to document that independent
> implementations of the protocol have to provide this or just continue to
> return bytea's).

As mentioned, some client libraries effectively already do that.  But 
even if we could make this much more comprehensive, I don't see how this 
can ever actually satisfy your point.  It would require that all 
participating clients apply validation all the time, and all other 
clients can rely on that happening, which is impossible.



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Thoughts on using Text::Template for our autogenerated code?
Next
From: Tom Lane
Date:
Subject: Re: pgsql: Clean up role created in new subscription test.