Re: Transparent column encryption - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Transparent column encryption
Date
Msg-id CA+Tgmoaukq-Ho=8AMp6tSS5py=05gm1KNkRm6Le9akjO-4Q+zQ@mail.gmail.com
Whole thread Raw
In response to Re: Transparent column encryption  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: Transparent column encryption  (Jelte Fennema-Nio <postgres@jeltef.nl>)
List pgsql-hackers
On Wed, Apr 10, 2024 at 6:13 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> Obviously, it's early days, so there will be plenty of time to have
> discussions on various other aspects of this patch.  I'm keeping a keen
> eye on the discussion of protocol extensions, for example.

I think the way that you handled that is clever, and along the lines
of what I had in mind when I invented the _pq_ stuff.

More specifically, the way that the ColumnEncryptionKey and
ColumnMasterKey messages are handled is exactly the way that I was
imagining things would work. The client uses _pq_.column_encryption to
signal that it can understand those messages, and the server responds
by including them. I assume that if the client doesn't signal
understanding, then the server simply omits sending those messages. (I
have not checked the code.)

I'm less certain about the changes to the ParameterDescription and
RowDescription messages. I see a couple of potential problems. One is
that, if you say you can understand column encryption messages, the
extra fields are included even for unencrypted columns. The client
must choose at connection startup whether it ever wishes to read any
encrypted data; if so, it pays a portion of that overhead all the
time. Another potential problem is with the scalability of this
design. Suppose that we could not only encrypt columns, but also
compress, fold, mutilate, and spindle them. Then there might end up
being a dizzying array of variation in the format of what is supposed
to be the same message. Perhaps it's not so bad: as long as the
documentation is clear about in which order the additional fields will
appear in the relevant messages when more than one relevant feature is
used, it's probably not too difficult for clients to cope. And it is
probably also true that the precise size of, say, a RowDescription
message will rarely be performance-critical. But another thought is
that we might try to redesign this so that we simply add more message
types rather than mutating message types i.e. after sending the
RowDescription message, if any columns are encrypted, we additionally
send a RowEncryptionDescription message. Then this treatment becomes
symmetric with the handling of ColumnEncryptionKey and ColumnMasterKey
messages, and there's no overhead when the feature is unused.

With regard to the Bind message, I suggest that we regard the protocol
change as reserving a currently-unused bit in the message to indicate
whether the value is pre-encrypted, without reference to the protocol
extension. It could be legal for a client that can't understand
encryption message from the server to supply an encrypted value to be
inserted into a column. And I don't think we would ever want the bit
that's being reserved here to be used by some other extension for some
other purpose, even when this extension isn't used. So I don't see a
need for this to be tied into the protocol extension.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Speed up clean meson builds by ~25%
Next
From: Justin Pryzby
Date:
Subject: Re: pg17 issues with not-null contraints