Re: Transparent Data Encryption (TDE) and encrypted files - Mailing list pgsql-hackers

From Moon, Insung
Subject Re: Transparent Data Encryption (TDE) and encrypted files
Date
Msg-id CAEMmqBvQ2bDJFdHbabM5Ti0MFfwpYrZV-NqkWaCwKe2mqUi2Uw@mail.gmail.com
In response to Re: Transparent Data Encryption (TDE) and encrypted files  (Antonin Houska <ah@cybertec.at>)
Responses Re: Transparent Data Encryption (TDE) and encrypted files
List pgsql-hackers
Hello.

On Tue, Oct 8, 2019 at 8:52 PM Antonin Houska <ah@cybertec.at> wrote:
>
> Robert Haas <robertmhaas@gmail.com> wrote:
>
> > On Mon, Oct 7, 2019 at 3:01 PM Antonin Houska <ah@cybertec.at> wrote:
> > > However the design doesn't seem to be stable enough at the
> > > moment for coding to make sense.
> >
> > Well, I think the question is whether working further on your patch
> > could produce some things that everyone would agree are a step
> > forward.
>
> It would have made a lot of sense several months ago (Masahiko Sawada actually
> used parts of our patch in the previous version of his patch (see [1]), but
> the requirement to use a different IV for each execution of the encryption
> changes things quite a bit.
>
> Besides the relation pages and SLRU (CLOG), which are already being discussed
> elsewhere in the thread, let's consider other two file types:
>
> * Temporary files (buffile.c): we derive the IV from PID of the process that
>   created the file + segment number + block within the segment. This
>   information does not change if you need to write the same block again. If
>   new IV should be used for each encryption run, we can simply introduce an
>   in-memory counter that generates the IV for each block. However it becomes
>   trickier if the temporary file is shared by multiple backends. I think it
>   might still be easier to expose the IV values to other backends via shared
>   memory than to store them on disk ...

I am thinking of encrypting temporary files in a slightly different way.
Previously, I had a lot of trouble with IV uniqueness, so I have
proposed using a unique encryption key for each file.

First, in the CTR mode to be used, 32 bits of the 128-bit nonce value
are used for the counter.
The counter increases every time 16 bytes are encrypted, so
theoretically, as long as the other 96 bits of the nonce stay the same,
a total of 2^32 * 16 bytes = 64 GiB can be encrypted.

Therefore, in the case of buffile.c, which creates a temporary file when
work_mem is insufficient, each file can use at most 1GiB, so a simple IV
value is enough to encrypt it safely.
The problem is that a vulnerability occurs when the 96-bit nonce value
(excluding the counter) is the same for two encryptions under the same
key.
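
As a quick check of the arithmetic above, here is a minimal sketch in
Python. The constant names are only for illustration and do not come
from any patch; the 1GiB per-file limit is the figure mentioned above.

```python
# Back-of-the-envelope check of how much data one 96-bit nonce can cover
# in CTR mode with a 32-bit block counter.
AES_BLOCK_BYTES = 16   # the counter advances once per 16-byte AES block
COUNTER_BITS = 32      # counter portion of the 128-bit nonce value
GIB = 2 ** 30

max_bytes_per_nonce = (2 ** COUNTER_BITS) * AES_BLOCK_BYTES
max_gib_per_nonce = max_bytes_per_nonce // GIB

TEMP_FILE_LIMIT_GIB = 1  # per-file limit mentioned above for buffile.c
fits_in_counter_space = TEMP_FILE_LIMIT_GIB <= max_gib_per_nonce

print(max_gib_per_nonce, fits_in_counter_space)
```

So a single temporary file never exhausts the counter; the remaining
risk is only the reuse of the same 96 bits with the same key.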

At first I also tried to generate the IV from PID (32 bits) + tempCounter
(64 bits), but in the worst case the same PID and tempCounter values can
be used again.
Therefore, I decided to rely on the uniqueness of the encryption key
instead of the uniqueness of the IV value.

As described earlier, a separate encryption key is used for each file.
First, a random hash value is generated for the file, and the key is
derived from that hash value and the KEK (or MDEK) using HMAC-SHA256.
In this case, there is no need to store the encryption key separately,
and no need to keep the IV in a separate file or in memory.
(The IV is a hash value of 64 bits and a counter of 32 bits.)
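
The derivation described above could look roughly like the following
Python sketch. It is only an illustration under assumptions: the
function names are hypothetical, the 8-byte (64-bit) length of the
random value is taken from the IV description above, and the KEK is
used directly as the HMAC key.

```python
import hashlib
import hmac
import os

FILE_HASH_BYTES = 8  # assumption: 64-bit random value per temporary file


def new_file_hash() -> bytes:
    """Generate the random per-file value (hypothetical helper)."""
    return os.urandom(FILE_HASH_BYTES)


def derive_file_key(kek: bytes, file_hash: bytes) -> bytes:
    """Derive a 256-bit per-file key with HMAC-SHA256.

    The KEK (or MDEK) is the HMAC key and the per-file random value is
    the message, so the same inputs always give the same key.
    """
    return hmac.new(kek, file_hash, hashlib.sha256).digest()


if __name__ == "__main__":
    kek = os.urandom(32)          # stand-in for the real KEK (or MDEK)
    file_hash = new_file_hash()
    key = derive_file_key(kek, file_hash)
    # The key can be recomputed from (kek, file_hash), so it does not
    # have to be stored anywhere.
    assert key == derive_file_key(kek, file_hash)
```

Because the key is recomputed from the KEK and the per-file value, only
the per-file value has to be remembered for the lifetime of the file.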

Also, the temporary file name is currently specified as
PID.tempFileCounter, but if this is changed to
PID.tempFileCounter.hashvalue, I think any process can encrypt and
decrypt the file.
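
To illustrate the naming idea, here is a small Python sketch. The
helper names and the hex encoding of the hash value in the file name
are assumptions made for this example, not something decided in the
proposal.

```python
import hashlib
import hmac


def temp_file_name(pid: int, temp_file_counter: int, file_hash: bytes) -> str:
    """Build PID.tempFileCounter.hashvalue (hash value hex-encoded here)."""
    return f"{pid}.{temp_file_counter}.{file_hash.hex()}"


def parse_temp_file_name(name: str) -> tuple[int, int, bytes]:
    """Split the name back into PID, counter, and per-file hash value."""
    pid, counter, hash_hex = name.split(".")
    return int(pid), int(counter), bytes.fromhex(hash_hex)


def key_for_file(kek: bytes, name: str) -> bytes:
    """Re-derive the per-file key from the file name alone plus the KEK."""
    _, _, file_hash = parse_temp_file_name(name)
    return hmac.new(kek, file_hash, hashlib.sha256).digest()


if __name__ == "__main__":
    name = temp_file_name(1234, 5, bytes.fromhex("0011223344556677"))
    print(name)
    # Any backend that knows the KEK and sees this name gets the same key.
    kek = b"\x01" * 32
    assert key_for_file(kek, name) == key_for_file(kek, name)
```

With the hash value in the name, a backend other than the one that
created the file needs only the KEK (or MDEK) to derive the same key.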

Reference URL
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption


>
> * "Buffered transient file". This is to be used instead of OpenTransientFile()
>   if user needs the option to encrypt the file. (Our patch adds this API to
>   buffile.c. Currently we use it in reorderbuffer.c to encrypt the data
>   changes produced by logical decoding, but there should be more use cases.)

Agreed.

Best regards.
Moon.

>
>   In this case we cannot keep the IVs in memory because user can close the
>   file anytime and open it much later. So we derive the IV by hashing the file
>   path. However if we should generate the IV again and again, we need to store
>   it on disk in another way, probably one IV value per block (PGAlignedBlock).
>
>   However since our implementation of both these file types shares some code,
>   it might yet be easier if the shared temporary file also stored the IV on
>   disk instead of exposing it via shared memory ...
>
> Perhaps this is what I can work on, but I definitely need some feedback.
>
> [1] https://www.postgresql.org/message-id/CAD21AoBjrbxvaMpTApX1cEsO=8N=nc2xVZPB0d9e-VjJ=YaRnw@mail.gmail.com
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
>


