
From: Sehrope Sarkuni
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Msg-id: CAH7T-arosbP_JODsbh_qcLy_N-bo=1JnpqOLMsajh+=8XcLwKg@mail.gmail.com
In response to: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
On Wed, Aug 7, 2019 at 1:39 PM Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Aug  7, 2019 at 11:41:51AM -0400, Sehrope Sarkuni wrote:
> On Wed, Aug 7, 2019 at 7:19 AM Bruce Momjian <bruce@momjian.us> wrote:
>
>     On Wed, Aug  7, 2019 at 05:13:31PM +0900, Masahiko Sawada wrote:
>     > I understood. IIUC in your approach postgres processes encrypt WAL
>     > records when inserting to the WAL buffer. So WAL data is encrypted
>     > even on the WAL buffer.
>
>
> I was originally thinking of not encrypting the shared WAL buffers but that may
> have issues. If the buffers are already encrypted and contiguous in shared
> memory, it's possible to write out many via a single pg_pwrite(...) call as is
> currently done in XLogWrite(...).

The shared buffers will not be encrypted --- they are encrypted only
when being written to storage.  We felt encrypting shared buffers will
be too much overhead, for little gain.  I don't know if we will encrypt
while writing to the WAL buffers or while writing the WAL buffers to
the file system.

My mistake on the wording. By "shared WAL buffers" I meant the shared memory used for the WAL buffers, XLogCtl->pages, not the shared buffers for data pages.
 
> If they're not encrypted you'd need to do more work in that critical section.
> That'd involve allocating a commensurate amount of memory to hold the encrypted
> pages and then encrypting them all prior to the single pg_pwrite(...) call.
> Reusing one buffer is possible but it would require encrypting and writing the
> pages one by one. Both of those seem like a bad idea.

Well, right now the 8k pages are part of the WAL stream, so I don't know
that it would be any more overhead than other WAL writes.

The total amount of work is the same, but when it happens, the memory usage, and the number of syscalls could all change.

Right now the XLogWrite(...) code can write many WAL pages at once via a single call to pg_pwrite(...): https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/xlog.c;h=f55352385732c6b0124eff5265462f3883fe7435;hb=HEAD#l2491

If the blocks are not encrypted, then you either need to allocate and encrypt everything (which could be up to the wal_buffers max size) to do it as one write, or encrypt chunks of WAL and do multiple writes. I'm not sure how big an issue this would be in practice, as it'd be workload-specific.
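
To make that concrete, here's a rough sketch of the two patterns (just an illustration, not actual xlog.c code: BLCKSZ_WAL, the function names, and the xor_placeholder() stand-in for real encryption are all made up):

    /* Toy model of the two write strategies for unencrypted WAL buffers. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLCKSZ_WAL 8192

    /* Placeholder only; real code would run AES-CTR keyed per the TDE design. */
    static void
    xor_placeholder(const uint8_t *in, uint8_t *out, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            out[i] = in[i] ^ 0xAA;
    }

    /* Option 1: encrypt every page into a staging buffer, then one pwrite(). */
    ssize_t
    write_all_at_once(int fd, const uint8_t *pages, int npages, off_t offset)
    {
        size_t  total = (size_t) npages * BLCKSZ_WAL;
        uint8_t *staging = malloc(total);   /* up to wal_buffers worth of memory */
        ssize_t n;

        if (staging == NULL)
            return -1;
        xor_placeholder(pages, staging, total);
        n = pwrite(fd, staging, total, offset);
        free(staging);
        return n;
    }

    /* Option 2: reuse one page-sized buffer, at the cost of one pwrite() per page. */
    ssize_t
    write_page_by_page(int fd, const uint8_t *pages, int npages, off_t offset)
    {
        uint8_t page[BLCKSZ_WAL];
        ssize_t written = 0;

        for (int i = 0; i < npages; i++)
        {
            ssize_t n;

            xor_placeholder(pages + (size_t) i * BLCKSZ_WAL, page, BLCKSZ_WAL);
            n = pwrite(fd, page, BLCKSZ_WAL, offset + (off_t) i * BLCKSZ_WAL);
            if (n < 0)
                return -1;
            written += n;
        }
        return written;
    }

Either way the encryption cost lands in XLogWrite()'s critical path; the difference is one large allocation plus a single syscall versus a small reusable buffer plus a syscall per page.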
 
I am hoping we can
generate the encryption bit stream in chunks earlier so we can just do
the XOR as we are writing the data to the WAL buffers.

For pure CTR that sounds doable, as it'd be the same as doing an XOR against encrypted zeros. Anything with a built-in MAC like GCM would not work though (I'm not proposing we use that, just keeping it in mind).

You'd also increase your memory requirements (one allocation for the encryption bit stream and one for the encrypted data, right?).
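
As a sketch of what pre-generating the bit stream could look like (illustrative only; the zero key/IV and the generate_keystream() helper are made up, and it just uses OpenSSL's EVP interface with AES-256-CTR):

    /* Pre-generate AES-256-CTR keystream by encrypting zeros, XOR against it later. */
    #include <openssl/evp.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Fill keystream[] with len bytes of keystream for the given key/IV. */
    static int
    generate_keystream(const unsigned char key[32], const unsigned char iv[16],
                       unsigned char *keystream, int len)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        unsigned char  *zeros = calloc(1, len);
        int             outlen = 0;
        int             ok;

        ok = ctx != NULL && zeros != NULL
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) == 1
            /* Encrypting zero bytes yields the raw keystream. */
            && EVP_EncryptUpdate(ctx, keystream, &outlen, zeros, len) == 1
            && outlen == len;

        free(zeros);
        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

    int
    main(void)
    {
        unsigned char key[32] = {0};    /* placeholder key */
        unsigned char iv[16] = {0};     /* placeholder IV, e.g. (segment, counter) */
        unsigned char keystream[64];
        unsigned char data[64] = "some WAL bytes";

        if (!generate_keystream(key, iv, keystream, (int) sizeof(keystream)))
            return 1;

        /* Later, "encrypting" the data is just an XOR against the stream. */
        for (size_t i = 0; i < sizeof(data); i++)
            data[i] ^= keystream[i];

        printf("first ciphertext byte: %02x\n", data[0]);
        return 0;
    }

Note this only works because CTR's keystream is independent of the data; anything with a built-in MAC like GCM needs to see the real plaintext, per the caveat above.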
 
> Better to pay the encryption cost at the time of WAL record creation and keep
> the writing process as fast and simple as possible.

Yes, I don't think we know at the time of WAL record creation what
_offset_ the records will have when they are written to WAL, so I am
thinking we need to do it later, and as I said, I am hoping we can
generate the encryption bit stream earlier.

>     > It works, but I think the implementation might be complex. For example,
>     > using openssl, we would use EVP functions to encrypt data with
>     > AES-256-CTR. We would need to construct the IV and pass it to these
>     > functions, but as far as I can tell they don't manage the counter value
>     > of the nonce. That is, we need to calculate the correct counter value
>     > for each encryption and pass it to the EVP functions. Suppose we encrypt
>     > 20 bytes of WAL. The first 16 bytes are encrypted with a nonce of
>     > (segment_number, 0) and the next 4 bytes are encrypted with a nonce of
>     > (segment_number, 1). After that, suppose we encrypt 12 bytes of WAL. We
>     > cannot use a nonce of (segment_number, 2) but should use a nonce of
>     > (segment_number, 1). Therefore we would need 4 bytes of padding, to
>     > encrypt it, and then to throw those 4 bytes away.
>
>     Since we want to have per-byte control over encryption, for both
>     heap/index pages (skip LSN and CRC), and WAL (encrypt to the last byte),
>     I assumed we would need to generate a bit stream of a specified size and
>     do the XOR ourselves against the data.  I assume ssh does this, so we
>     would have to study the method.
>
>
> The lower level non-EVP OpenSSL functions allow specifying the offset within
> the 16-byte AES block from which the encrypt/decrypt should proceed. It's the
> "num" parameter of their encrypt/decrypt functions. For a continuous encrypted
> stream such as a WAL file, a "pread(...)" of a possibly non-16-byte aligned
> section would involve determining the 16-byte counter (byte_offset / 16) and
> the intra-block offset (byte_offset % 16). I'm not sure how one handles
> initializing the internal encrypted counter and that might be one more step
> that would need to be done. But it's definitely possible to read / write less than
> a block via those APIs (not the EVP ones).
>
> I don't think the EVP functions have parameters for the intra-block offset but
> you can mimic it by initializing the IV/block counter and then skipping over
> the intra-block offset by either reading or writing a dummy partial block. The
> EVP read and write functions both deal with individual bytes so once you've
> seeked to your desired offset you can read or write the real individual bytes.

Can we generate the bit stream in 1MB chunks or something and just XOR
as needed?

With the provisos above, yes, I think that would work, though I don't think it's a good idea. Better to start off using the functions directly and then look into optimizing only if they're a bottleneck. As a first pass I'd break it up into separate writes with the encryption happening at write time. If that works fine, there's no need to complicate things further.
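
For what it's worth, here's a sketch of that EVP "seek" (illustrative only; it assumes the (segment_number, block_counter) IV layout from the nonce discussion above, and ctr_seek_and_crypt() is a made-up helper):

    /* Encrypt/decrypt AES-256-CTR data starting at an arbitrary byte offset. */
    #include <openssl/evp.h>
    #include <stdint.h>
    #include <string.h>

    static int
    ctr_seek_and_crypt(const unsigned char key[32], uint64_t segment_number,
                       uint64_t byte_offset, const unsigned char *in,
                       unsigned char *out, int len)
    {
        unsigned char   iv[16];
        unsigned char   zeros[16] = {0};
        unsigned char   sink[16];
        uint64_t        block = byte_offset / 16;
        int             skip = (int) (byte_offset % 16);
        int             outlen = 0;
        int             ok;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        /* IV: segment number in the high 8 bytes, block counter in the low 8
         * bytes, both big-endian, matching the (segment, counter) nonce idea. */
        for (int i = 0; i < 8; i++)
        {
            iv[i] = (unsigned char) (segment_number >> (56 - 8 * i));
            iv[8 + i] = (unsigned char) (block >> (56 - 8 * i));
        }

        ok = ctx != NULL
            && EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) == 1;

        /* Burn byte_offset % 16 keystream bytes on a throwaway buffer so the
         * cipher's intra-block position lines up with the requested offset. */
        if (ok && skip > 0)
            ok = EVP_EncryptUpdate(ctx, sink, &outlen, zeros, skip) == 1;

        /* CTR encryption and decryption are the same XOR, so this works for
         * both directions. */
        ok = ok && EVP_EncryptUpdate(ctx, out, &outlen, in, len) == 1;

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

    int
    main(void)
    {
        unsigned char key[32] = {0};    /* placeholder key */
        unsigned char plain[40];
        unsigned char cipher[40];
        unsigned char back[19];

        for (int i = 0; i < 40; i++)
            plain[i] = (unsigned char) i;

        /* Encrypt the whole 40 bytes starting at offset 0 ... */
        if (!ctr_seek_and_crypt(key, 1, 0, plain, cipher, 40))
            return 1;
        /* ... then recover just bytes 21..39 by seeking to offset 21. */
        if (!ctr_seek_and_crypt(key, 1, 21, cipher + 21, back, 19))
            return 1;

        return memcmp(back, plain + 21, 19) == 0 ? 0 : 2;
    }

Decryption is the same XOR, so the same helper covers a partial pread(...) of an encrypted WAL segment. The lower-level non-EVP path with its "num" parameter would be the alternative described earlier.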
 
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/ 
