Re: [PATCH] buffile: ensure start offset is aligned with BLCKSZ - Mailing list pgsql-hackers

From Sasasu
Subject Re: [PATCH] buffile: ensure start offset is aligned with BLCKSZ
Date
Msg-id cc3110c5-b15c-3fa5-ad83-9dd7193d10a8@sasa.su
Whole thread Raw
In response to Re: [PATCH] buffile: ensure start offset is aligned with BLCKSZ  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
On 2021/11/29 18:05, Antonin Houska wrote:
> Does this test really pass regression tests? In BufFileRead(), I would
> understand if you did
>
> +            file->pos = offsetInBlock;
> +            file->curOffset -= offsetInBlock;
>
> rather than
>
> +            file->pos += offsetInBlock;
> +            file->curOffset -= offsetInBlock;

It pass all regression tests. this patch is compatible with
BufFileSeek().

to generate a correct alignment, we need to make sure
  pos_new + offset_new = pos_old + offset_old
            offset_new = offset_old - offset_old % BLCKSZ
it means
  pos_new = pos_old + offset_old % BLCKSZ
          = pos_old + "offsetInBlock"

with your code, backend will read a wrong buffile at the end of buffile
reading. for example: physical file size = 20 and pos = 10, off = 10,
read start at 20. after the '=' code: pos = 10, off = 0, read start at
10, which is wrong.

> Anyway, BufFileDumpBuffer() does not seem to enforce curOffset to end up at
> block boundary, not to mention BufFileSeek().
>
> When I was implementing this for our fork [1], I concluded that the encryption
> code path is too specific, so I left the existing code for the unecrypted data
> and added separate functions for the encrypted data.
>
> One specific thing is that if you encrypt and write n bytes, but only need
> part of it later, you need to read and decrypt exactly those n bytes anyway,
> otherwise the decryption won't work. So I decided not only to keep curOffset
> at BLCKSZ boundary, but also to read / write BLCKSZ bytes at a time. This also
> makes sense if the scope of the initialization vector (IV) is BLCKSZ bytes.
>
> Another problem is that you might want to store the IV somewhere in between
> the data. In short, the encryption makes the buffered IO rather different and
> the specific code should be kept aside, although the same API is used to
> invoke it.
>
but I want to make less change on existed code. with this path. the only
code added to critical code path is this:

diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index 3be08eb723..ceae85584b 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -512,6 +512,9 @@ BufFileDumpBuffer(BufFile *file)
         /* and the buffer is aligned with BLCKSZ */
         Assert(file->curOffset % BLCKSZ == 0);

+        /* encrypt before write */
+        TBD_ENC(file->buffer.data + wpos /* buffer */, bytestowrite /* size */, file->curOffset /* context to find IV
*/);
+
         thisfile = file->files[file->curFile];
         bytestowrite = FileWrite(thisfile,
                                  file->buffer.data + wpos,
@@ -582,6 +585,9 @@ BufFileRead(BufFile *file, void *ptr, size_t size)
             BufFileLoadBuffer(file);
             if (file->nbytes <= 0 || (file->nbytes == file->pos && file->nbytes != BLCKSZ))
                 break;            /* no more data available */
+
+            /* decrypt after read */
+            TBD_DEC(file->buffer /* buffer */, file->nbytes /* size */, file->curOffset /* context to find IV */);
         }

         nthistime = file->nbytes - file->pos;

those change will allow TDE to use any encryption algorithm (read offset
and write offset are matched) and implement on-the-fly IV generation.

Attachment

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: parallel vacuum comments
Next
From: Amit Kapila
Date:
Subject: Re: parallel vacuum comments