Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Date
Msg-id CA+Tgmoawc7hQW2SrcvVSfw8JyYV8UU4fyQ56nxKn_BtPhXfhdQ@mail.gmail.com
Whole thread Raw
In response to Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3  (Antonin Houska <ah@cybertec.at>)
Responses Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
List pgsql-hackers
On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska <ah@cybertec.at> wrote:
> The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code
> needs to compile even w/o OpenSSL (of course the encryption won't be enabled
> in that case). I'll try to reduce the use of this construct only to the code
> blocks that really reference the OpenSSL functions.

That sounds like a good place to start.  The other part will be
avoiding referencing the OpenSSL functions from too many places.  For
example, right now we have secure_read() -- which I know some people
hate, or at least hate the name of] -- which lets you send data to the
client without having to care whether that data is going to be
SSL-encrypted on its way over the network.  One can argue about
whether that particular abstraction is properly-designed, but as Tom
recently felt the urge to remind me on another thread, the value of
abstraction in general is not in doubt.  I think we need one here, so
that the SSL-specific bits can be isolated in a relatively small
number of files, and then clients can ask to write a file, or write a
block to a file, or whatever, and ignore the question of whether that
is going to get encrypted somehow behind the scenes.

> Since buffile.c needs to handle this kind of problem (and it already does in
> the last patch version), I think even other than temporary files could be
> handled by this module. The patch below adds functions BufFileOpenTransient(),
> BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(),
> etc. respectively. Once we implement the encryption, these functions will also
> hide handling of the encryption blocks from user. In this (preliminary) patch
> As an example, I applied this approach to ReorderBufferSerializeChange() and
> the function seems a bit simpler now. (ReorderBufferRestoreChange() would need
> more work to adopt this approach.)

Yeah, I don't have time to look at this in detail right now, but this
kind of thing seems like a promising approach to me.

> > The interaction of this capability with certain tricks that PostgreSQL
> > plays needs some thought -- and documentation, at least developer
> > documentation, maybe user documentation.  One area is recovery.
> > Writing WAL relies on the fact that if you are in the midst of
> > rewriting a block and the power fails, bytes that are the same in both
> > the old and new block images will be undisturbed; encryption will make
> > that false.  How's that going to work?
>
> Yes, if only a single bit is different, decryption will turn the whole
> encryption block (16 bytes) into garbage. So we need to enforce
> full_page_writes=on if encryption is enabled. The last version of the patch
> does not do that.

I believe that we need full_page_writes=on and wal_log_hints=on, but
they don't fix the problem that I'm describing.  Suppose that the very
last page in the write-ahead log contains 3.5kB of good data and 7kB
of zeroes.  Now, somebody comes along and writes a new write-ahead log
record that is 1k in size.  In memory, we update the block: it now
contains 4.5kB of good data and 6.5kB of zeroes.  Now, as we're
writing the block to do disk, the power is removed.  Consequently, the
first 4kB hits the platter and the other 4kB disappears into the void.

If the WAL is not encrypted, the state at this point is that the 3.5kB
of good data that we had previously is still there, and the first 500
bytes of the new WAL record are also there, and the rest is missing.
Recovery will detect that there's a valid-looking record at end of
WAL, but when it tries to checksum that record, it will fail, and so
that trailing half-record will just get ignored.  And that's fine.

If the WAL *is* encrypted, the state at this point is that the block
is unreadable, because the first 4kB of the block is the first half of
the bits that resulted from encrypting 8kB of data that includes the
new record, and the second 4kB of the block is the second half of the
bits that resulted from encrypting 8kB of data that did not include
the new record, and since the new record perturbed every bit on the
page with probability ~50%, what that means is you now have garbage.
That means that not only did we lose the new record, but we also lost
the 3.5kB of good data which the page previously contained.  That's
NOT ok.  Some of the changes associated with those WAL records may
have been flushed to disk already, or there may be commits in there
that were acknowledged to the client, and we can't just lose them.

One idea I have for fixing this problem is to encrypt the individual
WAL records rather than the blocks.  That leaks some information, of
course, but maybe not in a way that really matters.

> > Hint bits also rely on this principle.  I thought there might be some
> > interaction between this work and wal_log_hints for this reason, but I see
> > nothing of that sort in the patch.
>
> I'm not sure if the hint bit is still a problem if we first copy the shared
> buffer to backend-local memory and encrypt it there. That's what the patch
> does.

That has the same problem that I described in the previous paragraph,
except for the relation data files rather than the WAL itself.  See
the manual's description of wal_log_hints:

       <para>
        When this parameter is <literal>on</literal>, the
<productname>PostgreSQL</productname>
        server writes the entire content of each disk page to WAL during the
        first modification of that page after a checkpoint, even for
        non-critical modifications of so-called hint bits.
       </para>

For encryption to work, you need that.  A hint bit write might convert
the page to garbage, just as in the previous example, and this option
make sure that if that happens, there will be an FPW, allowing you to
recover... unless of course you also hit the problem I described
above, but assuming that's fixed somehow you still have this issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Ordered Partitioned Table Scans
Next
From: Pavel Stehule
Date:
Subject: Re: string_to_array, array_to_string function without separator