Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 |
Date | |
Msg-id | 14646.1552570130@localhost Whole thread Raw |
In response to | Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
|
List | pgsql-hackers |
Thanks for feedback! Robert Haas <robertmhaas@gmail.com> wrote: > However, this is also quite invasive. It changes a lot of code and it > doesn't do so in a very predictable way. It's not like you went > through and replaced every call to write() with a call to > SpecialEncyptionMagicWrite(). Rather, there are new #ifdef > USE_ENCRYPTION blocks all over the code base. I think it will be > important to look for ways to refactor this functionality to reduce > that sort of thing to the absolute minimum possible. Encryption needs > to be something that the code needs to know about here and there, but > not everywhere. The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code needs to compile even w/o OpenSSL (of course the encryption won't be enabled in that case). I'll try to reduce the use of this construct only to the code blocks that really reference the OpenSSL functions. > Some of the changes are things that could perhaps be done as separate, > preparatory patches. For instance, pgstat.c gets a heavy refactoring > to make it do I/O in bigger chunks. There's no reason that you > couldn't separate some of those changes out, clean them up, and get > them committed separately. It would be more efficient even as things > stand. OK, well, there is one reason: we may be getting a > shared-memory stats collector soon, and if we do, then the need for > all of this will go away. But the general point is valid: whatever is > a useful cleanup apart from this patch should be done separately from > this patch. We'll omit the stats collector related changes so far because the patch [1] is under active development, and that would require us to rebase our patch often. And if the stats files will eventually go away, we won't need to encrypt the statistics. > As far as possible, we need to try to do things in the same way in the > encrypted and non-encrypted cases. For example, it's pretty hard to > feel good about code like this: > > + sz_hdr = sizeof(ReorderBufferDiskChange); > + if (data_encrypted) > +#ifdef USE_ENCRYPTION > + sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr); > +#else > + ENCRYPTION_NOT_SUPPORTED_MSG; > +#endif /* USE_ENCRYPTION */ > > You won't be able to have much confidence that this code works both > with and without encryption unless you test it both ways, because the > file format is different in those two cases, and not just by being > encrypted. That means that anyone who modifies this code in the > future has two ways of breaking it. They can break the normal case, > or they can break the encrypted case. If encryption were available as > a sort of service that everyone could use, then that would probably be > fine, but like I said above, I think something as invasive as this > currently is will lead to a lot of complaints. The problem I tried to solve here is that the whole encryption block (16 bytes) needs to be read and decrypted even if you need only part of its data. In the snippet above I tried to ensure that the whole blocks are always read, but I agree this approach is fragile. Since buffile.c needs to handle this kind of problem (and it already does in the last patch version), I think even other than temporary files could be handled by this module. The patch below adds functions BufFileOpenTransient(), BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(), etc. respectively. Once we implement the encryption, these functions will also hide handling of the encryption blocks from user. In this (preliminary) patch As an example, I applied this approach to ReorderBufferSerializeChange() and the function seems a bit simpler now. (ReorderBufferRestoreChange() would need more work to adopt this approach.) > The code needs a lot more documentation, not only SGML documentation > but also code comments and probably a README file someplace. ok, we'll do that. > The interaction of this capability with certain tricks that PostgreSQL > plays needs some thought -- and documentation, at least developer > documentation, maybe user documentation. One area is recovery. > Writing WAL relies on the fact that if you are in the midst of > rewriting a block and the power fails, bytes that are the same in both > the old and new block images will be undisturbed; encryption will make > that false. How's that going to work? Yes, if only a single bit is different, decryption will turn the whole encryption block (16 bytes) into garbage. So we need to enforce full_page_writes=on if encryption is enabled. The last version of the patch does not do that. > Hint bits also rely on this principle. I thought there might be some > interaction between this work and wal_log_hints for this reason, but I see > nothing of that sort in the patch. I'm not sure if the hint bit is still a problem if we first copy the shared buffer to backend-local memory and encrypt it there. That's what the patch does. > Full page writes seem related too; I don't know how something like this can > be safe without both full_page_writes and wal_log_hints forced on. I don't > know whether using encryption should forcibly override those settings or > whether it should just refuse to start unless they are set properly, but I > think it probably needs to do one or the other. Maybe it's more polite to refuse to start so that user knows what's going on. I'm not sure if PG ever changes any configuration variable forcibly. > Hope this helps in some way. I think it would be good if this effort > went forward in some way. Sure, it helps! Thanks. [1] https://commitfest.postgresql.org/22/1708/ -- Antonin Houska https://www.cybertec-postgresql.com
Attachment
pgsql-hackers by date: