Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 |
Date | |
Msg-id | CA+TgmoawhAmdVaSYt3YLA5dsgoc-oXf72O3ybFAqiARfyTL4iA@mail.gmail.com Whole thread Raw |
In response to | Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
|
List | pgsql-hackers |
On Fri, Sep 14, 2018 at 10:04 AM Antonin Houska <ah@cybertec.at> wrote: > Toshi Harada <harada.toshi@po.ntt-tx.co.jp> wrote: > > Even if you apply "data-at-rest-encryption-wip-2018.07.25.patch" to the master branch, > > the same patch error will occur. > > The version attached to this email should be applicable to the current > branch. Sorry for the delay. I have a few comments on this patch. It doesn't seem useful to go through and make really detailed review comments on everything at this stage, because it's pretty far from being committable, at least IMHO. However, I have a few general thoughts. First of all, this is an impressive piece of work. It's obvious even from looking at this quickly that a lot of energy has gone into making this work, and it touches a lot of stuff, not without reason. For example, it's quite impressive that this touches things like the write-ahead log, logical decoding, and the stats collector, all of which write to disk. However, this is also quite invasive. It changes a lot of code and it doesn't do so in a very predictable way. It's not like you went through and replaced every call to write() with a call to SpecialEncyptionMagicWrite(). Rather, there are new #ifdef USE_ENCRYPTION blocks all over the code base. I think it will be important to look for ways to refactor this functionality to reduce that sort of thing to the absolute minimum possible. Encryption needs to be something that the code needs to know about here and there, but not everywhere. Some of the changes are things that could perhaps be done as separate, preparatory patches. For instance, pgstat.c gets a heavy refactoring to make it do I/O in bigger chunks. There's no reason that you couldn't separate some of those changes out, clean them up, and get them committed separately. It would be more efficient even as things stand. OK, well, there is one reason: we may be getting a shared-memory stats collector soon, and if we do, then the need for all of this will go away. But the general point is valid: whatever is a useful cleanup apart from this patch should be done separately from this patch. As far as possible, we need to try to do things in the same way in the encrypted and non-encrypted cases. For example, it's pretty hard to feel good about code like this: + sz_hdr = sizeof(ReorderBufferDiskChange); + if (data_encrypted) +#ifdef USE_ENCRYPTION + sz_hdr = TYPEALIGN(ENCRYPTION_BLOCK, sz_hdr); +#else + ENCRYPTION_NOT_SUPPORTED_MSG; +#endif /* USE_ENCRYPTION */ You won't be able to have much confidence that this code works both with and without encryption unless you test it both ways, because the file format is different in those two cases, and not just by being encrypted. That means that anyone who modifies this code in the future has two ways of breaking it. They can break the normal case, or they can break the encrypted case. If encryption were available as a sort of service that everyone could use, then that would probably be fine, but like I said above, I think something as invasive as this currently is will lead to a lot of complaints. The code needs a lot more documentation, not only SGML documentation but also code comments and probably a README file someplace. There is a lot of discussion in the comments here of encryption tweaks, but there's no general introduction to the topic (what's a tweak?) or -- and I think this is important -- what the idea between the choice of various tweaks in different places actually is. There's probably some motivating philosophy behind the way the tweaks are chosen that should be explained someplace. Think about it this way: if some hapless developer, let's say me, wants to add a new module to PostgreSQL which happens to write data to disk, that person needs an easy way to understand what they need to do to make that new code play nice with encryption. Right now, the code for every existing module is a little different (uggh) and there's this encryption tweak stuff that you about which you have to do something intelligent, but there's nothing that tells you how to be intelligent about it, at least not that I see. People are going to need something that looks more like a formula that they can just copy, and a good introduction to the principles in some appropriate place, too. The interaction of this capability with certain tricks that PostgreSQL plays needs some thought -- and documentation, at least developer documentation, maybe user documentation. One area is recovery. Writing WAL relies on the fact that if you are in the midst of rewriting a block and the power fails, bytes that are the same in both the old and new block images will be undisturbed; encryption will make that false. How's that going to work? Hint bits also rely on this principle. I thought there might be some interaction between this work and wal_log_hints for this reason, but I see nothing of that sort in the patch. Full page writes seem related too; I don't know how something like this can be safe without both full_page_writes and wal_log_hints forced on. I don't know whether using encryption should forcibly override those settings or whether it should just refuse to start unless they are set properly, but I think it probably needs to do one or the other. Hope this helps in some way. I think it would be good if this effort went forward in some way. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: