Re: Key management with tests - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Key management with tests |
Date | |
Msg-id | 20210201233132.GZ27507@tamriel.snowman.net Whole thread Raw |
In response to | Re: Key management with tests (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: Key management with tests
|
List | pgsql-hackers |
Greetings, * Bruce Momjian (bruce@momjian.us) wrote: > On Fri, Jan 29, 2021 at 05:40:37PM -0500, Stephen Frost wrote: > > I hope it's pretty clear that I'm also very much in support of both this > > effort with the KMS and of TDE in general- TDE is specifically, > > Yes, thanks. I know we have privately talked about this recently, but > it is nice to have it in public like this. Certainly happy to lend my support and to spend some time working on this to move it forward. > > repeatedly, called out as a capability whose lack is blocking PG from > > being able to be used for certain use-cases that it would otherwise be > > well suited for, and that's really unfortunate. > > So, below, I am going to copy two doc paragraphs from the patch: > > The purpose of cluster file encryption is to prevent users with read > access to the directories used to store database files and write-ahead > log files from being able to access the data stored in those files. > For example, when using cluster file encryption, users who have read > access to the cluster directories for backup purposes will not be able > to decrypt the data stored in these files. It also protects against > decrypted data access after media theft. That's one valid use-case and it particularly makes sense to consider, now that we support group read-access to the data cluster. The last line seems a bit unclear- I would update it to say: Cluster file encryption also provides data-at-rest security, protecting users from data loss should the physical media on which the cluster is stored be stolen, improperly deprovisioned (not wiped or destroyed), or otherwise ends up in the hands of an attacker. > File system write access can allow for unauthorized file system data > decryption if the writes can be used to weaken the system's security > and this weakened system is later supplied with externally-stored keys. This isn't very clear as to exactly what the concern is or how an attacker would be able to thwart the system if they had write access to it. An attacker with write access could possibly attempt to replace the existing keys, but with the key wrapping that we're using, that should result in just a decryption failure (unless, of course, the attacker has the actual KEK that was used, but that's not terribly interesting to worry about since then they could just go access the files directly). Until and unless we solve the issue around storing the GCM tags for each page, we will have the risk that an attacker could modify a page in a manner that we wouldn't detect. This is the biggest concern that I have currently with the existing TDE patch sets. There's two options that I see around how to address that issue- either we arrange to create space in the page for the tag, such as by making the 'special' space on a page a bit bigger and making sure that everything understands that, or we'll need to add another fork in which we store the tags (and possibly other TDE/encryption related information). If we go with a fork then it should be possible to do WAL streaming from an unencrypted cluster to an encrypted one, which would be pretty neat, but it means another fork and another page that has to be read/written every time we modify a page. Getting some input into the trade-offs here would be really helpful. I don't think it's really reasonable to go out with TDE without having figured out the integrity side. Certainly, when I review things like NIST 800-53, it's very clear that the requirement is for both confidentiality *and* integrity. > This also does not protect from users who have read access to system > memory. This also does not detect or protect against users with write > access from removing or modifying database files. The last seems a bit obvious, but the first sentence quoted above is important to make clear. I might even say: All of the pages in memory and all of the keys which are used for the encryption and decryption are stored in the clear in memory and therefore an attacker who is able to read the memory allocated by PostgreSQL would be able to decrypt the enitre cluster. > Given what I said above, is the value of this feature for compliance, or > for actual additional security? If it just compliance, are we willing > to add all of this code just for that, even if it has limited security > value? We should answer this question now, and if we don't want it, > let's document that so users know and can consider alternatives. The feature is for both compliance and additional security. While there are other ways to achieve data-at-rest encryption, they are not always available, for a variety of reasons. > FYI, I don't think we can detect or protect against writers modifying > the data files --- even if we could do it on a block level, they could > remove trailing pages (might cause index lookup failures) or copy > pages from other tables at the same offset. Therefore, I think we can > only offer viewing security, not modification detection/prevention. Protecting against file modification isn't about finding some way to make it so that an attacker isn't able to modify the files, it's about detecting the case where an unauthorized modification has happened. Clearly if an attacker has gained write access to the system then we can't protect against the attacker using the access they've gained, but we can in most cases detect it and that's what we should be doing. It would be really unfortunate to end up with a solution here that only provides confidentiality and doesn't address integrity at all, and I don't really think it's *that* hard to do both. That said, if we must work at this in pieces and we can get agreement to handle confidentiality initially and then add integrity later, that might be reasonable. > > I appreciate the recent discussion and reviews of the KMS in particular, > > and of the patches which have been sent enabling TDE based on the KMS > > patches. Having them be relatively independent seems to be an ongoing > > I was thinking some more and I have received productive feedback from at > least eight people on the key management patch, which is very good. Agreed. > > concern and perhaps we should figure out a way to more clearly put them > > together. That is- the KMS patches have been posted on one thread, and > > TDE PoC patches which use the KMS patches have been on another thread, > > leading some to not realize that there's already been TDE PoC work done > > based on the KMS patches. Seems like it might make sense to get one > > patch set which goes all the way from the KMS and includes the TDE PoC, > > even if they don't all go in at once. > > Uh, it is worse than that. Some people saw comments about the TDE PoC > patch (e.g., buffer pins) and thought they were related to the KMS > patch, so they thought the KMS patch wasn't ready. Now, I am not saying > the KMS patch is ready, but comments on the TDE PoC patch are unrelated > to the KMS patch being ready. I do agree with that and that it can lend to some confusion. I'm not sure what the right solution there is except to continue to try and work with those who are interested and to clarify the separation. > I think the TDE PoC was a big positive because it showed the KMS patch > being used for the actual use-case we are planning, so it was truly a > proof-of-concept. Agreed. > > I'm happy to go look over the KMS patches again if that'd be helpful and > > to comment on the TDE PoC. I can also spend some time trying to improve > > I think we eventually need a full review of the TDE PoC, combined with > the Cybertec patch, and the wiki, to get them all aligned. However, as > I said already, let's get the KMS patch approved, even if we don't apply > it now, so we know we are on an approved foundation. While the Cybertec patch is interesting, I'd really like to see something that's a bit less invasive when it comes to how temporary files are handled. In particular, I think it'd be possible to have an API that's very similar to the existing one for serial reading and writing of files which wouldn't require nearly as many changes to things like reorderbuffer.c. I also believe there's some things we could do to avoid having to modify quite as many places when it comes to LSN assignment, so the base patch isn't as big. > > on each, as I've already done. A few of the larger concerns that I have > > revolve around how to store integrity information (I've tried to find a > > way to make room for such information in our existing page layout and, > > perhaps unsuprisingly, it's far from trivial to do so in a way that will > > avoid breaking the existing page layout, or where the same set of > > binaries could work on both unencrypted pages and encrypted pages with > > integrity validation information, and that's a problem that we really > > As stated above, I think we only need a byte or two for the hint bit > counter (used in the IV), as I don't think the GCM verification bytes > will add any additional security, and I bet we can find a byte or two. > We do need a separate discussion on this, either here or privately. I have to disagree here- the GCM tag adds integrity which is really quite important. Happy to chat about it independently, of course. > > should consider trying to solve...), and how to automate key rotation > > (one of the nice things about Bruce's approach to storing the keys is > > that we're leveraging the filesystem as an index- it's easy to see how > > we might extend the key-per-file approach to allow us to, say, have a > > different key for every 32GB of LSN, but if we tried to put all of the > > keys into a single file then we'd have to figure out an indexing > > solution for it which would allow us to find the key we need to decrypt > > a given page...). I tend to agree with Bruce that we need to take > > Yeah, yuck on that plan. I was very happy how the per-version directory > worked with scripts that needed to store matching state. I don't know that it's going to ultimately be the best answer, as we're essentially using the filesystem as an index, as I mentioned above, but, yeah, trying to do all of that ourselves during WAL replay doesn't seem like it would be fun to try and figure out. This is an area that I would think we'd be able to improve on in the future too- if someone wants to spend the time coming up with a single-file format that is indexed in some manner and still provides the guarantees that we need, we could very likely teach pg_upgrade how to handle that and the data set we're talking about here is quite small, even if we've got a bunch of key rotation that's happened. > > these things in steps, getting each piece implemented as we go. Maybe > > we can do that in a separate repo for a time and then bring it all > > together, as a few on this thread have voiced, but there's no doubt that > > this is a large project and it's hard to see how we could possibly > > commit all of it at once. > > I was putting stuff in a git tree/URL; you can see it here: > > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff > https://github.com/postgres/postgres/compare/master...bmomjian:key.patch > https://github.com/postgres/postgres/compare/master...bmomjian:key > > However, people wanted persistent patches attached, so I started doing that. > Attached is the current patch set. Doing both seems likely to be the best option and hopefully will help everyone see the complete picture. Thanks, Stephen
Attachment
pgsql-hackers by date: