Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) |
Date | |
Msg-id | 20190809150147.dbd5h5yxh3xjpkwt@development |
In response to | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) |
List | pgsql-hackers |
On Thu, Aug 08, 2019 at 06:31:42PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Thu, Aug 08, 2019 at 03:07:59PM -0400, Stephen Frost wrote:
>> >* Bruce Momjian (bruce@momjian.us) wrote:
>> >>On Tue, Jul 9, 2019 at 11:09:01AM -0400, Bruce Momjian wrote:
>> >>> On Tue, Jul 9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
>> >>> > * Bruce Momjian (bruce@momjian.us) wrote:
>> >>> > I agree that all of that isn't necessary for an initial implementation,
>> >>> > I was rather trying to lay out how we could improve on this in the
>> >>> > future and why having the keying done at a tablespace level makes sense
>> >>> > initially because we can then potentially move forward with further
>> >>> > segregation to improve the situation. I do believe it's also useful in
>> >>> > its own right, to be clear, just not as nice since a compromised backend
>> >>> > could still get access to data in shared buffers that it really
>> >>> > shouldn't be able to, even broadly, see.
>> >>>
>> >>> I think TDE is a feature of questionable value at best and the idea that
>> >>> we would fundamentally change the internals of Postgres to add more
>> >>> features to it seems very unlikely. I realize we have to discuss it so
>> >>> we don't block reasonable future feature development.
>> >>
>> >>I have a new crazy idea. I know we concluded that allowing multiple
>> >>independent keys, e.g., per user, per table, didn't make sense since
>> >>they have to be unlocked all the time, e.g., for crash recovery and
>> >>vacuum freeze.
>> >
>> >I'm a bit confused as I never agreed that made any sense and I continue
>> >to feel that it doesn't make sense to have one key for everything.
>> >
>> >Crash recovery doesn't happen "all the time" and neither does vacuum
>> >freeze, and autovacuum processes are independent of individual client
>> >backends- we don't need to (and shouldn't) have the keys in shared
>> >memory.
>>
>> Don't people do physical replication / HA pretty much all the time?
>
>Strictly speaking, that isn't actually crash recovery, it's physical
>replication / HA, and while those are certainly nice to have it's no
>guarantee that they're required or that you'd want to have the same keys
>for them- conceptually, at least, you could have WAL with one key that
>both sides know and then different keys for the actual data files, if we
>go with the approach where the WAL is encrypted with one key and then
>otherwise is plaintext.
>

Uh? IMHO not breaking physical replication / HA should be pretty much
required for any new feature, unless it's somehow obviously clear that
it's not needed for that particular feature. I very much doubt we can
make that conclusion for encrypted instances (at least I don't see why
it would be the case in general).

One reason is that those features are also used for backups, which I
hope we both agree is not an optional feature. Maybe it's possible to
modify pg_basebackup to re-encrypt all the data, but to do that it
clearly needs to know all encryption keys (although not necessarily on
the same side).

>> >>However, that assumes that all heap/index pages are encrypted, and all
>> >>of WAL. What if we encrypted only the user-data part of the page, i.e.,
>> >>tuple data. We left xmin/xmax unencrypted, and only stored the
>> >>encrypted part of that data in WAL, and didn't encrypt any more of WAL.
>> >
>> >This is pretty much what Alvaro was suggesting a while ago, isn't it..?
>> >Have just the user data be encrypted in the table and in the WAL stream.
>>
>> It's also moving us much closer to pgcrypto-style encryption ...
>
>Yes, it is, and there's good parts and bad parts to that, to be sure.
>
>> >>That might allow crash recovery and the freeze part of VACUUM FREEZE to
>> >>work. (I don't think we could vacuum since we couldn't read the index
>> >>pages to find the matching rows since the index values would be encrypted
>> >>too.
>> >>We might be able to not encrypt the tid in the index tuple.)
>> >
>> >Why do we need the indexed values to vacuum the index..? We don't
>> >today, as I recall. We would need the tids though, yes.
>>
>> Well, we also do collect statistics on the data, for example. But even
>> if we assume we wouldn't do that for encrypted indexes (which seems like
>> a pretty bad idea to me), you'd probably end up leaking information
>> about ordering of the values. Which is generally a pretty serious
>> information leak, AFAICS.
>
>I agree entirely that order information would be bad to leak- but this
>is all new ground here and we haven't actually sorted out what such a
>partially encrypted btree would look like. We don't actually have to
>have the down-links in the tree be unencrypted to allow vacuuming of
>leaf pages, after all.
>

Well, I'm not all that familiar with the btree code, but I still think
you can deduce an awful lot of information from having the leaf pages
alone (not sure if we could deduce a total order, but presumably yes).

>> >>Is this something to consider in version one of this feature? Probably
>> >>not, but later? Never? Would the information leakage be too great,
>> >>particularly from indexes?
>> >
>> >What would be leaking from the indexes..? That an encrypted blob in the
>> >index pointed to a given tid? Wouldn't someone be able to see that same
>> >information by looking directly at the relation too?
>>
>> Ordering of values, for example. Depending on how exactly the data is
>> encrypted we might also be leaking information about which values are
>> equal, etc. It also seems quite a bit more expensive to use such an index.
>
>Using an encrypted index isn't going to be free. It's not clear that
>this would be much more expensive than if the entire index is encrypted,
>or that people would actually be unhappy if there was such an additional
>expense if it meant that they could have vacuum run without the keys.
>

With whole-page encryption, the page would be decrypted when loading it
into shared buffers, and then accessed without encryption/decryption (at
least that's how it was proposed initially).

I assume we wouldn't do that when only encrypting the index keys
(because that would mean anything that accesses the index through shared
buffers has to do the decryption, including autovacuum et al). Which
means you have to do decryption on each index access (which you
previously did not). IMHO that's a pretty clear and significant
additional overhead.

I know there were proposals to keep it encrypted in shared buffers, but
I'm not sure that's what we'll end up doing (I have not followed the
recent discussion all that closely, though).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
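
To make the overhead argument above concrete, here is a minimal sketch that only counts decryption calls under the two approaches. It is a toy model, not PostgreSQL code: the names DecryptCounter, whole_page_scheme, key_only_scheme and compare are hypothetical, no real cipher is used, the cache never evicts pages, and a real index lookup would likely decrypt several keys per page visit rather than one.

```python
from dataclasses import dataclass


@dataclass
class DecryptCounter:
    """Stand-in for a cipher (hypothetical): it only counts decrypt calls."""
    calls: int = 0

    def decrypt(self, blob):
        self.calls += 1
        return blob  # no real cryptography; only the call count matters


def whole_page_scheme(accesses, counter):
    """Decrypt a page once, when it is first loaded into the cache."""
    shared_buffers = {}
    for page_no in accesses:
        if page_no not in shared_buffers:
            # First touch: load and decrypt, then keep the plaintext copy.
            shared_buffers[page_no] = counter.decrypt(f"page-{page_no}")
        _ = shared_buffers[page_no]  # later reads need no decryption


def key_only_scheme(accesses, counter):
    """Keep index keys encrypted in the cache; decrypt on every access."""
    shared_buffers = {}
    for page_no in accesses:
        if page_no not in shared_buffers:
            # First touch: load, but leave the contents encrypted.
            shared_buffers[page_no] = f"page-{page_no}"
        _ = counter.decrypt(shared_buffers[page_no])  # paid on each access


def compare(accesses):
    """Return (whole-page decrypt count, key-only decrypt count)."""
    whole, keys = DecryptCounter(), DecryptCounter()
    whole_page_scheme(accesses, whole)
    key_only_scheme(accesses, keys)
    return whole.calls, keys.calls


print(compare([1, 2, 1, 1, 3, 2]))  # prints (3, 6)
```

In this model the whole-page count grows with the number of distinct pages touched, while the key-only count grows with the number of accesses, which is the difference the paragraph above describes.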