Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) |
Date | |
Msg-id | CAD21AoBPkpS5EHHVLQCBucGO=g6sRnGKx4k1H-12GxNWyNGDrA@mail.gmail.com |
In response to | Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Fri, Aug 16, 2019 at 10:01 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> > > On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > > > I can work on it right away but don't know where to start.
> > > >
> > > > I think the big open question is whether there will be acceptance of an
> > > > all-cluster encryption feature. I guess if no one objects, we can move
> > > > forward.
> > >
> > > I still feel that we need to have per table/tablespace keys although
> > > it might not be the first implementation. I think the safeness of both
> > > table/tablespace level and cluster level would be almost the same but
> > > the former would have an advantage in terms of operation and
> > > performance.
> >
> > I assume you are talking about my option #1. I can see if you only need
> > a few tables encrypted, e.g., credit card numbers, it can be excessive
> > to encrypt the entire cluster. (I think you would need to encrypt
> > pg_statistic too.)
>
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.

I think we also need to encrypt other system catalogs. For instance,
pg_proc might also have sensitive data in its prosrc column. So I think
it's better to encrypt all system catalogs rather than picking out a few,
since it would not be a big overhead. Since system catalogs are created
during CREATE DATABASE by copying files, tablespace-level or
database-level encryption would be well suited to system catalog
encryption.

>
> > The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> > overhead might be minimal compared to the WAL encryption overhead. The
> > better solution would be to add a flag to WAL records to indicate
> > encrypted entries, but you would then leak when an encryption change
> > happens and WAL record length. (FYI, numeric values have different
> > lengths, as do character strings.) I assume we would still use a single
> > key for all tables/indexes, and one for WAL, plus key rotation
> > requirements.
>
> I don't think the fact that a change was done to an encrypted blob is an
> actual 'leak'- anyone can tell that by looking at the encrypted
> data before and after. Further, the actual change would be encrypted,
> right? Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well known source discussing that
> concern..?
>
> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is. As I stated earlier, the overhead of
> > determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
>
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a valid
> or useful feature to implement. The biggest challenge here is really
> around key management and I agree that's difficult but it's also really
> important and something that we need to be thinking about- and thinking
> about how to work with multiple keys and not just one. Building in an
> assumption that we will only ever work with one key would make this
> capability nothing more than DBA-managed filesystem-level encryption
> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen. It's also not clear to me
> why we aren't building in *some* mechanism to work with multiple keys
> from the start as part of the initial design.
>
> > I can see why you would think that encrypting less would be easier than
> > encrypting more, but security boundaries are hard to construct, and
> > anything that requires a user API, even more so.
>
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc. I don't think anyone thinks that'll be "easier".
>
> > > > > At least it should be clear how [2] will retrieve the master key because [1]
> > > > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > > > mentioned in [3] seems viable, although I think [1] uses approach which is
> > > > > more convenient if the passphrase should be read from console.)
> > >
> > > I think that we can also provide a way to pass encryption key directly
> > > to postmaster rather than using passphrase. Since it's common that
> > > user stores keys in KMS it's useful if we can do that.
> >
> > Why would it not be simpler to have the cluster_passphrase_command run
> > whatever command-line program it wants? If you don't want to use a
> > shell command, create an executable and call that.
>
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as a
> passphrase command, which I believe is what was being suggested here.
>
> > > > > Rotation of
> > > > > the master key is another thing that both versions of the feature should do in
> > > > > the same way. And of course, the frontend applications need consistent approach
> > > > > too.
> > > >
> > > > I don't see the value of an external library for key storage.
> > >
> > > I think that big benefit is that PostgreSQL can seamlessly work with
> > > external services such as KMS. For instance, when key rotation,
> > > PostgreSQL can register new key to KMS and use it, and it can remove
> > > keys when it no longer necessary. That is, it can enable PostgreSQL to
> > > not only getting key from KMS but also registering and
> > > removing keys. And we also can decrypt MDEK in KMS instead of doing in
> > > PostgreSQL which is more safety. In addition, once someone create the
> > > plugin library of an external services individual projects don't need
> > > to create that.
> >
> > I think the big win for an external library is when you don't want the
> > overhead of calling an external program. For example, we certainly
> > would not want to call an external program while processing a query. Do
> > we have any such requirements for encryption, especially since we only
> > are going to allow offline mode for encryption mode changes and key
> > rotation in the first version?
>
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a
> few people who are interested in making progress here and a small number
> of others who are pushing back and putting up limitations that "the
> first version can't have X" or "the first version can't have Y".
>
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we don't
> bake in assumptions that will later become very difficult for us to work
> ourselves out of (especially when it comes to user interface and things
> like GUCs...), but where we decide to draw a line shouldn't be based on
> assumptions about what's going to be difficult and what isn't- let's let
> those who want to work on this capability work on it and as we see the
> progress, if there's issues which come up with a specific area that seem
> likely to prove difficult to include, then we can consider backing away
> from that while keeping it in mind while doing further development.

I totally agree. That's why I pointed out the difficulty of supporting
finer-granularity encryption after cluster-wide encryption has been
implemented, and why I was worried about backward compatibility. I think
we need to think about implementing what users want while keeping it as
simple as possible, even if the feature itself is complex.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
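
For readers following the cluster_passphrase_command discussion above, here is a minimal sketch of that approach: run an external command, read the passphrase from its standard output, and derive a key-encryption key from it. It is not taken from any patch in this thread. The function names, the use of PBKDF2-HMAC-SHA256, the salt handling, and the iteration count are all assumptions chosen for illustration.

```python
import hashlib
import subprocess
import sys
from typing import Sequence


def fetch_passphrase(command: Sequence[str]) -> bytes:
    """Run a passphrase command and return its stdout without the trailing newline.

    Hypothetical helper: the thread only proposes a GUC named
    cluster_passphrase_command; its exact semantics are assumed here.
    """
    result = subprocess.run(list(command), capture_output=True, check=True)
    passphrase = result.stdout.rstrip(b"\r\n")
    if not passphrase:
        raise ValueError("passphrase command produced no output")
    return passphrase


def derive_kek(passphrase: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key-encryption key from the passphrase.

    PBKDF2-HMAC-SHA256 is an assumption for this sketch, not something
    the thread specifies.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


if __name__ == "__main__":
    # Stand-in for a real passphrase program or a KMS client wrapper.
    command = [sys.executable, "-c", "print('example passphrase')"]
    salt = bytes(16)  # fixed salt for the demo only; a real system would store a random one
    kek = derive_kek(fetch_passphrase(command), salt)
    print(len(kek))
```

Because the command is just an executable, the same hook could wrap a KMS client instead of prompting for a passphrase, which is the trade-off debated above: simple to implement, but it costs a process launch each time a key is needed.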