Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Date
Msg-id CAD21AoBPkpS5EHHVLQCBucGO=g6sRnGKx4k1H-12GxNWyNGDrA@mail.gmail.com
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Fri, Aug 16, 2019 at 10:01 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> > > On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > > > I can work on it right away but don't know where to start.
> > > >
> > > > I think the big open question is whether there will be acceptance of an
> > > > all-cluster encryption feature.  I guess if no one objects, we can move
> > > > forward.
> > >
> > > I still feel that we need to have per table/tablespace keys although
> > > it might not be the first implementation. I think the safeness of both
> > > table/tablespace level and cluster level would be almost the same but
> > > the former would have an advantage in terms of operation and
> > > performance.
> >
> > I assume you are talking about my option #1.  I can see if you only need
> > a few tables encrypted, e.g., credit card numbers, it can be excessive
> > to encrypt the entire cluster.  (I think you would need to encrypt
> > pg_statistic too.)
>
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.

I think we also need to encrypt other system catalogs. For instance,
pg_proc can also hold sensitive data, in its prosrc column. So I think
it's better to encrypt all system catalogs rather than picking out
particular ones, since doing so would not add much overhead. And since
system catalogs are created during CREATE DATABASE by copying files,
tablespace-level or database-level encryption would fit well with
system catalog encryption.
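As a toy illustration of why database-level keys mesh well with catalogs being copied at CREATE DATABASE, here is a minimal Python sketch (not PostgreSQL code; the key-derivation scheme and all names are hypothetical) that derives one key per database from a single master data encryption key:

```python
import hashlib
import hmac
import os

def derive_db_key(mdek: bytes, db_oid: int) -> bytes:
    """Derive a per-database key from the master data encryption key
    (MDEK) via HMAC-SHA256.  Purely illustrative: PostgreSQL would use
    its own key hierarchy, not this exact construction."""
    return hmac.new(mdek, b"database:%d" % db_oid, hashlib.sha256).digest()

mdek = os.urandom(32)                      # stand-in for the cluster master key
template_key = derive_db_key(mdek, 1)      # e.g. the template database's key
new_db_key = derive_db_key(mdek, 16384)    # key for a freshly created database

# Distinct databases get distinct keys, so catalog files copied at
# CREATE DATABASE can be re-encrypted under the new database's key.
assert template_key != new_db_key
```

With such a hierarchy, rotating one database's key would not require touching the others.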

>
> > The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> > overhead might be minimal compared to the WAL encryption overhead.  The
> > better solution would be to add a flag to WAL records to indicate
> > encrypted entries, but you would then leak when an encryption change
> > happens and WAL record length.  (FYI, numeric values have different
> > lengths, as do character strings.)  I assume we would still use a single
> > key for all tables/indexes, and one for WAL, plus key rotation
> > requirements.
>
> I don't think the fact that a change was done to an encrypted blob is an
> actual 'leak'- anyone can tell that by looking at the encrypted
> data before and after.  Further, the actual change would be encrypted,
> right?  Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well known source discussing that
> concern..?
>
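To make the length-leak point above concrete: with a length-preserving cipher, ciphertext length equals plaintext length, so encrypted WAL records would still reveal record sizes. A toy Python sketch, with an XOR keystream standing in for a real CTR/stream-mode cipher (not production crypto):

```python
import hashlib
import os

def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Length-preserving toy cipher: XOR with a SHA-256-based keystream.
    Stands in for CTR/stream modes; do not use for real encryption."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        keystream.extend(block)
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

key, nonce = os.urandom(32), os.urandom(16)
short_record = toy_stream_encrypt(key, nonce, b"numeric:1")
long_record = toy_stream_encrypt(key, nonce, b"numeric:123456789012345")

# Contents are hidden, but record lengths (and hence value widths) are not.
assert len(short_record) == len(b"numeric:1")
assert len(long_record) == len(b"numeric:123456789012345")
```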
> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is.  As I stated earlier, the overhead of
> > determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
>
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a valid
> or useful feature to implement.  The biggest challenge here is really
> around key management and I agree that's difficult but it's also really
> important and something that we need to be thinking about- and thinking
> about how to work with multiple keys and not just one.  Building in an
> assumption that we will only ever work with one key would make this
> capability nothing more than DBA-managed filesystem-level encryption
> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen.  It's also not clear to me
> why we aren't building in *some* mechanism to work with multiple keys
> from the start as part of the initial design.
>
> > I can see why you would think that encrypting less would be easier than
> > encrypting more, but security boundaries are hard to construct, and
> > anything that requires a user API, even more so.
>
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc.  I don't think anyone thinks that'll be "easier".
>
> > > > > At least it should be clear how [2] will retrieve the master key because [1]
> > > > > should not do it in a different way. (The GUC cluster_passphrase_command
> > > > > mentioned in [3] seems viable, although I think [1] uses an approach which is
> > > > > more convenient if the passphrase should be read from the console.)
> > >
> > > I think that we can also provide a way to pass the encryption key
> > > directly to the postmaster rather than using a passphrase. Since it's
> > > common for users to store keys in a KMS, it would be useful if we
> > > could do that.
> >
> > Why would it not be simpler to have the cluster_passphrase_command run
> > whatever command-line program it wants?  If you don't want to use a
> > shell command, create an executable and call that.
>
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as a
> passphrase command, which I believe is what was being suggested here.
>
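For what it's worth, the "run whatever command-line program it wants" approach is very simple to consume. A hedged Python sketch of how a server might read key material from a cluster_passphrase_command-style GUC (the setting name and the command shown are only illustrations; a KMS integration would plug its CLI client in here):

```python
import subprocess

def read_cluster_key(passphrase_command: str) -> bytes:
    """Run the configured command and take its stdout as the passphrase
    or raw key material.  Illustrative only: a real setting might invoke
    a KMS client or prompt an operator instead of a fixed echo."""
    result = subprocess.run(passphrase_command, shell=True, check=True,
                            capture_output=True)
    return result.stdout.strip()

key = read_cluster_key("echo 0123456789abcdef")
assert key == b"0123456789abcdef"
```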
> > > > > Rotation of
> > > > > the master key is another thing that both versions of the feature should do in
> > > > > the same way. And of course, the frontend applications need a consistent
> > > > > approach too.
> > > >
> > > > I don't see the value of an external library for key storage.
> > >
> > > I think the big benefit is that PostgreSQL can seamlessly work with
> > > external services such as a KMS. For instance, during key rotation,
> > > PostgreSQL can register a new key with the KMS and use it, and it can
> > > remove keys when they are no longer necessary. That is, it enables
> > > PostgreSQL not only to get keys from the KMS but also to register and
> > > remove them. And we can also decrypt the MDEK in the KMS instead of
> > > doing it in PostgreSQL, which is safer. In addition, once someone
> > > creates a plugin library for an external service, individual projects
> > > don't need to create their own.
> >
> > I think the big win for an external library is when you don't want the
> > overhead of calling an external program.  For example, we certainly
> > would not want to call an external program while processing a query.  Do
> > we have any such requirements for encryption, especially since we only
> > are going to allow offline mode for encryption mode changes and key
> > rotation in the first version?
>
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a
> few people who are interested in making progress here and a small number
> of others who are pushing back and putting up limitations that "the
> first version can't have X" or "the first version can't have Y".
>
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we don't
> bake in assumptions that will later become very difficult for us to work
> ourselves out of (especially when it comes to user interface and things
> like GUCs...), but where we decide to draw a line shouldn't be based on
> assumptions about what's going to be difficult and what isn't- let's let
> those who want to work on this capability work on it and as we see the
> progress, if there's issues which come up with a specific area that seem
> likely to prove difficult to include, then we can consider backing away
> from that while keeping it in mind while doing further development.

I totally agree. That's why I pointed out the difficulty of supporting
finer-grained encryption once cluster-wide encryption has been
released, and why I worry about backward compatibility. I think we
need to implement what users actually want while keeping things as
simple as possible, even if the problem itself is complex.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


