Re: Proposed patch for key managment - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Proposed patch for key managment
Date
Msg-id 20201218213624.GG16415@tamriel.snowman.net
Whole thread Raw
In response to Re: Proposed patch for key managment  (Alastair Turner <minion@decodable.me>)
Responses Re: Proposed patch for key managment
Re: Proposed patch for key managment
List pgsql-hackers
Greetings Alastair,

* Alastair Turner (minion@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 22:43, Stephen Frost <sfrost@snowman.net> wrote:
> > If I'm following, you're suggesting something like:
> >
> > cluster_passphrase_command = 'aws get %q'
> >
> > and then '%q' gets replaced with "Please provide the WAL DEK: ", or
> > something like that?  Prompting the user for each key?  Not sure how
> > well that's work if want to automate everything though.
> >
> > If you have specific ideas, it'd really be helpful to give examples of
> > what you're thinking.
>
> I can think of three specific ideas off the top of my head: the
> passphrase key wrapper, the secret store and the cloud/HW KMS.
>
> Since the examples expand the purpose of cluster_passphrase_command,
> let's call it cluster_key_challenge_command in the examples.

These ideas don't seem too bad but I'd think we'd pass which key we want
on the command-line using a %i or something like that, rather than using
stdin, unless there's some reason that would be an issue..?  Certainly
the CLI utilities I've seen tend to expect the name of the secret that
you're asking for to be passed on the command line.

> Starting with the passphrase key wrapper, since it's what's in place now.
>
>  - cluster_key_challenge_command = 'password_key_wrapper %q'

I do tend to like this idea of having what
cluster_key_challenge_command, or whatever we call it, expects is an
actual key and have the command that is run be a separate command that
takes the passphrase and runs the KDF (key derivation function) on it,
when a passphrase is what the user wishes to use.

That generally makes more sense to me than having the key derivation
effort built into the backend which I have a hard time seeing any
particular reason for, as long we're already calling out to some
external utility of some kind to get the key.

>  - Supplied on stdin to the process is the wrapped DEK (either a
> combined key for db files and WAL or one for each, on separate calls)
>  - %q is "Please provide WAL key wrapper password" or just "...provide
> key wrapper password"
>  - Returned on stdout is the unwrapped DEK

> For a secret store
>
>  - cluster_key_challenge_command = 'vault_key_fetch'
>  - Supplied on stdin to the process is the secret's identifier (pg_dek_xxUUIDxx)
>  - Returned on stdout is the DEK, which may never have been wrapped,
> depending on implementation
>  - Access control to the secret store is managed through the challenge
> command's own config, certs, HBA, ...

> For an HSM or cloud KMS
>
>  - cluster_key_challenge_command = 'gcp_kms_key_fetch'
>  - Supplied on stdin to the process is the the wrapped DEK (individual
> or combined)
>  - Returned on stdout is the DEK (individual or combined)
>  - Access control to the kms is managed through the challenge
> command's own config, certs, HBA, ...
>
> The secret store and HSM/KMS options may be promptless, and therefore
> amenable to automation, depending on the setup of those clients.

We can't really have what is passed on stdin, or not, be different
without having some other option say which it is and that seems like
it'd be making it overly complicated, and I get why Bruce would rather
not make this too complicated.

With the thought of trying to keep it a reasonably simple interface, I
had a thought along these lines:

- Separate command that runs the KDF, this is simple enough as it's
  really just a hash, and it returns the key on stdout.
- initdb time option that says if we're going to have PG manage the
  sub-keys, or not.
- cluster_key_command defined as "a command that is passed the ID of
  the key, or keys, required for the cluster to start"

initdb --pg-subkeys
  - Calls cluster_key_command once with "$PG_SYSTEM_ID-main" or similar
    and expects the main key to be provided on stdout.  Everything else
    is then more-or-less as is today: PG generates DEK sub-keys for data
    and WAL and then encrypts them with the main key and stores them.

As long as the enveloped keys file exists on the filesystem, when PG
starts, it'll call the cluster_key_command and will expect the 'main'
key to be provided and it'll then decrypt and verify the DEK sub-keys,
very similar to today.

In this scenario, cluster_key_command might be set to call a command
which accepts a passphrase and runs a KDF on it, or it might be set to
call out to an external vaulting system or to a Yubikey, etc.

initdb --no-pg-subkeys
  - Calls cluster_key_command for each of the sub-keys that "pg-subkeys"
    would normally generate itself, passing "$PG_SYSTEM_ID-keyid" for
    each (eg: $PG_SYSTEM_ID-data, $PG_SYSTEM_ID-wal), and getting back
    the keys on stdout to use.

When PG starts, it sees that the enveloped keys file doesn't exist and
does the same as initdb did- calls cluster_key_command multiple times to
get the keys which are needed.  We'd want to make sure to have a way
early on that checks that the data DEK provided actually decrypts the
cluster, and bail out otherwise, before actually writing any data with
that key.  I'll note though that this approach would actually allow you
to change the WAL key, if you wanted to, though, which could certainly
be nice (naturally there would be various caveats about doing so and
that replicas would have to also be switched to the new key, etc, but
that all seems reasonably solvable).  Having a stand-alone utility that
could do that for the --pg-subkeys case would be useful too (and just
generally decrypt it for viewing/backup/replacement/etc).

Down the line this might even allow us to do an online re-keying, at
least once the challenges around enabling online data checksums are
sorted out, but I don't think that's something to worry about today.
Still, it seems like this potentially provides a pretty clean interface
for that to happen eventually.

> > > > ...That
> > > > avoids the complication of having to have an API that somehow provides
> > > > more than one key, while also using the primary DEK key as-is from the
> > > > key management service and the KEK never being seen on the system where
> > > > PG is running.
> > >
> > > Other than calling out (and therefore potentially prompting) twice,
> > > what do you see as the complications of having more than one key?
> >
> > Mainly just a concern about the API and about what happens if, say, we
> > decide that we want to have another sub-key, for example.  If we're
> > handling them then there's really no issue- we just add another key in,
> > but if that's not happening then it's going to mean changes for
> > administrators.  If there's a good justification for it, then perhaps
> > that's alright, but hand waving at what the issue is doesn't really
> > help.
>
> Sorry, I wasn't trying to hand wave it away. For automated
> interactions, like big iron HSMs or cloud KSMs, the difference between
> 2 operations and 10 operations to start a DB server won't matter. For
> an admin/operator having to type 10 passwords or get 10 clean
> thumbprint scans, it would be horrible. My underlying question was, is
> that toil the only problem to be solved, or is there another reason to
> get into key combination, key splitting and the related issues which
> are less documented and less well understood than key wrapping.

I appreciate that you're not trying to hand wave it away but this also
didn't really answer the actual question- what's the advantage to having
all of the keys provided by something external rather than having that
external thing provide just one 'main' key, which we then use to decrypt
our enveloped keys that we actually use?  I can think of some possible
advantages but I'd really like to know what advantages you're seeing in
doing that.

> > > I'd describe what the current patch does as using YubiKey to encrypt
> > > and then decrypt an intermediate secret, which is then used to
> > > generate/derive a KEK, which is then used to unwrap the stored,
> > > wrapped DEK.
> >
> > This seems like a crux of at least one concern- that the current patch
> > is deriving the actual KEK from the passphrase and not just taking the
> > provided value (at least, that's what it looks like from a *very* quick
> > look into it), and that's the part that I was suggesting that we might
> > add an option for- to indicate if the cluster passphrase command is
> > actually returning a passphrase which should be used to derive a key, or
> > if it's returning a key directly to be used.  That does seem to be a
> > material difference to me and one which we should care about.
>
> Yes. Caring about that is the reason I've been making a nuisance of myself.

Right, I think we can agree on this aspect and I've chatted with Bruce
about it some also.  When a direct key can be provided, we should use
that, and not run it through a KDF.  This seems like a particularly
important case that we should care about even at this early stage.

> > > > There's an entirely independent discussion to be had about figuring out
> > > > a way to actually off-load *entirely* the encryption/decryption of data
> > > > to the linux crypto system or hardware devices, but unless someone can
> > > > step up and write code for those today, I'd suggest that we table that
> > > > effort until after we get this initial capability of having TDE with PG
> > > > doing all of the encryption/decryption.
> > >
> > > I'm hopeful that the work on abstracting OpenSSL, nsstls, etc is going
> > > to help in this direct.
> >
> > Yes, I agree with that general idea but it's a 'next step' kind of
> > thing, not something we need to try and bake in today.
>
> Agreed.

Great, glad we can agree to hold off on this until a future point- being
able to agree on how we box in this particular capability is important
or we may never end up releasing it.  I know that's one of the concerns
that others on this thread have and it's an important one.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Першин Юрий Петрович
Date:
Subject: libpq @windows : leaked singlethread_lock makes AppVerifier unhappy
Next
From: David CARLIER
Date:
Subject: [PATCH] Implements SPIN_LOCK on ARM