Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id 20190708224330.GH29202@tamriel.snowman.net
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > Well, if it were a necessary feature, I assume TLS 1.3 would have found
> > > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > > with a known weakness.
> > > >
> > > > As discussed below- this is about moving goalposts, and that's, in part
> > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > >
> > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > the server at the same time.  The open question is whether encrypting
> > > different data with different keys and different unlock controls is
> > > possible or useful.
> >
> > I'm not sure if there's really a question about if it's *possible*?  As
> > for if it's useful, I agree there's some debate.
>
> Right, it is easily possible to keep all keys unlocked, but the value is
> minimal, and the complexity will have a cost, which is my point.

Having them all unlocked but only accessible to certain privileged
processes is very different from having them unlocked and available to
every backend process.

> > > > amount of data you transmit over a given TLS connection increases
> > > > though, the risk increases and it would be better to re-key.  How much
> > > > better?  That depends a great deal on if someone is trying to mount an
> > > > attack or not.
> > >
> > > Yep, we need to allow rekey.
> >
> > Supporting a way to rekey is definitely a good idea.
>
> It is a requirement, I think.  We might have a problem tracking exactly
> which key _version_ each table (or 8k block) or WAL file uses.  :-(
> Ideally we would allow only two active keys, and somehow mark each page
> as using the odd or even key at a given time, or something strange.
> (Yeah, hand waving here.)

Well, that wouldn't be ideal, since it would limit us to some small
number of GBs of data written, based on the earlier discussion, right?

I'm not sure that I can see through to a system where we are rewriting
tables that are out on disk every time we hit 60GB of data written.

Or maybe I'm misunderstanding what you're suggesting here..?
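For what it's worth, the odd/even idea can at least be sketched.
Everything below is hypothetical bookkeeping for illustration only, not
PostgreSQL code; the names and the choice of where the bit lives are
invented:

```python
# Hypothetical sketch of the "odd/even key" idea from the discussion
# above: each 8k page records, in a single bit, which of two active
# keys encrypted it, so a rolling rekey can tell old pages from new.
BLCKSZ = 8192  # PostgreSQL's default block size

def set_key_version_bit(page: bytearray, version: int) -> None:
    """Record which of two active keys (0 or 1) encrypted this page."""
    assert len(page) == BLCKSZ
    page[0] = (page[0] & 0xFE) | (version & 1)

def get_key_version_bit(page: bytes) -> int:
    return page[0] & 1

# During a rolling rekey: pages still carrying the old bit get decrypted
# with the old key, re-encrypted with the new key, and the bit flipped.
page = bytearray(BLCKSZ)
set_key_version_bit(page, 1)
print(get_key_version_bit(page))  # -> 1
```

The catch, as noted above, is that with only two keys in flight, the
rewrite of every on-disk page has to finish before the next rekey can
begin.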

> > > Uh, well, renaming the user was a big problem, but that is the only case
> > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > numbers.  If we want to use a different nonce, we have to find a way to
> > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > see how that is possible.
> >
> > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > time and therefore could build rainbow tables specifically for that
> > salt.  I could also use those *same* tables for any system where that
> > user had an account, even if they used different passwords on different
> > systems...
>
> Yes, 'postgres' can be used to create a nice md5 rainbow table that
> works on many servers --- good point.  Are rainbow tables possible with
> something like AES?

I'm not a cryptographer, just to be clear...  but it sure seems like if
you know what the nonce is, and have a strong idea about at least some
of the contents, then you could work to pre-calculate a portion of
the encrypted data and be able to determine the key based on that.
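To make the rainbow-table point concrete: in the legacy md5 scheme the
"salt" is the role name, which an attacker knows in advance, so a table
can be precomputed once for a common role and reused against any server.
A toy sketch (the dictionary and passwords are made up):

```python
import hashlib

def pg_md5_stored(password: str, role: str) -> str:
    # PostgreSQL's legacy md5 password storage: "md5" + md5(password || rolename)
    return "md5" + hashlib.md5((password + role).encode()).hexdigest()

# Because the salt (the role name) is predictable, a table can be built
# once for a role like 'postgres' ...
dictionary = ["secret", "passw0rd", "hunter2"]
table = {pg_md5_stored(p, "postgres"): p for p in dictionary}

# ... and reused against every server where that role exists.
stolen = pg_md5_stored("hunter2", "postgres")
print(table[stolen])  # -> hunter2
```

A per-server random salt would force the attacker to redo that work for
every server, which is exactly the property an unpredictable nonce buys.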

> > I appreciate that *some* of this might not be completely relevant for
> > the way a nonce is used in cryptography, but I'd be very surprised to
> > have a cryptographer tell me that a deterministic nonce didn't have
> > similar issues or didn't reduce the value of the nonce significantly.
>
> This post:
>
>     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>
> says:
>
>     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>     of Counter Mode, it is essential that the Nonce is not repeated with
>     the same key.  Hence CTR mode Nonces often include either a counter or
>     a timer element: something that is guaranteed not to repeat over the
>     lifetime of the key.
>
> CTR is what we use for WAL.  For 8k pages, we would use CBC, which says we
> need a random nonce.  I need to dig deeper into ECB mode attacks.

That page also says:

  Using a random IV / nonce for GCM has been specified as an official
  recommendation by - for instance - NIST. If anybody suggests differently
  then that's up to them.

and a recommendation by NIST certainly holds a lot of water, at least
for me.  They also have a recommendation regarding the amount of data to
encrypt with the same key, and that limit is far below what the 96-bit
nonce space could in principle address, along with a recommendation to
use a cryptographically sound random source, meaning that the chances of
a duplicate are extremely low.
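The NIST numbers come straight out of the birthday bound; a quick sketch
of the arithmetic (the 2**-32 collision budget is the one from NIST SP
800-38D):

```python
import math

NONCE_BITS = 96  # GCM's standard IV length

def collision_probability(n_messages: int, bits: int = NONCE_BITS) -> float:
    """Birthday-bound approximation: p ~= n^2 / 2^(bits+1)."""
    return n_messages ** 2 / 2 ** (bits + 1)

# NIST SP 800-38D asks that the chance of ever repeating a random IV
# under one key stay below 2**-32.  Solving n^2 / 2**97 <= 2**-32 gives
# n <= 2**32.5, i.e. roughly six billion messages per key -- far fewer
# than the 2**96 values the nonce could in principle take.
limit = math.isqrt(2 ** (NONCE_BITS + 1 - 32))
print(limit)  # roughly 6e9
assert collision_probability(limit) <= 2 ** -32
```

So with a sound random source, staying under a few billion encryptions
per key keeps the duplicate-nonce risk below the recommended threshold.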

> > > Uh, well, you are much less likely to get duplicate nonce values by
> > > using block number or WAL sequence number.  If you look at the
> > > implementations, few compute random nonce values.
> >
> > Which implementations..?  Where do their nonce values come from?  I can
> > see how a nonce might have to be naturally and deterministically random,
> > if the source for it is sufficiently varied across the key space, but
> > starting at '1' and going up with the same key seems like it's just
> > giving a potential attacker more information about what the encrypted
> > data contains...
>
> Well, in many modes the nonce is just a counter, but as stated above,
> not all modes.  I need to pull out my security books to remember for
> which ones it is safe.  (Frankly, it is a lot easier to use a random
> nonce for WAL than 8k pages.)

I do appreciate that, but given the recommendation that you can encrypt
gigabytes before needing to change, I don't know that we really gain a
lot by changing for every 8K page.
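For illustration, the two nonce styles being debated might look like
this; the WAL-position layout below is purely hypothetical, not PG's
actual WAL addressing:

```python
import os
import struct

def wal_ctr_nonce(timeline: int, segment: int, offset: int) -> bytes:
    """Deterministic 96-bit nonce built from a (hypothetical) WAL position.
    Unique as long as no WAL byte position is ever encrypted twice
    under the same key."""
    return struct.pack(">III", timeline, segment, offset)

def random_nonce() -> bytes:
    """Random 96-bit nonce; uniqueness is probabilistic (birthday bound)."""
    return os.urandom(12)

# Counter-style nonces never collide across distinct WAL positions ...
assert wal_ctr_nonce(1, 7, 4096) != wal_ctr_nonce(1, 7, 8192)
# ... but are fully predictable: an observer knows which position maps
# to which nonce, which is the information-leak concern raised above.
print(wal_ctr_nonce(1, 7, 4096).hex())  # -> 000000010000000700001000
```

The trade-off in one line: the counter form gives guaranteed uniqueness
but zero unpredictability; the random form gives unpredictability with a
(bounded) collision risk.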

> > > And you base the random goal on what?  Nonce is number used only once,
> > > and randomness is not a requirement.  You can say you prefer it, but
> > > why, because most implementations don't use random nonce.
> >
> > The encryption schemes I've worked with in the past have used a random
> > nonce, so I'm wondering where the disconnect is between us on that.
>
> OK.
>
> > > > > > When it comes to concerns about autovacuum or other system processes,
> > > > > > those don't have any direct user connections or interactions, so having
> > > > > > them be more privileged and having access to more is reasonable.
> > > > >
> > > > > Well, I am trying to understand the value of having some keys accessible
> > > > > by some parts of the system, and some not.  I am unclear what security
> > > > > value that has.
> > > >
> > > > A very real risk is a low-privilege process gaining access to the entire
> > > > backend process, and therefore being able to access anything that
> > > > backend is able to.
> > >
> > > Well, if they get to one key, they will get to them all, right?
> >
> > That's only the case if all the keys are accessible to a backend process
> > which is under a user's control.  That would certainly be a bad
> > situation and one which I'd hope we would avoid.  If the backend that
> > the user has access to only has access to a subset of the keys, then
> > while they might be able to access the other encrypted data, they
> > wouldn't be able to decrypt it.
>
> Uh, we already have Postgres security for the data, so what attack
> vector has the user reading the RAM, but not seeing all the keys?  Isn't
> client-supplied secrets a much better option for this?

I'm all for client-supplied secrets, just to be clear, but much of the
point of this effort is to reduce the burden on application developers
(after all, that's what a lot of what we do in the data layer is
for...).

The attack vector, as discussed below, is where the attacker has
complete access to the backend process through some exploit that
bypasses the PG security controls.  We'd like to limit the exposure
from such a situation by having large categories of data which can't
be breached even by an attacker who has completely compromised a
backend.

Note that this will almost certainly involve the kernel, and that's why
multiple shared buffer segments would be needed: so that a given backend
isn't actually able to access all of shared buffers, but only that
portion of the filesystem, that portion of shared buffers, and those
keys which are able to decrypt the data that it is, broadly, allowed to
see.
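As a toy illustration of that separation (every name below is
hypothetical; real process classes and key categories would differ):

```python
# Hypothetical sketch: each process class only ever gets handed the
# keys (and, by extension, the buffer pools) for the data categories
# it is allowed to see.
ALLOWED = {
    "app_backend": {"user_data"},
    "autovacuum":  {"user_data", "catalog", "wal"},
}

KEYS = {"user_data": b"K1", "catalog": b"K2", "wal": b"K3"}

def keys_for(process: str) -> dict:
    """Return only the keys a given process class may unlock."""
    return {cat: KEYS[cat] for cat in ALLOWED[process]}

# A compromised app backend can read its own category, but the WAL and
# catalog keys were simply never mapped into it.
print(sorted(keys_for("app_backend")))  # -> ['user_data']
```

The point is containment: even full control of one backend only exposes
the keys that backend's class was ever given.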

Thanks,

Stephen
