Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date
Msg-id 20190708232712.GI29202@tamriel.snowman.net
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Jul  8, 2019 at 06:43:31PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Mon, Jul  8, 2019 at 06:04:46PM -0400, Stephen Frost wrote:
> > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > On Mon, Jul  8, 2019 at 05:41:51PM -0400, Stephen Frost wrote:
> > > > > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > > > > Well, if it was a necessary feature, I assume TLS 1.3 would have found
> > > > > > > a way to make it secure, no?  Certainly they are not shipping TLS 1.3
> > > > > > > with a known weakness.
> > > > > >
> > > > > > As discussed below- this is about moving goalposts, and that's, in part
> > > > > > at least, why re-keying isn't a *necessary* feature of TLS.  As the
> > > > >
> > > > > I agree we have to allow rekeying and allow multiple unlocked keys in
> > > > > the server at the same time.  The open question is whether encrypting
> > > > > different data with different keys and different unlock controls is
> > > > > possible or useful.
> > > >
> > > > I'm not sure if there's really a question about if it's *possible*?  As
> > > > for if it's useful, I agree there's some debate.
> > >
> > > Right, it is easily possible to keep all keys unlocked, but the value is
> > > minimal, and the complexity will have a cost, which is my point.
> >
> > Having them all unlocked but only accessible to certain privileged
> > processes is very different from having them unlocked and available to
> > every backend process.
>
> Operationally, how would that work?  We unlock them all on boot but
> somehow make them inaccessible to some backends after that?

That could work and doesn't seem like an insurmountable challenge.  The
way that's been discussed, at least somewhere in the past, is leveraging
the exec backend framework to have the user-connected backends work in
an independent space from the processes launched at startup.

> > > > > > amount of data you transmit over a given TLS connection increases
> > > > > > though, the risk increases and it would be better to re-key.  How much
> > > > > > better?  That depends a great deal on if someone is trying to mount an
> > > > > > attack or not.
> > > > >
> > > > > Yep, we need to allow rekey.
> > > >
> > > > Supporting a way to rekey is definitely a good idea.
> > >
> > > It is a requirement, I think.  We might have problems tracking exactly
> > > what key _version_ each table (or 8k block) or WAL file uses.  :-(
> > > Ideally we would allow only two active keys, and somehow mark each page
> > > as using the odd or even key at a given time, or something strange.
> > > (Yeah, hand waving here.)
> >
> > Well, that wouldn't be the ideal since it would limit us to some small
> > number of GBs of data written, based on the earlier discussion, right?
>
> No, it is GB per secret-nonce combination.

Hrmpf.  I'm trying to follow the logic that draws this conclusion.

As I understand it, the NIST recommendation is a 96-bit *random* nonce,
and then there's also a recommendation to not encrypt more than 2^32
messages- far fewer than the 2^96 possible nonce values- presumably
because that keeps the probability of ever repeating a nonce very low.

If the amount-you-can-encrypt is really per secret+nonce combination,
then how do those recommendations make sense..?  This is where I really
think we should be reading through and understanding exactly what the
NIST recommendations are, and not just trying to piece things together
from Stack Overflow.
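
The arithmetic behind those two recommendations can be checked with a
quick birthday-bound estimate (a sketch only; the 2^32 figure is the
message limit discussed above):

```python
# Birthday-bound estimate: the probability that at least two of n random
# 96-bit nonces collide is approximately n * (n - 1) / (2 * 2**96).
def nonce_collision_probability(n_messages: int, nonce_bits: int = 96) -> float:
    space = 2 ** nonce_bits
    return n_messages * (n_messages - 1) / (2 * space)

# At the 2^32-message limit the repeat-nonce risk is about 2^-33,
# i.e. roughly one chance in eight billion.
print(nonce_collision_probability(2 ** 32))
```

So the two recommendations are consistent with each other: capping the
message count keeps the chance of a random nonce ever repeating
negligible.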

> > I'm not sure that I can see through to a system where we are rewriting
> > tables that are out on disk every time we hit 60GB of data written.
> >
> > Or maybe I'm misunderstanding what you're suggesting here..?
>
> See above.

How long would these keys be active for then in the system..?  How much
data would they potentially be used to encrypt?  Strikes me as likely to
be an awful lot...

> > > > > Uh, well, renaming the user was a big problem, but that is the only case
> > > > > I can think of.  I don't see that as an issue for block or WAL sequence
> > > > > numbers.  If we want to use a different nonce, we have to find a way to
> > > > > store it or look it up efficiently.  Considering the nonce size, I don't
> > > > > see how that is possible.
> > > >
> > > > No, this also meant that, as an attacker, I *knew* the salt ahead of
> > > > time and therefore could build rainbow tables specifically for that
> > > > salt.  I could also use those *same* tables for any system where that
> > > > user had an account, even if they used different passwords on different
> > > > systems...
> > >
> > > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > > works on many servers --- good point.  Are rainbow tables possible with
> > > something like AES?
> >
> > I'm not a cryptographer, just to be clear...  but it sure seems like if
> > you know what the nonce is, and have a strong idea of at least what some
> > of the contents are, then you could work to pre-calculate a portion of
> > the encrypted data and be able to determine the key based on that.
>
> Uh, well, you would think so, but for some reason AES just doesn't allow
> that kind of attack, unless you brute force it trying every key.  The
> nonce is only to prevent someone from detecting that two output
> encryption pages contain the same contents originally.

That's certainly interesting, but a brute-force over every key would
still allow it, whereas with a random nonce such an attack could only
begin after the attacker has access to the data- nothing could be
pre-computed in advance.
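
To illustrate the pre-computation worry with a deliberately weak toy
cipher (this is NOT AES- the function, key size, and names are invented
purely for the demonstration): with a predictable nonce and a guessable
plaintext, the key-recovery table can be built before the attacker ever
touches the target, whereas a random nonce forces all work to start
after the fact.

```python
import hashlib

# Toy stream cipher (NOT AES): keystream = SHA-256(key || nonce), XORed
# with the plaintext.  A 16-bit key space keeps the demo fast; AES's
# 128-bit keys make the full table infeasible, but a predictable nonce
# is what makes any pre-computation possible at all.
def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    stream = hashlib.sha256(key + nonce).digest()
    return bytes(p ^ s for p, s in zip(plaintext, stream))

known_plaintext = b"guessable header"
predictable_nonce = b"\x00" * 12  # attacker knows this ahead of time

# Offline phase: build a ciphertext -> key table before seeing any data.
table = {toy_encrypt(k.to_bytes(2, "big"), predictable_nonce, known_plaintext):
         k.to_bytes(2, "big")
         for k in range(2 ** 16)}

# Online phase: one observed ciphertext, one lookup, key recovered.
secret_key = (4242).to_bytes(2, "big")
observed = toy_encrypt(secret_key, predictable_nonce, known_plaintext)
print(table[observed] == secret_key)  # True
```

With a random per-message nonce, the offline table would have to cover
every possible nonce as well, destroying the pre-computation advantage.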

> > > > I appreciate that *some* of this might not be completely relevant for
> > > > the way a nonce is used in cryptography, but I'd be very surprised to
> > > > have a cryptographer tell me that a deterministic nonce didn't have
> > > > similar issues or didn't reduce the value of the nonce significantly.
> > >
> > > This post:
> > >
> > >     https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> > >
> > > says:
> > >
> > >     GCM is a variation on Counter Mode (CTR).  As you say, with any variant
> > >     of Counter Mode, it is essential  that the Nonce is not repeated with
> > >     the same key.  Hence CTR mode  Nonces often include either a counter or
> > >     a timer element: something that  is guaranteed not to repeat over the
> > >     lifetime of the key.
> > >
> > > CTR is what we use for WAL.  For 8k pages we would use CBC, which
> > > requires a random nonce.  I need to dig deeper into the ECB mode attack.
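
The counter-style construction quoted above could look like the
following for WAL, assuming- purely for illustration- that timeline,
segment number, and in-segment offset uniquely identify each encrypted
chunk, since WAL is append-only:

```python
# Sketch of a 96-bit counter-style CTR nonce for WAL.  Field names and
# widths are invented for illustration; uniqueness holds only while the
# key is rotated before any field wraps around.
def wal_nonce(timeline_id: int, segment_no: int, offset: int) -> bytes:
    return (timeline_id.to_bytes(4, "big")
            + segment_no.to_bytes(4, "big")
            + offset.to_bytes(4, "big"))

n1 = wal_nonce(1, 42, 0)
n2 = wal_nonce(1, 42, 8192)
print(len(n1) * 8, n1 != n2)  # 96 True
```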
> >
> > That page also says:
> >
> >   Using a random IV / nonce for GCM has been specified as an official
> >   recommendation by - for instance - NIST. If anybody suggests differently
> >   then that's up to them.
>
> Well, if we could generate a random nonce easily, we would do that.  The
> question is how important is it for our application.

[...]

> > and a recommendation by NIST certainly holds a lot of water, at least
> > for me.  They also have a recommendation regarding the amount of data to
>
> Agreed.

This is just it though, at least from my perspective- we are saying "ok,
well, we know people recommend using a random nonce, but that's hard, so
we aren't going to do that because we don't think it's important for our
application", but we aren't cryptographers.  I liken this to whatever
discussion led to using the username as the salt for our md5
authentication method: great intentions, but incomplete understanding,
leading to a less-than-desirable result.

When it comes to this stuff, I don't think we really get to pick and
choose what we follow and what we don't.  If the recommendation from an
authority says we should use random nonces, then we *really* need to
listen and do that, because that authority is a bunch of cryptographers
with a lot more experience and who have definitely spent a great deal
more time thinking about this than we have.

If there's a recommendation from such an authority that says we *don't*
need to use a random nonce, great, I'm happy to go review that and agree
with it, but discussions on stackoverflow or similar don't hold the same
weight that a recommendation from NIST does.

> > > Well, in many modes the nonce is just a counter, but as stated above,
> > > not all modes.  I need to pull out my security books to remember for
> > > which ones it is safe.  (Frankly, it is a lot easier to use a random
> > > nonce for WAL than 8k pages.)
> >
> > I do appreciate that, but given the recommendation that you can encrypt
> > gigabytes before needing to change, I don't know that we really gain a
> > lot by changing for every 8K page.
>
> Uh, well, if you don't do that, you need to use the contents of the
> previous page for the next page, and I think we want to encrypt each 8k
> page independently of what came before it.

I'm not sure that we really want to do this at the 8K level...  I'll
admit that I'm not completely sure *where* to draw that line then
though.
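
For 8k pages, one deterministic alternative that keeps every page
independently decryptable is deriving the IV from the page's identity
plus a key version (a sketch only- all names here are hypothetical, and
note the caveat: rewriting the same page under the same key version
reuses the IV, which is exactly the repeated-nonce concern above):

```python
import hashlib

# Hypothetical per-page IV derivation: stable page coordinates in,
# 128-bit IV out.  Each page encrypts and decrypts independently, with
# no chaining to the previous page's ciphertext.
def page_iv(relfilenode: int, block_number: int, key_version: int) -> bytes:
    material = (relfilenode.to_bytes(4, "big")
                + block_number.to_bytes(4, "big")
                + key_version.to_bytes(4, "big"))
    return hashlib.sha256(material).digest()[:16]

iv_a = page_iv(16384, 0, 1)
iv_b = page_iv(16384, 1, 1)
print(len(iv_a), iv_a != iv_b)  # 16 True
```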

> > > Uh, we already have Postgres security for the data, so what attack
> > > vector has the user reading the RAM, but not seeing all the keys?  Isn't
> > > client-supplied secrets a much better option for this?
> >
> > I'm all for client-supplied secrets, just to be clear, but much of the
> > point of this effort is to reduce the burden on the application
> > developers (after all, that's a lot of what we're doing in the data
> > layer is for...).
> >
> > The attack vector, as discussed below, is where the attacker has
> > complete access to the backend process through some exploit that
> > bypasses the PG security controls.  We'd like to limit the exposure
> > from such a situation by having large categories of data which can't
> > be breached even by an attacker who has completely compromised a backend.
>
> As far as I know, TDE was to prevent someone with file system access
> from reading the data.

This seems pretty questionable, doesn't it?  Who gets access to a system
without having some access to what's running at the same time?  Perhaps
if the drive is stolen out from under the running system, but that case
could be protected against using filesystem-level encryption.  If we're
trying to mimic that, which by itself would be good, then wouldn't we
want to do so with similar capabilities- that is, with per-tablespace
keys, since that's what someone running filesystem-level encryption
would have?  Of course, if they don't mount all the filesystems they've
got set up then they have problems, but that's their choice.

In the end, having this bit of flexibility allows us to have the same
level of options that someone using filesystem-level encryption would
have, but it also starts us down the path to having something which
would work against another attack vector where someone has control over
a complete running backend.
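
A minimal sketch of what per-tablespace keys could look like (the class,
names, and KMS interaction are all hypothetical- the point is only that
a key never unlocked at startup stays unavailable, mirroring an
unmounted encrypted filesystem):

```python
# Hypothetical per-tablespace keyring: data keys are unlocked (in a real
# system, unwrapped via the KMS) per tablespace; a tablespace whose key
# was never unlocked stays unreadable even to a compromised backend.
class TablespaceKeyring:
    def __init__(self) -> None:
        self._unlocked: dict[int, bytes] = {}  # tablespace OID -> data key

    def unlock(self, tablespace_oid: int, data_key: bytes) -> None:
        self._unlocked[tablespace_oid] = data_key

    def key_for(self, tablespace_oid: int) -> bytes:
        if tablespace_oid not in self._unlocked:
            raise PermissionError(f"tablespace {tablespace_oid} not unlocked")
        return self._unlocked[tablespace_oid]

ring = TablespaceKeyring()
ring.unlock(1663, b"\x01" * 32)   # unlocked at startup
print(len(ring.key_for(1663)))    # 32
try:
    ring.key_for(16390)           # never unlocked: stays unreadable
except PermissionError:
    print("locked")               # locked
```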

Thanks,

Stephen
