From: Tomas Vondra
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-20 17:30:30
Msg-id: 20190720173030.lgkwmpxkzsdupcqe@development
In response to: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS) (Antonin Houska <ah@cybertec.at>)
List: pgsql-hackers
On Fri, Jul 19, 2019 at 04:02:19PM +0200, Antonin Houska wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> On Fri, Jul 19, 2019 at 12:04:36PM +0200, Antonin Houska wrote:
>> >Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> >
>> >> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>> >> >On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> >> >> One extra thing we should consider is authenticated encryption. We can't
>> >> >> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> >> >> as that does not provide integrity protection (i.e. we can't detect when
>> >> >> the ciphertext was corrupted, whether by disk failure or intentionally).
>> >> >> And we can't quite rely on checksums, because the checksum covers the
>> >> >> plaintext and is itself stored encrypted.
>> >> >
>> >> >Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>> >> >the checksum (per-page or WAL) will not match our decrypted output.  How
>> >> >would they make it match the checksum without already knowing the key?
>> >> >I read [1] but could not see that explained.
>> >> >
>> >>
>> >> Our checksum is only 16 bits, so perhaps one way would be to just
>> >> generate 64k randomly modified pages and hope one of them happens to
>> >> hit the right checksum value. Not sure how practical such an attack is,
>> >> but it requires nothing more than filesystem access.
>> >
>> >I don't think you can easily generate 64k different checksums this way.
>> >If the data is random, I suppose that each set of 2^(128 - 16) blocks
>> >will contain the same checksum after decryption. Thus even if you
>> >generate 64k different ciphertext blocks that contain the checksum, some
>> >(many?) checksums will be duplicates. Unfortunately the math to describe
>> >this problem does not seem to be trivial.
>> >
>>
>> I'm not sure what your point is, or why you care about the 128 bits, but
>> I don't think the math is very complicated (and it's exactly the same
>> with or without encryption). The probability of a checksum collision for
>> a randomly modified page is 1/64k, i.e. p = ~0.00153%. So the probability
>> of *not* getting a collision is (1-p) = 99.9985%, and with N pages the
>> probability of no collisions is pow((1-p),N), which behaves like this:
>>
>>         N    pow((1-p),N)
>>    -----------------------
>>     10000             85%
>>     20000             73%
>>     30000             63%
>>     46000             49%
>>    200000              4%
>>
>> So with a 1.6GB relation you have about a 96% chance of a checksum
>> collision.
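
To re-check the arithmetic, it's trivial to reproduce - a minimal
standalone C snippet, assuming a uniformly random 16-bit checksum (i.e.
p = 1/65536):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double  p = 1.0 / 65536;    /* collision probability per page */
        int     ns[] = {10000, 20000, 30000, 46000, 200000};

        /* probability of *no* collision among N modified pages */
        for (int i = 0; i < (int) (sizeof(ns) / sizeof(ns[0])); i++)
            printf("%7d  %5.1f%%\n", ns[i], 100.0 * pow(1 - p, ns[i]));

        return 0;
    }

Compile with -lm; the output matches the table above, modulo rounding.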
>
>I thought your attack proposal was to find a valid (encrypted) checksum
>for a given encrypted page. Instead it seems that you were only trying to
>say that it's not too hard to generate a page with a valid checksum in
>general. Thus the attacker can try to modify the ciphertext again and
>again in a way that is not quite random, and the chance of passing the
>checksum verification may still be relatively high.
>
>> >Also note that if you try to generate ciphertext whose decryption
>> >results in a particular checksum value, you can hardly control the other
>> >14 bytes of the block, which in turn are used to verify the checksum.
>> >
>>
>> Now, I'm not saying this attack is particularly practical - it would
>> generate a fair number of checkpoint failures before getting the first
>> collision. So it'd trigger quite a few alerts, I guess.
>
>You probably mean "checksum failures". I agree. And even if the checksum
>passed the verification, page or tuple headers would probably be incorrect and
>cause other errors.
>
>> >> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
>> >> a cryptographic hash algorithm. Now, maybe we don't want authenticated
>> >> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).
>> >
>> >I'm also not sure if we should try to guarantee data authenticity /
>> >integrity. As someone already mentioned elsewhere, a page MAC does not
>> >help if the whole page is replaced. (An extreme case is that an old
>> >filesystem snapshot containing the whole data directory is restored,
>> >although that will probably make the database crash soon.)
>> >
>> >We can guarantee integrity and authenticity of backups, but that's a
>> >separate feature: someone may need it even though it's OK for them to
>> >run the cluster unencrypted.
>> >
>>
>> Yes, I do agree with that. I think attempts to guarantee data
>> authenticity and/or integrity at the page level are mostly futile (replay
>> attacks are an example of why). IMHO we should consider that to be
>> outside the threat model TDE is expected to address.
>
>When writing my previous email I forgot that, besides improving data
>integrity, authenticated encryption also tries to detect attempts to
>obtain the encryption key via a chosen-ciphertext attack (CCA). The fact
>that pages are encrypted / decrypted independently of each other should
>not be a problem here. We just need to consider whether this kind of CCA
>is a threat we are trying to protect against.
>
>> IMO a better way to handle authenticity/integrity would be based on WAL,
>> which is essentially an authoritative log of operations. We should be able
>> to parse WAL, deduce expected state (min LSN, checksums) for each page,
>> and validate the cluster state based on that.
>
>OK. A replica that was cloned from the master before any corruption could
>have happened can be used for such checks. But that should be done by an
>external tool rather than by PG core.
>
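FWIW to make that a bit more concrete, such an external tool might be
structured roughly like the sketch below. Everything in it is
hypothetical - PageExpectation, wal_next_expectation() and
cluster_read_page() are made-up names, not existing APIs - it only
illustrates the intended control flow:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* expected page state deduced from the WAL (hypothetical) */
    typedef struct PageExpectation
    {
        uint32_t    relfile;        /* relation file id */
        uint32_t    blkno;          /* block number */
        uint64_t    min_lsn;        /* page LSN must be >= this */
        uint16_t    checksum;       /* expected checksum, if known */
        bool        has_checksum;
    } PageExpectation;

    /* hypothetical helpers - neither of these exists today */
    extern bool wal_next_expectation(PageExpectation *exp);
    extern bool cluster_read_page(uint32_t relfile, uint32_t blkno,
                                  uint64_t *lsn, uint16_t *checksum);

    int
    main(void)
    {
        PageExpectation exp;
        long        failures = 0;

        /* walk the WAL, compare the deduced state with the cluster */
        while (wal_next_expectation(&exp))
        {
            uint64_t    lsn;
            uint16_t    checksum;

            if (!cluster_read_page(exp.relfile, exp.blkno,
                                   &lsn, &checksum))
                continue;           /* block truncated/dropped since */

            if (lsn < exp.min_lsn ||
                (exp.has_checksum && checksum != exp.checksum))
            {
                fprintf(stderr, "mismatch: rel %u blk %u\n",
                        (unsigned) exp.relfile, (unsigned) exp.blkno);
                failures++;
            }
        }

        return failures ? 1 : 0;
    }
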
>> I still think having to decrypt the page in order to verify a checksum
>> (because the header is part of the encrypted page, and is computed from
>> the plaintext version) is not great.
>
>Should we forbid checksums if the cluster is encrypted? Even if the
>checksum is encrypted, I think it can still help to detect I/O corruption: if
>the encrypted data is corrupted, then the checksum verification should fail
>after decryption anyway.
>

Forbid checksums? I don't see how that could be acceptable. We either have
to accept the limitations of the current design (having to decrypt
everything before checking the checksums) or change the design.

I personally think we should do the latter - not just because of this
"decrypt-then-verify" issue, but consider how much work we've done to
allow enabling checksums on-line (it's still not there, but it's likely
doable in PG13). ISTM we should design encryption so that it can be
enabled on-line too - maybe not in v1, but it should be possible. So how
are we going to do that? With checksums it's (fairly) easy, because we
can simply not verify a page's checksum until we know all pages have one.
But with encryption that's not possible - if the whole page is encrypted,
how do we even tell which pages have been encrypted so far?

Of course, maybe we don't need such a capability for the use cases we're
trying to solve with encryption. I can imagine someone running a large
system, having issues with data corruption, and deciding to enable
checksums to detect it. Maybe there's no such scenario in the privacy
case? But we can probably come up with one - say, a new company policy
that forces people to enable encryption on all existing systems.

That being said, I don't know how to solve this, but it seems to me that
any system where we can't easily decide whether a page is encrypted or
not (because everything, including the page header, is encrypted) has
this exact issue. Maybe we could keep some part of the header unencrypted
(likely an information leak, and it does not solve decrypt-then-verify).
Or maybe we need to store some additional information on each page (which
breaks the on-disk format).
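
To illustrate the latter option, here is a rough sketch of what that
additional information could be: encrypt the page payload with AES-GCM
and keep the 16-byte authentication tag in an unencrypted trailer at the
end of the page. This is just plain OpenSSL with made-up names and fixed
sizes, and it glosses over nonce management entirely - the shape of the
idea, not a worked-out design:

    #include <openssl/evp.h>

    #define PAGE_SIZE       8192
    #define GCM_TAG_LEN     16
    #define PAYLOAD_LEN     (PAGE_SIZE - GCM_TAG_LEN)

    /*
     * Encrypt one page in place with AES-256-GCM and store the
     * authentication tag in the (unencrypted) page trailer.
     * Returns 1 on success, 0 on failure.
     */
    static int
    encrypt_page(unsigned char *page,
                 const unsigned char *key,  /* 32 bytes */
                 const unsigned char *iv)   /* 12 bytes, must be unique */
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int     len,
                ok = 0;

        if (ctx == NULL)
            return 0;

        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) == 1 &&
            EVP_EncryptUpdate(ctx, page, &len, page, PAYLOAD_LEN) == 1 &&
            EVP_EncryptFinal_ex(ctx, page + len, &len) == 1 &&
            EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, GCM_TAG_LEN,
                                page + PAYLOAD_LEN) == 1)
            ok = 1;

        EVP_CIPHER_CTX_free(ctx);
        return ok;
    }

On read, setting the expected tag (EVP_CTRL_GCM_SET_TAG) before
EVP_DecryptFinal_ex() makes a tampered page fail authentication without
relying on the plaintext checksum at all. Of course it costs 16 bytes per
page (i.e. an on-disk format change), and it does nothing about
whole-page replay, per the discussion above.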

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



