Re: XTS cipher mode for cluster file encryption - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: XTS cipher mode for cluster file encryption
Msg-id f51aad97-5f3d-0a61-69d9-2c297ea934e1@enterprisedb.com
In response to Re: XTS cipher mode for cluster file encryption  (Stephen Frost <sfrost@snowman.net>)
Responses Re: XTS cipher mode for cluster file encryption  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers

On 10/26/21 21:43, Stephen Frost wrote:
> Greetings,
> 
> * Yura Sokolov (y.sokolov@postgrespro.ru) wrote:
>> ... >>
>> Integrity could be based on a simple non-cryptographic checksum, and it could
>> be checked after decryption. It would be impossible to intentionally change
>> an encrypted page in a way that it will pass the checksum after decryption.
> 
> No, it wouldn't be impossible when we're talking about non-cryptographic
> checksums.  That is, in fact, why you'd call them that.  If it were
> impossible (or at least utterly impractical) then you'd be able to claim
> that it's cryptographic-level integrity validation.
>

Yeah, our checksums are probabilistic protection against rare and random 
bitflips caused by hardware, not against an attacker in the crypto sense.

To explain why it's not enough, consider that our checksum is a uint16, i.e. 
there are only 64k (65,536) possible values. In other words, you can try 
flipping bits in the encrypted page, and after generating just over 64k 
variants you're guaranteed to have at least one collision. Yes, it's harder 
to get a collision with the existing checksum, and compression methods that 
diffuse bits better make it harder to get a valid page after decryption, 
but it's simply not the same thing as cryptographic integrity.
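
A minimal sketch of the pigeonhole argument, in Python. It assumes an 8kB 
page and uses a truncated CRC-32 as a stand-in for a 16-bit checksum; this 
is not the actual PostgreSQL checksum algorithm, and the names 
checksum16, make_variant and find_collision are made up for illustration. 
It only shows that two different pages with the same 16-bit checksum must 
exist among 65,537 variants, not how hard it is to hit one specific 
checksum.

```python
import zlib

PAGE_SIZE = 8192  # assumption: default 8kB block size


def checksum16(page: bytes) -> int:
    """Stand-in 16-bit checksum (truncated CRC-32), NOT PostgreSQL's algorithm."""
    return zlib.crc32(page) & 0xFFFF


def make_variant(base: bytes, i: int) -> bytes:
    """Return a copy of base whose first 4 bytes encode the counter i."""
    page = bytearray(base)
    page[0:4] = i.to_bytes(4, "little")
    return bytes(page)


def find_collision(base: bytes):
    """Find two distinct variants of base sharing a 16-bit checksum.

    There are only 2**16 possible checksum values, so among 2**16 + 1
    distinct pages at least two must collide (pigeonhole principle).
    """
    seen = {}  # checksum -> counter of the first variant that produced it
    for i in range(2**16 + 1):
        c = checksum16(make_variant(base, i))
        if c in seen:
            return seen[c], i, c
        seen[c] = i
    raise AssertionError("unreachable: pigeonhole principle violated")


if __name__ == "__main__":
    i, j, c = find_collision(bytes(PAGE_SIZE))
    print(f"variants {i} and {j} share checksum {c:#06x}")
```

In practice a collision between some pair of variants shows up long before 
the 64k bound, which is the birthday effect; matching one particular 
existing checksum is the harder case mentioned above.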

Let's not try inventing something custom; there have been enough crypto 
failures due to smart custom stuff in the past already.

BTW I'm not sure what the existing patches do, but I wonder if we should 
calculate the checksum before or after encryption. I'd say it should be 
after encryption, because checksums were meant as a protection against 
issues at the storage level, so the checksum should be on what's written 
to storage, and it'd also allow offline verification of checksums etc. 
(Of course, that'd make the whole idea of relying on our checksums even 
more futile.)

Note: Maybe there are reasons why the checksum needs to be calculated 
before encryption, not sure.
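
To illustrate the "checksum after encryption" ordering, here is a rough 
Python sketch. Everything in it is an assumption made for the example: the 
cipher is a toy XOR keystream derived with SHAKE-256 (not XTS, and not 
secure), the checksum is a truncated CRC-32 rather than the PostgreSQL 
algorithm, and the checksum is kept next to the ciphertext instead of in a 
page header. The point is only that verify_offline() needs no key.

```python
import hashlib
import zlib


def checksum16(data: bytes) -> int:
    """Stand-in 16-bit checksum (truncated CRC-32), NOT PostgreSQL's algorithm."""
    return zlib.crc32(data) & 0xFFFF


def toy_crypt(key: bytes, block_no: int, page: bytes) -> bytes:
    """Toy XOR stream cipher, a placeholder for a real mode such as XTS.

    XOR with the same keystream both encrypts and decrypts.
    """
    seed = key + block_no.to_bytes(8, "little")
    keystream = hashlib.shake_256(seed).digest(len(page))
    return bytes(a ^ b for a, b in zip(page, keystream))


def write_page(key: bytes, block_no: int, plaintext: bytes):
    """Encrypt first, then checksum what will actually hit storage."""
    ciphertext = toy_crypt(key, block_no, plaintext)
    return ciphertext, checksum16(ciphertext)


def verify_offline(ciphertext: bytes, stored_checksum: int) -> bool:
    """Check the stored bytes without access to the encryption key."""
    return checksum16(ciphertext) == stored_checksum


if __name__ == "__main__":
    key = b"example key, not a real secret"
    page = bytes(range(256)) * 32  # 8kB of sample data
    ct, cksum = write_page(key, 7, page)
    print("offline check passes:", verify_offline(ct, cksum))
    print("round trip ok:", toy_crypt(key, 7, ct) == page)
```

With the opposite ordering (checksum computed on the plaintext), the same 
verification would first have to decrypt the page, so an offline tool 
would need the key.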

>> Currently we have a 16-bit checksum, and it is very small. But having a larger
>> checksum is orthogonal (i.e. not bound) to having encryption.
> 
> Sure, but that would also require a page-format change.  We've pointed
> out the downsides of that and what it would prevent in terms of
> use-cases.  That's still something that might happen but it would be a
> different effort from this.
> 

... and if such a page format change ends up happening, it'd be fairly easy 
to just add some extra crypto data into the page header and not rely on the 
data checksums at all.

>> In fact, Adiantum is easily made close to SIV construction:
>> - just leave the last 8/16 bytes zero. If after decryption they are zero,
>>    then the integrity check passed.
>> That is because SIV and Adiantum are very similar in their structure:
>> - SIV:
>> -- hash
>> -- then stream cipher
>> - Adiantum:
>> -- hash (except last 16bytes)
>> -- then encrypt last 16bytes with hash,
>> -- then stream cipher
>> -- then hash.
>> If the last N (N>16) bytes are nonce + zero bytes, then "hash, then encrypt last
>> 16bytes with hash" becomes equivalent to just "hash", and Adiantum becomes
>> logically equivalent to SIV.
> 
> While I appreciate your interest in this, I don't think it makes sense
> for us to try and implement something of our own- we're not
> cryptographers.  Best is to look at published guidance and what other
> projects have had success doing, and that's what this thread has been
> about.
> 

Yeah, I personally don't see much difference between XTS and Adiantum.

There are a bunch of benefits, but the main reason why Google developed 
it seems to be performance on low-end ARM machines (i.e. phones). Which 
is nice, but it's probably not hugely important - very few people run Pg 
on such machines, especially in a performance-sensitive context.

It's true Adiantum is probably more resilient to IV reuse etc. but it's 
not like XTS is suddenly obsolete, and it certainly doesn't solve the 
integrity issue etc.

>>>> - like XTS it doesn't need to change the plain text format and doesn't need
>>>>    an additional Nonce/Auth Code.
>>>
>>> Sure, in which case it's something that could potentially be added later
>>> as another option in the future.  I don't think we'll always have just
>>> one encryption method and it's good to generally think about what it
>>> might look like to have others but I don't think it makes sense to try
>>> and get everything in all at once.
>>
>> And among others Adiantum looks best: it is fast even without hardware
>> acceleration, it provides whole block encryption (i.e. every bit depends
>> on every bit) and it isn't bound to the plain-text format.
> 
> And it could still be added later as another option if folks really want
> it to be.  I've outlined why it makes sense to go with XTS first but I
> don't mean that to imply that we'll only ever have that.  Indeed, once
> we've actually got something, adding other methods will almost certainly
> be simpler.  Trying to do everything from the start will make this very
> difficult to accomplish though.
> 

Yeah.

So maybe the best thing is simply to roll with both - design the whole 
feature in a way that allows selecting the encryption scheme, with two 
options. That's generally a good engineering practice, as it ensures 
things are not coupled too much. And it's not like the encryption 
methods are expected to be super difficult.
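
As a rough sketch of what "selecting the encryption scheme" could look 
like, here is a tiny registry in Python. All names (PageCipher, register, 
get_cipher) are hypothetical, and the two registered schemes are toy XOR 
placeholders standing in for real XTS and Adiantum implementations, which 
this example does not provide. The only thing it demonstrates is that 
callers go through one interface and never depend on a particular scheme.

```python
import hashlib
from typing import Callable, Dict, NamedTuple

# (key, block number, page bytes) -> transformed page bytes
CryptFn = Callable[[bytes, int, bytes], bytes]


class PageCipher(NamedTuple):
    """Hypothetical interface every encryption scheme has to provide."""
    encrypt: CryptFn
    decrypt: CryptFn


_REGISTRY: Dict[str, PageCipher] = {}


def register(name: str, cipher: PageCipher) -> None:
    _REGISTRY[name] = cipher


def get_cipher(name: str) -> PageCipher:
    """Look up a scheme by name, e.g. from a configuration setting."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown encryption scheme: {name!r}") from None


def _toy_scheme(label: bytes) -> PageCipher:
    """Placeholder XOR cipher; NOT secure and NOT a real XTS/Adiantum."""
    def crypt(key: bytes, block_no: int, page: bytes) -> bytes:
        seed = label + key + block_no.to_bytes(8, "little")
        keystream = hashlib.shake_256(seed).digest(len(page))
        return bytes(a ^ b for a, b in zip(page, keystream))
    return PageCipher(encrypt=crypt, decrypt=crypt)


# Two selectable options, both stubs in this sketch.
register("xts", _toy_scheme(b"xts-placeholder"))
register("adiantum", _toy_scheme(b"adiantum-placeholder"))


if __name__ == "__main__":
    cipher = get_cipher("xts")
    page = b"\x00" * 8192
    ct = cipher.encrypt(b"example key", 42, page)
    print("round trip ok:", cipher.decrypt(b"example key", 42, ct) == page)
```

Having the second scheme registered from the start is what keeps the rest 
of the code honest about not assuming anything scheme-specific.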

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
