Thread: Re: Optimize scram_SaltedPassword performance

Re: Optimize scram_SaltedPassword performance

From
Yura Sokolov
Date:
03.02.2025 10:11, Zixuan Fu пишет:
> Hi Hackers,  
> 
> While profiling a program with `perf`, I noticed that `scram_SaltedPassword`
> consumed more CPU time than expected. After some investigation, I found
> that the function performs many HMAC iterations (4096 rounds for
> SCRAM-SHA-256), and each iteration reinitializes the HMAC context, causing
> excessive overhead.
> 
> OpenSSL has an optimization for this case: when the key remains the
> same, the HMAC context can be reused with a lightweight state reset by
> passing NULL as the key. To take advantage of this, I introduced
> `pg_hmac_reuse()`, which replaces the key with NULL when OpenSSL is used.

Good catch.

Since pg_hmac_reuse is not `static`, I'd add some checks that key is
exactly same. At least there should be

   Assert(key == prev_key && len == prev_len && hash_bytes(key, len) ==
prev_hash);

Where `prev_key`, `prev_len` and `prev_hash` are static variables, filled
in `pg_hmac_init`.

I don't know, should it be `Assert`, or check that leads to `elog(ERROR)`.

`hash_bytes` is fast enough to not cause measurable slow down in production.

On the other hand, use cases are trivial enough to occasional misuses to be
caught using just `Assert`.

> With this change, the performance improves by approximately **4x** (reducing
> execution time from 4ms to 1ms). The built-in PostgreSQL HMAC implementation
> does not support context reuse, and modifying it would require substantial
> changes. Therefore, `pg_hmac_reuse()` simply calls `pg_hmac_init()` in that
> case, maintaining the existing logic.

-------

regards,
Yura



Re: Optimize scram_SaltedPassword performance

From
Daniel Gustafsson
Date:
> On 3 Feb 2025, at 08:45, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> 03.02.2025 10:11, Zixuan Fu пишет:
>> Hi Hackers,
>>
>> While profiling a program with `perf`, I noticed that `scram_SaltedPassword`
>> consumed more CPU time than expected. After some investigation, I found
>> that the function performs many HMAC iterations (4096 rounds for
>> SCRAM-SHA-256), and each iteration reinitializes the HMAC context, causing
>> excessive overhead.

While I don't disagree with speeding up this in general, the whole point of the
SCRAM iterations is to take a lot of time as a way to combat brute forcing.
Any attacker is likely to use a patched libpq (or not use libpq at all) so
clientside it doesn't matter as much but if we make it 4x faster serverside we
don't really achieve much other than making attacks more feasible.

The relevant portion from RFC 7677 §4 is:

    The strength of this mechanism is dependent in part on the hash
    iteration-count, as denoted by "i" in [RFC5802].  As a rule of thumb,
    the hash iteration-count should be such that a modern machine will take
    0.1 seconds to perform the complete algorithm; however, this is
    unlikely to be practical on mobile devices and other relatively low-
    performance systems.  At the time this was written, the rule of thumb
    gives around 15,000 iterations required; however, a hash iteration-
    count of 4096 takes around 0.5 seconds on current mobile handsets.
    This computational cost can be avoided by caching the ClientKey
    (assuming the Salt and hash iteration-count is stable).  Therefore, the
    recommendation of this specification is that the hash iteration- count
    SHOULD be at least 4096, but careful consideration ought to be given to
    using a significantly higher value, particularly where mobile use is
    less important.

The numbers are quite outdated but the gist of it holds.  If we speed it up
serverside we need to counter that with a higher iteration count, and for a
rogue client it's unlikely to matter since it wont use our code anyways.
Speeding up HMAC for other usecases is a different story (but also likely to
have less measurable performance impact).

>> OpenSSL has an optimization for this case: when the key remains the
>> same, the HMAC context can be reused with a lightweight state reset by
>> passing NULL as the key. To take advantage of this, I introduced
>> `pg_hmac_reuse()`, which replaces the key with NULL when OpenSSL is used.

> Where `prev_key`, `prev_len` and `prev_hash` are static variables, filled
> in `pg_hmac_init`.

Storing any part of a cryptograhic calculation, let alone a key, in a static
variable doesn't seem entirely like a best practice, and it also wont be
threadsafe.

--
Daniel Gustafsson