Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - add pseudo-random permutation function
Date
Msg-id alpine.DEB.2.21.1902142155050.20189@lancre
Whole thread Raw
In response to Re: pgbench - add pseudo-random permutation function  (Andres Freund <andres@anarazel.de>)
Responses Re: pgbench - add pseudo-random permutation function
List pgsql-hackers
Hello Andres,

>> +# PGAC_C_BUILTIN_CLZLL
>
> I think this has been partially superceded by
>
> commit 711bab1e4d19b5c9967328315a542d93386b1ac5
> Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date:   2019-02-13 16:10:06 -0300

Indeed, the patch needs a rebase & conflit resolution. I'll do it. Later.

>>    <para>
>> +    Function <literal>pr_perm</literal> implements a pseudo-random permutation.
>> +    It allows to mix the output of non uniform random functions so that
>> +    values drawn more often are not trivially correlated.
>> +    It permutes integers in [0, size) using a seed by applying rounds of
>> +    simple invertible functions, similarly to an encryption function,
>> +    although beware that it is not at all cryptographically secure.
>> +    Compared to <literal>hash</literal> functions discussed above, the function
>> +    ensures that a perfect permutation is applied: there are no collisions
>> +    nor holes in the output values.
>> +    Values outside the interval are interpreted modulo the size.
>> +    The function errors if size is not positive.
>> +    If no seed is provided, <literal>:default_seed</literal> is used.
>> +    For a given size and seed, the function is fully deterministic: if two
>> +    permutations on the same size must not be correlated, use distinct seeds
>> +    as outlined in the previous example about hash functions.
>> +  </para>
>
> This doesn't really explain why we want this in pgbench.

Who is "we"?

If someone runs non uniform tests, ie with random_exp/zipf/gauss, close 
values are drawn with a similar frequency, thus correlated, inducing an 
undeserved correlation at the page level (eg for read) and better 
performance that would be the case if relative frequencies were not 
correlated to key values.

So the function allows having more realistic non uniform test, whereas 
currently we can only have non uniform test with very unrealistically 
correlated values at the key level and possibly at the page level, meaning 
non representative performances because of these induced bias.

This is under the assumption that pgbench should allow more realistic 
performance test scenarii, which I believe is a desirable purpose. If 
someone disagree with this purpose, then they would consider both non 
uniform random functions and this proposed pseudo-random permutation 
function as useless, as probably most other additions to pgbench.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: libpq debug log
Next
From: legrand legrand
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)