Hi Fabian,
I reviewed `pgbench-prp-func-1.patch` and found an incomplete
implementation.
In the pseudorandom_perm function, I confirmed that the scramble and
scatter operations are mathematically bijections. Therefore, this
function is mathematically correct.
However, the implementation of the scatter operation in this patch
overflows in many cases if the variable:size is 38 bit integer or
greater. Because the variable:size and the item of the array:primes[]
which stores 27-29 bit integers are multiplicated. If overflow occurs,
the scatter operation does not satisfy bijective.
I did a sampling inspection, whose conditions are the variable:size is
1099511627773 (= 40 bit integer) and the variable:seed is 5432. Even
with very few samples, I found a huge number of collisions as shown below:
pr_perm(409749816, 1099511627773, 5432) = pr_perm(491041141,
1099511627773, 5432) = pr_perm(729171766, 1099511627773, 5432) =
pr_perm(849775914, 1099511627773, 5432) = 22445482629,
pr_perm(45609644, 1099511627773, 5432) = pr_perm(174005122,
1099511627773, 5432) = pr_perm(811754941, 1099511627773, 5432) =
pr_perm( 1131176891, 1099511627773, 5432) = 10626826534,
and so on.
There are two ways to resolve this issue. The first one is to reduce the
maximum value can be set for the variable:size. The second one is to add
a modular multiplication function to avoid overflow. I made a patch
including the function that can be calculated modular multiplication
without overflow, and attached this mail.
(I also attached a simple test suite of the function I added.)
In the others parts of the original patch, I could apply the patch and
did tests without trouble; the documentations and comments are well
except one comment in the function shown below:
>> (1) scramble: partial xors on power-or-2 subsets
I could not understand this meaning when I read it at the first time.
Best regards,
On 2018/07/28 15:03, Fabien COELHO wrote:
>
> Hello,
>
> This patch adds a pseudo-random permutation function to pgbench. It
> allows to mix non uniform random keys to avoid trivial correlations
> between neighboring values, hence between pages.
>
> The function is a simplistic form of encryption adapted to any size,
> using a few iterations of scramble and scatter phases. The result is not
> cryptographically convincing, nor even statistically, but it is quite
> inexpensive and achieves the desired result. A computation costs 0.22 µs
> per call on my laptop, about three times the cost of a simple function.
>
> Alternative designs, such as iterating over an actual encryption
> function or using some sbox, would lead to much more costly solutions
> and complex code.
>
> I also join a few scripts I used for testing.
>