Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From Hironobu SUZUKI
Subject Re: pgbench - add pseudo-random permutation function
Date
Msg-id f30bc717-5532-c2d8-e513-b4912c10ddbe@interdb.jp
Whole thread Raw
In response to pgbench - add pseudo-random permutation function  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench - add pseudo-random permutation function
Re: pgbench - add pseudo-random permutation function
List pgsql-hackers
Hi Fabian,

I reviewed `pgbench-prp-func-1.patch` and found an incomplete 
implementation.


In the pseudorandom_perm function, I confirmed that the scramble and 
scatter operations are mathematically bijections. Therefore, this 
function is mathematically correct.

However, the implementation of the scatter operation in this patch 
overflows in many cases if the variable:size is 38 bit integer or 
greater. Because the variable:size and the item of the array:primes[] 
which stores 27-29 bit integers are multiplicated. If overflow occurs, 
the scatter operation does not satisfy bijective.

I did a sampling inspection, whose conditions are the variable:size is 
1099511627773 (= 40 bit integer) and the variable:seed is 5432. Even 
with very few samples, I found a huge number of collisions as shown below:

pr_perm(409749816, 1099511627773, 5432) = pr_perm(491041141, 
1099511627773, 5432) = pr_perm(729171766, 1099511627773, 5432) = 
pr_perm(849775914, 1099511627773, 5432) = 22445482629,
pr_perm(45609644, 1099511627773, 5432) = pr_perm(174005122, 
1099511627773, 5432) = pr_perm(811754941, 1099511627773, 5432) = 
pr_perm( 1131176891, 1099511627773, 5432) = 10626826534,
and so on.

There are two ways to resolve this issue. The first one is to reduce the 
maximum value can be set for the variable:size. The second one is to add 
a modular multiplication function to avoid overflow. I made a patch 
including the function that can be calculated modular multiplication 
without overflow, and attached this mail.
(I also attached a simple test suite of the function I added.)


In the others parts of the original patch, I could apply the patch and 
did tests without trouble; the documentations and comments are well 
except one comment in the function shown below:

 >> (1) scramble: partial xors on power-or-2 subsets

I could not understand this meaning when I read it at the first time.


Best regards,


On 2018/07/28 15:03, Fabien COELHO wrote:
> 
> Hello,
> 
> This patch adds a pseudo-random permutation function to pgbench. It 
> allows to mix non uniform random keys to avoid trivial correlations 
> between neighboring values, hence between pages.
> 
> The function is a simplistic form of encryption adapted to any size, 
> using a few iterations of scramble and scatter phases. The result is not 
> cryptographically convincing, nor even statistically, but it is quite 
> inexpensive and achieves the desired result. A computation costs 0.22 µs 
> per call on my laptop, about three times the cost of a simple function.
> 
> Alternative designs, such as iterating over an actual encryption 
> function or using some sbox, would lead to much more costly solutions 
> and complex code.
> 
> I also join a few scripts I used for testing.
> 


Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Next
From: "Higuchi, Daisuke"
Date:
Subject: stat() on Windows might cause error if target file is larger than4GB