Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - add pseudo-random permutation function
Date
Msg-id alpine.DEB.2.22.394.2103301920340.305215@pseudo
Whole thread Raw
In response to Re: pgbench - add pseudo-random permutation function  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: pgbench - add pseudo-random permutation function
List pgsql-hackers
Hello Dean,

Thanks a lot for this work. This version looks much better than the 
previous one you sent, and has significant advantages over the one I sent, 
in particular avoiding the prime number stuff and large modular multiply.
So this looks good!

I'm happy that halves-xoring is back because it was an essential part of 
the design. ISTM that the previous version you sent only had linear/affine 
transforms which was a bad idea.

The linear transform is moved inside halves, why not, and the 
any-odd-number multiplication is prime with power-of-two trick is neat on 
these.

However I still have some reservations:

First, I have a thing against erand48. The erand 48 bits design looks like 
something that made sense when computer architectures where 16/32 bits and 
large integers were 32 bits, so a 16 bits margin looked good enough. Using 
a int16[3] type now is really depressing and probably costly on modern 
architectures. With 64 bits architecture, and given that we are 
manipulating 64 bits integers (well 63 really), ISTM that the minimal 
state size for a PRNG should be at least 64 bits. It there a good reason 
NOT to use a 64 bits state prn generator? I took Knuth's because it is 
cheap and 64 bits. I'd accept anything which is at least 64 bits. I looked 
at xor-shift things, but there was some controversy around these so I kept 
it simple and just shifted small values to get rid of the obvious cycles 
on Knuth's generator.

Second, I have a significant reservation about the very structure of the 
transformation in this version:

   loop 4 times :

     // FIRST HALF STEER
     m/r = pseudo randoms
     if v in first "half"
        v = ((v * m) ^ r) & mask;
        rotate1(v)

     // FULL SHIFT 1
     r = pseudo random
     v = (v + r) % size

     // SECOND HALF STEER
     m/r = pseudo randoms
     if v in second "half"
        same as previous on second half

     // FULL SHIFT 2
     r = pseudo random
     v = (v + r) % size

I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of 
values are kept out of STEERING. Whole chunks of values could be kept 
unshuffled because they would only have SHIFTS apply to them and each time 
fall in the not steered half. It should be an essential part of the design 
that at least one steer is applied on a value at each round, and if two 
are applied then fine, but certainly not zero. So basically I think that 
the design would be significantly improved by removing "FULL SHIFT 1".

Third, I think that the rotate code can be simplified, in particular the 
?: should be avoided because it may induce branches quite damaging to 
processor performance.

I'll give it some more thoughts.

-- 
Fabien.



pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: Calendar support in localization
Next
From: "Joel Jacobson"
Date:
Subject: Re: Idea: Avoid JOINs by using path expressions to follow FKs