Home > mailing lists

Re: gaussian distribution pgbench -- splits v4 - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: gaussian distribution pgbench -- splits v4
Date	August 1, 2014 07:27:07
Msg-id	alpine.DEB.2.10.1408010905040.9457@sto Whole thread Raw
In response to	Re: gaussian distribution pgbench -- splits v4 (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: gaussian distribution pgbench -- splits v4
List	pgsql-hackers

Tree view

Hello,

>> Version one is "k' = 1 + (a * k + b) modulo n" with "a" prime with 
>> respect to "n", "n" being the number of keys. This is nearly possible, 
>> but for the modulo operator which is currently missing, and that I'm 
>> planning to submit for this very reason, but probably another time.
>
> That's pretty crude,

Yep. It is very simple, it is much better than nothing, and for a database 
test is may be "good enough".

> although I don't object to a modulo operator.  It would be nice to be 
> able to use a truly random permutation, which is not hard to generate 
> but probably requires O(n) storage, likely a problem for large scale 
> factors.

That is indeed the actual issue in my mind. I was thinking of permutations 
with a formula, which are not so easy to find and may end-up looking like 
"(a*k+b)%n" anyway. I had the same issue for generating random data for a 
schema (see http://www.coelho.net/datafiller.html).

> Maybe somebody who knows more math than I do (like you, probably!) can 
> come up with something more clever.

I can certainly suggest other formula, but that does not mean beautiful 
code, thus would probably be rejected. I'll see.

An alternative to this whole process may be to hash/modulo a non uniform 
random value.
      id = 1 + hash(some-random()) % n

But the hashing changes the distribution as it adds collisions, so I have 
to think about how to be able to control the distribution in that case, 
and what hash function to use.

-- 
Fabien.

pgsql-hackers by date:

From: Jeff Davis
Date: 01 August 2014, 06:33:40
Subject: numeric and float comparison oddities

From: Mitsumasa KONDO
Date: 01 August 2014, 07:58:09
Subject: Re: gaussian distribution pgbench -- splits v4

Re: gaussian distribution pgbench -- splits v4 - Mailing list pgsql-hackers

Previous

Next