Hello Alik,
>> So I would be in favor of expanding the documentation but not
>> restricting the parameter beyond avoiding value 1.0.
>
> I have removed restriction and expanded documentation in attaching patch v5.
I've done some math investigations, which consisted in spending one hour
with Christian, a statistician colleague of mine. He took an old book out
of a shelf, opened it to page 550 (roughly in the middle), and explained
to me how to build a real zipfian distribution random generator.
The iterative method is for parameter a>1 and works for unbounded values.
It is simple to add a bound. In practice the iterative method is quite
effective, i.e. number of iterations is typically small, at least if the
bound is large and if parameter a is not too close to 1.
I've attached a python3 script which implements the algorithm. It looks
like magic. Beware that a C implementation should take care of float and
int overflows.
# usage: a, #values, #tests
sh> zipf.py 1.5 1000 1000000
# after 1.7 seconds
c = [391586, 138668, 75525, 49339, 35222, 26621, ...
... 11, 13, 12, 11, 16] (1.338591 iterations per draw)
sh> zipf.py 1.1 1000 1000000
# after 3.1 seconds
c = [179302, 83927, 53104, 39015, 30557, 25164, ...
... 82, 95, 93, 81, 80] (2.681451 iterations per draw)
I think that this method should be used for a>1, and the other very rough
one can be kept for parameter a in [0, 1), a case which does not make much
sense to a mathematician as it diverges if unbounded.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers