Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Date
Msg-id alpine.DEB.2.20.1708050950220.16395@lancre
Whole thread Raw
In response to Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Alik Khilazhev <a.khilazhev@postgrespro.ru>)
Responses Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
List pgsql-hackers
Hello Alik,

>> So I would be in favor of expanding the documentation but not 
>> restricting the parameter beyond avoiding value 1.0.
>
> I have removed restriction and expanded documentation in attaching patch v5.

I've done some math investigations, which consisted in spending one hour 
with Christian, a statistician colleague of mine. He took an old book out 
of a shelf, opened it to page 550 (roughly in the middle), and explained 
to me how to build a real zipfian distribution random generator.

The iterative method is for parameter a>1 and works for unbounded values. 
It is simple to add a bound. In practice the iterative method is quite 
effective, i.e. number of iterations is typically small, at least if the 
bound is large and if parameter a is not too close to 1.

I've attached a python3 script which implements the algorithm. It looks 
like magic. Beware that a C implementation should take care of float and 
int overflows.

   # usage: a, #values, #tests

   sh> zipf.py 1.5 1000 1000000
   # after 1.7 seconds
   c = [391586, 138668, 75525, 49339, 35222, 26621, ...
        ... 11, 13, 12, 11, 16] (1.338591 iterations per draw)

   sh> zipf.py 1.1 1000 1000000
   # after 3.1 seconds
   c = [179302, 83927, 53104, 39015, 30557, 25164, ...
        ... 82, 95, 93, 81, 80] (2.681451 iterations per draw)

I think that this method should be used for a>1, and the other very rough 
one can be kept for parameter a in [0, 1), a case which does not make much 
sense to a mathematician as it diverges if unbounded.

-- 
Fabien.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server
Next
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] Row Level Security Documentation