Re: CPU costs of random_zipfian in pgbench - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: CPU costs of random_zipfian in pgbench
Date
Msg-id CA+CSw_sTce74PYmisG5DB9eqaS8ncj88bVsb0+oCzjETQn6S-Q@mail.gmail.com
In response to Re: CPU costs of random_zipfian in pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: CPU costs of random_zipfian in pgbench  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sun, Feb 17, 2019 at 10:52 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
> and I ran head-first into an issue with rather excessive CPU costs.
> [...] This happens because generalizedHarmonicNumber() does this:
>
>       for (i = n; i > 1; i--)
>               ans += pow(i, -s);
>
> where n happens to be 1000000000 (range passed to random_zipfian), so
> the loop takes quite a bit of time.

If you find a better formula for the harmonic number, you are welcome
to it, and you will probably get your name on it :-)

There are pretty good approximations for s > 1.0 using the Riemann zeta function, and Euler derived a formula for the s = 1 case (H_n is approximately ln(n) + gamma, where gamma is the Euler-Mascheroni constant).

I also noticed that i is int in this function, but n is int64. That seems like an oversight.
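
A small sketch of why the type mismatch matters, assuming a platform where int is 32 bits: the n = 1000000000 from the report still fits, but a larger range would not, and converting an out-of-range int64 to int is implementation-defined in C before C23. The helper name fits_in_int is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Returns 1 if n can be assigned to an int loop counter without an
 * out-of-range conversion, 0 otherwise.
 */
static int
fits_in_int(int64_t n)
{
    return n >= INT_MIN && n <= INT_MAX;
}
```

Declaring the loop counter as int64 avoids the question entirely.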

Regards,
Ants Aasma

 
