On 2/17/19 5:09 PM, Tom Lane wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>>> and I ran head-first into an issue with rather excessive CPU costs.
>
>> If you want skewed but not especially zipfian, use exponential, which
>> is quite cheap. Also, zipfian with a parameter > 1.0 does not have to
>> compute the harmonic number, so it depends on the parameter.
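True, exponential is a single log per draw. For comparison, a minimal
sketch of the usual inverse transform for a truncated exponential over
[min, max] (drand48() is just a stand-in for the per-thread PRNG; this
is an illustration, not pgbench's actual getExponentialRand):

#include <math.h>
#include <stdlib.h>

/* Truncated exponential over [min, max]: one log per draw, no
 * precomputation beyond exp(-param). Illustrative sketch only. */
static long
exponential_rand(long min, long max, double param)
{
    double  cut = exp(-param);
    double  u = 1.0 - drand48();    /* u in (0, 1] */
    /* inverse CDF of the exponential truncated to [0, 1) */
    double  r = -log(cut + (1.0 - cut) * u) / param;

    return min + (long) ((max - min + 1) * r);
}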
>
> Maybe we should drop support for parameter values < 1.0, then. The idea
> that pgbench is doing something so expensive as to require caching seems
> flat-out insane from here.
Maybe.
It's not quite clear to me why we support the two modes at all. We use
one algorithm for values < 1.0 and another one for values > 1.0, so
what's the difference there? Are the resulting distributions materially
different?
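For reference, my understanding is that the two modes are genuinely
different algorithms, not two branches of one formula. For s > 1 there
is the rejection method from Devroye's "Non-Uniform Random Variate
Generation", which needs no precomputed table and runs in O(1) expected
time per draw. A minimal sketch of that technique (drand48() stands in
for the PRNG; illustrative, not the actual pgbench code):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Zipf(s) over 1..n for s > 1, by rejection (Devroye 1986).
 * No harmonic-number table; O(1) expected iterations per draw, but
 * the acceptance rate worsens as s -> 1+ and on a small range n. */
static long
zipf_reject(double s, long n)
{
    double  b = pow(2.0, s - 1.0);

    for (;;)
    {
        double  u = drand48();
        double  v = drand48();
        double  x = floor(pow(u, -1.0 / (s - 1.0)));    /* candidate */
        double  t = pow(1.0 + 1.0 / x, s - 1.0);

        /* accept if v falls under the pmf-to-envelope ratio */
        if (x <= (double) n && v * x * (t - 1.0) / (b - 1.0) <= t / b)
            return (long) x;
    }
}

int
main(void)
{
    long    counts[11] = {0};

    srand48(42);
    for (int i = 0; i < 100000; i++)
        counts[zipf_reject(1.5, 10)]++;
    for (int k = 1; k <= 10; k++)
        printf("%2d: %ld\n", k, counts[k]);
    return 0;
}

That envelope only exists for s > 1, which I assume is why the s < 1
case needs a different, harmonic-number-based approximation (and the
cache) in the first place.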
Also, I wonder if just dropping support for parameters < 1.0 would be
enough, because the docs say:

    The function's performance is poor for parameter values close and
    above 1.0 and on a small range.

which seems to suggest it might be slow even for values > 1.0 in some
cases. Not sure.
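FWIW, that would be consistent with the rejection sketch above: its
acceptance rate degrades as s approaches 1 from above, and on a small
range most candidates land beyond n and get thrown away. The s < 1
cost is different in kind: the generator needs the generalized harmonic
number as a normalizer, which is O(n) to build, and that is the expense
behind the cache. A sketch of just that normalizer (again illustrative,
not the pgbench source):

#include <math.h>

/* Generalized harmonic number H(n,s) = sum over k=1..n of k^(-s),
 * the normalizing constant a zipfian generator needs when s < 1.
 * O(n) to build, so for large n it dwarfs any single draw; hence
 * the per-(n, s) cache. */
static double
generalized_harmonic(long n, double s)
{
    double  h = 0.0;

    for (long k = 1; k <= n; k++)
        h += pow((double) k, -s);
    return h;
}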
> That cannot be seen as anything but a foot-gun
> for unwary users. Under what circumstances would an informed user use
> that random distribution rather than another far-cheaper-to-compute one?
>
>> ... This is why I submitted a pseudo-random permutation
>> function, which alas does not get much momentum from committers.
>
> TBH, I think pgbench is now much too complex; it does not need more
> features, especially not ones that need large caveats in the docs.
> (What exactly is the point of having zipfian at all?)
>
I wonder about the growing complexity of pgbench too ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services