Re: CPU costs of random_zipfian in pgbench - Mailing list pgsql-hackers

From	David Fetter
Subject	Re: CPU costs of random_zipfian in pgbench
Date	February 17, 2019 17:33:45
Msg-id	20190217173344.GY10435@fetter.org
In response to	Re: CPU costs of random_zipfian in pgbench (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: CPU costs of random_zipfian in pgbench
List	pgsql-hackers

On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
> >> I'm trying to use random_zipfian() for benchmarking of skewed data sets, 
> >> and I ran head-first into an issue with rather excessive CPU costs. 
> 
> > If you want skewed but not especially zipfian, use exponential which is 
> > quite cheap. Also zipfian with a > 1.0 parameter does not have to compute 
> > the harmonic number, so it depends on the parameter.
> 
> Maybe we should drop support for parameter values < 1.0, then.  The idea
> that pgbench is doing something so expensive as to require caching seems
> flat-out insane from here.  That cannot be seen as anything but a foot-gun
> for unwary users.  Under what circumstances would an informed user use
> that random distribution rather than another far-cheaper-to-compute one?
> 
> > ... This is why I submitted a pseudo-random permutation 
> > function, which alas does not get much momentum from committers.
> 
> TBH, I think pgbench is now much too complex; it does not need more
> features, especially not ones that need large caveats in the docs.
> (What exactly is the point of having zipfian at all?)

From a statistical perspective, Zipfian distributions violate some of
the assumptions we implicitly make when we assume uniform
distributions. This matters because Zipf-distributed data sets are
quite common in real life.
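
To make the difference from uniform concrete, here is a rough Python
sketch (not pgbench code; the function names are made up for
illustration). It compares how much probability mass lands on the
lowest-ranked keys under a Zipfian distribution versus a uniform one.
It assumes the textbook definition P(k) proportional to 1/k^s over
ranks 1..n, and it computes the generalized harmonic number with a
plain O(n) sum, which is only meant to show the idea, not how any real
generator does it.

```python
def generalized_harmonic(n: int, s: float) -> float:
    """Sum of 1/k**s for k = 1..n, computed naively in O(n)."""
    return sum(1.0 / k ** s for k in range(1, n + 1))


def zipf_top_share(n: int, s: float, top: int) -> float:
    """Probability mass a Zipf(s) distribution over ranks 1..n
    assigns to the `top` lowest ranks (the "hottest" keys)."""
    return generalized_harmonic(top, s) / generalized_harmonic(n, s)


def uniform_top_share(n: int, top: int) -> float:
    """Same quantity for a uniform distribution over 1..n."""
    return top / n


if __name__ == "__main__":
    n, top = 1000, 10  # hypothetical key space and "hot set" size
    print("zipf s=1.0:", zipf_top_share(n, 1.0, top))
    print("uniform   :", uniform_top_share(n, top))
```

With these (arbitrary) numbers, the ten hottest keys out of a thousand
receive well over a third of all accesses under Zipf with s = 1, versus
one percent under uniform, so anything tuned or measured under the
uniform assumption sees a very different access pattern.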

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
