Re: pgbench randomness initialization - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: pgbench randomness initialization
Date	April 7, 2016 13:15:38
Msg-id	20160407131526.2342k5etkj6c4g2e@alap3.anarazel.de
In response to	Re: pgbench randomness initialization (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: pgbench randomness initialization
List	pgsql-hackers

On 2016-04-07 08:58:16 -0400, Robert Haas wrote:
> On Thu, Apr 7, 2016 at 5:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > I think that it depends on what you want, which may vary:
> >
> >  (1) "exactly" reproducible runs, but one run may hit a particular
> >      steady state not representative of what happens in general.
> >
> >  (2) runs which really vary from one to the next, so as
> >      to have an idea about how much it may vary, what is the
> >      performance stability.
> >
> > Currently pgbench focusses on (2), which may or may not be fine depending on
> > what you are doing. From a personal point of view I think that (2) is more
> > significant to collect performance data, even if the results are more
> > unstable: that simply reflects reality and its intrinsic variations, so I'm
> > fine with that as the default.
> >
> > Now for those interested in (1) for some reason, I would suggest relying on a
> > PGBENCH_RANDOM_SEED environment variable or --random-seed option which could
> > be used to have an oxymoronic "deterministic randomness", if desired.
> > I do not think that it should be the default, though.
> 
> I agree entirely.  If performance is erratic, that's actually
> something you want to discover during benchmarking.  If different
> pgbench runs (of non-trivial length) are producing substantially
> different results, then that's really a problem we need to fix, not
> just adjust pgbench to cover it up.

It's not about "covering it up"; it's about actually being able to take
action based on benchmark results, and about practically being able to
run benchmarks. The argument above means essentially that we need to run
a significant number of pgbench runs for *anything*, because running
them 3-5 times before/after just isn't meaningful enough.

It means that you can't distinguish between performance differences caused
by the OS and those caused by pgbench's ordering.

I agree that it's a horrid problem that, on large machines, we can get half
the throughput depending on the ordering. But without
running queries in the same order before/after a patch there's no way to
validate whether $patch caused the problem. And no way to reliably
trigger problematic scenarios.

I also agree that it's important to be able to vary workloads. But if
you do so, you should do so in the same order, both pre/post a
patch. Afaics the prime use of pgbench is validation of the performance
effects of patches; therefore it should be usable for that, and it's
not.
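
To illustrate the ordering point, here is a minimal sketch in Python. It is
not pgbench code: transaction_order, its seed parameter, and the account
count are hypothetical names chosen for illustration. It only shows that a
fixed seed yields the same sequence of targets before and after a patch,
while an unseeded generator does not guarantee that.

```python
import random


def transaction_order(seed=None, n_transactions=5, n_accounts=100_000):
    """Return the account ids a hypothetical benchmark client would touch, in order.

    seed=None lets Python seed the generator from OS entropy, so the
    order varies from run to run; a fixed integer seed makes the order
    repeatable.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randint(1, n_accounts) for _ in range(n_transactions)]


# Same seed for the "before patch" and "after patch" runs: identical order,
# so a throughput difference cannot be blamed on the ordering.
before_patch = transaction_order(seed=42)
after_patch = transaction_order(seed=42)
print(before_patch == after_patch)

# Unseeded runs draw from OS entropy and will almost always differ.
print(transaction_order() == transaction_order())
```

Varying the workload then means changing the seed, and using the same set of
seeds on both sides of the comparison.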

Greetings,

Andres Freund
