Re: pgbench randomness initialization - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: pgbench randomness initialization
Date	April 7, 2016 10:26:07
Msg-id	alpine.DEB.2.10.1604071207390.11001@sto
In response to	Re: pgbench randomness initialization (Andres Freund <andres@anarazel.de>)
Responses	Re: pgbench randomness initialization
List	pgsql-hackers

>>  (2) runs which really vary from one to the next, so as
>>      to have an idea about how much it may vary, what is the
>>      performance stability.
>
> I don't think this POV makes all that much sense. If you do something
> non-comparable, then the results aren't, uh, comparable. Which also
> means there's a lower chance to reproduce observed problems.

That also means that you are likely not to hit them if you always do the 
very same run...

Moreover, the Monte Carlo method requires randomness for its convergence 
result.

>> Currently pgbench focusses on (2), which may or may not be fine depending on
>> what you are doing. From a personal point of view I think that (2) is more
>> significant to collect performance data, even if the results are more
>> unstable: that simply reflects reality and its intrinsic variations, so I'm
>> fine with that as the default.
>
> Uh, and what's the benefit of that variability? pgbench isn't a reality
> simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
> variability are bad benchmarks.

From a statistical perspective, one run does not mean anything. If you do 
the exact same run over and over again, then all mathematical results 
about (slow) convergence towards the average are lost. This is like trying 
to survey a population by asking the questions to the same person over and 
over: the result will be biased.
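
To make the convergence argument concrete, here is a minimal sketch. It does not call pgbench: `simulated_run` is a hypothetical stand-in whose result is fully determined by its seed, and the 1000 tps baseline and the noise level of 50 are made-up numbers chosen only for illustration.

```python
import random
import statistics


def simulated_run(seed):
    """Hypothetical stand-in for one benchmark run (NOT pgbench).

    Returns a noisy "tps" figure that depends only on the seed.
    The baseline (1000) and noise level (50) are assumptions.
    """
    rng = random.Random(seed)
    return 1000.0 + rng.gauss(0.0, 50.0)


# The exact same run, repeated 200 times.
repeated = [simulated_run(42) for _ in range(200)]

# 200 runs, each with a different seed.
varied = [simulated_run(seed) for seed in range(200)]

# Repeating one seed only ever produces a single distinct value.
print("distinct results, same seed:   ", len(set(repeated)))

# With varied seeds the average approaches the underlying mean,
# and the spread between runs can actually be estimated.
print("mean over varied seeds:        ", statistics.fmean(varied))
print("std deviation over varied seeds:", statistics.stdev(varied))
```

Repeating one seed yields the same number every time, so the sample says nothing about run-to-run variation. With distinct seeds the average settles near the underlying mean and the spread becomes measurable.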

Now when you develop, which is the use case you probably have in mind, you 
want to compare two pg versions and check for the performance impact, so
having the exact same run seems like a proxy to quickly check for that.

However, from a statistical perspective this is just heresy: you may make a
change which improves one given run at the expense of all possible others
and you would not know it. Say, for instance, that there are two different
behaviors depending on something: then you will check against only one of
them.
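
The two-behaviors case can be sketched as a toy model. Everything here is an assumption made for illustration: `run_latency` is hypothetical, the seed is taken to decide which of two code paths a run exercises, and the latency figures (10, 8, 14) are invented.

```python
import random
import statistics


def run_latency(seed, patched):
    """Hypothetical model (NOT pgbench): the seed picks one of two behaviors.

    Unpatched, both behaviors cost 10. The patch makes behavior A
    cheaper (8) and behavior B more expensive (14).
    """
    rng = random.Random(seed)
    behavior_a = rng.random() < 0.5
    if not patched:
        return 10.0
    return 8.0 if behavior_a else 14.0


# Pick one fixed seed that happens to exercise behavior A.
fixed_seed = next(s for s in range(1000) if random.Random(s).random() < 0.5)

# Comparing on that single seed, the patch looks like a clear win.
print("fixed seed, baseline:", run_latency(fixed_seed, patched=False))
print("fixed seed, patched: ", run_latency(fixed_seed, patched=True))

# Averaged over many seeds, both behaviors are sampled and the
# regression on behavior B becomes visible.
seeds = range(1000)
baseline_mean = statistics.fmean([run_latency(s, False) for s in seeds])
patched_mean = statistics.fmean([run_latency(s, True) for s in seeds])
print("many seeds, baseline:", baseline_mean)
print("many seeds, patched: ", patched_mean)
```

On the fixed seed the patched version appears faster, while the average over many seeds shows it is slower overall. This is the situation described above, where only one of the two behaviors is ever checked.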

So I have no mathematical doubt that changing the seed is the right
default setting, and thus I think that the current behavior is fine. However,
I'm okay if someone wants to control the randomness for some reason (maybe
to get "less sure" results, but quickly), so it could be allowed somehow.

-- 
Fabien.
