Thread: What constitutes "reproducible" numbers from pgbench?

From

Date:

21 April 2015, 14:22:09

Hello list,

Exactly what constitutes „reproducible“ values from pgbench? I keep getting a range between 340 tps and 440 tps or something like that using the same command line on the same machine. Is that reproducible enough?

The docs state that one should verify that the numbers are reproducible, so I repeat any test run ten times before believing the results. I’ve tried increasing the test duration (-T) from one minute to five minutes, then turning off autovacuum (in postgresql.conf) as recommended by the docs, but the range of results is not getting any narrower. So what does “reproducible” mean as applied to pgbench?

Obviously I could be doing something wrong, such as missing some vital configuration option…

Thanks in advance for any insights.

Cheers,

Holger Friedrich

Re: What constitutes "reproducible" numbers from pgbench?

From

Qingqing Zhou

Date:

21 April 2015, 17:17:09

On Tue, Apr 21, 2015 at 7:21 AM,
<Holger.Friedrich-Fa-Trivadis@it.nrw.de> wrote:
> Hello list,
>
> Exactly what constitutes „reproducible“ values from pgbench?  I keep getting
> a range between 340 tps and 440 tps or something like that using the same
> command line on the same machine.  Is that reproducible enough?
>
Nope, it is not. Is PostgreSQL the only resource-consuming (I/O,
memory, CPU, etc.) program running there?

By reproducible, I mean that the tps numbers you get should be close,
within several percent, if nothing changed between your runs. You can
try a select-only (-S) pgbench run first.
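One way to make "within several percent" concrete is to compute the relative spread of repeated runs, (max - min) / mean. The sketch below treats 5% as "several percent"; that tolerance is an assumption, not a figure given in the thread. Applied to the two ends of the 340 to 440 tps range reported in the original question, the spread comes out at roughly 26%.

```python
from statistics import mean


def relative_spread(tps_runs):
    """Return (max - min) / mean of repeated tps results, as a fraction."""
    if not tps_runs:
        raise ValueError("need at least one run")
    return (max(tps_runs) - min(tps_runs)) / mean(tps_runs)


def is_reproducible(tps_runs, tolerance=0.05):
    """True if the spread is within `tolerance` (assumed 5% default)."""
    return relative_spread(tps_runs) <= tolerance


if __name__ == "__main__":
    # The two ends of the range reported in the original question.
    reported = [340.0, 440.0]
    print(f"spread: {relative_spread(reported):.1%}")
    print("reproducible:", is_reproducible(reported))
```

Using max minus min rather than a standard deviation keeps the check honest with only a handful of runs, where a single outlier is exactly what you want to notice.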

Regards,
Qingqing

Re: What constitutes "reproducible" numbers from pgbench?

From

Andy Colson

Date:

21 April 2015, 17:43:22

On 4/21/2015 9:21 AM, Holger.Friedrich-Fa-Trivadis@it.nrw.de wrote:
> Hello list,
> Exactly what constitutes „reproducible“ values from pgbench?  I keep
> getting a range between 340 tps and 440 tps or something like that using
> the same command line on the same machine.  Is that reproducible enough?
> The docs state that one should verify that the numbers are reproducible,
> so I repeat any test run ten times before believing the results.  I’ve
> tried increasing the test duration (-T) from one minute to five minutes,
> then turning off autovacuum (in postgresql.conf) as recommended by the
> docs, but the range of results is not getting any narrower.  So what
> does “reproducible” mean as applied to pgbench?
> Obviously I could be doing something wrong, such as missing some vital
> configuration option…
> Thanks in advance for any insights.
> Cheers,
> Holger Friedrich

I think it's common to get different timings. I think it's OK because
things are changing (files, caches, indexes, etc.).

If you run three to five short runs, they should all be within the same
range (say 340 to 440). If you are planning hardware, you might take
the worst case and purchase based on that. If you are planning
schedules, you might use the average case. If you are bragging on the
newsgroups, use the best case :-).
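This rule of thumb can be written down as a small helper that reduces repeated runs to worst, average, and best tps. It is a hypothetical sketch; the values in the usage line are illustrative, not measurements from the thread.

```python
from statistics import mean


def summarize(tps_runs):
    """Reduce repeated pgbench tps results to worst, average and best case."""
    if not tps_runs:
        raise ValueError("need at least one run")
    return {
        "worst": min(tps_runs),     # basis for hardware purchases
        "average": mean(tps_runs),  # basis for scheduling
        "best": max(tps_runs),      # basis for bragging on the list
    }


if __name__ == "__main__":
    # Hypothetical tps values from three identical runs.
    print(summarize([340.0, 390.0, 440.0]))
```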

If you want more realistic numbers, then keep vacuum enabled and run for
24 hours. In the real world, you are going to vacuum, so benchmark it too.

If you are playing with postgresql.conf settings, then three runs of a few
minutes each will give you an average, and you can compare different
settings based on that.
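A minimal harness for the "three runs, then average" approach might look like this. It assumes pgbench is on the PATH, that the usual connection defaults (PGHOST, PGUSER, etc.) point at the test server, and that a database named `bench` (hypothetical) has already been initialized with `pgbench -i`. The helper names and the default client count are made up for illustration. The parser relies on pgbench printing a line beginning `tps = `; older releases print both an "including" and an "excluding connections establishing" figure, and the sketch prefers the latter.

```python
import re
import subprocess
from statistics import mean

# Lines such as "tps = 412.345678 (excluding connections establishing)".
# The wording after the number differs between pgbench versions.
TPS_RE = re.compile(r"^tps = ([0-9.]+)(.*)$", re.MULTILINE)


def parse_tps(output):
    """Return the tps figure from pgbench's summary output."""
    matches = TPS_RE.findall(output)
    if not matches:
        raise ValueError("no tps line found in pgbench output")
    # Prefer the figure that excludes connection setup, when present.
    for value, rest in matches:
        if "excluding" in rest:
            return float(value)
    return float(matches[0][0])


def run_pgbench(dbname, seconds=180, clients=4, select_only=False):
    """Run pgbench once for `seconds` (-T) and return its tps."""
    cmd = ["pgbench", "-T", str(seconds), "-c", str(clients)]
    if select_only:
        cmd.append("-S")  # select-only workload
    cmd.append(dbname)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_tps(result.stdout)


def average_tps(dbname, runs=3, **kwargs):
    """Average tps over several identical pgbench runs."""
    return mean(run_pgbench(dbname, **kwargs) for _ in range(runs))
```

For example, `average_tps("bench", runs=3, seconds=180)` gives one number per configuration. To compare settings you would edit postgresql.conf, reload or restart the server, and call it again; the restart is deliberately not automated here.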

As Qingqing said, a read-only test should be more stable, because you
are comparing apples to apples.  A read-write test is changing under the
hood so expect some differences.

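Also, depending on whether your test data is small or large, you are
benchmarking different things (lock speed, CPU speed, disk I/O, etc.).
A data set that fits in RAM mostly exercises CPU and locking, while one
much larger than RAM mostly exercises disk I/O.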

pgbench is good for a first test, but it's going to behave differently
than your real-world workload.

-Andy

Re: What constitutes "reproducible" numbers from pgbench?

From

Date:

23 April 2015, 09:07:18

On Tuesday, April 21, 2015 7:43 PM, Andy Colson wrote:
> On 4/21/2015 9:21 AM, Holger.Friedrich-Fa-Trivadis@it.nrw.de wrote:
>> Exactly what constitutes "reproducible" values from pgbench?  I keep
>> getting a range between 340 tps and 440 tps or something like that
> I think it's common to get different timings. I think it's OK because
> things are changing (files, caches, indexes, etc.).

As I found out, our test server is a virtual machine, so while I should
be "alone" on that virtual machine, of course I have no idea what else
might be going on on the physical server the virtual machine is running
on. That would explain the somewhat wide variations.

Qingqing Zhou wrote that the range between 340 tps and 440 tps I keep
getting is not OK and that the numbers should be the same within several
percent. Of course, if other things are going on on the physical server,
I can't always expect a close match.

Since someone asked, the point of the exercise is to see whether and how
various settings in postgresql.conf affect performance.

Cheers,
Holger Friedrich

Re: What constitutes "reproducible" numbers from pgbench?

From

Date:

23 April 2015, 10:52:00

On Thu, 23 Apr 2015 11:07:05 +0200
<Holger.Friedrich-Fa-Trivadis@it.nrw.de> wrote:

> On Tuesday, April 21, 2015 7:43 PM, Andy Colson wrote:
> > On 4/21/2015 9:21 AM, Holger.Friedrich-Fa-Trivadis@it.nrw.de wrote:
> >> Exactly what constitutes "reproducible" values from pgbench?  I keep
> >> getting a range between 340 tps and 440 tps or something like that
> > I think it's common to get different timings. I think it's OK because
> > things are changing (files, caches, indexes, etc.).
>
> As I found out, our test server is a virtual machine, so while I should
> be "alone" on that virtual machine, of course I have no idea what else
> might be going on on the physical server the virtual machine is running
> on. That would explain the somewhat wide variations.
>
> Qingqing Zhou wrote that the range between 340 tps and 440 tps I keep
> getting is not OK and that the numbers should be the same within several
> percent. Of course, if other things are going on on the physical server,
> I can't always expect a close match.
>
> Since someone asked, the point of the exercise is to see whether and how
> various settings in postgresql.conf affect performance.

You're going to have difficulty doing that sort of tuning and testing
on a VM. Even when there's nothing else going on, VMs tend to have
a wider range of behaviors than native installs (cron jobs, for example,
can run on both the host and the guest OS, and I'm sure there are other
reasons).

Whether such an endeavour is worthwhile depends on your reason for
doing it. If your production environment will also be a VM of similar
configuration to this one, then I would proceed with the tests, simply
tracking the +/- variance and keeping it in mind, since you'll likely
see the same variance in production.
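Tracking the +/- variance can be as simple as reporting the mean and sample standard deviation of repeated runs. This is a sketch using Python's `statistics` module; the helper names and the sample values are hypothetical.

```python
from statistics import mean, stdev


def mean_and_stdev(tps_runs):
    """Mean and sample standard deviation of repeated tps measurements."""
    if len(tps_runs) < 2:
        raise ValueError("need at least two runs to estimate variance")
    return mean(tps_runs), stdev(tps_runs)


def describe(tps_runs):
    """Format the result as 'mean +/- stdev tps (relative stdev)'."""
    avg, sd = mean_and_stdev(tps_runs)
    return f"{avg:.0f} +/- {sd:.0f} tps ({sd / avg:.1%})"


if __name__ == "__main__":
    # Hypothetical results from three runs on the same VM.
    print(describe([380.0, 400.0, 420.0]))
```

A postgresql.conf change whose effect is smaller than that band will likely be hard to tell apart from noise on such a host.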

If you're doing it for your own general learning, then it might still
be worth it, but it's hardly an ideal setup for that kind of thing.

--
PT <wmoran@potentialtech.com>

Re: What constitutes "reproducible" numbers from pgbench?

From

Andy Colson

Date:

23 April 2015, 14:01:27

On 4/23/2015 4:07 AM, Holger.Friedrich-Fa-Trivadis@it.nrw.de wrote:
> On Tuesday, April 21, 2015 7:43 PM, Andy Colson wrote:
>> On 4/21/2015 9:21 AM, Holger.Friedrich-Fa-Trivadis@it.nrw.de wrote:
>>> Exactly what constitutes "reproducible" values from pgbench?  I keep
>>> getting a range between 340 tps and 440 tps or something like that
>> I think it's common to get different timings. I think it's OK because
>> things are changing (files, caches, indexes, etc.).
>
> Qingqing Zhou wrote that the range between 340 tps and 440 tps I keep
> getting is not OK and that the numbers should be the same within several
> percent. Of course, if other things are going on on the physical server,
> I can't always expect a close match.
>

I disagree. Having a reproducible test within a few percent is a great
result, but any result is informative. Your tests tell you an upper
and lower bound on performance. They tell you to expect a little
variance in your workload. They probably tell you a little about how
your VM host is caching writes to disk. You are feeling the pulse of
your hardware. Each hardware setup has its own pulse, and understanding
it will help you understand how it'll handle a load.

-Andy