Thread: performance-test farm

performance-test farm

From
Tomas Vondra
Date:
Hi everyone,

Several members of this mailing list recently mentioned that it'd be really
useful to have a performance-test farm, and that it might improve the
development process and make some changes easier.

I've briefly discussed this with another CSPUG member, who represents a
local company that has been using PostgreSQL for a long time (and that
supports CSPUG), and we've agreed to investigate this a bit further.

I do have a rough idea of what it might look like, but I've never built a
performance-testing farm for such a distributed project. So I'd like to
know what you would expect from such a beast. Especially:

1) Is there something that might serve as a model?

   I've googled to search for an existing tool, but "performance-test
   farm" gave me a lot of info about how to breed cows, pigs and goats
   on a farm, which is not very useful in this case, I guess.

2) How would you use it? What procedure would you expect?

   I mean, this should produce regular performance tests of the current
   sources (and publish the results on some website), but the whole
   point is to allow developers to do a performance test of their
   changes before committing them to the main tree.

   How would you expect to deliver these changes to the farm? How would
   you define the job? How would you expect to get the results? etc.

   Just try to write down a list of steps.

3) Any other features expected?

   If you think of any interesting feature, write it down and note
   whether it's a 'must have' or a 'nice to have' feature.


I really can't promise anything right now - I have only a very rough
idea of how much time/effort/money this might take. So let's see what is
needed to build a 'minimal farm' and whether it's feasible with the
resources we can get.

regards
Tomas


Re: performance-test farm

From
"Kevin Grittner"
Date:
Tomas Vondra <tv@fuzzy.cz> wrote:
> 1) Is there something that might serve as a model?
I've been assuming that we would use the PostgreSQL Buildfarm as a
model.
http://buildfarm.postgresql.org/
> 2) How would you use it? What procedure would you expect?
People who had suitable test environments could sign up to
periodically build and performance test using the predetermined test
suite, and report results back for a consolidated status display. 
That would spot regressions.
It would be nice to have a feature where a proposed patch could be
included for a one-time build-and-benchmark run, so that ideas could
be tried before commit.  It can be hard to anticipate all the
differences between Intel and AMD, Linux and Windows, 32 bit and 64
bit, etc.
> 3) Any other features expected?
Pretty graphs?  :-)
-Kevin


Re: performance-test farm

From
Tomas Vondra
Date:
On 11.5.2011 23:41, Kevin Grittner wrote:
> Tomas Vondra <tv@fuzzy.cz> wrote:
>  
>> 1) Is there something that might serve as a model?
>  
> I've been assuming that we would use the PostgreSQL Buildfarm as a
> model.
>  
> http://buildfarm.postgresql.org/

Yes, I was thinking about that too, but

1) A buildfarm used for regular building / unit testing IMHO may not
   be the right place to do performance testing (not sure how isolated
   the benchmarks can be etc.).

2) Not sure how open this might be for the developers (if they could
   issue their own builds etc.).

3) If this should be part of the current buildfarm, then I'm afraid I
   can't do much about it.

>> 2) How would you use it? What procedure would you expect?
>  
> People who had suitable test environments could sign up to
> periodically build and performance test using the predetermined test
> suite, and report results back for a consolidated status display. 
> That would spot regressions.

So it would be a 'distributed farm'? Not sure if that's a good idea, as
to get reliable benchmark results you need a proper environment (not
influenced by other jobs, changes of hw etc.).

> It would be nice to have a feature where a proposed patch could be
> included for a one-time build-and-benchmark run, so that ideas could
> be tried before commit.  It can be hard to anticipate all the
> differenced between Intel and AMD, Linux and Windows, 32 bit and 64
> bit, etc.

Yes, that's one of the main goals - to allow developers to benchmark
their patches under various workloads. I don't think we'll be able to
get all those configurations, though.

>> 3) Any other features expected?
>  
> Pretty graphs?  :-)
Sure. And it will be Web 2.0 ready ;-)

Tomas


Re: performance-test farm

From
"Kevin Grittner"
Date:
Tomas Vondra <tv@fuzzy.cz> wrote:
> On 11.5.2011 23:41, Kevin Grittner wrote:
>> Tomas Vondra <tv@fuzzy.cz> wrote:
>>  
>>> 1) Is there something that might serve as a model?
>>  
>> I've been assuming that we would use the PostgreSQL Buildfarm as
>> a model.
>>  
>> http://buildfarm.postgresql.org/
> 
> Yes, I was thinking about that too, but
> 
> 1) A buildfarm used for regular building / unit testing IMHO may
>    not be the right place to do performance testing (not sure how
>    isolated the benchmarks can be etc.).
I'm not saying that we should use the existing buildfarm, or expect
current buildfarm machines to support this; just that the pattern of
people volunteering hardware in a similar way would be good.
> 2) Not sure how open this might be for the developers (if they
>    could issue their own builds etc.).
I haven't done it, but I understand that you can create a "local"
buildfarm instance which isn't reporting its results.  Again,
something similar might be good.
> 3) If this should be part of the current buildfarm, then I'm
>    afraid I can't do much about it.
Not part of the current buildfarm; just using a similar overall
pattern.  Others may have different ideas; I'm just speaking for
myself here about what seems like a good idea to me.
>>> 2) How would you use it? What procedure would you expect?
>>  
>> People who had suitable test environments could sign up to
>> periodically build and performance test using the predetermined
>> test suite, and report results back for a consolidated status
>> display. That would spot regressions.
> 
> So it would be a 'distributed farm'? Not sure it that's a good
> idea, as to get reliable benchmark results you need a proper
> environment (not influenced by other jobs, changes of hw etc.).
Yeah, accurate benchmarking is not easy.  We would have to make sure
people understood that the machine should be dedicated to the
benchmark while it is running, which is not a requirement for the
buildfarm.  Maybe provide some way to annotate HW or OS changes?
So if one machine goes to a new kernel and performance changes
radically, but other machines which didn't change their kernel
continue on a level graph, we'd know to suspect the kernel rather
than some change in PostgreSQL code.
-Kevin


Re: performance-test farm

From
Tomas Vondra
Date:
On 12.5.2011 00:21, Kevin Grittner wrote:
> Tomas Vondra <tv@fuzzy.cz> wrote:
>> On 11.5.2011 23:41, Kevin Grittner wrote:
>>> Tomas Vondra <tv@fuzzy.cz> wrote:
>>>  
>>>> 1) Is there something that might serve as a model?
>>>  
>>> I've been assuming that we would use the PostgreSQL Buildfarm as
>>> a model.
>>>  
>>> http://buildfarm.postgresql.org/
>>
>> Yes, I was thinking about that too, but
>>
>> 1) A buildfarm used for regular building / unit testing IMHO may
>>    not be the right place to do performance testing (not sure how
>>    isolated the benchmarks can be etc.).
>  
> I'm not saying that we should use the existing buildfarm, or expect
> current buildfarm machines to support this; just that the pattern of
> people volunteering hardware in a similar way would be good.

Good point. Actually I was not aware of how the buildfarm works; all I
knew was that there's something like that, because some of the hackers
mention a failed build on the mailing list occasionally.

So I guess this is a good opportunity to investigate it a bit ;-)

Anyway, I'm not sure this would give us the kind of environment we need
to do benchmarks ... but it's worth thinking about.

>  
>> 2) Not sure how open this might be for the developers (if they
>>    could issue their own builds etc.).
>  
> I haven't done it, but I understand that you can create a "local"
> buildfarm instance which isn't reporting its results.  Again,
> something similar might be good.

Well, yeah. So the developers would get a local 'copy' of all the
benchmarks / workloads and could run them?

>> 3) If this should be part of the current buildfarm, then I'm
>>    afraid I can't do much about it.
>  
> Not part of the current buildfarm; just using a similar overall
> pattern.  Others may have different ideas; I'm just speaking for
> myself here about what seems like a good idea to me.

OK, got it.

>>>> 2) How would you use it? What procedure would you expect?
>>>  
>>> People who had suitable test environments could sign up to
>>> periodically build and performance test using the predetermined
>>> test suite, and report results back for a consolidated status
>>> display. That would spot regressions.
>>
>> So it would be a 'distributed farm'? Not sure it that's a good
>> idea, as to get reliable benchmark results you need a proper
>> environment (not influenced by other jobs, changes of hw etc.).
>  
> Yeah, accurate benchmarking is not easy.  We would have to make sure
> people understood that the machine should be dedicated to the
> benchmark while it is running, which is not a requirement for the
> buildfarm.  Maybe provide some way to annotate HW or OS changes?
> So if one machine goes to a new kernel and performance changes
> radically, but other machines which didn't change their kernel
> continue on a level graph, we'd know to suspect the kernel rather
> than some change in PostgreSQL code.

I guess we could run a script that collects all those important
parameters and then detects changes. Anyway, we still need some 'really
stable' machines that are not changed at all, to get a long-term baseline.

But I guess that could be done by running some dedicated machines ourselves.
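For illustration, here's a minimal sketch of what I mean by collecting
parameters and detecting changes (in Python just to be concrete - the real
client would presumably do this in Perl, and the particular commands,
fields and the fingerprint file name are only my assumptions):

# Sketch: collect environment parameters that might affect benchmark results
# and detect when any of them change between runs. The commands, fields and
# the fingerprint file name are only examples, not a final list.
import hashlib, json, os, platform, subprocess

def run(cmd):
    """Run a shell command and return its output, or '' if it fails."""
    try:
        return subprocess.check_output(cmd, shell=True).decode().strip()
    except Exception:
        return ''

def collect_env():
    return {
        'kernel': platform.release(),
        'arch': platform.machine(),
        'cpu_model': run("grep -m1 'model name' /proc/cpuinfo"),
        'mem_total': run("grep MemTotal /proc/meminfo"),
        'pg_configure': run("pg_config --configure"),
    }

def env_changed(env, state_file='env-fingerprint.json'):
    """Compare a hash of the current environment with the previous run."""
    fp = hashlib.sha1(json.dumps(env, sort_keys=True).encode()).hexdigest()
    previous = None
    if os.path.exists(state_file):
        with open(state_file) as f:
            previous = json.load(f).get('fingerprint')
    with open(state_file, 'w') as f:
        json.dump({'fingerprint': fp, 'env': env}, f)
    return previous is not None and previous != fp

if __name__ == '__main__':
    if env_changed(collect_env()):
        print('environment changed since the last run - annotate the results!')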

regards
Tomas


Re: performance-test farm

From
Andrew Dunstan
Date:

On 05/11/2011 06:21 PM, Kevin Grittner wrote:
> Tomas Vondra<tv@fuzzy.cz>  wrote:
>> On 11.5.2011 23:41, Kevin Grittner wrote:
>>> Tomas Vondra<tv@fuzzy.cz>  wrote:


First up, you guys should be aware that Greg Smith at least is working 
on this. Let's not duplicate effort.


>>>
>>>> 1) Is there something that might serve as a model?
>>>
>>> I've been assuming that we would use the PostgreSQL Buildfarm as
>>> a model.
>>>
>>> http://buildfarm.postgresql.org/
>> Yes, I was thinking about that too, but
>>
>> 1) A buildfarm used for regular building / unit testing IMHO may
>>     not be the right place to do performance testing (not sure how
>>     isolated the benchmarks can be etc.).
>
> I'm not saying that we should use the existing buildfarm, or expect
> current buildfarm machines to support this; just that the pattern of
> people volunteering hardware in a similar way would be good.


Some buildfarm members might well be suitable for it.

I recently added support for running optional steps, and made the SCM 
module totally generic. Soon I'm hoping to provide for more radical 
extensibility by having addon modules, which will register themselves 
with the framework and then have their tests run. I'm currently working 
on an API for such modules. This was inspired by Mike Fowler's work on a 
module to test JDBC builds, which his buildfarm member is currently 
doing: See 
<http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=piapiac&dt=2011-05-11%2000%3A00%3A02> 
for example. Obvious candidate modules might be other client libraries 
(e.g. perl DBD::Pg), non-committed patches, non-standard tests, and 
performance testing.

>> 2) Not sure how open this might be for the developers (if they
>>     could issue their own builds etc.).
>
> I haven't done it, but I understand that you can create a "local"
> buildfarm instance which isn't reporting its results.  Again,
> something similar might be good.


You can certainly create a client that doesn't report its results (just 
run it in --test mode). And you can create your own private server 
(that's been done by at least two organizations I know of).

But to test your own stuff, what we really need is a module to run 
non-committed patches, I think (see above).

The buildfarm client does have a mode (--from-source) that lets you 
test your own stuff and doesn't report on it if you do, but I don't see 
that it would be useful here.

>
>> 3) If this should be part of the current buildfarm, then I'm
>>     afraid I can't do much about it.
>

Sure you can. Contribute to the efforts mentioned above.


> Not part of the current buildfarm; just using a similar overall
> pattern.  Others may have different ideas; I'm just speaking for
> myself here about what seems like a good idea to me.


The buildfarm server is a pretty generic reporting framework. Sure we 
can build another. But it seems a bit redundant.

>
>>>> 2) How would you use it? What procedure would you expect?
>>>
>>> People who had suitable test environments could sign up to
>>> periodically build and performance test using the predetermined
>>> test suite, and report results back for a consolidated status
>>> display. That would spot regressions.
>> So it would be a 'distributed farm'? Not sure it that's a good
>> idea, as to get reliable benchmark results you need a proper
>> environment (not influenced by other jobs, changes of hw etc.).


You are not going to get a useful performance farm except in a 
distributed way. We don't own any labs, nor have we any way of 
assembling the dozens or hundreds of machines to represent the spectrum 
of platforms that we want tested in one spot. Knowing that we have 
suddenly caused a performance regression on, say, FreeBSD 8.1 running on 
AMD64, is a critical requirement.

>
> Yeah, accurate benchmarking is not easy.  We would have to make sure
> people understood that the machine should be dedicated to the
> benchmark while it is running, which is not a requirement for the
> buildfarm.  Maybe provide some way to annotate HW or OS changes?
> So if one machine goes to a new kernel and performance changes
> radically, but other machines which didn't change their kernel
> continue on a level graph, we'd know to suspect the kernel rather
> than some change in PostgreSQL code.
>


Indeed, there are lots of moving pieces.

cheers

andrew


Re: performance-test farm

From
Stephen Frost
Date:
* Andrew Dunstan (andrew@dunslane.net) wrote:
> First up, you guys should be aware that Greg Smith at least is
> working on this. Let's not duplicate effort.

Indeed.  I'm also interested in making this happen and have worked with
Greg in the past on it.  There's even some code out there that we
developed to add it on to the buildfarm, though that needs to be
reworked to fit with Andrew's latest changes (which are all good
changes).

We need a bit of hardware, but more, we need someone to clean up the
code, get it all integrated, and make it all work and report useful
information.  My feeling is "if you build it, they will come" with
regard to the hardware/performance machines.
Thanks,
    Stephen

Re: performance-test farm

From
Josh Berkus
Date:
Guys,

There are two mutually exclusive problems to solve with a
performance-test farm.

The first problem is platform performance, which would be a matter of
expanding the buildfarm to include a small set of performance tests ...
probably ones based on previously known problems, plus some other simple
common operations.  The goal here would be to test on as many different
machines as possible, rather than getting full coverage of performance.

The second would be to test the full range of PostgreSQL performance.
That is, to test every different thing we can reasonably benchmark on a
PostgreSQL server.  This would have to be done on a few dedicated
full-time testing machines, because of the need to use the full hardware
resources.  When done, this test would be like a full-blown TPC benchmark.

The above are fundamentally different tests.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: performance-test farm

From
Stephen Frost
Date:
* Josh Berkus (josh@agliodbs.com) wrote:
> The first problem is plaform performance, which would be a matter of
> expanding the buildfarm to include a small set of performance tests ...
> probably ones based on previously known problems, plus some other simple
> common operations.  The goal here would be to test on as many different
> machines as possible, rather than getting full coverage of peformance.

imv, we should be trying to include the above in the regression tests,
presuming that they can be done in that structure and that they can be
done 'quickly'.  (It shouldn't be hard to figure out if gettimeofday()
is really slow on one arch, for example, which I think is what you're
getting at here...)

> The second would be to test the full range of PostgreSQL performance.
> That is, to test every different thing we can reasonably benchmark on a
> PostgreSQL server.  This would have to be done on a few dedicated
> full-time testing machines, because of the need to use the full hardware
> resources.  When done, this test would be like a full-blown TPC benchmark.

Right, this is what I thought this discussion (and much of the other
recent commentary) was focused on.  I don't see the first as needing an
independent 'farm'.
Thanks,
    Stephen

Re: performance-test farm

From
Greg Smith
Date:
Tomas Vondra wrote:
> Actually I was not aware of how the buildfarm works, all I
> knew was there's something like that because some of the hackers mention
> a failed build on the mailing list occasionally.
>
> So I guess this is a good opportunity to investigate it a bit ;-)
>
> Anyway I'm not sure this would give us the kind of environment we need
> to do benchmarks ... but it's worth to think of.
>   

The idea is that buildfarm systems that are known to have a) reasonable 
hardware and b) no other concurrent work going on could also do 
performance tests.  The main benefit of this approach is it avoids 
duplicating all of the system management and source code building work 
needed for any sort of thing like this; just leverage the buildfarm 
parts when they solve similar enough problems.  Someone has actually 
done all that already; source code was last sync'd to the build farm 
master at the end of March:  https://github.com/greg2ndQuadrant/client-code

By far the #1 thing needed to move this forward from where it's stuck at 
now is someone willing to dig into the web application side of this.  
We're collecting useful data.  It needs to now be uploaded to the 
server, saved, and then reports of what happened generated.  Eventually 
graphs of performance results over time will be straightforward to 
generate.  But the whole idea requires that someone else (not Andrew, who 
has enough to do) sit down and figure out how to extend the web UI with 
these new elements.


> I guess we could run a script that collects all those important
> parameters and then detect changes. Anyway we still need some 'really
> stable' machines that are not changed at all, to get a long-term baseline.
>   

I have several such scripts I use, and know where two very serious ones 
developed by others are at too.  This part is not a problem.  If the 
changes are big enough to matter, they will show up as a difference on 
the many possible "how is the server configured?" reports; we just need 
to pick the most reasonable one.  It's a small detail I'm not worried 
about yet.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: performance-test farm

From
Tom Lane
Date:
> * Josh Berkus (josh@agliodbs.com) wrote:
>> The first problem is plaform performance, which would be a matter of
>> expanding the buildfarm to include a small set of performance tests ...
>> probably ones based on previously known problems, plus some other simple
>> common operations.  The goal here would be to test on as many different
>> machines as possible, rather than getting full coverage of peformance.

I think it's a seriously *bad* idea to expect existing buildfarm members
to produce useful performance data.  Very few of them are running on
dedicated machines, and some are deliberately configured with
performance-trashing options.  (I think just about all of 'em use
--enable-cassert, but there are some with worse things...)

We can probably share a great deal of the existing buildfarm code and
infrastructure, but the actual members of the p-farm will need to be a
separate collection of machines running different builds.

Stephen Frost <sfrost@snowman.net> writes:
> imv, we should be trying to include the above in the regression tests,
> presuming that they can be done in that structure and that they can be
> done 'quickly'.

There's no such thing as a useful performance test that runs quickly
enough to be sane to incorporate in our standard regression tests.
        regards, tom lane


Re: performance-test farm

From
Tomas Vondra
Date:
On 12.5.2011 16:36, Tom Lane wrote:
>> * Josh Berkus (josh@agliodbs.com) wrote:
>>> The first problem is plaform performance, which would be a matter of
>>> expanding the buildfarm to include a small set of performance tests ...
>>> probably ones based on previously known problems, plus some other simple
>>> common operations.  The goal here would be to test on as many different
>>> machines as possible, rather than getting full coverage of peformance.
> 
> I think it's a seriously *bad* idea to expect existing buildfarm members
> to produce useful performance data.  Very few of them are running on
> dedicated machines, and some are deliberately configured with
> performance-trashing options.  (I think just about all of 'em use
> --enable-cassert, but there are some with worse things...)
> 
> We can probably share a great deal of the existing buildfarm code and
> infrastructure, but the actual members of the p-farm will need to be a
> separate collection of machines running different builds.

Yes, I agree that to get reliable and useful performance data, we need a
well-defined environment (dedicated machines, proper settings), and this
probably is not possible with most of the current buildfarm members.
That's not an issue of the buildfarm - it simply serves other purposes,
not performance testing.

But I believe using the buildfarm framework is a good idea, and maybe we
could even use some of the nodes (those that are properly set up).

Not sure if there should be two separate farms (one to test builds, one
to test performance), or if we could run one farm and enable
'performance-testing modules' only for some of the members.

regards
Tomas


Re: performance-test farm

From
Tomas Vondra
Date:
On 12.5.2011 08:54, Greg Smith wrote:
> Tomas Vondra wrote:
>> Actually I was not aware of how the buildfarm works, all I
>> knew was there's something like that because some of the hackers mention
>> a failed build on the mailing list occasionally.
>>
>> So I guess this is a good opportunity to investigate it a bit ;-)
>>
>> Anyway I'm not sure this would give us the kind of environment we need
>> to do benchmarks ... but it's worth to think of.
>>   
> 
> The idea is that buildfarm systems that are known to have a) reasonable
> hardware and b) no other concurrent work going on could also do
> performance tests.  The main benefit of this approach is it avoids
> duplicating all of the system management and source code building work
> needed for any sort of thing like this; just leverage the buildfarm
> parts when they solve similar enough problems.  Someone has actually
> done all that already; source code was last sync'd to the build farm
> master at the end of March:  https://github.com/greg2ndQuadrant/client-code

Yes, I think that using the existing buildfarm framework is a good idea.
Do you think we should integrate this into the current buildfarm
(although only the selected nodes would run these performance tests) or
that it should be a separate farm?

regards
Tomas


Re: performance-test farm

From
Greg Smith
Date:
Tom Lane wrote:
> There's no such thing as a useful performance test that runs quickly
> enough to be sane to incorporate in our standard regression tests.
>   

To throw a hard number out:  I can get a moderately useful performance 
test on a SELECT-only workload from pgbench in about one minute.  That's 
the bare minimum; stepping up to 5 minutes is really necessary before 
I'd want to draw any real conclusions.
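
Just to be concrete, the kind of run I mean boils down to something like
the sketch below (the scale, client count and duration are arbitrary
examples, the database name is made up, and the tps parsing is simplified):

# Sketch: one short pgbench SELECT-only run, parsing the tps figure out of
# the output. Scale, client count and duration are arbitrary examples.
import re, subprocess

DB = 'pgbench_perf'                      # assumed to exist already
SCALE, CLIENTS, DURATION = 100, 8, 300   # 300s = the 5-minute case

def run(args):
    return subprocess.check_output(args).decode()

# one-time initialization: pgbench -i -s 100 pgbench_perf
run(['pgbench', '-i', '-s', str(SCALE), DB])

# the SELECT-only test itself: pgbench -S -c 8 -j 8 -T 300 pgbench_perf
out = run(['pgbench', '-S', '-c', str(CLIENTS), '-j', str(CLIENTS),
           '-T', str(DURATION), DB])

# pgbench prints one or more "tps = ..." lines at the end
print([float(t) for t in re.findall(r'tps = ([0-9.]+)', out)])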

More importantly, a large portion of the time I'd expect regression test 
runs to be happening with debug/assert on.  We've well established that 
this trashes pgbench performance.  One of the uglier bits of code needed 
to add the "performance farm" feature to the buildfarm code was hacking 
in a whole different set of build options for it.

Anyway, what I was envisioning here was that performance farm systems 
would also execute the standard buildfarm tests, but not the other way 
around.  We don't want performance numbers from some platform that is 
failing the basic tests.  I would just expect that systems running the 
performance tests would cycle through regression testing much less 
often, as they might miss a commit because they were running a longer 
test then.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: performance-test farm

From
Josh Berkus
Date:
All,

BTW, if we want a kind of "performance unit test", Drizzle has a very
nice framework for this.  And it's even already PostgreSQL-compatible.

I'm hunting for the link for it now ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: performance-test farm

From
Lou Picciano
Date:
Josh My Man! How are you?!!

Is this the one?: http://planetdrizzle.org/

Lou Picciano

----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: pgsql-hackers@postgresql.org
Sent: Thursday, May 12, 2011 8:11:57 PM
Subject: Re: [HACKERS] performance-test farm

All,

BTW, if we want a kind of "performance unit test", Drizzle has a very
nice framework for this.  And it's even already PostgreSQL-compatible.

I'm hunting for the link for it now ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: performance-test farm

From
Josh Berkus
Date:
On 5/12/11 7:19 PM, Lou Picciano wrote:
> Josh My Man! How are you?!! 
> 
> 
> Is this the one?: http://planetdrizzle.org/ 

Since that's their blog feed, here are some durable links:

Testing tool:
http://docs.drizzle.org/testing/dbqp.html

Random query generator:
https://launchpad.net/randgen

However, looking at those now I'm not seeing response time as part of
the test, which is of course critical for us. Also, their test results
are diff-based, which is (as we know all too well) fragile.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: performance-test farm

From
Tomas Vondra
Date:
On 12.5.2011 08:54, Greg Smith wrote:
> Tomas Vondra wrote:
>> Actually I was not aware of how the buildfarm works, all I
>> knew was there's something like that because some of the hackers mention
>> a failed build on the mailing list occasionally.
>>
>> So I guess this is a good opportunity to investigate it a bit ;-)
>>
>> Anyway I'm not sure this would give us the kind of environment we need
>> to do benchmarks ... but it's worth to think of.
>>   
> 
> The idea is that buildfarm systems that are known to have a) reasonable
> hardware and b) no other concurrent work going on could also do
> performance tests.  The main benefit of this approach is it avoids
> duplicating all of the system management and source code building work
> needed for any sort of thing like this; just leverage the buildfarm
> parts when they solve similar enough problems.  Someone has actually
> done all that already; source code was last sync'd to the build farm
> master at the end of March:  https://github.com/greg2ndQuadrant/client-code
> 
> By far the #1 thing needed to move this forward from where it's stuck at
> now is someone willing to dig into the web application side of this. 
> We're collecting useful data.  It needs to now be uploaded to the
> server, saved, and then reports of what happened generated.  Eventually
> graphs of performance results over time will be straighforward to
> generate.  But the whole idea requires someone else (not Andrew, who has
> enough to do) sits down and figures out how to extend the web UI with
> these new elements.

Hi all,

it seems CSPUG will get two refurbished servers at the end of this
month. We plan to put both of them into the buildfarm - one for regular
testing with Czech locales, and I'd like to use the other one for the
proposed performance testing.

I'm willing to put some time into this, but I'll need help with
preparing the 'action plan' (because you know - I live in the EU, and in
the EU everything is driven by action plans).

AFAIK what needs to be done is:

1) preparing the hw, OS etc. - ok
2) registering the machine as a buildfarm member - ok
3) modifying the buildfarm client-code to collect performance data
   - What data should be collected prior to the benchmark?
     a) info about the environment (to make sure it's safe)?
     b) something else?
   - What performance tests should be executed?
     a) let's start with pgbench - select-only and regular
     b) something else in the future? DSS/DWH workloads?
     c) special tests (spinlocks, db that fits in RAM, ...)

4) modifying the buildfarm server-code to accept and display
   performance data
   - not really sure what needs to be done here; a sketch of a possible
     upload record follows below
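
To make item 4 a bit less vague, this is the kind of per-run record I
imagine the client could upload to the server (a sketch only - every
field name and value here is my guess, there is no agreed format yet):

# Sketch of a per-run result record the client might upload to the server.
# Every field name and value is a guess - there is no agreed format yet.
import json
from datetime import datetime

result = {
    'animal': 'magpie',                         # buildfarm member name
    'snapshot': datetime.utcnow().isoformat(),  # when the run finished
    'commit': '0123456789abcdef',               # commit that was built (example)
    'branch': 'HEAD',
    'environment': {                            # from the collection step in 3)
        'kernel': '3.2.0-2-amd64',
        'shared_buffers': '1GB',
        'work_mem': '8MB',
    },
    'tests': [
        {'name': 'pgbench-select-only', 'scale': 100, 'clients': 8,
         'duration_s': 300, 'tps': 12345.6},
        {'name': 'pgbench-tpcb-like', 'scale': 100, 'clients': 8,
         'duration_s': 300, 'tps': 987.6},
    ],
}

print(json.dumps(result, indent=2))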

regards
Tomas


Re: performance-test farm

From
Tomas Vondra
Date:
On 12.5.2011 08:54, Greg Smith wrote:
> Tomas Vondra wrote:
>
> The idea is that buildfarm systems that are known to have a) reasonable
> hardware and b) no other concurrent work going on could also do
> performance tests.  The main benefit of this approach is it avoids
> duplicating all of the system management and source code building work
> needed for any sort of thing like this; just leverage the buildfarm
> parts when they solve similar enough problems.  Someone has actually
> done all that already; source code was last sync'd to the build farm
> master at the end of March:  https://github.com/greg2ndQuadrant/client-code
> 
> By far the #1 thing needed to move this forward from where it's stuck at
> now is someone willing to dig into the web application side of this. 
> We're collecting useful data.  It needs to now be uploaded to the
> server, saved, and then reports of what happened generated.  Eventually
> graphs of performance results over time will be straighforward to
> generate.  But the whole idea requires someone else (not Andrew, who has
> enough to do) sits down and figures out how to extend the web UI with
> these new elements.

Hi,

I'd like to revive this thread. A few days ago we finally got our
buildfarm member working (it's called "magpie") - it's spending ~2h a
day chewing on the buildfarm tasks, so we can use the other 22h to do
some useful work.

I suppose most of you are busy with 9.2 features, but I'm not, so I'd
like to spend my time on this.

Now that I've had to set up a buildfarm member, I'm somewhat aware of how
the buildfarm works. I've checked the PGBuildFarm/server-code and
greg2ndQuadrant/client-code repositories, and while I certainly am not a
perl whiz, I believe I can tweak them to handle the performance-related
results too.

What is the current state of this effort? Is there someone else working
on that? If not, I propose this (for starters):
 * add a new page "Performance results" to the menu, with a list of
   members that uploaded the performance results

 * for each member, there will be a list of tests along with a running
   average for each test, last test and indicator if it improved, got
   worse or is the same

 * for each member/test, a history of runs will be displayed, along
   with a simple graph

I'm not quite sure how to define which members will run the performance
tests - I see two options:
 * for each member, add a flag "run performance tests" so that we can
   choose which members are supposed to be safe

 OR

 * run the tests on all members (if enabled in build-farm.conf) and
   then decide which results are relevant based on data describing the
   environment (collected when running the tests)

I'm also wondering if

 * using the buildfarm infrastructure is the right thing to do, if it
   can provide some 'advanced features' (see below)

 * we should use the current buildfarm members (although maybe not all
   of them)

 * it can handle one member running the tests with different settings
   (various shared_buffer/work_mem sizes, num of clients etc.) and
   various hw configurations (for example magpie contains a regular
   SATA drive as well as an SSD - would be nice to run two sets of
   tests, one for the spinner, one for the SSD)

 * this can handle 'pushing' a list of commits to test (instead of
   just testing the HEAD) so that we can ask the members to run the
   tests for particular commits in the past (I consider this to be a
   very handy feature)


regards
Tomas


Re: performance-test farm

From
Robert Haas
Date:
On Mon, Mar 5, 2012 at 5:20 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> The idea is that buildfarm systems that are known to have a) reasonable
>> hardware and b) no other concurrent work going on could also do
>> performance tests.  The main benefit of this approach is it avoids
>> duplicating all of the system management and source code building work
>> needed for any sort of thing like this; just leverage the buildfarm
>> parts when they solve similar enough problems.  Someone has actually
>> done all that already; source code was last sync'd to the build farm
>> master at the end of March:  https://github.com/greg2ndQuadrant/client-code
>>
>> By far the #1 thing needed to move this forward from where it's stuck at
>> now is someone willing to dig into the web application side of this.
>> We're collecting useful data.  It needs to now be uploaded to the
>> server, saved, and then reports of what happened generated.  Eventually
>> graphs of performance results over time will be straighforward to
>> generate.  But the whole idea requires someone else (not Andrew, who has
>> enough to do) sits down and figures out how to extend the web UI with
>> these new elements.
>
> Hi,
>
> I'd like to revive this thread. A few days ago we have finally got our
> buildfarm member working (it's called "magpie") - it's spending ~2h a
> day chewing on the buildfarm tasks, so we can use the other 22h to do
> some useful work.
>
> I suppose most of you are busy with 9.2 features, but I'm not so I'd
> like to spend my time on this.
>
> Now that I had to set up the buildfarm member I'm somehow aware of how
> the buildfarm works. I've checked the PGBuildFarm/server-code and
> greg2ndQuadrant/client-code repositories and while I certainly am not a
> perl whiz, I believe I can tweak it to handle the performance-related
> result too.
>
> What is the current state of this effort? Is there someone else working
> on that? If not, I propose this (for starters):
>
>  * add a new page "Performance results" to the menu, with a list of
>    members that uploaded the perfomance-results
>
>  * for each member, there will be a list of tests along with a running
>    average for each test, last test and indicator if it improved, got
>    worse or is the same
>
>  * for each member/test, a history of runs will be displayed, along
>    with a simple graph

I suspect that the second of these is not that useful; presumably
there will be many small, irrelevant variations.  The graph is the key
thing.

> I'm not quite sure how to define which members will run the performance
> tests - I see two options:
>
>  * for each member, add a flag "run performance tests" so that we can
>    choose which members are supposed to be safe
>
>  OR
>
>  * run the tests on all members (if enabled in build-farm.conf) and
>    then decide which results are relevant based on data describing the
>    environment (collected when running the tests)

First option seems better to me, but I don't run any buildfarm critters, so...

> I'm also wondering if
>
>  * using the buildfarm infrastructure the right thing to do, if it can
>    provide some 'advanced features' (see below)
>
>  * we should use the current buildfarm members (although maybe not all
>    of them)

The main advantage of using the buildfarm, AFAICS, is that it would
make it less work for people to participate in this new thing.

>  * it can handle one member running the tests with different settings
>    (various shared_buffer/work_mem sizes, num of clients etc.) and
>    various hw configurations (for example magpie contains a regular
>    SATA drive as well as an SSD - would be nice to run two sets of
>    tests, one for the spinner, one for the SSD)

That would be nice.

>  * this can handle 'pushing' a list of commits to test (instead of
>    just testing the HEAD) so that we can ask the members to run the
>    tests for particular commits in the past (I consider this to be
>    very handy feature)

That would be nice, too.

I think it's great that you want to put some energy into this.  It's
something we have been talking about for years, but if we're making
any progress at all, it's glacial.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: performance-test farm

From
Greg Smith
Date:
On 03/05/2012 05:20 PM, Tomas Vondra wrote:
> What is the current state of this effort? Is there someone else working
> on that? If not, I propose this (for starters):
>
>    * add a new page "Performance results" to the menu, with a list of
>      members that uploaded the perfomance-results
>
>    * for each member, there will be a list of tests along with a running
>      average for each test, last test and indicator if it improved, got
>      worse or is the same
>
>    * for each member/test, a history of runs will be displayed, along
>      with a simple graph


I am quite certain no one else is working on this.

The results are going to bounce around over time.  "Last test" and 
simple computations based on it are not going to be useful.  A graph and 
a way to drill down into the list of test results is what I had in mind.

Eventually we'll want to be able to flag bad trends for observation, 
without having to look at the graph.  That's really optional for now, 
but here's how you could do that.  If you compare a short moving average 
to a longer one, you can find out when a general trend line has been 
crossed upwards or downwards, even with some deviation to individual 
samples.  There's a stock trading technique using this property called 
the moving average crossover; a good example is shown at 

http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

To figure this out given a series of TPS value samples, let MA(N) be the 
moving average of the last N samples.  Now compute a value like this:

MA(5) - MA(20)

If that's positive, recent tests have improved performance from the 
longer trend average.  If it's negative, performance has dropped.  We'll 
have to tweak the constants in this--5 and 20 in this case--to make this 
useful.  It should be sensitive enough to go off after a major change, 
while not complaining about tiny variations.  I can tweak those 
parameters easily once there's some sample data to check against.  They 
might need to be an optional part of the configuration file, since 
appropriate values might have some sensitivity to how often the test runs.
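
To make the computation concrete, here's a small sketch (Python, purely
illustrative); the 5/20 window sizes are the same placeholder constants
as above and the sample data is made up:

# Sketch: moving-average crossover over a series of tps samples.
# The 5/20 window sizes are placeholders to be tuned against real data.

def moving_average(samples, n):
    """Average of the last n samples (or all of them if fewer exist)."""
    window = samples[-n:]
    return sum(window) / float(len(window))

def crossover(samples, short_n=5, long_n=20):
    """Positive: recent runs beat the longer trend; negative: they lag it."""
    return moving_average(samples, short_n) - moving_average(samples, long_n)

# Example: stable ~1000 tps for 20 runs, then a drop to ~850 shows up as a
# clearly negative signal, while small run-to-run noise stays near zero.
tps_history = [1000 + (i % 3) * 5 for i in range(20)] + [850, 845, 855, 848]
print(crossover(tps_history))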

It's possible to keep a running weighted moving average without actually 
remembering all of the history.  The background writer works that way. 
I don't think that will be helpful here though, because you need a chunk 
of the history to draw a graph of it.

> I'm not quite sure how to define which members will run the performance
> tests - I see two options:
>
>    * for each member, add a flag "run performance tests" so that we can
>      choose which members are supposed to be safe
>
>    OR
>
>    * run the tests on all members (if enabled in build-farm.conf) and
>      then decide which results are relevant based on data describing the
>      environment (collected when running the tests)

I was thinking of only running this on nodes that have gone out of their 
way to enable this, so something more like the first option you gave 
here.  Some buildfarm animals might cause a problem for their owners 
should they suddenly start doing anything new that gobbles up a lot more 
resources.  It's important that any defaults--including what happens if 
you add this feature to the code but don't change the config file--do 
not run any performance tests.

>    * it can handle one member running the tests with different settings
>      (various shared_buffer/work_mem sizes, num of clients etc.) and
>      various hw configurations (for example magpie contains a regular
>      SATA drive as well as an SSD - would be nice to run two sets of
>      tests, one for the spinner, one for the SSD)
>
>    * this can handle 'pushing' a list of commits to test (instead of
>      just testing the HEAD) so that we can ask the members to run the
>      tests for particular commits in the past (I consider this to be
>      very handy feature)

I would highly recommend against scope creep in these directions.  The 
goal here is not to test hardware or configuration changes.  You've been 
doing a lot of that recently, and this chunk of software is not going to 
be a good way to automate such tests.

The initial goal of the performance farm is to find unexpected 
regressions in the performance of the database code, running some simple 
tests.  It should handle the opposite too, proving improvements work out 
as expected on multiple systems.  The buildfarm structure is suitable 
for that job.

If you want to simulate a more complicated test, one where things like 
work_mem matter, the first step there is to pick a completely different 
benchmark workload.  You're not going to do it with simple pgbench calls.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Re: performance-test farm

From
Tomas Vondra
Date:
On 4.4.2012 05:35, Greg Smith wrote:
> On 03/05/2012 05:20 PM, Tomas Vondra wrote:
>> What is the current state of this effort? Is there someone else working
>> on that? If not, I propose this (for starters):
>>
>>    * add a new page "Performance results" to the menu, with a list of
>>      members that uploaded the perfomance-results
>>
>>    * for each member, there will be a list of tests along with a running
>>      average for each test, last test and indicator if it improved, got
>>      worse or is the same
>>
>>    * for each member/test, a history of runs will be displayed, along
>>      with a simple graph
> 
> 
> I am quite certain no one else is working on this.
> 
> The results are going to bounce around over time.  "Last test" and
> simple computations based on it are not going to be useful.  A graph and
> a way to drill down into the list of test results is what I had in mind.
> 
> Eventually we'll want to be able to flag bad trends for observation,
> without having to look at the graph.  That's really optional for now,
> but here's how you could do that.  If you compare a short moving average
> to a longer one, you can find out when a general trend line has been
> crossed upwards or downwards, even with some deviation to individual
> samples.  There's a stock trading technique using this property called
> the moving average crossover; a good example is shown at
>
http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

Yes, exactly. I wrote 'last test' but I actually meant something like
this, i.e. detecting a change of the trend over time. The moving average
crossover looks interesting, although there are other ways to achieve a
similar goal (e.g. correlating with a pattern - a step function, for
example).

> It's possible to keep a running weighted moving average without actually
> remembering all of the history.  The background writer works that way. I
> don't think that will be helpful here though, because you need a chunk
> of the history to draw a graph of it.

Keeping the history is not a big deal IMHO. And it gives us the freedom
to run a bit more complex analysis anytime later.

>> I'm not quite sure how to define which members will run the performance
>> tests - I see two options:
>>
>>    * for each member, add a flag "run performance tests" so that we can
>>      choose which members are supposed to be safe
>>
>>    OR
>>
>>    * run the tests on all members (if enabled in build-farm.conf) and
>>      then decide which results are relevant based on data describing the
>>      environment (collected when running the tests)
> 
> I was thinking of only running this on nodes that have gone out of their
> way to enable this, so something more like the first option you gave
> here.  Some buildfarm animals might cause a problem for their owners
> should they suddenly start doing anything new that gobbles up a lot more
> resources.  It's important that any defaults--including what happens if
> you add this feature to the code but don't change the config file--does
> not run any performance tests.

Yes, good points. The default should be 'do not run performance tests' then.

>>    * it can handle one member running the tests with different settings
>>      (various shared_buffer/work_mem sizes, num of clients etc.) and
>>      various hw configurations (for example magpie contains a regular
>>      SATA drive as well as an SSD - would be nice to run two sets of
>>      tests, one for the spinner, one for the SSD)
>>
>>    * this can handle 'pushing' a list of commits to test (instead of
>>      just testing the HEAD) so that we can ask the members to run the
>>      tests for particular commits in the past (I consider this to be
>>      very handy feature)
> 
> I would highly recommend against scope creep in these directions.  The
> goal here is not to test hardware or configuration changes.  You've been
> doing a lot of that recently, and this chunk of software is not going to
> be a good way to automate such tests.
> 
> The initial goal of the performance farm is to find unexpected
> regressions in the performance of the database code, running some simple
> tests.  It should handle the opposite too, proving improvements work out
> as expected on multiple systems.  The buildfarm structure is suitable
> for that job.

Testing hardware configuration changes was not the goal of the proposed
behavior. The goal was to test multiple (sane) PostgreSQL configs. There
are problems that might demonstrate themselves only under certain
conditions (e.g. very small/large shared buffers, spinners/SSDs etc.).

Those are exactly the 'unexpected regressions' you've mentioned.

> If you want to simulate a more complicated test, one where things like
> work_mem matter, the first step there is to pick a completely different
> benchmark workload.  You're not going to do it with simple pgbench calls.

Yes, but I do expect to prepare custom pgbench scripts in the future to
test such things. So I want to design the code so that this is possible
(either right now or in the future).

A simple workaround would be to create a 'virtual member' for each
member configuration, e.g.
  magpie-hdd-1024-8 - magpie with HDD, 1024MB shared buffers and 8MB
                      work_mem
  magpie-ssd-512-16 - magpie with SSD, 512MB shared buffers and 16MB
                      work_mem

and so on. But IMHO it's a bit of a dirty solution.
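
Just to illustrate what that configuration matrix would look like (a
throwaway sketch; the dimensions and values are only the examples above,
nothing is decided):

# Sketch: derive 'virtual member' names from a small configuration matrix.
# The dimensions and values are just the examples above.
from itertools import product

base = 'magpie'
storage = ['hdd', 'ssd']
shared_buffers_mb = [512, 1024]
work_mem_mb = [8, 16]

for s, sb, wm in product(storage, shared_buffers_mb, work_mem_mb):
    print('%s-%s-%d-%d' % (base, s, sb, wm))
# -> magpie-hdd-512-8, magpie-hdd-512-16, ... magpie-ssd-1024-16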

Tomas