Re: performance-test farm - Mailing list pgsql-hackers

From: Greg Smith
Subject: Re: performance-test farm
Msg-id: 4F7BC17C.1040908@2ndQuadrant.com
In response to: Re: performance-test farm (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: performance-test farm (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
On 03/05/2012 05:20 PM, Tomas Vondra wrote:
> What is the current state of this effort? Is there someone else working
> on that? If not, I propose this (for starters):
>
>    * add a new page "Performance results" to the menu, with a list of
>      members that uploaded the performance results
>
>    * for each member, there will be a list of tests along with a running
>      average for each test, last test and indicator if it improved, got
>      worse or is the same
>
>    * for each member/test, a history of runs will be displayed, along
>      with a simple graph


I am quite certain no one else is working on this.

The results are going to bounce around over time, so "last test" and 
simple computations based on it are not going to be useful.  What I had 
in mind was a graph and a way to drill down into the list of test results.

Eventually we'll want to be able to flag bad trends for observation, 
without having to look at the graph.  That's really optional for now, 
but here's how you could do that.  If you compare a short moving average 
to a longer one, you can find out when a general trend line has been 
crossed upwards or downwards, even when individual samples deviate from 
it.  There's a stock trading technique based on this property called 
the moving average crossover; a good example is shown at

http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

To figure this out given a series of TPS value samples, let MA(N) be the 
moving average of the last N samples.  Now compute a value like this:

MA(5) - MA(20)

If that's positive, recent tests have improved performance from the 
longer trend average.  If it's negative, performance has dropped.  We'll 
have to tweak the constants--5 and 20 in this case--to make the check 
useful.  It should be sensitive enough to go off after a major change, 
while not complaining about tiny variations.  I can tweak those 
parameters easily once there's some sample data to check against.  They 
might need to be an optional part of the configuration file, since 
appropriate values are likely to depend on how often the tests run.
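
To make that concrete, here's a rough sketch of the check in Python.  The 
function names, window sizes, and sample numbers are just placeholders; 
the real thing would pull the TPS history for a member/test out of the 
results database:

# Rough sketch of the moving average crossover check described above.
# Window sizes and the sample data are placeholders to be tuned.
def moving_average(samples, n):
    # Average of the most recent n samples (or all of them if fewer exist).
    window = samples[-n:]
    return sum(window) / float(len(window))

def crossover_signal(tps_history, short_n=5, long_n=20):
    # Positive: recent runs beat the longer trend.  Negative: they lag it.
    if len(tps_history) < long_n:
        return None  # not enough history to say anything yet
    return (moving_average(tps_history, short_n) -
            moving_average(tps_history, long_n))

# Example: twenty steady runs followed by a clear drop.
tps = [1200, 1210, 1195, 1205, 1190] * 4 + [950, 940, 960, 955, 945]
signal = crossover_signal(tps)
if signal is not None and signal < 0:
    print("performance dropped below the longer trend: %.1f" % signal)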

It's possible to keep a running weighted moving average without actually 
remembering all of the history.  The background writer works that way. 
I don't think that will be helpful here though, because you need a chunk 
of the history to draw a graph of it.
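
For reference, that kind of running average only needs one stored value 
per member/test; roughly something like this, with an arbitrary smoothing 
factor:

def update_running_average(current_avg, new_sample, alpha=0.2):
    # Exponentially weighted moving average; keeps no history at all.
    if current_avg is None:
        return new_sample  # the first sample seeds the average
    return (1 - alpha) * current_avg + alpha * new_sample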

> I'm not quite sure how to define which members will run the performance
> tests - I see two options:
>
>    * for each member, add a flag "run performance tests" so that we can
>      choose which members are supposed to be safe
>
>    OR
>
>    * run the tests on all members (if enabled in build-farm.conf) and
>      then decide which results are relevant based on data describing the
>      environment (collected when running the tests)

I was thinking of only running this on nodes that have gone out of their 
way to enable it, so something more like the first option you gave 
here.  Some buildfarm animals might cause a problem for their owners 
should they suddenly start doing anything new that gobbles up a lot more 
resources.  It's important that the defaults--including what happens if 
you add this feature to the code but don't change the config file--do 
not run any performance tests.

>    * it can handle one member running the tests with different settings
>      (various shared_buffer/work_mem sizes, num of clients etc.) and
>      various hw configurations (for example magpie contains a regular
>      SATA drive as well as an SSD - would be nice to run two sets of
>      tests, one for the spinner, one for the SSD)
>
>    * this can handle 'pushing' a list of commits to test (instead of
>      just testing the HEAD) so that we can ask the members to run the
>      tests for particular commits in the past (I consider this to be a
>      very handy feature)

I would highly recommend against scope creep in these directions.  The 
goal here is not to test hardware or configuration changes.  You've been 
doing a lot of that recently, and this chunk of software is not going to 
be a good way to automate such tests.

The initial goal of the performance farm is to find unexpected 
regressions in the performance of the database code, running some simple 
tests.  It should handle the opposite too, proving improvements work out 
as expected on multiple systems.  The buildfarm structure is suitable 
for that job.

If you want to simulate a more complicated test, one where things like 
work_mem matter, the first step there is to pick a completely different 
benchmark workload.  You're not going to do it with simple pgbench calls.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

