Re: performance-test farm - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: performance-test farm
Date
Msg-id 4F7C45F8.2000808@fuzzy.cz
In response to Re: performance-test farm  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On 4.4.2012 05:35, Greg Smith wrote:
> On 03/05/2012 05:20 PM, Tomas Vondra wrote:
>> What is the current state of this effort? Is there someone else working
>> on that? If not, I propose this (for starters):
>>
>>    * add a new page "Performance results" to the menu, with a list of
>>      members that uploaded the perfomance-results
>>
>>    * for each member, there will be a list of tests along with a running
>>      average for each test, last test and indicator if it improved, got
>>      worse or is the same
>>
>>    * for each member/test, a history of runs will be displayed, along
>>      with a simple graph
> 
> 
> I am quite certain no one else is working on this.
> 
> The results are going to bounce around over time.  "Last test" and
> simple computations based on it are not going to be useful.  A graph and
> a way to drill down into the list of test results is what I had in mind.
> 
> Eventually we'll want to be able to flag bad trends for observation,
> without having to look at the graph.  That's really optional for now,
> but here's how you could do that.  If you compare a short moving average
> to a longer one, you can find out when a general trend line has been
> crossed upwards or downwards, even with some deviation to individual
> samples.  There's a stock trading technique using this property called
> the moving average crossover; a good example is shown at
> http://eresearch.fidelity.com/backtesting/viewstrategy?category=Trend%20Following&wealthScriptType=MovingAverageCrossover

Yes, exactly. I wrote 'last test' but I actually meant something like
this, i.e. detecting a change in the trend over time. The moving average
crossover looks interesting, although there are other ways to achieve a
similar goal (e.g. correlating with a pattern - a step function, for
example).
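
Something along these lines is what I have in mind - just a rough sketch
in Python (window sizes and names are made up), comparing a short and a
long trailing average over the stored per-test history and flagging the
point where they cross:

    def moving_average(values, window):
        # trailing average; None until we have enough samples
        if len(values) < window:
            return None
        return sum(values[-window:]) / window

    def crossover(history, short_window=5, long_window=20):
        # returns +1/-1 when the short average crosses the long one, else 0
        prev_s = moving_average(history[:-1], short_window)
        prev_l = moving_average(history[:-1], long_window)
        cur_s = moving_average(history, short_window)
        cur_l = moving_average(history, long_window)
        if None in (prev_s, prev_l, cur_s, cur_l):
            return 0
        if prev_s <= prev_l and cur_s > cur_l:
            return 1    # trend turned upwards
        if prev_s >= prev_l and cur_s < cur_l:
            return -1   # trend turned downwards
        return 0

The window sizes would obviously need tuning against real data.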

> It's possible to keep a running weighted moving average without actually
> remembering all of the history.  The background writer works that way. I
> don't think that will be helpful here though, because you need a chunk
> of the history to draw a graph of it.

Keeping the history is not a big deal IMHO, and it gives us the freedom
to run more complex analyses at any point later.
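
(For completeness, the "no history" variant you describe is essentially
an exponentially weighted average - a few lines, assuming some arbitrary
smoothing factor:

    def update_average(current_avg, sample, alpha=0.1):
        # running weighted average, no stored history needed
        if current_avg is None:
            return sample
        return (1 - alpha) * current_avg + alpha * sample

but as you say, it's of no use for drawing the graphs anyway.)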

>> I'm not quite sure how to define which members will run the performance
>> tests - I see two options:
>>
>>    * for each member, add a flag "run performance tests" so that we can
>>      choose which members are supposed to be safe
>>
>>    OR
>>
>>    * run the tests on all members (if enabled in build-farm.conf) and
>>      then decide which results are relevant based on data describing the
>>      environment (collected when running the tests)
> 
> I was thinking of only running this on nodes that have gone out of their
> way to enable this, so something more like the first option you gave
> here.  Some buildfarm animals might cause a problem for their owners
> should they suddenly start doing anything new that gobbles up a lot more
> resources.  It's important that any defaults--including what happens if
> you add this feature to the code but don't change the config file--does
> not run any performance tests.

Yes, good points. The default should be 'do not run performance tests' then.
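
On the client side that's just a matter of treating a missing option as
"off" - something like this (the option name is hypothetical, I haven't
looked at how the config is actually parsed):

    def perf_tests_enabled(config):
        # a missing/unset option must mean "no performance tests"
        return bool(config.get('run_performance_tests', False))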

>>    * it can handle one member running the tests with different settings
>>      (various shared_buffer/work_mem sizes, num of clients etc.) and
>>      various hw configurations (for example magpie contains a regular
>>      SATA drive as well as an SSD - would be nice to run two sets of
>>      tests, one for the spinner, one for the SSD)
>>
>>    * this can handle 'pushing' a list of commits to test (instead of
>>      just testing the HEAD) so that we can ask the members to run the
>>      tests for particular commits in the past (I consider this to be
>>      very handy feature)
> 
> I would highly recommend against scope creep in these directions.  The
> goal here is not to test hardware or configuration changes.  You've been
> doing a lot of that recently, and this chunk of software is not going to
> be a good way to automate such tests.
> 
> The initial goal of the performance farm is to find unexpected
> regressions in the performance of the database code, running some simple
> tests.  It should handle the opposite too, proving improvements work out
> as expected on multiple systems.  The buildfarm structure is suitable
> for that job.

Testing hardware configuration changes was not the goal of the proposed
behavior. The goal was to test multiple (sane) PostgreSQL configs. There
are regressions that manifest themselves only under certain conditions
(e.g. very small/large shared buffers, spinners vs. SSDs, etc.).

Those are exactly the 'unexpected regressions' you mentioned.

> If you want to simulate a more complicated test, one where things like
> work_mem matter, the first step there is to pick a completely different
> benchmark workload.  You're not going to do it with simple pgbench calls.

Yes, but I do expect to prepare custom pgbench scripts to test such
things in the future, so I want to design the code so that this is
possible (either right now or later).
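
For illustration (just a sketch - script name and settings invented), the
client could run a custom pgbench script and override GUCs per run
through PGOPTIONS:

    import os
    import subprocess

    def run_pgbench(script, work_mem, clients=8, seconds=60):
        # per-run work_mem override; -n skips vacuum, -f runs a custom script
        env = dict(os.environ, PGOPTIONS='-c work_mem=%s' % work_mem)
        cmd = ['pgbench', '-n', '-f', script,
               '-c', str(clients), '-T', str(seconds)]
        return subprocess.run(cmd, env=env, capture_output=True, text=True)

    # e.g. run_pgbench('sort-heavy.sql', '16MB')

That way the same script can be used to compare different work_mem values
on the same member.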

A simple workaround would be to create a 'virtual member' for each
member configuration, e.g.

  magpie-hdd-1024-8 - magpie with HDD, 1024MB shared buffers and 8MB work_mem
  magpie-ssd-512-16 - magpie with SSD, 512MB shared buffers and 16MB work_mem

and so on. But IMHO that's a bit of a dirty solution.
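
(If we went that way, at least the names could be generated mechanically
from a small config matrix - a sketch, values invented:

    from itertools import product

    storage = ['hdd', 'ssd']
    shared_buffers_mb = [512, 1024]
    work_mem_mb = [8, 16]

    members = ['magpie-%s-%d-%d' % c
               for c in product(storage, shared_buffers_mb, work_mem_mb)]
    # ['magpie-hdd-512-8', 'magpie-hdd-512-16', 'magpie-hdd-1024-8', ...]

so it would not be much manual work, just ugly in the reports.)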

Tomas

