Re: Performance - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Performance
Date
Msg-id 4DA6216E.9020907@fuzzy.cz
Whole thread Raw
In response to Re: Performance  (Nathan Boley <npboley@gmail.com>)
Responses Re: Performance  (Claudio Freire <klaussfreire@gmail.com>)
Re: Performance  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Dne 14.4.2011 00:05, Nathan Boley napsal(a):
>>> If you model the costing to reflect the reality on your server, good
>>> plans will be chosen.
>>
>> Wouldn't it be "better" to derive those costs from actual performance
>> data measured at runtime?
>>
>> Say, pg could measure random/seq page cost, *per tablespace* even.
>>
>> Has that been tried?
>
> FWIW, awhile ago I wrote a simple script to measure this and found
> that the *actual* random_page / seq_page cost ratio was much higher
> than 4/1.
>
> The problem is that caching effects have a large effect on the time it
> takes to access a random page, and caching effects are very workload
> dependent. So anything automated would probably need to optimize the
> parameter values over a set of 'typical' queries, which is exactly
> what a good DBA does when they set random_page_cost...

Plus there's a separate pagecache outside shared_buffers, which adds
another layer of complexity.

What I was thinking about was a kind of 'autotuning' using real
workload. I mean - measure the time it takes to process a request
(depends on the application - could be time to load a page, process an
invoice, whatever ...) and compute some reasonable metric on it
(average, median, variance, ...). Move the cost variables a bit (e.g.
the random_page_cost) and see how that influences performance. If it
improved, do another step in the same direction, otherwise do step in
the other direction (or do no change the values at all).

Yes, I've had some lectures on non-linear programming so I'm aware that
this won't work if the cost function has multiple extremes (walleys /
hills etc.) but I somehow suppose that's not the case of cost estimates.

Another issue is that when measuring multiple values (processing of
different requests), the decisions may be contradictory so it really
can't be fully automatic.

regards
Tomas

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Performance
Next
From: Claudio Freire
Date:
Subject: Re: Performance