Re: Hints proposal - Mailing list pgsql-performance

From Christopher Browne
Subject Re: Hints proposal
Date
Msg-id 87r6xd9b91.fsf@wolfe.cbbrowne.com
In response to Re: Hints proposal  (Arjen van der Meijden <acmmailing@tweakers.net>)
List pgsql-performance
Quoth rabroersma@yahoo.com (Richard Broersma Jr):
>> By the way, wouldn't it be possible if the planner learned from a query
>> execution, so it would know if a choice for a specific plan or estimate
>> was actually correct or not for future reference? Or is that in the line
>> of DB2's complexity and a very hard problem and/or would it add too much
>> overhead?
>
> Just thinking out-loud here...
>
> Wow, a learning cost-based planner sounds a lot like a problem for
> control & dynamical systems theory.

Alas, dynamic control theory, home of considerable numbers of
Hamiltonian equations, as well as Pontryagin's Minimum Principle, is
replete with:
 a) Gory multivariate calculus
 b) Need for all kinds of continuity requirements (e.g. - continuous,
    smooth functions with no discontinuities or other "nastiness");
    otherwise the math gets *really* nasty

We don't have anything even resembling "continuous" because our
measures are all discrete (e.g. - the base values are all integers).

> As I understand it, much of the advice given for setting
> PostgreSQL's tunable parameters is from "RULES-OF-THUMB."  I am
> sure that the effect on server performance from all of the parameters
> could be modeled, and an adaptive feedback controller could be
> designed to tune these parameters as demand on the server changes.

Optimal control theory loves the "bang-bang" control, where you go to
one extreme or another, which requires all those continuity conditions
I mentioned, and is almost certainly not the right answer here.

> Although, I suppose that a controller like this would have limited
> success, since some of the most effective parameters are not run-time
> tunable.
>
> In regards to query planning, I wonder if there is a way to model a
> controller that could adjust/alter query plans based on a comparison
> of expected and actual query execution times.

I think there would be something awesomely useful about recording
expected+actual statistics along with some of the plans.

The case that is easiest to argue for is where Actual >>> Expected
(e.g. - Actual "was a whole lot larger than" Expected); in such cases,
you've already spent a LONG time on the query, which means that
spending a millisecond recording the moral equivalent of "Explain
Analyze" output should be an immaterial cost.

If we could record a whole lot of these cases, and possibly, with some
anonymization / permissioning, feed the data to a central place, then
some analysis could be done to see if there's merit to particular
modifications to the query plan cost model.

Part of the *really* fundamental query optimization problem is that
there seems to be some evidence that the cost model isn't perfectly
reflective of the costs of queries.  Improving the quality of the cost
model is one of the factors that would improve the performance of the
query optimizer.  That would represent a fundamental improvement.
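
To put a number on it, here is a minimal sketch of the kind of
arithmetic the cost model does for a plain sequential scan (the page
and tuple counts are invented; the constants are just the stock
defaults for the per-page cost and cpu_tuple_cost):

  -- seq scan cost ~= pages_read * 1.0  +  tuples_scanned * 0.01
  --               ~= 10000      * 1.0  +  1000000        * 0.01
  --               ~= 20000 cost units

If those per-page and per-tuple constants don't track what the
hardware actually does (cached versus uncached reads being the classic
case), the plan that looks cheapest on paper need not be the one that
runs fastest, and that is precisely the gap that recorded
expected-versus-actual data could help close.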
--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/languages.html
"If I can see farther it is because I am surrounded by dwarves."
-- Murray Gell-Mann
