Re: What about utility to calculate planner cost constants? - Mailing list pgsql-performance

From: Dave Held
Subject: Re: What about utility to calculate planner cost constants?
Date:
Msg-id: 49E94D0CFCD4DB43AFBA928DDD20C8F902618481@asg002.asg.local
In response to: What about utility to calculate planner cost constants?  ("Tambet Matiisen" <t.matiisen@aprote.ee>)
List: pgsql-performance
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, March 22, 2005 3:48 PM
> To: Greg Stark
> Cc: Christopher Browne; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] What about utility to calculate planner cost
> constants?
> [...]
> The difficulty with the notion of doing that measurement by timing
> Postgres operations is that it's a horribly bad experimental setup.
> You have no way to isolate the effects of just one variable, or even
> a small number of variables, which you really need to do if you want
> to estimate with any degree of precision.  What's more, there
> are plenty of relevant factors that aren't in the model at all (such
> as the extent of other load on the machine), and so the noise in the
> measurements will be enormous.
>
> And you can't just dismiss the issue of wrong cost models and say we
> can get numbers anyway.  We see examples almost every day on this
> list where the planner is so far off about indexscan vs seqscan costs
> that you'd have to put random_page_cost far below 1 to make its numbers
> line up with reality.  That's not a matter of tuning the parameter,
> it's evidence that the cost model is wrong.  If you try to solve for
> the "right value" of the parameter by comparing estimated and actual
> costs, you'll get garbage, even without any issues about noisy
> measurements or numerical instability of your system of equations.
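
As a toy illustration of that last point (purely my own, with made-up
numbers, nothing taken from the planner code): if the real cost contains a
term the model omits, say a cache-hit effect, then a least-squares fit of
random_page_cost simply absorbs the missing term into the parameter, and
the "best fit" value lands far below 1 without meaning anything physical.

# Toy illustration only -- hypothetical numbers, nothing from the planner.
# "Reality" has a cache-hit effect that the fitted model knows nothing about.
import numpy as np

rng = np.random.default_rng(0)
pages = rng.integers(100, 10_000, size=50)   # pages fetched per test query
cache_hit_fraction = 0.9                     # the effect missing from the model

# Actual cost: only the cache misses pay the random-I/O price, plus noise.
true_random_page_cost = 4.0
actual_cost = (pages * (1 - cache_hit_fraction) * true_random_page_cost
               + rng.normal(0, 50, size=pages.size))

# Model being fit: actual_cost ~= pages * random_page_cost (no cache term).
A = pages[:, None].astype(float)
fitted_rpc = np.linalg.lstsq(A, actual_cost, rcond=None)[0][0]
print("fitted random_page_cost = %.2f" % fitted_rpc)   # comes out around 0.4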

Then instead of building a fixed cost model, why not evolve an adaptive
model using an artificial neural network (ANN) or a genetic algorithm (GA)?
I can't say that I'm remotely familiar with
how the planner does its business, but perhaps we should throw away all
these tunable cost parameters and let a neural network create them
implicitly, if they really exist in any meaningful form.  I suppose the
inputs to the network would be the available scan types, the actual and
estimated rows, correlations, etc.  The outputs would be query plans, is
that right?  So we pick several representative data points in the query
space and train the network on those, to "bootstrap" it.  With any luck,
the network will generalize from the training inputs and do a halfway decent
job on real-world values.  If a user is unhappy with the way the network
is operating, they can switch on a training mode whereby the network
tries some different plans for a given query and uses the execution time
to judge which plans worked the best.  The alternative plans can be
suggested by built-in heuristics or perhaps built randomly.  Of course,
such training would only be practical for smaller data sets, but perhaps
there would be a way to let the network perform a query on a subset of
the data and then extrapolate the behavior of a plan over the full data
set.
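
To make that a little less hand-wavy, here is the sort of feedback loop I'm
picturing, strictly as a sketch: the feature names, the linear scorer, and
the training step are all invented for illustration and have nothing to do
with how the planner actually represents plans or costs.

# Rough sketch of the "training mode" feedback loop described above.
# All feature names and the scoring model are invented for illustration.
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

FEATURES = ["est_rows", "est_width", "uses_index", "correlation"]

@dataclass
class PlanChoice:
    features: Dict[str, float]    # planner-visible features of one candidate plan
    execute: Callable[[], float]  # runs the plan (perhaps on a data subset), returns seconds

@dataclass
class AdaptiveScorer:
    weights: Dict[str, float] = field(default_factory=lambda: {f: 0.0 for f in FEATURES})
    learning_rate: float = 0.01

    def score(self, plan: PlanChoice) -> float:
        # Lower predicted score = the model thinks the plan is cheaper.
        return sum(self.weights[f] * plan.features.get(f, 0.0) for f in FEATURES)

    def pick(self, candidates: List[PlanChoice], explore: float = 0.1) -> PlanChoice:
        # Normally exploit the current model; occasionally try another plan.
        if random.random() < explore:
            return random.choice(candidates)
        return min(candidates, key=self.score)

    def train(self, candidates: List[PlanChoice]) -> None:
        # "Training mode": run each candidate and nudge the weights so the
        # predicted score tracks the observed execution time.
        for plan in candidates:
            observed = plan.execute()
            error = self.score(plan) - observed
            for f in FEATURES:
                self.weights[f] -= self.learning_rate * error * plan.features.get(f, 0.0)

Whether something this crude could ever beat the hand-tuned constants is an
open question, but the point is that execution-time feedback, rather than a
fixed formula, is what ends up setting the weights.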

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
