"Marinos J. Yannikos" <mjy@geizhals.at> writes:
> Tom Lane wrote:
>> No, they are not that easy to determine. In particular I think the idea
>> of automatically feeding back error measurements is hopeless, because
>> you cannot tell which parameters are wrong.
> Isn't it just a matter of solving a system of equations in n variables (n
> being the number of parameters), where each equation describes the
> run time of a particular query?
If we knew all the variables involved, it might be (though since the
equations would be nonlinear, the solution would be more difficult than
you suppose). The real problems are:
1. There is lots of noise in any real-world measurement, mostly due to
competition from other processes.
2. There are effects we don't even try to model, such as the current
contents of kernel cache. Everybody who's done any work with Postgres
knows that for small-to-middling tables, running the same query twice in
a row will yield considerably different runtimes, because the second
time through all the data will be in kernel cache. But we don't have
any useful way to model that in the optimizer, since we can't see what
the kernel has in its buffers.
3. Even for the effects we do try to model, some of the equations are
pretty ad-hoc and might not fit real data very well. (I have little
confidence in the current correction for index order correlation, for
example.)
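(For the curious: that correction interpolates between a best-case and a
worst-case I/O estimate, weighting by the square of the index-order
correlation.  The Python fragment below sketches the general shape of that
kind of formula; it is an illustration, not a quote of costsize.c.)

def interpolated_index_io_cost(min_io_cost, max_io_cost, correlation):
    """Rough sketch of a correlation-based correction.

    min_io_cost  -- estimated I/O cost if index order matches heap order
                    perfectly (mostly sequential page fetches)
    max_io_cost  -- estimated I/O cost if index order and heap order are
                    unrelated (all random page fetches)
    correlation  -- index-order correlation statistic, between -1 and 1
    """
    csquared = correlation * correlation
    return max_io_cost + csquared * (min_io_cost - max_io_cost)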
In short, if you just try to fit the present cost equations to real
data, what you'll get will inevitably be "garbage in, garbage out".
You could easily end up with parameter values that are much less
realistic than the defaults.
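To make that concrete, here is a toy fit (a Python sketch; the model is
deliberately simplified, and linear, which the real cost equations are
not).  Even when the model is exactly right, a few cache hits and a bit of
timing noise are enough to drag the fitted parameters well away from the
true values:

# Toy illustration of fitting cost parameters to observed runtimes.
# The per-query "resource counts" and cost model are made up; the point
# is only how measurement noise skews the parameter estimates.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-query page counts: [sequential pages, random pages].
counts = rng.integers(1, 1000, size=(20, 2)).astype(float)

true_params = np.array([1.0, 4.0])        # seq page cost, random page cost
clean_runtimes = counts @ true_params

# Pretend some runs were served from kernel cache and finished far faster
# than the model expects, plus ordinary scheduling noise on top.
cached = rng.random(20) < 0.3
noisy_runtimes = clean_runtimes * np.where(cached, 0.1, 1.0)
noisy_runtimes *= rng.normal(1.0, 0.15, size=20)

fit_clean, *_ = np.linalg.lstsq(counts, clean_runtimes, rcond=None)
fit_noisy, *_ = np.linalg.lstsq(counts, noisy_runtimes, rcond=None)

print("true   :", true_params)
print("clean  :", fit_clean)    # recovers 1.0 and 4.0 exactly
print("noisy  :", fit_noisy)    # typically lands well away from the truth

And that is with a model that matches the data generator perfectly; with
imperfect cost equations the fit happily absorbs the modeling error into
the parameters as well.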
Over time we'll doubtless improve the optimizer's cost models, and
someday we might get to a point where this wouldn't be a fool's errand,
but I don't see it happening in the foreseeable future.
I think a more profitable approach is to set up special test code to try
to approximate the value of individual parameters measured in isolation.
For instance, the current default of 4.0 for random_page_cost was
developed through rather extensive testing a few years ago, and I think
it's still a decent average value (for the case where you are actually
doing I/O, mind you). But if your disks have particularly fast or slow
seek times, maybe it's not good for you. It might be useful to package
up a test program that repeats those measurements on particular systems
--- though the problem of noisy measurements still applies. It is not
easy or cheap to get a measurement that isn't skewed by kernel caching
behavior. (You need a test file significantly larger than RAM, and
even then you'd better repeat the measurement quite a few times to see
how much noise there is in it.)
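For what it's worth, such a test program would look roughly like the
sketch below (Python, with a hypothetical test-file path; this is an
outline of the approach, not the harness the 4.0 default came from).  It
assumes the test file already exists and is several times larger than RAM,
and it repeats the trial so you can see the noise:

# Rough sketch of measuring the random-vs-sequential page read ratio.
# Assumes TEST_FILE is several times larger than RAM, so the kernel
# cache cannot satisfy most of the reads.
import os, random, time

TEST_FILE = "/tmp/pg_seek_test.dat"     # hypothetical path
BLOCK = 8192                            # PostgreSQL page size
N_READS = 20000                         # page reads per trial
N_TRIALS = 5                            # repeat to expose the noise

n_blocks = os.path.getsize(TEST_FILE) // BLOCK

for trial in range(N_TRIALS):
    fd = os.open(TEST_FILE, os.O_RDONLY)

    # Sequential pass over N_READS consecutive pages.
    start = time.time()
    os.lseek(fd, 0, os.SEEK_SET)
    for _ in range(N_READS):
        os.read(fd, BLOCK)
    seq_per_page = (time.time() - start) / N_READS

    # The same number of pages, fetched in random order.
    start = time.time()
    for _ in range(N_READS):
        os.lseek(fd, random.randrange(n_blocks) * BLOCK, os.SEEK_SET)
        os.read(fd, BLOCK)
    rand_per_page = (time.time() - start) / N_READS

    os.close(fd)
    print("trial %d: ratio = %.1f" % (trial, rand_per_page / seq_per_page))

On a disk with unusually fast or slow seeks the printed ratios will drift
accordingly, which is exactly the case where overriding the default makes
sense --- but expect the trial-to-trial spread to be considerable.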
regards, tom lane