From: Greg Stark
Subject: Re: verbose cost estimate
Msg-id: CAM-w4HO8FZckhf1AAzx=8RzRcJSJ3jDg_kz+K4yvE9JgDrgHEw@mail.gmail.com
In response to: Re: verbose cost estimate (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
> >Justin Pryzby <pryzby@telsasoft.com> writes:
> >> Jeff said:
> >>> |What would I find very useful is a verbosity option to get the cost
> >>> |estimates expressed as a multiplier of each *_cost parameter, rather than
> >>> |just as a scalar.
> >
> >> It seems to me that's "just" a matter of redefining Cost and fixing everything that breaks:
> >
> >> struct Cost {
> >>         double seq, rand;
> >>         double cpu_tuple, cpu_index_tuple, cpu_oper;
> >>         double parallel_setup; // This is probably always in startup_cost and never in run_cost
> >>         double parallel_tuple; // This is probably always in run_cost and never in startup_cost
> >>         double disable;
> >> };
> >
> >> I'm perhaps 50% done with that - is there some agreement that's a desirable
> >> goal and a good way to do it ?
> >
> >No, I think this will get rejected out of hand.  The implications for
> >the planner's speed and memory consumption seem quite unacceptable
> >for the size of the benefit.  What you're showing above probably
> >doubles the size of most Paths, and the added cycles in hot-spots
> >like add_path seem pretty daunting.
> >
>
> Yeah, that's an issue. But I have to admit my main issue with this
> proposal is that I have no idea how I'd interpret this Cost. I mean,
> what do the fields express for different types of paths? How do they
> contribute to the actual cost of that path?

What I think users would be able to do with this info is understand
which parameter to tweak to raise the estimated cost of the node.

Everyone knows that if an index scan is being used but is taking
longer than a sequential scan, you might try raising
random_page_cost. But I rarely see people tweaking the more "exotic"
parameters like cpu_operator_cost or cpu_index_tuple_cost, and when
they do they aren't really sure which nodes they're affecting...
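
To make that concrete, here's a rough sketch (Python, with invented
coefficients; not actual planner output) of what a per-parameter
breakdown for a single index scan node could look like, and how it
shows which knob actually moves the node's cost:

    # Hypothetical breakdown in the spirit of Justin's struct Cost:
    # the node's total cost is sum(coefficient * GUC setting).
    # All coefficients below are made up for illustration.
    coeffs = {
        "random_page_cost":     150.0,   # pages fetched non-sequentially
        "cpu_index_tuple_cost": 5000.0,  # index entries examined
        "cpu_tuple_cost":       5000.0,  # heap tuples processed
        "cpu_operator_cost":    5000.0,  # quals/operators evaluated
    }
    gucs = {  # stock defaults
        "random_page_cost": 4.0,
        "cpu_index_tuple_cost": 0.005,
        "cpu_tuple_cost": 0.01,
        "cpu_operator_cost": 0.0025,
    }
    total = sum(c * gucs[name] for name, c in coeffs.items())
    print(f"total cost = {total:.1f}")
    for name, c in coeffs.items():
        print(f"  {name}: {c * gucs[name]:.1f}")  # each term's share

With these numbers random_page_cost contributes 600 of the 687.5
total, so it's the one parameter that meaningfully moves this node;
the CPU terms barely register.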

I remember planning to do a very similar thing back in the 8.3 era and
never getting around to it. You could imagine even storing these
per-parameter costs for the overall plan in the logs and building a
large matrix of actual execution times versus the broken-out
individual costs. Then it becomes a standard linear optimization
problem to find the values for each parameter that minimize the plan
estimation error (and to identify the cases that are outliers).
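
A minimal sketch of that fitting step, assuming the per-parameter
coefficient totals and measured runtimes have already been extracted
from the logs (the numbers below are synthetic, and ordinary least
squares is just one way to cast the minimization):

    import numpy as np

    # One row per executed plan; one column per cost parameter
    # (summed page-fetch, tuple, operator counts from the logs).
    A = np.array([
        [600.0,  1000.0,  200.0],
        [ 50.0, 20000.0, 8000.0],
        [300.0,  5000.0, 1500.0],
        [ 10.0, 40000.0, 9000.0],
    ])
    y = np.array([2.5, 1.1, 1.6, 2.0])  # measured runtimes (seconds)

    # Solve for per-parameter costs so that A @ params ~= y.
    params, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
    print("fitted parameters:", params)
    # Large per-plan errors flag the outliers worth investigating.
    print("per-plan error:", A @ params - y)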

-- 
greg


