From: Tomas Vondra
Subject: Re: verbose cost estimate
Msg-id: 20191209232546.kfv2tojn42g3ccjq@development
In response to: Re: verbose cost estimate (Greg Stark <stark@mit.edu>)
List: pgsql-hackers

On Mon, Dec 09, 2019 at 05:27:01PM -0500, Greg Stark wrote:
>On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
>> >Justin Pryzby <pryzby@telsasoft.com> writes:
>> >> Jeff said:
>> >>> |What would I find very useful is a verbosity option to get the cost
>> >>> |estimates expressed as a multiplier of each *_cost parameter, rather than
>> >>> |just as a scalar.
>> >
>> >> It seems to me that's "just" a matter of redefining Cost and fixing everything that breaks:
>> >
>> >> struct Cost {
>> >>         double seq, rand;
>> >>         double cpu_tuple, cpu_index_tuple, cpu_oper;
>> >>         double parallel_setup; // This is probably always in startup_cost and never in run_cost
>> >>         double parallel_tuple; // This is probably always in run_cost and never in startup_cost
>> >>         double disable;
>> >> };
>> >
>> >> I'm perhaps 50% done with that - is there some agreement that's a desirable
>> >> goal and a good way to do it?
>> >
>> >No, I think this will get rejected out of hand.  The implications for
>> >the planner's speed and memory consumption seem quite unacceptable
>> >for the size of the benefit.  What you're showing above probably
>> >doubles the size of most Paths, and the added cycles in hot-spots
>> >like add_path seem pretty daunting.
>> >
>>
>> Yeah, that's an issue. But I have to admit my main issue with this
>> proposal is that I have no idea how I'd interpret this Cost. I mean,
>> what do the fields express for different types of paths? How do they
>> contribute to the actual cost of that path?
>
>What I think users would be able to do with this info is understand
>which parameter to tweak to raise the estimated cost of the node.
>
>Everyone knows that if you see an index scan being used but taking
>longer than a sequential scan, you might try raising
>random_page_cost. But I rarely see people tweaking the more "exotic"
>parameters like cpu_operator_cost or cpu_index_tuple_cost, and when
>they do they aren't really sure which nodes they're affecting...
>

Well, but that's kinda my point - how would you know that you need to
increase random_page_cost, or how big an influence it has? The total is a
fairly non-trivial combination of various cost parameters, effective
cache size, etc. Maybe I just don't understand how the cost is split into
those pieces, named the same as the cost GUCs ...
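
Just to illustrate what I mean, here is a simplified sketch of how even
the cheapest path comes together (my own toy example with made-up
numbers, not actual planner code):

#include <stdio.h>

int main(void)
{
    double seq_page_cost = 1.0;   /* GUC default */
    double cpu_tuple_cost = 0.01; /* GUC default */
    double pages = 1000.0;        /* made-up relation size */
    double tuples = 100000.0;

    /* roughly what cost_seqscan() does, ignoring quals etc. */
    double total = seq_page_cost * pages + cpu_tuple_cost * tuples;

    printf("total cost = %.2f\n", total); /* 2000.00 */
    return 0;
}

The disk part and the CPU part each contribute 1000 here, so the same
total could equally come from very different parameter mixes - which is
exactly why the scalar alone tells you so little about which GUC to
tweak.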

>I remember planning to do a very similar thing back in the 8.3 era and
>never getting around to it. You could imagine even storing these for
>the overall plan in the logs and building a large matrix of actual
>execution values versus these broken-out individual costs. Then it
>becomes a standard linear optimization problem to find the optimal
>values for each parameter to minimize inaccurate plan estimates (and
>to identify cases where there are outliers).
>

Maybe, but that's for one query. If you do this for many queries, the
results may easily be contradictory, no?
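
To make the fitting step concrete, here is a toy least-squares sketch of
how I read Greg's idea (invented numbers, two parameters only - not code
from this thread):

#include <stdio.h>

int main(void)
{
    /* per-node observations: {seq_page units, random_page units, actual ms} */
    double obs[][3] = {
        {1000,    0,  950},
        { 200,  500, 2300},
        {  50, 1200, 4700},
        { 800,  100, 1250},
    };
    int n = sizeof obs / sizeof obs[0];

    /* ordinary least squares via the normal equations (X^T X) b = X^T y,
     * solved by Cramer's rule since there are only two unknowns */
    double sxx = 0, sxy = 0, syy = 0, sxz = 0, syz = 0;
    for (int i = 0; i < n; i++)
    {
        double x = obs[i][0], y = obs[i][1], z = obs[i][2];
        sxx += x * x; sxy += x * y; syy += y * y;
        sxz += x * z; syz += y * z;
    }
    double det = sxx * syy - sxy * sxy;

    printf("fitted seq cost    ~ %.4f ms/unit\n",
           (sxz * syy - sxy * syz) / det);
    printf("fitted random cost ~ %.4f ms/unit\n",
           (sxx * syz - sxy * sxz) / det);
    return 0;
}

Each query adds one row and the fit happily averages them all - but two
workloads can pull the same parameter in opposite directions, and then no
single value fits both well. That's the contradiction I worry about.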


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


