Re: verbose cost estimate - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: verbose cost estimate
Date:
Msg-id: 20191209232546.kfv2tojn42g3ccjq@development
In response to: Re: verbose cost estimate (Greg Stark <stark@mit.edu>)
Responses: Re: verbose cost estimate
List: pgsql-hackers
On Mon, Dec 09, 2019 at 05:27:01PM -0500, Greg Stark wrote:
>On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
>> >Justin Pryzby <pryzby@telsasoft.com> writes:
>> >> Jeff said:
>> >>> |What would I find very useful is a verbosity option to get the cost
>> >>> |estimates expressed as a multiplier of each *_cost parameter, rather than
>> >>> |just as a scalar.
>> >
>> >> It seems to me that's "just" a matter of redefining Cost and fixing
>> >> everything that breaks:
>> >
>> >> struct Cost {
>> >>     double seq, rand;
>> >>     double cpu_tuple, cpu_index_tuple, cpu_oper;
>> >>     double parallel_setup; // This is probably always in startup_cost and never in run_cost
>> >>     double parallel_tuple; // This is probably always in run_cost and never in startup_cost
>> >>     double disable;
>> >> };
>> >
>> >> I'm perhaps 50% done with that - is there some agreement that's a
>> >> desirable goal and a good way to do it ?
>> >
>> >No, I think this will get rejected out of hand. The implications for
>> >the planner's speed and memory consumption seem quite unacceptable
>> >for the size of the benefit. What you're showing above probably
>> >doubles the size of most Paths, and the added cycles in hot-spots
>> >like add_path seem pretty daunting.
>> >
>>
>> Yeah, that's an issue. But I have to admit my main issue with this
>> proposal is that I have no idea how I'd interpret this Cost. I mean,
>> what do the fields express for different types of paths? How do they
>> contribute to the actual cost of that path?
>
>What I think users would be able to do with this info is understand
>which parameter to tweak to raise the estimated cost of the node.
>
>Everyone knows that if you see an index scan being used but taking
>longer than a sequential scan, you might try raising
>random_page_cost. But I rarely see people tweaking the more "exotic"
>parameters like cpu_operator_cost or cpu_index_tuple_cost, and when
>they do they aren't really sure which nodes they're affecting...
>

Well, but that's kinda my point - how would you know that you need to
increase random_page_cost, or how big an influence it has? The total
is a fairly non-trivial combination of various cost parameters,
effective cache size, etc. Maybe I just don't understand how the cost
is split into those pieces, named the same as the cost GUCs ...

>I remember planning to do a very similar thing back in the 8.3 era and
>never getting around to it. You could imagine even storing these for
>the overall plan in the logs and building a large matrix of actual
>execution values versus these broken-out individual costs. Then it
>becomes a standard linear optimization problem to find the optimal
>values for each parameter to minimize inaccurate plan estimates (and
>to identify cases where there are outliers).
>

Maybe, but that's for one query. If you do this for many queries, the
results may easily be contradictory, no?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
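To make the decomposition concrete, here is a minimal, compilable
sketch of how such a cost vector could be collapsed back into the
scalar cost the planner compares today. All names and numbers are
illustrative assumptions, not the actual patch; the GUC values shown
are the PostgreSQL defaults, and the "disable" component is left out
because it is a compile-time constant rather than a GUC.

#include <stdio.h>

/* Hypothetical decomposed cost: each field counts units of work. */
typedef struct CostVector
{
    double      seq;             /* sequential page fetches */
    double      rand;            /* random page fetches */
    double      cpu_tuple;       /* tuples processed */
    double      cpu_index_tuple; /* index entries processed */
    double      cpu_oper;        /* operator/function evaluations */
    double      parallel_setup;  /* parallel worker startups */
    double      parallel_tuple;  /* tuples passed from worker to leader */
} CostVector;

/* Default values of the corresponding cost GUCs. */
static const double seq_page_cost = 1.0;
static const double random_page_cost = 4.0;
static const double cpu_tuple_cost = 0.01;
static const double cpu_index_tuple_cost = 0.005;
static const double cpu_operator_cost = 0.0025;
static const double parallel_setup_cost = 1000.0;
static const double parallel_tuple_cost = 0.1;

/* Collapse the unit counts into the scalar cost used today. */
static double
cost_vector_total(const CostVector *c)
{
    return c->seq * seq_page_cost +
           c->rand * random_page_cost +
           c->cpu_tuple * cpu_tuple_cost +
           c->cpu_index_tuple * cpu_index_tuple_cost +
           c->cpu_oper * cpu_operator_cost +
           c->parallel_setup * parallel_setup_cost +
           c->parallel_tuple * parallel_tuple_cost;
}

int
main(void)
{
    /* made-up index scan: 100 random pages, 1000 heap and index
     * tuples, 2000 operator evaluations */
    CostVector  c = {0, 100, 1000, 1000, 2000, 0, 0};

    printf("total cost = %.2f\n", cost_vector_total(&c));
    return 0;
}

Seen this way, the "which GUC do I tweak" question has a direct
answer: the random_page_cost term dominates this node's total (400 of
420), so changing that GUC moves this path's cost almost
proportionally, while cpu_operator_cost barely matters.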
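Greg's fitting idea can be sketched the same way, again with made-up
numbers: given logged per-node unit counts and measured execution
times, fitting per-unit costs is ordinary least squares, solved here
via the normal equations. With many queries the system is
overdetermined, which is where Tomas's contradiction worry shows up as
a nonzero residual rather than an exact solution.

#include <stdio.h>

#define NOBS 4   /* logged plan nodes (hypothetical) */
#define NPAR 2   /* parameters fitted: seq_page_cost, random_page_cost */

int
main(void)
{
    /* unit counts per node: {sequential pages, random pages} */
    double      A[NOBS][NPAR] = {
        {1000, 10},
        {200, 500},
        {5000, 0},
        {50, 2000},
    };
    /* measured execution "cost" of each node (made-up numbers) */
    double      t[NOBS] = {1040, 2200, 5000, 8050};

    double      M[NPAR][NPAR] = {{0}};
    double      b[NPAR] = {0};
    double      x[NPAR];

    /* build the normal equations: M = A^T A, b = A^T t */
    for (int i = 0; i < NOBS; i++)
        for (int j = 0; j < NPAR; j++)
        {
            b[j] += A[i][j] * t[i];
            for (int k = 0; k < NPAR; k++)
                M[j][k] += A[i][j] * A[i][k];
        }

    /* Gaussian elimination (no pivoting; fine for this toy example) */
    for (int i = 0; i < NPAR; i++)
        for (int j = i + 1; j < NPAR; j++)
        {
            double      f = M[j][i] / M[i][i];

            for (int k = i; k < NPAR; k++)
                M[j][k] -= f * M[i][k];
            b[j] -= f * b[i];
        }

    /* back-substitution */
    for (int i = NPAR - 1; i >= 0; i--)
    {
        x[i] = b[i];
        for (int k = i + 1; k < NPAR; k++)
            x[i] -= M[i][k] * x[k];
        x[i] /= M[i][i];
    }

    printf("fitted seq_page_cost    = %.3f\n", x[0]);
    printf("fitted random_page_cost = %.3f\n", x[1]);
    return 0;
}

These four observations happen to be consistent with seq_page_cost = 1
and random_page_cost = 4, so the fit recovers those values exactly;
real logs from many queries would not be consistent, and the residual
(and its outliers) is what Greg suggests examining.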