Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices - Mailing list pgsql-performance

From Dave Nicponski
Subject Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices
Date
Msg-id CACuOpDbn4ztuGxmckim8QwavxmpOuF6X+iR-HWu4F7PawEEP2Q@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices  (Gunther <raj@gusw.net>)
List pgsql-performance
Thank you Gunther for bringing this up.  It's been bothering me quite a bit over time as well.

Forgive the naive question, but does the query planner's cost estimator only track a single estimate of cost that gets accumulated and compared across plan variants?  Or is it keeping a range or probabilistic distribution?  I'm suspecting the former, but i bet either of the latter would fix this rapidly.

The cases that frustrate me are where NL is chosen over something like HJ, where if the query planner is slightly wrong on the lower side, then NL would certainly beat HJ (but by relatively small amounts), but a slight error on the higher side mean that the NL gets punished tremendously, do to the big-o penalty difference it's paying over the HJ approach.  Having the planner with some notion of the distribution might help it make a better assessment of the potential consequences for being slightly off in its estimates.  If it notices that being off on a plan involving a NL sends the distribution off into hours instead of seconds, it could potentially avoid it even if it might be slightly faster in the mean.

<fantasy> If i ever find time, maybe i'll try to play around with this idea and see how it performs... </fantasy>

   -dave-

On Fri, Nov 3, 2017 at 11:13 AM, Gunther <raj@gusw.net> wrote:
On 11/3/2017 10:55, legrand legrand wrote:
To limit NL usage, wouldn't a modified set of Planner Cost Constants
https://www.postgresql.org/docs/current/static/runtime-config-query.html
<https://www.postgresql.org/docs/current/static/runtime-config-query.html>

seq_page_cost
random_page_cost
cpu_tuple_cost
cpu_index_tuple_cost
cpu_operator_cost

be more hash join freindly (as Oracle' optimizer_index_cost_adj )?

I twiddled with some of these and could nudge it toward a Sort Merge instead NL. But it's hit or miss.

May be there should be a tool which you can run periodically which will test out the installation to see how IO, CPU, and memory performs. Or, again, these statistics should be collected during normal operation so that nobody needs to guess them or test them in complex procedures. As the system runs, it should sample the seq_page_cost and random_page_cost (noticing that it has a SSD or HDD) and it should see how much disk read is from cache and how much goes out to disk. Why isn't the executor of queries the best person to ask for these cost constants?

regards,
-Gunther



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--

Dave Nicponski

Chief Technology Officer

917.696.3081 

dave@seamlessdocs.com

30 Vandam Street. 2nd Floor. NYC 
855.77.SEAMLESS | SeamlessGov.com

pgsql-performance by date:

Previous
From: Gunther
Date:
Subject: Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices
Next
From: 刘瑞
Date:
Subject: [PERFORM] Unnecessary DISTINCT while primary key in SQL