Home > mailing lists

Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices - Mailing list pgsql-performance

From	Dave Nicponski
Subject	Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices
Date	November 3, 2017 18:56:45
Msg-id	CACuOpDbn4ztuGxmckim8QwavxmpOuF6X+iR-HWu4F7PawEEP2Q@mail.gmail.com Whole thread Raw
In response to	Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices (Gunther <raj@gusw.net>)
List	pgsql-performance

Tree view

Thank you Gunther for bringing this up. It's been bothering me quite a bit over time as well.

Forgive the naive question, but does the query planner's cost estimator only track a single estimate of cost that gets accumulated and compared across plan variants? Or is it keeping a range or probabilistic distribution? I'm suspecting the former, but i bet either of the latter would fix this rapidly.

The cases that frustrate me are where NL is chosen over something like HJ, where if the query planner is slightly wrong on the lower side, then NL would certainly beat HJ (but by relatively small amounts), but a slight error on the higher side mean that the NL gets punished tremendously, do to the big-o penalty difference it's paying over the HJ approach. Having the planner with some notion of the distribution might help it make a better assessment of the potential consequences for being slightly off in its estimates. If it notices that being off on a plan involving a NL sends the distribution off into hours instead of seconds, it could potentially avoid it even if it might be slightly faster in the mean.

<fantasy> If i ever find time, maybe i'll try to play around with this idea and see how it performs... </fantasy>

-dave-

On Fri, Nov 3, 2017 at 11:13 AM, Gunther <raj@gusw.net> wrote:

On 11/3/2017 10:55, legrand legrand wrote:
To limit NL usage, wouldn't a modified set of Planner Cost Constants
https://www.postgresql.org/docs/current/static/runtime-config-query.html
<https://www.postgresql.org/docs/current/static/runtime-config-query.html>

seq_page_cost
random_page_cost
cpu_tuple_cost
cpu_index_tuple_cost
cpu_operator_cost

be more hash join freindly (as Oracle' optimizer_index_cost_adj )?

I twiddled with some of these and could nudge it toward a Sort Merge instead NL. But it's hit or miss.

May be there should be a tool which you can run periodically which will test out the installation to see how IO, CPU, and memory performs. Or, again, these statistics should be collected during normal operation so that nobody needs to guess them or test them in complex procedures. As the system runs, it should sample the seq_page_cost and random_page_cost (noticing that it has a SSD or HDD) and it should see how much disk read is from cache and how much goes out to disk. Why isn't the executor of queries the best person to ask for these cost constants?

regards,
-Gunther

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Dave Nicponski

Chief Technology Officer

917.696.3081

dave@seamlessdocs.com

30 Vandam Street. 2nd Floor. NYC
855.77.SEAMLESS | SeamlessGov.com

pgsql-performance by date:

From: Gunther
Date: 03 November 2017, 18:13:35
Subject: Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices

From: 刘瑞
Date: 04 November 2017, 18:20:42
Subject: [PERFORM] Unnecessary DISTINCT while primary key in SQL

Re: [PERFORM] Re: OLAP/reporting queries fall into nested loops overseq scans or other horrible planner choices - Mailing list pgsql-performance

Previous

Next