Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters) - Mailing list pgsql-hackers

From Evgeniy Shishkin
Subject Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)
Date
Msg-id F83D5F6E-A42F-4270-9B5E-BD520575AAF2@gmail.com
Whole thread Raw
In response to More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters)
List pgsql-hackers
> On 30 Dec 2015, at 10:16, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> Hi,
>
> On [1] I suggested an idea to make improvements to the planner around the Equivalence Class code. Later in [2] Tom
raisedconcerns with this adding too many planning cycles for a perhaps not common enough situation.  I don't want to
discussthat particular patch here, I want to discuss more generally about the dilemma about adding more smarts to the
plannerto allow it to generate a more optimal plan in order to save on execution time. 
>
> In the case of the Equivalence Class Filters code, I quoted an example where pushing these filters down into the
joinedrelation caused a significant performance improvement to a query. Now, I understand Tom's concerns with slowing
downthe planner, as in cases where the query is short running, or the optimisations don't apply, then we could cause
thequery to overall (including planning time) perform worse. Nobody wants that, but on the other hand, if spending 5-10
extramicroseconds during planning equates to 6 hours shaved off execution time, then nobody would think to grudge that
extra5-10 microseconds during planning. 
>
> What I'd like to discuss here is what was touched on on that other thread on ways to get around this problem:
>
> A number of ideas were suggested on the other thread about how we might go about solving this problem. In [3] Simon
talkedabout perhaps enabling extra optimisations when the planner sees that the plan will cost more than some given
threshold.That's perhaps an option, but may not work well for optimisations which must take place very early in
planning,for example [4]. 
> Another idea which came up was from Evgeniy [5], which was more of a request not to do it this way, but
never-the-less,the idea was basically to add lots of GUCs to enable/disable each extra planner feature. 
>

Well, my idea was to track planning/execution cost in something like pg_stat_statements.
That way we can track actual time, not estimated cost like Simon proposed.

This table can be combined with Tomas proposal of plan caching.


> Another option which I've thought about previously was a planner_strength GUC, at which various additional
optimisationsare enabled at various predefined strength levels, so that databases which tend to spend a great deal more
executiontime compared to planning time can turn this up a bit to see if that helps change that ratio a bit.  This idea
isfar from perfect though, as who's to say that planner feature X should "kick in" before planner feature Y? I've also
oftenthought that it might be nice to have it so the planner does not modify the Parse object, so that the planner has
theoption to throw away what it's done so far and start planning all over again with the "planner_strength" knob turned
upto the maximum, if the cost happened to indicate that the query was going to take a long time to execute. 
>
> In reality we already have some planner features which are possible candidates for non essential optimisations. For
examplejoin removals likely don't apply in all that many cases, but when they do, this feature is a great win. So by
havingsome sort of ability to enable/disable planner features we also stand to actually speed the planner up for fast
simplequeries.  
>
> I do strongly believe that we need to come up with something to solve this problem. I already summarised my thoughts
onthe other thread. 
>
> I wrote:
> > I believe that with parallel query on the horizon for 9.6 that we're now
> > aiming to support bigger OLAP type database than ever before. So if we
> > ignore patches like this one then it appears that we have some conflicting
> > goals in the community as it seems that we're willing to add the brawn, but
> > we're not willing to add the brain. If this is the case then it's a shame,
> > as I think we can have both. So I very much agree on the fact that we must
> > find a way to maintain support and high performance of small OLTP databases
> > too.
>
> So here I'd very much like to kick off discussion on an acceptable way to solve this problem, in a realistic way
whichwe're all happy with. 
>
> Comments are of course welcome.
>
> [1] http://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A@mail.gmail.com
> [2] http://www.postgresql.org/message-id/30810.1449335261@sss.pgh.pa.us
> [3] http://www.postgresql.org/message-id/CANP8+jLRpRN4ynMsRkOqhYi-Dw5JrODMOt05qejhrAyrsExVmg@mail.gmail.com
> [4] http://www.postgresql.org/message-id/CAKJS1f_UZ_MXtpot6EPXsgHSujoUCrKuXYHLH06h072rDXsCzw@mail.gmail.com
> [5] http://www.postgresql.org/message-id/2F30BA8B-DAB9-4907-9E4E-102D242566E3@gmail.com
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_controldata/pg_resetxlog "Latest checkpoint's NextXID" format
Next
From: Tom Lane
Date:
Subject: --enable-depend by default (was Re: Patch: fix lock contention for HASHHDR.mutex)