Thread: Convert MAX_SAOP_ARRAY_SIZE to new guc
Summary:
Create new guc array_optimization_size_limit and use it to replace
MAX_SAOP_ARRAY_SIZE in predtest.c.
Among other things, this allows tuning when `col IN (1,2,3)` style
expressions can be matched against partial indexes.
It also fixes the comment:
"XXX is it worth exposing this as a GUC knob?"
Status:
The attached patch applies cleanly to master, builds without error,
and passes tests locally.
Thanks,
James Coleman
Note: the original email from David went to my spam folder, and it also didn't show up in the archives (I assume it was caught by a spam filter there too?)
> Thanks for taking this on!
>
> As far as you can tell, is the default correct at 100?
I'm not sure what a good way of measuring it would be (that is, what all the possible cases are). I did try very simple SELECT * FROM t WHERE i IN (...) style queries with increasing size and was able to see increased planning time, but nothing staggering (going from 1000 to 2000 items increased planning time from ~1.5ms to ~2.5ms, in an admittedly very unscientific test).
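The test was roughly along these lines (table name invented; timings
will vary):

    CREATE TABLE t (i int);
    -- EXPLAIN's SUMMARY option reports planning time without running
    -- the query; repeat with progressively longer IN lists (e.g. 1000
    -- vs 2000 literals) to observe the planning-time growth above.
    EXPLAIN (SUMMARY) SELECT * FROM t WHERE i IN (1, 2, 3, 4, 5);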
I think it's reasonable to leave the default at 100 for now. You could make an argument for increasing it since the limit currently affects whether scalar array ops can use partial indexes with "foo is not null" conditions, but I think that's better solved more holistically, as I've attempted to do in https://www.postgresql.org/message-id/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com
> What are some issues that might arise if it's set too low/too high?
Too low would result in queries being planned unsatisfactorily (i.e., scalar array ops switching from partial index scans to seq scans), and setting it too high could significantly increase planning time.
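In a session with the patch applied, tuning would look like this
(a sketch only, assuming the GUC name from the patch as posted):

    -- allow larger IN lists to be proven against partial indexes, at
    -- the cost of extra planning time:
    SET array_optimization_size_limit = 1000;
    -- or cap predicate-proof effort for very large arrays:
    SET array_optimization_size_limit = 10;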
On Fri, Nov 9, 2018 at 1:32 PM James Coleman <jtc331@gmail.com> wrote:
> Summary:
> Create new guc array_optimization_size_limit and use it to replace
> MAX_SAOP_ARRAY_SIZE in predtest.c.
>
> Status:
> The attached patch applies cleanly to master, builds without error,
> and passes tests locally.
Confirmed that it applies and builds cleanly and passes regression tests without error in my environment (osx/clang).
My main comment is that the description of the purpose of the GUC doesn't help me understand when or why I might want to alter it from the default value. If nobody is going to alter it, because nobody understands it, it might as well remain a compile-time constant.
+ <para>
+ Sets the array size limit beyond which predicate optimization is not used.
+ The optimizer evaluates scalar array expressions to determine if they can
+ be treated as AND or OR clauses. This optimization proving is only performed
+ if the array contains at most this many items.
+ The default is <literal>100</literal>.
+ </para>
If I lower the value, what problem or use case do I solve? If I increase it, what do I solve? What gets faster or slower at different settings of the value? The description doesn't mention using the "IN" SQL clause which is the use case the parameter targets. I'd suggest alternate wording, but I'm actually still not 100% sure how a larger value would change the behaviour of "IN" in the presence of large numbers of values?
P.
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> On Fri, Nov 9, 2018 at 1:32 PM James Coleman <jtc331@gmail.com> wrote:
>> Create new guc array_optimization_size_limit and use it to replace
>> MAX_SAOP_ARRAY_SIZE in predtest.c.

> My main comment is that the description of the purpose of the GUC doesn't
> help me understand when or why I might want to alter it from the default
> value. If nobody is going to alter it, because nobody understands it, it
> might as well remain a compile-time constant.

Yeah, that's sort of my reaction as well. I also feel like this is a
mighty special case to expose as a separate GUC. There are other magic
effort-limiting constants elsewhere in the planner --- we just added a
new one in e3f005d97, for instance --- and I can't get very excited
about exposing and trying to document them individually. We also have
a lot of existing exposed knobs like join_collapse_limit and the
various geqo parameters, which basically nobody knows how to use, a
precedent that isn't encouraging for adding more.

There have been occasional discussions of inventing a master "planner
effort" control knob, with values say 1..10 [1], and allowing that one
thing to control all these decisions, as well as other things we might
do in the future that would cause increased planning time that might
or might not get paid back. I'd rather see somebody put effort into
designing a coherent feature like that than figure out how to document
finer-grained knobs.

			regards, tom lane

[1] ... but my inner Spinal Tap fan wants it to go to 11.
> > My main comment is that the description of the purpose of the GUC doesn't
> > help me understand when or why I might want to alter it from the default
> > value. If nobody is going to alter it, because nobody understands it, it
> > might as well remain a compile-time constant.
>
> Yeah, that's sort of my reaction as well. I also feel like this is a
> mighty special case to expose as a separate GUC. There are other magic
> effort-limiting constants elsewhere in the planner --- we just added a
> new one in e3f005d97, for instance --- and I can't get very excited about
> exposing and trying to document them individually. We also have a lot
> of existing exposed knobs like join_collapse_limit and the various geqo
> parameters, which basically nobody knows how to use, a precedent that
> isn't encouraging for adding more.

I'd be happy to yank this in favor of my holistic solution to this
problem I posted recently on the mailing list [1].

Assuming we go that route, I'd propose we still yank the existing todo
comment about turning it into a GUC.

[1] https://www.postgresql.org/message-id/flat/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com
On Fri, 16 Nov 2018 at 14:00, James Coleman <jtc331@gmail.com> wrote:
>>> My main comment is that the description of the purpose of the GUC doesn't
>>> help me understand when or why I might want to alter it from the default
>>> value. If nobody is going to alter it, because nobody understands it, it
>>> might as well remain a compile-time constant.
>>
>> Yeah, that's sort of my reaction as well. I also feel like this is a
>> mighty special case to expose as a separate GUC. There are other magic
>> effort-limiting constants elsewhere in the planner --- we just added a
>> new one in e3f005d97, for instance --- and I can't get very excited about
>> exposing and trying to document them individually. We also have a lot
>> of existing exposed knobs like join_collapse_limit and the various geqo
>> parameters, which basically nobody knows how to use, a precedent that
>> isn't encouraging for adding more.
>
> I'd be happy to yank this in favor of my holistic solution to this
> problem I posted recently on the mailing list [1].
Not precisely sure what you mean - are you saying that we can just have an overall test for NOT NULL, which thereby avoids the need to expand the array and therefore dispenses with the GUC completely?
Having indexes defined using WHERE NOT NULL is a very important use case.
> Assuming we go that route, I'd propose we still yank the existing todo
> comment about turning it into a GUC.
Agreed
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 16 Nov 2018 at 14:00, James Coleman <jtc331@gmail.com> wrote:
>>> Yeah, that's sort of my reaction as well. I also feel like this is a
>>> mighty special case to expose as a separate GUC. There are other magic
>>> effort-limiting constants elsewhere in the planner --- we just added a
>>> new one in e3f005d97, for instance --- and I can't get very excited about
>>> exposing and trying to document them individually. We also have a lot
>>> of existing exposed knobs like join_collapse_limit and the various geqo
>>> parameters, which basically nobody knows how to use, a precedent that
>>> isn't encouraging for adding more.

>> I'd be happy to yank this in favor of my holistic solution to this
>> problem I posted recently on the mailing list [1].
>> [1] https://www.postgresql.org/message-id/flat/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com

> Not precisely sure what you mean - are you saying that we can just have an
> overall test for NOT NULL, which thereby avoids the need to expand the
> array and therefore dispenses with the GUC completely?

No, he's saying that other thing solves his particular problem.
We certainly have seen other cases where people wished they could
adjust MAX_SAOP_ARRAY_SIZE. I'm just not excited about exposing a GUC
that does exactly that one thing. I'd rather have some more-generic
knob that's not so tightly tied to implementation details.

			regards, tom lane
>> I'd be happy to yank this in favor of my holistic solution to this
>> problem I posted recently on the mailing list [1].
>>
>> [1] https://www.postgresql.org/message-id/flat/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com
>
> Not precisely sure what you mean - are you saying that we can just have an
> overall test for NOT NULL, which thereby avoids the need to expand the
> array and therefore dispenses with the GUC completely?
>
> Having indexes defined using WHERE NOT NULL is a very important use case.

I don't think we can avoid expanding the array for other cases (for
example, being able to infer "foo < 5" from "foo IN (1,2,3,4)"). If we
wanted to keep that inference without expanding the array we'd have to
(at minimum, I think) duplicate a lot of the existing inference logic,
but I haven't investigated it much.

So keeping the GUC could allow someone to tune how large an array can
be and still guarantee inferences like "foo < 5". But I'm not sure
that that is as valuable; at least I haven't run into cases where I've
noticed a need for it.

My patch only addresses the IS NOT NULL inference precisely for the
reason you state: we have lots of plans where (unless you tack an
explicit "foo IS NOT NULL" onto your query) the planner decides it
can't use WHERE NOT NULL indexes because it can't currently infer the
correctness of that for large arrays.
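To make the "foo < 5" inference concrete (a sketch; table and index
names invented):

    CREATE TABLE tbl (foo int);
    CREATE INDEX tbl_foo_lt5_idx ON tbl (foo) WHERE foo < 5;
    -- predtest.c proves "foo IN (1,2,3,4)" implies "foo < 5" by
    -- expanding the array into "foo = 1 OR foo = 2 OR ...", which it
    -- only does for arrays within the size limit.
    SELECT * FROM tbl WHERE foo IN (1, 2, 3, 4);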
On Thu, Nov 15, 2018 at 5:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There have been occasional discussions of inventing a master "planner
> effort" control knob, with values say 1..10 [1], and allowing that one
> thing to control all these decisions, as well as other things we might do
> in the future that would cause increased planning time that might or might
> not get paid back. I'd rather see somebody put effort into designing a
> coherent feature like that than figure out how to document finer-grained
> knobs.

FWIW, I find it hard to believe this will make users very happy. I
think it'll just lead to people complaining that they can't get
planner optimization A without paying the cost of planner
optimization B. The stuff that people have proposed grouping under a
planner effort knob is all pretty much corner case behavior, so a lot
of people won't get any benefit at all from turning up the knob, and
among those that do, there will probably be one specific behavior that
they want to enable, so the optimal value will be just high enough to
enable that behavior, and then they'll wonder why they can't just
enable that one thing.

I do think it would make users happy if you could make it a nice,
smooth curve: turning up the planner strength increases planning time
at a predictable rate and decreases execution time at a predictable
rate. But I really doubt it's possible to create something that works
that way.

> [1] ... but my inner Spinal Tap fan wants it to go to 11.

+1 for that, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Nov 15, 2018 at 5:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There have been occasional discussions of inventing a master "planner
>> effort" control knob, with values say 1..10 [1], and allowing that one
>> thing to control all these decisions, as well as other things we might do
>> in the future that would cause increased planning time that might or might
>> not get paid back. I'd rather see somebody put effort into designing a
>> coherent feature like that than figure out how to document finer-grained
>> knobs.

> FWIW, I find it hard to believe this will make users very happy. I
> think it'll just lead to people complaining that they can't get
> planner optimization A without paying the cost of planner
> optimization B.

I don't think so, because right now they (a) can't get either
optimization, and/or (b) don't know what either one does or how to
invoke it. Also (c) exposing such knobs creates
backwards-compatibility problems for us any time we want to change the
associated behavior, which is hardly an unlikely wish considering that
mostly these are kluges by definition. (Which contributes to the
documentation problem Paul already noted.)

A planner-effort knob would be really easy to understand, I think,
and we'd not be tied to any particular details about what it does.

> The stuff that people have proposed grouping under a planner
> effort knob is all pretty much corner case behavior,

One of the first things I'd replace with such a knob is
join_collapse_limit/from_collapse_limit, which is by no stretch
of the imagination a corner case.

> I do think it would make users happy if you could make it a nice,
> smooth curve: turning up the planner strength increases planning time
> at a predictable rate and decreases execution time at a predictable
> rate. But I really doubt it's possible to create something that works
> that way.

Yeah, it's unlikely that it'd be linear on any particular query.
There would be jumps in the planner runtime whenever relevant
thresholds were exceeded.

>> [1] ... but my inner Spinal Tap fan wants it to go to 11.

> +1 for that, though.

It's tempting to imagine that 10 means "highest reasonable effort
limits" and then 11 disengages all limits. The practical use of that
would be if you wanted to see whether the planner *could* produce the
plan you wanted and was just not trying hard enough.

			regards, tom lane
On Fri, Nov 16, 2018 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think so, because right now they (a) can't get either
> optimization, and/or (b) don't know what either one does or
> how to invoke it.

Sure. But as soon as they know that, they're just going to try to
figure out how to get the thing they want without the stuff they don't
want.

> A planner-effort knob would be really easy to understand, I think,
> and we'd not be tied to any particular details about what it does.

That's just wishful thinking. People will want to know what it does,
they'll want that to be documented, and they'll complain if it changes
from release to release. I mean, you've often taken the position that
people will notice and/or care deeply if our *C* interfaces change
from release to release, and even more so in a minor release. I think
you overstate that danger, but it must be admitted that the danger is
not zero. GUCs, unlike C functions, are unarguably part of the
exposed interface, and the danger there is considerably more, at least
IMHO.

> One of the first things I'd replace with such a knob is
> join_collapse_limit/from_collapse_limit, which is by no stretch
> of the imagination a corner case.

True. So then you'll have people who can't get sufficiently-high
collapse limits without enabling a bunch of other stuff they don't
care about, or on the other hand have to raise the collapse limits
higher than makes sense for them to get the other optimizations that
they want. They also won't be able to use the hack where you set
join_collapse_limit=1 to force a join ordering any more. And the
mapping of the 1..infinity collapse limit space onto the 1..10 planner
effort space is going to be basically totally arbitrary. There is no
hope at all that you're going to pick values that everyone likes.

I think it might be reasonable to set various individual GUCs to
values that mean "use the autoconfigure default" and then provide a
planner-strength GUC that varies that default. But I believe that
depriving people of the ability to control the settings individually
is bound to produce complaints, both for things where we already
expose them (like the collapse limits) and for things where we don't
(like $SUBJECT).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
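For readers unfamiliar with the hack Robert mentions (table names here
are invented for illustration):

    -- With join_collapse_limit = 1 the planner keeps explicit JOIN
    -- syntax in the order written, effectively forcing a join order.
    SET join_collapse_limit = 1;
    SELECT *
    FROM a
    JOIN b ON a.id = b.a_id
    JOIN c ON b.id = c.b_id;  -- joined strictly as (a, b), then c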