Thread: Convert MAX_SAOP_ARRAY_SIZE to new guc

Convert MAX_SAOP_ARRAY_SIZE to new guc

From
James Coleman
Date:
Summary:
Create new guc array_optimization_size_limit and use it to replace
MAX_SAOP_ARRAY_SIZE in predtest.c.

Among other things this allows tuning when `col IN (1,2,3)` style
expressions can be matched against partial indexes.
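For illustration, a hypothetical sketch of what the patch enables (table and index names are invented; the GUC name comes from the patch summary):

```sql
-- Hypothetical sketch: a partial index that predicate proving can match
-- against an IN list, but only while the array stays under the limit.
CREATE TABLE t (col integer);
CREATE INDEX t_col_partial ON t (col) WHERE col IS NOT NULL;

SET array_optimization_size_limit = 200;  -- proposed GUC from this patch

-- Provable: "col IN (1,2,3)" implies "col IS NOT NULL", so the partial
-- index is usable; with more elements than the limit, proving is skipped.
SELECT * FROM t WHERE col IN (1, 2, 3);
```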

It also fixes the comment:
"XXX is it worth exposing this as a GUC knob?"

Status:
The attached patch applies cleanly to master, builds without error,
and passes tests locally.

Thanks,
James Coleman

Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
James Coleman
Date:
Note: the original email from David went to my spam folder, and it also didn't show up on the archives (I assume caught by a spam filter there also?)

> Thanks for taking this on!

> As far as you can tell, is the default correct at 100?

I'm not sure what a good way of measuring it would be (that is, what all the possible cases are). I did try very simple SELECT * FROM t WHERE i IN (...) style queries of increasing size and was able to see increased planning time, but nothing staggering: going from 1,000 to 2,000 array elements increased planning time from ~1.5ms to ~2.5ms, in an admittedly very unscientific test.
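A rough sketch of that kind of test (psql-specific; the table t and column i are stand-ins, and the planning-time figure comes from EXPLAIN's summary line):

```sql
-- Generate an EXPLAIN with a 2000-element IN list and run it via \gexec;
-- the "Planning Time" summary line shows the cost of expanding the array.
SELECT format('EXPLAIN (SUMMARY) SELECT * FROM t WHERE i IN (%s)',
              string_agg(g::text, ','))
FROM generate_series(1, 2000) AS g
\gexec
```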

I think it's reasonable to leave the default at 100 for now. You could make an argument for increasing it since the limit currently affects whether scalar array ops can use partial indexes with "foo is not null" conditions, but I think that's better solved more holistically, as I've attempted to do in https://www.postgresql.org/message-id/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com
 
> What are some issues that might arise if it's set too low/too high?

Too low would result in queries being planned unsatisfactorily (i.e., scalar array ops switching from partial index scans to seq scans), and setting it too high could significantly increase planning time.

Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Paul Ramsey
Date:
On Fri, Nov 9, 2018 at 1:32 PM James Coleman <jtc331@gmail.com> wrote:
Summary:
Create new guc array_optimization_size_limit and use it to replace
MAX_SAOP_ARRAY_SIZE in predtest.c.

Status:
The attached patch applies cleanly to master, builds without error,
and passes tests locally.

Confirmed that it applies and builds cleanly and regresses without error in my environment (OS X/clang).

My main comment is that the description of the purpose of the GUC doesn't help me understand when or why I might want to alter it from the default value. If nobody is going to alter it, because nobody understands it, it might as well remain a compile-time constant.

+       <para>
+        Sets the array size limit beyond which predicate optimization is not used.
+        The optimizer evaluates scalar array expressions to determine if they can
+        be treated as AND or OR clauses. This optimization proving is only performed
+        if the array contains at most this many items.
+        The default is <literal>100</literal>.
+       </para>

If I lower the value, what problem or use case do I solve? If I increase it, what do I solve? What gets faster or slower at different settings of the value? The description doesn't mention using the "IN" SQL clause which is the use case the parameter targets. I'd suggest alternate wording, but I'm actually still not 100% sure how a larger value would change the behaviour of "IN" in the presence of large numbers of values?

P.

Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Tom Lane
Date:
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> On Fri, Nov 9, 2018 at 1:32 PM James Coleman <jtc331@gmail.com> wrote:
>> Create new guc array_optimization_size_limit and use it to replace
>> MAX_SAOP_ARRAY_SIZE in predtest.c.

> My main comment is that the description of the purpose of the GUC doesn't
> help me understand when or why I might want to alter it from the default
> value. If nobody is going to alter it, because nobody understands it, it
> might as well remain a compile-time constant.

Yeah, that's sort of my reaction as well.  I also feel like this is a
mighty special case to expose as a separate GUC.  There are other magic
effort-limiting constants elsewhere in the planner --- we just added a
new one in e3f005d97, for instance --- and I can't get very excited about
exposing and trying to document them individually.  We also have a lot
of existing exposed knobs like join_collapse_limit and the various geqo
parameters, which basically nobody knows how to use, a precedent that
isn't encouraging for adding more.

There have been occasional discussions of inventing a master "planner
effort" control knob, with values say 1..10 [1], and allowing that one
thing to control all these decisions, as well as other things we might do
in the future that would cause increased planning time that might or might
not get paid back.  I'd rather see somebody put effort into designing a
coherent feature like that than figure out how to document finer-grained
knobs.

            regards, tom lane

[1] ... but my inner Spinal Tap fan wants it to go to 11.


Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
James Coleman
Date:
> > My main comment is that the description of the purpose of the GUC doesn't
> > help me understand when or why I might want to alter it from the default
> > value. If nobody is going to alter it, because nobody understands it, it
> > might as well remain a compile-time constant.
>
> Yeah, that's sort of my reaction as well.  I also feel like this is a
> mighty special case to expose as a separate GUC.  There are other magic
> effort-limiting constants elsewhere in the planner --- we just added a
> new one in e3f005d97, for instance --- and I can't get very excited about
> exposing and trying to document them individually.  We also have a lot
> of existing exposed knobs like join_collapse_limit and the various geqo
> parameters, which basically nobody knows how to use, a precedent that
> isn't encouraging for adding more.

I'd be happy to yank this in favor of my holistic solution to this
problem I posted recently on the mailing list [1].

Assuming we go that route, I'd propose we still yank the existing todo
comment about turning it into a GUC.

[1] https://www.postgresql.org/message-id/flat/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com


Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Simon Riggs
Date:
On Fri, 16 Nov 2018 at 14:00, James Coleman <jtc331@gmail.com> wrote:
> > My main comment is that the description of the purpose of the GUC doesn't
> > help me understand when or why I might want to alter it from the default
> > value. If nobody is going to alter it, because nobody understands it, it
> > might as well remain a compile-time constant.
>
> Yeah, that's sort of my reaction as well.  I also feel like this is a
> mighty special case to expose as a separate GUC.  There are other magic
> effort-limiting constants elsewhere in the planner --- we just added a
> new one in e3f005d97, for instance --- and I can't get very excited about
> exposing and trying to document them individually.  We also have a lot
> of existing exposed knobs like join_collapse_limit and the various geqo
> parameters, which basically nobody knows how to use, a precedent that
> isn't encouraging for adding more.

I'd be happy to yank this in favor of my holistic solution to this
problem I posted recently on the mailing list [1].

Not precisely sure what you mean - are you saying that we can just have an overall test for NOT NULL, which thereby avoids the need to expand the array and therefore dispenses with the GUC completely?

Having indexes defined using WHERE NOT NULL is a very important use case.
 
Assuming we go that route, I'd propose we still yank the existing todo
comment about turning it into a GUC.

Agreed 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 16 Nov 2018 at 14:00, James Coleman <jtc331@gmail.com> wrote:
>>> Yeah, that's sort of my reaction as well.  I also feel like this is a
>>> mighty special case to expose as a separate GUC.  There are other magic
>>> effort-limiting constants elsewhere in the planner --- we just added a
>>> new one in e3f005d97, for instance --- and I can't get very excited about
>>> exposing and trying to document them individually.  We also have a lot
>>> of existing exposed knobs like join_collapse_limit and the various geqo
>>> parameters, which basically nobody knows how to use, a precedent that
>>> isn't encouraging for adding more.

>> I'd be happy to yank this in favor of my holistic solution to this
>> problem I posted recently on the mailing list [1].
>> [1] https://www.postgresql.org/message-id/flat/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com

> Not precisely sure what you mean - are you saying that we can just have an
> overall test for NOT NULL, which thereby avoids the need to expand the
> array and therefore dispenses with the GUC completely?

No, he's saying that other thing solves his particular problem.

We certainly have seen other cases where people wished they could adjust
MAX_SAOP_ARRAY_SIZE.  I'm just not excited about exposing a GUC that does
exactly that one thing.  I'd rather have some more-generic knob that's
not so tightly tied to implementation details.

            regards, tom lane


Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
James Coleman
Date:
>> I'd be happy to yank this in favor of my holistic solution to this
>> problem I posted recently on the mailing list [1].
>
> [1] https://www.postgresql.org/message-id/flat/CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg%40mail.gmail.com
>
> Not precisely sure what you mean - are you saying that we can just have an overall test for NOT NULL, which thereby avoids the need to expand the array and therefore dispenses with the GUC completely?
 
>
> Having indexes defined using WHERE NOT NULL is a very important use case.

I don't think we can avoid expanding the array for other cases (for
example, being able to infer "foo < 5" from "foo IN (1,2,3,4)"). If
we wanted to keep that inference without expanding the array we'd have
to (at minimum, I think) duplicate a lot of the existing inference
logic, but I haven't investigated it much.
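For instance (a hypothetical sketch; table and index names are invented):

```sql
-- A range-predicate partial index: the planner can only prove
-- "foo < 5" from "foo IN (1,2,3,4)" by expanding the array and
-- checking each element individually against the index predicate.
CREATE TABLE t2 (foo integer);
CREATE INDEX t2_foo_small ON t2 (foo) WHERE foo < 5;

SELECT * FROM t2 WHERE foo IN (1, 2, 3, 4);  -- partial index usable
```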

So keeping the GUC could allow someone to tune how large an array can
be and still guarantee inferences like "foo < 5". But I'm not sure
that's as valuable; at least I haven't run into cases where I've
noticed a need for it.

My patch only addresses the IS NOT NULL inference precisely for the
reason you state: for lots of queries, unless you tack an explicit
"foo IS NOT NULL" onto the query, the planner decides it can't use
WHERE ... IS NOT NULL partial indexes, because it can't currently infer
the correctness of that condition for large arrays.


Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Robert Haas
Date:
On Thu, Nov 15, 2018 at 5:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There have been occasional discussions of inventing a master "planner
> effort" control knob, with values say 1..10 [1], and allowing that one
> thing to control all these decisions, as well as other things we might do
> in the future that would cause increased planning time that might or might
> not get paid back.  I'd rather see somebody put effort into designing a
> coherent feature like that than figure out how to document finer-grained
> knobs.

FWIW, I find it hard to believe this will make users very happy.  I
think it'll just lead to people complaining that they can't get
planner optimization A without paying the cost of planner optimization
B.  The stuff that people have proposed grouping under a planner
effort knob is all pretty much corner case behavior, so a lot of
people won't get any benefit at all from turning up the knob, and
among those that do, there will probably be one specific behavior that
they want to enable, so the optimal value will be just high enough to
enable that behavior, and then they'll wonder why they can't just
enable that one thing.

I do think it would make users happy if you could make it a nice,
smooth curve: turning up the planner strength increases planning time
at a predictable rate and decreases execution time at a predictable
rate.  But I really doubt it's possible to create something that works
that way.

> [1] ... but my inner Spinal Tap fan wants it to go to 11.

+1 for that, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Nov 15, 2018 at 5:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There have been occasional discussions of inventing a master "planner
>> effort" control knob, with values say 1..10 [1], and allowing that one
>> thing to control all these decisions, as well as other things we might do
>> in the future that would cause increased planning time that might or might
>> not get paid back.  I'd rather see somebody put effort into designing a
>> coherent feature like that than figure out how to document finer-grained
>> knobs.

> FWIW, I find it hard to believe this will make users very happy.  I
> think it'll just lead to people complaining that they can't get
> planner optimization A without paying the cost of planner optimization
> B.

I don't think so, because right now they (a) can't get either
optimization, and/or (b) don't know what either one does or
how to invoke it.

Also (c) exposing such knobs creates backwards-compatibility problems for
us any time we want to change the associated behavior, which is hardly
an unlikely wish considering that mostly these are kluges by definition.
(Which contributes to the documentation problem Paul already noted.)

A planner-effort knob would be really easy to understand, I think,
and we'd not be tied to any particular details about what it does.

> The stuff that people have proposed grouping under a planner
> effort knob is all pretty much corner case behavior,

One of the first things I'd replace with such a knob is
join_collapse_limit/from_collapse_limit, which is by no stretch
of the imagination a corner case.

> I do think it would make users happy if you could make it a nice,
> smooth curve: turning up the planner strength increases planning time
> at a predictable rate and decreases execution time at a predictable
> rate.  But I really doubt it's possible to create something that works
> that way.

Yeah, it's unlikely that it'd be linear on any particular query.
There would be jumps in the planner runtime whenever relevant
thresholds were exceeded.

>> [1] ... but my inner Spinal Tap fan wants it to go to 11.

> +1 for that, though.

It's tempting to imagine that 10 means "highest reasonable effort
limits" and then 11 disengages all limits.  The practical use of
that would be if you wanted to see whether the planner *could*
produce the plan you wanted and was just not trying hard enough.

            regards, tom lane


Re: Convert MAX_SAOP_ARRAY_SIZE to new guc

From
Robert Haas
Date:
On Fri, Nov 16, 2018 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think so, because right now they (a) can't get either
> optimization, and/or (b) don't know what either one does or
> how to invoke it.

Sure.  But as soon as they know that, they're just going to try to
figure out how to get the thing they want without the stuff they don't
want.

> A planner-effort knob would be really easy to understand, I think,
> and we'd not be tied to any particular details about what it does.

That's just wishful thinking.  People will want to know what it does,
they'll want that to be documented, and they'll complain if it changes
from release to release.

I mean, you've often taken the position that people will notice and/or
care deeply if our *C* interfaces change from release to release, and
even more so in a minor release.  I think you overstate that danger,
but it must be admitted that the danger is not zero.  GUCs, unlike C
functions, are unarguably part of the exposed interface, and the
danger there is considerably more, at least IMHO.

> One of the first things I'd replace with such a knob is
> join_collapse_limit/from_collapse_limit, which is by no stretch
> of the imagination a corner case.

True.  So then you'll have people who can't get sufficiently-high
collapse limits without enabling a bunch of other stuff they don't
care about, or on the other hand have to raise the collapse limits
higher than makes sense for them to get the other optimizations that
they want.  They also won't be able to use the hack where you set
join_collapse_limit=1 to force a join ordering any more.  And the
mapping of the 1..infinity collapse limit space onto the 1..10 planner
effort space is going to be basically totally arbitrary.  There is no
hope at all that you're going to pick values that everyone likes.
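The join_collapse_limit=1 trick referred to above, sketched (table names are invented):

```sql
-- With join_collapse_limit = 1 the planner does not reorder explicit
-- JOIN syntax, so the written join order is effectively forced.
SET join_collapse_limit = 1;

SELECT *
FROM a
JOIN b ON a.id = b.a_id
JOIN c ON b.id = c.b_id;  -- planned strictly as (a JOIN b) JOIN c
```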

I think it might be reasonable to set various individual GUCs to
values that mean "use the autoconfigure default" and then provide a
planner-strength GUC that varies that default.  But I believe that
depriving people of the ability to control the settings individually
is bound to produce complaints, both for things where we already
expose them (like the collapse limits) and for things where we don't
(like $SUBJECT).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company