Thread: "Constraint exclusion" is not general enough

"Constraint exclusion" is not general enough

From: Tom Lane
I was just looking at Martin Lesser's gripe here:
http://archives.postgresql.org/pgsql-performance/2006-08/msg00053.php
about how the planner is not real bright about the filter conditions
it generates for a simple partitioning layout.  In particular it's
generating scans involving self-contradictory conditions:
 Result  (cost=0.00..33.20 rows=6 width=36)
   ->  Append  (cost=0.00..33.20 rows=6 width=36)
         ->  Seq Scan on t_parted  (cost=0.00..33.20 rows=6 width=36)
               Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))

which it seems we ought to be bright enough to notice.  In particular
I would argue that turning on constraint_exclusion ought to instruct
the planner to catch this sort of thing, whereas when it's off we
ought not expend the cycles.  I have a preliminary patch (below)
that seems to fix it.

The problem I'm having is that this isn't "constraint exclusion" anymore
--- it will in fact make useful deductions without a table constraint
anywhere in sight.  Should we rename the GUC variable, and if so to what?
Or just live with the misnomer?  I guess plan C would be to invent a
separate GUC variable for the other kind of test, but I can't see that
it's worth having two.  Thoughts?

BTW, the remaining infelicities in Martin's example stem from the fact that
predicate_refuted_by doesn't recognize "x IS NOT TRUE" as refuting "x".
Working on fixing that now.
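
Under SQL's three-valued logic, "x IS NOT TRUE" holds exactly when x is false or null, so no row can satisfy both it and a bare "x". A toy model of that refutation (illustrative only; these types and names are invented, not the actual predicate_refuted_by() code):

```c
#include <assert.h>
#include <stdbool.h>

/* SQL three-valued logic: a boolean expression is true, false, or null */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TruthValue;

/* "x IS NOT TRUE" is satisfied when x is false or null */
static bool is_not_true(TruthValue x)
{
    return x != TV_TRUE;
}

/* A bare "x" used as a filter passes a row only when x is true */
static bool passes_filter(TruthValue x)
{
    return x == TV_TRUE;
}

/* "x IS NOT TRUE" refutes "x": no truth value satisfies both */
static bool refuted(void)
{
    const TruthValue vals[] = { TV_FALSE, TV_TRUE, TV_NULL };

    for (int i = 0; i < 3; i++)
        if (is_not_true(vals[i]) && passes_filter(vals[i]))
            return false;       /* a witness would disprove refutation */
    return true;
}
```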

I'm also thinking that formulating the check as "constraints are
refuted by WHERE clause" might be unnecessarily restrictive.  Doesn't
the case where a constraint implies falsity of a WHERE clause likewise
tell us we needn't bother to scan?  Seems like we ought to put all the
conditions together and run a symmetric test named something like
"mutually_exclusive_conditions".  Maybe "mutual_exclusion" would be
a better name for the GUC variable.
        regards, tom lane

*** src/backend/optimizer/util/plancat.c.orig    Tue Aug  1 21:59:46 2006
--- src/backend/optimizer/util/plancat.c    Fri Aug  4 13:56:18 2006
***************
*** 444,455 ****
--- 444,478 ----
  bool
  relation_excluded_by_constraints(RelOptInfo *rel, RangeTblEntry *rte)
  {
+     List       *safe_restrictions;
      List       *constraint_pred;
+     List       *safe_constraints;
+     ListCell   *lc;
  
      /* Skip the test if constraint exclusion is disabled */
      if (!constraint_exclusion)
          return false;
  
+     /*
+      * Check for self-contradictory restriction clauses.  We dare not make
+      * deductions with non-immutable functions, but any immutable clauses that
+      * are self-contradictory allow us to conclude the scan is unnecessary.
+      *
+      * Note: strip off RestrictInfo because predicate_refuted_by() isn't
+      * expecting to see any in its predicate argument.
+      */
+     safe_restrictions = NIL;
+     foreach(lc, rel->baserestrictinfo)
+     {
+         RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ 
+         if (!contain_mutable_functions((Node *) rinfo->clause))
+             safe_restrictions = lappend(safe_restrictions, rinfo->clause);
+     }
+ 
+     if (predicate_refuted_by(safe_restrictions, safe_restrictions))
+         return true;
+ 
      /* Only plain relations have constraints */
      if (rte->rtekind != RTE_RELATION || rte->inh)
          return false;

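The check the patch adds can be modeled in miniature as interval intersection: reduce each immutable clause to a range on the column and see whether the ranges are jointly satisfiable. This is only an illustrative sketch; the real predicate_refuted_by() works on arbitrary immutable clause trees, and the representation below is invented:

```c
#include <assert.h>
#include <stdbool.h>
#include <limits.h>

/* One immutable restriction clause, reduced to a half-open
 * interval [lo, hi) on a single integer column */
typedef struct { int lo; int hi; } Clause;

/* The filter is self-contradictory iff the intersection of all the
 * intervals is empty; then the scan can be skipped outright */
static bool self_contradictory(const Clause *clauses, int n)
{
    int lo = INT_MIN, hi = INT_MAX;

    for (int i = 0; i < n; i++)
    {
        if (clauses[i].lo > lo) lo = clauses[i].lo;
        if (clauses[i].hi < hi) hi = clauses[i].hi;
    }
    return lo >= hi;            /* intersection [lo, hi) is empty */
}
```

With Martin's filter, (id1 >= 0 AND id1 < 100) intersected with (id1 >= 900 AND id1 < 1000) leaves the empty range [900, 100), so the scan on t_parted could be dropped.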

Re: "Constraint exclusion" is not general enough

From: Hannu Krosing
On Fri, 2006-08-04 at 14:40, Tom Lane wrote:
> I was just looking at Martin Lesser's gripe here:
> http://archives.postgresql.org/pgsql-performance/2006-08/msg00053.php
> about how the planner is not real bright about the filter conditions
> it generates for a simple partitioning layout.  In particular it's
> generating scans involving self-contradictory conditions:
> 
>  Result  (cost=0.00..33.20 rows=6 width=36)
>    ->  Append  (cost=0.00..33.20 rows=6 width=36)
>          ->  Seq Scan on t_parted  (cost=0.00..33.20 rows=6 width=36)
>                Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))
> 
> which it seems we ought to be bright enough to notice.  In particular
> I would argue that turning on constraint_exclusion ought to instruct
> the planner to catch this sort of thing, whereas when it's off we
> ought not expend the cycles.  I have a preliminary patch (below)
> that seems to fix it.
> 
> The problem I'm having is that this isn't "constraint exclusion" anymore
> --- it will in fact make useful deductions without a table constraint
> anywhere in sight. 

I'd just keep the name. The parts of the WHERE clause can be described as
constraints on what is returned :)


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: "Constraint exclusion" is not general enough

From: "Jim C. Nasby"
On Fri, Aug 04, 2006 at 02:40:30PM -0400, Tom Lane wrote:
> which it seems we ought to be bright enough to notice.  In particular
> I would argue that turning on constraint_exclusion ought to instruct
> the planner to catch this sort of thing, whereas when it's off we
> ought not expend the cycles.  I have a preliminary patch (below)
> that seems to fix it.

How many cycles are we talking about here? Is it even worth the GUC?

The most heavily loaded systems I've seen were doing on the order of 500
transactions a second (which would have been almost entirely
single-statement queries), so ISTM that you've got to be burning a
pretty good chunk of CPU time on planning before it becomes an issue.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: "Constraint exclusion" is not general enough

From: Tom Lane
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, Aug 04, 2006 at 02:40:30PM -0400, Tom Lane wrote:
>> I would argue that turning on constraint_exclusion ought to instruct
>> the planner to catch this sort of thing, whereas when it's off we
>> ought not expend the cycles.  I have a preliminary patch (below)
>> that seems to fix it.

> How many cycles are we talking about here? Is it even worth the GUC?

I think so.  On simple queries the optimization will *never* fire,
and there's no point in doing the search.  People who are running
complex queries will want to turn it on, but the mysql-equivalent
crew will just find it a waste of cycles.
        regards, tom lane


Re: "Constraint exclusion" is not general enough

From: stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>
>> How many cycles are we talking about here? Is it even worth the GUC?
>
> I think so.  On simple queries the optimization will *never* fire,
> and there's no point in doing the search.  People who are running
> complex queries will want to turn it on, but the mysql-equivalent
> crew will just find it a waste of cycles.

The other class of people who will find this kind of thing useful are those
using automatically generated queries. Frequently you end up with redundant
clauses or "unreachable" clauses that you hope the database will be able to
see through.

Having to enable that intelligence with a GUC is fine though, since those users
could just enable it even if they aren't using partitioning. That said, I
expect that eventually any option we add whose only purpose is to enable
some intelligence in the optimizer will become standard.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com



Re: "Constraint exclusion" is not general enough

From: Simon Riggs
On Fri, 2006-08-04 at 14:40 -0400, Tom Lane wrote:
> I was just looking at Martin Lesser's gripe here:
> http://archives.postgresql.org/pgsql-performance/2006-08/msg00053.php
> about how the planner is not real bright about the filter conditions
> it generates for a simple partitioning layout.  In particular it's
> generating scans involving self-contradictory conditions:
> 
>  Result  (cost=0.00..33.20 rows=6 width=36)
>    ->  Append  (cost=0.00..33.20 rows=6 width=36)
>          ->  Seq Scan on t_parted  (cost=0.00..33.20 rows=6 width=36)
>                Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))
> 
> which it seems we ought to be bright enough to notice.  In particular
> I would argue that turning on constraint_exclusion ought to instruct
> the planner to catch this sort of thing, whereas when it's off we
> ought not expend the cycles.  I have a preliminary patch (below)
> that seems to fix it.
> 
> The problem I'm having is that this isn't "constraint exclusion" anymore
> --- it will in fact make useful deductions without a table constraint
> anywhere in sight.  Should we rename the GUC variable, and if so to what?
> Or just live with the misnomer?  I guess plan C would be to invent a
> separate GUC variable for the other kind of test, but I can't see that
> it's worth having two.  Thoughts?

In general, I'd prefer a control that allowed "amount of planning" to be
specified, much in the same way we rate error messages. We really want
just one simple knob that can be turned up or down, no matter how many
new optimizations we add.

planning_effort = LOW | MEDIUM | HIGH | VERYHIGH | EXHAUSTIVE

Though I'd prefer about 6-10 settings

> BTW, the remaining infelicities in Martin's example stem from the fact that
> predicate_refuted_by doesn't recognize "x IS NOT TRUE" as refuting "x".
> Working on fixing that now.

Seen that. Looks good.

> I'm also thinking that formulating the check as "constraints are
> refuted by WHERE clause" might be unnecessarily restrictive.  Doesn't
> the case where a constraint implies falsity of a WHERE clause likewise
> tell us we needn't bother to scan?  Seems like we ought to put all the
> conditions together and run a symmetric test named something like
> "mutually_exclusive_conditions".  Maybe "mutual_exclusion" would be
> a better name for the GUC variable.

That would be another approach to planning this, but I see a future that
may interfere with that suggestion (maybe not, so read on).

Currently, exclusion is tested for each relation that might possibly be
involved, so planning time increases O(N) for N partitions. I would like
to see a declarative approach where the constraints on all of the child
partitions form a coherent pattern that can be used to process far fewer
exclusion tests to enable us to achieve O(log N) behaviour. I'm not sure
how we'd do that if we went for the symmetric test approach.
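
To illustrate the O(log N) idea: assuming range partitions whose sorted upper bounds are declared mutually exclusive, locating the partition for a probe value is just a binary search rather than N refutation proofs. The representation and function below are invented for illustration; nothing like them exists in the planner as it stands:

```c
#include <assert.h>

/* Upper bounds of n range partitions: partition i holds keys in
 * [bounds[i-1], bounds[i]), with bounds[-1] read as minus infinity.
 * The bounds must be sorted and mutually exclusive, which is exactly
 * the declarative property being asked for above. */
static int find_partition(const int *bounds, int n, int key)
{
    int lo = 0, hi = n;             /* find first bound greater than key */

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (bounds[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n ? lo : -1;        /* -1: key lies beyond every bound */
}
```

With N children this costs O(log N) comparisons, where the current code runs one refutation proof per child.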

To achieve the "indexed" partition pruning, we'd need
- a way to specify that all constraints are mutually exclusive
- a declarative approach for saying something like "arranged in date
sequence"
- preferably a way to have this happen at run-time so we can hard-plan a
query with CURRENT_TIMESTAMP in the WHERE clause

The reason for all of that is to better enable partitioning for use in
OLTP situations, not just DW.

With all of that in mind, I think renaming the current GUC would be a
fairly short-lived name change, so I'd suggest sticking with it until
the functions have been enhanced to the point we can dream up a slightly
more evocative name.

-- 
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com



Re: "Constraint exclusion" is not general enough

From: Rod Taylor
On Mon, 2006-08-07 at 16:54 +0100, Simon Riggs wrote:
> On Fri, 2006-08-04 at 14:40 -0400, Tom Lane wrote:
> > I was just looking at Martin Lesser's gripe here:
> > http://archives.postgresql.org/pgsql-performance/2006-08/msg00053.php
> > about how the planner is not real bright about the filter conditions
> > it generates for a simple partitioning layout.  In particular it's
> > generating scans involving self-contradictory conditions:
> > 
> >  Result  (cost=0.00..33.20 rows=6 width=36)
> >    ->  Append  (cost=0.00..33.20 rows=6 width=36)
> >          ->  Seq Scan on t_parted  (cost=0.00..33.20 rows=6 width=36)
> >                Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))
> > 
> > which it seems we ought to be bright enough to notice.  In particular
> > I would argue that turning on constraint_exclusion ought to instruct
> > the planner to catch this sort of thing, whereas when it's off we
> > ought not expend the cycles.  I have a preliminary patch (below)
> > that seems to fix it.
> > 
> > The problem I'm having is that this isn't "constraint exclusion" anymore
> > --- it will in fact make useful deductions without a table constraint
> > anywhere in sight.  Should we rename the GUC variable, and if so to what?
> > Or just live with the misnomer?  I guess plan C would be to invent a
> > separate GUC variable for the other kind of test, but I can't see that
> > it's worth having two.  Thoughts?
> 
> In general, I'd prefer a control that allowed "amount of planning" to be
> specified, much in the same way we rate error messages. We really want
> just one simple knob that can be turned up or down, no matter how many
> new optimizations we add.
> 
> planning_effort = LOW | MEDIUM | HIGH | VERYHIGH | EXHAUSTIVE

A simple way of doing this might be to use a minimum cost number?
       # Minimum cost of query is over 100 before applying
       mutual_exclusion = 100
Once applied, if the filter accomplished something, the query is replanned
or adjusted to take that change into account.

If there were a large number of constraints on t_parted it may well have
taken longer to plan than to execute on the 6 rows. If there were 1M
rows in the structure, the extra effort would have been well worth it.
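
The thresholding could be sketched roughly as follows; the control flow, names, and cost units are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical planner output: total cost in the planner's arbitrary
 * cost units, plus whether exclusion tests were run */
typedef struct { double total_cost; bool ran_exclusion; } Plan;

/* Stand-in for a cheap first planning pass */
static Plan plan_cheaply(double estimated_cost)
{
    Plan p = { estimated_cost, false };
    return p;
}

/* Spend the exclusion cycles only when the cheap plan already looks
 * expensive enough to justify them */
static Plan plan_query(double estimated_cost, double mutual_exclusion)
{
    Plan p = plan_cheaply(estimated_cost);

    if (p.total_cost > mutual_exclusion)
        p.ran_exclusion = true;     /* would replan with exclusion tests */
    return p;
}
```

So the 33.20-cost scan above would skip the extra work under mutual_exclusion = 100, while a million-row plan would get the full treatment.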


Ideally we could set the planning time as a percentage of total
execution time and let PostgreSQL figure out what should be tried and
when, but that means giving a cost to planner functionality and having
PostgreSQL plan how to plan.
       planning_effort = 5%

-- 



Re: "Constraint exclusion" is not general enough

From: Tom Lane
Rod Taylor <pg@rbt.ca> writes:
> A simple way of doing this might be to use a minimum cost number?

But you don't have any cost numbers until after you've done the plan.
        regards, tom lane


Re: "Constraint exclusion" is not general enough

From: Rod Taylor
On Mon, 2006-08-07 at 13:44 -0400, Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > A simple way of doing this might be to use a minimum cost number?
> 
> But you don't have any cost numbers until after you've done the plan.

Isn't it possible to find the cost using the straightforward (fast)
method, find out what order of magnitude it is in, and then do a second
pass with additional planner bells and whistles turned on if the first
plan had a high estimate?

-- 



Re: "Constraint exclusion" is not general enough

From: Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> To achieve the "indexed" partition pruning, we'd need
> - a way to specify that all constraints are mutually exclusive
> - a declarative approach for saying something like "arranged in date
> sequence"
> - preferably a way to have this happen at run-time so we can hard-plan a
> query with CURRENT_TIMESTAMP in the WHERE clause

Definitely a direction worth pursuing, but it seems like it would be a
completely separate code path from the existing constraint-checking
code.  I'd imagine that instead of having to prove theorems about which
tables to scan, a declarative approach would let us "just know" what
to do.
        regards, tom lane


Re: "Constraint exclusion" is not general enough

From: "Florian G. Pflug"
Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
>> A simple way of doing this might be to use a minimum cost number?
> 
> But you don't have any cost numbers until after you've done the plan.

Couldn't this work similar to geqo_effort? The planner could
try planning the query using only cheap algorithms, and if
the cost exceeds a certain value, it'd restart, and use
more sophisticated methods.

greetings, Florian Pflug


Re: "Constraint exclusion" is not general enough

From: Tom Lane
"Florian G. Pflug" <fgp@phlo.org> writes:
> Tom Lane wrote:
>> But you don't have any cost numbers until after you've done the plan.

> Couldn't this work similar to geqo_effort? The planner could
> try planning the query using only cheap algorithms, and if
> the cost exceeds a certain value, it'd restart, and use
> more sophisticated methods.

AFAICS this would be a net loss on average.  Most of the time, the
constraint exclusion code doesn't win, and so throwing away all your
planning work to try it is going to be a loser most of the time.
        regards, tom lane


Re: "Constraint exclusion" is not general enough

From: Rod Taylor
On Mon, 2006-08-07 at 22:01 -0400, Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
> > Tom Lane wrote:
> >> But you don't have any cost numbers until after you've done the plan.
> 
> > Couldn't this work similar to geqo_effort? The planner could
> > try planning the query using only cheap algorithms, and if
> > the cost exceeds a certain value, it'd restart, and use
> > more sophisticated methods.
> 
> AFAICS this would be a net loss on average.  Most of the time, the
> constraint exclusion code doesn't win, and so throwing away all your
> planning work to try it is going to be a loser most of the time.

If constraint exclusion does not make any changes, the plan is not marked
invalid, and there is no need to replan.
    1. Generate plan cheaply.
    2. If under $threshold, execute query. The cost of further planning
       is significant compared to executing this potentially
       non-optimal plan.
    3. Run constraint exclusion. If it changes the clauses due to
       constraint exclusion, mark the plan as invalid. I assume
       constraint exclusion is relatively self contained.
    4. Invalid plan is replanned. Still valid plan (no potential
       improvements can be made) is executed.

-- 



Re: "Constraint exclusion" is not general enough

From: "Florian G. Pflug"
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Tom Lane wrote:
>>> But you don't have any cost numbers until after you've done the plan.
> 
>> Couldn't this work similar to geqo_effort? The planner could
>> try planning the query using only cheap algorithms, and if
>> the cost exceeds a certain value, it'd restart, and use
>> more sophisticated methods.
> 
> AFAICS this would be a net loss on average.  Most of the time, the
> constraint exclusion code doesn't win, and so throwing away all your
> planning work to try it is going to be a loser most of the time.

On the other hand, if the "consider-replanning" threshold is high enough,
then that additional time really doesn't matter - if a query runs for
minutes, or even hours, a few wasted cycles during planning don't hurt.

greetings, Florian Pflug