Thread: Re-ordering of OR conditions

Re-ordering of OR conditions

From
"Jim Nasby"
Date:
IF I run the following with the a < 2900 condition first, the more  
expensive EXISTS only gets executed when needed, but if I change the  
order of the OR's, the EXISTS is always executed. It would be good if  
the optimizer could re-order the OR conditions based on estimated  
cost (granted, this wouldn't work very well if you've got functions  
in the OR, but it'd still be useful):

select * from a where a < 2900 or exists (select * from b where b.a =  
a.a);

Here's a full example. Note the loops count for the Subplan between  
both cases:

decibel=# create table a as select * from generate_series(1,3000) a;
SELECT
decibel=# create table b as select a,b from a, generate_series(1,100)  
b where a > 10;
SELECT
decibel=# create index b__a on b(a);
CREATE INDEX
decibel=# explain analyze select * from a where a < 2900 or exists  
(select * from b where b.a = a.a);                                                       QUERY PLAN
------------------------------------------------------------------------ 
------------------------------------------------
Seq Scan on a  (cost=0.00..8080.41 rows=1997 width=4) (actual  
time=0.014..1.784 rows=3000 loops=1)   Filter: ((a < 2900) OR (subplan))   SubPlan     ->  Index Scan using b__a on b
(cost=0.00..4006.44rows=1495  
 
width=8) (actual time=0.009..0.009 rows=1 loops=101)           Index Cond: (a = $0)
Total runtime: 2.151 ms
(6 rows)

decibel=# explain analyze select * from a where exists (select * from  
b where b.a = a.a) or a < 2000;                                                       QUERY PLAN
------------------------------------------------------------------------ 
-------------------------------------------------
Seq Scan on a  (cost=0.00..8080.41 rows=1997 width=4) (actual  
time=0.067..37.011 rows=3000 loops=1)   Filter: ((subplan) OR (a < 2000))   SubPlan     ->  Index Scan using b__a on b
(cost=0.00..4006.44rows=1495  
 
width=8) (actual time=0.011..0.011 rows=1 loops=3000)           Index Cond: (a = $0)
Total runtime: 37.497 ms
(6 rows)

decibel=#

(This is on HEAD as of a few minutes ago)
--
Jim Nasby                               jim.nasby@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)





Re: Re-ordering of OR conditions

From
Tom Lane
Date:
"Jim Nasby" <jim.nasby@enterprisedb.com> writes:
> IF I run the following with the a < 2900 condition first, the more  
> expensive EXISTS only gets executed when needed, but if I change the  
> order of the OR's, the EXISTS is always executed. It would be good if  
> the optimizer could re-order the OR conditions based on estimated  
> cost (granted, this wouldn't work very well if you've got functions  
> in the OR, but it'd still be useful):

I looked at this for a bit.  It's in principle do-able but I'm not
sure it's a good idea.  The problem is that while AND'ed condition
lists are usually fairly short and hence cheap to sort, OR'ed condition
lists are not infrequently very long --- nobody blinks an eye at
hundreds of items in an IN-list for instance.  I'm afraid we'd waste
a lot more cycles sorting than we could hope to regain.
        regards, tom lane


Re: Re-ordering of OR conditions

From
"Jim Nasby"
Date:
On Feb 9, 2007, at 10:46 AM, Tom Lane wrote:
> "Jim Nasby" <jim.nasby@enterprisedb.com> writes:
>> IF I run the following with the a < 2900 condition first, the more
>> expensive EXISTS only gets executed when needed, but if I change the
>> order of the OR's, the EXISTS is always executed. It would be good if
>> the optimizer could re-order the OR conditions based on estimated
>> cost (granted, this wouldn't work very well if you've got functions
>> in the OR, but it'd still be useful):
>
> I looked at this for a bit.  It's in principle do-able but I'm not
> sure it's a good idea.  The problem is that while AND'ed condition
> lists are usually fairly short and hence cheap to sort, OR'ed  
> condition
> lists are not infrequently very long --- nobody blinks an eye at
> hundreds of items in an IN-list for instance.  I'm afraid we'd waste
> a lot more cycles sorting than we could hope to regain.

Do people actually do that with OR lists though? My understanding is  
that now IN lists are converted to arrays, so I'd think that wouldn't  
be an issue there.

Is it easy for the planner to discern between simple OR expressions  
and stuff like EXISTS? If so it might be worth automatically pushing  
EXISTS to the end...
--
Jim Nasby                               jim.nasby@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)





Re: Re-ordering of OR conditions

From
"Simon Riggs"
Date:
On Fri, 2007-02-09 at 11:46 -0500, Tom Lane wrote:
> "Jim Nasby" <jim.nasby@enterprisedb.com> writes:
> > IF I run the following with the a < 2900 condition first, the more  
> > expensive EXISTS only gets executed when needed, but if I change the  
> > order of the OR's, the EXISTS is always executed. It would be good if  
> > the optimizer could re-order the OR conditions based on estimated  
> > cost (granted, this wouldn't work very well if you've got functions  
> > in the OR, but it'd still be useful):
> 
> I looked at this for a bit.  It's in principle do-able but I'm not
> sure it's a good idea.  The problem is that while AND'ed condition
> lists are usually fairly short and hence cheap to sort, OR'ed condition
> lists are not infrequently very long --- nobody blinks an eye at
> hundreds of items in an IN-list for instance.  I'm afraid we'd waste
> a lot more cycles sorting than we could hope to regain.

Seems like the planner could decide ahead of time whether sorting the
conditions at execution time was likely to be effective or not. Perhaps
limiting it to at most 5 conditions, where at least one of those was a
function or a join condition? That would be a fairly cheap test at
planning time, but potentially a good win at execution time. 

The OR'ed condition is common condition when the schema uses complex
sub-classing. Now we have function costs it seems more likely this idea
would get used in practice.

Anyway, not necessarily for you to do, but sounds like a useful idea all
the same.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com