Thread: More efficient OR

More efficient OR

From

"Keith Worthington"

Date:

16 February 2005, 16:03:11

Hi All,

In several of my SQL statements I have to use a WHERE clause that contains
mutiple ORs.  i.e.

WHERE column1 = 'A' OR     column1 = 'B' OR     column1 = 'C'

Is there a more efficient SQL statement that accomplishes the same limiting
functionality?

Kind Regards,
Keith

Re: More efficient OR

From

Reinoud van Leeuwen

Date:

16 February 2005, 16:15:21

On Wed, Feb 16, 2005 at 11:02:59AM -0500, Keith Worthington wrote:
> Hi All,
> 
> In several of my SQL statements I have to use a WHERE clause that contains
> mutiple ORs.  i.e.
> 
> WHERE column1 = 'A' OR
>       column1 = 'B' OR
>       column1 = 'C'
> 
> Is there a more efficient SQL statement that accomplishes the same limiting
> functionality?

I do not know wheter it is more efficient in terms of execution, but I can
read this more efficiently:

WHERE column1 in ('A', 'B', 'C')


-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen    reinoud.v@n.leeuwen.net
http://www.xs4all.nl/~reinoud
__________________________________________________

Re: More efficient OR

From

Scott Marlowe

Date:

16 February 2005, 16:15:24

On Wed, 2005-02-16 at 10:02, Keith Worthington wrote:
> Hi All,
> 
> In several of my SQL statements I have to use a WHERE clause that contains
> mutiple ORs.  i.e.
> 
> WHERE column1 = 'A' OR
>       column1 = 'B' OR
>       column1 = 'C'
> 
> Is there a more efficient SQL statement that accomplishes the same limiting
> functionality?

The in() construct is (nowadays) basically the same as ORing multiple
columns;

where column1 in ('A','B','C')

Re: More efficient OR

From

Sean Davis

Date:

16 February 2005, 16:21:13

Could 'in' or 'between' do what you want?  I know that using 'in' is 
equivalent to what you have below.  Could 'between' be more 
efficient--you could do explain analyze on various options to see what 
the actual plan would be.

Sean

On Feb 16, 2005, at 11:02 AM, Keith Worthington wrote:

> Hi All,
>
> In several of my SQL statements I have to use a WHERE clause that 
> contains
> mutiple ORs.  i.e.
>
> WHERE column1 = 'A' OR
>       column1 = 'B' OR
>       column1 = 'C'
>
> Is there a more efficient SQL statement that accomplishes the same 
> limiting
> functionality?
>
> Kind Regards,
> Keith
>
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

Re: More efficient OR

From

"Keith Worthington"

Date:

16 February 2005, 16:35:43

> > Hi All,
> >
> > In several of my SQL statements I have to use a WHERE clause
> > that contains mutiple ORs.  i.e.
> >
> > WHERE column1 = 'A' OR
> >       column1 = 'B' OR
> >       column1 = 'C'
> >
> > Is there a more efficient SQL statement that accomplishes the
> > same limiting functionality?
> >
> > Kind Regards,
> > Keith
> >
>
> Scott wrote:
> The in() construct is (nowadays) basically the same as
> ORing multiple columns;
> 
> where column1 in ('A','B','C') 
> 
> 
> Sean Davis wrote
> Could 'in' or 'between' do what you want?  I know that using 'in'
> is equivalent to what you have below.  Could 'between' be more 
> efficient--you could do explain analyze on various options to see 
> what the actual plan would be.
> 
> Sean

Thanks Scott and Sean for the post.

It sounds like IN will save some typing and code space but not decrease the
execution time.

BETWEEN won't work for my real life query because the limiting values are
quite disparate.

Kind Regards,
Keith

Re: More efficient OR

From

KÖPFERL Robert

Date:

16 February 2005, 18:47:26

At least for between, I read that pgSQL rewrites it to a  (a<x) AND (x<b).
And my experience shows that IN is translated into a series of ORs. 
I may be wrong

> -----Original Message-----
> From: Keith Worthington [mailto:keithw@narrowpathinc.com]
> Sent: Mittwoch, 16. Februar 2005 17:36
> To: PostgreSQL SQL
> Cc: Sean Davis; Scott Marlowe
> Subject: Re: [SQL] More efficient OR
> 
> 
> > > Hi All,
> > >
> > > In several of my SQL statements I have to use a WHERE clause
> > > that contains mutiple ORs.  i.e.
> > >
> > > WHERE column1 = 'A' OR
> > >       column1 = 'B' OR
> > >       column1 = 'C'
> > >
> > > Is there a more efficient SQL statement that accomplishes the
> > > same limiting functionality?
> > >
> > > Kind Regards,
> > > Keith
> > >
> >
> > Scott wrote:
> > The in() construct is (nowadays) basically the same as
> > ORing multiple columns;
> > 
> > where column1 in ('A','B','C') 
> > 
> > 
> > Sean Davis wrote
> > Could 'in' or 'between' do what you want?  I know that using 'in'
> > is equivalent to what you have below.  Could 'between' be more 
> > efficient--you could do explain analyze on various options to see 
> > what the actual plan would be.
> > 
> > Sean
> 
> Thanks Scott and Sean for the post.
> 
> It sounds like IN will save some typing and code space but 
> not decrease the
> execution time.
> 
> BETWEEN won't work for my real life query because the 
> limiting values are
> quite disparate.
> 
> Kind Regards,
> Keith
> 
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq
>

Re: More efficient OR

From

PFC

Date:

16 February 2005, 19:05:27

You sound like you don't like the performance you get with OR or IN, from  
this I deduce that you have a very large list of values to OR from. These  
make huge queries which are not necessarily very fast ; also they are  
un-preparable by their very nature (ie. the planner has to look at each  
value, ponder its stats, think about it...) Basically a query with, say,  
'column IN (100 values)' will make postgres work a lot more than a query  
with 'column in (SELECT something which yields 100 values)'.
I have tested the following with good results :

- Write a very simple set returning function which takes an array as a  
parameter and RETURN NEXT each array element in turn. It's just a FOR...  
RETURN NEXT. Say you call it array_flatten( INTEGER[] ) or something.

- Then, instead of doing SELECT * FROM table WHERE id IN (1,4,77,586,1025)
do:
SELECT * FROM table WHERE id IN (SELECT * FROM  
array_flatten( '{1,4,77,586,1025}' ) );
or :
SELECT t.* FROM table t, (SELECT * FROM array_flatten( '{1,4,77,586,1025}'  
) ) foo WHERE t.id=foo.id;

The first one will do a uniqu'ing on the array, the second one will not.

You can also LEFT JOIN against your SRF to get the id's of the rows that  
were not in the table (you cannot do this with IN)

And you can PREPARE the statement to something that will take an array as  
a parameter and won't have to be parsed everytime.

Sometimes it can be a big performance boost. Try it !

However, if some value in your array matches a lot of rows in the table,  
it will be slower than the seq scan which would have been triggered by the  
planner actually seeing that value in the IN list and acting on it. But if  
you KNOW your column is unique, there is no point in forcing the planner  
to ponder each value in your list !

For an additional performance boost (likely negligible), you could sort  
your array in the function (or even in your application code) to ease the  
work of the index scanner, which will get a better cache hit rate.

If you have, say, 20.000 values to get, this is the only way.

Note that you could ask yourself why you need to get a lot of values. Are  
you fetching stuff from the database, computing a list of rows to get,  
then SELECTing them ? Then maybe you put something in the application that  
should really be in the database ?

As a side note, it would be nice :

- if that set returning function was a fast C built-in (I think there's  
one in contrib/intagg but obviously it works only for integers) because  
it's a useful tool and building brick ; same for array_accum, and some  
other commonly used five-lines aggregates and functions that everybody  
recodes once.

- if postgresql used this kind of optimization for the SELECT * FROM table  
WHERE id =ANY( array ) which currently uses a seq scan.

However, what IS nice from pg is that you can actually do the array SRF  
trick and pull a lot of rows by id's, at blazing speed, just by writing a  
three line function and tweaking your wuery.



> It sounds like IN will save some typing and code space but not decrease  
> the
> execution time.
>
> BETWEEN won't work for my real life query because the limiting values are
> quite disparate.
>
> Kind Regards,
> Keith
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>