Thread: WHERE clause OR vs IN

WHERE clause OR vs IN

From
"Medi Montaseri"
Date:
Hi,

I know this is not exactly admin related, but ... it is simple enough to be even fun

From a performance point of view, is it better to use OR as in
SELECT expr FROM expr WHERE col=this OR col=that OR col=theOther
Or to use a range of values as in
SELECT expr FROM expr WHERE col in ( val1, val2, ...., valn)

I think the IN range yields a better query plan .... what do you think ?
Thanks
Medi

Re: WHERE clause OR vs IN

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 12 Dec 2007 14:25:16 -0800
"Medi Montaseri" <montaseri@gmail.com> wrote:

> Hi,
> 
> I know this is not exactly admin related, but ... it is simple enough
> to be even fun
> 
> From a performance point of view, is it better to use OR as in
> SELECT expr FROM expr WHERE col=this OR col=that OR col=theOther
> Or to use a range of values as in
> SELECT expr FROM expr WHERE col in ( val1, val2, ...., valn)
> 
> I think the IN range yields a better query plan .... what do you
> think ? Thanks

Well a simple test:

postgres=# explain analyze select * from tellers where bid in ('1','2');
                                              QUERY
PLAN
- ------------------------------------------------------------------------------------------------------
Seq Scan on tellers  (cost=0.00..10.25 rows=20 width=352) (actual
time=0.019..0.192 rows=20 loops=1) Filter: (bid = ANY
('{1,2}'::integer[])) Total runtime: 0.314 ms (3 rows)

postgres=# explain analyze select * from tellers where bid = '1' or bid
= '2'; QUERY PLAN                                              
- ------------------------------------------------------------------------------------------------------
 Seq Scan on tellers  (cost=0.00..11.50 rows=20 width=352) (actual
time=0.018..0.199 rows=20 loops=1) Filter: ((bid = 1) OR (bid = 2))
 Total runtime: 0.275 ms
(3 rows)

postgres=# 



> Medi


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHYGJwATb/zqfZUUQRAs32AJ9HuJVcY5gcr0hboxkI6PcRtv++JwCfXd00
nQ7Frkof0mVwqNYVxQ9Vziw=
=XzJi
-----END PGP SIGNATURE-----

Re: WHERE clause OR vs IN

From
"Medi Montaseri"
Date:
Thanks...is'nt the run time latency more visiable with lorge input sets (big tables) as well as how long the OR-ed expression chain is

based on your report, run time of OR is 0.275 ms and IN is 0.314

Perhaps if we run explain verbose to see the actual query plan

medi

On Dec 12, 2007 2:36 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 12 Dec 2007 14:25:16 -0800
"Medi Montaseri" <montaseri@gmail.com > wrote:

> Hi,
>
> I know this is not exactly admin related, but ... it is simple enough
> to be even fun
>
> From a performance point of view, is it better to use OR as in
> SELECT expr FROM expr WHERE col=this OR col=that OR col=theOther
> Or to use a range of values as in
> SELECT expr FROM expr WHERE col in ( val1, val2, ...., valn)
>
> I think the IN range yields a better query plan .... what do you
> think ? Thanks

Well a simple test:

postgres=# explain analyze select * from tellers where bid in ('1','2');
                                             QUERY
PLAN
- ------------------------------------------------------------------------------------------------------
Seq Scan on tellers  (cost=0.00..10.25 rows=20 width=352) (actual
time=0.019..0.192 rows=20 loops=1) Filter: (bid = ANY
('{1,2}'::integer[])) Total runtime: 0.314 ms (3 rows)

postgres=# explain analyze select * from tellers where bid = '1' or bid
= '2'; QUERY PLAN
- ------------------------------------------------------------------------------------------------------
 Seq Scan on tellers  (cost=0.00..11.50 rows=20 width=352) (actual
time=0.018..0.199 rows=20 loops=1) Filter: ((bid = 1) OR (bid = 2))
 Total runtime: 0.275 ms
(3 rows)

postgres=#



> Medi


- --
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHYGJwATb/zqfZUUQRAs32AJ9HuJVcY5gcr0hboxkI6PcRtv++JwCfXd00
nQ7Frkof0mVwqNYVxQ9Vziw=
=XzJi
-----END PGP SIGNATURE-----

Re: WHERE clause OR vs IN

From
Richard Broersma Jr
Date:
--- On Wed, 12/12/07, Medi Montaseri <montaseri@gmail.com> wrote:

> based on your report, run time of OR is 0.275 ms and IN is
> 0.314
>
> > postgres=# explain analyze select * from tellers where
> bid in ('1','2');

Two other options are:

SELECT *
  FROM Tellers
 WHERE bin = ANY( '1', '2' );

and

    SELECT T.*
      FROM Tellers AS T
INNER JOIN ( VALUES ( '1' ), ( '2' ) ) AS B( bin )
        ON T.bin = B.bin;

Regards,
Richard Broersma Jr.

Re: WHERE clause OR vs IN

From
Tom Lane
Date:
Richard Broersma Jr <rabroersma@yahoo.com> writes:
> --- On Wed, 12/12/07, Medi Montaseri <montaseri@gmail.com> wrote:
>> based on your report, run time of OR is 0.275 ms and IN is
>> 0.314
>>
> postgres=# explain analyze select * from tellers where
>> bid in ('1','2');

> Two other options are:

> SELECT *
>   FROM Tellers
>  WHERE bin = ANY( '1', '2' );

Note that depending on which PG version you are testing, x IN (a,b,c)
is exactly equivalent to x=a OR x=b OR x=c (older versions), or to
x = ANY(ARRAYa,b,c]) (newer versions).

>     SELECT T.*
>       FROM Tellers AS T
> INNER JOIN ( VALUES ( '1' ), ( '2' ) ) AS B( bin )
>         ON T.bin = B.bin;

I seriously doubt that one's gonna win ...

            regards, tom lane

Re: WHERE clause OR vs IN

From
Richard Broersma Jr
Date:
--- On Wed, 12/12/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I seriously doubt that one's gonna win ...

:-)

Re: WHERE clause OR vs IN

From
"Medi Montaseri"
Date:
Thanks everyone, and I know I am taking too much bandwidth, but...

The reason I was thinking IN would work better is that from a parser point of view confronted with a series of expressions chained via boolean operators such as
expr OR expr OR expr
The back end code generated has to be generic to combat with different type of expression such as equality, greater than, less than, or any fancy expressions. For example:
a == b OR c != d  AND e > f OR g == h
Where as in the case of
a IN (1, 2, 3, 4)
is actually a subset of the above composite expression because
1- the expressions are always equality of operand a with a literal value which means the operand can be cached (register)
2- the composite expression is always an OR chained expression where the first TRUE-ness would return the composite as TRUE (aka short circuit behavior)

I could be wrong...I have been wrong before...

Medi


On Dec 12, 2007 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Broersma Jr <rabroersma@yahoo.com> writes:
> --- On Wed, 12/12/07, Medi Montaseri <montaseri@gmail.com > wrote:
>> based on your report, run time of OR is 0.275 ms and IN is
>> 0.314
>>
> postgres=# explain analyze select * from tellers where
>> bid in ('1','2');

> Two other options are:

> SELECT *
>   FROM Tellers
>  WHERE bin = ANY( '1', '2' );

Note that depending on which PG version you are testing, x IN (a,b,c)
is exactly equivalent to x=a OR x=b OR x=c (older versions), or to
x = ANY(ARRAYa,b,c]) (newer versions).

>     SELECT T.*
>       FROM Tellers AS T
> INNER JOIN ( VALUES ( '1' ), ( '2' ) ) AS B( bin )
>         ON T.bin = B.bin;

I seriously doubt that one's gonna win ...

                       regards, tom lane