Thread: OR or IN ?

OR or IN ?

From
Luca Ferrari
Date:
Hi all,
I've got a query with a long (>50) list of ORs, like the following:

SELECT colB, colC FROM table WHERE colA=X OR colA=Y OR colA=Z OR ....

Is there any difference in how postgresql manages the above query and the
following one?

SELECT colB, colC FROM table WHERE colA IN (X,Y,Z,...)

Which is the suggested index to use on colA to get better performances?

Thanks,
Luca

Re: OR or IN ?

From
"A. Kretschmer"
Date:
am  Tue, dem 14.10.2008, um  8:33:21 +0200 mailte Luca Ferrari folgendes:
> Hi all,
> I've got a query with a long (>50) list of ORs, like the following:
>
> SELECT colB, colC FROM table WHERE colA=X OR colA=Y OR colA=Z OR ....
>
> Is there any difference in how postgresql manages the above query and the
> following one?
>
> SELECT colB, colC FROM table WHERE colA IN (X,Y,Z,...)

Depends on the version: (same table foo)


8.1:
test=*# explain select * from foo where a in (1,2,3);
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on foo  (cost=0.00..47.45 rows=32 width=4)
   Filter: ((a = 1) OR (a = 2) OR (a = 3))


8.4devel:
test=# explain select * from foo where a in (1,2,3);
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on foo  (cost=0.00..43.00 rows=36 width=4)
   Filter: (a = ANY ('{1,2,3}'::integer[]))


As you can see, 8.1 rewrite the query to many ORs.



Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: OR or IN ?

From
Harald Fuchs
Date:
In article <20081014064831.GB22137@a-kretschmer.de>,
"A. Kretschmer" <andreas.kretschmer@schollglas.com> writes:

> am  Tue, dem 14.10.2008, um  8:33:21 +0200 mailte Luca Ferrari folgendes:
>> Hi all,
>> I've got a query with a long (>50) list of ORs, like the following:
>>
>> SELECT colB, colC FROM table WHERE colA=X OR colA=Y OR colA=Z OR ....
>>
>> Is there any difference in how postgresql manages the above query and the
>> following one?
>>
>> SELECT colB, colC FROM table WHERE colA IN (X,Y,Z,...)

> Depends on the version: (same table foo)


> 8.1:
> test=*# explain select * from foo where a in (1,2,3);
>                      QUERY PLAN
> -----------------------------------------------------
>  Seq Scan on foo  (cost=0.00..47.45 rows=32 width=4)
>    Filter: ((a = 1) OR (a = 2) OR (a = 3))


> 8.4devel:
> test=# explain select * from foo where a in (1,2,3);
>                      QUERY PLAN
> -----------------------------------------------------
>  Seq Scan on foo  (cost=0.00..43.00 rows=36 width=4)
>    Filter: (a = ANY ('{1,2,3}'::integer[]))


> As you can see, 8.1 rewrite the query to many ORs.

I think that "OR or IN" is the wrong question.  Where do those >50
values come from?  If they come from a DB operation, just include this
operation in your query.  Otherwise, you should ask yourself how many
values you might get: >50, >500, >5000?  There's a point where it's
probably more efficient to COPY all those values into an indexed
temporary table, ANALYZE it, and JOIN it into your query.

Re: OR or IN ?

From
Scott Ribe
Date:
Older versions of PG were inefficient with larger numbers of elements in an
IN query, and in fact would error out with something about lack of stack
space if you used very many (a few hundred IIRC).

8.x something was supposed to have improved that. Using 8.3 recently, after
an "oopsie" with some development data, I inadvertently confirmed that it
works and performance is not too bad with >34,000 items in an IN clause ;-)


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: OR or IN ?

From
"Andrus"
Date:
> 8.x something was supposed to have improved that.

Not in 8.0
8.0 (and maybe newer releases also) throw dreaded  "Stack lenght limit
exceeded"

 message if IN contains large number or items.

Andrus.


Re: OR or IN ?

From
Luca Ferrari
Date:
On Friday 17 October 2008 Scott Ribe's cat, walking on the keyboard, wrote:
> Older versions of PG were inefficient with larger numbers of elements in an
> IN query, and in fact would error out with something about lack of stack
> space if you used very many (a few hundred IIRC).
>
> 8.x something was supposed to have improved that. Using 8.3 recently, after
> an "oopsie" with some development data, I inadvertently confirmed that it
> works and performance is not too bad with >34,000 items in an IN clause ;-)

Interesting, since my queries sometimes expand to a few thousands of ORs (or
INs), and I had in fact stack problems and I had to expand it to around 20
MB.....

Luca

Re: OR or IN ?

From
"Scott Marlowe"
Date:
On Fri, Oct 17, 2008 at 8:24 AM, Andrus <kobruleht2@hot.ee> wrote:
>> 8.x something was supposed to have improved that.
>
> Not in 8.0
> 8.0 (and maybe newer releases also) throw dreaded  "Stack lenght limit
> exceeded"
>
> message if IN contains large number or items.

Have y'all checked them there release notes?  I'm sure they tell the
tale. (sorry too much True Blood...)

Re: OR or IN ?

From
"Andrus"
Date:
>> 8.0 (and maybe newer releases also) throw dreaded  "Stack lenght limit
>> exceeded"
>>
>> message if IN contains large number or items.
>
> Have y'all checked them there release notes?  I'm sure they tell the
> tale. (sorry too much True Blood...)

No.  I need to support 8.0.
After those crashes I changed all IN clauses to

docid = ANY ({1,2,3,4,5,7,8,96,5445,.., 34534534})

this takes the same number of characters per id as IN clause.

I hope that this does not have any bad effects.

Andrus.

Re: OR or IN ?

From
Tom Lane
Date:
"Andrus" <kobruleht2@hot.ee> writes:
> No.  I need to support 8.0.
> After those crashes I changed all IN clauses to
> docid = ANY ({1,2,3,4,5,7,8,96,5445,.., 34534534})

I'm afraid that hardly counts as "supporting" 8.0 (or 8.1 for that
matter).  The syntax may work but it can't be indexed.

            regards, tom lane

Re: OR or IN ?

From
"Andrus"
Date:
>> No.  I need to support 8.0.
>> After those crashes I changed all IN clauses to
>> docid = ANY ({1,2,3,4,5,7,8,96,5445,.., 34534534})
>
> I'm afraid that hardly counts as "supporting" 8.0 (or 8.1 for that
> matter).  The syntax may work but it can't be indexed.

I'm sorry. It seems that Richard Huxton suggestion in this mailing list

SELECT * FROM generate_series(1,10) s
WHERE s = ANY (string_to_array('1,3,5,7', ',')::int[]);

works in 8.0
I hope that this works as well as new array syntax.

Andrus.