Thread: OR or IN ?
Hi all, I've got a query with a long (>50) list of ORs, like the following: SELECT colB, colC FROM table WHERE colA=X OR colA=Y OR colA=Z OR .... Is there any difference in how postgresql manages the above query and the following one? SELECT colB, colC FROM table WHERE colA IN (X,Y,Z,...) Which is the suggested index to use on colA to get better performances? Thanks, Luca
am Tue, dem 14.10.2008, um 8:33:21 +0200 mailte Luca Ferrari folgendes: > Hi all, > I've got a query with a long (>50) list of ORs, like the following: > > SELECT colB, colC FROM table WHERE colA=X OR colA=Y OR colA=Z OR .... > > Is there any difference in how postgresql manages the above query and the > following one? > > SELECT colB, colC FROM table WHERE colA IN (X,Y,Z,...) Depends on the version: (same table foo) 8.1: test=*# explain select * from foo where a in (1,2,3); QUERY PLAN ----------------------------------------------------- Seq Scan on foo (cost=0.00..47.45 rows=32 width=4) Filter: ((a = 1) OR (a = 2) OR (a = 3)) 8.4devel: test=# explain select * from foo where a in (1,2,3); QUERY PLAN ----------------------------------------------------- Seq Scan on foo (cost=0.00..43.00 rows=36 width=4) Filter: (a = ANY ('{1,2,3}'::integer[])) As you can see, 8.1 rewrite the query to many ORs. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
In article <20081014064831.GB22137@a-kretschmer.de>, "A. Kretschmer" <andreas.kretschmer@schollglas.com> writes: > am Tue, dem 14.10.2008, um 8:33:21 +0200 mailte Luca Ferrari folgendes: >> Hi all, >> I've got a query with a long (>50) list of ORs, like the following: >> >> SELECT colB, colC FROM table WHERE colA=X OR colA=Y OR colA=Z OR .... >> >> Is there any difference in how postgresql manages the above query and the >> following one? >> >> SELECT colB, colC FROM table WHERE colA IN (X,Y,Z,...) > Depends on the version: (same table foo) > 8.1: > test=*# explain select * from foo where a in (1,2,3); > QUERY PLAN > ----------------------------------------------------- > Seq Scan on foo (cost=0.00..47.45 rows=32 width=4) > Filter: ((a = 1) OR (a = 2) OR (a = 3)) > 8.4devel: > test=# explain select * from foo where a in (1,2,3); > QUERY PLAN > ----------------------------------------------------- > Seq Scan on foo (cost=0.00..43.00 rows=36 width=4) > Filter: (a = ANY ('{1,2,3}'::integer[])) > As you can see, 8.1 rewrite the query to many ORs. I think that "OR or IN" is the wrong question. Where do those >50 values come from? If they come from a DB operation, just include this operation in your query. Otherwise, you should ask yourself how many values you might get: >50, >500, >5000? There's a point where it's probably more efficient to COPY all those values into an indexed temporary table, ANALYZE it, and JOIN it into your query.
Older versions of PG were inefficient with larger numbers of elements in an IN query, and in fact would error out with something about lack of stack space if you used very many (a few hundred IIRC). 8.x something was supposed to have improved that. Using 8.3 recently, after an "oopsie" with some development data, I inadvertently confirmed that it works and performance is not too bad with >34,000 items in an IN clause ;-) -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
> 8.x something was supposed to have improved that. Not in 8.0 8.0 (and maybe newer releases also) throw dreaded "Stack lenght limit exceeded" message if IN contains large number or items. Andrus.
On Friday 17 October 2008 Scott Ribe's cat, walking on the keyboard, wrote: > Older versions of PG were inefficient with larger numbers of elements in an > IN query, and in fact would error out with something about lack of stack > space if you used very many (a few hundred IIRC). > > 8.x something was supposed to have improved that. Using 8.3 recently, after > an "oopsie" with some development data, I inadvertently confirmed that it > works and performance is not too bad with >34,000 items in an IN clause ;-) Interesting, since my queries sometimes expand to a few thousands of ORs (or INs), and I had in fact stack problems and I had to expand it to around 20 MB..... Luca
On Fri, Oct 17, 2008 at 8:24 AM, Andrus <kobruleht2@hot.ee> wrote: >> 8.x something was supposed to have improved that. > > Not in 8.0 > 8.0 (and maybe newer releases also) throw dreaded "Stack lenght limit > exceeded" > > message if IN contains large number or items. Have y'all checked them there release notes? I'm sure they tell the tale. (sorry too much True Blood...)
>> 8.0 (and maybe newer releases also) throw dreaded "Stack lenght limit >> exceeded" >> >> message if IN contains large number or items. > > Have y'all checked them there release notes? I'm sure they tell the > tale. (sorry too much True Blood...) No. I need to support 8.0. After those crashes I changed all IN clauses to docid = ANY ({1,2,3,4,5,7,8,96,5445,.., 34534534}) this takes the same number of characters per id as IN clause. I hope that this does not have any bad effects. Andrus.
"Andrus" <kobruleht2@hot.ee> writes: > No. I need to support 8.0. > After those crashes I changed all IN clauses to > docid = ANY ({1,2,3,4,5,7,8,96,5445,.., 34534534}) I'm afraid that hardly counts as "supporting" 8.0 (or 8.1 for that matter). The syntax may work but it can't be indexed. regards, tom lane
>> No. I need to support 8.0. >> After those crashes I changed all IN clauses to >> docid = ANY ({1,2,3,4,5,7,8,96,5445,.., 34534534}) > > I'm afraid that hardly counts as "supporting" 8.0 (or 8.1 for that > matter). The syntax may work but it can't be indexed. I'm sorry. It seems that Richard Huxton suggestion in this mailing list SELECT * FROM generate_series(1,10) s WHERE s = ANY (string_to_array('1,3,5,7', ',')::int[]); works in 8.0 I hope that this works as well as new array syntax. Andrus.