Thread: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Hi, when doing queries of the type: SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get terrible performance on tables of any resonable size. I see the same behaviour when doing queries of the form "SELECT id FROM foo WHERE id = 5 OR id = 6 OR ..." When doing an "EXPLAIN" on the query, I get output like the following: Index Scan using foo_idx, foo_idx, foo_idx, foo_idx, foo_idx, foo_idx on foo (cost=0.00..18.16 rows=6 width=4) If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the database will actually throw an error saying "Recursive Depth Exceeded" or something similar and not complete the query. It looks to me like the query parser is recursively calling an index scan for each row in the 'IN' clause rather than just doing one index scan that it seems it should be. My question is, does anyone have any alternate ideas for how I can do a query like this and have it perform well? The tables I am working with are big enough that a sequential scan is not helpful. Is this a bug I am encountering or an error in my query? Is this a known issue? I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris and Linux platforms. Thanks for any input.
On Thursday 15 May 2003 10:44 pm, Mike Winter wrote: > Hi, when doing queries of the type: > > SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get > terrible performance on tables of any resonable size. I see the > same behaviour when doing queries of the form "SELECT id FROM > foo WHERE id = 5 OR id = 6 OR ..." [snip] > It looks to me like the query parser is recursively calling > an index scan for each row in the 'IN' clause rather than just > doing one index scan that it seems it should be. Hmm - not sure how you could. When it says index-scan it's actually traversing a btree (probably), not scanning a list of indexes. The IN is basically treated like a series of a OR b OR c, hence the similar behaviour. > My question is, does anyone have any alternate ideas for how I > can do a query like this and have it perform well? The tables I > am working with are big enough that a sequential scan is not > helpful. Is this a bug I am encountering or an error in my > query? Is this a known issue? Known issue - the usual advice is to rewrite in the form of EXISTS, but I can't think how to do that if you have a long list of literal values. You could create a temp table to hold your matching values and join against it, but I realise that's not a terribly elegant solution. Unless of course, it's a search-engine type of situation where it makes a certain amount of sense. > I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris > and Linux platforms. Supposed to be some improvements in the forthcoming 7.4 but I don't know if that will help your particular case. -- Richard Huxton
Hi Mike, This is a well-known issue and to my knowledge has been addressed in the 7.4 branch. The recommended solution is to rephrase your query using EXISTS and eliminating the IN (hint: may require adding a join to the query); search pgsql-sql or pgsql-performance for details on others (this question is posted approximately weekly. Best, Randall On Thursday, May 15, 2003, at 05:44 PM, Mike Winter wrote: > Hi, when doing queries of the type: > > SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get > terrible performance on tables of any resonable size. I see the > same behaviour when doing queries of the form "SELECT id FROM > foo WHERE id = 5 OR id = 6 OR ..." > > When doing an "EXPLAIN" on the query, I get output like the > following: > > Index Scan using foo_idx, foo_idx, foo_idx, foo_idx, foo_idx, > foo_idx on foo (cost=0.00..18.16 rows=6 width=4) > > If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the > database will actually throw an error saying "Recursive Depth > Exceeded" or something similar and not complete the query. > > It looks to me like the query parser is recursively calling > an index scan for each row in the 'IN' clause rather than just > doing one index scan that it seems it should be. > > My question is, does anyone have any alternate ideas for how I > can do a query like this and have it perform well? The tables I > am working with are big enough that a sequential scan is not > helpful. Is this a bug I am encountering or an error in my > query? Is this a known issue? > > I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris > and Linux platforms. > > Thanks for any input. > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
On Fri, 16 May 2003, Richard Huxton wrote: > > My question is, does anyone have any alternate ideas for how I > > can do a query like this and have it perform well? The tables I > > am working with are big enough that a sequential scan is not > > helpful. Is this a bug I am encountering or an error in my > > query? Is this a known issue? > > Known issue - the usual advice is to rewrite in the form of EXISTS, but I > can't think how to do that if you have a long list of literal values. You > could create a temp table to hold your matching values and join against it, > but I realise that's not a terribly elegant solution. Unless of course, it's > a search-engine type of situation where it makes a certain amount of sense. Thanks to everyone for the replies. The temporary table route is something we did. It's an acceptable stopgap, but it's still not outstanding in terms of performance and has lead to other issues. I will examine the 'EXISTS' clause option depending on the progress of 7.4. > > I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris > > and Linux platforms. > > Supposed to be some improvements in the forthcoming 7.4 but I don't know if > that will help your particular case. That's good news. -- _______________________________________________________________________ Front Logic Inc. Tel: 306.653.2725 x14 226 Pacific Ave or 1.800.521.4510 x14 Saskatoon, SK Fax: 306.653.0972 S7K 1N9 Canada Cell: 306.717.3746 http://www.frontlogic.com mike.winter@frontlogic.com Find out what TYPENGO(tm) N300 Search Technology can do for your company: http://www.frontlogic.com/interactive/hosts/typengo/index.html
Mike Winter <mike.winter@frontlogic.com> writes: > If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the > database will actually throw an error saying "Recursive Depth > Exceeded" or something similar and not complete the query. Would it perhaps be saying "out of free buffers: time to abort!" ? If so, you're probably running into this bug, which was introduced (by me :-() in 7.3: http://archives.postgresql.org/pgsql-hackers/2003-03/msg00939.php There is a fix in place for 7.3.3. > It looks to me like the query parser is recursively calling > an index scan for each row in the 'IN' clause rather than just > doing one index scan that it seems it should be. It is performing one index search per target value, yes, but not recursively. That's what it's supposed to do. regards, tom lane