Thread: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

From

Mike Winter

Date:

16 May 2003, 12:44:31

Hi, when doing queries of the type:

SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
terrible performance on tables of any resonable size.  I see the
same behaviour when doing queries of the form "SELECT id FROM
foo WHERE id = 5 OR id = 6 OR ..."

When doing an "EXPLAIN" on the query, I get output like the
following:

Index Scan using foo_idx, foo_idx, foo_idx, foo_idx, foo_idx,
foo_idx on foo (cost=0.00..18.16 rows=6 width=4)

If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the
database will actually throw an error saying "Recursive Depth
Exceeded" or something similar and not complete the query.

It looks to me like the query parser is recursively calling
an index scan for each row in the 'IN' clause rather than just
doing one index scan that it seems it should be.

My question is, does anyone have any alternate ideas for how I
can do a query like this and have it perform well?  The tables I
am working with are big enough that a sequential scan is not
helpful.  Is this a bug I am encountering or an error in my
query?  Is this a known issue?

I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
and Linux platforms.

Thanks for any input.

Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

From

Richard Huxton

Date:

16 May 2003, 13:29:14

On Thursday 15 May 2003 10:44 pm, Mike Winter wrote:
> Hi, when doing queries of the type:
>
> SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
> terrible performance on tables of any resonable size.  I see the
> same behaviour when doing queries of the form "SELECT id FROM
> foo WHERE id = 5 OR id = 6 OR ..."
[snip]
> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.

Hmm - not sure how you could. When it says index-scan it's actually traversing
a btree (probably), not scanning a list of indexes. The IN is basically
treated like a series of a OR b OR c, hence the similar behaviour.

> My question is, does anyone have any alternate ideas for how I
> can do a query like this and have it perform well?  The tables I
> am working with are big enough that a sequential scan is not
> helpful.  Is this a bug I am encountering or an error in my
> query?  Is this a known issue?

Known issue - the usual advice is to rewrite in the form of EXISTS, but I
can't think how to do that if you have a long list of literal values. You
could create a temp table to hold your matching values and join against it,
but I realise that's not a terribly elegant solution. Unless of course, it's
a search-engine type of situation where it makes a certain amount of sense.

> I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> and Linux platforms.

Supposed to be some improvements in the forthcoming 7.4 but I don't know if
that will help your particular case.
--  Richard Huxton

Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

From

Randall Lucas

Date:

16 May 2003, 13:35:08

Hi Mike,

This is a well-known issue and to my knowledge has been addressed in 
the 7.4 branch.

The recommended solution is to rephrase your query using EXISTS and 
eliminating the IN (hint: may require adding a join to the query); 
search pgsql-sql or pgsql-performance for details on others (this 
question is posted approximately weekly.

Best,

Randall

On Thursday, May 15, 2003, at 05:44 PM, Mike Winter wrote:

> Hi, when doing queries of the type:
>
> SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
> terrible performance on tables of any resonable size.  I see the
> same behaviour when doing queries of the form "SELECT id FROM
> foo WHERE id = 5 OR id = 6 OR ..."
>
> When doing an "EXPLAIN" on the query, I get output like the
> following:
>
> Index Scan using foo_idx, foo_idx, foo_idx, foo_idx, foo_idx,
> foo_idx on foo (cost=0.00..18.16 rows=6 width=4)
>
> If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the
> database will actually throw an error saying "Recursive Depth
> Exceeded" or something similar and not complete the query.
>
> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.
>
> My question is, does anyone have any alternate ideas for how I
> can do a query like this and have it perform well?  The tables I
> am working with are big enough that a sequential scan is not
> helpful.  Is this a bug I am encountering or an error in my
> query?  Is this a known issue?
>
> I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> and Linux platforms.
>
> Thanks for any input.
>
>
>
>
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR

From

Mike Winter

Date:

16 May 2003, 15:18:00

On Fri, 16 May 2003, Richard Huxton wrote:

> > My question is, does anyone have any alternate ideas for how I
> > can do a query like this and have it perform well?  The tables I
> > am working with are big enough that a sequential scan is not
> > helpful.  Is this a bug I am encountering or an error in my
> > query?  Is this a known issue?
>
> Known issue - the usual advice is to rewrite in the form of EXISTS, but I
> can't think how to do that if you have a long list of literal values. You
> could create a temp table to hold your matching values and join against it,
> but I realise that's not a terribly elegant solution. Unless of course, it's
> a search-engine type of situation where it makes a certain amount of sense.

Thanks to everyone for the replies.

The temporary table route is something we did.  It's an acceptable
stopgap, but it's still not outstanding in terms of performance
and has lead to other issues.  I will examine the 'EXISTS' clause
option depending on the progress of 7.4.

> > I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> > and Linux platforms.
>
> Supposed to be some improvements in the forthcoming 7.4 but I don't know if
> that will help your particular case.

That's good news.

-- 
_______________________________________________________________________
Front Logic Inc.                                  Tel: 306.653.2725 x14
226 Pacific Ave                                   or 1.800.521.4510 x14
Saskatoon, SK                                     Fax: 306.653.0972
S7K 1N9  Canada                      Cell: 306.717.3746
http://www.frontlogic.com                   mike.winter@frontlogic.com

Find out what TYPENGO(tm) N300 Search Technology can do for your
company: http://www.frontlogic.com/interactive/hosts/typengo/index.html

Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

From

Tom Lane

Date:

16 May 2003, 15:25:28

Mike Winter <mike.winter@frontlogic.com> writes:
> If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the
> database will actually throw an error saying "Recursive Depth
> Exceeded" or something similar and not complete the query.

Would it perhaps be saying "out of free buffers: time to abort!" ?
If so, you're probably running into this bug, which was introduced
(by me :-() in 7.3:
http://archives.postgresql.org/pgsql-hackers/2003-03/msg00939.php
There is a fix in place for 7.3.3.

> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.

It is performing one index search per target value, yes, but not
recursively.  That's what it's supposed to do.
        regards, tom lane