Thread: index not used with subselect in where clause ?

index not used with subselect in where clause ?

From

Christian Fritze

Date:

16 April 2001, 13:20:20

Good evening (GMT+0200) everybody...

I have a table with an indexed integer attribute used to
determine which rows to fetch from my data base.
(pgsql version is 7.1RC4)


When doing a

  SELECT attr1 FROM table1 WHERE attr1 IN (<list of ints>)
                             AND <more conditions>;

where <list of ints> is entered explicitly (or generated by a
program) everything works fine and fast using the index on attr1.

But when I try doing a

  SELECT attr1 FROM table1 WHERE attr1 IN (<SUBSELECT returning list of ints>)
                             AND <more conditions>;

then the SELECT on table1 uses a sequential scan running 'endlessly'.

Is this really how it's meant to be?

If so, why?
If not, how can I get the second version working???


Thanks in advance...
Christian


--
"The sky above the port was the color of television,
 tuned to a dead channel."
                                         -- W.G. --

Re: index not used with subselect in where clause ?

From

Stephan Szabo

Date:

16 April 2001, 13:32:26

On Mon, 16 Apr 2001, Christian Fritze wrote:

>   SELECT attr1 FROM table1 WHERE attr1 IN (<list of ints>)
>                              AND <more conditions>;
>
> where <list of ints> is entered explicitly (or generated by a
> program) everything works fine and fast using the index on attr1.
>
> But when I try doing a
>
>   SELECT attr1 FROM table1 WHERE attr1 IN (<SUBSELECT returning list of ints>)
>                              AND <more conditions>;
>
> then the SELECT on table1 uses a sequential scan running 'endlessly'.

From the FAQ:

4.23) Why are my subqueries using IN so slow?

Currently, we join subqueries to outer queries by sequentially scanning
the result of the subquery for each row of the outer query. A workaround
is to replace IN with EXISTS:

SELECT *
    FROM tab
    WHERE col1 IN (SELECT col2 FROM TAB2)

to:
SELECT *
    FROM tab
    WHERE EXISTS (SELECT col2 FROM TAB2 WHERE col1 = col2)

We hope to fix this limitation in a future release.

Re: index not used with subselect in where clause ?

From

Christian Fritze

Date:

17 April 2001, 03:33:25


Hello and thank you for your quick reply...

> > But when I try doing a
> >
> >   SELECT attr1 FROM table1 WHERE attr1 IN (<SUBSELECT returning list of ints>)
> >                              AND <more conditions>;
> >
> > then the SELECT on table1 uses a sequential scan running 'endlessly'.
>
> >From the FAQ:
>
> 4.23) Why are my subqueries using IN so slow?

OOOPS, I confess that I didn't think of revisiting the source tree
after installing the complete HTML docs. I don't think the FAQ is
in there, is it? Maybe this would be helpful...

Anyway the solution from the FAQ doesn't seem to help me.
Maybe I wasn't clear enough on what I'm trying to do, so here is a
'real world' example:


1. That's my situation now:

   finn@chatsubo:~ > /Projekte/ENDEBIT/packages/pgsql-Test/bin/psql -p 7432
   zewtest_7.1 -c "explain select * from allmain where idn in (select distinct
   dokids_as_int from allslwfull where wort_nouml_lower like 'gen%')"
   NOTICE:  QUERY PLAN:

   Seq Scan on allmain  (cost=0.00..69328.08 rows=19619 width=556)
     SubPlan
       ->  Materialize  (cost=3.45..3.45 rows=1 width=4)
             ->  Unique  (cost=3.45..3.45 rows=1 width=4)
                   ->  Sort  (cost=3.45..3.45 rows=1 width=4)
                         ->  Index Scan using allslwfull_low_idx on allslwfull

   (cost=0.00..3.44 rows=1 width=4)

   EXPLAIN


2. FAQ 4.23 now seems to suggest something like

   finn@chatsubo:~ > /Projekte/ENDEBIT/packages/pgsql-Test/bin/psql -p 7432
   zewtest_7.1 -c "explain select * from allmain where exists (select distinct
   dokids_as_int from allslwfull where dokids_as_int = idn and
wort_nouml_lower
   like 'gen%')"
   NOTICE:  QUERY PLAN:

   Seq Scan on allmain  (cost=0.00..69328.08 rows=19619 width=556)
     SubPlan
       ->  Unique  (cost=3.45..3.46 rows=1 width=4)
             ->  Sort  (cost=3.45..3.45 rows=1 width=4)
                   ->  Index Scan using allslwfull_low_idx on allslwfull
   (cost=0.00..3.44 rows=1 width=4)

   EXPLAIN

Doesn't seem to make much of a difference... ;-)


3. On the other hand:

   finn@chatsubo:~ > /Projekte/ENDEBIT/packages/pgsql-Test/bin/psql -p 7432
   zewtest_7.1 -c "explain select distinct dokids_as_int from allslwfull where
   wort_nouml_lower like 'gen%'"
   NOTICE:  QUERY PLAN:

   Unique  (cost=3.45..3.45 rows=1 width=4)
     ->  Sort  (cost=3.45..3.45 rows=1 width=4)
           ->  Index Scan using allslwfull_low_idx on allslwfull
   (cost=0.00..3.44 rows=1 width=4)

   EXPLAIN
   finn@chatsubo:~ > /Projekte/ENDEBIT/packages/pgsql-Test/bin/psql -p 7432
   zewtest_7.1 -c "explain select * from allmain where idn in (677676, 34487,
   45353)"
   NOTICE:  QUERY PLAN:

   Index Scan using allmainidn_idx, allmainidn_idx, allmainidn_idx on allmain
   (cost=0.00..10.44 rows=1 width=556)

   EXPLAIN

Being far from knowlegeable in terms of database theory and looking just
at the figures returned by EXPLAIN I wondered if it wouldn't be much
faster to simply evaluate the inner query and hand the result over to
the outer query instead of performing an expensive join behind the scenes...

Am I thinking too naively here?


greetings...
Christian


--
"The sky above the port was the color of television,
 tuned to a dead channel."
                                         -- W.G. --

Re: index not used with subselect in where clause ?

From

Tom Lane

Date:

17 April 2001, 10:43:31

Christian Fritze <The.Finn@sprawl.de> writes:
>    explain select * from allmain where exists (select distinct
>    dokids_as_int from allslwfull where dokids_as_int = idn and
>    wort_nouml_lower like 'gen%')

Try dropping the "distinct" on the inner select.  As a moment's thought
will reveal, it's not buying you anything; and it's costing you sort
and unique passes over the subplan result.

            regards, tom lane

Re: index not used with subselect in where clause ?

From

"Rod Taylor"

Date:

17 April 2001, 11:40:08

Adding a LIMIT 1 in the subplan may also help -- as you only need a
single match to make it true so additional finds are useless -- it'll
stop sooner or will be more likely to use an index than a full table
scan.
--
Rod Taylor

There are always four sides to every story: your side, their side, the
truth, and what really happened.
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Christian Fritze" <The.Finn@sprawl.de>
Cc: "Stephan Szabo" <sszabo@megazone23.bigpanda.com>;
<pgsql-general@postgresql.org>
Sent: Tuesday, April 17, 2001 10:33 AM
Subject: Re: [GENERAL] index not used with subselect in where clause ?


> Christian Fritze <The.Finn@sprawl.de> writes:
> >    explain select * from allmain where exists (select distinct
> >    dokids_as_int from allslwfull where dokids_as_int = idn and
> >    wort_nouml_lower like 'gen%')
>
> Try dropping the "distinct" on the inner select.  As a moment's
thought
> will reveal, it's not buying you anything; and it's costing you sort
> and unique passes over the subplan result.
>
> regards, tom lane
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

Re: index not used with subselect in where clause ?

From

Tom Lane

Date:

17 April 2001, 12:20:43

"Rod Taylor" <rod.taylor@inquent.com> writes:
> Adding a LIMIT 1 in the subplan may also help -- as you only need a
> single match to make it true so additional finds are useless -- it'll
> stop sooner or will be more likely to use an index than a full table
> scan.

I believe this is not necessary; the WHERE EXISTS(...) context is
sufficient to cue the planner that only one tuple will be retrieved,
and it will alter the subselect's plan accordingly.

            regards, tom lane

Re: index not used with subselect in where clause ?

From

Christian Fritze

Date:

17 April 2001, 13:22:31

> Adding a LIMIT 1 in the subplan may also help -- as you only need a
> single match to make it true so additional finds are useless -- it'll
> stop sooner or will be more likely to use an index than a full table
> scan.
> --
> Rod Taylor

I'm not sure if I understand you correctly here: the subplan uses an
index scan already. It's the seq. scan in the outer query that makes
me whine.

> There are always four sides to every story: your side, their side, the
> truth, and what really happened.

Off Topic: I like that .sig, thoughts of "The man who shot Liberty Valance"
come to mind... :-)

> From: "Tom Lane" <tgl@sss.pgh.pa.us>
[...]
> > Christian Fritze <The.Finn@sprawl.de> writes:
> > >    explain select * from allmain where exists (select distinct
> > >    dokids_as_int from allslwfull where dokids_as_int = idn and
> > >    wort_nouml_lower like 'gen%')
> >
> > Try dropping the "distinct" on the inner select.  As a moment's

Yep, that increases performance...
...by about 0.35 % according to EXPLAIN :-{

Well, what I'm doing right now is the following:

I perform the inner query (which is reasonably fast) and pump the result
through the JDBC driver into my application. There I build the outer query
with an explicit list of integers for the WHERE clause and hand that query
back to the data base.

But that doesn't seem very smart either: in cases where the inner query
returns only a few results it's not really necessary. In cases where it
returns a few thousands, I need to split the outer query in order not to
run into a 'query too long' error (which comes from the jdbc driver rather
than from pgsql if I'm right? Maybe I should try to tweak that driver? Hmmm...)
That splitting however eats away much (if not all) of the intended
performance gain.

greetings...
Christian

--
"The sky above the port was the color of television,
 tuned to a dead channel."
                                         -- W.G. --