Re: [SQL] Subselect performance - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] Subselect performance
Date
Msg-id 21555.937923332@sss.pgh.pa.us
Whole thread Raw
In response to Subselect performance  (Daniel Lopez <ridruejo@atm9.com.dtu.dk>)
Responses Re: [SQL] Subselect performance  (Stuart Rison <rison@biochemistry.ucl.ac.uk>)
List pgsql-sql
Daniel Lopez <ridruejo@atm9.com.dtu.dk> writes:
> $list = select d from c
> select b from a where b in ( $list )
> is  5 seconds

> select b from a where b in (select d from c) 
> is 3 minutes!!  (although it should be at least as fast as a)!

Not necessarily.  Your first example is depending on the fact that
the "list" (number of values selected from c) is short.  Try it with
10000 or 100000 values from c, if you want to see the backend crash ;-)

The second case works OK even if the sub-select result is large, because
it re-executes the sub-select for each row from a (essentially,
rescanning c to see if b is matched).  However, this means the runtime
is proportional to the product of the number of rows in a and c.  Ugh.

Try rewriting your query as a join:
select a.b from a, c where a.b = c.d

(you might want "select distinct" here, if b can match many rows from d).
If the system can't figure out anything better than a nested-loop join,
then it'll probably end up taking about the same amount of time, but
this form at least allows the possibility of a smarter join method.

I believe we have a TODO list item to perform this sort of
transformation automatically when the sub-select is of a form that
allows it.  We need to get the left/outer join stuff working first
in order to have an exact match of the behavior.
        regards, tom lane


pgsql-sql by date:

Previous
From: "Mike Field"
Date:
Subject: Search Engine with PHP
Next
From: "Jörg R. Rudnick"
Date:
Subject: SQL Functions on Composite Types - what's the purpose??