Thread: Bug #598: optimizer: convert 'IN' to join

Bug #598: optimizer: convert 'IN' to join

From

pgsql-bugs@postgresql.org

Date:

20 February 2002, 16:39:11

piers haken (piersh@friskit.com) reports a bug with a severity of 3
The lower the number the more severe it is.

Short Description
optimizer: convert 'IN' to join

Long Description
the optimizer should do better than a sequential scan with statements like:

SELECT * FROM t1 WHERE t1.index IN (SELECT t2.index FROM t2);

this gives:
  Seq Scan on t1
    SubPlan
      -> Seq scan on t2

this is equivalent to (and should be transformed to)

SELECT t1.* FROM t1, t2 WHERE t1.index = t2.index;

which gives the much faster:
  Nested Loop
  -> Seq Scan on t1
  -> Index Scan using t2_pkey on t2

FYI: SQL Server generates a hash table from t1 and probes it with pkey values of t2 read from a NON-primary index. For
SQLServer, scanning a non-primary key takes fewer disk reads than scanning a primary key. 

if you add a condition to the subquery on a non-unique column:

SELECT * FROM t1 WHERE t1.index IN (SELECT t2.index FROM t2 WHERE t2.value='something');
or
SELECT t1.* FROM t1, t2 WHERE t1.index=t2.index AND t2.value='something';

you also get two different plans:

  Seq Scan on t1
    SubPlan
      ->  Materialize
        ->  Index Scan using ix_t2_value on t2

as opposed to

  Hash Join
    ->  Seq Scan on t1
    ->  Hash
      ->  Index Scan using ix_t2_value on t2



Sample Code


No file was uploaded with this report