Re: query question - Mailing list pgsql-performance

From Josh Berkus
Subject Re: query question
Date
Msg-id 200212061638.18529.josh@agliodbs.com
Whole thread Raw
In response to query question  (Laurette Cisneros <laurette@nextbus.com>)
List pgsql-performance
Laurette,

> This query:
> select distinct x, y
>   from table1 t
>   join table2 t2
>  using (col1)
> order by x;
>
> is *slower* than this query:
>
> select disting x, y
>   from table1
>  where col1 = (select col1 from table2)
> ORDER BY x;
>
> Is this because in the latter case the select col1 is cached?

Yes.   For all of the following structures:

where x = (select col from table)
where x IN (select col from table)
where x NOT IN (select col from table)
where x != ANY(select col from table)
etc.,

... Postgres must process the full subquery, return the results, and compare
all of the results as individual values against the reference column.

However, if you re-wrote the query as:

 select distint x, y
   from table1
  where EXISTS (select col1 from table2
    where table2.col1 = table1.col1)
 ORDER BY x;

... then Postgres would be able to use JOIN optimizations to evaluate the
subquery and pull a subset of relevant records or even use an index, making
the query *much* faster.

> Ooo, I would love to have a web page full of these tidbits (along with how
> to get around the max and min aggregates and why as an example..., etc.)!

Um:

http://techdocs.postgresql.org/guides/

Add your own Wiki page!


--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco


pgsql-performance by date:

Previous
From: Laurette Cisneros
Date:
Subject: query question
Next
From: Josh Berkus
Date:
Subject: Re: Speeding up aggregates