Thread: Understanding INNER JOIN versus IN subquery

Understanding INNER JOIN versus IN subquery

From

Robert James

Date:

20 July 2009, 13:38:12

I have two queries which should be equivalent. The Planner plans them differently, although they are both about the same time. Can someone explain why?

select word from dict
where
word in
(select substr('moon', 0, generate_series(3,length('moon'))))

select * from dict
inner join (select substr('moon', 0, generate_series(3,length('moon')))) as m
on dict.word = m.substr

Is one preferred?

Re: Understanding INNER JOIN versus IN subquery

From

Greg Stark

Date:

20 July 2009, 14:22:29

On Mon, Jul 20, 2009 at 2:37 PM, Robert James<srobertjames@gmail.com> wrote:
> I have two queries which should be equivalent.  The Planner plans them
> differently, although they are both about the same time.  Can someone
> explain why?

Uhm, please post the two plans and the server version. I know you've
posted them before but that was in another thread that I don't have
handy.

The short answer is that the server doesn't know that the subquery is
definitely going to produce distinct results so it doesn't know the
inner join won't produce duplicates. So it can't tell that these
queries might be equivalent.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: Understanding INNER JOIN versus IN subquery

From

Grzegorz Jaśkiewicz

Date:

20 July 2009, 14:28:49

On Mon, Jul 20, 2009 at 2:37 PM, Robert James<srobertjames@gmail.com> wrote:
> I have two queries which should be equivalent.  The Planner plans them
> differently, although they are both about the same time.  Can someone
> explain why?
> select word from dict
> where
>  word in
>  (select substr('moon', 0, generate_series(3,length('moon'))))
>
> select * from dict
> inner join (select substr('moon', 0, generate_series(3,length('moon')))) as
> m
> on dict.word = m.substr
>
> Is one preferred?

it is hard to say from your example. But in my general expierence, I
noticed similar thing many times.
Needless to say, I prefer personally to choose JOINs instead of IN(),
because those tend to be faster on postgresql.