Re: IN or EXISTS - Mailing list pgsql-performance

From Jeff Davis
Subject Re: IN or EXISTS
Date
Msg-id 1316741520.31091.7.camel@sussancws0025
Whole thread Raw
In response to Re: IN or EXISTS  (Craig Ringer <ringerc@ringerc.id.au>)
List pgsql-performance
On Wed, 2011-08-31 at 09:33 +0800, Craig Ringer wrote:
> On the other hand, the `IN' subquery is uncorrelated needs only run
> once, where the `EXISTS' subquery is correlated and has to run once for
> every outer record.

If the EXISTS looks semantically similar to an IN (aside from NULL
semantics), then it can be made into a semijoin. It doesn't require
re-executing any part of the plan.

I don't think there are any cases where [NOT] IN is an improvement, am I
mistaken?

> Another complication is the possible presence of NULL in an IN list.
> Getting NULLs in `IN' lists is a common source of questions on this
> list, because people are quite surprised by how it works. EXISTS avoids
> the NULL handling issue (and in the process demonstrates how woefully
> inconsistent SQL's handling of NULL really is).

Absolutely. The NULL behavior of IN is what makes it hard to optimize,
and therefore you should use EXISTS instead if the semantics are
suitable.

> Theoretically the query planner could transform:
>
> SELECT * from y WHERE y.id IN (SELECT DISTINCT z.y_id FROM z WHERE
> z.y_id IS NOT NULL);
>
> into:
>
> SELECT * FROM y WHERE EXISTS (SELECT 1 FROM z WHERE z.y_id = y.id)
>
> ... or vice versa depending on which it thought would be faster.

Although those two queries are semantically the same (I think), a lot of
very similar pairs of queries are not equivalent. For instance, if it
was a NOT IN you couldn't change that to a NOT EXISTS.

Regards,
    Jeff Davis


pgsql-performance by date:

Previous
From: Michael Viscuso
Date:
Subject: Re: Query optimization using order by and limit
Next
From: Gunnlaugur Þór Briem
Date:
Subject: Re: Constraint exclusion on UNION ALL subqueries with WHERE conditions