Thread: Question about subselect/IN performance

Question about subselect/IN performance

From
"T.H."
Date:
I have a query that's running an IN/Subselect that joins three different
tables and gets a list of IDs to compare against... the subselect
basically looks for records through a join table based on the 3rd
table's name, similar to:

... IN (SELECT id FROM foo, foo_bar, bar
         WHERE foo.id = foo_bar.foo_id
             AND bar.id = foo_bar.bar_id
             AND bar.name = "something") ...

This is all nested in a fairly complex query, and several of these
subselects operate on different tables within the query. The whole
thing, on some high-cardinality cases, can take 2.5 seconds to run
(clearly something can be done about that).

So in this example, the cardinality of the bar table is very low, and
fairly constant, something on the order of 5-7 records. In an
optimization attempt, I reduced the joins in the subselect from 2 to 1
by passing in the ID of the bar with the correct name, which I can
easily cache application-side or pre-fetch in a single query. Now it
looks like this:

... IN (SELECT id FROM foo, foo_bar
          WHERE foo.id = foo_bar.foo_id
              AND foo_bar.bar_id = 1) ...

Crazy thing is, that single optimization reduced the query time
significantly, from 2.5-3 seconds down to 40-60ms.

Does anyone have any kind of explanation for this? Are the inner
workings of the IN clause taking the plan for the subselect into account
when running, and doing something clever with it? Any insight on the
internal mechanisms of IN or subselects in Postgres would be greatly
appreciated if anyone knows more.

Also, are there any better ways you can think of doing such an IN query,
using non-subselect means that might be more efficient?

Thanks in advance, any advice/help understanding this better is greatly
appreciated.

Re: Question about subselect/IN performance

From
"Kevin Grittner"
Date:
"T.H." <calinet6@gmail.com> wrote:

> Also, are there any better ways you can think of doing such an IN
> query, using non-subselect means that might be more efficient?

Have you tried the EXISTS predicate?

-Kevin

Re: Question about subselect/IN performance

From
"T.H."
Date:
On 11/30/10 5:54 PM, Kevin Grittner wrote:
> "T.H."<calinet6@gmail.com>  wrote:
>
>> Also, are there any better ways you can think of doing such an IN
>> query, using non-subselect means that might be more efficient?
>
> Have you tried the EXISTS predicate?
>
> -Kevin
>

Just looking into it now, thanks for the suggestion. Is there a reason
that EXISTS is generally faster than IN for this sort of query?

-Tristan

Re: Question about subselect/IN performance

From
bricklen
Date:
On Tue, Nov 30, 2010 at 3:23 PM, T.H. <calinet6@gmail.com> wrote:
> Just looking into it now, thanks for the suggestion. Is there a reason that
> EXISTS is generally faster than IN for this sort of query?
>
> -Tristan

Exists will return immediately upon finding a match -- assuming there is one.