Thread: IN vs EXISTS

IN vs EXISTS

From
askel
Date:
Hi all,

I have been using IN clause almost exclusively until recently I tried
to use EXISTS and gained significant performance increase without
changing/creating any indexes:

SELECT ... FROM a WHERE a.ref IN (SELECT b.id WHERE ...)

vs

SELECT ... FROM a WHERE EXISTS (SELECT 1 FROM b WHERE a.ref=b.id ...)

Performance is at least few times better when EXISTS is used. Is it
just PostgreSQL specific?

Cheers

Re: IN vs EXISTS

From
"Adam Rich"
Date:
> -----Original Message-----
>
> Hi all,
>
> I have been using IN clause almost exclusively until recently I tried
> to use EXISTS and gained significant performance increase without
> changing/creating any indexes:
>
> SELECT ... FROM a WHERE a.ref IN (SELECT b.id WHERE ...)
>
> vs
>
> SELECT ... FROM a WHERE EXISTS (SELECT 1 FROM b WHERE a.ref=b.id ...)
>
> Performance is at least few times better when EXISTS is used. Is it
> just PostgreSQL specific?
>

IN should produce a different query plan than EXISTS. (You can run
"explain analyze" on your queries, to see the different plans).
Which one is faster depends on your data, and on your server.  Also,
what's faster on one dbms my be different than another.  I've found
that postgresql is usually slower than other databases for IN ()
queries, but handles EXISTS and inner joins (a third way of writing
your queries above) quite quickly.

SELECT a.foo
FROM a
INNER JOIN b on a.ref=b.id





Re: IN vs EXISTS

From
Tom Lane
Date:
askel <dummy666@mail.ru> writes:
> Performance is at least few times better when EXISTS is used.

It really shouldn't be.  PG knows more possible plans for IN than
EXISTS, so IN should pretty much always be equal or better ... unless
the planner is making the wrong choice.  I speculate that you have
out-of-date statistics or some such.  Have you compared EXPLAIN ANALYZE
output for the two cases?  Which PG version are we talking about?

            regards, tom lane

Re: IN vs EXISTS

From
askel
Date:
Tom,

I'm using 8.3.1. I did run EXPLAIN but have never familiarized myself
with how to read/use it beside simple comparing cost estimation and
whether there is any seq scan that can benefit from creating index.

Thanks for replying

On Jun 7, 11:19 am, t...@sss.pgh.pa.us (Tom Lane) wrote:
> askel <dummy...@mail.ru> writes:
> > Performance is at least few times better when EXISTS is used.
>
> It really shouldn't be.  PG knows more possible plans for IN than
> EXISTS, so IN should pretty much always be equal or better ... unless
> the planner is making the wrong choice.  I speculate that you have
> out-of-date statistics or some such.  Have you compared EXPLAIN ANALYZE
> output for the two cases?  Which PG version are we talking about?
>
>                         regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general


Re: IN vs EXISTS

From
Jeff Davis
Date:
On Sat, 2008-06-07 at 08:59 -0500, Adam Rich wrote:
> what's faster on one dbms my be different than another.  I've found
> that postgresql is usually slower than other databases for IN ()
> queries, but handles EXISTS and inner joins (a third way of writing
> your queries above) quite quickly.

Do you have an explanation for the slow IN performance? Do you mean
Postgres is slow with a value list or a subquery or both? Is it because
the execution of a particular plan is slow, or does some good plan not
exist in Postgres?

    Regards,
        Jeff Davis