Thread: IN vs EXISTS
Hi all, I have been using IN clause almost exclusively until recently I tried to use EXISTS and gained significant performance increase without changing/creating any indexes: SELECT ... FROM a WHERE a.ref IN (SELECT b.id WHERE ...) vs SELECT ... FROM a WHERE EXISTS (SELECT 1 FROM b WHERE a.ref=b.id ...) Performance is at least few times better when EXISTS is used. Is it just PostgreSQL specific? Cheers
> -----Original Message----- > > Hi all, > > I have been using IN clause almost exclusively until recently I tried > to use EXISTS and gained significant performance increase without > changing/creating any indexes: > > SELECT ... FROM a WHERE a.ref IN (SELECT b.id WHERE ...) > > vs > > SELECT ... FROM a WHERE EXISTS (SELECT 1 FROM b WHERE a.ref=b.id ...) > > Performance is at least few times better when EXISTS is used. Is it > just PostgreSQL specific? > IN should produce a different query plan than EXISTS. (You can run "explain analyze" on your queries, to see the different plans). Which one is faster depends on your data, and on your server. Also, what's faster on one dbms my be different than another. I've found that postgresql is usually slower than other databases for IN () queries, but handles EXISTS and inner joins (a third way of writing your queries above) quite quickly. SELECT a.foo FROM a INNER JOIN b on a.ref=b.id
askel <dummy666@mail.ru> writes: > Performance is at least few times better when EXISTS is used. It really shouldn't be. PG knows more possible plans for IN than EXISTS, so IN should pretty much always be equal or better ... unless the planner is making the wrong choice. I speculate that you have out-of-date statistics or some such. Have you compared EXPLAIN ANALYZE output for the two cases? Which PG version are we talking about? regards, tom lane
Tom, I'm using 8.3.1. I did run EXPLAIN but have never familiarized myself with how to read/use it beside simple comparing cost estimation and whether there is any seq scan that can benefit from creating index. Thanks for replying On Jun 7, 11:19 am, t...@sss.pgh.pa.us (Tom Lane) wrote: > askel <dummy...@mail.ru> writes: > > Performance is at least few times better when EXISTS is used. > > It really shouldn't be. PG knows more possible plans for IN than > EXISTS, so IN should pretty much always be equal or better ... unless > the planner is making the wrong choice. I speculate that you have > out-of-date statistics or some such. Have you compared EXPLAIN ANALYZE > output for the two cases? Which PG version are we talking about? > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org) > To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general
On Sat, 2008-06-07 at 08:59 -0500, Adam Rich wrote: > what's faster on one dbms my be different than another. I've found > that postgresql is usually slower than other databases for IN () > queries, but handles EXISTS and inner joins (a third way of writing > your queries above) quite quickly. Do you have an explanation for the slow IN performance? Do you mean Postgres is slow with a value list or a subquery or both? Is it because the execution of a particular plan is slow, or does some good plan not exist in Postgres? Regards, Jeff Davis