Thread: Re: Does PostgreSQL support EXISTS?
Raymond Chui wrote: > > The Subject says its all. > Yes 'exists' works (though I never understood the advantage to the 'in' operator). Nils
On Wed, Jun 06, 2001 at 10:44:04PM +0200, Nils Zonneveld wrote: > > > Raymond Chui wrote: > > > > The Subject says its all. > > > > Yes 'exists' works (though I never understood the advantage to the 'in' operator). On postgres at least, exists is faster than in. They are equivalent though. select x from a where v in (select v from b) select x from a where exists (select 1 from b where a.v = b.v) select x from a, b where a.v = b.v are all the same. Postgres doesn't quite know that yet though. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Yes 'exists' works (though I never understood the advantage to the 'in' operator). Each has its advantages and weak points. Eg, in some DBMS, number of entries of `in ( A1, ... AN )' has a pretty tight limit. Also, in PGSQL 7.0 having even fewer entries may cause `Tuple too big' error. Solution is to replace explicit enumeration with comparisons. And so on. `exists' is more `relational pure' on one hand, but is probably more resource consuming. -- ������������������
On Wed, Jun 13, 2001 at 12:23:15PM +1000, Martijn van Oosterhout wrote: > select x from a where v in (select v from b) > select x from a where exists (select 1 from b where a.v = b.v) The latter should be faster than the former on every relational database system. Michael -- Michael Meskes Michael@Fam-Meskes.De Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On Wed, Jun 13, 2001 at 09:21:02AM +0200, Michael Meskes wrote: > On Wed, Jun 13, 2001 at 12:23:15PM +1000, Martijn van Oosterhout wrote: > > select x from a where v in (select v from b) > > select x from a where exists (select 1 from b where a.v = b.v) > > The latter should be faster than the former on every relational database > system. Huh? Since they do the same thing should they (in theory) run in the same time. Now, I can imagine most DBMSs would optimes the latter better than the forward, but that doesn't change the theory. In theory someone should be able to program postgres' rule rewrite to rewrite the former to the latter and then they would execute identically. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
| On postgres at least, exists is faster than in. | | They are equivalent though. | | select x from a where v in (select v from b) | select x from a where exists (select 1 from b where a.v = b.v) | select x from a, b where a.v = b.v | | are all the same. Postgres doesn't quite know that yet though. If you do an EXPLAIN on Oracle with the first 2 queries above, you will notice that the second one is faster in Oracle too (i.e., EXISTS is faster than IN). Cheers Shanx --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.252 / Virus Database: 125 - Release Date: 09-May-01 _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Martijn van Oosterhout <kleptog@svana.org> writes: > On postgres at least, exists is faster than in. > They are equivalent though. Not really. For one thing, IN can return a NULL (don't know) result; EXISTS cannot. regression=# select * from foo; f1 ---- 1 (2 rows) regression=# select 2 in (select f1 from foo); ?column? ---------- (1 row) regression=# select exists (select 1 from foo where f1 = 2); ?column? ---------- f (1 row) Yes, this behavior is spec-compliant. Think: we don't know what the NULL represents, therefore we don't know whether 2 is in the column or not, therefore IN should return NULL. regards, tom lane
> On Wed, Jun 13, 2001 at 12:23:15PM +1000, Martijn van Oosterhout wrote: > > select x from a where v in (select v from b) > > select x from a where exists (select 1 from b where a.v = b.v) > > The latter should be faster than the former on every relational database > system. That surprises me because the subquery is a correlated subquery which are usually slower on other databases that normal subqueries. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> | On postgres at least, exists is faster than in. > | > | They are equivalent though. > | > | select x from a where v in (select v from b) > | select x from a where exists (select 1 from b where a.v = b.v) > | select x from a, b where a.v = b.v > | > | are all the same. Postgres doesn't quite know that yet though. > > > If you do an EXPLAIN on Oracle with the first 2 queries above, you will > notice that the second one is faster in Oracle too (i.e., EXISTS is faster > than IN). So it is confirmed. Interesting. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> On Wed, Jun 13, 2001 at 12:23:15PM +1000, Martijn van Oosterhout wrote: > select x from a where v in (select v from b) > select x from a where exists (select 1 from b where a.v = b.v) >> >> The latter should be faster than the former on every relational database >> system. > That surprises me because the subquery is a correlated subquery which > are usually slower on other databases that normal subqueries. However, the second form is easily able to make use of an index on b.v, whereas the first form is impossible to optimize unless you are able to rewrite it into some weird form of JOIN. BTW, I just realized that the "weird form of JOIN" would have to be much stranger than I previously thought. The result of IN depends not only on whether the subselect's output has any matches to the current test value, but also on whether the subselect's output has any NULLs. So it's not simply a matter of doing a join with a special rule about producing no more than one output tuple per outer-query tuple. How would you check for the NULLs? regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> On Wed, Jun 13, 2001 at 12:23:15PM +1000, Martijn van Oosterhout wrote: > > select x from a where v in (select v from b) > > select x from a where exists (select 1 from b where a.v = b.v) > >> > >> The latter should be faster than the former on every relational database > >> system. > > > That surprises me because the subquery is a correlated subquery which > > are usually slower on other databases that normal subqueries. > > However, the second form is easily able to make use of an index on b.v, > whereas the first form is impossible to optimize unless you are able to > rewrite it into some weird form of JOIN. Assuming the index is there, yes, but again, do we use the index or run the query. This has the same dynamics as index/heap scan and is probably more complicated to figure out accurately. > BTW, I just realized that the "weird form of JOIN" would have to be > much stranger than I previously thought. The result of IN depends not > only on whether the subselect's output has any matches to the current > test value, but also on whether the subselect's output has any NULLs. > So it's not simply a matter of doing a join with a special rule about > producing no more than one output tuple per outer-query tuple. How > would you check for the NULLs? I thought NOT IN was the only one that was concerned about any NULL? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026