Thread: RE: [SQL] To create the query for a select where one is NOT in th e other

RE: [SQL] To create the query for a select where one is NOT in th e other

From
"Jackson, DeJuan"
Date:
22 September 1998, 19:08:57
> On Mon, 21 Sep 1998, Jackson, DeJuan wrote:
> > > On Sat, 19 Sep 1998, Karl Denninger wrote:
> > > I think what you need to do is use a subquery:
> > >
> > > select key_field from table1 where key_field NOT IN (select
> key_field
> > > from
> > > table2) ;
> > >
> > This query will run faster and get you the same results:
> > SELECT key_field FROM table1 a WHERE NOT EXISTS(SELECT b.key_field
> FROM
> > table2 b WHERE b.key_field = a.key_field);
>
> Bear with me; I'm learning, and I have a couple of questions.
>
> >
> > The reason query two is faster than one is two fold...
> > 1) The second query's subquery will only return row's if and only if
> > there is a match, unlike the first query which will have to return
> every
> > row in table2 for every row in table1 and then compare
> table1.key_field
> > to every one of those values.
>
> I understand that--makes perfect sense.
>
> > 2) The first query will not use indexes at all.  Where as the second
> one
> > will for the subquery and a table scan for table1.
> >
> > I suggest timing both queries with and without indexes.  And use
> > 'explain' to see the query plans.  Most IN/NOT IN queries can be
> written
> > as an EXISTS/NOT EXISTS query and run faster.
>
> So what is the main difference between IN/NOT IN and EXISTS/NOT EXISTS
> queries?  The fact that IN/NOT IN doesn't use indexes and EXISTS/NOT
> EXISTS does?  Is EXISTS/NOT EXISTS standard SQL syntax?
>
> Thanks for the pointers...
>
OK PostgreSQL version before 6.4 did not have OR indexing, and if you
think about an IN or a NOT IN expression it can be expressed as a series
of OR-equal.  So, it's possible that PosterSQL would be able to use
indexes on the query but I doubt the it would.  The EXISTS operator does
not use indexes either, but it is a single boolean operator that only
has to test for a single tuple or row return, which can be much faster
than a row by row boolean compare.  It was the rewrite of the subquery
that got the index speed bonus.  The first subquery had to bring back
every row in table2 for every row in table1 (also read table-scan on
table2), where as the second subquery only had to bring back the
matching rows.  As long as key_field on table2 has an index the database
should use it.

I don't have the ANSI SQL standard in front of me but every database,
commercial or otherwise, that I have used has had the EXISTS clause.