Re: Join Statements - Mailing list pgsql-sql

From Andy Marden
Subject Re: Join Statements
Date
Msg-id a5ggbc$3032$1@jupiter.hub.org
Whole thread Raw
In response to Re: Join Statements  (Mark Nielsen <python@kepnet.net>)
List pgsql-sql
Your problem here is in the use of the use of '!='. All the columns from b
will be null if there is no match. Any null result always fails a test so
null is never not equal to anything (or equal to it for that matter!). Don't
get me started on NULLs any the ludicrous ways that RDBMSs deal them them!

What you need is:

select count(a.*)
from a left outer join b on a.id = b.id
where b.id is null

This will give you the best performing query rather than NOT IN sub query.
It also gives you greater control of the query plan. Internally, these days,
this is what Oracle converts a not in sub-query to if you let it. PostgreSQL
still persists with the old way, it seems, of doing a nested-loop scan of
the sub query for each row in the outer which is usually horribly
inefficient.

The alternative solution below with the NOT ... ANY seems to give the same
query plan as the NOT IN approach

Cheers

Andy

"Mark Nielsen" <python@kepnet.net> wrote in message
news:3C6EA55C.8040905@kepnet.net...
> select distinct TABLENAME_id from TABLENAME_backup where  NOT
> TABLENAME_id = ANY (select TABLENAME_id from TABLENAME)
>
> Here is an example of me getting all the ids from the backup table that
> doesn't exist in the main table. This is similar to what you want.
>
> I think I am answering your question, or close to it.
> Mark
>
> T Conti wrote:
>
> > Howdy:
> >
> > I need to put together an SQL statement that will return all of the
> > rows in table A that are not in table B.  Table A and B have the same
> > primary key.  For example:
> >
> > select count(a.*)
> > from a  (nolock) left outer join
> > b (nolock) on a.id = b.id
> > where a.id != b.id
> >
> > This did not work.  It returned 0 rows.  I know that this could be
> > done very easily in a sub-select, but that seems inefficient.  Is
> > there any way to accomplish what I mentioned above in the join
> > statement or is the sub-select the way to go?
> >
> > Thanks for the help,
> > Tom
> >
>
>




pgsql-sql by date:

Previous
From: Christof Glaser
Date:
Subject: Re: Removing duplicates
Next
From: Andreas Joseph Krogh
Date:
Subject: Re: Timestamp output