Nic Ferrier <nferrier@tapsellferrier.co.uk> writes:
> create table t1 (id integer, name varchar(20), t2_id integer);
> insert into t1 (id, name, t2_id) values (1, 'nic', 2);
> insert into t1 (id, name, t2_id) values (2, 'jim', NULL);
> create table t2 (id integer, name varchar(20));
> insert into t1 (id, name, t2_id) values (1, 'ferrier');
> insert into t1 (id, name, t2_id) values (2, 'broadbent');
> And now do this query:
> select * from t2 where id not in (select t2_id from t1);
> then I get a NULL response (ie: no rows returned).
> What I SHOULD get is the row from t2 with id == 2;
No, you should not; the system's response is correct per spec.
For the t2 row with id=2, the WHERE clause is clearly FALSE
(2 is in select t2_id from t1). For the t2 row with id=1,
the WHERE clause yields UNKNOWN because of the NULL in t1,
and WHERE treats UNKNOWN as FALSE. This has been discussed
before on the lists, and it's quite clear that the result is
correct according to SQL's 3-valued boolean logic.
There are a number of ways you could deal with this. If you
simply want to ignore the NULLs in t1 then you could do either
select * from t2 where id not in (select distinct t2_id from t1);
select * from t2 where (id in (select t2_id from t1)) is not false;
The first of these will probably be faster if there aren't many
distinct t2_id values.
regards, tom lane