Greenplum Database does this optimization. The idea is to use a new join
type, let's call it JOIN_LASJ_NOTIN, and its semantic regarding NULL is
defined as below:
1. If there is a NULL in the outer side, and the inner side is empty, the
NULL should be part of the outputs.
2. If there is a NULL in the outer side, and the inner side is not empty,
the NULL should not be part of the outputs.
3. If there is a NULL in the inner side, no outputs should be produced.
An example plan looks like:
gpadmin=# explain (costs off) select * from t1 where a not in(select a from t2);
QUERY PLAN
-----------------------------------
Hash Left Anti Semi (Not-In) Join
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2
(5 rows)
Thanks
Richard