Thread: "WHERE col NOT IN" yields falsely empty result.

"WHERE col NOT IN" yields falsely empty result.

From
pgsql-bugs@postgresql.org
Date:
Dr M J Carter (Martin.J.Carter@nottingham.ac.uk) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
"WHERE col NOT IN" yields falsely empty result.

Long Description
Consider

        SELECT * FROM bar WHERE col2 NOT IN (SELECT col1 FROM foo)

where tables foo and bar overlap but neither is a subset of
the other.  Since the difference set bar - foo is nonempty, the above
should yield one or more rows.  However, the presence of a null
in foo.col1 (tests 3 and 8 below) yields zero rows, even where
the difference set has rows with no null entries (see test 2).
Am I missing something obvious? eg in one of the FAQs?

Two possible user-level workarounds (using EXCEPT with IN (tests 5
and 9), and using a correlated join (tests 6 and 10)) are shown
below, which work as expected if the null is in the difference
between the tables.  A null in the intersection, as in tests 9
and 10, causes its row to appear in the output; on reflection this
is be expected, and can be fixed (tests 11 and 12) once the possibility is recognised.  I can't (yet) see how to show
null-entry 
rows in the difference set without also showing nulls from the
intersection, but I'm an SQL newbie so this time I *know* I'm
missing something.  (Disable blather mode.)

The tables need not have the same column types; that just makes testing a touch easier, by changing the columns being
examined(as 
in test 4).  Apologies for the extensiveness of my tests, of which
you say "don't waste your time" in your Web page: I had to convince
myself that I wasn't just being stupid.

Binaries used are from the stock Debian distributions, not recompiled
by me.  The older set is on my home box (Debian 2.2, PostgreSQL-6.5.3,
built with gcc 2.95.2), and the newer one at work (Deb 2.x (testing): psql yields "PostgreSQL 7.1 on i686-pc-linux-gnu,
compiledby GCC 
2.95.4"; libc6 is v2.2.3-5; processor AMD K6/2 .... anything else?).


Sample Code
Feed the following through "psql foo -f foo.sql | less":

---- snip ----
DROP TABLE foo;
CREATE TABLE "foo" (
        "ipnum" inet,
        "ipname" text);

DROP TABLE bar;
CREATE TABLE "bar" (
        "ipnum" inet,
        "ipname" text);

COPY "foo" FROM stdin;
192.168.187.10  theory
192.168.187.11  junior
192.168.40.43   granby
192.168.40.48   gotham
192.168.21.16   marian
192.168.20.6    marian
192.168.1.1     jips-gw
\.

COPY "bar" FROM stdin;
192.168.187.10  theory
192.168.187.11  junior
192.168.40.43   granby
192.168.40.48   gotham
192.168.21.16   marian
192.168.20.6    marian
192.168.3.8     real-gw
\.

SELECT * FROM bar WHERE ipname IN (SELECT ipname FROM foo);
-- 1: Shows intersection (6 rows).

SELECT * FROM bar WHERE ipname NOT IN (SELECT ipname FROM foo);
-- 2: Shows expected single row from bar.

INSERT INTO foo (ipnum) VALUES ('192.168.104.42');
-- Add row to foo with null ipname value.

SELECT * FROM bar WHERE ipname NOT IN (SELECT ipname FROM foo);
-- 3: Yields empty set.  Is this a bug? it's certainly unexpected;   -- what I'd expected was the same one-line reply
asfor test 2. 

SELECT * FROM bar WHERE ipnum NOT IN (SELECT ipnum FROM foo);
-- 4: Yields expected single row again, by examining other column.

SELECT * FROM bar EXCEPT
 SELECT * FROM bar WHERE ipname IN (SELECT ipname FROM foo);
-- 5: Yields expected row from bar.

SELECT * FROM bar WHERE NOT EXISTS (
        SELECT * FROM foo WHERE foo.ipname = bar.ipname);
-- 6: Inspired by answer 4.23 in general PostgreSQL FAQ: yields      -- expected row, as well as being somewhat faster
forlarge tables. 

INSERT INTO bar (ipnum) VALUES ('192.168.104.42');
-- Move row to intersection.

SELECT * FROM bar WHERE ipname IN (SELECT ipname FROM foo);
-- 7: Shows non-null rows of intersection (6 rows).

SELECT * FROM bar WHERE ipname NOT IN (SELECT ipname FROM foo);
-- 8: Yields empty set.

SELECT * FROM bar EXCEPT
 SELECT * FROM bar WHERE ipname IN (SELECT ipname FROM foo);
-- 9: Yields expected row from bar, plus empty-field row from the     -- intersection.

SELECT * FROM bar WHERE NOT EXISTS (
        SELECT * FROM foo WHERE foo.ipname = bar.ipname);
-- 10: Ditto.

SELECT * FROM bar WHERE ipname IS NOT NULL
 EXCEPT
 SELECT * FROM bar WHERE ipname IN (SELECT ipname FROM foo);
-- 11: Yields expected row from bar only.

SELECT * FROM bar WHERE ipname IS NOT NULL AND NOT EXISTS (
        SELECT * FROM foo WHERE foo.ipname = bar.ipname);
-- 12: Ditto.


No file was uploaded with this report

Re: "WHERE col NOT IN" yields falsely empty result.

From
Tom Lane
Date:
pgsql-bugs@postgresql.org writes:
> Since the difference set bar - foo is nonempty, the above
> should yield one or more rows.  However, the presence of a null
> in foo.col1 (tests 3 and 8 below) yields zero rows, even where
> the difference set has rows with no null entries (see test 2).

This is the spec-mandated behavior.  Think of NULL as "I don't know what
this value is".  Unless you get a match to one of the non-null outputs
of the subselect (in which case you can definitely say that the test
value *is* IN the subselect), you are forced to conclude that you don't
know for sure whether the test value is in the set or not.  Accordingly,
NOT IN will always yield either FALSE or NULL in this situation.

> Am I missing something obvious? eg in one of the FAQs?

If it isn't in the FAQ, it probably should be ...

            regards, tom lane