Thread: Weird NOT IN effect with NULL values

Weird NOT IN effect with NULL values

From
Frank Joerdens
Date:
When doing a subselect with NOT IN, as in

SELECT name
FROM customer
WHERE customer_id NOT IN (
SELECT customer_id
FROM salesorder
);

(from Bruce Momjian's book)

I get no rows if the result column returned by the subselect
contains NULL values. It works as expected if I remove the NULL values
from the result set. Is this behaviour correct and if so, why?

I am using 7.1 beta 4.

Regards, Frank


Re: Weird NOT IN effect with NULL values

From
Josh Berkus
Date:
Mr. Joerdens,

> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> from the result set. Is this behaviour correct and if so, why?

I can see how that bug would happen.  You may want to forward your
e-mail to pgsql-bugs.

Regardless, you'll find that you get faster results (as well as avoiding
the NULL bug) if you use the following form of the query:

SELECT name
FROM customer
WHERE NOT EXISTS (
SELECT customer_id
FROM salesorder
WHERE customer_id = customer.customer_id
);

Bruce, you may want to consider editing your next edition to include the
above modification.  WHERE ... NOT IN is a bad idea for any subselect on
medium-large tables.

-Josh Berkus


-- 
______AGLIO DATABASE SOLUTIONS___________________________                                       Josh Berkus  Complete
informationtechnology      josh@agliodbs.com   and data management solutions       (415) 565-7293  for law firms, small
businesses      fax  621-2533   and non-profit organizations.       San Francisco
 


Re: Weird NOT IN effect with NULL values

From
Bruce Momjian
Date:
> When doing a subselect with NOT IN, as in
> 
> SELECT name
> FROM customer
> WHERE customer_id NOT IN (
> SELECT customer_id
> FROM salesorder
> );
> 
> (from Bruce Momjian's book)
> 
> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> from the result set. Is this behaviour correct and if so, why?
> 
> I am using 7.1 beta 4.

Read more in the book.  It covers subqueries with nulls, bottom of pages
96.  Not sure about web URL but it is in the subqueries section titled
"NOT IN and Subqueries with NULL Values".

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Weird NOT IN effect with NULL values

From
Peter Eisentraut
Date:
Frank Joerdens writes:

> When doing a subselect with NOT IN, as in
>
> SELECT name
> >FROM customer
> WHERE customer_id NOT IN (
> SELECT customer_id
> >FROM salesorder
> );
>
> (from Bruce Momjian's book)
>
> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> >from the result set. Is this behaviour correct and if so, why?

It is correct.

customer_id NOT IN (value1, value2, value3, ...)

(which is what the subselect would essentially resolve to) is equivalent
to

NOT (customer_id = value1 OR customer_id = value2 OR customer_id = value3 ...)

Say value2 is NULL.  Then we have

NOT (customer_id = value1 OR customer_id = NULL OR customer_id = value3 ...)
NOT (customer_id = value1 OR NULL OR customer_id = value3 ...)
NOT (NULL)
NULL

which means FALSE in a WHERE condition, so no rows are returned.  Note
that 'xxx = NULL' is different from 'xxx IS NULL'.  Also note that NULL is
not the same as FALSE in general.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: Weird NOT IN effect with NULL values

From
Stephan Szabo
Date:
On Thu, 1 Mar 2001, Frank Joerdens wrote:

> When doing a subselect with NOT IN, as in
> 
> SELECT name
> FROM customer
> WHERE customer_id NOT IN (
> SELECT customer_id
> FROM salesorder
> );
> 
> (from Bruce Momjian's book)
> 
> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> from the result set. Is this behaviour correct and if so, why?
> 
> I am using 7.1 beta 4.

I believe it may be actually correct.  If my reading of the spec is
correct (which it possibly is not), customer_id NOT IN (subselect) is
effectively, NOT ( customer_id = ANY (subselect) ) and then:

Using the rules for ANY,
If customer_id=<inner customer_id> for at least one row, IN returns trueso NOT IN returns false.
If customer_id=<inner customer_id> is false for every row, IN returnsfalse so NOT IN returns true.
Otherwise IN and NOT IN both return unknown.

Since customer_id=NULL is unknown, you're getting at least one unknown in
the ANY expression so NOT IN doesn't return true, it returns unknown
which is not sufficient for making the where clause return the row.



Re: Weird NOT IN effect with NULL values

From
Ken Kline
Date:
this is kind of weird but it is how it works.
You cannot use equality for null...
Null does not equal Null
Null means no value, since it's not a value
it can't equal anything another no value.

SELECT name
FROM customer
WHERE customer_id NOT IN
(
SELECT customer_id
FROM salesorder
)
and customer_id is not null;

should work

Ken


Frank Joerdens wrote:

> When doing a subselect with NOT IN, as in
>
> SELECT name
> FROM customer
> WHERE customer_id NOT IN (
> SELECT customer_id
> FROM salesorder
> );
>
> (from Bruce Momjian's book)
>
> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> from the result set. Is this behaviour correct and if so, why?
>
> I am using 7.1 beta 4.
>
> Regards, Frank



Re: Weird NOT IN effect with NULL values

From
Tom Lane
Date:
Frank Joerdens <frank@joerdens.de> writes:
> When doing a subselect with NOT IN, as in
> SELECT name
> FROM customer
> WHERE customer_id NOT IN (
> SELECT customer_id
> FROM salesorder
> );
> I get no rows if the result column returned by the subselect
> contains NULL values. It works as expected if I remove the NULL values
> from the result set. Is this behaviour correct and if so, why?

This is correct because of SQL's 3-valued boolean logic.  SQL92 defines
"A NOT IN B" as equivalent to "NOT (A = SOME B)", and the latter has
the rules
           c) If the implied <comparison predicate> is true for at least             one row RT in T, then "R <comp op>
<some>T" is true.
 
           d) If T is empty or if the implied <comparison predicate> is             false for every row RT in T, then
"R<comp op> <some> T" is             false.
 
           e) If "R <comp op> <quantifier> T" is neither true nor false,             then it is unknown.

Now the implied comparison will generate TRUE for the subselect rows
that contain a matching customer_id, FALSE for the rows that contain
non-matching (but not null) customer_id, and UNKNOWN (null) for the
rows that contain nulls.  So if you have nulls then case (d) never
holds: the result of A = SOME B is either true or unknown.  And so
the result of NOT IN is either false or unknown, and either way the
outer WHERE fails.

This can be justified intuitively if you consider that null means
"don't know": you can say for sure that the target customer_id IS in
the subselect if you find it there, but you can't say for sure that it
IS NOT there, because you don't know all the subselect result elements.

Bottom line: you probably want to suppress nulls in the subselect...
        regards, tom lane


Re: Weird NOT IN effect with NULL values

From
Bruce Momjian
Date:
> SELECT name
> FROM customer
> WHERE NOT EXISTS (
> SELECT customer_id
> FROM salesorder
> WHERE customer_id = customer.customer_id
> );
> 
> Bruce, you may want to consider editing your next edition to include the
> above modification.  WHERE ... NOT IN is a bad idea for any subselect on
> medium-large tables.

FAQ item mentions this, and section 8.2 shows eqivalency at the end of
the section.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026