Thread: (null) != (null) ?
Below are two minor bug issues which I can't find as `known' (then again, I can't seem to find an easy-to-identify `known issues' list for that matter 8-), in pgsql 6.5.2. Platform: NetBSD/i386, 1.4.1 (a.out). ===== (1) SELECT ... FROM table1 a,table2 b WHERE a.fieldname = b.fieldname; Both "fieldname" definitions are identical (verified with char(2) and varchar(100) in particular), and both tables contain a row with a "null" in that field. However, the results don't contain the row with the "null" value. A quick reproduction: => create temp table foo (fieldname char(2)); => create temp table foo2 (fieldname char(2)); => insert into foo values (null); => insert into foo2 values (null); => select foo.fieldname from foo,foo2 where foo.fieldname = foo2.fieldname; fieldname --------- (0 rows) In the above, only the following expression seems to DTRT: => select foo.fieldname from foo,foo2 where foo.fieldname = foo2.fieldname or (foo.fieldname = null and foo2.fieldname = null); fieldname --------- (1 row) ===== (2) NOT IN doesn't seem to work at all. I always get 0 results--and very rapidly at that!--regardless of the situation. -- -- Todd Vierling (tv@pobox.com)
Todd Vierling <tv@pobox.com> writes: > (1) SELECT ... FROM table1 a,table2 b WHERE a.fieldname = b.fieldname; > Both "fieldname" definitions are identical (verified with char(2) and > varchar(100) in particular), and both tables contain a row with a "null" in > that field. However, the results don't contain the row with the "null" > value. NULL = NULL does not yield TRUE, it yields NULL. For that matter, NULL != NULL does not yield FALSE --- it yields NULL. This is a basic consequence of the semantics of NULL. The easiest way to think about NULL that I've heard of is: "NULL means I don't know what the value should be". So, for example, NULL = NULL is asking whether two things are equal when you don't know exactly what either of them is. The answer cannot be "yes", it cannot be "no", it has to be "I don't know" --- ie, NULL. Nearly all Postgres operators yield NULL if any input is NULL. This is perfectly sensible; for example, if you don't know what x is, you don't know what x+1 is, either. The main exceptions are the special operators IS NULL and IS NOT NULL. I think we also put in a dirty hack to treat "x = NULL" (when NULL is written as a literal constant) as "x IS NULL", because some clueless programmer at Microsloth made MS SQL act that way, and now people expect it. But it's bogus by any strict interpretation :-( The WHERE clause treats a NULL test result as false (ie, the row doesn't get selected), which accounts for the behavior you cite. A really hard-line view of the semantics would be that WHERE NULL should raise an error --- after all, if you don't know the result of the test, how can you say if the row should be in or out? But I guess the SQL committee felt that that would be sacrificing too much usability in the name of logical purity. If it worked that way you could hardly ever write a WHERE clause without explicit tests for NULLs. If you really want to match up nulls in your example, you can do something like WHERE (a.fieldname = b.fieldname) OR (a.fieldname IS NULL AND b.fieldname IS NULL) This is pretty grotty, of course, so my inclination would be to use a special non-NULL value --- an empty string, for example --- for rows that you wanted to match like this. PS: The above WHERE does succeed where both fields are NULL. Exercise for the student: explain why. (Hint: OR is just a little bit special.) > (2) NOT IN doesn't seem to work at all. I always get 0 results--and very > rapidly at that!--regardless of the situation. I don't think it's quite *that* broken. How about a concrete example of what you're trying to do? regards, tom lane
Todd Vierling wrote: > (1) SELECT ... FROM table1 a,table2 b WHERE a.fieldname = b.fieldname; > > Both "fieldname" definitions are identical (verified with char(2) and > varchar(100) in particular), and both tables contain a row with a "null" in > that field. However, the results don't contain the row with the "null" > value. A quick reproduction: This is standard SQL behavior. NULL != NULL. Essentially, NULL is an undefined value. Since it is undefined, it can't be know to be equal to anything, even another undefined value. Oracle has the NVL function which can be used to replace nulls with known values allowing your statement above to work as you expect. I don't think Postgres has a similar function but you could probably write your own. -- Thomas Pfau pfau@maherterminals.com aka pfau@eclipse.net http://www.eclipse.net/~pfau/
On Tue, 26 Oct 1999, Tom Lane wrote: : > Both "fieldname" definitions are identical (verified with char(2) and : > varchar(100) in particular), and both tables contain a row with a "null" in : > that field. However, the results don't contain the row with the "null" : > value. : : NULL = NULL does not yield TRUE, it yields NULL. For that matter, : NULL != NULL does not yield FALSE --- it yields NULL. This is a : basic consequence of the semantics of NULL. !? I have been using such constructs on commercial databases for ages. Do you have a link to a web-based SQL standard transcription that I could look this up? (I'll check up on exactly which database(s) I can use this type of construct when I get back to work tomorrow....) It seems _extremely_ counter-intuitive, especially in cases where both fields are in fact the same type. : Nearly all Postgres operators yield NULL if any input is NULL. Interesting ... so see my clarification of (2) below. : If you really want to match up nulls in your example, you can do : something like : WHERE (a.fieldname = b.fieldname) OR : (a.fieldname IS NULL AND b.fieldname IS NULL) Which I already described in my text, sigh. : This is pretty grotty, of course, so my inclination would be to : use a special non-NULL value --- an empty string, for example --- Doesn't work for datetime, which is an important application in my case which rather needs null to indicate "no datestamp at all". : > (2) NOT IN doesn't seem to work at all. I always get 0 results--and very : > rapidly at that!--regardless of the situation. : : I don't think it's quite *that* broken. How about a concrete : example of what you're trying to do? Well, after reading your statement about "Nearly all Postgres ...", here's a very simple example that I was able to create based on that assumption: => create temp table foo (name varchar(10)); => create temp table foo2 (name varchar(10)); => insert into foo values (null); // <<- here's the tripwire! => insert into foo values ('a'); => insert into foo2 values ('a'); => insert into foo2 values ('b'); => select * from foo2 where field not in (select field from foo); field ----- (0 rows) Now *that* is awfully disturbing. :> -- -- Todd Vierling (tv@pobox.com)
On Tue, 26 Oct 1999, Todd Vierling wrote: : : NULL = NULL does not yield TRUE, it yields NULL. For that matter, : : NULL != NULL does not yield FALSE --- it yields NULL. This is a : : basic consequence of the semantics of NULL. : It seems _extremely_ counter-intuitive, especially in cases where both : fields are in fact the same type. Although I did find a SQL92 document on the web in the amount of time this took to copy back to me, and I see the clause about NULL <comp op> <anything> being unknown. Which, I imagine, means "implementation dependent". -- -- Todd Vierling (tv@pobox.com)
Todd Vierling <tv@pobox.com> writes: > : NULL = NULL does not yield TRUE, it yields NULL. For that matter, > : NULL != NULL does not yield FALSE --- it yields NULL. This is a > : basic consequence of the semantics of NULL. > !? > I have been using such constructs on commercial databases for ages. Do you > have a link to a web-based SQL standard transcription that I could look this > up? The SQL92 standard expresses this notion in a very wordy, laborious fashion: every single place that they define the result of an expression, they start out by saying "if the input(s) are null the result is null, otherwise it's ...". Two examples: 1) If the value of any <numeric primary> simply contained in a <numeric value expression> is the null value, then the result of the <numeric value expression> is the null value. .... etc etc .... 2) If <concatenation> is specified, then let S1 and S2 be the re- sult of the <character value expression> and <character factor>, respectively. Case: a) If either S1 or S2 is the null value, then the result of the <concatenation> is the null value. .... etc etc .... And the particular case at hand is defined in 8.1 <predicate> and 8.2 <comparison predicate>, which say 8.1 <predicate> Function Specify a condition that can be evaluated to give a truth value of true, false, or unknown. ... 1) Let X and Y be any two corresponding <row value constructor element>s. Let XV and YV be the values represented by X and Y, respectively. Case: a) If XV or YV is the null value, then "X <comp op> Y" is un- known. Finally, WHERE is defined as selecting those rows which yield a true predicate result. If your other databases don't get this right, then they're broken. > Doesn't work for datetime, which is an important application in my case > which rather needs null to indicate "no datestamp at all". IIRC datetime has several special values such as "infinity"; you could use one of those, perhaps. But NULL doesn't act the way you are looking for. > : > (2) NOT IN doesn't seem to work at all. I always get 0 results--and very > : > rapidly at that!--regardless of the situation. > : > : I don't think it's quite *that* broken. How about a concrete > : example of what you're trying to do? > Well, after reading your statement about "Nearly all Postgres ...", here's a > very simple example that I was able to create based on that assumption: > => create temp table foo (name varchar(10)); > => create temp table foo2 (name varchar(10)); > => insert into foo values (null); // <<- here's the tripwire! > => insert into foo values ('a'); > => insert into foo2 values ('a'); > => insert into foo2 values ('b'); > => select * from foo2 where field not in (select field from foo); > field > ----- > (0 rows) > Now *that* is awfully disturbing. :> Well, it falls out of the semantics: the NOT IN is true if foo2's field is not equal to *all* of the entries returned by the subselect. If one of those is NULL, then the result of the NOT IN can't be "true", it has to be "unknown", because you don't know whether the foo2 value ought to be considered equal to the null or not. The NOT IN can return a definite "false" when it finds a match to one of the non-null subselect values, but never a definite "true". Of course when you are using it as a WHERE condition you won't see the difference between "false" and "unknown". I do see a related bug here, though: I'd expect select * from foo2 where (name not in (select name from foo)) is null; to produce hits, and it doesn't. I suspect the subselect evaluator is being a little careless about nulls ... will look into it. regards, tom lane
On Tue, 26 Oct 1999, Tom Lane wrote: : > => select * from foo2 where field not in (select field from foo); : Well, it falls out of the semantics: the NOT IN is true if foo2's : field is not equal to *all* of the entries returned by the subselect. : If one of those is NULL, then the result of the NOT IN can't be "true", : it has to be "unknown", In this case, I suppose I can produce another workaround: select * from foo where field not in (select field from foo2 where field notnull); which is more wordy, but seems to work as I want. Thanks for the help! -- -- Todd Vierling (tv@pobox.com)
Todd Vierling <tv@pobox.com> writes: > Although I did find a SQL92 document on the web in the amount of time this > took to copy back to me, and I see the clause about NULL <comp op> > <anything> being unknown. Which, I imagine, means "implementation > dependent". No. It means unknown, ie, NULL. SQL's predicates are three-valued. regards, tom lane
> On Tue, 26 Oct 1999, Tom Lane wrote: > > : > Both "fieldname" definitions are identical (verified with char(2) and > : > varchar(100) in particular), and both tables contain a row with a "null" in > : > that field. However, the results don't contain the row with the "null" > : > value. > : > : NULL = NULL does not yield TRUE, it yields NULL. For that matter, > : NULL != NULL does not yield FALSE --- it yields NULL. This is a > : basic consequence of the semantics of NULL. > > !? > > I have been using such constructs on commercial databases for ages. Do you > have a link to a web-based SQL standard transcription that I could look this > up? (I'll check up on exactly which database(s) I can use this type of > construct when I get back to work tomorrow....) > > It seems _extremely_ counter-intuitive, especially in cases where both > fields are in fact the same type. But NULL is unknown. How do you know they are equal if both values are unknown? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I have just added a paragraph about this comparison in my book. You can see it on the documentation web page under "Published book". > On Tue, 26 Oct 1999, Tom Lane wrote: > > : > Both "fieldname" definitions are identical (verified with char(2) and > : > varchar(100) in particular), and both tables contain a row with a "null" in > : > that field. However, the results don't contain the row with the "null" > : > value. > : > : NULL = NULL does not yield TRUE, it yields NULL. For that matter, > : NULL != NULL does not yield FALSE --- it yields NULL. This is a > : basic consequence of the semantics of NULL. > > !? > > I have been using such constructs on commercial databases for ages. Do you > have a link to a web-based SQL standard transcription that I could look this > up? (I'll check up on exactly which database(s) I can use this type of > construct when I get back to work tomorrow....) > > It seems _extremely_ counter-intuitive, especially in cases where both > fields are in fact the same type. > > : Nearly all Postgres operators yield NULL if any input is NULL. > > Interesting ... so see my clarification of (2) below. > > : If you really want to match up nulls in your example, you can do > : something like > : WHERE (a.fieldname = b.fieldname) OR > : (a.fieldname IS NULL AND b.fieldname IS NULL) > > Which I already described in my text, sigh. > > : This is pretty grotty, of course, so my inclination would be to > : use a special non-NULL value --- an empty string, for example --- > > Doesn't work for datetime, which is an important application in my case > which rather needs null to indicate "no datestamp at all". > > : > (2) NOT IN doesn't seem to work at all. I always get 0 results--and very > : > rapidly at that!--regardless of the situation. > : > : I don't think it's quite *that* broken. How about a concrete > : example of what you're trying to do? > > Well, after reading your statement about "Nearly all Postgres ...", here's a > very simple example that I was able to create based on that assumption: > > => create temp table foo (name varchar(10)); > => create temp table foo2 (name varchar(10)); > => insert into foo values (null); // <<- here's the tripwire! > => insert into foo values ('a'); > => insert into foo2 values ('a'); > => insert into foo2 values ('b'); > => select * from foo2 where field not in (select field from foo); > > field > ----- > (0 rows) > > Now *that* is awfully disturbing. :> > > -- > -- Todd Vierling (tv@pobox.com) > > > ************ > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026