Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards - Mailing list pgsql-hackers

From Tom Ivar Helbekkmo
Subject Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards
Date
Msg-id 86ae3kr39f.fsf@athene.i.eunet.no
Whole thread Raw
In response to AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards  (Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>)
Responses Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: AW: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Stand ards  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:

> Actually I am not sure whether the column = NULL syntax is even defined 
> or allowed in SQL92 (e.g. Informix interprets the NULL as column name in 
> this context and errs out).

I don't have the standard handy, but I do have Joe Celko's book, "Data
& Databases: Concepts in Practice".  He says (in section 8.2, under
the heading "Multivalued Logic"):
       A NULL cannot be compared to another NULL or to a value       with what Dr. Codd called a theta operator and
what      programmers call a comparison operator (equal, not equal,       less than, greater than, and so forth).  This
resultsin       a three-valued logic, which has an UNKNOWN in addition       to TRUE and FALSE.  [...]  UNKNOWN is a
logicalvalue and       not the same as a NULL, which is a data value.  That is       why you have to say X IS [NOT]
NULLin SQL and not use       X = NULL instead.  Theta operators are expressions of the       form X <comp op> Y; when X
orY or both are NULL, theta       operators will return an UNKNOWN and not a NULL.
 

He goes on to explain three-valued logic in more detail, showing truth
tables according to Jan Lukasiewicz (the inventor of RPN), and says,
of SQL-92, that it "is comforting to see that [it has] the same truth
tables as the three-valued system of Lukasiewicz".  Further, he says:
       SQL-92 added a new predicate of the form
       <search condition> IS [NOT] TRUE | FALSE | UNKNOWN
       which will let you map any combination of three-valued       logic to the two Boolean values.

A quick test run with psql shows that PostgreSQL does not properly
implement three-valued logic: it does not recognize the UNKNOWN
keyword alongside TRUE and FALSE, in any situation.  It will also
return boolean truth values for comparisons with NULL values, using
them as "real" data values in the comparison.  Worse (IMHO), this is
not consistent: while a test for "column = NULL" will return rows
where that is true, and a test for "not column = NULL" will return the
rest, "column <> NULL" returns no rows!  This means that the theta
operators are not all treated the same way, which is surely wrong!

It seems to me that the idea of NULL as an unkown data value and
UNKNOWN as the corresponding truth value, combined with the rules for
propagation of NULL in mathematical operations, of UNKNOWN in truth
operations, and from NULL to UNKNOWN by theta operators, is a very
clean, intuitive way of handling these issues.  It feels right!  :-)

-tih
-- 
The basic difference is this: hackers build things, crackers break them.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [SQL] behavior of ' = NULL' vs. MySQL vs. Standards
Next
From: Tom Ivar Helbekkmo
Date:
Subject: Re: 7.2 items