Thread: Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]
"Pilling, Michael" wrote: > The real problem here then is that the documentation showing > the boolean comparison operators does not mention this This page does, at length: http://www.postgresql.org/docs/9.0/interactive/functions-comparison.html What page are you looking at? > And indeed the parser does not generate warnings either. It would be totally wrong for the parser to generate warnings about correct behavior. -Kevin P.S. A copy/paste of part of the above-mentioned page: To check whether a value is or is not null, use the constructs: expression IS NULL expression IS NOT NULL or the equivalent, but nonstandard, constructs: expression ISNULL expression NOTNULL Do not write expression = NULL because NULL is not "equal to" NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.) This behavior conforms to the SQL standard. Tip: Some applications might expect that expression = NULL returns true if expression evaluates to the null value. It is highly recommended that these applications be modified to comply with the SQL standard. However, if that cannot be done the transform_null_equals configuration variable is available. If it is enabled, PostgreSQL will convert x = NULL clauses to x IS NULL. Note: If the expression is row-valued, then IS NULL is true when the row expression itself is null or when all the row's fields are null, while IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return inverse results for row-valued expressions, i.e., a row-valued expression that contains both NULL and non-null values will return false for both tests. This definition conforms to the SQL standard, and is a change from the inconsistent behavior exhibited by PostgreSQL versions prior to 8.2. Ordinary comparison operators yield null (signifying "unknown"), not true or false, when either input is null. For example, 7 = NULL yields null. When this behavior is not suitable, use the IS [ NOT ] DISTINCT FROM constructs: expression IS DISTINCT FROM expression expression IS NOT DISTINCT FROM expression For non-null inputs, IS DISTINCT FROM is the same as the <> operator. However, if both inputs are null it returns false, and if only one input is null it returns true. Similarly, IS NOT DISTINCT FROM is identical to = for non-null inputs, but it returns true when both inputs are null, and false when only one input is null. Thus, these constructs effectively act as though null were a normal data value, rather than "unknown". Boolean values can also be tested using the constructs expression IS TRUE expression IS NOT TRUE expression IS FALSE expression IS NOT FALSE expression IS UNKNOWN expression IS NOT UNKNOWN These will always return true or false, never a null value, even when the operand is null. A null input is treated as the logical value "unknown". Notice that IS UNKNOWN and IS NOT UNKNOWN are effectively the same as IS NULL and IS NOT NULL, respectively, except that the input expression must be of Boolean type.
=20 >> The real problem here then is that the documentation showing >> the boolean comparison operators does not mention this =20 > This page does, at length: But not in the table, in fact it doesn't even mention the IS NULL, IS NOT N= ULL operators at all. =20 >http://www.postgresql.org/docs/9.0/interactive/functions-comparison.html =20 >What page are you looking at? I was looking at that page but didn't read the detail because I thought the= information was in the=20 table and the detail was just textual examples. In so much of the documenta= tion, the rest of the page just writes out in long hand what can be inferred from the syntax. People d= o not read online like a book, they scan. If the information is not upfront it just won't be seen. =20 >> And indeed the parser does not generate warnings either. =20 >It would be totally wrong for the parser to generate warnings about >correct behavior. The behaviour of the generated code may well be correct and indeed I agree = that it is but from=20 everything you and the detailed documentation have said column !=3D NULL is= at least deprecated=20 and is highly likely to indicate a programming error. It is totally normal = for a parser to warn=20 against archaic or dangerous constructs. The idea being to avoid subtle run= time bugs that are hard to track down. This happens in Ada, Java and even C and many other lan= guages. Regards, Michael =20 IMPORTANT: This email remains the property of the Department of Defence and= is subject to the jurisdiction of section 70 of the Crimes Act 1914. If yo= u have received this email in error, you are requested to contact the sende= r and delete the email.
On 20/06/2011 8:30 AM, Pilling, Michael wrote: > I was looking at that page but didn't read the detail because I thought > the information was in the table and the detail was just textual examples. I do think that adding "IS DISTINCT FROM" and "IS NULL / IS NOT NULL" to the summary table would be helpful, with a superscript * linking to the note about null handling below. In general, though, there's only so much information that can be condensed into the brief summary and tables. Reading the documentation in detail is a really, really good idea. It'll tell you a lot about Pg and about SQL in general. I've generally found it really helpful and not at all long-winded. It's pretty much how I learned to use relational databases. In this case the documentation even has a big box-out saying: "Tip: Some applications might expect that expression = NULL returns true if expression evaluates to the null value." ... that carries on to explain about NULLs. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
On Mon, 2011-06-20 at 10:00 +0930, Pilling, Michael wrote: > The behaviour of the generated code may well be correct and indeed I > agree that it is but from > everything you and the detailed documentation have said column != NULL > is at least deprecated > and is highly likely to indicate a programming error. The right side of the expression may be an expression as well; e.g.: a != b (or a <> b) The DBMS would not know that one side is NULL until runtime. > It is totally normal for a parser to warn > against archaic or dangerous constructs. It's only an obvious mistake in the trivial case you show where one side is a constant NULL (therefore making the entire expression a constant NULL). The more general form "a != b" is quite common, even if somewhat dangerous in the presence of NULL. > The idea being to avoid subtle runtime bugs that are > hard to track down. This happens in Ada, Java and even C and many > other languages. NULL is one place in SQL that hides possible mistakes that could otherwise be caught by the compiler, leaving your queries in danger of subtle runtime bugs. There's not a good way to conform to the SQL spec and catch the kind of subtle NULL problems to which you're referring. It may be possible to make a static analysis "safety check" tool to warn users about dangerous constructs like that, but it would be a fairly major effort (and would probably just end up telling you to put COALESCE everywhere). Trying to only catch the kind of trivial mistakes involving constants and known operators is counterproductive, in my opinion. The bottom line is that NULLs are a little on the dangerous side. If you think your example is bad, consider the semantics of NOT IN with respect to NULL -- that's a trap even for experts. If you want to be safe, make liberal use of COALESCE and WHERE x IS NOT NULL on any expression that you think might ever evaluate to NULL. Also note that NULLs can be created by outer joins and aggregates even if your source data has no NULLs at all. PostgreSQL is a SQL DBMS, and in SQL, NULL affects everything. I'm sure there are places in the documentation that could be improved, but warnings on every page would be counterproductive. Regards, Jeff Davis
On 20/06/11 14:19, Jeff Davis wrote: > On Mon, 2011-06-20 at 10:00 +0930, Pilling, Michael wrote: >> The behaviour of the generated code may well be correct and indeed I >> agree that it is but from >> everything you and the detailed documentation have said column != NULL >> is at least deprecated >> and is highly likely to indicate a programming error. > > The right side of the expression may be an expression as well; e.g.: > a != b (or a <> b) > > The DBMS would not know that one side is NULL until runtime. If he's talking specifically about the case of a NULL literal, a parser-level warning could be emitted because the parser *does* know it was a literal NULL. I'm not convinced it's a good idea to warn about this case myself, but for a NULL literal it's at least vaguely practical. > It may be possible to make a static analysis "safety check" tool to warn > users about dangerous constructs like that, but it would be a fairly > major effort (and would probably just end up telling you to put COALESCE > everywhere). To be even remotely useful, it'd have to be able to prove that certain variables cannot be NULL in certain places. For example in this trivial made-up case: SELECT a FROM tablename WHERE a IS NOT NULL AND b IS NOT NULL GROUP BY a, b HAVING a > b; ... looks like a dangerous test ("a > b" without excluding/handling NULL) but in fact the WHERE clause already excluded potentially problematic tuples so it can never match a NULL result. Warning on that test would be incorrect, and adding a NULL check / CASE / COALESCE to it would make the query marginally slower and significantly less readable. In some places it is not possible to handle NULL inputs explicitly without multiply evaluating an expensive function or subquery. That not only has performance implications but may be a real issue if the expression uses volatile functions or functions with side effects. Even if that wasn't complicated enough, to be useful in the real world the tool would probably have to be able to work on SQL embedded in source code, including C and Java, probably including code that assembles SQL from fragments. All in all, it strikes me as a vast amount of work that's only questionably even possible for little to no gain. > The bottom line is that NULLs are a little on the dangerous side. If you > think your example is bad, consider the semantics of NOT IN with respect > to NULL -- that's a trap even for experts. If you want to be safe, make > liberal use of COALESCE and WHERE x IS NOT NULL on any expression that > you think might ever evaluate to NULL. Agreed. I don't think anyone is a total fan of NULL and SQL's three-valued logic, or would argue that it's properly consistent and ... logical. Unfortunately, we're kind of stuck with it because of the SQL standards process. Other solutions to the no-value problem are pretty darn ugly in different ways, anyway. In particular, throwing an error when any operation is performed on an undef/unknown value is one possibility that gets old REALLY fast. -- Craig Ringer