Thread: Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

From
"Kevin Grittner"
Date:
"Pilling, Michael"  wrote:

> The real problem here then is that the documentation showing
> the boolean comparison operators does not mention this

This page does, at length:

http://www.postgresql.org/docs/9.0/interactive/functions-comparison.html

What page are you looking at?

> And indeed the parser does not generate warnings either.

It would be totally wrong for the parser to generate warnings about
correct behavior.

-Kevin


P.S. A copy/paste of part of the above-mentioned page:


To check whether a value is or is not null, use the constructs:

expression IS NULL
expression IS NOT NULL

or the equivalent, but nonstandard, constructs:

expression ISNULL
expression NOTNULL

Do not write expression = NULL because NULL is not "equal to" NULL.
(The null value represents an unknown value, and it is not known
whether two unknown values are equal.) This behavior conforms to the
SQL standard.

    Tip: Some applications might expect that expression = NULL
returns true if expression evaluates to the null value. It is highly
recommended that these applications be modified to comply with the
SQL standard. However, if that cannot be done the
transform_null_equals configuration variable is available. If it is
enabled, PostgreSQL will convert x = NULL clauses to x IS NULL.

    Note: If the expression is row-valued, then IS NULL is true when
the row expression itself is null or when all the row's fields are
null, while IS NOT NULL is true when the row expression itself is
non-null and all the row's fields are non-null. Because of this
behavior, IS NULL and IS NOT NULL do not always return inverse
results for row-valued expressions, i.e., a row-valued expression
that contains both NULL and non-null values will return false for
both tests. This definition conforms to the SQL standard, and is a
change from the inconsistent behavior exhibited by PostgreSQL
versions prior to 8.2.

Ordinary comparison operators yield null (signifying "unknown"), not
true or false, when either input is null. For example, 7 = NULL
yields null. When this behavior is not suitable, use the IS [ NOT ]
DISTINCT FROM constructs:

expression IS DISTINCT FROM expression
expression IS NOT DISTINCT FROM expression

For non-null inputs, IS DISTINCT FROM is the same as the <> operator.
However, if both inputs are null it returns false, and if only one
input is null it returns true. Similarly, IS NOT DISTINCT FROM is
identical to = for non-null inputs, but it returns true when both
inputs are null, and false when only one input is null. Thus, these
constructs effectively act as though null were a normal data value,
rather than "unknown".

Boolean values can also be tested using the constructs

expression IS TRUE
expression IS NOT TRUE
expression IS FALSE
expression IS NOT FALSE
expression IS UNKNOWN
expression IS NOT UNKNOWN

These will always return true or false, never a null value, even when
the operand is null. A null input is treated as the logical value
"unknown". Notice that IS UNKNOWN and IS NOT UNKNOWN are effectively
the same as IS NULL and IS NOT NULL, respectively, except that the
input expression must be of Boolean type.

Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

From
"Pilling, Michael"
Date:
=20
>> The real problem here then is that the documentation showing
>> the boolean comparison operators does not mention this
=20
> This page does, at length:

But not in the table, in fact it doesn't even mention the IS NULL, IS NOT N=
ULL operators at all.
=20
>http://www.postgresql.org/docs/9.0/interactive/functions-comparison.html
=20
>What page are you looking at?
I was looking at that page but didn't read the detail because I thought the=
 information was in the=20
table and the detail was just textual examples. In so much of the documenta=
tion, the rest of the page
just writes out in long hand what can be inferred from the syntax. People d=
o not read online like a book,
they scan. If the information is not upfront it just won't be seen.
=20
>> And indeed the parser does not generate warnings either.
=20
>It would be totally wrong for the parser to generate warnings about
>correct behavior.

The behaviour of the generated code may well be correct and indeed I agree =
that it is but from=20
everything you and the detailed documentation have said column !=3D NULL is=
 at least deprecated=20
and is highly likely to indicate a programming error. It is totally normal =
for a parser to warn=20
against archaic or dangerous constructs. The idea being to avoid subtle run=
time bugs that are
hard to track down. This happens in Ada, Java and even C and many other lan=
guages.

Regards,
Michael
=20

IMPORTANT: This email remains the property of the Department of Defence and=
 is subject to the jurisdiction of section 70 of the Crimes Act 1914. If yo=
u have received this email in error, you are requested to contact the sende=
r and delete the email.

Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

From
Craig Ringer
Date:
On 20/06/2011 8:30 AM, Pilling, Michael wrote:

> I was looking at that page but didn't read the detail because I thought
> the information was in the table and the detail was just textual examples.

I do think that adding "IS DISTINCT FROM" and "IS NULL / IS NOT NULL" to
the summary table would be helpful, with a superscript * linking to  the
note about null handling below.

In general, though, there's only so much information that can be
condensed into the brief summary and tables.

Reading the documentation in detail is a really, really good idea. It'll
tell you a lot about Pg and about SQL in general. I've generally found
it really helpful and not at all long-winded. It's pretty much how I
learned to use relational databases.

In this case the documentation even has a big box-out saying:

"Tip: Some applications might expect that expression = NULL returns true
if expression evaluates to the null value."

... that carries on to explain about NULLs.


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

From
Jeff Davis
Date:
On Mon, 2011-06-20 at 10:00 +0930, Pilling, Michael wrote:
> The behaviour of the generated code may well be correct and indeed I
> agree that it is but from
> everything you and the detailed documentation have said column != NULL
> is at least deprecated
> and is highly likely to indicate a programming error.

The right side of the expression may be an expression as well; e.g.:
 a != b  (or a <> b)

The DBMS would not know that one side is NULL until runtime.

> It is totally normal for a parser to warn
> against archaic or dangerous constructs.

It's only an obvious mistake in the trivial case you show where one side
is a constant NULL (therefore making the entire expression a constant
NULL). The more general form "a != b" is quite common, even if somewhat
dangerous in the presence of NULL.

> The idea being to avoid subtle runtime bugs that are
> hard to track down. This happens in Ada, Java and even C and many
> other languages.

NULL is one place in SQL that hides possible mistakes that could
otherwise be caught by the compiler, leaving your queries in danger of
subtle runtime bugs. There's not a good way to conform to the SQL spec
and catch the kind of subtle NULL problems to which you're referring.

It may be possible to make a static analysis "safety check" tool to warn
users about dangerous constructs like that, but it would be a fairly
major effort (and would probably just end up telling you to put COALESCE
everywhere). Trying to only catch the kind of trivial mistakes involving
constants and known operators is counterproductive, in my opinion.

The bottom line is that NULLs are a little on the dangerous side. If you
think your example is bad, consider the semantics of NOT IN with respect
to NULL -- that's a trap even for experts. If you want to be safe, make
liberal use of COALESCE and WHERE x IS NOT NULL on any expression that
you think might ever evaluate to NULL.

Also note that NULLs can be created by outer joins and aggregates even
if your source data has no NULLs at all.

PostgreSQL is a SQL DBMS, and in SQL, NULL affects everything. I'm sure
there are places in the documentation that could be improved, but
warnings on every page would be counterproductive.

Regards,
    Jeff Davis

Re: BUG #6064: != NULL, <> NULL do not work [sec=UNCLASSIFIED]

From
Craig Ringer
Date:
On 20/06/11 14:19, Jeff Davis wrote:
> On Mon, 2011-06-20 at 10:00 +0930, Pilling, Michael wrote:
>> The behaviour of the generated code may well be correct and indeed I
>> agree that it is but from
>> everything you and the detailed documentation have said column != NULL
>> is at least deprecated
>> and is highly likely to indicate a programming error.
>
> The right side of the expression may be an expression as well; e.g.:
>  a != b  (or a <> b)
>
> The DBMS would not know that one side is NULL until runtime.

If he's talking specifically about the case of a NULL literal, a
parser-level warning could be emitted because the parser *does* know it
was a literal NULL.

I'm not convinced it's a good idea to warn about this case myself, but
for a NULL literal it's at least vaguely practical.

> It may be possible to make a static analysis "safety check" tool to warn
> users about dangerous constructs like that, but it would be a fairly
> major effort (and would probably just end up telling you to put COALESCE
> everywhere).

To be even remotely useful, it'd have to be able to prove that certain
variables cannot be NULL in certain places. For example in this trivial
made-up case:

SELECT a FROM tablename
WHERE a IS NOT NULL AND b IS NOT NULL
GROUP BY a, b HAVING a > b;

... looks like a dangerous test ("a > b" without excluding/handling
NULL) but in fact the WHERE clause already excluded potentially
problematic tuples so it can never match a NULL result. Warning on that
test would be incorrect, and adding a NULL check / CASE / COALESCE to it
would make the query marginally slower and significantly less readable.

In some places it is not possible to handle NULL inputs explicitly
without multiply evaluating an expensive function or subquery. That not
only has performance implications but may be a real issue if the
expression uses volatile functions or functions with side effects.

Even if that wasn't complicated enough, to be useful in the real world
the tool would probably have to be able to work on SQL embedded in
source code, including C and Java, probably including code that
assembles SQL from fragments.

All in all, it strikes me as a vast amount of work that's only
questionably even possible for little to no gain.

> The bottom line is that NULLs are a little on the dangerous side. If you
> think your example is bad, consider the semantics of NOT IN with respect
> to NULL -- that's a trap even for experts. If you want to be safe, make
> liberal use of COALESCE and WHERE x IS NOT NULL on any expression that
> you think might ever evaluate to NULL.

Agreed. I don't think anyone is a total fan of NULL and SQL's
three-valued logic, or would argue that it's properly consistent and ...
logical. Unfortunately, we're kind of stuck with it because of the SQL
standards process.

Other solutions to the no-value problem are pretty darn ugly in
different ways, anyway. In particular, throwing an error when any
operation is performed on an undef/unknown value is one possibility that
gets old REALLY fast.

--
Craig Ringer