Re: Null comparisons (was Re: checksum) - Mailing list pgsql-general

From Marco Colombo
Subject Re: Null comparisons (was Re: checksum)
Date
Msg-id Pine.LNX.4.61.0409281627180.12692@Megathlon.ESI
Whole thread Raw
In response to Re: Null comparisons (was Re: checksum)  (David Helgason <david@uti.is>)
Responses Re: Null comparisons (was Re: checksum)  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
On Mon, 27 Sep 2004, David Helgason wrote:

> On 27. sep 2004, at 22:08, Dean Gibson (DB Administrator) wrote:
>> Greg Stark wrote on 2004-09-27 08:17:
>>> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>>> >> On Sun, 26 Sep 2004 20:16:52 +0200, David Helgason <david@uti.is>
>>> wrote:
>>> >>> On a similar note, I've found myself wanting an extended '=' operator
>>> >>> meaning
>>> >>>         (a = b or (a is null and b is null))
>>> >
>>> > The original does appear to be equivalent to "not(a is distinct from
>>> b)",
>>> > although I'm not sure that's necessarily easier to use than the above.
>>>
>>> I often do things like "coalesce(a,0) = coalesce(b,0)".
>>> (Or whatever value you know won't appear)
>>>
>> Even simpler:  COALESCE( a = b, a IS NULL AND b IS NULL )
>
> I'm not quite sure what is being accomplished here... My original expression
> wasn't that bad, just clunky. I'd prefer a === b or (a samevalue b), but the
> above just complicates matters. Also, a 'set' command outside the expression
> goes completely against the idea, that certain fields have 'null' as a legal,
> comparable value, while others do not.
>
> Anyway, idle speculation :)
>
> d.

(a = b or (a is null and b is null))

that raises a flag for me. It seems that NULL is used as a special value,
which is not. NULL just means 'unknown', nothing more, nothing less.
That's why any boolean expression involving a NULL is NULL.

'unknown' when compared to anything else just gives 'unknown'.
It means: "I can't tell whether the two expressions are the same, because
I don't know the value of one of them."
Note that that's different from "I know they're different".
They _could_ be equal, we just don't know.

'unknown' compared to 'unknown' gives of course 'unknown', since it's
just a special case of the above. Not knowing the value of both the
expressions doesn't help much. The answer can never be 'I know they
are the same.' Not knowing one is enough to say you don't know the
result of the comparison.

I can hardly imagine why you may want to select all rows that you
are certain have equal values, plus others that have potentially
different values because they are both unknown. It smells like a
design problem.


BTW,

   coalesce(a,0) = coalesce(b,0)

is wrong, since it assumes 0 is a special value, never used in the
table. If so, it's better use it from the start instead of NULL for
those special rows. That espression is true for the following rows:
  a | b
---+---
  1 | 1
  2 | 2
    |
  0 |
    | 0

the last two rows are wrongly selected.


   coalesce(a = b, a is null and b is null)

is correct, and maybe slightly better than the original

   (a = b) or (a is null and b is null)

if the implementation is smart enough to evaluate its arguments only
when needed. The or operator needs to evaluate the right side when
the left side is either false or null, COALESCE only when it's null.
I think the docs mention that.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

pgsql-general by date:

Previous
From: Gaetano Mendola
Date:
Subject: Re: Getting an out of memory failure.... (long email)
Next
From: Matthew Metnetsky
Date:
Subject: Re: Custom Functions