Thread: Nulls, arrays, records, IS NULL, IS DISTINCT FROM

Nulls, arrays, records, IS NULL, IS DISTINCT FROM

From
Tom Lane
Date:
Following up yesterday's discussion, I've been studying the SQL spec for
<null predicate> and <distinct predicate>, and it seems a bit
inconsistent.

The rules for <distinct predicate> make it clear that you are supposed
to "drill down" into row and array values to determine distinctness.
SQL99 has
           a) If the declared type of X or Y is an array type, then "X IS             DISTINCT FROM Y" is effectively
computedas follows:
 
             i) Let NX be the number of elements in X; let NY be the number                of elements in Y.
            ii) Let EX(i) be the i-th element of X; let EY(i) be the i-th                element of Y.
           iii) Case:
                1) If NX is not equal to NY, then "X IS DISTINCT FROM Y" is                  true.
                2) If NX equals zero and NY equals zero, then "X IS                  DISTINCT FROM Y" is false.
                3) If "EX(i) IS DISTINCT FROM EY(i)" is false for all i                  between 1 (one) and NX, then
"XIS DISTINCT FROM Y" is                  false.
 
                4) Otherwise, "X IS DISTINCT FROM Y" is true.

SQL2003 has completely rewritten the text but the meaning seems the
same.  I suppose we want to generalize the NX/NY business to say
"if the array bounds are not identical then the arrays are distinct".
We are clearly getting this wrong since the introduction of nulls in
arrays, but I'll go fix that.

Similarly, given two row expressions, distinctness is determined
field-wise: X and Y are distinct if any two corresponding fields
are distinct.  We are currently getting this correct only for
the case of parse-time ROW expressions, ieROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz)
This is pretty much analogous to the case Teodor noted yesterday
for IS NULL: it's not being done in gram.y but it's still being
done much too early.  We need to be able to do it in the executor
to handle situations where a row value is coming from a function
or some other source that's not disassemblable at parse time.

What's bothering me is that for "foo IS [NOT] NULL", the spec clearly
prescribes drilling down into a rowtype value to examine the individual
fields, but I can't find any language that prescribes the same for
arrays.  Is this intentional, or an oversight?  In particular, the
spec saysROW(1,2,NULL) IS NOT NULL
is false, because the row fields must be *all* not null to make it true.
But it's very unclear whetherARRAY[1,2,NULL] IS NOT NULL
should be false on the same reasoning.  Right now, we respond "true" on
the grounds that the array object as-a-whole isn't null, without
examining its contents.

Comments?  Does anyone see any guidance in the spec?  If there is none,
which behavior do we think is most useful/consistent?
        regards, tom lane


Re: Nulls, arrays, records, IS NULL, IS DISTINCT FROM

From
David Fetter
Date:
On Fri, Sep 29, 2006 at 12:53:19PM -0400, Tom Lane wrote:
> Following up yesterday's discussion, I've been studying the SQL spec for
> <null predicate> and <distinct predicate>, and it seems a bit
> inconsistent.

Do you have the official spec in hand, or just the draft from wiscorp?

> SQL2003 has completely rewritten the text but the meaning seems the
> same.  I suppose we want to generalize the NX/NY business to say "if
> the array bounds are not identical then the arrays are distinct".

That sounds like a reasonable generalization.

> We are clearly getting this wrong since the introduction of nulls in
> arrays, but I'll go fix that.

> Similarly, given two row expressions, distinctness is determined
> field-wise: X and Y are distinct if any two corresponding fields
> are distinct.  We are currently getting this correct only for
> the case of parse-time ROW expressions, ie
>     ROW(x,y,z) IS [NOT] DISTINCT FROM ROW(xx,yy,zz)
> This is pretty much analogous to the case Teodor noted yesterday
> for IS NULL: it's not being done in gram.y but it's still being
> done much too early.  We need to be able to do it in the executor
> to handle situations where a row value is coming from a function
> or some other source that's not disassemblable at parse time.
> 
> What's bothering me is that for "foo IS [NOT] NULL", the spec clearly
> prescribes drilling down into a rowtype value to examine the individual
> fields, but I can't find any language that prescribes the same for
> arrays.  Is this intentional, or an oversight?  In particular, the
> spec says
>     ROW(1,2,NULL) IS NOT NULL
> is false, because the row fields must be *all* not null to make it true.

That's odd because as I understand the above,
   ROW(1,2,NULL) IS NULL

is also false.  Have I misunderstood?

> But it's very unclear whether
>     ARRAY[1,2,NULL] IS NOT NULL
> should be false on the same reasoning.  Right now, we respond "true" on
> the grounds that the array object as-a-whole isn't null, without
> examining its contents.
> 
> Comments?  Does anyone see any guidance in the spec?

Sadly, I don't have the official one in hand.  Is there one available?

> If there is none, which behavior do we think is most
> useful/consistent?

My reading of the (provisional) spec is that a complex type (ROW,
ARRAY, MULTISET) is NULL iff all of its elements are NULL or the whole
of it is NULL.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!


Re: Nulls, arrays, records, IS NULL, IS DISTINCT FROM

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Fri, Sep 29, 2006 at 12:53:19PM -0400, Tom Lane wrote:
>> In particular, the spec says
>>    ROW(1,2,NULL) IS NOT NULL
>> is false, because the row fields must be *all* not null to make it true.

> That's odd because as I understand the above,
>     ROW(1,2,NULL) IS NULL
> is also false.  Have I misunderstood?

Yup, they are both false.  The spec goes to some trouble to make it
clear that IS NULL/IS NOT NULL are not inverses for row values:
           NOTE 135 - For all R, "R IS NOT NULL" has the same result as           "NOT R IS NULL" if and only if R is
ofdegree 1. Table 16,           "<null predicate> semantics", specifies this behavior.
 

What I find odd is the lack of comparable language about arrays.
        regards, tom lane