Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY - Mailing list pgsql-sql

From Thomas Lockhart
Subject Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY
Date
Msg-id 3742D5E4.D220A67@alumni.caltech.edu
Whole thread Raw
In response to RE: [SQL] Oddities with NULL and GROUP BY  ("Jackson, DeJuan" <djackson@cpsgroup.com>)
Responses Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY
List pgsql-sql
> > The Pratical SQL Handbook at page 171 says:
> > Since nulls represent "the great unknown", there is no way to know
> > whether one null is equal to any other null. Each unknown value
> > may or may not be different from another.

Although I've noticed some questionable statements quoted from this
book, this looks good...

> > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> This is something I have complained about time and again. It is time
> something is changed about it, otherwise Postgres will NEVER be a
> standard-compliant RDBMS.

Postgres conforms to SQL92 in this regard. Date and Darwen, "A Guide
to the SQL Standard", 3rd ed., are explicit about this near the top of
page 249:

Duplicates are relevant to the ... GROUP BY ... operations ...
... GROUP BY groups rows together on the basis of duplicate values in
the set of grouping columns (and those sets of grouping column values
can be regarded as "rows" for present purposes). The point is,
however, the definition of duplicate rows requires some refinement in
the presence of nulls. Let "left" and "right" be as defined
(previously). Then "left" and "right" are defined to be "duplicates"
of one another if and only if, for all "i" in the range 1 to "n",
either "left_i" = "right_i" is TRUE, or "left_i" and "right_i" are
both null.

There is a single exception to Postgres' SQL92 conformance wrt NULLs
afaik, involving DISTINCT column constraints which I discuss below.

> > However, if the grouping column contains more than one null,
> > all of them are put into a single group.
> > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> The SQL92 text says:
>      A null value is an implementation-dependent special value that
>      is distinct from all non-null values of the associated data type.
>      There is effectively only one null value and that value is a member
>      of every SQL data type. There is no <literal> for a null value,
>      although the keyword NULL is used in some places to indicate that a
>      null value is desired.
> Thus, by rights, NULL=NULL should be true, because there is only one null
> value.

No! An explicit "unknown" = "unknown" in a constraint clause should
always evaluate to FALSE (we'll get to GROUP BY later). SQL92 and all
of my reference books are clear about this. Date and Darwen have a
good discussion of the shortcomings of NULL in SQL92, pointing out
that with NULL handling one would really like a distinct UNKNOWN added
to the possible boolean values TRUE and FALSE so that SQL would have
true three-value logic.

> About the <group by clause>, the text says:
>     1) The result of the <group by clause> is a partitioning of T into
>        a set of groups. The set is the minimum number of groups such
>        that, for each grouping column of each group of more than one
>        row, no two values of that grouping column are distinct.

Interesting. Note that SQL92 asks that any column with the DISTINCT
constraint contain *only one* NULL value in the entire column. Date
and Darwen point out that this is inconsistant with the fundamental
notion of "unknown" and renders DISTINCT constraints without NOT NULL
to be effectively useless. They recommend against having any DISTINCT
column without having an additional NOT NULL constraint. We've had
this discussion wrt Postgres, and concluded that we would diverge from
the standard by allowing multiple NULL fields in DISTINCT columns, to
make DISTINCT a useful feature with NULLs. It probably didn't hurt
that Postgres already behaved this way :)

afaik this last point is the *only* place where Postgres intentionally
diverges from SQL92, and it was done (or rather retained from existing
behavior) to make a useless feature useful.

> One should note, however, that when the actual comparison operator "=" is
> used, the standard says that if one of the operands is null, the result of
> the comparison is unknown. One should make a distinction between making
> comparisons within group by, uniqueness, and other database-logic
> operations, and between making the actual comparison (though in my opinion,
> this should not be so. Comparing a null value to something should be always
> false unless the other something is also null. But that's my opinion and
> not the standard's).

One can't take a portion of SQL92 statements wrt NULLs and apply it to
all uses of NULL, because SQL92 is not internally consistant in this
regard.

In most GROUP BY situations, a corresponding WHERE col IS NOT NULL is
probably a good idea.

Regards.
                        - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY
Next
From: secret
Date:
Subject: Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY