Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY - Mailing list pgsql-hackers
From | secret |
---|---|
Subject | Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY |
Date | |
Msg-id | 3742D8AA.3161112D@kearneydev.com Whole thread Raw |
In response to | Re: [SQL] Oddities with NULL and GROUP BY (Herouth Maoz <herouth@oumail.openu.ac.il>) |
Responses |
Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY
|
List | pgsql-hackers |
Thomas Lockhart wrote: > > > The Pratical SQL Handbook at page 171 says: > > > Since nulls represent "the great unknown", there is no way to know > > > whether one null is equal to any other null. Each unknown value > > > may or may not be different from another. > > Although I've noticed some questionable statements quoted from this > book, this looks good... > > > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL. > > This is something I have complained about time and again. It is time > > something is changed about it, otherwise Postgres will NEVER be a > > standard-compliant RDBMS. > > Postgres conforms to SQL92 in this regard. Date and Darwen, "A Guide > to the SQL Standard", 3rd ed., are explicit about this near the top of > page 249: > > Duplicates are relevant to the ... GROUP BY ... operations ... > ... GROUP BY groups rows together on the basis of duplicate values in > the set of grouping columns (and those sets of grouping column values > can be regarded as "rows" for present purposes). The point is, > however, the definition of duplicate rows requires some refinement in > the presence of nulls. Let "left" and "right" be as defined > (previously). Then "left" and "right" are defined to be "duplicates" > of one another if and only if, for all "i" in the range 1 to "n", > either "left_i" = "right_i" is TRUE, or "left_i" and "right_i" are > both null. > > There is a single exception to Postgres' SQL92 conformance wrt NULLs > afaik, involving DISTINCT column constraints which I discuss below. > > > > However, if the grouping column contains more than one null, > > > all of them are put into a single group. > > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL. > > The SQL92 text says: > > A null value is an implementation-dependent special value that > > is distinct from all non-null values of the associated data type. > > There is effectively only one null value and that value is a member > > of every SQL data type. There is no <literal> for a null value, > > although the keyword NULL is used in some places to indicate that a > > null value is desired. > > Thus, by rights, NULL=NULL should be true, because there is only one null > > value. > > No! An explicit "unknown" = "unknown" in a constraint clause should > always evaluate to FALSE (we'll get to GROUP BY later). SQL92 and all > of my reference books are clear about this. Date and Darwen have a > good discussion of the shortcomings of NULL in SQL92, pointing out > that with NULL handling one would really like a distinct UNKNOWN added > to the possible boolean values TRUE and FALSE so that SQL would have > true three-value logic. > > > About the <group by clause>, the text says: > > 1) The result of the <group by clause> is a partitioning of T into > > a set of groups. The set is the minimum number of groups such > > that, for each grouping column of each group of more than one > > row, no two values of that grouping column are distinct. > > Interesting. Note that SQL92 asks that any column with the DISTINCT > constraint contain *only one* NULL value in the entire column. Date > and Darwen point out that this is inconsistant with the fundamental > notion of "unknown" and renders DISTINCT constraints without NOT NULL > to be effectively useless. They recommend against having any DISTINCT > column without having an additional NOT NULL constraint. We've had > this discussion wrt Postgres, and concluded that we would diverge from > the standard by allowing multiple NULL fields in DISTINCT columns, to > make DISTINCT a useful feature with NULLs. It probably didn't hurt > that Postgres already behaved this way :) > > afaik this last point is the *only* place where Postgres intentionally > diverges from SQL92, and it was done (or rather retained from existing > behavior) to make a useless feature useful. > > > One should note, however, that when the actual comparison operator "=" is > > used, the standard says that if one of the operands is null, the result of > > the comparison is unknown. One should make a distinction between making > > comparisons within group by, uniqueness, and other database-logic > > operations, and between making the actual comparison (though in my opinion, > > this should not be so. Comparing a null value to something should be always > > false unless the other something is also null. But that's my opinion and > > not the standard's). > > One can't take a portion of SQL92 statements wrt NULLs and apply it to > all uses of NULL, because SQL92 is not internally consistant in this > regard. > > In most GROUP BY situations, a corresponding WHERE col IS NOT NULL is > probably a good idea. > > Regards. > > - Thomas > > -- > Thomas Lockhart lockhart@alumni.caltech.edu > South Pasadena, California Sigh. PostgreSQL seems pretty inconsitant in this... GROUP BY with 1 column produces NULLs grouped, with 2 colums it usually seems not to(although I somehow came up with an example where it did, grr... but lets ignore this since it's supposed to "not work" that way.)... Oracle8, DB/2, and Sybase all group NULLs together, for compatibility sake wouldn't it be reasonable for PostgreSQL to do the same? Else porting applications could fail miserably when one hits this inconsistency. --David
pgsql-hackers by date: