Thread: RE: [SQL] Oddities with NULL and GROUP BY

RE: [SQL] Oddities with NULL and GROUP BY

From
"Jackson, DeJuan"
Date:
The behavior is valid, if you define NULL as meaning undefined.
In other words when you define something as NULL you're saying, "I don't
know what it is. It could be equal or not."-DEJ

> -----Original Message-----
> From:    secret [SMTP:secret@kearneydev.com]
> Sent:    Friday, May 14, 1999 11:58 AM
> To:    PG-SQL
> Subject:    [SQL] Oddities with NULL and GROUP BY
> 
>     Maybe there is something I don't know about how GROUP BY should
> work, but if I have a table like:
> a,b,c
> 1,1,1
> 1,1,2
> 1,1,3
> 1,2,1
> 1,3,1
> 
> And I say SELECT a,b,sum(c) FROm .. GROUP BY a,b I get
> 1,1,6
> 1,2,1
> 1,3,1
> 
> So whenever a or b changes we get a new summed row, well if I have rows
> where a or b are null, this doesn't happen, infact I seem to get all
> those rows individually... Like if:
> 1,1,1
> 1,1,3
> 1,NULL,10
> 1,NULL,20
> 1,2,3
> 
> I get:
> 1,1,4
> 1,NULL,10
> 1,NULL,20
> 1,2,3
> 
> Shouldn't I get 1,NULL,30?  Ie shouldn't NULL be treated like any other
> value?  Or is there some bit of information I'm missing?  I can set
> everything from NULL to 0 if need be, but I'd rather not...
> 
> David Secret
> MIS Director
> Kearney Development Co., Inc.
> 


Re: [SQL] Oddities with NULL and GROUP BY

From
secret
Date:
"Jackson, DeJuan" wrote:

> The behavior is valid, if you define NULL as meaning undefined.
> In other words when you define something as NULL you're saying, "I don't
> know what it is. It could be equal or not."
>         -DEJ
>
> > -----Original Message-----
> > From: secret [SMTP:secret@kearneydev.com]
> > Sent: Friday, May 14, 1999 11:58 AM
> > To:   PG-SQL
> > Subject:      [SQL] Oddities with NULL and GROUP BY
> >
> >     Maybe there is something I don't know about how GROUP BY should
> > work, but if I have a table like:
> > a,b,c
> > 1,1,1
> > 1,1,2
> > 1,1,3
> > 1,2,1
> > 1,3,1
> >
> > And I say SELECT a,b,sum(c) FROm .. GROUP BY a,b I get
> > 1,1,6
> > 1,2,1
> > 1,3,1
> >
> > So whenever a or b changes we get a new summed row, well if I have rows
> > where a or b are null, this doesn't happen, infact I seem to get all
> > those rows individually... Like if:
> > 1,1,1
> > 1,1,3
> > 1,NULL,10
> > 1,NULL,20
> > 1,2,3
> >
> > I get:
> > 1,1,4
> > 1,NULL,10
> > 1,NULL,20
> > 1,2,3
> >
> > Shouldn't I get 1,NULL,30?  Ie shouldn't NULL be treated like any other
> > value?  Or is there some bit of information I'm missing?  I can set
> > everything from NULL to 0 if need be, but I'd rather not...
> >
> > David Secret
> > MIS Director
> > Kearney Development Co., Inc.
> >
   IBM's DB/2 Disagrees, so does Oracle8!


Here is a cut & paste from Oracle SQL+:

SQL> select * from z;
       A         B
--------- ---------       1         1       1         2                 5                10

SQL> select a,sum(b) from z group by a;
       A    SUM(B)
--------- ---------       1         3                15

SQL>
   I'm going to report this as a bug now that I've verified 2 major database
vendors perform the task as I would expect them to, and PostgreSQL does it
very differently.  The question is really is NULL=NULL, which I would say it
should be.



Re: [SQL] Oddities with NULL and GROUP BY

From
secret
Date:
"Jackson, DeJuan" wrote:

> The behavior is valid, if you define NULL as meaning undefined.
> In other words when you define something as NULL you're saying, "I don't
> know what it is. It could be equal or not."
>         -DEJ
>
> > -----Original Message-----
> > From: secret [SMTP:secret@kearneydev.com]
> > Sent: Friday, May 14, 1999 11:58 AM
> > To:   PG-SQL
> > Subject:      [SQL] Oddities with NULL and GROUP BY
> >
> >     Maybe there is something I don't know about how GROUP BY should
> > work, but if I have a table like:
> > a,b,c
> > 1,1,1
> > 1,1,2
> > 1,1,3
> > 1,2,1
> > 1,3,1
> >
> > And I say SELECT a,b,sum(c) FROm .. GROUP BY a,b I get
> > 1,1,6
> > 1,2,1
> > 1,3,1
> >
> > So whenever a or b changes we get a new summed row, well if I have rows
> > where a or b are null, this doesn't happen, infact I seem to get all
> > those rows individually... Like if:
> > 1,1,1
> > 1,1,3
> > 1,NULL,10
> > 1,NULL,20
> > 1,2,3
> >
> > I get:
> > 1,1,4
> > 1,NULL,10
> > 1,NULL,20
> > 1,2,3
> >
> > Shouldn't I get 1,NULL,30?  Ie shouldn't NULL be treated like any other
> > value?  Or is there some bit of information I'm missing?  I can set
> > everything from NULL to 0 if need be, but I'd rather not...
> >
> > David Secret
> > MIS Director
> > Kearney Development Co., Inc.
> >
   Oh, I just observed this oddity... PostgreSQL groups just fine when there
is a table of 2 fields a int4, b int4...

SELECT a,sum(b) FROM z GROUP BY a         Groups NULLs fine
SELECT a,b,sum(c) FROM z GROUP BY a,b    Error in grouping NULLs in b...





Re: [SQL] Oddities with NULL and GROUP BY

From
José Soares
Date:
secret ha scritto: <blockquote type="CITE">"Jackson, DeJuan" wrote: <p>> The behavior is valid, if you define NULL
asmeaning undefined. <br />> In other words when you define something as NULL you're saying, "I don't <br />>
knowwhat it is. It could be equal or not." <br />>         -DEJ <br />> <br />> > -----Original
Message-----<br />> > From: secret [SMTP:secret@kearneydev.com] <br />> > Sent: Friday, May 14, 1999 11:58
AM<br />> > To:   PG-SQL <br />> > Subject:      [SQL] Oddities with NULL and GROUP BY <br />> > <br
/>>>     Maybe there is something I don't know about how GROUP BY should <br />> > work, but if I have a
tablelike: <br />> > a,b,c <br />> > 1,1,1 <br />> > 1,1,2 <br />> > 1,1,3 <br />> >
1,2,1<br />> > 1,3,1 <br />> > <br />> > And I say SELECT a,b,sum(c) FROm .. GROUP BY a,b I get <br
/>>> 1,1,6 <br />> > 1,2,1 <br />> > 1,3,1 <br />> > <br />> > So whenever a or b changes
weget a new summed row, well if I have rows <br />> > where a or b are null, this doesn't happen, infact I seem
toget all <br />> > those rows individually... Like if: <br />> > 1,1,1 <br />> > 1,1,3 <br />>
>1,NULL,10 <br />> > 1,NULL,20 <br />> > 1,2,3 <br />> > <br />> > I get: <br />> >
1,1,4<br />> > 1,NULL,10 <br />> > 1,NULL,20 <br />> > 1,2,3 <br />> > <br />> >
Shouldn'tI get 1,NULL,30?  Ie shouldn't NULL be treated like any other <br />> > value?  Or is there some bit of
informationI'm missing?  I can set <br />> > everything from NULL to 0 if need be, but I'd rather not... <br
/>>> <br />> > David Secret <br />> > MIS Director <br />> > Kearney Development Co., Inc. <br
/>>> <p>    IBM's DB/2 Disagrees, so does Oracle8! <p>Here is a cut & paste from Oracle SQL+: <p>SQL>
select* from z; <p>        A         B <br />--------- --------- <br />        1         1 <br />        1         2
<br/>                  5 <br />                 10 <p>SQL> select a,sum(b) from z group by a; <p>        A    SUM(B)
<br/>--------- --------- <br />        1         3 <br />                 15 <p>SQL> <p>    I'm going to report this
asa bug now that I've verified 2 major database <br />vendors perform the task as I would expect them to, and
PostgreSQLdoes it <br />very differently.  The question is really is NULL=NULL, which I would say it <br />should
be.</blockquote><p><br/>I tried it in PostgreSQL 6.5beta1 with the same result: <p><tt>select * from z;</tt><br
/><tt>a|b</tt><br /><tt>-+--</tt><br /><tt>1| 1</tt><br /><tt>1| 2</tt><br /><tt> | 5</tt><br /><tt> |10</tt><br
/><tt>(4rows)</tt><tt></tt><p><tt>select a,sum(b) from z group by a;</tt><br /><tt>a|sum</tt><br /><tt>-+---</tt><br
/><tt>1| 3</tt><br /><tt> | 15</tt><br /><tt>(2 rows)</tt><p>The Pratical SQL Handbook at page 171 says: <br />Since
nullsrepresent "the great unknown", there is no way to know <br />whether one null is equal to any other null. Each
unknownvalue <br />may or may not be different from another. <br />However, if the grouping column contains more than
onenull, <br />all of them are put into a single group. <p>Thus: NULL!=NULL but on GROUP BY it is considered as
NULL=NULL.<p>José <br />  <br />  <br />  <p>-- <br />______________________________________________________________
<br/>PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc 2.7.2.3 <br
/>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br />Jose' <br />  

Re: [SQL] Oddities with NULL and GROUP BY

From
secret
Date:
José Soares wrote:

> secret ha scritto:
>
>> "Jackson, DeJuan" wrote:
>>
>> > The behavior is valid, if you define NULL as meaning undefined.
>> > In other words when you define something as NULL you're saying, "I
>> don't
>> > know what it is. It could be equal or not."
>> >         -DEJ
>> >
>> > > -----Original Message-----
>> > > From: secret [SMTP:secret@kearneydev.com]
>> > > Sent: Friday, May 14, 1999 11:58 AM
>> > > To:   PG-SQL
>> > > Subject:      [SQL] Oddities with NULL and GROUP BY
>> > >
>> > >     Maybe there is something I don't know about how GROUP BY
>> should
>> > > work, but if I have a table like:
>> > > a,b,c
>> > > 1,1,1
>> > > 1,1,2
>> > > 1,1,3
>> > > 1,2,1
>> > > 1,3,1
>> > >
>> > > And I say SELECT a,b,sum(c) FROm .. GROUP BY a,b I get
>> > > 1,1,6
>> > > 1,2,1
>> > > 1,3,1
>> > >
>> > > So whenever a or b changes we get a new summed row, well if I
>> have rows
>> > > where a or b are null, this doesn't happen, infact I seem to get
>> all
>> > > those rows individually... Like if:
>> > > 1,1,1
>> > > 1,1,3
>> > > 1,NULL,10
>> > > 1,NULL,20
>> > > 1,2,3
>> > >
>> > > I get:
>> > > 1,1,4
>> > > 1,NULL,10
>> > > 1,NULL,20
>> > > 1,2,3
>> > >
>> > > Shouldn't I get 1,NULL,30?  Ie shouldn't NULL be treated like
>> any other
>> > > value?  Or is there some bit of information I'm missing?  I can
>> set
>> > > everything from NULL to 0 if need be, but I'd rather not...
>> > >
>> > > David Secret
>> > > MIS Director
>> > > Kearney Development Co., Inc.
>> > >
>>
>>     IBM's DB/2 Disagrees, so does Oracle8!
>>
>> Here is a cut & paste from Oracle SQL+:
>>
>> SQL> select * from z;
>>
>>         A         B
>> --------- ---------
>>         1         1
>>         1         2
>>                   5
>>                  10
>>
>> SQL> select a,sum(b) from z group by a;
>>
>>         A    SUM(B)
>> --------- ---------
>>         1         3
>>                  15
>>
>> SQL>
>>
>>     I'm going to report this as a bug now that I've verified 2 major
>> database
>> vendors perform the task as I would expect them to, and PostgreSQL
>> does it
>> very differently.  The question is really is NULL=NULL, which I
>> would say it
>> should be.
>
>
> I tried it in PostgreSQL 6.5beta1 with the same result:
>
> select * from z;
> a| b
> -+--
> 1| 1
> 1| 2
>  | 5
>  |10
> (4 rows)
>
> select a,sum(b) from z group by a;
> a|sum
> -+---
> 1|  3
>  | 15
> (2 rows)
>
> The Pratical SQL Handbook at page 171 says:
> Since nulls represent "the great unknown", there is no way to know
> whether one null is equal to any other null. Each unknown value
> may or may not be different from another.
> However, if the grouping column contains more than one null,
> all of them are put into a single group.
>
> Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
>
> José
>
>
>
>
> --
> ______________________________________________________________
> PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc 2.7.2.3
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Jose'
>
   Wonderful, that's as I expected.  However please try this in 6.5
Beta1,
CREATE TABLE z(a int4,b int4, c int4);
INSERT INTO z VALUES (1,1,1);
INSERT INTO z VALUES (1,1,2);
INSERT INTO z(a,c) VALUES (2,1);
INSERT INTO z(a,c) VALUES (2,2);

SELECT a,b,sum(c) FROM z GROUP BY a,b

GROUPing in PostgreSQL w/NULLs works just fine when there is only 1
column, however when one throws 2 in, the 2nd one having NULLs it starts
failing.  Your example demonstrates the right answer for 1 group by
column, try it with 2 and I expect 6.5beta1 will fail as 6.4.2 does.
   As to NULL=NULL or NULL!=NULL, evadentally my estimation of why the
problem is occuring was wrong. :)  But from the SQL handbook we
definately have a bug here.

David Secret
MIS Director
Kearney Development Co., Inc.



Re: [SQL] Oddities with NULL and GROUP BY

From
Herouth Maoz
Date:
At 18:28 +0300 on 17/05/1999, José Soares wrote:


> The Pratical SQL Handbook at page 171 says:
> Since nulls represent "the great unknown", there is no way to know
> whether one null is equal to any other null. Each unknown value
> may or may not be different from another.
> However, if the grouping column contains more than one null,
> all of them are put into a single group.
>
> Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.

This is something I have complained about time and again. It is time
something is changed about it, otherwise Postgres will NEVER be a
standard-compliant RDBMS.

The SQL92 text says:
    A null value is an implementation-dependent special value that    is distinct from all non-null values of the
associateddata type.    There is effectively only one null value and that value is a member    of every SQL data type.
Thereis no <literal> for a null value,    although the keyword NULL is used in some places to indicate that a    null
valueis desired.
 

Thus, by rights, NULL=NULL should be true, because there is only one null
value.

About the <group by clause>, the text says:
   1) The result of the <group by clause> is a partitioning of T into      a set of groups. The set is the minimum
numberof groups such      that, for each grouping column of each group of more than one      row, no two values of that
groupingcolumn are distinct.
 

And the treatment of nulls is implied from the definition of distinctness:
   h) distinct: Two values are said to be not distinct if either:      both are the null value, or they compare equal
accordingto      Subclause 8.2, "<comparison predicate>". Otherwise they are      distinct. Two rows (or partial rows)
aredistinct if at least      one of their pairs of respective values is distinct. Otherwise      they are not distinct.
Theresult of evaluating whether or not      two values or two rows are distinct is never unknown.
 

About uniqueness, it says:
   A unique constraint is satisfied if and only if no two rows in   a table have the same non-null values in the unique
columns.In   addition, if the unique constraint was defined with PRIMARY KEY,   then it requires that none of the
valuesin the specified column or   columns be the null value.
 

One should note, however, that when the actual comparison operator "=" is
used, the standard says that if one of the operands is null, the result of
the comparison is unknown. One should make a distinction between making
comparisons within group by, uniqueness, and other database-logic
operations, and between making the actual comparison (though in my opinion,
this should not be so. Comparing a null value to something should be always
false unless the other something is also null. But that's my opinion and
not the standard's).

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma




Re: [SQL] Oddities with NULL and GROUP BY

From
José Soares
Date:
<tt>Here the result:</tt><tt></tt><p><tt>SELECT a,b,sum(c) FROM z GROUP BY a,b;</tt><br /><tt>a|b|sum</tt><br
/><tt>-+-+---</tt><br/><tt>1|1|  3</tt><br /><tt>2| |  3</tt><br /><tt>(2 rows)</tt><br />  <p>secret ha scritto:
<blockquotetype="CITE">José Soares wrote: <p>> secret ha scritto: <br />> <br />>> "Jackson, DeJuan" wrote:
<br/>>> <br />>> > The behavior is valid, if you define NULL as meaning undefined. <br />>> >
Inother words when you define something as NULL you're saying, "I <br />>> don't <br />>> > know what it
is.It could be equal or not." <br />>> >         -DEJ <br />>> > <br />>> > >
-----OriginalMessage----- <br />>> > > From: secret [SMTP:secret@kearneydev.com] <br />>> > >
Sent:Friday, May 14, 1999 11:58 AM <br />>> > > To:   PG-SQL <br />>> > > Subject:      [SQL]
Odditieswith NULL and GROUP BY <br />>> > > <br />>> > >     Maybe there is something I don't
knowabout how GROUP BY <br />>> should <br />>> > > work, but if I have a table like: <br />>>
>> a,b,c <br />>> > > 1,1,1 <br />>> > > 1,1,2 <br />>> > > 1,1,3 <br
/>>>> > 1,2,1 <br />>> > > 1,3,1 <br />>> > > <br />>> > > And I say
SELECTa,b,sum(c) FROm .. GROUP BY a,b I get <br />>> > > 1,1,6 <br />>> > > 1,2,1 <br
/>>>> > 1,3,1 <br />>> > > <br />>> > > So whenever a or b changes we get a new
summedrow, well if I <br />>> have rows <br />>> > > where a or b are null, this doesn't happen,
infactI seem to get <br />>> all <br />>> > > those rows individually... Like if: <br />>> >
>1,1,1 <br />>> > > 1,1,3 <br />>> > > 1,NULL,10 <br />>> > > 1,NULL,20 <br
/>>>> > 1,2,3 <br />>> > > <br />>> > > I get: <br />>> > > 1,1,4 <br
/>>>> > 1,NULL,10 <br />>> > > 1,NULL,20 <br />>> > > 1,2,3 <br />>> >
><br />>> > > Shouldn't I get 1,NULL,30?  Ie shouldn't NULL be treated like <br />>> any other <br
/>>>> > value?  Or is there some bit of information I'm missing?  I can <br />>> set <br />>>
>> everything from NULL to 0 if need be, but I'd rather not... <br />>> > > <br />>> > >
DavidSecret <br />>> > > MIS Director <br />>> > > Kearney Development Co., Inc. <br />>>
>> <br />>> <br />>>     IBM's DB/2 Disagrees, so does Oracle8! <br />>> <br />>> Here is
acut & paste from Oracle SQL+: <br />>> <br />>> SQL> select * from z; <br />>> <br
/>>>        A         B <br />>> --------- --------- <br />>>         1         1 <br
/>>>        1         2 <br />>>                   5 <br />>>                  10 <br />>> <br
/>>>SQL> select a,sum(b) from z group by a; <br />>> <br />>>         A    SUM(B) <br />>>
------------------ <br />>>         1         3 <br />>>                  15 <br />>> <br />>>
SQL><br />>> <br />>>     I'm going to report this as a bug now that I've verified 2 major <br
/>>>database <br />>> vendors perform the task as I would expect them to, and PostgreSQL <br />>>
doesit <br />>> very differently.  The question is really is NULL=NULL, which I <br />>> would say it <br
/>>>should be. <br />> <br />> <br />> I tried it in PostgreSQL 6.5beta1 with the same result: <br
/>><br />> select * from z; <br />> a| b <br />> -+-- <br />> 1| 1 <br />> 1| 2 <br />>  | 5 <br
/>> |10 <br />> (4 rows) <br />> <br />> select a,sum(b) from z group by a; <br />> a|sum <br />>
-+---<br />> 1|  3 <br />>  | 15 <br />> (2 rows) <br />> <br />> The Pratical SQL Handbook at page 171
says:<br />> Since nulls represent "the great unknown", there is no way to know <br />> whether one null is equal
toany other null. Each unknown value <br />> may or may not be different from another. <br />> However, if the
groupingcolumn contains more than one null, <br />> all of them are put into a single group. <br />> <br />>
Thus:NULL!=NULL but on GROUP BY it is considered as NULL=NULL. <br />> <br />> José <br />> <br />> <br
/>><br />> <br />> -- <br />> ______________________________________________________________ <br />>
PostgreSQL6.5.0 on i586-pc-linux-gnu, compiled by gcc 2.7.2.3 <br />>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br />> Jose' <br />> <p>    Wonderful, that's as I
expected. However please try this in 6.5 <br />Beta1, <br />CREATE TABLE z(a int4,b int4, c int4); <br />INSERT INTO z
VALUES(1,1,1); <br />INSERT INTO z VALUES (1,1,2); <br />INSERT INTO z(a,c) VALUES (2,1); <br />INSERT INTO z(a,c)
VALUES(2,2); <p>SELECT a,b,sum(c) FROM z GROUP BY a,b <p>GROUPing in PostgreSQL w/NULLs works just fine when there is
only1 <br />column, however when one throws 2 in, the 2nd one having NULLs it starts <br />failing.  Your example
demonstratesthe right answer for 1 group by <br />column, try it with 2 and I expect 6.5beta1 will fail as 6.4.2 does.
<p>   As to NULL=NULL or NULL!=NULL, evadentally my estimation of why the <br />problem is occuring was wrong. :)  But
fromthe SQL handbook we <br />definately have a bug here. <p>David Secret <br />MIS Director <br />Kearney Development
Co.,Inc.</blockquote><blockquote
type="CITE">______________________________________________________________</blockquote>PostgreSQL 6.5.0 on
i586-pc-linux-gnu,compiled by gcc 2.7.2.3 <br />^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <br
/>Jose'<br />  

Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

From
Tom Lane
Date:
Herouth Maoz <herouth@oumail.openu.ac.il> writes:
> Thus, by rights, NULL=NULL should be true, because there is only one null
> value.

You are jumping to a conclusion not supported by the text you have
quoted.

It does appear that GROUP BY and DISTINCT should treat all nulls as
falling into the same class, because of

>     h) distinct: Two values are said to be not distinct if either:
>        both are the null value, or they compare equal according to
>        Subclause 8.2, "<comparison predicate>".

Kindly note, however, that the standards authors felt it necessary to
describe those two cases as separate cases.  If nulls compare as equal,
there would be no need to write more than "Two values are not distinct
if they compare equal".

> One should note, however, that when the actual comparison operator "=" is
> used, the standard says that if one of the operands is null, the result of
> the comparison is unknown.

Precisely.  A fortiori, if both operands are null, the result of the
comparison is still unknown.

We do seem to have a bug in GROUP BY/DISTINCT if nulls are producing
more than one output tuple in those operations.  But that has nothing
to do with what the comparison operator produces.
        regards, tom lane


Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

From
Thomas Lockhart
Date:
> > The Pratical SQL Handbook at page 171 says:
> > Since nulls represent "the great unknown", there is no way to know
> > whether one null is equal to any other null. Each unknown value
> > may or may not be different from another.

Although I've noticed some questionable statements quoted from this
book, this looks good...

> > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> This is something I have complained about time and again. It is time
> something is changed about it, otherwise Postgres will NEVER be a
> standard-compliant RDBMS.

Postgres conforms to SQL92 in this regard. Date and Darwen, "A Guide
to the SQL Standard", 3rd ed., are explicit about this near the top of
page 249:

Duplicates are relevant to the ... GROUP BY ... operations ...
... GROUP BY groups rows together on the basis of duplicate values in
the set of grouping columns (and those sets of grouping column values
can be regarded as "rows" for present purposes). The point is,
however, the definition of duplicate rows requires some refinement in
the presence of nulls. Let "left" and "right" be as defined
(previously). Then "left" and "right" are defined to be "duplicates"
of one another if and only if, for all "i" in the range 1 to "n",
either "left_i" = "right_i" is TRUE, or "left_i" and "right_i" are
both null.

There is a single exception to Postgres' SQL92 conformance wrt NULLs
afaik, involving DISTINCT column constraints which I discuss below.

> > However, if the grouping column contains more than one null,
> > all of them are put into a single group.
> > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> The SQL92 text says:
>      A null value is an implementation-dependent special value that
>      is distinct from all non-null values of the associated data type.
>      There is effectively only one null value and that value is a member
>      of every SQL data type. There is no <literal> for a null value,
>      although the keyword NULL is used in some places to indicate that a
>      null value is desired.
> Thus, by rights, NULL=NULL should be true, because there is only one null
> value.

No! An explicit "unknown" = "unknown" in a constraint clause should
always evaluate to FALSE (we'll get to GROUP BY later). SQL92 and all
of my reference books are clear about this. Date and Darwen have a
good discussion of the shortcomings of NULL in SQL92, pointing out
that with NULL handling one would really like a distinct UNKNOWN added
to the possible boolean values TRUE and FALSE so that SQL would have
true three-value logic.

> About the <group by clause>, the text says:
>     1) The result of the <group by clause> is a partitioning of T into
>        a set of groups. The set is the minimum number of groups such
>        that, for each grouping column of each group of more than one
>        row, no two values of that grouping column are distinct.

Interesting. Note that SQL92 asks that any column with the DISTINCT
constraint contain *only one* NULL value in the entire column. Date
and Darwen point out that this is inconsistant with the fundamental
notion of "unknown" and renders DISTINCT constraints without NOT NULL
to be effectively useless. They recommend against having any DISTINCT
column without having an additional NOT NULL constraint. We've had
this discussion wrt Postgres, and concluded that we would diverge from
the standard by allowing multiple NULL fields in DISTINCT columns, to
make DISTINCT a useful feature with NULLs. It probably didn't hurt
that Postgres already behaved this way :)

afaik this last point is the *only* place where Postgres intentionally
diverges from SQL92, and it was done (or rather retained from existing
behavior) to make a useless feature useful.

> One should note, however, that when the actual comparison operator "=" is
> used, the standard says that if one of the operands is null, the result of
> the comparison is unknown. One should make a distinction between making
> comparisons within group by, uniqueness, and other database-logic
> operations, and between making the actual comparison (though in my opinion,
> this should not be so. Comparing a null value to something should be always
> false unless the other something is also null. But that's my opinion and
> not the standard's).

One can't take a portion of SQL92 statements wrt NULLs and apply it to
all uses of NULL, because SQL92 is not internally consistant in this
regard.

In most GROUP BY situations, a corresponding WHERE col IS NOT NULL is
probably a good idea.

Regards.
                        - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

From
secret
Date:
Thomas Lockhart wrote:

> > > The Pratical SQL Handbook at page 171 says:
> > > Since nulls represent "the great unknown", there is no way to know
> > > whether one null is equal to any other null. Each unknown value
> > > may or may not be different from another.
>
> Although I've noticed some questionable statements quoted from this
> book, this looks good...
>
> > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> > This is something I have complained about time and again. It is time
> > something is changed about it, otherwise Postgres will NEVER be a
> > standard-compliant RDBMS.
>
> Postgres conforms to SQL92 in this regard. Date and Darwen, "A Guide
> to the SQL Standard", 3rd ed., are explicit about this near the top of
> page 249:
>
> Duplicates are relevant to the ... GROUP BY ... operations ...
> ... GROUP BY groups rows together on the basis of duplicate values in
> the set of grouping columns (and those sets of grouping column values
> can be regarded as "rows" for present purposes). The point is,
> however, the definition of duplicate rows requires some refinement in
> the presence of nulls. Let "left" and "right" be as defined
> (previously). Then "left" and "right" are defined to be "duplicates"
> of one another if and only if, for all "i" in the range 1 to "n",
> either "left_i" = "right_i" is TRUE, or "left_i" and "right_i" are
> both null.
>
> There is a single exception to Postgres' SQL92 conformance wrt NULLs
> afaik, involving DISTINCT column constraints which I discuss below.
>
> > > However, if the grouping column contains more than one null,
> > > all of them are put into a single group.
> > > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> > The SQL92 text says:
> >      A null value is an implementation-dependent special value that
> >      is distinct from all non-null values of the associated data type.
> >      There is effectively only one null value and that value is a member
> >      of every SQL data type. There is no <literal> for a null value,
> >      although the keyword NULL is used in some places to indicate that a
> >      null value is desired.
> > Thus, by rights, NULL=NULL should be true, because there is only one null
> > value.
>
> No! An explicit "unknown" = "unknown" in a constraint clause should
> always evaluate to FALSE (we'll get to GROUP BY later). SQL92 and all
> of my reference books are clear about this. Date and Darwen have a
> good discussion of the shortcomings of NULL in SQL92, pointing out
> that with NULL handling one would really like a distinct UNKNOWN added
> to the possible boolean values TRUE and FALSE so that SQL would have
> true three-value logic.
>
> > About the <group by clause>, the text says:
> >     1) The result of the <group by clause> is a partitioning of T into
> >        a set of groups. The set is the minimum number of groups such
> >        that, for each grouping column of each group of more than one
> >        row, no two values of that grouping column are distinct.
>
> Interesting. Note that SQL92 asks that any column with the DISTINCT
> constraint contain *only one* NULL value in the entire column. Date
> and Darwen point out that this is inconsistant with the fundamental
> notion of "unknown" and renders DISTINCT constraints without NOT NULL
> to be effectively useless. They recommend against having any DISTINCT
> column without having an additional NOT NULL constraint. We've had
> this discussion wrt Postgres, and concluded that we would diverge from
> the standard by allowing multiple NULL fields in DISTINCT columns, to
> make DISTINCT a useful feature with NULLs. It probably didn't hurt
> that Postgres already behaved this way :)
>
> afaik this last point is the *only* place where Postgres intentionally
> diverges from SQL92, and it was done (or rather retained from existing
> behavior) to make a useless feature useful.
>
> > One should note, however, that when the actual comparison operator "=" is
> > used, the standard says that if one of the operands is null, the result of
> > the comparison is unknown. One should make a distinction between making
> > comparisons within group by, uniqueness, and other database-logic
> > operations, and between making the actual comparison (though in my opinion,
> > this should not be so. Comparing a null value to something should be always
> > false unless the other something is also null. But that's my opinion and
> > not the standard's).
>
> One can't take a portion of SQL92 statements wrt NULLs and apply it to
> all uses of NULL, because SQL92 is not internally consistant in this
> regard.
>
> In most GROUP BY situations, a corresponding WHERE col IS NOT NULL is
> probably a good idea.
>
> Regards.
>
>                          - Thomas
>
> --
> Thomas Lockhart                         lockhart@alumni.caltech.edu
> South Pasadena, California
   Sigh.  PostgreSQL seems pretty inconsitant in this... GROUP BY with 1 column
produces NULLs grouped, with 2 colums it usually seems not to(although I somehow
came up with an example where it did, grr... but lets ignore this since it's
supposed to "not work" that way.)... Oracle8, DB/2, and Sybase all group NULLs
together, for compatibility sake wouldn't it be reasonable for PostgreSQL to do
the same?  Else porting applications could fail miserably when one hits this
inconsistency.

--David



Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

From
Herouth Maoz
Date:
At 18:16 +0300 on 19/05/1999, Thomas Lockhart wrote:


> Interesting. Note that SQL92 asks that any column with the DISTINCT
> constraint contain *only one* NULL value in the entire column. Date
> and Darwen point out that this is inconsistant with the fundamental
> notion of "unknown" and renders DISTINCT constraints without NOT NULL
> to be effectively useless. They recommend against having any DISTINCT
> column without having an additional NOT NULL constraint. We've had
> this discussion wrt Postgres, and concluded that we would diverge from
> the standard by allowing multiple NULL fields in DISTINCT columns, to
> make DISTINCT a useful feature with NULLs. It probably didn't hurt
> that Postgres already behaved this way :)
>
> afaik this last point is the *only* place where Postgres intentionally
> diverges from SQL92, and it was done (or rather retained from existing
> behavior) to make a useless feature useful.

You are probably referring to UNIQUE, not DISTINCT, which is not a
constraint but a query qualifier.

As for uniqueness, as I already quoted, it says:
   A unique constraint is satisfied if and only if no two rows in   a table have the same non-null values in the unique
columns.In   addition, if the unique constraint was defined with PRIMARY KEY,   then it requires that none of the
valuesin the specified column or   columns be the null value.
 

Which means that what Postgres does is quite the correct thing. You see?
"No two rows in a table have the same non-null values in the unique
columns". They *can* have the same *null* values!. The constraints only
talks about the non-null ones!

So I think Date and Darwen misinterpreted the rule, and you got this part
right in PostgreSQL. However, there *is* a bug in the GROUP BY behaviour,
at least over one column, and it should be checked if it doesn't work
according to the old convention of comparing nulls internally as they are
compared with the "=" operator.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma




Re: [HACKERS] Re: [SQL] Oddities with NULL and GROUP BY

From
Herouth Maoz
Date:
At 18:28 +0300 on 19/05/1999, secret wrote:


>     Sigh.  PostgreSQL seems pretty inconsitant in this... GROUP BY with 1
>column
> produces NULLs grouped, with 2 colums it usually seems not to(although I
>somehow
> came up with an example where it did, grr... but lets ignore this since it's
> supposed to "not work" that way.)... Oracle8, DB/2, and Sybase all group
>NULLs
> together, for compatibility sake wouldn't it be reasonable for PostgreSQL
>to do
> the same?  Else porting applications could fail miserably when one hits this
> inconsistency.

Please, please, the standard is clear about each of these things
separately. It absolutely says that nulls should be grouped together, and
it absolutely says that the comparison operator should not. It's true that
these things are not consistent, but for each operation, the standard is
quite clear on how it should be done.

In my opinion, there should be null comparison for internal operations, and
null comparison for the comparison operator. For this purpose, what
Postgres does now - return a NULL boolean if one of its operands is null -
is consistent with the standard. For GROUP BY and ORDER BY, they should be
compared equal, and for UNIQUE, they should not be compared.

UNIQUE has explicit mention of nulls in the standard.
ORDER BY has explicit mention of nulls in the standard.
GROUP BY has implicit mention of nulls, by using the term "distinct" which
is defined earlier and includes and explicit mention of nulls.
"=" has explicit mention of nulls in the standard.

And although they are not consistent (some are equal, some are not equal,
and some are unknown), they are covered in no uncertain terms.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma




Re: [SQL] Oddities with NULL and GROUP BY

From
Bruce Momjian
Date:

Looks like this is fixed in 6.5 too.a|b|sum-+-+---1|1|  32| |  3(2 rows)

[Charset iso-8859-1 unsupported, filtering to ASCII...]
> Jos_ Soares wrote:
> 
> > secret ha scritto:
> >
> >> "Jackson, DeJuan" wrote:
> >>
> >> > The behavior is valid, if you define NULL as meaning undefined.
> >> > In other words when you define something as NULL you're saying, "I
> >> don't
> >> > know what it is. It could be equal or not."
> >> >         -DEJ
> >> >
> >> > > -----Original Message-----
> >> > > From: secret [SMTP:secret@kearneydev.com]
> >> > > Sent: Friday, May 14, 1999 11:58 AM
> >> > > To:   PG-SQL
> >> > > Subject:      [SQL] Oddities with NULL and GROUP BY
> >> > >
> >> > >     Maybe there is something I don't know about how GROUP BY
> >> should
> >> > > work, but if I have a table like:
> >> > > a,b,c
> >> > > 1,1,1
> >> > > 1,1,2
> >> > > 1,1,3
> >> > > 1,2,1
> >> > > 1,3,1
> >> > >
> >> > > And I say SELECT a,b,sum(c) FROm .. GROUP BY a,b I get
> >> > > 1,1,6
> >> > > 1,2,1
> >> > > 1,3,1
> >> > >
> >> > > So whenever a or b changes we get a new summed row, well if I
> >> have rows
> >> > > where a or b are null, this doesn't happen, infact I seem to get
> >> all
> >> > > those rows individually... Like if:
> >> > > 1,1,1
> >> > > 1,1,3
> >> > > 1,NULL,10
> >> > > 1,NULL,20
> >> > > 1,2,3
> >> > >
> >> > > I get:
> >> > > 1,1,4
> >> > > 1,NULL,10
> >> > > 1,NULL,20
> >> > > 1,2,3
> >> > >
> >> > > Shouldn't I get 1,NULL,30?  Ie shouldn't NULL be treated like
> >> any other
> >> > > value?  Or is there some bit of information I'm missing?  I can
> >> set
> >> > > everything from NULL to 0 if need be, but I'd rather not...
> >> > >
> >> > > David Secret
> >> > > MIS Director
> >> > > Kearney Development Co., Inc.
> >> > >
> >>
> >>     IBM's DB/2 Disagrees, so does Oracle8!
> >>
> >> Here is a cut & paste from Oracle SQL+:
> >>
> >> SQL> select * from z;
> >>
> >>         A         B
> >> --------- ---------
> >>         1         1
> >>         1         2
> >>                   5
> >>                  10
> >>
> >> SQL> select a,sum(b) from z group by a;
> >>
> >>         A    SUM(B)
> >> --------- ---------
> >>         1         3
> >>                  15
> >>
> >> SQL>
> >>
> >>     I'm going to report this as a bug now that I've verified 2 major
> >> database
> >> vendors perform the task as I would expect them to, and PostgreSQL
> >> does it
> >> very differently.  The question is really is NULL=NULL, which I
> >> would say it
> >> should be.
> >
> >
> > I tried it in PostgreSQL 6.5beta1 with the same result:
> >
> > select * from z;
> > a| b
> > -+--
> > 1| 1
> > 1| 2
> >  | 5
> >  |10
> > (4 rows)
> >
> > select a,sum(b) from z group by a;
> > a|sum
> > -+---
> > 1|  3
> >  | 15
> > (2 rows)
> >
> > The Pratical SQL Handbook at page 171 says:
> > Since nulls represent "the great unknown", there is no way to know
> > whether one null is equal to any other null. Each unknown value
> > may or may not be different from another.
> > However, if the grouping column contains more than one null,
> > all of them are put into a single group.
> >
> > Thus: NULL!=NULL but on GROUP BY it is considered as NULL=NULL.
> >
> > Jos_
> >
> >
> >
> >
> > --
> > ______________________________________________________________
> > PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc 2.7.2.3
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > Jose'
> >
> 
>     Wonderful, that's as I expected.  However please try this in 6.5
> Beta1,
> CREATE TABLE z(a int4,b int4, c int4);
> INSERT INTO z VALUES (1,1,1);
> INSERT INTO z VALUES (1,1,2);
> INSERT INTO z(a,c) VALUES (2,1);
> INSERT INTO z(a,c) VALUES (2,2);
> 
> SELECT a,b,sum(c) FROM z GROUP BY a,b
> 
> GROUPing in PostgreSQL w/NULLs works just fine when there is only 1
> column, however when one throws 2 in, the 2nd one having NULLs it starts
> failing.  Your example demonstrates the right answer for 1 group by
> column, try it with 2 and I expect 6.5beta1 will fail as 6.4.2 does.
> 
>     As to NULL=NULL or NULL!=NULL, evadentally my estimation of why the
> problem is occuring was wrong. :)  But from the SQL handbook we
> definately have a bug here.
> 
> David Secret
> MIS Director
> Kearney Development Co., Inc.
> 
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Oddities with NULL and GROUP BY

From
Tom Lane
Date:
I believe I have finally resolved this old bug from May:

secret <secret@kearneydev.com> writes:
> GROUPing in PostgreSQL w/NULLs works just fine when there is only 1
> column, however when one throws 2 in, the 2nd one having NULLs it starts
> failing.  Your example demonstrates the right answer for 1 group by
> column, try it with 2 and I expect 6.5beta1 will fail as 6.4.2 does.

Actually, I believe that the problem was seen when you sort/group by
multiple columns and there are nulls in the *earlier* columns.  The bug
I just fixed in the sort logic was that it would stop comparing as soon
as it hit a null column.  Thus (NULL,1) would sort as equal to (NULL,2)
whereas you'd obviously like it to sort as smaller.

The reason it affected GROUP BY is that the sort could produce results
like(NULL,1)(NULL,1)(NULL,2)(NULL,1)
Because of the comparison bug, the sorter thought these tuples were
all equal-keyed and so it didn't worry about what order they'd come
out in.  But then the adjacent-duplicate-merging step would produce(NULL,1)    --- 2 tuples represented by this
group(NULL,2)(NULL,1)
which is the wrong answer.

The fix is to continue comparing columns when both tuples have a null
in one column, rather than stopping and declaring them equal.  This
is in current CVS sources and will be in 6.5.1.

The bug cannot be observed if you use test cases that only sort/group
on one column...
        regards, tom lane