Thread: Group By, NULL values and inconsistent behaviour.

Group By, NULL values and inconsistent behaviour.

From
darrenk@insightdist.com (Darren King)
Date:
While looking thru the nodeGroup code, I noticed the following
that I'm not sure is correct.

-- Using 01-09 snapshot
create table t1 (a int4, b char(2), c char(2));
CREATE
insert into t1 (a,c) values (1,'x');
INSERT 149419 1
insert into t1 (a,c) values (2,'x');
INSERT 149420 1
insert into t1 (a,c) values (3,'z');
INSERT 149421 1
insert into t1 (a,c) values (2,'x');
INSERT 149422 1
select * from t1;
a|b|c
-+-+--
1| |x
2| |x
3| |z
2| |x
(4 rows)

select b,c,sum(a) from t1 group by b,c;
b|c |sum
-+--+---
 |x |  3
 |z |  3
 |x |  2
(3 rows)

select b,c,sum(a) from t1 group by b,c order by c;
b|c |sum
-+--+---
 |x |  3
 |x |  2
 |z |  3
(3 rows)

In the second query, the first two rows have been grouped, but shouldn't
they not be since b is NULL?  I thought that NULL != NULL?

If so, is the third query wrong?  The first two rows are different, but
only because of the aggregated column that is the source of the group by.
According to the logic from the second query, these should have been
grouped, no?

What does the standard say about comparing two NULL values?

The fixes for these inconsistencies appear to be simple.  To cause a new
group to be started if NULL != NULL, simply change the "continue;" in the
sameGroup function in nodeGroup.c to "return FALSE;"  Ignoring aggregated
columns would also then be added to sameGroup().

darrenk

Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

From
Bruce Momjian
Date:
Where are we on this.  It appears this NULL group by is seriously
broken.

Can we have some tests on commercial databases, and get a patch
generated?

>
>
> While looking thru the nodeGroup code, I noticed the following
> that I'm not sure is correct.
>
> -- Using 01-09 snapshot
> create table t1 (a int4, b char(2), c char(2));
> CREATE
> insert into t1 (a,c) values (1,'x');
> INSERT 149419 1
> insert into t1 (a,c) values (2,'x');
> INSERT 149420 1
> insert into t1 (a,c) values (3,'z');
> INSERT 149421 1
> insert into t1 (a,c) values (2,'x');
> INSERT 149422 1
> select * from t1;
> a|b|c
> -+-+--
> 1| |x
> 2| |x
> 3| |z
> 2| |x
> (4 rows)
>
> select b,c,sum(a) from t1 group by b,c;
> b|c |sum
> -+--+---
>  |x |  3
>  |z |  3
>  |x |  2
> (3 rows)
>
> select b,c,sum(a) from t1 group by b,c order by c;
> b|c |sum
> -+--+---
>  |x |  3
>  |x |  2
>  |z |  3
> (3 rows)
>
> In the second query, the first two rows have been grouped, but shouldn't
> they not be since b is NULL?  I thought that NULL != NULL?
>
> If so, is the third query wrong?  The first two rows are different, but
> only because of the aggregated column that is the source of the group by.
> According to the logic from the second query, these should have been
> grouped, no?
>
> What does the standard say about comparing two NULL values?
>
> The fixes for these inconsistencies appear to be simple.  To cause a new
> group to be started if NULL != NULL, simply change the "continue;" in the
> sameGroup function in nodeGroup.c to "return FALSE;"  Ignoring aggregated
> columns would also then be added to sameGroup().
>
> darrenk
>
>


--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Group By, NULL values and inconsistent behaviour.

From
ocie@paracel.com
Date:
Bruce Momjian wrote:
>
> Where are we on this.  It appears this NULL group by is seriously
> broken.
>
> Can we have some tests on commercial databases, and get a patch
> generated?

I ran the test on Sybase.  The only real changes were int4->int and
explicitly calling out field b as null (it defaults to not null).

1> select @@version
2> go

 -----------------------------------------------------------------------------
 SQL Server/11.0.2/P/Sun_svr4/OS 5.4/EBF 6536/OPT/Sat Aug 17 11:54:59 PDT 1996

(1 row affected)
1> create table t1 (a int, b char(2) null, c char(2))
2> go
1> insert into t1 (a,c) values (1,'x')
2> go
(1 row affected)
1> insert into t1 (a,c) values (2,'x')
2> go
(1 row affected)
1> insert into t1 (a,c) values (3,'z')
2> go
(1 row affected)
1> insert into t1 (a,c) values (2,'x')
2> go
(1 row affected)
1> select * from t1
2> go
 a           b  c
 ----------- -- --
           1 NULL x
           2 NULL x
           3 NULL z
           2 NULL x

(4 rows affected)
1> select b,c,sum(a) from t1 group by b,c
2> go
 b  c
 -- -- -----------
 NULL x            5
 NULL z            3

(2 rows affected)
1> select b,c,sum(a) from t1 group by b,c order by c
2> go
 b  c
 -- -- -----------
 NULL x            5
 NULL z            3

(2 rows affected)

It seems that Sybase thinks a null is a null in this case.  However,
try the following:

select * from t1 x, t1 y where x.b=y.b and y.c='z';

Sybase returns zero rows for this.  It seems that it treats NULLs as
equal for order and group operations, but not for join operations.

Ocie Mitchell