Thread: Group By, NULL values and inconsistent behaviour.
While looking thru the nodeGroup code, I noticed the following that I'm not sure is correct. -- Using 01-09 snapshot create table t1 (a int4, b char(2), c char(2)); CREATE insert into t1 (a,c) values (1,'x'); INSERT 149419 1 insert into t1 (a,c) values (2,'x'); INSERT 149420 1 insert into t1 (a,c) values (3,'z'); INSERT 149421 1 insert into t1 (a,c) values (2,'x'); INSERT 149422 1 select * from t1; a|b|c -+-+-- 1| |x 2| |x 3| |z 2| |x (4 rows) select b,c,sum(a) from t1 group by b,c; b|c |sum -+--+--- |x | 3 |z | 3 |x | 2 (3 rows) select b,c,sum(a) from t1 group by b,c order by c; b|c |sum -+--+--- |x | 3 |x | 2 |z | 3 (3 rows) In the second query, the first two rows have been grouped, but shouldn't they not be since b is NULL? I thought that NULL != NULL? If so, is the third query wrong? The first two rows are different, but only because of the aggregated column that is the source of the group by. According to the logic from the second query, these should have been grouped, no? What does the standard say about comparing two NULL values? The fixes for these inconsistencies appear to be simple. To cause a new group to be started if NULL != NULL, simply change the "continue;" in the sameGroup function in nodeGroup.c to "return FALSE;" Ignoring aggregated columns would also then be added to sameGroup(). darrenk
Where are we on this. It appears this NULL group by is seriously broken. Can we have some tests on commercial databases, and get a patch generated? > > > While looking thru the nodeGroup code, I noticed the following > that I'm not sure is correct. > > -- Using 01-09 snapshot > create table t1 (a int4, b char(2), c char(2)); > CREATE > insert into t1 (a,c) values (1,'x'); > INSERT 149419 1 > insert into t1 (a,c) values (2,'x'); > INSERT 149420 1 > insert into t1 (a,c) values (3,'z'); > INSERT 149421 1 > insert into t1 (a,c) values (2,'x'); > INSERT 149422 1 > select * from t1; > a|b|c > -+-+-- > 1| |x > 2| |x > 3| |z > 2| |x > (4 rows) > > select b,c,sum(a) from t1 group by b,c; > b|c |sum > -+--+--- > |x | 3 > |z | 3 > |x | 2 > (3 rows) > > select b,c,sum(a) from t1 group by b,c order by c; > b|c |sum > -+--+--- > |x | 3 > |x | 2 > |z | 3 > (3 rows) > > In the second query, the first two rows have been grouped, but shouldn't > they not be since b is NULL? I thought that NULL != NULL? > > If so, is the third query wrong? The first two rows are different, but > only because of the aggregated column that is the source of the group by. > According to the logic from the second query, these should have been > grouped, no? > > What does the standard say about comparing two NULL values? > > The fixes for these inconsistencies appear to be simple. To cause a new > group to be started if NULL != NULL, simply change the "continue;" in the > sameGroup function in nodeGroup.c to "return FALSE;" Ignoring aggregated > columns would also then be added to sameGroup(). > > darrenk > > -- Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote: > > Where are we on this. It appears this NULL group by is seriously > broken. > > Can we have some tests on commercial databases, and get a patch > generated? I ran the test on Sybase. The only real changes were int4->int and explicitly calling out field b as null (it defaults to not null). 1> select @@version 2> go ----------------------------------------------------------------------------- SQL Server/11.0.2/P/Sun_svr4/OS 5.4/EBF 6536/OPT/Sat Aug 17 11:54:59 PDT 1996 (1 row affected) 1> create table t1 (a int, b char(2) null, c char(2)) 2> go 1> insert into t1 (a,c) values (1,'x') 2> go (1 row affected) 1> insert into t1 (a,c) values (2,'x') 2> go (1 row affected) 1> insert into t1 (a,c) values (3,'z') 2> go (1 row affected) 1> insert into t1 (a,c) values (2,'x') 2> go (1 row affected) 1> select * from t1 2> go a b c ----------- -- -- 1 NULL x 2 NULL x 3 NULL z 2 NULL x (4 rows affected) 1> select b,c,sum(a) from t1 group by b,c 2> go b c -- -- ----------- NULL x 5 NULL z 3 (2 rows affected) 1> select b,c,sum(a) from t1 group by b,c order by c 2> go b c -- -- ----------- NULL x 5 NULL z 3 (2 rows affected) It seems that Sybase thinks a null is a null in this case. However, try the following: select * from t1 x, t1 y where x.b=y.b and y.c='z'; Sybase returns zero rows for this. It seems that it treats NULLs as equal for order and group operations, but not for join operations. Ocie Mitchell