Group By, NULL values and inconsistent behaviour. - Mailing list pgsql-hackers

From darrenk@insightdist.com (Darren King)
Subject Group By, NULL values and inconsistent behaviour.
Date
Msg-id 9801172127.AA42656@ceodev
Whole thread Raw
Responses Re: [HACKERS] Group By, NULL values and inconsistent behaviour.
List pgsql-hackers
While looking thru the nodeGroup code, I noticed the following
that I'm not sure is correct.

-- Using 01-09 snapshot
create table t1 (a int4, b char(2), c char(2));
CREATE
insert into t1 (a,c) values (1,'x');
INSERT 149419 1
insert into t1 (a,c) values (2,'x');
INSERT 149420 1
insert into t1 (a,c) values (3,'z');
INSERT 149421 1
insert into t1 (a,c) values (2,'x');
INSERT 149422 1
select * from t1;
a|b|c
-+-+--
1| |x
2| |x
3| |z
2| |x
(4 rows)

select b,c,sum(a) from t1 group by b,c;
b|c |sum
-+--+---
 |x |  3
 |z |  3
 |x |  2
(3 rows)

select b,c,sum(a) from t1 group by b,c order by c;
b|c |sum
-+--+---
 |x |  3
 |x |  2
 |z |  3
(3 rows)

In the second query, the first two rows have been grouped, but shouldn't
they not be since b is NULL?  I thought that NULL != NULL?

If so, is the third query wrong?  The first two rows are different, but
only because of the aggregated column that is the source of the group by.
According to the logic from the second query, these should have been
grouped, no?

What does the standard say about comparing two NULL values?

The fixes for these inconsistencies appear to be simple.  To cause a new
group to be started if NULL != NULL, simply change the "continue;" in the
sameGroup function in nodeGroup.c to "return FALSE;"  Ignoring aggregated
columns would also then be added to sameGroup().

darrenk

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] subselects coding started
Next
From: The Hermit Hacker
Date:
Subject: Re: [QUESTIONS] Business cases