Thread: General Bug Report: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different)
General Bug Report: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different)
From
Unprivileged user
Date:
============================================================================ POSTGRESQL BUG REPORT TEMPLATE ============================================================================ Your name : Your email address : secret@kearneydev.com Category : runtime: back-end: SQL Severity : non-critical Summary: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different) System Configuration -------------------- Operating System : Linux 2.2.7 Redhat 5.2 PostgreSQL version : 6.4.2 Compiler used : 2.7.2.3 Hardware: --------- Linux tau.kearneydev.com 2.2.7 #3 Thu Apr 29 10:10:41 EDT 1999 i686 unknown Versions of other tools: ------------------------ -------------------------------------------------------------------------- Problem Description: -------------------- The appearance of NULL in a table where a GROUP BY clause is used causes the behavior of returning 1 line for every NULL. Both Oracle8 and DB/2 perform this as I would expect. IE a,b 1,1 1,2 NULL,1 NULL,2 SELECT a,sum(b) GROUP BY a returns on Postgres: 1,3 NULL,1 NULL,2 On Oracle8, DB/2, etc.: 1,3 NULL,3 Cut&paste from Oracle8: SQL> select * from z; A B --------- --------- 1 1 1 2 5 10 SQL> select a,sum(b) from z group by a; A SUM(B) --------- --------- 1 3 15 SQL> -------------------------------------------------------------------------- Test Case: ---------- CREATE TABLE z(a int4,b int4); INSERT INTO z values (1,2); INSERT INTO z VALUES (1,1); INSERT INTO z(b) VALUES (1); INSERT INTO z(b) VALUES (2); SELECT a,sum(b) FROM z GROUP BY a; -------------------------------------------------------------------------- Solution: --------- For whatever reason I've observed many times that NULL<>NULL under PostgreSQL, I've had to include many clauses in my SQL statements to make up for this, perhaps if this was corrected it would function properly. --------------------------------------------------------------------------
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different)
From
Bruce Momjian
Date:
Looks like this is fixed in 6.5. test=> SELECT a,sum(b) FROM z GROUP BY a; a|sum -+--- 1| 6 | 4 (2 rows) > > ============================================================================ > POSTGRESQL BUG REPORT TEMPLATE > ============================================================================ > > > Your name : > Your email address : secret@kearneydev.com > > Category : runtime: back-end: SQL > Severity : non-critical > > Summary: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different) > > System Configuration > -------------------- > Operating System : Linux 2.2.7 Redhat 5.2 > > PostgreSQL version : 6.4.2 > > Compiler used : 2.7.2.3 > > Hardware: > --------- > Linux tau.kearneydev.com 2.2.7 #3 Thu Apr 29 10:10:41 EDT 1999 i686 unknown > > Versions of other tools: > ------------------------ > > > -------------------------------------------------------------------------- > > Problem Description: > -------------------- > The appearance of NULL in a table where a GROUP BY clause is > used causes the behavior of returning 1 line for every NULL. > Both Oracle8 and DB/2 perform this as I would expect. IE > a,b > 1,1 > 1,2 > NULL,1 > NULL,2 > > SELECT a,sum(b) GROUP BY a returns on Postgres: > 1,3 > NULL,1 > NULL,2 > > On Oracle8, DB/2, etc.: > 1,3 > NULL,3 > > Cut&paste from Oracle8: > SQL> select * from z; > > A B > --------- --------- > 1 1 > 1 2 > 5 > 10 > > SQL> select a,sum(b) from z group by a; > > A SUM(B) > --------- --------- > 1 3 > 15 > > SQL> > > -------------------------------------------------------------------------- > > Test Case: > ---------- > CREATE TABLE z(a int4,b int4); > INSERT INTO z values (1,2); > INSERT INTO z VALUES (1,1); > INSERT INTO z(b) VALUES (1); > INSERT INTO z(b) VALUES (2); > SELECT a,sum(b) FROM z GROUP BY a; > > > -------------------------------------------------------------------------- > > Solution: > --------- > For whatever reason I've observed many times that NULL<>NULL > under PostgreSQL, I've had to include many clauses in my > SQL statements to make up for this, perhaps if this was > corrected it would function properly. > > -------------------------------------------------------------------------- > > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
secret
Date:
Bruce Momjian wrote: > Looks like this is fixed in 6.5. > > test=> SELECT a,sum(b) FROM z GROUP BY a; > a|sum > -+--- > 1| 6 > | 4 > (2 rows) > > > > > ============================================================================ > > POSTGRESQL BUG REPORT TEMPLATE > > ============================================================================ > > > > > > Your name : > > Your email address : secret@kearneydev.com > > > > Category : runtime: back-end: SQL > > Severity : non-critical > > > > Summary: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different) > > > > System Configuration > > -------------------- > > Operating System : Linux 2.2.7 Redhat 5.2 > > > > PostgreSQL version : 6.4.2 > > > > Compiler used : 2.7.2.3 > > > > Hardware: > > --------- > > Linux tau.kearneydev.com 2.2.7 #3 Thu Apr 29 10:10:41 EDT 1999 i686 unknown > > > > Versions of other tools: > > ------------------------ > > > > > > -------------------------------------------------------------------------- > > > > Problem Description: > > -------------------- > > The appearance of NULL in a table where a GROUP BY clause is > > used causes the behavior of returning 1 line for every NULL. > > Both Oracle8 and DB/2 perform this as I would expect. IE > > a,b > > 1,1 > > 1,2 > > NULL,1 > > NULL,2 > > > > SELECT a,sum(b) GROUP BY a returns on Postgres: > > 1,3 > > NULL,1 > > NULL,2 > > > > On Oracle8, DB/2, etc.: > > 1,3 > > NULL,3 > > > > Cut&paste from Oracle8: > > SQL> select * from z; > > > > A B > > --------- --------- > > 1 1 > > 1 2 > > 5 > > 10 > > > > SQL> select a,sum(b) from z group by a; > > > > A SUM(B) > > --------- --------- > > 1 3 > > 15 > > > > SQL> > > > > -------------------------------------------------------------------------- > > > > Test Case: > > ---------- > > CREATE TABLE z(a int4,b int4); > > INSERT INTO z values (1,2); > > INSERT INTO z VALUES (1,1); > > INSERT INTO z(b) VALUES (1); > > INSERT INTO z(b) VALUES (2); > > SELECT a,sum(b) FROM z GROUP BY a; > > > > > > -------------------------------------------------------------------------- > > > > Solution: > > --------- > > For whatever reason I've observed many times that NULL<>NULL > > under PostgreSQL, I've had to include many clauses in my > > SQL statements to make up for this, perhaps if this was > > corrected it would function properly. > > > > -------------------------------------------------------------------------- > > > > > > > > -- > Bruce Momjian | http://www.op.net/~candle > maillist@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 It works fine with 1 variable, try it with 2. 6.4.2 worked fine for 1, it's just when you GROUP BY 2 variables that contain NULLs issues start appearing.(Another reason it looks like a bug, not a feature :)) David Secret MIS Director Kearney Development Co., Inc.
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
sszabo@bigpanda.com
Date:
>Bruce Momjian wrote: > >> Looks like this is fixed in 6.5. >> >> test=> SELECT a,sum(b) FROM z GROUP BY a; >> a|sum >> -+--- >> 1| 6 >> | 4 >> (2 rows) >> >> > >> > ============================================================================ >> > POSTGRESQL BUG REPORT TEMPLATE >> > ============================================================================ >> > >> > >> > Your name : >> > Your email address : secret@kearneydev.com >> > >> > Category : runtime: back-end: SQL >> > Severity : non-critical >> > >> > Summary: GROUP BY with NULL not done properly(Oracle8 & DB/2 do this completely different) >> > >> > System Configuration >> > -------------------- >> > Operating System : Linux 2.2.7 Redhat 5.2 >> > >> > PostgreSQL version : 6.4.2 >> > >> > Compiler used : 2.7.2.3 >> > >> > Hardware: >> > --------- >> > Linux tau.kearneydev.com 2.2.7 #3 Thu Apr 29 10:10:41 EDT 1999 i686 unknown >> > >> > Versions of other tools: >> > ------------------------ >> > >> > >> > -------------------------------------------------------------------------- >> > >> > Problem Description: >> > -------------------- >> > The appearance of NULL in a table where a GROUP BY clause is >> > used causes the behavior of returning 1 line for every NULL. >> > Both Oracle8 and DB/2 perform this as I would expect. IE >> > a,b >> > 1,1 >> > 1,2 >> > NULL,1 >> > NULL,2 >> > >> > SELECT a,sum(b) GROUP BY a returns on Postgres: >> > 1,3 >> > NULL,1 >> > NULL,2 >> > >> > On Oracle8, DB/2, etc.: >> > 1,3 >> > NULL,3 >> > >> > Cut&paste from Oracle8: >> > SQL> select * from z; >> > >> > A B >> > --------- --------- >> > 1 1 >> > 1 2 >> > 5 >> > 10 >> > >> > SQL> select a,sum(b) from z group by a; >> > >> > A SUM(B) >> > --------- --------- >> > 1 3 >> > 15 >> > >> > SQL> >> > >> > -------------------------------------------------------------------------- >> > >> > Test Case: >> > ---------- >> > CREATE TABLE z(a int4,b int4); >> > INSERT INTO z values (1,2); >> > INSERT INTO z VALUES (1,1); >> > INSERT INTO z(b) VALUES (1); >> > INSERT INTO z(b) VALUES (2); >> > SELECT a,sum(b) FROM z GROUP BY a; >> > >> > >> > -------------------------------------------------------------------------- >> > >> > Solution: >> > --------- >> > For whatever reason I've observed many times that NULL<>NULL >> > under PostgreSQL, I've had to include many clauses in my >> > SQL statements to make up for this, perhaps if this was >> > corrected it would function properly. >> > >> > -------------------------------------------------------------------------- >> > >> > >> > >> >> -- >> Bruce Momjian | http://www.op.net/~candle >> maillist@candle.pha.pa.us | (610) 853-3000 >> + If your life is a hard drive, | 830 Blythe Avenue >> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > It works fine with 1 variable, try it with 2. 6.4.2 worked fine for 1, it's just when you >GROUP BY 2 variables that contain NULLs issues start appearing.(Another reason it looks like a >bug, not a feature :)) When i make the following table: a|b| c -+-+-- 1|1|10 1|2|10 1|2| 5 2|2|15 2|2|-3 2|1|50 2| |25 2| |15 1| |34 1| |54 | | 5 | | 7 |1| 7 |4| 7 |4| 7 (15 rows) And do the following query, I get what i think was desired, the null/null, null/4, 1/null, 2/null are grouped together into a single output row... sszabo=> select a,b,sum(c) from b group by a,b; a|b|sum -+-+--- 1|1| 10 1|2| 15 1| | 88 2|1| 50 2|2| 12 2| | 40 |1| 7 |4| 14 | | 12 (9 rows) sszabo=> select version(); version -------------------------------------------------------------------- PostgreSQL 6.5.0 on i386-unknown-freebsd3.1, compiled by gcc 2.7.2.1
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
secret
Date:
> > > When i make the following table: > > a|b| c > -+-+-- > 1|1|10 > 1|2|10 > 1|2| 5 > 2|2|15 > 2|2|-3 > 2|1|50 > 2| |25 > 2| |15 > 1| |34 > 1| |54 > | | 5 > | | 7 > |1| 7 > |4| 7 > |4| 7 > (15 rows) > > And do the following query, I get what i think was desired, > the null/null, null/4, 1/null, 2/null are grouped together into a single > output row... > > sszabo=> select a,b,sum(c) from b group by a,b; > a|b|sum > -+-+--- > 1|1| 10 > 1|2| 15 > 1| | 88 > 2|1| 50 > 2|2| 12 > 2| | 40 > |1| 7 > |4| 14 > | | 12 > (9 rows) > > sszabo=> select version(); > version > -------------------------------------------------------------------- > PostgreSQL 6.5.0 on i386-unknown-freebsd3.1, compiled by gcc 2.7.2.1 Smaller examples I try work too under v6.5, but here is an example from a larger one: SELECT a,b,sum(c) FROM xx GROUP BY a,b ORDER BY a,b; a | b | sum .. |102060| 6 |102060| 1 |102060| 6 |102060| 6 |102060| 0 |102060| 6 |102061| 6 |102061| 6 |102061| 6 |102061| 6 |102061| 7 |102084| 10 |102084| 10 |102084| 10 |102085| 4 |102109| 18 |102109| 18 |102109| 54 |102109| 18 |102110| 1 |102110| 1 There are actually 65,000 rows, so I can't quote all of them, but I will give you a dump of the test table upon request so you can duplicate the results... Here are a couple queries that illustrate the errors: ftc=> select count(*) from xx where b=102110; count ----- 2 (1 row) ftc=> select count(*) from xx where a is null and b=102110; count ----- 2 (1 row) As you see above, the GROUP on a didn't function for b=102110, we have 2 rows, both of which were returned, both of which A is NULL. If you'd like a dump of this table I'll send it to you and not cc the list on it... It's a stripped down version of another table I use quite a bit. Version: [PostgreSQL 6.5.0 on i586-pc-linux-gnulibc1, compiled by gcc 2.7.2.1] David Secret MIS Director Kearney Development Co., Inc.
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
Bruce Momjian
Date:
> It works fine with 1 variable, try it with 2. 6.4.2 worked fine for 1, it's just when you > GROUP BY 2 variables that contain NULLs issues start appearing.(Another reason it looks like a > bug, not a feature :)) > I hate to ask, but can you send me a little set of test SQL's? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
sszabo@bigpanda.com
Date:
> Smaller examples I try work too under v6.5, but here is an example from a larger one: > > As you see above, the GROUP on a didn't function for b=102110, we have 2 rows, both of which >were returned, both of which A is NULL. If you'd like a dump of this table I'll send it to you and >not cc the list on it... It's a stripped down version of another table I use quite a bit. I got a similar result on a set of 23000 records or so I did a create table a (a int4, b int4, c int4); perl -e 'use integer; $x=1; while (1) { print "insert into a values(null, $x, ".(rand(100)*1).");\n"; if (rand(10)<=1) { $x++; } if ($x == 100000) { exit; } }' > out cat out | psql (I paused it part way through at about 23000 records) psql select a,b,sum(c) from a group by a,b; and got 2 more rows than i should have Interesting thing however was that the rows i got out where ordered null|1|<something> null|<something like 2000>|<something> null|<prev+1>|<something> ... null|1|<something> null|2|<something> [note that null,1 showed up more than once] where i had thought that normally because of the way postgres does its grouping, the results are ordered by the group by fields before the grouping is done, and in most cases of group by i get the output sorted by those values, in the case where it failed, i did not. I'm not certain if that's normal or not, so i thought it would be worth mentioning. In the case originally done on the 60000+ records, an explicit order by was done so this wouldn't have shown up. Stephan
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
secret
Date:
sszabo@bigpanda.com wrote: > > Smaller examples I try work too under v6.5, but here is an example from a larger one: > > > > As you see above, the GROUP on a didn't function for b=102110, we have 2 rows, both of which > >were returned, both of which A is NULL. If you'd like a dump of this table I'll send it to you and > >not cc the list on it... It's a stripped down version of another table I use quite a bit. > > I got a similar result on a set of 23000 records or so > I did a > create table a (a int4, b int4, c int4); > perl -e 'use integer; $x=1; while (1) { > print "insert into a values(null, $x, ".(rand(100)*1).");\n"; > if (rand(10)<=1) { $x++; } if ($x == 100000) { exit; } }' > out > cat out | psql > > (I paused it part way through at about 23000 records) > > psql > select a,b,sum(c) from a group by a,b; > and got 2 more rows than i should have > > Interesting thing however was that the rows i got out where ordered > null|1|<something> > null|<something like 2000>|<something> > null|<prev+1>|<something> > ... > null|1|<something> > null|2|<something> > [note that null,1 showed up more than once] > > where i had thought that normally because of the way postgres does its > grouping, the results are ordered by the group by fields before the > grouping is done, and in most cases of group by i get the output > sorted by those values, in the case where it failed, i did not. > I'm not certain if that's normal or not, so i thought it would be > worth mentioning. > > In the case originally done on the 60000+ records, an explicit > order by was done so this wouldn't have shown up. > > Stephan I have a .ZIP file of about 50k that has a data set with query that will produce the incorrect results. I put an explicit ORDER BY in to demonstrate the problem. If anyone wants a copy let me know, I hesitate to send it to the list due to the size. David Secret MIS Director Kearney Development Co., Inc.
Re: [BUGS] General Bug Report: GROUP BY with NULL not done properly(Oracle8& DB/2 do this completely different)
From
Tom Lane
Date:
sszabo@bigpanda.com writes: > Interesting thing however was that the rows i got out where ordered > null|1|<something> > null|<something like 2000>|<something> > null|<prev+1>|<something> > ... > null|1|<something> > null|2|<something> > [note that null,1 showed up more than once] > where i had thought that normally because of the way postgres does its > grouping, the results are ordered by the group by fields before the > grouping is done, and in most cases of group by i get the output > sorted by those values, in the case where it failed, i did not. > I'm not certain if that's normal or not, so i thought it would be > worth mentioning. You're right, "GROUP BY a,b ORDER BY a,b" is redundant because Postgres implements GROUP BY using a sort-and-merge-adjacent-duplicates algorithm. So the result of a GROUPed query is going to be sorted on the group columns anyway. I have just fixed a bug in lselect.c that would cause tuples containing null fields to be sorted in an unpredictable order. This manifested itself in David's example by failing to bring together all the tuples that in fact had the same values of a + b; whereupon the duplicate-recognizer of course wasn't able to merge all the tuples it should have. The second sort step probably didn't produce perfect ordering either, though it seems to have mostly worked. With the bug fix I get 1736 rows out of David's example, rather than 5322 rows (while he got 5321, another indication that the sort wasn't predictable...) regards, tom lane