Re: [SQL] Oddities with NULL and GROUP BY - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] Oddities with NULL and GROUP BY
Date
Msg-id 5126.931636997@sss.pgh.pa.us
Whole thread Raw
In response to Re: [SQL] Oddities with NULL and GROUP BY  (secret <secret@kearneydev.com>)
List pgsql-sql
I believe I have finally resolved this old bug from May:

secret <secret@kearneydev.com> writes:
> GROUPing in PostgreSQL w/NULLs works just fine when there is only 1
> column, however when one throws 2 in, the 2nd one having NULLs it starts
> failing.  Your example demonstrates the right answer for 1 group by
> column, try it with 2 and I expect 6.5beta1 will fail as 6.4.2 does.

Actually, I believe that the problem was seen when you sort/group by
multiple columns and there are nulls in the *earlier* columns.  The bug
I just fixed in the sort logic was that it would stop comparing as soon
as it hit a null column.  Thus (NULL,1) would sort as equal to (NULL,2)
whereas you'd obviously like it to sort as smaller.

The reason it affected GROUP BY is that the sort could produce results
like(NULL,1)(NULL,1)(NULL,2)(NULL,1)
Because of the comparison bug, the sorter thought these tuples were
all equal-keyed and so it didn't worry about what order they'd come
out in.  But then the adjacent-duplicate-merging step would produce(NULL,1)    --- 2 tuples represented by this
group(NULL,2)(NULL,1)
which is the wrong answer.

The fix is to continue comparing columns when both tuples have a null
in one column, rather than stopping and declaring them equal.  This
is in current CVS sources and will be in 6.5.1.

The bug cannot be observed if you use test cases that only sort/group
on one column...
        regards, tom lane


pgsql-sql by date:

Previous
From: welssen crow
Date:
Subject: a problem of the postgresql ..
Next
From: Oleg Bartunov
Date:
Subject: SELECT DISTINCT question