Re: [HACKERS] No: implied sort with group by - Mailing list pgsql-hackers

From darrenk@insightdist.com (Darren King)
Subject Re: [HACKERS] No: implied sort with group by
Date
Msg-id 9801271708.AA21938@ceodev
Whole thread Raw
List pgsql-hackers
> > > This is what I think is missing or broken right now.
> > >
> > > > > select * from t1;
> > > >          a b  c
> > > >          1    x
> > > >          2    x
> > > >          3    z
> > > >          2    x
> > > >
> > > > 4 row(s) retrieved.
> > > > > select b,c,sum(a) from t1 group by b,c;
> > > > b  c             (sum)
> > > >
> > > >    x                 5
> > > >    z                 3
> > > >> 2 row(s) retrieved.
> >
> > Sorry, I've lost the thread. What is broken? I get this same result, and
> > (assuming that column "b" is full of nulls) I think this the correct result.
>
> At one point, it was thought that NULLs shouldn't be grouped, but I
> backed out the patch.  There is a problem with GROUP BY on large
> datasets, and Vadim knows the cause, and will work on it later.

Different from the grouping by NULLs issue...

The above results are from Sybase.  If these same four rows are inserted into
postgres, the second query will return three rows.  Something like...

b|c|sum(a)
 |x|3
 |z|3
 |x|2

It does this not because of the null values of column b, but because the data is
not sorted before getting to the group by node if the user does not explicitly put
an order by in the query.  IMHO, postgres should put an arbitrary sort node in the
tree so that the data can be properly grouped as the group node iterates over it.

And even if I put an "order by c" clause in there, I still get three rows, they're
just properly sorted. :)

darrenk


pgsql-hackers by date:

Previous
From: Brett McCormick
Date:
Subject: Re: [QUESTIONS] select date('now'::datetime+'30 day'::timespan)
Next
From: Bruce Momjian
Date:
Subject: Re: Frontend/Backend Protocol Patch