Thread: Re: [HACKERS] No: implied sort with group by
> > > Does the SQL standard say anything about an implied sort when > > grouping or is it up to the user to include an ORDER BY clause? > > darrenk > > Without order by the result set is never guaranteed to be ordered in a > specific way (standard speak). The order is dependent on the chosen > query path, which changes from query to query. > > Andreas Postgres should then do an internal sort before grouping. In the second of your examples, I take the above to mean that either row could be returned first. In order to get that result set though, the data needs to be sorted before getting to the group by node in the executor. The order of that internal sort is purely arbitrary, it just has to be done. This is what I think is missing or broken right now. > > select * from t1; > a b c > 1 x > 2 x > 3 z > 2 x > > 4 row(s) retrieved. > > select b,c,sum(a) from t1 group by b,c; > b c (sum) > > x 5 > z 3 > > 2 row(s) retrieved. darrenk
> > > Does the SQL standard say anything about an implied sort when > > > grouping or is it up to the user to include an ORDER BY clause? Up to the user. SQL is a set-oriented language. The fact that many/most/all implementations order results to then do grouping is an implementation detail, not a language definition. > This is what I think is missing or broken right now. > > > > select * from t1; > > a b c > > 1 x > > 2 x > > 3 z > > 2 x > > > > 4 row(s) retrieved. > > > select b,c,sum(a) from t1 group by b,c; > > b c (sum) > > > > x 5 > > z 3 > >> 2 row(s) retrieved. Sorry, I've lost the thread. What is broken? I get this same result, and (assuming that column "b" is full of nulls) I think this the correct result. - Tom
> > > > > Does the SQL standard say anything about an implied sort when > > > > grouping or is it up to the user to include an ORDER BY clause? > > Up to the user. SQL is a set-oriented language. The fact that many/most/all > implementations order results to then do grouping is an implementation > detail, not a language definition. > > > > This is what I think is missing or broken right now. > > > > > > select * from t1; > > > a b c > > > 1 x > > > 2 x > > > 3 z > > > 2 x > > > > > > 4 row(s) retrieved. > > > > select b,c,sum(a) from t1 group by b,c; > > > b c (sum) > > > > > > x 5 > > > z 3 > > >> 2 row(s) retrieved. > > Sorry, I've lost the thread. What is broken? I get this same result, and > (assuming that column "b" is full of nulls) I think this the correct result. At one point, it was thought that NULLs shouldn't be grouped, but I backed out the patch. There is a problem with GROUP BY on large datasets, and Vadim knows the cause, and will work on it later. -- Bruce Momjian maillist@candle.pha.pa.us