Thread: Ordering of records in group by not possible
Hello all, I wanted to verify what we are seeing. Select a, aggregate(b) from c group by a order by a,b Is not accepted by postgres. This will only work if you order by a. But, this means that the records that are grouped are processed in no apparent order. We have some custom aggregate functions where the order of the rows is important. Is there no way to do this without a subselect? select a,aggregate(b) from ( select a,b from c order by a,b ) as foo group by a order by a Even with a subselect doing the ordering ahead of time, is there any guarrantee that the records will be processed in the group by with the specified order? Or will the group by always be arbitrary in it's ordering of the records? This is pg 8.1 on Linux. Thanks, -chris -- Chris Kratz
On Wed, Apr 26, 2006 at 09:19:41AM -0400, Chris Kratz wrote: > Hello all, > > I wanted to verify what we are seeing. > Select a, aggregate(b) > from c > group by a > order by a,b > > Is not accepted by postgres. This will only work if you order by a. But, > this means that the records that are grouped are processed in no apparent > order. Well, ORDER BY happens *after* the select values have been calculated, so it can't possibly affect the order of the rows into the aggregate. > We have some custom aggregate functions where the order of the rows is > important. Is there no way to do this without a subselect? You found the right solution, use ORDER BY in a subselect. > Even with a subselect doing the ordering ahead of time, is there any > guarrantee that the records will be processed in the group by with the > specified order? Or will the group by always be arbitrary in it's ordering > of the records? Currently, if the sub-select orders the rows, the outer query will see the rows in that order. I don't think the SQL standard even allows ORDER BY there, but PostgreSQL does support it for this reason. BTW, GROUP BY doesn't not imply any ordering at all, consider a Hash Aggregate that calculates all the aggregates simultaneously... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Chris Kratz wrote: > Hello all, > > I wanted to verify what we are seeing. > > Select a, aggregate(b) > from c > group by a > order by a,b That's a rather odd query... Values in b aren't available to order by, as they have been aggregated. There is no relation to the values in b and the values in your result set. You could order by "column 2" if you want to order on the results on your aggregate: Select a, aggregate(b) from c group by a order by a,2 -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Hi: > You could order by "column 2" if you want to order on the > results on > your aggregate: > Select a, aggregate(b) > from c > group by a > order by a,2 > another alternative is : select a , aggregrate(b) as <alias> from c group by a order by a, <alias> e.g. select a,sum(b) as sum_b from c group by a order by a, sum_b ludwig lim __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Wednesday 26 April 2006 10:30 am, you wrote: > Chris Kratz wrote: > > Hello all, > > > > I wanted to verify what we are seeing. > > > > Select a, aggregate(b) > > from c > > group by a > > order by a,b > > That's a rather odd query... Values in b aren't available to order by, > as they have been aggregated. There is no relation to the values in b > and the values in your result set. > > You could order by "column 2" if you want to order on the results on > your aggregate: > Select a, aggregate(b) > from c > group by a > order by a,2 Hello Alban, The point is that the aggregates we are working on in our application are order sensitive. One common example of order sensitive aggregates are the first and last aggregate functions found in some other databases. Both of which are very dependant on the order of the records going into the aggregate. Since you can't order the records before they hit the group by in postgres, the actual ordering of the records when they hit the aggregate function is indeterminate and so order sensitive aggregates really don't work. The best solution we have found so far is to feed the aggregating query with a subquery that orders on the appropriate columns. But even then, since the group by forces another sort, I'm not convinced that the ordering is still not indeterminate. I know you can use distinct on to get similar behavior for first and last, but that's not a general solution for aggregates that are order sensitive. I'm almost convinced that there isn't a way to do order sensitive aggregates in pg, but will have to pull the record set out and process it programmatically externally if the subquery trick doesn't work. -Chris -- Chris Kratz
On Wednesday 26 April 2006 11:19 am, Ludwig Isaac Lim wrote: > Hi: > > You could order by "column 2" if you want to order on the > > results on > > your aggregate: > > Select a, aggregate(b) > > from c > > group by a > > order by a,2 > > another alternative is : > > select a , aggregrate(b) as <alias> > from c > group by a > order by a, <alias> > > e.g. > > select a,sum(b) as sum_b > from c > group by a > order by a, sum_b > > ludwig lim > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend Ironically, before we realized we weren't getting the results we expected, we were doing something like this. select a,aggregate(b) as b from c group by a order by a,b We got lucky the first couple of tests and it appeared to work. It was only later that we realized that the sort was actually sorting the aggregated values after the group by, not the rows going into the order sensitive aggregate as we had hoped. I actually like the fact that the sort is the last thing that is done on a query, in most cases this is what makes sense. It's just frustrating in this one place to have little control of the records going into the grouping code since the aggregates we are using are order sensitive. Think functions like first and last. -Chris -- Chris Kratz
Chris Kratz wrote: > On Wednesday 26 April 2006 10:30 am, you wrote: > Hello Alban, > > The point is that the aggregates we are working on in our application are > order sensitive. One common example of order sensitive aggregates are the > first and last aggregate functions found in some other databases. Both of Ah, order sensitive aggregates... Of course. I've been playing with the thought of implementing first and last aggregates myself, but never got around to it. It is not that often that you need those, the same result can often be achieved by ordering and limitting. Ordering the data before aggregating should do the trick though, but you'll need to order your records in the subquery. Something like this, I think: Select a, aggregate(b) from ( select a, b from c order by a,b ) d group by a Ordering on the aggregated values afterwards shouldn't make a difference to the aggregated results. I'm pretty sure of that. > The best solution we have found so far is to feed the aggregating query with a > subquery that orders on the appropriate columns. But even then, since the > group by forces another sort, I'm not convinced that the ordering is still > not indeterminate. The experts on the internals will know for sure, but I think their answer'll be that the ordering is determinate this way. > I'm almost convinced that there isn't a way to do order sensitive aggregates > in pg, but will have to pull the record set out and process it > programmatically externally if the subquery trick doesn't work. No matter how you're going to achieve your aggregate, you'll need to order the records you're aggregating over before doing so. Whether that can be done in SQL, a stored procedure or in application code doesn't change that. I don't know where your idea to order on the original column outside the aggregate originated, but it looks like a rather confusing way to achieve this. It may even be indeterminate... It completely depends on the database system knowing what the aggregate is about, AFAICS. For the record, I'm just a regular here. I'm not directly involved with PostgreSQL. Regards, -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Alban Hertroys <alban@magproductions.nl> writes: > Ordering the data before aggregating should do the trick though, but > you'll need to order your records in the subquery. Something like this, > I think: > Select a, aggregate(b) > from ( > select a, b > from c > order by a,b > ) d > group by a Right. The key point there is that the subquery is already delivering outputs that are suitably sorted for grouping by A, and so even if the planner wants to do it via group rather than hash aggregation, it will not need to put in another sort step. You can't just "order by B" in the subquery, or it won't work. This should work reliably since [ checks CVS... ] PG 7.4. Just to be sure, you might try looking at the query plan with "set enable_hashagg = off" to verify that the planner recognizes it doesn't need a second sort step. regards, tom lane