Thread: Ordering of records in group by not possible

Ordering of records in group by not possible

From
Chris Kratz
Date:
Hello all,

I wanted to verify what we are seeing.

Select a, aggregate(b)
from c
group by a
order by a,b

Is not accepted by postgres.  This will only work if you order by a.  But,
this means that the records that are grouped are processed in no apparent
order.

We have some custom aggregate functions where the order of the rows is
important.  Is there no way to do this without a subselect?

select a,aggregate(b)
from (
   select a,b
   from c
   order by a,b
) as foo
group by a
order by a

Even with a subselect doing the ordering ahead of time, is there any
guarrantee that the records will be processed in the group by with the
specified order?  Or will the group by always be arbitrary in it's ordering
of the records?

This is pg 8.1 on Linux.

Thanks,

-chris
--
Chris Kratz

Re: Ordering of records in group by not possible

From
Martijn van Oosterhout
Date:
On Wed, Apr 26, 2006 at 09:19:41AM -0400, Chris Kratz wrote:
> Hello all,
>
> I wanted to verify what we are seeing.

> Select a, aggregate(b)
> from c
> group by a
> order by a,b
>
> Is not accepted by postgres.  This will only work if you order by a.  But,
> this means that the records that are grouped are processed in no apparent
> order.

Well, ORDER BY happens *after* the select values have been calculated,
so it can't possibly affect the order of the rows into the aggregate.

> We have some custom aggregate functions where the order of the rows is
> important.  Is there no way to do this without a subselect?

You found the right solution, use ORDER BY in a subselect.

> Even with a subselect doing the ordering ahead of time, is there any
> guarrantee that the records will be processed in the group by with the
> specified order?  Or will the group by always be arbitrary in it's ordering
> of the records?

Currently, if the sub-select orders the rows, the outer query will see
the rows in that order. I don't think the SQL standard even allows
ORDER BY there, but PostgreSQL does support it for this reason. BTW,
GROUP BY doesn't not imply any ordering at all, consider a Hash
Aggregate that calculates all the aggregates simultaneously...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Ordering of records in group by not possible

From
Alban Hertroys
Date:
Chris Kratz wrote:
> Hello all,
>
> I wanted to verify what we are seeing.
>
> Select a, aggregate(b)
> from c
> group by a
> order by a,b

That's a rather odd query... Values in b aren't available to order by,
as they have been aggregated. There is no relation to the values in b
and the values in your result set.

You could order by "column 2" if you want to order on the results on
your aggregate:
    Select a, aggregate(b)
    from c
    group by a
    order by a,2

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Ordering of records in group by not possible

From
Ludwig Isaac Lim
Date:
Hi:

> You could order by "column 2" if you want to order on the
> results on
> your aggregate:
>     Select a, aggregate(b)
>     from c
>     group by a
>     order by a,2
>

another alternative is :

   select a , aggregrate(b) as <alias>
   from c
   group by a
   order by a, <alias>

e.g.

   select a,sum(b) as sum_b
   from c
   group by a
   order by a, sum_b

ludwig lim

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: Ordering of records in group by not possible

From
Chris Kratz
Date:
On Wednesday 26 April 2006 10:30 am, you wrote:
> Chris Kratz wrote:
> > Hello all,
> >
> > I wanted to verify what we are seeing.
> >
> > Select a, aggregate(b)
> > from c
> > group by a
> > order by a,b
>
> That's a rather odd query... Values in b aren't available to order by,
> as they have been aggregated. There is no relation to the values in b
> and the values in your result set.
>
> You could order by "column 2" if you want to order on the results on
> your aggregate:
>     Select a, aggregate(b)
>     from c
>     group by a
>     order by a,2

Hello Alban,

The point is that the aggregates we are working on in our application are
order sensitive.  One common example of order sensitive aggregates are the
first and last aggregate functions found in some other databases.  Both of
which are very dependant on the order of the records going into the
aggregate.  Since you can't order the records before they hit the group by in
postgres, the actual ordering of the records when they hit the aggregate
function is indeterminate and so order sensitive aggregates really don't
work.

The best solution we have found so far is to feed the aggregating query with a
subquery that orders on the appropriate columns.  But even then, since the
group by forces another sort, I'm not convinced that the ordering is still
not indeterminate.

I know you can use distinct on to get similar behavior for first and last, but
that's not a general solution for aggregates that are order sensitive.

I'm almost convinced that there isn't a way to do order sensitive aggregates
in pg, but will have to pull the record set out and process it
programmatically externally if the subquery trick doesn't work.

-Chris

--
Chris Kratz

Re: Ordering of records in group by not possible

From
Chris Kratz
Date:
On Wednesday 26 April 2006 11:19 am, Ludwig Isaac Lim wrote:
> Hi:
> > You could order by "column 2" if you want to order on the
> > results on
> > your aggregate:
> >     Select a, aggregate(b)
> >     from c
> >     group by a
> >     order by a,2
>
> another alternative is :
>
>    select a , aggregrate(b) as <alias>
>    from c
>    group by a
>    order by a, <alias>
>
> e.g.
>
>    select a,sum(b) as sum_b
>    from c
>    group by a
>    order by a, sum_b
>
> ludwig lim
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Ironically, before we realized we weren't getting the results we expected, we
were doing something like this.

select a,aggregate(b) as b
from c
group by a
order by a,b

We got lucky the first couple of tests and it appeared to work.  It was only
later that we realized that the sort was actually sorting the aggregated
values after the group by, not the rows going into the order sensitive
aggregate as we had hoped.

I actually like the fact that the sort is the last thing that is done on a
query, in most cases this is what makes sense.  It's just frustrating in this
one place to have little control of the records going into the grouping code
since the aggregates we are using are order sensitive.  Think functions like
first and last.

-Chris
--
Chris Kratz

Re: Ordering of records in group by not possible

From
Alban Hertroys
Date:
Chris Kratz wrote:
> On Wednesday 26 April 2006 10:30 am, you wrote:
> Hello Alban,
>
> The point is that the aggregates we are working on in our application are
> order sensitive.  One common example of order sensitive aggregates are the
> first and last aggregate functions found in some other databases.  Both of

Ah, order sensitive aggregates... Of course. I've been playing with the
thought of implementing first and last aggregates myself, but never got
around to it. It is not that often that you need those, the same result
can often be achieved by ordering and limitting.

Ordering the data before aggregating should do the trick though, but
you'll need to order your records in the subquery. Something like this,
I think:

Select a, aggregate(b)
from (
    select a, b
    from c
    order by a,b
) d
group by a

Ordering on the aggregated values afterwards shouldn't make a difference
to the aggregated results. I'm pretty sure of that.

> The best solution we have found so far is to feed the aggregating query with a
> subquery that orders on the appropriate columns.  But even then, since the
> group by forces another sort, I'm not convinced that the ordering is still
> not indeterminate.

The experts on the internals will know for sure, but I think their
answer'll be that the ordering is determinate this way.

> I'm almost convinced that there isn't a way to do order sensitive aggregates
> in pg, but will have to pull the record set out and process it
> programmatically externally if the subquery trick doesn't work.

No matter how you're going to achieve your aggregate, you'll need to
order the records you're aggregating over before doing so. Whether that
can be done in SQL, a stored procedure or in application code doesn't
change that.

I don't know where your idea to order on the original column outside the
aggregate originated, but it looks like a rather confusing way to
achieve this. It may even be indeterminate... It completely depends on
the database system knowing what the aggregate is about, AFAICS.

For the record, I'm just a regular here. I'm not directly involved with
PostgreSQL.

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Ordering of records in group by not possible

From
Tom Lane
Date:
Alban Hertroys <alban@magproductions.nl> writes:
> Ordering the data before aggregating should do the trick though, but
> you'll need to order your records in the subquery. Something like this,
> I think:

> Select a, aggregate(b)
> from (
>     select a, b
>     from c
>     order by a,b
> ) d
> group by a

Right.  The key point there is that the subquery is already delivering
outputs that are suitably sorted for grouping by A, and so even if the
planner wants to do it via group rather than hash aggregation, it will
not need to put in another sort step.  You can't just "order by B" in
the subquery, or it won't work.

This should work reliably since [ checks CVS... ] PG 7.4.  Just to be
sure, you might try looking at the query plan with "set enable_hashagg =
off" to verify that the planner recognizes it doesn't need a second sort
step.

            regards, tom lane