Re: Embarassing GROUP question - Mailing list pgsql-general

From Tom Lane
Subject Re: Embarassing GROUP question
Date
Msg-id 8798.1254602380@sss.pgh.pa.us
Whole thread Raw
In response to Re: Embarassing GROUP question  (Sam Mason <sam@samason.me.uk>)
Responses Re: Embarassing GROUP question
List pgsql-general
Sam Mason <sam@samason.me.uk> writes:
> On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
>> but looking at this example, and presuming that you find that
>> it actually does something useful, I wonder whether they interpret
>> the combination of GROUP BY and ambiguous-per-spec ORDER BY
>> in some fashion similar to DISTINCT ON.

> Yup, does look that way doesn't it.  It's still a weird pair of
> semantics to conflate.

I poked around in the MySQL 5.1 manual to see if this is true.
I think it isn't --- it says very clearly here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
that you simply get an arbitrary choice among the possible values
when you reference an ambiguous column.  It's possible that Corey's
query actually does give him the answers he wants, but apparently
it would be an implementation artifact that they're not promising
to maintain.

> Hum, if they were assuming that you'd always have to implement GROUP BY
> by doing a sort step first then I can see why they'd end up with this.

It's worse than that --- they actually are promising that GROUP BY
orders the results!  In
http://dev.mysql.com/doc/refman/5.1/en/select.html
I find

    If you use GROUP BY, output rows are sorted according to the
    GROUP BY columns as if you had an ORDER BY for the same
    columns. To avoid the overhead of sorting that GROUP BY
    produces, add ORDER BY NULL:

        SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

    MySQL extends the GROUP BY clause so that you can also specify
    ASC and DESC after columns named in the clause:

        SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;

The first of these examples implies that they allow ORDER BY to override
the default GROUP BY sorting, which would mean that the ORDER BY sort
has to happen after the GROUP BY operation, unlike the approach we take
for DISTINCT ON.  So that means the ORDER BY *isn't* going to affect
which row gets chosen out of each event_type group.

What I am currently betting is that Corey's query does not really do
what he thinks it does in MySQL.  It probably is selecting a random
representative row in each group and then sorting on the basis of the
event_dates in those rows.

            regards, tom lane

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Procedure for feature requests?
Next
From: Corey Tisdale
Date:
Subject: Re: Embarassing GROUP question