Re: SQL feature requests - Mailing list pgsql-hackers

From Michael Glaesemann
Subject Re: SQL feature requests
Date
Msg-id F550537A-2D75-401C-B6CF-7BCFC2438364@seespotcode.net
Whole thread Raw
In response to Re: SQL feature requests  ("Ben Tilly" <btilly@gmail.com>)
Responses Re: SQL feature requests  ("Ben Tilly" <btilly@gmail.com>)
List pgsql-hackers
On Aug 23, 2007, at 10:47 , Ben Tilly wrote:

> On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>>
>> *This* seems like a bug:
>> test=# select record_id
>>         , count(observation_id) as bar
>>         from observation
>>         group by record_id
>>             , case when true
>>                    then 'foo'
>>               end;
>> record_id | bar
>> -----------+-----
>>           1 |   4
>>           2 |   4
>>           3 |   2
>> (3 rows)
>
> Why does it seem like a bug to you?
>
> Turn it around, and tell me in what way is its behaviour surprising to
> someone who knows SQL.  You asked to group on something that is the
> same for all rows.  That group by condition did nothing.  (Except
> rendered the syntax valid when it might not have been.)  As I would
> expect.

Considering that I expect the GROUP BY clause to include only column
references (or other groupings of column references), not
expressions. Whether or not the value is the same, it surprises me
that something other than a column reference is accepted at all. I
hadn't realized this behavior was accepted in PostgreSQL, but I learn
something new every day.

My non-rigorous way of thinking about GROUP BY is that it groups this
listed columns when the values of the listed columns are the same. An
expression that evaluates to anything other than a column name
doesn't provide any information about which column to consider
grouped, and expressions don't evaluate to column names, or
identifiers in general. If I understand you correctly, a GROUP BY
item that isn't a column name would be a value that's applied to all
columns, and the actual value is irrelevant—different values don't
change the result.

So the only purpose it would serve would be to prevent a trailing
comma from raising a syntax error: you'd still need to explicitly
list the other columns (unless the implementation behavior is changed
to extend the spec there as well). What this does is allow you to use
something like this (psuedocode):

group_columns = [ 'foo', 'bar', 'baz' ]

group_column_list = ''
for col in group_columns { group_column_list += col + ',' } #
group_column_list = "foo,bar,baz,"

group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN
'quux' END"

rather than

group_column_list = join group_columns, ',' # group_column_list =
"foo,bar,baz"
group_by_clause = "GROUP BY $group_column_list"

I still feel I'm missing something. If that's it, it seems like
something easy enough to handle in middleware. Sorry if it appears
I'm being dense. I've definitely learned things in this thread.

> Furthermore ask yourself whether anyone who wrote that would likely
> have written it by accident.

I don't see what that has to do with anything. There are plenty of
things I can write on purpose that would be nonsense. You might even
consider my posts as prime examples :)

Michael Glaesemann
grzm seespotcode net




pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: SQL feature requests
Next
From: Tom Lane
Date:
Subject: Re: SQL feature requests