Re: SQL feature requests - Mailing list pgsql-hackers

From Ben Tilly
Subject Re: SQL feature requests
Date
Msg-id acc274b30708231057k5993c131gacd4136afd649b18@mail.gmail.com
Whole thread Raw
In response to Re: SQL feature requests  (Michael Glaesemann <grzm@seespotcode.net>)
List pgsql-hackers
On 8/23/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
> On Aug 23, 2007, at 10:47 , Ben Tilly wrote:
[...]
> > Why does it seem like a bug to you?
> >
> > Turn it around, and tell me in what way is its behaviour surprising to
> > someone who knows SQL.  You asked to group on something that is the
> > same for all rows.  That group by condition did nothing.  (Except
> > rendered the syntax valid when it might not have been.)  As I would
> > expect.
>
> Considering that I expect the GROUP BY clause to include only column
> references (or other groupings of column references), not
> expressions. Whether or not the value is the same, it surprises me
> that something other than a column reference is accepted at all. I
> hadn't realized this behavior was accepted in PostgreSQL, but I learn
> something new every day.

That's obviously a very different point of view than mine.  Over the
last decade of using relational databases I've always assumed that any
non-aggregate expression that is valid in a select is valid in a group
by, and I've only rarely been surprised by this.  (The only exceptions
that I can list off of the top of my head are that integer constants
can refer to columns by position, and text constants are not allowed
by postgres.)

> My non-rigorous way of thinking about GROUP BY is that it groups this
> listed columns when the values of the listed columns are the same. An
> expression that evaluates to anything other than a column name
> doesn't provide any information about which column to consider
> grouped, and expressions don't evaluate to column names, or
> identifiers in general. If I understand you correctly, a GROUP BY
> item that isn't a column name would be a value that's applied to all
> columns, and the actual value is irrelevant—different values don't
> change the result.

That's not quite how it works.  The better non-rigorous way of
thinking about it is that any non-aggregate function you can put in a
select is allowed in a group by.  So if I group by trim(foo.bar), I
will be grouping rows based on the distinct values of trim(foo.bar).
So the values 'baz', ' baz', 'baz ' and ' baz ' would all be rolled up
into one row in the group by query.  But the value 'blat' would wind
up in another row.

The case of a constant expression is the logical (if normally useless)
extension of this.

> So the only purpose it would serve would be to prevent a trailing
> comma from raising a syntax error: you'd still need to explicitly
> list the other columns (unless the implementation behavior is changed
> to extend the spec there as well). What this does is allow you to use
> something like this (psuedocode):

That is the only purpose of a constant expression is that.

> group_columns = [ 'foo', 'bar', 'baz' ]
>
> group_column_list = ''
> for col in group_columns { group_column_list += col + ',' } #
> group_column_list = "foo,bar,baz,"
>
> group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN
> 'quux' END"
>
> rather than
>
> group_column_list = join group_columns, ',' # group_column_list =
> "foo,bar,baz"
> group_by_clause = "GROUP BY $group_column_list"
>
> I still feel I'm missing something. If that's it, it seems like
> something easy enough to handle in middleware. Sorry if it appears
> I'm being dense. I've definitely learned things in this thread.

That's mostly right.  However don't forget the group_columns might be
an empty list, and in that case you need to optionally not have a
group by clause at all.  (Yes, in some of my queries this is a very
real possibility.)

Plus a bit of context.  This comes up for me in reports which are
basically implemented as a series of queries using temp tables.  So
making the generation of SQL more convoluted significantly increases
the complexity of the code.  (Writing reports is most of my job, so I
get to write lots and lots of these.)

> > Furthermore ask yourself whether anyone who wrote that would likely
> > have written it by accident.
>
> I don't see what that has to do with anything. There are plenty of
> things I can write on purpose that would be nonsense. You might even
> consider my posts as prime examples :)

When you have a boundary case, sometimes you really want to answer the
question, "Is this case likely to be a result of confusion?"  Which
question is particularly relevant in this case because my strong
suspicion is that constant text expressions are banned in PostgreSQL
explicitly because of fears that they are a result of confusion.

Cheers,
Ben


pgsql-hackers by date:

Previous
From: "Ben Tilly"
Date:
Subject: Re: SQL feature requests
Next
From: "Florian G. Pflug"
Date:
Subject: Re: SQL feature requests