Re: SQL feature requests - Mailing list pgsql-hackers

From Michael Glaesemann
Subject Re: SQL feature requests
Date
Msg-id 24DD5FCF-8A34-476D-8ECD-AD32D4B0E721@seespotcode.net
Whole thread Raw
In response to Re: SQL feature requests  ("Ben Tilly" <btilly@gmail.com>)
Responses Re: SQL feature requests  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: SQL feature requests  ("Ben Tilly" <btilly@gmail.com>)
List pgsql-hackers
On Aug 22, 2007, at 20:36 , Ben Tilly wrote:

> On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>>
>> On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
>>
>>> 1. Just a minor annoyance, but why must subqueries in FROM clauses
>>> have an alias?
>>
>> It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
>> does?
>
> I can well believe that the standard says that you must accept
> subqueries with aliases.  But does it say you must reject subqueries
> without aliases?  I strongly doubt that.

If I'm reading my draft copy of the SQL:2003 spec right (and there's
a good chance that I'm not, as it's not the easiest document for me
to parse), aliases *are* required.
From 5WD-02-Foundation-2003-09 (the fifth working draft)

> 7.5 <from clause>
>
> <from clause> ::= FROM <table reference list>
> <table reference list> ::=
>     <table reference> [ { <comma> <table reference> }... ]


> 7.6 <table reference>
>
> <table reference> ::=
>     <table factor>
>   | <joined table>


> <table factor> ::= <table primary> [ <sample clause> ]


> <table primary> ::=
>     <table or query name> [ [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ] ]
>   | <derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <lateral derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <collection derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <table function derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <only spec> [ [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ] ]
>   | <left paren> <joined table> <right paren>

> <derived table> ::= <table subquery>

<correlation name> I believe is what we commonly refer to as an
alias. I think <table or query name> is a table or view name, and
<derived table> is a subquery. <correlation name> is optional for
<table or query name> but not for <derived table>. The fact that the
standard clearly makes it optional in some cases and required in
others is pretty clear that <derived table> without a <correlation
name> is to be rejected, don't you think?

> I have no clue what EnterpriseDB does.

In case it wasn't clear, the reason I bring it up is that
EnterpriseDB, while working from a PostgreSQL base, strives for
Oracle compatibility.

>> that case, couldn't you use whatever method works in PostgreSQL in
>> Oracle? I haven't checked the SQL standard, but it seems unlikely
>> it'd allow something like
>>
>> GROUP BY , , , ;
>
> That's not what Oracle accepts  that postgres does not.  What
> Oracle accepts is:
>
>   ...
>   GROUP BY 'foo';

Thanks for clarifying. If it wasn't obvious by now, you know I'm not
familiar with Oracle :)

>> AIUI, Integers are only allowed because the SQL standard explicitly
>> allows you to refer to columns by the order they appear in the SELECT
>> list. Otherwise the GROUP BY items need to be column names.
>
> Need to be?
>
> The SQL-92 standard is clear that you must accept a list of column
> names.  It is also clear that a column name must be be of the form
> field or table.field.

The 2003 draft (same as above) seems to agree with the SQL92 standard:

> 7.9 <group by clause>

> Format
> <group by clause> ::=
>     GROUPBY [ <set quantifier> ] <grouping element list>
> <grouping element list> ::=
>     <grouping element> [ { <comma> <grouping element> }... ]
> <grouping element> ::=
>     <ordinary grouping set>
>   | <rollup list>
>   | <cube list>
>   | <grouping sets specification>
>   | <empty grouping set>
> <ordinary grouping set> ::=
>     <grouping column reference>
>   | <left paren> <grouping column reference list><right paren>
> <grouping column reference> ::=
>     <column reference> [ <collate clause> ]
> <grouping column reference list> ::=
>     <grouping column reference> [ { <comma><grouping column
> reference> }... ]
> <rollup list> ::=
>     ROLLUP<left paren> <ordinary grouping set list><right paren>
> <ordinary grouping set list> ::=
>     <ordinary grouping set> [ { <comma> <ordinary grouping set> }... ]
> <cube list> ::=
>     CUBE<left paren> <ordinary grouping set list><right paren>
> <grouping sets specification> ::=
>     GROUPINGSETS <left paren> <grouping set list><right paren>
> <grouping set list> ::=
>     <grouping set> [ { <comma> <grouping set> }... ]
> <grouping set> ::=
>     <ordinary grouping set>
>   | <rollup list>
>   | <cube list>
>   | <grouping sets specification>
>   | <empty grouping set>


> <empty grouping set> ::= <left paren><right paren>

> 6.7 <column reference>

> Format
> <column reference> ::=
>     <basic identifier chain>
>   | MODULE<period> <qualified identifier><period> <column name>

There'd have to be a pretty strong reason to extend this, more than
just a convenience, I should think.

>   In no way, shape or form does that allow having
> terms like trim(foo.bar) in a group by.
>
> But every reasonable database that I know - including postgres -
> allows that.

Can you give an example of something like this working in PostgreSQL?
I get an error when I try to use a text value in a GROUP BY clause.
(Or are you referring specifically to the CASE expression corner case?)

test=# select record_id as foo, count(observation_id) as bar from
observation group by trim(observation.record_id::text);
ERROR:  column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
test=# select record_id as foo, count(observation_id) as bar from
observation group by trim(observation.record_id);
ERROR:  column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function

> Postgres explicitly disallows a constant character expression.  But it
> allows the constant case expression that I gave.  It would be nice for
> me to not have to remember that very obscure and convoluted case.

I agree, and would move that it should be disallowed if there isn't a
reason for it to be maintained, for exactly the reason you give:
there shouldn't be such convoluted, and obscure corner case.

test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id, true;
ERROR:  non-integer constant in GROUP BY
test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id, case when true then true end;

> What I'm trying to say is that it would be convenient for me to be
> able to write:
>
>   select bar as "baz"
>     , count(*) as "some count"
>   from foo
>   group by bar as "baz"
>
> That's not allowed right now because as is not allowed in a group
> by statement.

Ah, I see. Thanks for clarifying.

>>> Which is to assume that
>>> a query without a group by clause, but with an aggregate function in
>>> the select, should have an implicit group by clause where you
>>> group by
>>> all non-aggregate functions in the select.
>>>
>>> For example
>>>
>>>   SELECT foo, count(*)
>>>   FROM bar
>>>
>>> would be processed as:
>>>
>>>   SELECT foo, count(*)
>>>   FROM bar
>>>   GROUP BY foo
>>
>> It's been discussed before. I don't believe it's been rejected out of
>> hand (though you can check the archives), just that no one's gotten
>> around to it. (Don't know what the SQL-spec says on this point.)
>
> I don't know what the SQL spec says, but I know (having talked to
> other developers) that many people would find it very nice.

Since I had the spec open, I tried to look at this as well, though I
must admit I found it very tough going.

I think this is the key section:

> 10.9 <aggregate function>
>
> ...
>
> Syntax Rules
>
> ...
>
> 4) The argument source of an <aggregate function> is
> Case:
> a) If AF is immediately contained in a <set function
> specification>, then a table or group of a grouped table as
> specified in Subclause 7.10, “<having clause>”, and Subclause 7.12,
> “<query specification>”.
>
> b) Otherwise, the collection of rows in the current row's window
> frame defined by the window structure  descriptor identified by the
> <window function> that simply contains AF, as defined in Subclause
> 7.11, “<window clause>”.

Now the <query specification> stuff is pretty dense, and the <window
clause> is currently impenetrable for me, so I just looked at the
<having clause> and <query specification> sections referenced above.

> 7.10 <having clause>
>
> ...
>
> Syntax Rules
>
> 1) Let HC be the <having clause>. Let TE be the <table expression>
> that immediately contains HC. If TE does not immediately contain a
> <group by clause>, then “GROUPBY ()” is implicit. Let T be the
> descriptor of the table defined by the <group by clause>GBC
> immediately contained in TE and let R be the result of GBC.

So it looks like if there's no explicit GROUP BY, we're to assume
GROUP BY (), at least in the case of HAVING.

> 7.12 <query specification>
>
> ...
>
> Syntax Rules
>
> ...
>
> 14) If <table expression> does not immediately contain a <group by
> clause> and <table expression> is simply contained in a <query
> expression> that is the aggregation query of some <set function
> specification>, then GROUP BY () is implicit.

Again, assume GROUP BY () if no explicit GROUP BY with a <set
function specification>, which I understand to be some form of
aggregation.

> 15) If T is a grouped table, then let G be the set of grouping
> columns of T. In each <value expression> contained in <select
> list> , each column reference that references a column of T shall
> reference some column C that is functionally dependent on G or
> shall be contained in an aggregated argument of a <set function
> specification> whose aggregation query is QS.

And a definition of grouped table is found:

> 4.14.2 Types of tables
>
> ...
>
> A grouped table is a set of groups derived during the evaluation of
> a <group by clause>. A group G is a collection of rows in which,
> for every grouping column GC, if the value of GC in some row is not
> distinct from GV, then the value of GC in every row is GV;
> moreover, if R1 is a row in group G1 of grouped table GT and R2 is
> a row in GT such that for every grouping column GC the value of GC
> in R1 is not distinct from the value of GC in R2, then R2 is in G1.
> Every row in GT is in exactly one group. A group may be considered
> as a table. Set functions operate on groups.

So we're already talking about an explicit GROUP BY clause, and it
doesn't seem to leave much wiggle room grouping columns that aren't
explicitly listed. I'll readily admit I very easily could be reading
this wrong or have missed a section of the spec that discusses this
in more detail. (And that goes for any of my interpretations of the
spec!)

One place you and I differ is that I think you'd be in favor of many
more extensions in cases where the SQL spec doesn't explicitly say
"Thou shalt not do X". I'm a bit more conservative here: in my
opinion, if the SQL speaks to a particular issue (for example, a
WHERE clause), then you should pretty much keep to the letter of what
it says. Extensions are more for areas where the standard is silent.
For example, full text search will be added to PostgreSQL in 8.3.
AFAIK, the SQL spec doesn't talk about full text search at all, so
this is an area where extensions are a natural fit. Others, such as
PostgreSQL's DISTINCT ON, is a much more difficult issue, and I'd
suspect if it were proposed as a new feature now it would be
rejected. However, it's been a part of PostgreSQL for a long time, so
for historical reasons it remains. Likewise, I doubt "text" as a
synonym for "varchar" would be accepted.

Anyway, enough excursions into SpecLand for me. I'd welcome anyone
else taking a crack at it.

Michael Glaesemann
grzm seespotcode net




pgsql-hackers by date:

Previous
From: "Florian G. Pflug"
Date:
Subject: Re: SQL feature requests
Next
From: Michael Glaesemann
Date:
Subject: Re: SQL feature requests