Thread: Gsets: ROW expression semantic broken between 9.4 and 9.5

Gsets: ROW expression semantic broken between 9.4 and 9.5

From
Jeevan Chalke
Date:
Hi

It looks like we have broken the ROW expression without explicit
ROW keyword in GROUP BY.
I mean, after Grouping sets merge, if we have (c1, c2) in group by,
we are treating it as ROW expression for grouping, but at the same
time we are allowing individual column in the target list.
However this was not true with PG9.4 where we error out saying
"column "c1" must appear in the GROUP BY clause..".

But if I use explicit ROW keyword, like ROW(c1, c2), then on PG95
it error outs for individual column reference in select list.

Example may clear more:

ON PG 9.5 (after grouping sets implementation)

postgres=# create view gstest1(a,b,v)
postgres-#   as values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),
postgres-#             (2,3,15),
postgres-#             (3,3,16),(3,4,17),
postgres-#             (4,1,18),(4,1,19);
CREATE VIEW

postgres=#
postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b)
ORDER BY 1, 2, 3 DESC;
 a | b | max
---+---+-----
 1 | 1 |  11
 1 | 2 |  13
 1 | 3 |  14
 2 | 3 |  15
 3 | 3 |  16
 3 | 4 |  17
 4 | 1 |  19
(7 rows)

postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column "gstest1.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b) ORDER BY...
               ^

In above example, you see that when we have only (a, b), it is working fine.
But when we have ROW(a, b), it is throwing an error.
On PG 9.4 both cases are failing. Here it is:

postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column "gstest1.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b) ORDER BY 1,...
               ^
postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column "gstest1.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b) ORDER BY...
               ^

Do we broke ROW expression semantics in grouping sets implementation?

Any idea why is this happening?

Thanks
--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: Gsets: ROW expression semantic broken between 9.4 and 9.5

From
Andres Freund
Date:
Hi,

On 2015-07-22 23:53:26 +0530, Jeevan Chalke wrote:
> It looks like we have broken the ROW expression without explicit
> ROW keyword in GROUP BY.

That was "intentional", and is actually standards required
behaviour. GROUP BY (a, b) is the same as GROUP BY a,b. It'd otherwise
be pretty confusing because parens in GROUPING SETS would mean something
different than in GROUP BY.

We should probably add a release note entry about it...

Thanks for testing gsets!

- Andres



Re: Gsets: ROW expression semantic broken between 9.4 and 9.5

From
Andrew Gierth
Date:
>>>>> "Jeevan" == Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
Jeevan> HiJeevan> It looks like we have broken the ROW expression withoutJeevan> explicit ROW keyword in GROUP BY.

Andres has given the short version, but here's the long version:

In the spec, GROUP BY ROW(a,b) is an error, while GROUP BY (a,b) is
exactly equivalent to GROUP BY a,b.

Previously, pg treated GROUP BY (a,b) as if it were GROUP BY ROW(a,b)
since it was parsing it as an expression, and (a,b) in an expression is
shorthand for ROW(a,b).  However, the parens are significant in many
contexts in the grouping set syntax, e.g. ROLLUP(a,(b,c)) is equivalent
to GROUPING SETS ((a,b,c), (a), ()), and we have to be able to parse
both GROUPING SETS (a,b) (which is two grouping sets) and GROUPING SETS
((a,b),(c,d)), which means that we can't actually use the grammar to
distinguish expressions from parenthesized sublists.  What the code
therefore does is to explicitly distinguish (a,b) and ROW(a,b), and
treat the first as a list and the second as a single expression.

This is documented in the following NOTE in queries.sgml:
 <note>  <para>   The construct <literal>(a,b)</> is normally recognized in expressions as   a <link
linkend="sql-syntax-row-constructors">rowconstructor</link>.   Within the <literal>GROUP BY</> clause, this does not
applyat the top   levels of expressions, and <literal>(a,b)</> is parsed as a list of   expressions as described above.
If for some reason you <emphasis>need</>   a row constructor in a grouping expression, use <literal>ROW(a,b)</>.
</para></note>
 

http://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

Andres has suggested that this should be mentioned in an incompatibility
note in the release notes. I'm not sure that's needed, since I don't
believe there are any cases where previously valid queries change in
behavior; a query such as

select (a,b) from (values (1,2)) v(a,b) group by (a,b);

previously evaluated the row constructor before grouping, while now it
groups by a and b separately and evaluates the row constructor
afterwards.  If there's a way to make this change affect the result,
I've not found it yet, even when using volatile functions.

-- 
Andrew (irc:RhodiumToad)