Multiple grouping set specs referencing duplicate alias - Mailing list pgsql-hackers

From David Kimura
Subject Multiple grouping set specs referencing duplicate alias
Date
Msg-id CAHnPFjSdFx_TtNpQturPMkRSJMYaD5rGP2=8iFH9V24-OjHGiQ@mail.gmail.com
Whole thread Raw
Responses Re: Multiple grouping set specs referencing duplicate alias
List pgsql-hackers
Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

 ai1 | ai2
-----+-----
   1 |   1
     |   1
   2 |   2
     |   2
   3 |   3
     |   3
(6 rows)

I had expected:

 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
     |   1
     |   2
     |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

                   QUERY PLAN
------------------------------------------------
 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
         Sort Key: i
         ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=#  SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Missing update of all_hasnulls in BRIN opclasses
Next
From: Anton Voloshin
Date:
Subject: patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider