Thread: Multiple grouping set specs referencing duplicate alias

Multiple grouping set specs referencing duplicate alias

From
David Kimura
Date:
Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

 ai1 | ai2
-----+-----
   1 |   1
     |   1
   2 |   2
     |   2
   3 |   3
     |   3
(6 rows)

I had expected:

 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
     |   1
     |   2
     |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

                   QUERY PLAN
------------------------------------------------
 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
         Sort Key: i
         ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=#  SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura

Attachment

Re: Multiple grouping set specs referencing duplicate alias

From
Tom Lane
Date:
David Kimura <david.g.kimura@gmail.com> writes:
> I think I may have stumbled across a case of wrong results on HEAD (same
> through version 9.6, though interestingly 9.5 produces different results
> altogether).

> test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
> ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

Yeah, this is an instance of an issue we've known about for awhile:
when using grouping sets (ROLLUP), the planner fails to distinguish
between "ai1" and "ai1 as possibly nulled by the action of the
grouping node".  This has been discussed at, eg, [1] and [2].
The direction I'd like to take to fix it is to invent explicit
labeling of Vars that have been nulled by some operation such as
outer joins or grouping, and then represent grouping set outputs
as either PlaceHolderVars or Vars tied to a new RTE that represents
the grouping step.  I have been working on a patch that'd do the
first half of that [3], but it's been slow going, because we've
indulged in a lot of semantic squishiness in this area and cleaning
it all up is a large undertaking.

> I tinkered a bit and hacked together an admittedly ugly patch that triggers an
> explicit sort constructed from the parse tree.

I seriously doubt that that'll fix all the issues in this area.
We really really need to understand that a PathKey based on
the scan-level value of a Var is different from a PathKey based
on a post-nulling-step value.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAMbWs48AtQTQGk37MSyDk_EAgDO3Y0iA_LzvuvGQ2uO_Wh2muw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/7dbdcf5c-b5a6-ef89-4958-da212fe10176%40iki.fi
[3] https://www.postgresql.org/message-id/flat/830269.1656693747@sss.pgh.pa.us