Thread: Sort keys are omitted incorrectly after grouping sets

Sort keys are omitted incorrectly after grouping sets

From
Richard Guo
Date:
Hi hackers,

In each PathKey we have an EC representing the value being sorted on.
This works correctly and efficiently in scan/join planning since the
members in an EC are always equal to each other.

However, after applying GROUP BY grouping sets, the members in an EC may
not be that equivalent any more, because NULLS may be emitted by
grouping sets. As a result, we may lose some sort keys incorrectly.

# explain (costs off) select a, b from foo where a = b group by cube(a, b) order by a, b;
          QUERY PLAN
-------------------------------
 Sort
   Sort Key: a
   ->  MixedAggregate
         Hash Key: a, b
         Hash Key: a
         Hash Key: b
         Group Key: ()
         ->  Seq Scan on foo
               Filter: (a = b)
(9 rows)

I believe we should not ignore sort key 'b' in the query above.

Is this a problem we should be worried about?

Thanks
Richard

Re: Sort keys are omitted incorrectly after grouping sets

From
Greg Stark
Date:
On Tue, 3 Aug 2021 at 00:04, Richard Guo <guofenglinux@gmail.com> wrote:
>
> Is this a problem we should be worried about?

It's easy to see this produce output in the wrong order:

postgres=# select a, b from (values (1,1),(2,2)) as foo(a,b) where a =
b group by cube(a, b) order by a, b nulls first;
 a | b
---+---
 1 |
 1 | 1
 2 | 2
 2 |
   |
   | 1
   | 2
(7 rows)

postgres=# select a, b from (values (1,1),(2,2)) as foo(a,b) where a =
b group by cube(a, b) order by a, b nulls last;
 a | b
---+---
 1 |
 1 | 1
 2 | 2
 2 |
   |
   | 1
   | 2
(7 rows)

I know we had a hack to deal with outer joins "placeholder vars" or
something like that. I imagine the same thing needs to happen here.

Incidentally, the same thing doesn't happen for a VALUES clause with a
single row value. There it seems we inline the row value and the plan
ends up ordering on both -- though it's hard to tell because the way
the explain plan is formatted makes it hard to see what's going on:

postgres=# select a, b from (values (1,1)) as foo(a,b) where a = b
group by cube(a, b) order by a, b nulls first;
 a | b
---+---
 1 |
 1 | 1
   |
   | 1
(4 rows)


postgres=# explain select a, b from (values (1,1)) as foo(a,b) where a
= b group by cube(a, b) order by a, b nulls first;
                       QUERY PLAN
--------------------------------------------------------
 Sort  (cost=0.10..0.11 rows=4 width=8)
   Sort Key: (1), (1) NULLS FIRST
   ->  MixedAggregate  (cost=0.00..0.06 rows=4 width=8)
         Hash Key: 1, 1
         Hash Key: 1
         Hash Key: 1
         Group Key: ()
         ->  Result  (cost=0.00..0.01 rows=1 width=8)
               One-Time Filter: (1 = 1)
(9 rows)

With two rows we're clearly not inlining it and clearly ordering on
only the first column:

postgres=# explain select a, b from (values (1,1),(2,2)) as foo(a,b)
where a = b group by cube(a, b) order by a, b nulls first;
                               QUERY PLAN
-------------------------------------------------------------------------
 Sort  (cost=0.12..0.13 rows=4 width=8)
   Sort Key: "*VALUES*".column1
   ->  MixedAggregate  (cost=0.00..0.08 rows=4 width=8)
         Hash Key: "*VALUES*".column1, "*VALUES*".column2
         Hash Key: "*VALUES*".column1
         Hash Key: "*VALUES*".column2
         Group Key: ()
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=1 width=8)
               Filter: (column1 = column2)
(9 rows)


-- 
greg