Home > mailing lists

Re: Wrong results with grouping sets - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: Wrong results with grouping sets
Date	July 16, 2024 09:40:57
Msg-id	CAExHW5uES7aeM=OOG9k9xHbVrdf1OQcTe96JOk3EX4zj1M0GPw@mail.gmail.com Whole thread Raw
In response to	Re: Wrong results with grouping sets (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: Wrong results with grouping sets
List	pgsql-hackers

Tree view

On Mon, Jul 15, 2024 at 8:15 AM Richard Guo <guofenglinux@gmail.com> wrote:


>
> We can see that with the 0001 patch, this query runs ~3 times faster,
> which is no surprise because there are 3 instances of the same
> subquery in the targetlist.

I am not sure if that's the right thing to do.

I am using a slightly elaborate version of the tests in your patch
#select v, grouping(v) gv, grouping((select t1.v from gstest5 t2 where
id = t1.id)) gs,grouping((select t1.v from gstest5 t2 where id =
t1.id)) gs2,
       (select t1.v from gstest5 t2 where id = t1.id) as s,
case when grouping(v) = 0
              then v
              else null end as cv,
case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
from gstest5 t1
group by grouping sets(v, s)
;
 v | gv | gs | gs2 | s | cv | cs
---+----+----+-----+---+----+----
 3 |  0 |  1 |   1 |   |  3 |
 5 |  0 |  1 |   1 |   |  5 |
 4 |  0 |  1 |   1 |   |  4 |
 2 |  0 |  1 |   1 |   |  2 |
 1 |  0 |  1 |   1 |   |  1 |
   |  1 |  0 |   0 | 2 |    |  2
   |  1 |  0 |   0 | 5 |    |  5
   |  1 |  0 |   0 | 4 |    |  4
   |  1 |  0 |   0 | 3 |    |  3
   |  1 |  0 |   0 | 1 |    |  1
(10 rows)

#explain verbose select v, grouping(v) gv, grouping((select t1.v from
gstest5 t2 where id = t1.id)) gs,grouping((select t1.v from gstest5 t2
w
here id = t1.id)) gs2,
       (select t1.v from gstest5 t2 where id = t1.id) as s,
case when grouping(v) = 0
              then v
              else null end as cv,
case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
from gstest5 t1
group by grouping sets(v, s)
;

                                           QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------
 HashAggregate  (cost=18508.10..58790.10 rows=2460 width=28)
   Output: t1.v, GROUPING(t1.v), GROUPING((SubPlan 2)),
GROUPING((SubPlan 3)), ((SubPlan 1)), CASE WHEN (GROUPING(t1.v) = 0)
THEN t1.v ELSE NULL::integer END
, CASE WHEN (GROUPING((SubPlan 4)) = 0) THEN ((SubPlan 1)) ELSE
NULL::integer END
   Hash Key: t1.v
   Hash Key: (SubPlan 1)
   ->  Seq Scan on pg_temp.gstest5 t1  (cost=0.00..18502.45 rows=2260 width=12)
         Output: t1.v, (SubPlan 1), t1.id
         SubPlan 1
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2.id = t1.id)

The result looks as expected but the plan isn't consistent with what
happens without grouping set
#select v,
                (select t1.v from gstest5 t2 where id = t1.id) as s,
                (select t1.v from gstest5 t2 where id = t1.id) as s2,
                case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
        from gstest5 t1
        order by case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end
;
 v | s | s2 | cs
---+---+----+----
 1 | 1 |  1 |  1
 2 | 2 |  2 |  2
 3 | 3 |  3 |
 4 | 4 |  4 |
 5 | 5 |  5 |
(5 rows)

postgres@92841=#explain verbose select v,
                (select t1.v from gstest5 t2 where id = t1.id) as s,
                (select t1.v from gstest5 t2 where id = t1.id) as s2,
                case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
        from gstest5 t1
        order by case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end
;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=55573.71..55579.36 rows=2260 width=16)
   Output: t1.v, ((SubPlan 1)), ((SubPlan 2)), (CASE WHEN (t1.v < 3)
THEN (SubPlan 3) ELSE NULL::integer END)
   Sort Key: (CASE WHEN (t1.v < 3) THEN (SubPlan 3) ELSE NULL::integer END)
   ->  Seq Scan on pg_temp.gstest5 t1  (cost=0.00..55447.80 rows=2260 width=16)
         Output: t1.v, (SubPlan 1), (SubPlan 2), CASE WHEN (t1.v < 3)
THEN (SubPlan 3) ELSE NULL::integer END
         SubPlan 1
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2.id = t1.id)
         SubPlan 2
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2_1  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2_1.id = t1.id)
         SubPlan 3
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2_2  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2_2.id = t1.id)
(17 rows)

Notice that every instance of that subquery has its own subplan in
this case. Why should the grouping set be different and have the same
subplan for two instances of the subquery? And if so, why not all of
the instances have the same subplan?

Since a subquery is a volatile expression, each of its instances
should be evaluated separately. If the expressions in ORDER BY,
GROUPING and GROUP BY are the same as an expression in the targetlist,
subqueries in those expressions won't need a subplan of their own. If
they are not part of targetlist, they will be added to the targetlist
as resjunk columns and thus form separate instances of subquery thus
adding more subplans.

--
Best Wishes,
Ashutosh Bapat

pgsql-hackers by date:

From: Shlok Kyal
Date: 16 July 2024, 09:04:12
Subject: Re: Pgoutput not capturing the generated columns

From: Peter Smith
Date: 16 July 2024, 10:02:21
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR

Re: Wrong results with grouping sets - Mailing list pgsql-hackers

Previous

Next