Re: Wrong results with grouping sets - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Wrong results with grouping sets
Date
Msg-id CAExHW5uES7aeM=OOG9k9xHbVrdf1OQcTe96JOk3EX4zj1M0GPw@mail.gmail.com
Whole thread Raw
In response to Re: Wrong results with grouping sets  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Wrong results with grouping sets
List pgsql-hackers
On Mon, Jul 15, 2024 at 8:15 AM Richard Guo <guofenglinux@gmail.com> wrote:


>
> We can see that with the 0001 patch, this query runs ~3 times faster,
> which is no surprise because there are 3 instances of the same
> subquery in the targetlist.

I am not sure if that's the right thing to do.

I am using a slightly elaborate version of the tests in your patch
#select v, grouping(v) gv, grouping((select t1.v from gstest5 t2 where
id = t1.id)) gs,grouping((select t1.v from gstest5 t2 where id =
t1.id)) gs2,
       (select t1.v from gstest5 t2 where id = t1.id) as s,
case when grouping(v) = 0
              then v
              else null end as cv,
case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
from gstest5 t1
group by grouping sets(v, s)
;
 v | gv | gs | gs2 | s | cv | cs
---+----+----+-----+---+----+----
 3 |  0 |  1 |   1 |   |  3 |
 5 |  0 |  1 |   1 |   |  5 |
 4 |  0 |  1 |   1 |   |  4 |
 2 |  0 |  1 |   1 |   |  2 |
 1 |  0 |  1 |   1 |   |  1 |
   |  1 |  0 |   0 | 2 |    |  2
   |  1 |  0 |   0 | 5 |    |  5
   |  1 |  0 |   0 | 4 |    |  4
   |  1 |  0 |   0 | 3 |    |  3
   |  1 |  0 |   0 | 1 |    |  1
(10 rows)

#explain verbose select v, grouping(v) gv, grouping((select t1.v from
gstest5 t2 where id = t1.id)) gs,grouping((select t1.v from gstest5 t2
w
here id = t1.id)) gs2,
       (select t1.v from gstest5 t2 where id = t1.id) as s,
case when grouping(v) = 0
              then v
              else null end as cv,
case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
from gstest5 t1
group by grouping sets(v, s)
;

                                           QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------
 HashAggregate  (cost=18508.10..58790.10 rows=2460 width=28)
   Output: t1.v, GROUPING(t1.v), GROUPING((SubPlan 2)),
GROUPING((SubPlan 3)), ((SubPlan 1)), CASE WHEN (GROUPING(t1.v) = 0)
THEN t1.v ELSE NULL::integer END
, CASE WHEN (GROUPING((SubPlan 4)) = 0) THEN ((SubPlan 1)) ELSE
NULL::integer END
   Hash Key: t1.v
   Hash Key: (SubPlan 1)
   ->  Seq Scan on pg_temp.gstest5 t1  (cost=0.00..18502.45 rows=2260 width=12)
         Output: t1.v, (SubPlan 1), t1.id
         SubPlan 1
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2.id = t1.id)

The result looks as expected but the plan isn't consistent with what
happens without grouping set
#select v,
                (select t1.v from gstest5 t2 where id = t1.id) as s,
                (select t1.v from gstest5 t2 where id = t1.id) as s2,
                case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
        from gstest5 t1
        order by case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end
;
 v | s | s2 | cs
---+---+----+----
 1 | 1 |  1 |  1
 2 | 2 |  2 |  2
 3 | 3 |  3 |
 4 | 4 |  4 |
 5 | 5 |  5 |
(5 rows)

postgres@92841=#explain verbose select v,
                (select t1.v from gstest5 t2 where id = t1.id) as s,
                (select t1.v from gstest5 t2 where id = t1.id) as s2,
                case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end as cs
        from gstest5 t1
        order by case when t1.v < 3
              then (select t1.v from gstest5 t2 where id = t1.id)
              else null end
;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=55573.71..55579.36 rows=2260 width=16)
   Output: t1.v, ((SubPlan 1)), ((SubPlan 2)), (CASE WHEN (t1.v < 3)
THEN (SubPlan 3) ELSE NULL::integer END)
   Sort Key: (CASE WHEN (t1.v < 3) THEN (SubPlan 3) ELSE NULL::integer END)
   ->  Seq Scan on pg_temp.gstest5 t1  (cost=0.00..55447.80 rows=2260 width=16)
         Output: t1.v, (SubPlan 1), (SubPlan 2), CASE WHEN (t1.v < 3)
THEN (SubPlan 3) ELSE NULL::integer END
         SubPlan 1
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2.id = t1.id)
         SubPlan 2
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2_1  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2_1.id = t1.id)
         SubPlan 3
           ->  Index Only Scan using gstest5_pkey on pg_temp.gstest5
t2_2  (cost=0.15..8.17 rows=1 width=4)
                 Output: t1.v
                 Index Cond: (t2_2.id = t1.id)
(17 rows)

Notice that every instance of that subquery has its own subplan in
this case. Why should the grouping set be different and have the same
subplan for two instances of the subquery? And if so, why not all of
the instances have the same subplan?

Since a subquery is a volatile expression, each of its instances
should be evaluated separately. If the expressions in ORDER BY,
GROUPING and GROUP BY are the same as an expression in the targetlist,
subqueries in those expressions won't need a subplan of their own. If
they are not part of targetlist, they will be added to the targetlist
as resjunk columns and thus form separate instances of subquery thus
adding more subplans.

--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Shlok Kyal
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: Peter Smith
Date:
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR