Re: Wrong results with grouping sets - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Wrong results with grouping sets
Date
Msg-id CAMbWs4_2t2pqqCFdS3NYJLwMMkAzYQKBOhKweFt-wE3YOi7rGg@mail.gmail.com
Whole thread Raw
In response to Re: Wrong results with grouping sets  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I've been looking at cases where there are grouping-set keys that
reduce to Consts, and I noticed a plan with v11 patch that is not very
great.

explain (verbose, costs off)
select 1 as one group by rollup(one) order by one nulls first;
          QUERY PLAN
-------------------------------
 Sort
   Output: (1)
   Sort Key: (1) NULLS FIRST
   ->  GroupAggregate
         Output: (1)
         Group Key: (1)
         Group Key: ()
         ->  Sort
               Output: (1)
               Sort Key: (1)
               ->  Result
                     Output: 1
(12 rows)

The Sort operation below the Agg node is unnecessary because the
grouping key is actually a Const.  This plan results from wrapping the
Const in a PlaceHolderVar to carry the nullingrel bit of the RTE_GROUP
RT index, as it can be nulled by the grouping step.  Although we
remove this nullingrel bit when generating the groupClause pathkeys
since we know the groupClause is logically below the grouping step, we
do not unwrap the PlaceHolderVar.

This suggests that we might need a mechanism to unwrap PHVs when safe.
0003 includes a flag in PlaceHolderVar to indicate whether it is safe
to remove the PHV and use its contained expression instead when its
phnullingrels becomes empty.  Currently it is set true only in cases
where the PHV is used to carry the nullingrel bit of the RTE_GROUP RT
index.  With 0003 the plan above becomes more reasonable:

explain (verbose, costs off)
select 1 as one group by rollup(one) order by one nulls first;
         QUERY PLAN
-----------------------------
 Sort
   Output: (1)
   Sort Key: (1) NULLS FIRST
   ->  GroupAggregate
         Output: (1)
         Group Key: 1
         Group Key: ()
         ->  Result
               Output: 1
(9 rows)

This could potentially open up opportunities for optimization by
unwrapping PHVs in other cases.  As an example, consider

explain (costs off)
select * from t t1 left join
    lateral (select t1.a as x, * from t t2) s on true
where t1.a = s.a;
         QUERY PLAN
----------------------------
 Nested Loop
   ->  Seq Scan on t t1
   ->  Seq Scan on t t2
         Filter: (t1.a = a)
(4 rows)

The target entry s.x is wrapped in a PHV that contains lateral
reference to t1, which forces us to resort to nestloop join.  However,
since the left join has been reduced to an inner join, we should be
able to remove this PHV and use merge or hash joins instead.  I did
not implement this optimization in 0003.  It seems that it should be
addressed in a separate patch.

Thanks
Richard

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Small refactoring around vacuum_open_relation
Next
From: Dean Rasheed
Date:
Subject: Re: Adding OLD/NEW support to RETURNING