Thread: How can the Aggregation move to the outer query

How can the Aggregation move to the outer query

From
Andy Fan
Date:
My question can be demonstrated with the below example:

create table m1(a int, b int);
explain (costs off) select  (select count(*) filter (where true) from m1 t1)
from m1 t2 where t2.b % 2 = 1;

           QUERY PLAN
---------------------------------
 Seq Scan on m1 t2
   Filter: ((b % 2) = 1)
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on m1 t1
(5 rows)

The above is good to me. The aggregate is run in the subPlan/InitPlan.

explain (costs off) select  (select count(*) filter (where t2.b = 1) from m1 t1)
from m1 t2 where t2.b % 2 = 1;

          QUERY PLAN
-------------------------------
 Aggregate
   ->  Seq Scan on m1 t2
         Filter: ((b % 2) = 1)
   SubPlan 1
     ->  Seq Scan on m1 t1
(5 rows)

This one is too confusing to me since the Aggregate happens
on t2 rather than t1.  What happens here? Would this query
generate 1 row all the time like SELECT aggfunc(a) FROM t? 

--
Best Regards

Re: How can the Aggregation move to the outer query

From
David Rowley
Date:
On Tue, 25 May 2021 at 22:28, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> explain (costs off) select  (select count(*) filter (where t2.b = 1) from m1 t1)
> from m1 t2 where t2.b % 2 = 1;
>
>           QUERY PLAN
> -------------------------------
>  Aggregate
>    ->  Seq Scan on m1 t2
>          Filter: ((b % 2) = 1)
>    SubPlan 1
>      ->  Seq Scan on m1 t1
> (5 rows)
>
> This one is too confusing to me since the Aggregate happens
> on t2 rather than t1.  What happens here? Would this query
> generate 1 row all the time like SELECT aggfunc(a) FROM t?

I think you're misreading the plan. There's a scan on t2 with a
subplan then an aggregate on top of that. Because you made the
subquery correlated by adding t2.b, it cannot be executed as an
initplan.

You might see what's going on better if you add VERBOSE to the EXPLAIN options.

David



Re: How can the Aggregation move to the outer query

From
Andy Fan
Date:


On Tue, May 25, 2021 at 7:42 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 25 May 2021 at 22:28, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> explain (costs off) select  (select count(*) filter (where t2.b = 1) from m1 t1)
> from m1 t2 where t2.b % 2 = 1;
>
>           QUERY PLAN
> -------------------------------
>  Aggregate
>    ->  Seq Scan on m1 t2
>          Filter: ((b % 2) = 1)
>    SubPlan 1
>      ->  Seq Scan on m1 t1
> (5 rows)
>
> This one is too confusing to me since the Aggregate happens
> on t2 rather than t1.  What happens here? Would this query
> generate 1 row all the time like SELECT aggfunc(a) FROM t?

I think you're misreading the plan. There's a scan on t2 with a
subplan then an aggregate on top of that. Because you made the
subquery correlated by adding t2.b, it cannot be executed as an
initplan.

You might see what's going on better if you add VERBOSE to the EXPLAIN options.


Thanks,  VERBOSE does provide more information.  

 Aggregate
   Output: (SubPlan 1)
   ->  Seq Scan on public.m1 t2
         Output: t2.a, t2.b
         Filter: ((t2.b % 2) = 1)
   SubPlan 1
     ->  Seq Scan on public.m1 t1
           Output: count(*) FILTER (WHERE (t2.b = 1))
(8 rows)

I am still confused about the SubPlan1,  how can it output a
count(*) without an Aggregate under it (If this is not easy to
explain, I can try more by myself later). 

But after all, I find this case when working on the UniqueKey stuff,
I have rule that if (query->hasAgg && !query->groupClause),  then
there are only 1 row for this query.   In the above case, the outer query
(t2) hasAgg=true and subplan's hasAgg=false, which looks not right 
to me.  I think the hasAgg=true should be in the subquery and outer
query should have hasAgg=false.  anything I missed?

--
Best Regards

Re: How can the Aggregation move to the outer query

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 25 May 2021 at 22:28, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> explain (costs off) select  (select count(*) filter (where t2.b = 1) from m1 t1)
>> from m1 t2 where t2.b % 2 = 1;
>>
>> This one is too confusing to me since the Aggregate happens
>> on t2 rather than t1.  What happens here? Would this query
>> generate 1 row all the time like SELECT aggfunc(a) FROM t?

> I think you're misreading the plan. There's a scan on t2 with a
> subplan then an aggregate on top of that. Because you made the
> subquery correlated by adding t2.b, it cannot be executed as an
> initplan.

Also keep in mind that adding that filter clause completely changed
the meaning of the aggregate.  Aggregates belong to the lowest
query level containing any Var used in their arguments, so that
where in your original query the count(*) was an aggregate of the
subquery, now it's an aggregate of the outer query (and the subquery
now perceives it as a constant outer reference).  AFAIR this is per
SQL spec.

            regards, tom lane



Re: How can the Aggregation move to the outer query

From
Andy Fan
Date:


On Tue, May 25, 2021 at 10:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 25 May 2021 at 22:28, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> explain (costs off) select  (select count(*) filter (where t2.b = 1) from m1 t1)
>> from m1 t2 where t2.b % 2 = 1;
>>
>> This one is too confusing to me since the Aggregate happens
>> on t2 rather than t1.  What happens here? Would this query
>> generate 1 row all the time like SELECT aggfunc(a) FROM t?

> I think you're misreading the plan. There's a scan on t2 with a
> subplan then an aggregate on top of that. Because you made the
> subquery correlated by adding t2.b, it cannot be executed as an
> initplan.

Also keep in mind that adding that filter clause completely changed
the meaning of the aggregate.  Aggregates belong to the lowest
query level containing any Var used in their arguments, so that
where in your original query the count(*) was an aggregate of the
subquery, now it's an aggregate of the outer query (and the subquery
now perceives it as a constant outer reference).  AFAIR this is per
SQL spec.
 
Well, finally I know it's an aggregate of the outer query..  Thank you for
the explanation!   so I would say the result set has 1 row for that query
all the time.

--
Best Regards