Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From Alexander Pyhalov
Subject Re: Partial aggregates pushdown
Date
Msg-id e2e23366a3897f2fd89ac4a43dd3b82c@postgrespro.ru
Whole thread Raw
In response to RE: Partial aggregates pushdown  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Responses Re: Partial aggregates pushdown
List pgsql-hackers
Fujii.Yuki@df.MitsubishiElectric.co.jp писал 2023-07-19 03:43:
> Hi Mr.Pyhalov, hackers.

> 3)
> I modified the patch to safely do a partial aggregate pushdown for
> queries which contain having clauses.
> 

Hi.
Sorry, but I don't see how it could work.
For example, the attached test returns wrong result:

CREATE FUNCTION f() RETURNS INT AS $$
begin
   return 10;
end
$$ LANGUAGE PLPGSQL;

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) < f() ORDER BY 
1;
  b  | sum
----+-----
   0 |   0
  10 |   0
  20 |   0
  30 |   0
  40 |   0
+(5 rows)

In fact the above query should have returned 0 rows, as

SELECT b, sum(a) FROM pagg_tab GROUP BY b ORDER BY 1;
  b  | sum
----+------
   0 |  600
   1 |  660
   2 |  720
   3 |  780
   4 |  840
   5 |  900
   6 |  960
   7 | 1020
   8 | 1080
   9 | 1140
  10 |  600
  11 |  660
  12 |  720
....
shows no such rows.

Or, on the same data

SELECT b, sum(a) FROM pagg_tab GROUP BY b HAVING sum(a) > 660 ORDER BY 
1;

You'll get 0 rows.

But
SELECT b, sum(a) FROM pagg_tab GROUP BY b;
  b  | sum
----+------
  42 |  720
  29 | 1140
   4 |  840
  34 |  840
  41 |  660
   0 |  600
  40 |  600
gives.

The issue is that you can't calculate "partial" having. You should 
compare full aggregate in filter, but it's not possible on the level of 
one partition.
And you have this in plans

  Finalize GroupAggregate
    Output: pagg_tab.b, avg(pagg_tab.a), max(pagg_tab.a), count(*)
    Group Key: pagg_tab.b
    Filter: (sum(pagg_tab.a) < 700)
    ->  Sort
          Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), (PARTIAL 
max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
          Sort Key: pagg_tab.b
          ->  Append
                ->  Foreign Scan
                      Output: pagg_tab.b, (PARTIAL avg(pagg_tab.a)), 
(PARTIAL max(pagg_tab.a)), (PARTIAL count(*)), (PARTIAL sum(pagg_tab.a))
                      Filter: ((PARTIAL sum(pagg_tab.a)) < 700)   !!!! 
<--- here we can't compare anything yet, sum is incomplete.
                      Relations: Aggregate on (public.fpagg_tab_p1 
pagg_tab)
                      Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p1 GROUP BY 1
                ->  Foreign Scan
                      Output: pagg_tab_1.b, (PARTIAL avg(pagg_tab_1.a)), 
(PARTIAL max(pagg_tab_1.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_1.a))
                      Filter: ((PARTIAL sum(pagg_tab_1.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p2 
pagg_tab_1)
                      Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p2 GROUP BY 1
                ->  Foreign Scan
                      Output: pagg_tab_2.b, (PARTIAL avg(pagg_tab_2.a)), 
(PARTIAL max(pagg_tab_2.a)), (PARTIAL count(*)), (PARTIAL 
sum(pagg_tab_2.a))
                      Filter: ((PARTIAL sum(pagg_tab_2.a)) < 700)
                      Relations: Aggregate on (public.fpagg_tab_p3 
pagg_tab_2)
                      Remote SQL: SELECT b, avg_p_int4(a), max(a), 
count(*), sum(a) FROM public.pagg_tab_p3 GROUP BY 1

In foreign_grouping_ok()
6586                         if (IsA(expr, Aggref))
6587                         {
6588                                 if (partial)
6589                                 {
6590                                         mark_partial_aggref((Aggref 
*) expr, AGGSPLIT_INITIAL_SERIAL);
6591                                         continue;
6592                                 }
6593                                 else if (!is_foreign_expr(root, 
grouped_rel, expr))
6594                                         return false;
6595
6596                                 tlist = add_to_flat_tlist(tlist, 
list_make1(expr));
6597                         }

at least you shouldn't do anything with expr, if is_foreign_expr() 
returned false. If we restrict pushing down queries with havingQuals, 
I'm not quite sure how Aggref can appear in local_conds.

As for changes in planner.c (setGroupClausePartial()) I have several 
questions.

1) Why don't we add non_group_exprs to pathtarget->exprs when 
partial_target->exprs is not set?

2) We replace extra->partial_target->exprs with partial_target->exprs 
after processing. Why are we sure that after this tleSortGroupRef is 
correct?

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: inconsistency between the VM page visibility status and the visibility status of the page
Next
From: shveta malik
Date:
Subject: Re: Synchronizing slots from primary to standby