Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Eager aggregation, take 3
Date
Msg-id CAMbWs49v7SpPqWGAKQU=9xf-=0wKuJ13TOGBWsbqbN85b2TW5A@mail.gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Dec 4, 2024 at 11:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > Hmm, currently we only consider grouped aggregation for eager
> > aggregation.  For grouped aggregation, the window function's
> > arguments, as well as the PARTITION BY expressions, must appear in the
> > GROUP BY clause.  That is to say, the depname column in the first
> > query, or the n column in the second query, will not be aggregated
> > into the partial groups.  Instead, they will remain as they are as
> > input for the WindowAgg nodes.  It seems to me that this ensures
> > that we're good with window functions.  But maybe I'm wrong.
>
> Returning to this point now that I understand what you meant by
> grouped aggregation:
>
> I still don't understand how you expect to be able to evaluate
> functions like LEAD() and LAG() if any form of partial aggregation has
> been done.

In grouped aggregation, the non-aggregate arguments of the window
function must appear in the GROUP BY clause, so they will not be
aggregated into the partial groups.  It seems to me that this ensures
that they remain available as valid inputs for the window function.

For the Aggref arguments of the window function, their final values
are calculated in the Finalize Agg node, meaning they, too, are good
to be used as inputs for the window function.

As an example, please consider

create table tbl (a int, b int, c int);
insert into tbl select i%3, i%3, i%3 from generate_series(1,1000)i;
analyze tbl;

explain (verbose, costs off)
select lead(t1.a+sum(t2.b)) over (), sum(t2.c) from
tbl t1 join tbl t2 on t1.b = t2.b group by t1.a;
                                  QUERY PLAN
------------------------------------------------------------------------------
 WindowAgg
   Output: lead((t1.a + (sum(t2.b)))) OVER (?), (sum(t2.c)), t1.a
   ->  Finalize HashAggregate
         Output: t1.a, sum(t2.b), sum(t2.c)
         Group Key: t1.a
         ->  Hash Join
               Output: t1.a, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c))
               Hash Cond: (t1.b = t2.b)
               ->  Seq Scan on public.tbl t1
                     Output: t1.a, t1.b, t1.c
               ->  Hash
                     Output: t2.b, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c))
                     ->  Partial HashAggregate
                           Output: t2.b, PARTIAL sum(t2.b), PARTIAL sum(t2.c)
                           Group Key: t2.b
                           ->  Seq Scan on public.tbl t2
                                 Output: t2.a, t2.b, t2.c
(17 rows)

It seems to me that both 't1.a' and 'sum(t2.b)' are valid inputs for
LEAD(), even though we have performed partial aggregation.

Am I missing something?

Thanks
Richard



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Next
From: Tom Lane
Date:
Subject: Re: Missing initialization steps in --check and --single modes