Re: Statistical aggregate functions are not working with PARTIAL aggregation - Mailing list pgsql-hackers

From David Rowley
Subject Re: Statistical aggregate functions are not working with PARTIAL aggregation
Date
Msg-id CAKJS1f9+d88Hp8nu=UQhb7wjZek+kmT9H4FGcGQnGmrV=JP8xg@mail.gmail.com
Whole thread Raw
In response to Re: Statistical aggregate functions are not working with PARTIALaggregation  (Andres Freund <andres@anarazel.de>)
Responses Re: Statistical aggregate functions are not working with PARTIALaggregation  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, 20 May 2019 at 06:36, Andres Freund <andres@anarazel.de> wrote:
> > Isn't it more due to the lack of any aggregates with > 1 arg having a
> > combine function?
>
> I'm not sure I follow? regr_count() already was in 9.6? Including a
> combine function?

Oops, that line I meant to delete before sending.

> > Yeah, probably we should be passing in the correct arg count for the
> > combinefn to build_pertrans_for_aggref(). However, I see that we also
> > pass in the inputTypes from the transfn, just we don't use them when
> > working with the combinefn.
>
> Not sure what you mean by that "however"?

Well, previously those two arguments were always for the function in
pg_aggregate.aggtransfn. I only changed one of them to mean the trans
func that's being used, which detracted slightly from my ambition to
change just what numArguments means.

> > You'll notice that I've just hardcoded the numTransArgs to set it to 1
> > when we're working with a combinefn.  The combinefn always requires 2
> > args of trans type, so this seems pretty valid to me.
>
> > I think Kyotaro's patch setting of numInputs is wrong.
>
> Yea, my proposal was to simply harcode it to 2 in the
> DO_AGGSPLIT_COMBINE path.

ok.

> > diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
> > index d01fc4f52e..b061162961 100644
> > --- a/src/backend/executor/nodeAgg.c
> > +++ b/src/backend/executor/nodeAgg.c
> > @@ -2522,8 +2522,9 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
> >               int                     existing_aggno;
> >               int                     existing_transno;
> >               List       *same_input_transnos;
> > -             Oid                     inputTypes[FUNC_MAX_ARGS];
> > +             Oid                     transFnInputTypes[FUNC_MAX_ARGS];
> >               int                     numArguments;
> > +             int                     numTransFnArgs;
> >               int                     numDirectArgs;
> >               HeapTuple       aggTuple;
> >               Form_pg_aggregate aggform;
> > @@ -2701,14 +2702,23 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
> >                * could be different from the agg's declared input types, when the
> >                * agg accepts ANY or a polymorphic type.
> >                */
> > -             numArguments = get_aggregate_argtypes(aggref, inputTypes);
> > +             numTransFnArgs = get_aggregate_argtypes(aggref, transFnInputTypes);
>
> Not sure I understand the distinction you're trying to make with the
> variable renaming. The combine function is also a transition function,
> no?

I was trying to make it more clear what each variable is for. It's
true that the combine function is used as a transition function in
this case, but I'd hoped it would be more easy to understand that the
input arguments listed in a variable named transFnInputTypes would be
for the function mentioned in pg_aggregate.aggtransfn rather than the
transfn we're using.  If that's not any more clear then maybe another
fix is better, or we can leave it...   I had to make sense of all this
code last night and I was just having a go at making it easier to
follow for the next person who has to.

> >               /* Count the "direct" arguments, if any */
> >               numDirectArgs = list_length(aggref->aggdirectargs);
> >
> > +             /*
> > +              * Combine functions always have a 2 trans state type input params, so
> > +              * this is always set to 1 (we don't count the first trans state).
> > +              */
>
> Perhaps the parenthetical should instead be something like "to 1 (the
> trans type is not counted as an arg, just like with non-combine trans
> function)" or similar?

Yeah, that's better.

>
> > @@ -2781,7 +2791,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
> >                                                                         aggref, transfn_oid, aggtranstype,
> >                                                                         serialfn_oid, deserialfn_oid,
> >                                                                         initValue, initValueIsNull,
> > -                                                                       inputTypes, numArguments);
> > +                                                                       transFnInputTypes, numArguments);
>
> That means we pass in the wrong input types? Seems like it'd be better
> to either pass an empty list, or just create the argument list here.

What do you mean "here"?  Did you mean to quote this fragment?

@@ -2880,7 +2895,7 @@ build_pertrans_for_aggref(AggStatePerTrans pertrans,
    Oid aggtransfn, Oid aggtranstype,
    Oid aggserialfn, Oid aggdeserialfn,
    Datum initValue, bool initValueIsNull,
-   Oid *inputTypes, int numArguments)
+   Oid *transFnInputTypes, int numArguments)

I had hoped the rename would make it more clear that these are the
args for the function in pg_aggregate.aggtransfn.  We could pass NULL
instead when it's the combine func, but I didn't really see the
advantage of it.

> I'm inclined to push a minimal fix now, and then a slightly more evolved
> version fo this after beta1.

Ok

> > diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
> > index d4fd657188..bd8b9e8b4f 100644
> > --- a/src/test/regress/sql/aggregates.sql
> > +++ b/src/test/regress/sql/aggregates.sql
> > @@ -963,10 +963,11 @@ SET enable_indexonlyscan = off;
> >
> >  -- variance(int4) covers numeric_poly_combine
> >  -- sum(int8) covers int8_avg_combine
> > +-- regr_cocunt(float8, float8) covers int8inc_float8_float8 and aggregates with > 1 arg
>
> typo...

oops. I spelt coconut wrong. :)

>
> >  EXPLAIN (COSTS OFF)
> > -  SELECT variance(unique1::int4), sum(unique1::int8) FROM tenk1;
> > +  SELECT variance(unique1::int4), sum(unique1::int8),regr_count(unique1::float8, unique1::float8) FROM tenk1;
> >
> > -SELECT variance(unique1::int4), sum(unique1::int8) FROM tenk1;
> > +SELECT variance(unique1::int4), sum(unique1::int8),regr_count(unique1::float8, unique1::float8) FROM tenk1;
> >
> >  ROLLBACK;
>
> Does this actually cover the bug at issue here? The non-combine case
> wasn't broken?

The EXPLAIN shows the plan is:

                  QUERY PLAN
----------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 4
         ->  Partial Aggregate
               ->  Parallel Seq Scan on tenk1
(5 rows)

so it is exercising the combine functions.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: sample scans and predicate locking
Next
From: Tomas Vondra
Date:
Subject: Re: Multivariate MCV stats can leak data to unprivileged users