Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id CAEudQAq-H2yNzCni1N_PW7PfrWr4jYMhzRAiQiGyAQ4vt4-sTg@mail.gmail.com
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Em seg., 12 de jul. de 2021 às 09:04, David Rowley <dgrowleyml@gmail.com> escreveu:
On Sun, 13 Jun 2021 at 03:07, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Please find attached my WIP patch.  It's WIP due to what I mentioned
> in the above paragraph and also because I've not bothered to add JIT
> support for the new expression evaluation steps.

I've split this patch into two parts.
Hi, I'll take a look.


0001 Adds planner support for ORDER BY aggregates.
/* Normal transition function without ORDER BY / DISTINCT. */
Is it possible to avoid entering to initialize args if 'argno >= pertrans->numTransInputs'?
Like this:

if (!pertrans->aggsortrequired && argno < pertrans->numTransInputs)

And if argos is '>' that pertrans->numTransInputs?
The test shouldn't be, inside the loop?

+ /*
+ * Don't initialize args for any ORDER BY clause that might
+ * exist in a presorted aggregate.
+ */
+ if (argno >= pertrans->numTransInputs)
+ break;

I think that or can reduce the scope of variable 'sortlist' or simply remove it?

a)
+ /* Determine pathkeys for aggregate functions with an ORDER BY */
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell   *lc;
+ List   *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo    *agginfo = (AggInfo *) lfirst(lc);
+ Aggref   *aggref = agginfo->representative_aggref;
+ List   *sortlist;
+

or better

b)
+ /* Determine pathkeys for aggregate functions with an ORDER BY */
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell   *lc;
+ List   *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo    *agginfo = (AggInfo *) lfirst(lc);
+ Aggref   *aggref = agginfo->representative_aggref;
+
+ if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
+ continue;
+
+ /* DISTINCT aggregates not yet supported by the planner */
+ if (aggref->aggdistinct != NIL)
+ continue;
+
+ if (aggref->aggorder == NIL)
+ continue;
+
+ /*
+ * Find the pathkeys with the most sorted derivative of the first
+ * Aggref. For example, if we determine the pathkeys for the first
+ * Aggref to be {a}, and we find another with {a,b}, then we use
+ * {a,b} since it's useful for more Aggrefs than just {a}.  We
+ * currently ignore anything that might have a longer list of
+ * pathkeys than the first Aggref if it is not contained in the
+ * pathkeys for the first agg.  We can't practically plan for all
+ * orders of each Aggref, so this seems like the best compromise.
+ */
+ if (pathkeys == NIL)
+ {
+ pathkeys = make_pathkeys_for_sortclauses(root,  aggref->aggorder ,
+ aggref->args);
+ aggref->aggpresorted = true;
+ }
+ else
+ {
+ List   *pathkeys2 = make_pathkeys_for_sortclauses(root,
+  aggref->aggorder,
+  aggref->args);


0002 is a WIP patch for DISTINCT support.  This still lacks JIT
support and I'm still not certain of the best where to store the
previous value or tuple to determine if the current one is distinct
from it.
In the patch 0002, I think that can reduce the scope of variable 'aggstate'?

+ EEO_CASE(EEOP_AGG_PRESORTED_DISTINCT_SINGLE)
+ {
+ AggStatePerTrans pertrans = op->d.agg_presorted_distinctcheck.pertrans;
+ Datum value = pertrans->transfn_fcinfo->args[1].value;
+ bool isnull = pertrans->transfn_fcinfo->args[1].isnull;
+
+ if (!pertrans->haslast ||
+ pertrans->lastisnull != isnull ||
+ !DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
+ pertrans->aggCollation,
+ pertrans->lastdatum, value)))
+ {
+ if (pertrans->haslast && !pertrans->inputtypeByVal)
+ pfree(DatumGetPointer(pertrans->lastdatum));
+
+ pertrans->haslast = true;
+ if (!isnull)
+ {
+ AggState   *aggstate = castNode(AggState, state->parent);
+
+ /*
+ * XXX is it worth having a dedicted ByVal version of this
+ * operation so that we can skip switching memory contexts
+ * and do a simple assign rather than datumCopy below?
+ */
+ MemoryContext oldContext;
+
+ oldContext = MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);

What do you think?

regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Josef Šimánek
Date:
Subject: Re: [PATCH] Don't block HOT update by BRIN index
Next
From: Tom Lane
Date:
Subject: Re: proposal: possibility to read dumped table's name from file