Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers
From | Ranier Vilela |
---|---|
Subject | Re: Add proper planner support for ORDER BY / DISTINCT aggregates |
Date | |
Msg-id | CAEudQAq-H2yNzCni1N_PW7PfrWr4jYMhzRAiQiGyAQ4vt4-sTg@mail.gmail.com Whole thread Raw |
In response to | Re: Add proper planner support for ORDER BY / DISTINCT aggregates (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
|
List | pgsql-hackers |
Em seg., 12 de jul. de 2021 às 09:04, David Rowley <dgrowleyml@gmail.com> escreveu:
On Sun, 13 Jun 2021 at 03:07, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Please find attached my WIP patch. It's WIP due to what I mentioned
> in the above paragraph and also because I've not bothered to add JIT
> support for the new expression evaluation steps.
I've split this patch into two parts.
Hi, I'll take a look.
0001 Adds planner support for ORDER BY aggregates.
/* Normal transition function without ORDER BY / DISTINCT. */
Is it possible to avoid entering to initialize args if 'argno >= pertrans->numTransInputs'?
Like this:
if (!pertrans->aggsortrequired && argno < pertrans->numTransInputs)
And if argos is '>' that pertrans->numTransInputs?
The test shouldn't be, inside the loop?
+ /*
+ * Don't initialize args for any ORDER BY clause that might
+ * exist in a presorted aggregate.
+ */
+ if (argno >= pertrans->numTransInputs)
+ break;
+ * Don't initialize args for any ORDER BY clause that might
+ * exist in a presorted aggregate.
+ */
+ if (argno >= pertrans->numTransInputs)
+ break;
I think that or can reduce the scope of variable 'sortlist' or simply remove it?
a)
+ /* Determine pathkeys for aggregate functions with an ORDER BY */
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell *lc;
+ List *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo *agginfo = (AggInfo *) lfirst(lc);
+ Aggref *aggref = agginfo->representative_aggref;
+ List *sortlist;
+
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell *lc;
+ List *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo *agginfo = (AggInfo *) lfirst(lc);
+ Aggref *aggref = agginfo->representative_aggref;
+ List *sortlist;
+
or better
b)
+ /* Determine pathkeys for aggregate functions with an ORDER BY */
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell *lc;
+ List *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo *agginfo = (AggInfo *) lfirst(lc);
+ Aggref *aggref = agginfo->representative_aggref;
+
+ if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
+ continue;
+
+ /* DISTINCT aggregates not yet supported by the planner */
+ if (aggref->aggdistinct != NIL)
+ continue;
+
+ if (aggref->aggorder == NIL)
+ continue;
+
+ /*
+ * Find the pathkeys with the most sorted derivative of the first
+ * Aggref. For example, if we determine the pathkeys for the first
+ * Aggref to be {a}, and we find another with {a,b}, then we use
+ * {a,b} since it's useful for more Aggrefs than just {a}. We
+ * currently ignore anything that might have a longer list of
+ * pathkeys than the first Aggref if it is not contained in the
+ * pathkeys for the first agg. We can't practically plan for all
+ * orders of each Aggref, so this seems like the best compromise.
+ */
+ if (pathkeys == NIL)
+ {
+ pathkeys = make_pathkeys_for_sortclauses(root, aggref->aggorder ,
+ aggref->args);
+ aggref->aggpresorted = true;
+ }
+ else
+ {
+ List *pathkeys2 = make_pathkeys_for_sortclauses(root,
+ aggref->aggorder,
+ aggref->args);
+ if (parse->groupingSets == NIL && root->numOrderedAggs > 0 &&
+ (qp_extra->groupClause == NIL || root->group_pathkeys))
+ {
+ ListCell *lc;
+ List *pathkeys = NIL;
+
+ foreach(lc, root->agginfos)
+ {
+ AggInfo *agginfo = (AggInfo *) lfirst(lc);
+ Aggref *aggref = agginfo->representative_aggref;
+
+ if (AGGKIND_IS_ORDERED_SET(aggref->aggkind))
+ continue;
+
+ /* DISTINCT aggregates not yet supported by the planner */
+ if (aggref->aggdistinct != NIL)
+ continue;
+
+ if (aggref->aggorder == NIL)
+ continue;
+
+ /*
+ * Find the pathkeys with the most sorted derivative of the first
+ * Aggref. For example, if we determine the pathkeys for the first
+ * Aggref to be {a}, and we find another with {a,b}, then we use
+ * {a,b} since it's useful for more Aggrefs than just {a}. We
+ * currently ignore anything that might have a longer list of
+ * pathkeys than the first Aggref if it is not contained in the
+ * pathkeys for the first agg. We can't practically plan for all
+ * orders of each Aggref, so this seems like the best compromise.
+ */
+ if (pathkeys == NIL)
+ {
+ pathkeys = make_pathkeys_for_sortclauses(root, aggref->aggorder ,
+ aggref->args);
+ aggref->aggpresorted = true;
+ }
+ else
+ {
+ List *pathkeys2 = make_pathkeys_for_sortclauses(root,
+ aggref->aggorder,
+ aggref->args);
0002 is a WIP patch for DISTINCT support. This still lacks JIT
support and I'm still not certain of the best where to store the
previous value or tuple to determine if the current one is distinct
from it.
In the patch 0002, I think that can reduce the scope of variable 'aggstate'?
+ EEO_CASE(EEOP_AGG_PRESORTED_DISTINCT_SINGLE)
+ {
+ AggStatePerTrans pertrans = op->d.agg_presorted_distinctcheck.pertrans;
+ Datum value = pertrans->transfn_fcinfo->args[1].value;
+ bool isnull = pertrans->transfn_fcinfo->args[1].isnull;
+
+ if (!pertrans->haslast ||
+ pertrans->lastisnull != isnull ||
+ !DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
+ pertrans->aggCollation,
+ pertrans->lastdatum, value)))
+ {
+ if (pertrans->haslast && !pertrans->inputtypeByVal)
+ pfree(DatumGetPointer(pertrans->lastdatum));
+
+ pertrans->haslast = true;
+ if (!isnull)
+ {
+ AggState *aggstate = castNode(AggState, state->parent);
+ {
+ AggStatePerTrans pertrans = op->d.agg_presorted_distinctcheck.pertrans;
+ Datum value = pertrans->transfn_fcinfo->args[1].value;
+ bool isnull = pertrans->transfn_fcinfo->args[1].isnull;
+
+ if (!pertrans->haslast ||
+ pertrans->lastisnull != isnull ||
+ !DatumGetBool(FunctionCall2Coll(&pertrans->equalfnOne,
+ pertrans->aggCollation,
+ pertrans->lastdatum, value)))
+ {
+ if (pertrans->haslast && !pertrans->inputtypeByVal)
+ pfree(DatumGetPointer(pertrans->lastdatum));
+
+ pertrans->haslast = true;
+ if (!isnull)
+ {
+ AggState *aggstate = castNode(AggState, state->parent);
+
+ /*
+ * XXX is it worth having a dedicted ByVal version of this
+ * operation so that we can skip switching memory contexts
+ * and do a simple assign rather than datumCopy below?
+ */
+ MemoryContext oldContext;
+
+ oldContext = MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
+ * XXX is it worth having a dedicted ByVal version of this
+ * operation so that we can skip switching memory contexts
+ * and do a simple assign rather than datumCopy below?
+ */
+ MemoryContext oldContext;
+
+ oldContext = MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
regards,
Ranier Vilela
pgsql-hackers by date: