Rethinking representation of partial-aggregate steps - Mailing list pgsql-hackers

From Tom Lane
Subject Rethinking representation of partial-aggregate steps
Date
Msg-id 29309.1466699160@sss.pgh.pa.us
Whole thread Raw
Responses Re: Rethinking representation of partial-aggregate steps  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I really dislike this aspect of the partial-aggregate patch:

typedef struct Agg
{   ...   bool        combineStates;  /* input tuples contain transition states */   bool        finalizeAggs;   /*
shouldwe call the finalfn on agg states? */   bool        serialStates;   /* should agg states be (de)serialized? */
...
} Agg;

Representing the various partial-aggregate behaviors as three bools seems
like a bad idea to me.  In the first place, this makes it look like there
are as many as eight possible behaviors, whereas only a subset of those
flag combinations are or ever will be supported (not that the comments
anywhere tell you that, much less which ones).  In the second place,
it leads to unreadable code like this:
           /* partial phase */           count_agg_clauses(root, (Node *) partial_grouping_target->exprs,
             &agg_partial_costs, false, false, true);
 
           /* final phase */           count_agg_clauses(root, (Node *) target->exprs, &agg_final_costs,
            true, true, true);           count_agg_clauses(root, parse->havingQual, &agg_final_costs, true,
               true, true);
 

Good luck telling whether those falses and trues actually do what they
should.

I'm inclined to think that we should aggressively simplify here, perhaps
replacing all three of these flags with a three-way enum, say
PARTIALAGG_NORMAL    /* simple aggregation */PARTIALAGG_PARTIAL    /* lower phase of partial aggregation
*/PARTIALAGG_FINAL   /* upper phase of partial aggregation */
 

This embodies a couple of judgments that maybe someone would quibble with:
* we don't have any use for intermediate partial aggregation steps;
* we don't have any use for partial aggregation in which data transferred
up needn't be serialized.

But I can't really see how either of those would make sense for any
practical use-case, and I don't think we need to have a confusing and
bug-prone API in order to hold the door open to support them.

Comments?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Alex Ignatov
Date:
Subject: Re: Bug in to_timestamp().
Next
From: "David G. Johnston"
Date:
Subject: Re: Bug in to_timestamp().