Thread: Rethinking representation of partial-aggregate steps
I really dislike this aspect of the partial-aggregate patch: typedef struct Agg { ... bool combineStates; /* input tuples contain transition states */ bool finalizeAggs; /* shouldwe call the finalfn on agg states? */ bool serialStates; /* should agg states be (de)serialized? */ ... } Agg; Representing the various partial-aggregate behaviors as three bools seems like a bad idea to me. In the first place, this makes it look like there are as many as eight possible behaviors, whereas only a subset of those flag combinations are or ever will be supported (not that the comments anywhere tell you that, much less which ones). In the second place, it leads to unreadable code like this: /* partial phase */ count_agg_clauses(root, (Node *) partial_grouping_target->exprs, &agg_partial_costs, false, false, true); /* final phase */ count_agg_clauses(root, (Node *) target->exprs, &agg_final_costs, true, true, true); count_agg_clauses(root, parse->havingQual, &agg_final_costs, true, true, true); Good luck telling whether those falses and trues actually do what they should. I'm inclined to think that we should aggressively simplify here, perhaps replacing all three of these flags with a three-way enum, say PARTIALAGG_NORMAL /* simple aggregation */PARTIALAGG_PARTIAL /* lower phase of partial aggregation */PARTIALAGG_FINAL /* upper phase of partial aggregation */ This embodies a couple of judgments that maybe someone would quibble with: * we don't have any use for intermediate partial aggregation steps; * we don't have any use for partial aggregation in which data transferred up needn't be serialized. But I can't really see how either of those would make sense for any practical use-case, and I don't think we need to have a confusing and bug-prone API in order to hold the door open to support them. Comments? regards, tom lane
On Thu, Jun 23, 2016 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I really dislike this aspect of the partial-aggregate patch: > > typedef struct Agg > { > ... > bool combineStates; /* input tuples contain transition states */ > bool finalizeAggs; /* should we call the finalfn on agg states? */ > bool serialStates; /* should agg states be (de)serialized? */ > ... > } Agg; > > Representing the various partial-aggregate behaviors as three bools seems > like a bad idea to me. In the first place, this makes it look like there > are as many as eight possible behaviors, whereas only a subset of those > flag combinations are or ever will be supported (not that the comments > anywhere tell you that, much less which ones). In the second place, > it leads to unreadable code like this: > > /* partial phase */ > count_agg_clauses(root, (Node *) partial_grouping_target->exprs, > &agg_partial_costs, false, false, true); > > /* final phase */ > count_agg_clauses(root, (Node *) target->exprs, &agg_final_costs, > true, true, true); > count_agg_clauses(root, parse->havingQual, &agg_final_costs, true, > true, true); > > Good luck telling whether those falses and trues actually do what they > should. > > I'm inclined to think that we should aggressively simplify here, perhaps > replacing all three of these flags with a three-way enum, say > > PARTIALAGG_NORMAL /* simple aggregation */ > PARTIALAGG_PARTIAL /* lower phase of partial aggregation */ > PARTIALAGG_FINAL /* upper phase of partial aggregation */ > > This embodies a couple of judgments that maybe someone would quibble with: > * we don't have any use for intermediate partial aggregation steps; > * we don't have any use for partial aggregation in which data transferred > up needn't be serialized. > > But I can't really see how either of those would make sense for any > practical use-case, and I don't think we need to have a confusing and > bug-prone API in order to hold the door open to support them. Hmm, well I guess I would have to disagree with the idea that those other modes aren't useful. I seem to recall that David had some performance results showing that pushing partial aggregate steps below an Append node resulted in a noticeable speed-up even in the absence of any parallelism, presumably because it avoids whatever projection the Append might do, and also improves icache and dcache locality. So it seems quite possible that you might want to have something like this: FinalizeAggregate -> Gather -> IntermediateAggregate -> Append -> PartialAggregate -> Parallel SeqScan -> PartialAggregate -> Parallel SeqScan -> PartialAggregate -> Parallel SeqScan -> PartialAggregate -> Parallel SeqScan The partial aggregate plans need not serialize, because they are passing data to the same process, and the intermediate aggregate is there, too. I do agree, however, that the three Boolean flags don't make the code entirely easy to read. What I might suggest is that we replace the three Boolean flags with integer flags, something like this: #define AGGOP_COMBINESTATES 0x1 #define AGGOP_SERIALIZESTATES 0x2 #define AGGOP_FINALIZEAGGS 0x4 #define AGGTYPE_SIMPLE (AGGOP_FINALIZEAGGS) #define AGGTYPE_PARTIAL_SERIAL (AGGOP_SERIALIZESTATES) #define AGGTYPE_FINALIZE_SERIAL (AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS|AGG_SERIALIZESTATES) Code that requests behavior can do so with the AGGTYPE_* constants, but code that implements behavior can test AGGOP_* constants. Then if we decide we need new combinations in the future, we can just add those --- e.g. AGGTYPE_PARTIAL = 0, AGGTYPE_FINALIZE = AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS, AGGTYPE_INTERMEDIATE = AGGOP_COMBINESTATES --- and have some hope that it will mostly work. I actually think nearly all of the combinations are sensible, here, although I think it may have been a mistake to overload the "serialize states" flag to mean both "does this combining aggregate expect to receive the serial type rather than the transition type?" and also "does this partial aggregate output the serial type rather than the transition type?". In the example above, you'd want the IntermediateAggregate to expect the transition type but output the serial type. Oops. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 23, 2016 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm inclined to think that we should aggressively simplify here, perhaps >> replacing all three of these flags with a three-way enum, say >> >> PARTIALAGG_NORMAL /* simple aggregation */ >> PARTIALAGG_PARTIAL /* lower phase of partial aggregation */ >> PARTIALAGG_FINAL /* upper phase of partial aggregation */ >> >> This embodies a couple of judgments that maybe someone would quibble with: >> * we don't have any use for intermediate partial aggregation steps; >> * we don't have any use for partial aggregation in which data transferred >> up needn't be serialized. > Hmm, well I guess I would have to disagree with the idea that those > other modes aren't useful. I seem to recall that David had some > performance results showing that pushing partial aggregate steps below > an Append node resulted in a noticeable speed-up even in the absence > of any parallelism, presumably because it avoids whatever projection > the Append might do, and also improves icache and dcache locality. I don't believe that for one second, because introducing another layer of intermediate aggregation implies another projection step, plus all the other overhead of a Plan node. Also, if we are going to support intermediate partial aggregation, the current representation is broken anyway, since it lacks any ability to distinguish serialization for the input states from serialization for the output states, which is something you'd certainly need to distinguish in this type of plan structure. > I do agree, however, that the three Boolean flags don't make the code > entirely easy to read. What I might suggest is that we replace the > three Boolean flags with integer flags, something like this: Yeah, that's another way we could go. I had been considering a variant of that, which was to assign specific code values to the enum constants and then invent macros that did bit-anding tests on them. That ends up being just about what you propose except that the compiler understands the enum-ness of the behavioral alternatives, which seems like a good thing. regards, tom lane
On Thu, Jun 23, 2016 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jun 23, 2016 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm inclined to think that we should aggressively simplify here, perhaps >>> replacing all three of these flags with a three-way enum, say >>> >>> PARTIALAGG_NORMAL /* simple aggregation */ >>> PARTIALAGG_PARTIAL /* lower phase of partial aggregation */ >>> PARTIALAGG_FINAL /* upper phase of partial aggregation */ >>> >>> This embodies a couple of judgments that maybe someone would quibble with: >>> * we don't have any use for intermediate partial aggregation steps; >>> * we don't have any use for partial aggregation in which data transferred >>> up needn't be serialized. > >> Hmm, well I guess I would have to disagree with the idea that those >> other modes aren't useful. I seem to recall that David had some >> performance results showing that pushing partial aggregate steps below >> an Append node resulted in a noticeable speed-up even in the absence >> of any parallelism, presumably because it avoids whatever projection >> the Append might do, and also improves icache and dcache locality. > > I don't believe that for one second, because introducing another layer of > intermediate aggregation implies another projection step, plus all the > other overhead of a Plan node. Sure, but aggregating as early as possible will often have the effect of dramatically reducing the number of tuples that have to pass through upper levels of the plan tree, which seems it would frequently far outweigh those considerations. Anyway, you can go back and find what he posted about this previously and look at it for yourself; that's probably better than having an argument with me about his test results. > Also, if we are going to support intermediate partial aggregation, the > current representation is broken anyway, since it lacks any ability > to distinguish serialization for the input states from serialization > for the output states, which is something you'd certainly need to > distinguish in this type of plan structure. I mentioned that in the very same email to which you were replying when you wrote this, which makes me wonder whether you bothered to read the whole thing. >> I do agree, however, that the three Boolean flags don't make the code >> entirely easy to read. What I might suggest is that we replace the >> three Boolean flags with integer flags, something like this: > > Yeah, that's another way we could go. I had been considering a variant > of that, which was to assign specific code values to the enum constants > and then invent macros that did bit-anding tests on them. That ends up > being just about what you propose except that the compiler understands > the enum-ness of the behavioral alternatives, which seems like a good > thing. I don't object to that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I do agree, however, that the three Boolean flags don't make the code > > entirely easy to read. What I might suggest is that we replace the > > three Boolean flags with integer flags, something like this: > > Yeah, that's another way we could go. I had been considering a variant > of that, which was to assign specific code values to the enum constants > and then invent macros that did bit-anding tests on them. That ends up > being just about what you propose except that the compiler understands > the enum-ness of the behavioral alternatives, which seems like a good > thing. Isn't that what you said not to do in https://www.postgresql.org/message-id/13345.1462383078@sss.pgh.pa.us ? ISTM that in Robert's proposal it is allowed for a "flags" value to have an OR'ed combination of multiple individual flags. Unless you're proposing to enumerate each allowed combination? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23 June 2016 at 18:31, Robert Haas <robertmhaas@gmail.com> wrote:
--
Sure, but aggregating as early as possible will often have the effect
of dramatically reducing the number of tuples that have to pass
through upper levels of the plan tree, which seems it would frequently
far outweigh those considerations.
Agreed
I can imagine plans where a useful aggregation occurs before every join, so N > 2 is easily possible.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> Yeah, that's another way we could go. I had been considering a variant >> of that, which was to assign specific code values to the enum constants >> and then invent macros that did bit-anding tests on them. That ends up >> being just about what you propose except that the compiler understands >> the enum-ness of the behavioral alternatives, which seems like a good >> thing. > Isn't that what you said not to do in > https://www.postgresql.org/message-id/13345.1462383078@sss.pgh.pa.us ? No. What I'm imagining is, say, #define AGGOP_COMBINESTATES 0x1 #define AGGOP_SERIALIZESTATES 0x2 #define AGGOP_DESERIALIZESTATES 0x4 #define AGGOP_FINALIZEAGGS 0x8 typedef enum AggPartialMode { AGGPARTIAL_SIMPLE = AGGOP_FINALIZEAGGS, AGGPARTIAL_PARTIAL = AGGOP_SERIALIZESTATES, AGGPARTIAL_FINAL = AGGOP_COMBINESTATES| AGGOP_DESERIALIZESTATES | AGGOP_FINALIZEAGGS } AggPartialMode; #define DO_AGGPARTIAL_COMBINE(apm) (((apm) & AGGOP_COMBINESTATES) != 0) #define DO_AGGPARTIAL_SERIALIZE(apm) (((apm) & AGGOP_SERIALIZESTATES) != 0) #define DO_AGGPARTIAL_DESERIALIZE(apm) (((apm) & AGGOP_DESERIALIZESTATES) != 0) #define DO_AGGPARTIAL_FINALIZE(apm) (((apm) & AGGOP_FINALIZEAGGS) != 0) These enum constants satisfy the properties I mentioned before, but their assigned values are chosen to make the macros cheap. regards, tom lane
Tom Lane wrote: > What I'm imagining is, say, > > #define AGGOP_COMBINESTATES 0x1 > #define AGGOP_SERIALIZESTATES 0x2 > #define AGGOP_DESERIALIZESTATES 0x4 > #define AGGOP_FINALIZEAGGS 0x8 > > typedef enum AggPartialMode > { > AGGPARTIAL_SIMPLE = AGGOP_FINALIZEAGGS, > AGGPARTIAL_PARTIAL = AGGOP_SERIALIZESTATES, > AGGPARTIAL_FINAL = AGGOP_COMBINESTATES | AGGOP_DESERIALIZESTATES | AGGOP_FINALIZEAGGS > } AggPartialMode; > > #define DO_AGGPARTIAL_COMBINE(apm) (((apm) & AGGOP_COMBINESTATES) != 0) > #define DO_AGGPARTIAL_SERIALIZE(apm) (((apm) & AGGOP_SERIALIZESTATES) != 0) > #define DO_AGGPARTIAL_DESERIALIZE(apm) (((apm) & AGGOP_DESERIALIZESTATES) != 0) > #define DO_AGGPARTIAL_FINALIZE(apm) (((apm) & AGGOP_FINALIZEAGGS) != 0) > > > These enum constants satisfy the properties I mentioned before, but their > assigned values are chosen to make the macros cheap. Ah, sure, that makes sense. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 24 June 2016 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Hmm, well I guess I would have to disagree with the idea that those >> other modes aren't useful. I seem to recall that David had some >> performance results showing that pushing partial aggregate steps below >> an Append node resulted in a noticeable speed-up even in the absence >> of any parallelism, presumably because it avoids whatever projection >> the Append might do, and also improves icache and dcache locality. > > I don't believe that for one second, because introducing another layer of > intermediate aggregation implies another projection step, plus all the > other overhead of a Plan node. It's certainly not difficult to mock up a test to prove it is faster. create table t1 (a int not null); insert into t1 select generate_series(1,1000000); create table t2 (a int not null); insert into t2 select generate_series(1,1000000); select sum(c) from (select count(*) c from t1 union all select count(*) from t2) t; sum ---------2000000 (1 row) Time: 82.038 ms select count(*) from (select * from t1 union all select * from t2) t; count ---------2000000 (1 row) Time: 180.540 ms -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24 June 2016 at 05:12, Robert Haas <robertmhaas@gmail.com> wrote: > I do agree, however, that the three Boolean flags don't make the code > entirely easy to read. What I might suggest is that we replace the > three Boolean flags with integer flags, something like this: > > #define AGGOP_COMBINESTATES 0x1 > #define AGGOP_SERIALIZESTATES 0x2 > #define AGGOP_FINALIZEAGGS 0x4 > > #define AGGTYPE_SIMPLE (AGGOP_FINALIZEAGGS) > #define AGGTYPE_PARTIAL_SERIAL (AGGOP_SERIALIZESTATES) > #define AGGTYPE_FINALIZE_SERIAL > (AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS|AGG_SERIALIZESTATES) > > Code that requests behavior can do so with the AGGTYPE_* constants, > but code that implements behavior can test AGGOP_* constants. Then if > we decide we need new combinations in the future, we can just add > those --- e.g. AGGTYPE_PARTIAL = 0, AGGTYPE_FINALIZE = > AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS, AGGTYPE_INTERMEDIATE = > AGGOP_COMBINESTATES --- and have some hope that it will mostly work. > > I actually think nearly all of the combinations are sensible, here, > although I think it may have been a mistake to overload the "serialize > states" flag to mean both "does this combining aggregate expect to > receive the serial type rather than the transition type?" and also > "does this partial aggregate output the serial type rather than the > transition type?". In the example above, you'd want the > IntermediateAggregate to expect the transition type but output the > serial type. Oops. I did consider using bit flags for this before, and it's a little bit of an accident that it didn't end up that way. Originally there were just two bool flags to control finalize and combine. A later patch added the serialisation stuff which required the 3rd flag. It then became a little untidy, but I hesitated to change it as I really just wanted to keep the patch as easy to review as possible. In hindsight, if I should've probably used bit flags from day one. As for the above proposal, I do agree that it'll be cleaner with bit flags, I just really don't see the need for the AGGTYPE_* alias macros. For me it's easier to read if each option is explicitly listed similar to how pull_var_clause() is done, e.g: List *vars = pull_var_clause((Node *) cur_em->em_expr, PVC_RECURSE_AGGREGATES | PVC_RECURSE_WINDOWFUNCS | PVC_INCLUDE_PLACEHOLDERS); I'll start by writing the code to do that much, and if the consensus is to add the alias macros too, then it's a small addition. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > As for the above proposal, I do agree that it'll be cleaner with bit > flags, I just really don't see the need for the AGGTYPE_* alias > macros. For me it's easier to read if each option is explicitly listed > similar to how pull_var_clause() is done, e.g: That does not sound to me like it does anything to address the issue of documenting which combinations of flags are sensible/supported. I prefer something like the enum approach I suggested further downthread. regards, tom lane
On 24 June 2016 at 05:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > #define AGGOP_COMBINESTATES 0x1 > #define AGGOP_SERIALIZESTATES 0x2 > #define AGGOP_DESERIALIZESTATES 0x4 > #define AGGOP_FINALIZEAGGS 0x8 > > typedef enum AggPartialMode > { > AGGPARTIAL_SIMPLE = AGGOP_FINALIZEAGGS, > AGGPARTIAL_PARTIAL = AGGOP_SERIALIZESTATES, > AGGPARTIAL_FINAL = AGGOP_COMBINESTATES | AGGOP_DESERIALIZESTATES | AGGOP_FINALIZEAGGS > } AggPartialMode; > > #define DO_AGGPARTIAL_COMBINE(apm) (((apm) & AGGOP_COMBINESTATES) != 0) > #define DO_AGGPARTIAL_SERIALIZE(apm) (((apm) & AGGOP_SERIALIZESTATES) != 0) > #define DO_AGGPARTIAL_DESERIALIZE(apm) (((apm) & AGGOP_DESERIALIZESTATES) != 0) > #define DO_AGGPARTIAL_FINALIZE(apm) (((apm) & AGGOP_FINALIZEAGGS) != 0) The attached implements this, with the exception that I didn't really think AggPartialMode was the best name for the enum. I've named this AggregateMode instead, as the aggregate is only partial in some cases. I also appended _SERIAL and _DESERIAL to the end of the partial and final modes for now, as likely in 10.0 we'll be performing some partial non-serialized aggregation and we'll likely want to keep those other names for that instead. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > The attached implements this, with the exception that I didn't really > think AggPartialMode was the best name for the enum. I've named this > AggregateMode instead, as the aggregate is only partial in some cases. Hm. We already have an AggStrategy (for hashed vs. grouped aggregation) so adding AggregateMode beside it seems somewhere between confusing and content-free. And it's needlessly inconsistent with the spelling of the existing enum name. I'm not wedded to "AggPartialMode" but I think we need some name that's a bit more specific than "AggregateMode". Suggestions anyone? regards, tom lane
I wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> The attached implements this, with the exception that I didn't really >> think AggPartialMode was the best name for the enum. I've named this >> AggregateMode instead, as the aggregate is only partial in some cases. > Hm. We already have an AggStrategy (for hashed vs. grouped aggregation) > so adding AggregateMode beside it seems somewhere between confusing and > content-free. And it's needlessly inconsistent with the spelling of the > existing enum name. I'm not wedded to "AggPartialMode" but I think > we need some name that's a bit more specific than "AggregateMode". > Suggestions anyone? After a bit of thought, maybe AggDivision or AggSplit or something along those lines? regards, tom lane
On 26 June 2016 at 04:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> David Rowley <david.rowley@2ndquadrant.com> writes: >>> The attached implements this, with the exception that I didn't really >>> think AggPartialMode was the best name for the enum. I've named this >>> AggregateMode instead, as the aggregate is only partial in some cases. > >> Hm. We already have an AggStrategy (for hashed vs. grouped aggregation) >> so adding AggregateMode beside it seems somewhere between confusing and >> content-free. And it's needlessly inconsistent with the spelling of the >> existing enum name. I'm not wedded to "AggPartialMode" but I think >> we need some name that's a bit more specific than "AggregateMode". >> Suggestions anyone? > > After a bit of thought, maybe AggDivision or AggSplit or something > along those lines? How about AggCompletion? It's seems to fit well in the sense of the aggregation being partial or not, but less well when you consider serialisation and combining states. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On 26 June 2016 at 04:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> After a bit of thought, maybe AggDivision or AggSplit or something >> along those lines? > How about AggCompletion? It's seems to fit well in the sense of the > aggregation being partial or not, but less well when you consider > serialisation and combining states. After having worked on the patch some, I think that AggSplit is a pretty good choice, because it is about how we split up the calculation of an aggregate. And it's short, which is a good thing for something that's going to be a component of assorted names. What I've got at the moment looks like: /* Primitive options supported by nodeAgg.c: */ #define AGGSPLITOP_COMBINE 0x1 /* substitute combinefn for transfn */ #define AGGSPLITOP_SERIALIZE 0x2 /* apply serializefn to output */ #define AGGSPLITOP_DESERIALIZE 0x4 /* apply deserializefn to input */ #define AGGSPLITOP_FINALIZE 0x8 /* run finalfn */ /* Supported operating modes (i.e., useful combinations of these options): */ typedef enum AggSplit { /* Basic, non-split aggregation: */ AGGSPLIT_SIMPLE = AGGSPLITOP_FINALIZE, /* Initial phase of partial aggregation,with serialization: */ AGGSPLIT_PARTIAL_SERIAL = AGGSPLITOP_SERIALIZE, /* Final phase of partial aggregation,with deserialization: */ AGGSPLIT_FINAL_DESERIAL = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE | AGGSPLITOP_FINALIZE } AggSplit; /* Test macros for the primitive options: */ #define DO_AGGSPLIT_COMBINE(as) (((as) & AGGSPLITOP_COMBINE) != 0) #define DO_AGGSPLIT_SERIALIZE(as) (((as) & AGGSPLITOP_SERIALIZE) != 0) #define DO_AGGSPLIT_DESERIALIZE(as) (((as) & AGGSPLITOP_DESERIALIZE) != 0) #define DO_AGGSPLIT_FINALIZE(as) (((as) & AGGSPLITOP_FINALIZE) != 0) Looking at this in the light of morning, I'm rather strongly tempted to invert the sense of the FINALIZE option, so that "simple" mode works out as zero, ie, select no options. Maybe call it SKIPFINAL instead of FINALIZE? regards, tom lane
On 27 June 2016 at 03:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Looking at this in the light of morning, I'm rather strongly tempted to > invert the sense of the FINALIZE option, so that "simple" mode works out > as zero, ie, select no options. Maybe call it SKIPFINAL instead of > FINALIZE? Aggref calls this aggpartial, and I was tempted to invert that many times and make it aggfinalize, but in the end didn't. It seems nicer to me to keep it as a list of things that are done, rather than to make one exception to that just so we can have the simple mode as 0. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On 27 June 2016 at 03:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Looking at this in the light of morning, I'm rather strongly tempted to >> invert the sense of the FINALIZE option, so that "simple" mode works out >> as zero, ie, select no options. Maybe call it SKIPFINAL instead of >> FINALIZE? > Aggref calls this aggpartial, and I was tempted to invert that many > times and make it aggfinalize, but in the end didn't. > It seems nicer to me to keep it as a list of things that are done, > rather than to make one exception to that just so we can have the > simple mode as 0. [ shrug... ] I do not buy that argument, because it doesn't justify the COMBINE option: why shouldn't that be inverted, ie USEFINALFUNC? The only way to decide that except by fiat is to say that we're enumerating the non-default or non-simple-mode behaviors. regards, tom lane
I wrote: > [ shrug... ] I do not buy that argument, because it doesn't justify > the COMBINE option: why shouldn't that be inverted, ie USEFINALFUNC? Sorry, I meant USETRANSFUNC of course. regards, tom lane
On 27 June 2016 at 08:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> On 27 June 2016 at 03:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Looking at this in the light of morning, I'm rather strongly tempted to >>> invert the sense of the FINALIZE option, so that "simple" mode works out >>> as zero, ie, select no options. Maybe call it SKIPFINAL instead of >>> FINALIZE? > >> Aggref calls this aggpartial, and I was tempted to invert that many >> times and make it aggfinalize, but in the end didn't. >> It seems nicer to me to keep it as a list of things that are done, >> rather than to make one exception to that just so we can have the >> simple mode as 0. > > [ shrug... ] I do not buy that argument, because it doesn't justify > the COMBINE option: why shouldn't that be inverted, ie USEFINALFUNC? > The only way to decide that except by fiat is to say that we're > enumerating the non-default or non-simple-mode behaviors. hmm ok. I guess what exists today is there because I tried to make the options "on" for additional tasks which aggregate must perform. Whether to use transfunc or combine func is more of an alternative than additional, so I have let what is standard tiebreak the decision on which way around to have that flag. So perhaps you're right and we should just normalise the flag to the standard aggregate behaviour to keep it consistent. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 27 June 2016 at 03:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After having worked on the patch some, I think that AggSplit is a pretty > good choice, because it is about how we split up the calculation of an > aggregate. And it's short, which is a good thing for something that's > going to be a component of assorted names. I wish to thank you for working hard to improve this area of the code. I had a read over your changes and I think there's quite a large problem with your code which realy needs some more thinking: I can't help wonder how plan to allow future expansions of non-serialized partial aggregates giving that in setrefs.c you're making a hard assumption that mark_partial_aggref() should always receive the SERIAL versions of the aggsplit. This outright wrong as the Agg node may not be a serialized aggregate node at all. The attached very rough patch hacks together some code which has the planner generate some partial aggregates which are not serialized. Here's what I get: CREATE AGGREGATE myavg (numeric) ( stype = internal, sfunc = numeric_avg_accum, finalfunc = numeric_avg, combinefunc = numeric_avg_combine ); create table a (num numeric not null); insert into a select generate_series(1,10); select myavg(num) from a; ERROR: invalid memory alloc request size 18446744072025250716 This is down to the aggtype being set to BYTEA regardless of the fact that it's not a serialized aggregate. Passing the non-serial versions of the enum to mark_partial_aggref() makes this case work, but that's certainly not the fix. David -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
David Rowley <david.rowley@2ndquadrant.com> writes: > I can't help wonder how plan to allow future expansions of > non-serialized partial aggregates giving that in setrefs.c you're > making a hard assumption that mark_partial_aggref() should always > receive the SERIAL versions of the aggsplit. What I was imagining, but didn't bother to implement immediately, is that we could pass down the appropriate AggSplit value from the plan node (using the context argument for the mutator function). planner.c likewise needs a bit more plumbing to generate such plan nodes in the first place, so I didn't feel that setrefs.c had to be smarter right now. regards, tom lane