Thread: [PATCH] Negative Transition Aggregate Functions (WIP)
David Rowley <dgrowleyml@gmail.com> writes:
> The attached patch is not quite finished yet, I've not yet fully covered
> SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.

			regards, tom lane
<p dir="ltr"><br /> On 14 Dec 2013 15:40, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br/> ><br /> > David Rowley <<a href="mailto:dgrowleyml@gmail.com">dgrowleyml@gmail.com</a>> writes:<br/> > > The attached patch is not quite finished yet, I've not yet fully covered<br /> > > SUM and AVGfor all types.<br /> ><br /> > I think you *can't* cover them for the float types; roundoff error<br /> > wouldmean you don't get the same answers as before.<br /><p dir="ltr">I was going to say the same thing. But then I startedto wonder.... What's so special about the answers we used to give? They are also subject to round off and the resultsare already quite questionable in those cases.
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Dec 15, 2013 at 12:48 PM, Greg Stark <span dir="ltr"><<ahref="mailto:stark@mit.edu" target="_blank">stark@mit.edu</a>></span> wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><pdir="ltr"><br /> On 14 Dec 2013 15:40, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>>wrote:<br /> ><br /> > David Rowley <<a href="mailto:dgrowleyml@gmail.com"target="_blank">dgrowleyml@gmail.com</a>> writes:<br /> > > The attached patchis not quite finished yet, I've not yet fully covered<br /> > > SUM and AVG for all types.<br /> ><br /> >I think you *can't* cover them for the float types; roundoff error<br /> > would mean you don't get the same answersas before.<br /></div></div><p dir="ltr">I was going to say the same thing. But then I started to wonder.... What'sso special about the answers we used to give? They are also subject to round off and the results are already quitequestionable in those cases.</blockquote></div>I guess they probably shouldn't be, subject to rounding / influencedby errors from tuples that are out of scope of the aggregate context.</div><div class="gmail_extra">Though sayingthat it would be a shame to have this optimisation for all but float and double. I can imagine the questions in [GENERAL]..Why is SUM(<int>) OVER ().. fast but SUM(<float>) OVER () slow? I wonder what other RDBMS' do here...</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Regards</div><div class="gmail_extra"><br /></div><divclass="gmail_extra">David Rowley</div></div>
Greg Stark <stark@mit.edu> writes:
> On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I think you *can't* cover them for the float types; roundoff error
>> would mean you don't get the same answers as before.

> I was going to say the same thing. But then I started to wonder.... What's
> so special about the answers we used to give? They are also subject to
> round off and the results are already quite questionable in those cases.

Well, we can't easily do better than the old answers, and the new ones
might be arbitrarily worse. Example: sum or average across single-row
windows ought to be exact in any case, but it might be arbitrarily wrong
with the negative-transition technique.

More generally, this is supposed to be a performance enhancement only;
it's not supposed to change the results.

This consideration also makes me question whether we should apply the
method for NUMERIC. Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code. I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.

			regards, tom lane
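[For concreteness, the dscale effect Tom describes can be reproduced with plain numeric arithmetic in any psql session — an illustrative sketch, not part of the patch:

    -- Adding and then subtracting 3.45 is numerically exact, but the
    -- result picks up 3.45's display scale:
    SELECT 2::numeric + 3.45 - 3.45;   -- returns 2.00
    SELECT 2::numeric;                 -- a fresh aggregation would return 2

So a window sum computed by add-then-subtract could show extra trailing zeroes compared to summing the surviving rows from scratch.]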
On 12/14/2013 05:00 PM, Tom Lane wrote:
> This consideration also makes me question whether we should apply the
> method for NUMERIC. Although in principle numeric addition/subtraction
> is exact, such a sequence could leave us with a different dscale than
> is returned by the existing code. I'm not sure if changing the number of
> trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature. Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration. What's the
worst-case drift? In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> I think even the FLOAT case deserves some consideration. What's the
> worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s). Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

    Table   correct sum of      negative-transition
            this + next value   result
    1e20    1e20                1e20 + 1 = 1e20
    1       1                   1e20 - 1e20 + 0 = 0
    0

> In general, folks who do aggregate operations on
> FLOATs aren't expecting an exact answer, or one which is consistent
> beyond a certain number of significant digits.

Au contraire. People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation. They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

> And Dave is right: how many bug reports would we get about "NUMERIC is
> fast, but FLOAT is slow"?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer. I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.

			regards, tom lane
David Rowley <dgrowleyml@gmail.com> writes:
> On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> If we're going to disqualify NUMERIC too, we might as well bounce the
>> feature. Without a fast FLOAT or NUMERIC, you've lost most of the
>> target audience.

> I think the feature is worth it alone if we could improve COUNT(*).

Agreed, count(*) and operations on integers are alone enough to be
worth the trouble.

			regards, tom lane
I wrote:
> It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

    Table   correct sum of      negative-transition
            this + next value   result
    1e20    1e20                1e20 + 1 = 1e20
    1       1                   1e20 - 1e20 + 0 = 0
    0       0                   0 - 1 + 0 = -1
    0       1                   -1 - 0 + 1 = 0
    1

Those last few answers are completely corrupt.

			regards, tom lane
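[The loss of significance here is easy to reproduce with ordinary float8 arithmetic — a quick psql illustration, independent of the patch:

    -- 1 is far below the ulp of 1e20, so it vanishes if added first:
    SELECT (1e20::float8 + 1) - 1e20::float8;   -- 0: "sum, then subtract"
    SELECT (1e20::float8 - 1e20::float8) + 1;   -- 1: "subtract first"

Once the 1 has been absorbed into 1e20, no later subtraction can recover it, which is exactly why the inverse-transition result diverges from the correct window sum.]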
David Rowley <dgrowleyml@gmail.com> writes:
> I guess the answer for the people that complain about slowness could be
> that they create their own aggregate function which implements float4pl as
> the trans function and float4mi as the negative trans function. They can
> call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
> the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).

> As for numeric, I did start working on this just after I posted the
> original patch and before I saw your comment about it. I did end up making
> do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
> but I got stuck on the equivalent of when do_numeric_accum()
> does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?

			regards, tom lane
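[A minimal sketch of what such a documentation example might look like, assuming the CREATE AGGREGATE option ends up spelled INVFUNC (the name discussed later in this thread); the transition functions are the stock float4pl/float4mi David names:

    CREATE AGGREGATE sum_fast_but_wrong (float4)
    (
        sfunc    = float4pl,   -- forward transition: state + value
        stype    = float4,
        invfunc  = float4mi,   -- inverse transition: state - value
        initcond = '0'
    );

As Tom's 1e20 example shows, the inverse step cannot restore digits that the forward step already rounded away, hence the name.]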
David Rowley <dgrowleyml@gmail.com> writes:
> I've attached an updated patch which includes some documentation.
> I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
> an ok name for the option, but if anyone has any better ideas please let
> them be known.

I'd be a bit inclined to build the terminology around "reverse" instead of
"negative" --- the latter seems a bit too arithmetic-centric. But that's
just MHO.

			regards, tom lane
<p dir="ltr">On Dec 15, 2013 6:44 PM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br/> > David Rowley <<a href="mailto:dgrowleyml@gmail.com">dgrowleyml@gmail.com</a>> writes:<br /> > >I've attached an updated patch which includes some documentation.<br /> > > I've also added support for negfuncin CREATE AGGREGATE. Hopefully that's<br /> > > an ok name for the option, but if anyone has any better ideasplease let<br /> > > them be known.<br /> ><br /> > I'd be a bit inclined to build the terminology around"reverse" instead of<br /> > "negative" --- the latter seems a bit too arithmetic-centric. But that's<br /> >just MHO.<p dir="ltr">To contribute to the bike shedding, inverse is often used in similar contexts.<p dir="ltr">--<br/> Ants Aasma
On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > I've attached an updated patch which includes some documentation.
> > I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
> > an ok name for the option, but if anyone has any better ideas please let
> > them be known.
>
> I'd be a bit inclined to build the terminology around "reverse" instead of
> "negative" --- the latter seems a bit too arithmetic-centric. But that's
> just MHO.

To contribute to the bike shedding, inverse is often used in similar contexts.
--
Ants Aasma
On Mon, Dec 16, 2013 at 08:39:33PM +1300, David Rowley wrote:
> On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
>> To contribute to the bike shedding, inverse is often used in similar
>> contexts.
>
> I guess it's not really bike shedding, most of the work I hope is done, so
> I might as well try to get the docs polished up and we'd need a consensus
> on what we're going to call them before I can get that done.
>
> I like both of these better than negative transition function and I agree
> negative implies arithmetic rather than opposite.
> Out of these 2 I do think inverse fits better than reverse, so I guess that
> would make it "inverse aggregate transition function".
> Would that make the CREATE AGGREGATE option be INVFUNC?
>
> Any other ideas or +1's for any of the existing ones?

+1 for inverse.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 12/16/2013 08:39 AM, David Rowley wrote:
> On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
>> To contribute to the bike shedding, inverse is often used in similar
>> contexts.
>
> Out of these 2 I do think inverse fits better than reverse, so I guess that
> would make it "inverse aggregate transition function".
> Would that make the CREATE AGGREGATE option be INVFUNC?
>
> Any other ideas or +1's for any of the existing ones?

+1, inverse good :)

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 12/15/2013 03:57 AM, Tom Lane wrote:
> I've said this before, but: we can make it arbitrarily fast if we don't
> have to get the right answer. I'd rather get "it's slow" complaints
> than "this is the wrong answer" complaints.

There's another technique we could use which doesn't need a negative
transition function, assuming the order you feed the values to the
aggregate function doesn't matter: keep subtotals. For example, if the
window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and then
1 + 2 + 7 = 10. Next, 1 leaves the window, and 5 enters it. Now you
calculate 2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around, you
saved one addition compared to calculating 2 + 3 + 4 + 5 from scratch.

The negative transition function is a lot simpler and faster for count(*)
and integer operations, so we probably should implement that anyway. But
the subtotals technique could be very useful for other data types.

- Heikki
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Dec 16, 2013 at 9:39 PM, Heikki Linnakangas <spandir="ltr"><<a href="mailto:hlinnakangas@vmware.com" target="_blank">hlinnakangas@vmware.com</a>></span> wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><divclass="h5"><br /></div></div> There's another technique we could use which doesn't need a negative transitionfunction, assuming the order you feed the values to the aggreate function doesn't matter: keep subtotals. For example,if the window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and then 1 + 2 + 7 = 10. Next, 1 leavesthe window, and 5 enters it. Now you calculate 2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around, you savedone addition compared to calculating 2 + 3 + 4 + 5 from scratch.<br /><br /> The negative transition function is a lotsimpler and faster for count(*) and integer operations, so we probably should implement that anyway. But the subtotalstechnique could be very useful for other data types.<span class=""><font color="#888888"><br /><br /> - Heikki<br/></font></span></blockquote></div><br /></div><div class="gmail_extra">That's quite interesting. I guess we wouldneed another flag in pg_aggregate to mark if the order of the tuples matters, string_agg would be an example of onethat would have to skip this.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">At least for ROWS BETWEENCURRENT ROW AND UNBOUNDED FOLLOWING if the aggregate order did not matter than it would likely be quite efficientjust to aggregate from the bottom up and materialise the results at each tuple and store it in the tuple store,then just use that materialised value when that tuple is processed. This method won't work when the frame base is notfixed.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">I had been thinking ahead on how to improveMIN and MAX cases too. I came up with something called "tuple store indexes" that could be build as binary searchtrees with a composite index on the tuple position and the aggregate's sort operator... Something similar to how thefollowing query could use an index on (id,value) to calculate max()</div><div class="gmail_extra">select max(value) fromtest where id between 1 and 100;<br /></div><div class="gmail_extra">It's certainly not something for this patch, butit was an idea I came up with which I think would be possible without adding any more columns to pg_aggregate.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Regards</div><div class="gmail_extra"><br/></div><div class="gmail_extra">David Rowley</div><div class="gmail_extra"><br /></div></div>
One thing that is currently on my mind is what to do when passing volatile
functions to the aggregate. Since the number of times we execute a volatile
function will much depend on the window frame options, I think we should
include some sort of warning in the documentation that "The number of times
that the expression is evaluated within the aggregate function when used in
the context of a WINDOW is undefined". The other option would be to disable
this optimisation if the aggregate expression contained a volatile
function, but doing this seems a bit weird to me: is anyone actually going
to be depending on a volatile function being executed so many times?
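[The sort of query at issue, as a hypothetical illustration: random() is volatile, so whether each row's expression is evaluated once, or re-evaluated each time the frame head moves past it, is visible in the output:

    SELECT i,
           sum(random()) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    FROM generate_series(1, 5) AS g(i);

With an inverse transition function, each input would be evaluated exactly once (added once, subtracted once); without one, rows are re-aggregated every time the frame head advances.]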
David Rowley <dgrowleyml@gmail.com> writes:
> One thing that is currently on my mind is what to do when passing volatile
> functions to the aggregate. Since the number of times we execute a volatile
> function will much depend on the window frame options, I think we should
> include some sort of warning in the documentation that "The number of times
> that the expression is evaluated within the aggregate function when used in
> the context of a WINDOW is undefined". The other option would be to disable
> this optimisation if the aggregate expression contained a volatile
> function, but doing this seems a bit weird to me: is anyone actually going
> to be depending on a volatile function being executed so many times?

Once again: this patch has no business changing any user-visible behavior.
That would include not changing the number of evaluations of volatile
functions. The planner is full of places where optimizations are disabled
for volatile subexpressions, and I don't see why this should be different.

			regards, tom lane
On Sat, Dec 14, 2013 at 8:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> I was going to say the same thing. But then I started to wonder.... What's
>> so special about the answers we used to give? They are also subject to
>> round off and the results are already quite questionable in those cases.
>
> Well, we can't easily do better than the old answers, and the new ones
> might be arbitrarily worse. Example: sum or average across single-row
> windows ought to be exact in any case, but it might be arbitrarily wrong
> with the negative-transition technique.
>
> More generally, this is supposed to be a performance enhancement only;
> it's not supposed to change the results.
>
> This consideration also makes me question whether we should apply the
> method for NUMERIC. Although in principle numeric addition/subtraction
> is exact, such a sequence could leave us with a different dscale than
> is returned by the existing code. I'm not sure if changing the number of
> trailing zeroes is a big enough behavior change to draw complaints.

I tend to think it is. I'm not sure if it's worth it, but you could
finesse this problem with a more complex transition state - keep track
of how many values with any given scale are part of the current window.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 17, 2013 at 1:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Once again: this patch has no business changing any user-visible behavior.
> That would include not changing the number of evaluations of volatile
> functions. The planner is full of places where optimizations are disabled
> for volatile subexpressions, and I don't see why this should be different.

My point was meant to be more along the lines of that I thought it was
already broken and it perhaps should be fixed, or at the very least we
could warn the users about it.

I would imagine that most of those other places in the planner are to
prevent extra evaluations of volatile functions? In this particular case
we're already evaluating these multiple extra times when a tuple moves off
the top of the frame. I would have thought that we should only evaluate
the volatile function once per tuple; this is not what the current
implementation does.

I don't have an issue skipping this optimisation when the aggregate's
expression contains any volatile functions. I just wanted to raise my
concerns about the current behaviour, which I find a bit bizarre.
On Mon, Dec 16, 2013 at 9:36 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> +1, inverse good :)

In the attached patch I've renamed negative to inverse. I've also disabled
the inverse functions when an expression in an aggregate contains a
volatile function.
On Sat, December 21, 2013 10:53, David Rowley wrote:
> [inverse_transition_functions_v1.1.patch.gz]

Hi,

I know, $subject says "WIP", but I assumed it's supposed to compile, so I
tried to get this to run on linux (Centos 5.0), gcc 4.8.2, with

./configure --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse \
  --with-pgport=6554 --enable-depend --enable-cassert --enable-debug \
  --with-openssl --with-perl --with-libxml --with-libxslt --with-ossp-uuid

but although applying the patch and configure were OK, the compile went awry:

[...]
make[4]: Entering directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend'
make[4]: Nothing to be done for `submake-errcodes'.
make[4]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend'
make[3]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/common'
make -C catalog schemapg.h
make[3]: Entering directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend/catalog'
cd ../../../src/include/catalog && '/home/aardvark/perl-5.19/bin/perl' ./duplicate_oids
1220
1221
1800
1801
1970
1971
make[3]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend/catalog'
make[2]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend'
make[1]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src'
-- [2013.12.21 11:35:29 inverse] make contrib
dblink.c:55:28: fatal error: utils/fmgroids.h: No such file or directory
 #include "utils/fmgroids.h"
                            ^
compilation terminated.
make[1]: *** [dblink.o] Error 1
make: *** [all-dblink-recurse] Error 2
-- make returned 2 - abort

(although that output mentions perl 5.19, I also tried with regular perl
5.18, cassert on/off, etc.; every combination yielded this same compile
error)

Thanks,

Erik Rijkers
On Sat, Dec 21, 2013 at 11:47 PM, Erik Rijkers <er@xs4all.nl> wrote:
> I know, $subject says "WIP", but I assumed it's supposed to compile, so I
> tried to get this to run on linux (Centos 5.0), gcc 4.8.2
> <snip>
> cd ../../../src/include/catalog && '/home/aardvark/perl-5.19/bin/perl' ./duplicate_oids
> 1220
> 1221
> 1800
> 1801
> 1970
> 1971

Thanks for the report. It seems the windows build does not check for
duplicate OIDs and the linux one does.

I'll just need to change the OIDs of the new functions I've added, only it
seems the unused_oids perl script does not like windows.

I'll try to get you a working patch shortly.

Regards

David Rowley
On Sat, December 21, 2013 12:38, David Rowley wrote:
> [inverse_transition_functions_v1.2.patch.gz]
> Please find attached an updated patch which should remove the duplicate OID
> problem you saw.

That fixes it, thanks.

There is 1 of 141 failed tests:

     window    ... FAILED

but that's easily ignored for now. I'll do some testing later.

Thanks

Erik Rijkers
On Sat, December 21, 2013 12:52, David Rowley wrote:
> On Sun, Dec 22, 2013 at 12:49 AM, Erik Rijkers <er@xs4all.nl> wrote:
>> There is 1 of 141 failed tests:
>>
>>      window    ... FAILED
>
> That's strange, it passes here.
>
> Would you be able to send me the regression diff file?

attached...
On Sun, Dec 22, 2013 at 1:01 AM, Erik Rijkers <er@xs4all.nl> wrote:
> attached...

Thanks. This was just down to some missing trailing whitespace on the
expected results. I've fixed these up and attached another patch.

Regards

David Rowley
On Wed, December 25, 2013 14:49, David Rowley wrote:
> [inverse_transition_functions_v1.5.patch.gz]

I ran into the following problem which is, I think, NOT a problem with your
patch but with my setup. Still, if anyone can enlighten me on its cause I'd
be thankful (it shows up every now and then for me).

I have a 200 GB dev database running under 9.4devel that I thought I could
now, for test purposes, compile a patched postgres binary for (i.e.: a HEAD
+ inverse_transition_functions_v1.5.patch.gz binary), so as to avoid an
initdb and use the existing 200 GB data.

I know the patch is valid, because a separately built, newly initdb-ed
instance runs the below statement without fail. But running from the
existing 200 GB dev database I get:

$ echo "\\set VERBOSITY verbose \\\\ SELECT SUM(n::integer) OVER (ORDER BY n
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM generate_series(1,
100) g(n)" | psql
ERROR:  XX000: cache lookup failed for type 0
LOCATION:  get_typlenbyval, lsyscache.c:1895
Time: 2.752 ms

Is there a way I can use the existing database and avoid both initdb and
this error?

Thanks,

Erik Rijkers

See also:
[1] http://www.postgresql.org/message-id/flat/E5F4C5A18CAB7A4DA23080DE9CE8158603E67AA1@eaubrmw001.eapac.ericsson.se#E5F4C5A18CAB7A4DA23080DE9CE8158603E67AA1@eaubrmw001.eapac.ericsson.se
"Erik Rijkers" <er@xs4all.nl> writes: > I have a 200 GB dev database running under 9.4devel that I thought I could now, for test purposes, compile a patched > postgres binary for (i.e.: a HEAD + inverse_transition_functions_v1.5.patch.gz binary), so as to avoid an initdb and use > the existing 200 GB data. That's not going to work for any patch that touches the initial contents of the system catalogs. > Is there a way I can use the existing database and avoid both initdb and this error? Use pg_upgrade to move the data into a newly initdb'd directory. Note you will need both unpatched and patched executables for this. regards, tom lane
From what I can tell adding an inverse transition function to support AVG for numeric does not affect the number of trailing zeros in the results, so I've attached a patch which now has inverse transition functions to support numeric types in AVG and all of the STDDEV* aggregates.
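[A quick way to eyeball the trailing-zeros behaviour David mentions — a hedged check query, assuming a build with the patch applied; compare the output against an unpatched server:

    SELECT a,
           avg(b) OVER (ORDER BY a ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
    FROM (VALUES (1, 1.0), (2, 2.00), (3, 3.000)) AS t(a, b);

If the inverse-transition path preserves dscale, both servers should print identical numbers of trailing zeros.]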
On Thu, January 2, 2014 13:05, David Rowley wrote:
> here's a slightly updated patch
> [inverse_transition_functions_v1.8.patch.gz]

patch applies, and compiles (although with new warnings).
But make check complains loudly: see attached.

warnings:

In file included from gram.y:13871:0:
scan.c: In function '...':
scan.c:10167:23: warning: unused variable 'yyg' [-Wunused-variable]
  struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be unused depending upon options. */
                        ^
In file included from ../../../../src/include/utils/probes.h:6:0,
                 from ../../../../src/include/pg_trace.h:15,
                 from tuplesort.c:109:
tuplesort.c: In function '...':
tuplesort.c:935:44: warning: comparison between pointer and integer [enabled by default]
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);
                                            ^
tuplesort.c:935:2: note: in expansion of macro 'TRACE_POSTGRESQL_SORT_DONE'
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);

I didn't yet use the patch you sent after this patch.

Thanks,

Erik Rijkers
2013/12/15 David Rowley <dgrowleyml@gmail.com>:
> I've been working on speeding up aggregate functions when used in the
> context of a window's with non fixed frame heads.
> 1. Fully implement negative transition functions for SUM and AVG.

I would like to mention that this functionality is also extremely useful
to have for the incremental maintenance of materialized views that use
aggregation (which IMHO is one of the most useful kinds). (Simply imagine
a view of the form “SELECT a, agg_function(b) FROM table GROUP BY a”, a
lot of rows in the table, a lot of rows in each group, and changes that
both remove and add new rows.)

For this to work, two things are needed:

(1) A way to apply a value normally (already supported) and inversely
(i.e., this patch) to the current “internal state” of an aggregation.

(2) A way to store the “internal state” of an aggregation in the
materialized view’s “extent” (i.e., the physical rows that represent the
view’s contents, which may or may not be slightly different from what you
get when you do SELECT * FROM matview). As (AFAIK) that state is stored
as a normal value, the maintenance code could just take the value, store
it in the extent, and next time retrieve it again and perform normal or
inverse transitions. When selecting from the matview, the state could be
retrieved, and the final function applied to it to yield the value to be
returned.

To understand (2), assume that one wants to store an AVG() in a
materialized view; to be able to update the value incrementally, one needs
to actually store the SUM() and COUNT(), and perform the division when
selecting from the materialized view.

Or it could (initially) be decided to define AVG() as “not supporting fast
incremental maintenance,” and require the user (if he/she wants fast
incremental maintenance) to put SUM() and COUNT() in the materialized view
manually, and perform the division manually when wanting to retrieve the
average.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
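[A sketch of the manual workaround Nicolas describes for AVG(), with hypothetical table and column names:

    -- Store the invertible pieces, not the average itself:
    CREATE MATERIALIZED VIEW t_agg AS
        SELECT a, sum(b) AS sum_b, count(b) AS count_b
        FROM t
        GROUP BY a;

    -- Perform the division when selecting from the materialized view:
    SELECT a, sum_b / count_b AS avg_b FROM t_agg;

SUM and COUNT can each be updated incrementally with a forward or inverse transition as rows are added or removed, whereas a stored AVG value alone cannot.]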
On Thu, January 2, 2014 13:36, Erik Rijkers wrote:
> On Thu, January 2, 2014 13:05, David Rowley wrote:
>> here's a slightly updated patch
>> [inverse_transition_functions_v1.8.patch.gz]
>
> patch applies, and compiles (although with new warnings).
> But make check complains loudly: see attached.
>
> warnings:

The TRACE_POSTGRESQL_SORT_DONE warnings were not from your patch; sorry
about that. They occur on HEAD too (with a debug compile).

tuplesort.c:935:44: warning: comparison between pointer and integer [enabled by default]
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);
                                            ^
tuplesort.c:935:2: note: in expansion of macro 'TRACE_POSTGRESQL_SORT_DONE'
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);

The 'make check' failure remains a problem. The output I sent earlier
today was for this configure:

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.inverse --with-pgport=6594 \
  --bindir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/bin \
  --libdir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/lib \
  --quiet --enable-depend --enable-cassert --enable-debug --with-perl \
  --with-openssl --with-libxml --enable-dtrace

(and that's still repeatable)

Perhaps this helps: with another configure:

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.inverse --with-pgport=6594 \
  --bindir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/bin.fast \
  --libdir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/lib.fast \
  --quiet --enable-depend --with-perl --with-openssl --with-libxml

I get only this single 'make check' error:

*** /var/data1/pg_stuff/pg_sandbox/pgsql.inverse/src/test/regress/expected/window.out  2014-01-02 16:19:48.000000000 +0100
--- /var/data1/pg_stuff/pg_sandbox/pgsql.inverse/src/test/regress/results/window.out   2014-01-02 16:21:43.000000000 +0100
***************
*** 1188,1195 ****
   sum
  ------
   6.01
!  5
!  3
  (3 rows)

  SELECT i,COUNT(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
--- 1188,1195 ----
   sum
  ------
   6.01
!  5.00
!  3.00
  (3 rows)

  SELECT i,COUNT(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

======================================================================

Centos 5.7
gcc 4.8.2

Thanks; and Happy New Year

Erik Rijkers
"Erik Rijkers" <er@xs4all.nl> writes: > The TRACE_POSTGRESQL_SORT_DONE warnings were not from your patch; sorry about that. They occur on HEAD too (with a debug > compile). > tuplesort.c:935:44: warning: comparison between pointer and integer [enabled by default] > TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed); > ^ > tuplesort.c:935:2: note: in expansion of macro � > TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed); FWIW, I don't see any such warnings on either of the machines I have that will accept --enable-dtrace. state->tapeset is a pointer, so these warnings look a tad bogus. regards, tom lane
On Thu, January 2, 2014 17:33, Erik Rijkers wrote:
> On Thu, January 2, 2014 13:36, Erik Rijkers wrote:
>> On Thu, January 2, 2014 13:05, David Rowley wrote:
>>> here's a slightly updated patch
>>> [inverse_transition_functions_v1.8.patch.gz]
>>
>> patch applies, and compiles (although with new warnings).
>> But make check complains loudly

To figure out where this 'make check' failed, I lifted a few statements
from the offending sql: src/test/regress/sql/window.sql (see snafu.sh
below). That reliably crashes the server. It is caused by a SUM, but only
when configured like this (i.e. *not* configured for speed):

$ pg_config --configure
'--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse'
'--bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse/bin'
'--libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse/lib'
'--with-pgport=6554' '--enable-depend' '--enable-cassert' '--enable-debug'
'--with-openssl' '--with-perl' '--with-libxml' '--with-libxslt'
'--with-ossp-uuid' '--with-zlib'

$ cat snafu.sh
#!/bin/sh
echo "
--
-- WINDOW FUNCTIONS
--
drop TABLE if exists empsalary;
-- CREATE TEMPORARY TABLE empsalary (
CREATE TABLE empsalary (
    depname varchar,
    empno bigint,
    salary int,
    enroll_date date
);
INSERT INTO empsalary VALUES
('develop', 10, 5200, '2007-08-01'),
('sales', 1, 5000, '2006-10-01'),
('personnel', 5, 3500, '2007-12-10'),
('sales', 4, 4800, '2007-08-08'),
('personnel', 2, 3900, '2006-12-23'),
('develop', 7, 4200, '2008-01-01'),
('develop', 9, 4500, '2008-01-01'),
('sales', 3, 4800, '2007-08-01'),
('develop', 8, 6000, '2006-10-01'),
('develop', 11, 5200, '2007-08-15');
-- SELECT depname, empno, salary, sum(salary) OVER (PARTITION BY depname)
-- FROM empsalary ORDER BY depname, salary;
" | psql
echo "select * from empsalary;" | psql
echo "
SELECT depname, empno, salary, sum(salary) OVER (PARTITION BY depname)
FROM empsalary ORDER BY depname, salary;
" | psql

$ ./snafu.sh
Timing is on.
DROP TABLE
Time: 1.093 ms
CREATE TABLE
Time: 80.161 ms
INSERT 0 10
Time: 11.964 ms
Timing is on.
  depname  | empno | salary | enroll_date
-----------+-------+--------+-------------
 develop   |    10 |   5200 | 2007-08-01
 sales     |     1 |   5000 | 2006-10-01
 personnel |     5 |   3500 | 2007-12-10
 sales     |     4 |   4800 | 2007-08-08
 personnel |     2 |   3900 | 2006-12-23
 develop   |     7 |   4200 | 2008-01-01
 develop   |     9 |   4500 | 2008-01-01
 sales     |     3 |   4800 | 2007-08-01
 develop   |     8 |   6000 | 2006-10-01
 develop   |    11 |   5200 | 2007-08-15
(10 rows)

Time: 1.854 ms
Timing is on.
connection to server was lost

So, to repeat, this runs fine on a server compiled for speed. I haven't
looked any further (whether perhaps more statements are faulty).
On Fri, January 3, 2014 00:09, Erik Rijkers wrote:
> connection to server was lost
>
> So, to repeat, this runs fine on a server compiled for speed.

I forgot to append the log messages:

2014-01-03 00:19:17.073 CET 14054 LOG:  database system is ready to accept connections
TRAP: FailedAssertion("!(!((bool) ((invtransfn_oid) != ((Oid) 0))))", File: "parse_agg.c", Line: 1255)
2014-01-03 00:19:29.605 CET 14054 LOG:  server process (PID 14143) was terminated by signal 6: Aborted
2014-01-03 00:19:29.605 CET 14054 DETAIL:  Failed process was running: SELECT depname, empno, salary, sum(salary) OVER (PARTITION BY depname)
	FROM empsalary ORDER BY depname, salary;
2014-01-03 00:19:29.605 CET 14054 LOG:  terminating any other active server processes
2014-01-03 00:19:29.607 CET 14054 LOG:  all server processes terminated; reinitializing

etc. etc.
On 15 December 2013 01:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've said this before, but: we can make it arbitrarily fast if we don't
> have to get the right answer. I'd rather get "it's slow" complaints
> than "this is the wrong answer" complaints.

Hi,

Reading over this, I realised that there is a problem with NaN
handling --- once the state becomes NaN, it can never recover. So the
results using the inverse transition function don't match HEAD in
cases like this:

create table t(a int, b numeric);
insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
select a, b,
       sum(b) over(order by a rows between 1 preceding and current row)
  from t;

which in HEAD produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 |   7
(5 rows)

but with this patch produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 | NaN
(5 rows)

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Reading over this, I realised that there is a problem with NaN
> handling --- once the state becomes NaN, it can never recover. So the
> results using the inverse transition function don't match HEAD in
> cases like this:

Ouch! That takes out numeric, float4, and float8 in one fell swoop.

Given the relative infrequency of NaNs in most data, it seems like
it might still be possible to get a speedup if we could use inverse
transitions until we hit a NaN, then do it the hard way until the
NaN is outside the window, then go back to inverse transitions.
I'm not sure though if this is at all practical from an implementation
standpoint. We certainly don't want the core code knowing about
anything as datatype-specific as a NaN, but maybe the inverse transition
function could have an API that allows reporting "I can't do it here,
fall back to the hard way".

			regards, tom lane
On Jan 9, 2014, at 17:15, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> Reading over this, I realised that there is a problem with NaN
>> handling --- once the state becomes NaN, it can never recover.
>
> Ouch! That takes out numeric, float4, and float8 in one fell swoop.

For numeric, it seems that this could be overcome by having the state be
a pair (s numeric, n numeric). s would track the sum of non-NaN summands
and n would track the number of NaN summands. The final function would
return NaN if n > 0 and s otherwise. The pair could be represented as a
value of type numeric[] to avoid having to invent a new type for this.

For float4 and float8, wasn't the consensus that the potential lossy-ness
of addition makes this impossible anyway, even without the NaN issue?
But...

> Given the relative infrequency of NaNs in most data, it seems like
> it might still be possible to get a speedup if we could use inverse
> transitions until we hit a NaN, then do it the hard way until the
> NaN is outside the window, then go back to inverse transitions.
> I'm not sure though if this is at all practical from an implementation
> standpoint. We certainly don't want the core code knowing about
> anything as datatype-specific as a NaN, but maybe the inverse transition
> function could have an API that allows reporting "I can't do it here,
> fall back to the hard way".

... that sounds like it might be possible to make things work for float4
and float8 after all, if we can determine whether a particular addition
was lossy or not.

best regards,
Florian Pflug
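[Florian's pair-of-numerics idea can be sketched today as a user-defined aggregate — illustrative only, with hypothetical names; under the patch, an inverse counterpart would simply decrement instead of increment:

    -- State is ARRAY[sum of non-NaN inputs, count of NaN inputs].
    CREATE FUNCTION nan_safe_sum_trans(state numeric[], val numeric)
    RETURNS numeric[] AS $$
        SELECT CASE WHEN val = 'NaN'::numeric
                    THEN ARRAY[state[1], state[2] + 1]
                    ELSE ARRAY[state[1] + val, state[2]] END;
    $$ LANGUAGE sql STRICT;

    -- Report NaN only while at least one NaN remains in the aggregated set.
    CREATE FUNCTION nan_safe_sum_final(state numeric[])
    RETURNS numeric AS $$
        SELECT CASE WHEN state[2] > 0 THEN 'NaN'::numeric ELSE state[1] END;
    $$ LANGUAGE sql STRICT;

    CREATE AGGREGATE nan_safe_sum (numeric)
    (
        sfunc     = nan_safe_sum_trans,
        stype     = numeric[],
        finalfunc = nan_safe_sum_final,
        initcond  = '{0,0}'
    );

Because NaN inputs are counted rather than absorbed into the sum, removing the last NaN from the window lets the state recover, fixing Dean's example.]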
Florian Pflug <fgp@phlo.org> writes:
> For float4 and float8, wasn't the consensus that the potential
> lossy-ness of addition makes this impossible anyway, even without the
> NaN issue? But...

Well, that was my opinion, I'm not sure if it was consensus ;-).

But NaN is an orthogonal problem I think. I'm not sure whether it has
analogues in other data types.

			regards, tom lane
On Jan 9, 2014, at 18:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> For float4 and float8, wasn't the consensus that the potential
>> lossy-ness of addition makes this impossible anyway, even without the
>> NaN issue? But...
>
> Well, that was my opinion, I'm not sure if it was consensus ;-).

I'd say your example showing how it could produce completely bogus
results was pretty convincing...

> But NaN is an orthogonal problem I think. I'm not sure whether it
> has analogues in other data types.

Transition functions which are partially invertible are not that
uncommon, I'd say. Browsing through 9.3's list of aggregate functions,
the following come to mind:

max()
  Values smaller than the maximum can be removed; removing the current
  maximum requires a rescan. By remembering the N largest values, the
  number of required rescans can be reduced, but never fully eliminated.
  Same works for min().

bool_or()
  FALSE can be removed; removing TRUE requires a rescan. Could be made
  fully invertible by counting the number of TRUE and FALSE values,
  similar to my suggestion for how to handle NaN for sum(numeric).
  Same works for bool_and().

bit_or()
  Like bool_or(), 0 can be removed; everything else requires a rescan.
  Same works for bit_and().

Plus, any aggregate with a strict transition function would be in exactly
the same situation regarding NULL as sum(numeric) is regarding NaN.
AFAIK we don't have any such aggregate in core, though.

best regards,
Florian Pflug
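[The counting trick Florian mentions for bool_or(), sketched as a stock user-defined aggregate with hypothetical names; the inverse transition under the patch would simply subtract:

    -- State is the number of TRUE inputs seen; OR is true iff it is > 0.
    CREATE FUNCTION counting_or_trans(state int8, val bool)
    RETURNS int8 AS $$ SELECT state + val::int $$ LANGUAGE sql STRICT;

    CREATE FUNCTION counting_or_final(state int8)
    RETURNS bool AS $$ SELECT state > 0 $$ LANGUAGE sql STRICT;

    CREATE AGGREGATE counting_bool_or (bool)
    (
        sfunc     = counting_or_trans,
        stype     = int8,
        finalfunc = counting_or_final,
        initcond  = '0'
    );

One behavioural difference of this sketch: over zero rows it returns false rather than the NULL that the built-in bool_or() would return.]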
On 10 January 2014 08:12, David Rowley <dgrowleyml@gmail.com> wrote:
> Nice catch! Thanks for having a look at the patch.
>
> Ok, so I thought about this and I don't think it's too big a problem to
> fix at all. I think it can be handled very similarly to how I'm taking
> care of NULL values in the window frame. For these, I simply keep a count
> of them in an int64 and when the last one leaves the aggregate context
> things can continue as normal.
>
> Lucky for us that all numeric aggregation (and now inverse aggregation)
> goes through 2 functions, do_numeric_accum() and the new inverse version
> of it, do_numeric_discard(); both these functions operate on a
> NumericAggState, in which in the attached I've changed the isNaN bool
> field to a NaNCount int64 field. I'm just doing NaNCount++ when we get a
> NaN value in do_numeric_accum and NaNCount-- in do_numeric_discard(); in
> the final functions I'm just checking if NaNCount > 0.

Cool, that sounds like a neat fix.

> Though this implementation does fix the reported problem, unfortunately
> it may have an undesired performance impact for numeric aggregate
> functions when not used in the context of a window. Let me explain what
> I mean:
>
> Previously there was some code in do_numeric_accum() which did:
>
>     if (state->isNaN || NUMERIC_IS_NAN(newval))
>     {
>         state->isNaN = true;
>         return;
>     }
>
> which meant that it didn't bother adding new perfectly valid numerics to
> the aggregate totals when there was a NaN encountered previously. I had
> to change this to continue on regardless, as we still need to keep the
> totals just in case all the NaN values are removed and the totals are
> required once again. This means that the non-window versions of
> SUM(numeric) and AVG(numeric) and the stddev aggregates for numeric pay
> a price and have to keep on totalling after encountering NaN values. :(

I suspect that NaNs almost never occur in practice, so the fact that it
might now take longer to tell you that the sum is NaN doesn't worry me.
More important is that it always gives the right answer.

Note, if anyone ever did this for floats, +/- Infinity would also need to
be handled, so you'd have to maintain 3 counts and deal with logic like
Infinity - Infinity = NaN.

Regards,
Dean
On Jan 10, 2014, at 09:34, David Rowley <dgrowleyml@gmail.com> wrote:
> I just don't quite know yet the best way for the inverse transition
> function to communicate this to the caller. If you have any ideas on the
> best way to do this then I'd really like to hear them.

Could they maybe just return NULL as the new state? It would mean that
aggregates that do want to provide an inverse transition function couldn't
use NULL as a valid aggregate state, but do we need to support that?

Looking at the code it seems that for quite a few existing aggregates,
the state remains NULL until the first non-NULL input is processed. But
that doesn't hurt much - those aggregates can remain as they are until
someone wants to add an inverse transition function. Once that happens,
there's a choice between living with needless rescans on trailing NULL
inputs or changing the state type.

This solution isn't particularly pretty, but I don't currently see a good
alternative that allows implementing inverse transition functions in
something other than C and avoids needless overhead for those which are
written in C.

best regards,
Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > On Jan10, 2014, at 09:34 , David Rowley <dgrowleyml@gmail.com> wrote: >> I just don't quite know the best way for the inverse transition function to communicate this to the caller yet. If you have any ideas on the best way to do this then I'd really like to hear them. > Could they maybe just return NULL as the new state? It would mean that > aggregates that do want to provide an inverse transition function couldn't > use NULL as a valid aggregate state, but do we need to support that? Yeah, I was going to suggest the same. Seems like it wouldn't be that difficult to come up with some alternative representation for "everything seen so far was NULL", if you needed to. > Looking at the code it seems that for quite a few existing aggregates, > the state remains NULL until the first non-NULL input is processed. But > that doesn't hurt much - those aggregates can remain as they are until > someone wants to add an inverse transfer function. Once that happens, > there's a choice between living with needless rescans on trailing NULL > inputs or changing the state type. Also, it might be reasonable for both the regular and the inverse transition functions to be strict. If a null entering the window does not matter, then a null exiting the window doesn't either, no? regards, tom lane
On Jan10, 2014, at 15:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > Florian Pflug <fgp@phlo.org> writes: >> Looking at the code it seems that for quite a few existing aggregates, >> the state remains NULL until the first non-NULL input is processed. But >> that doesn't hurt much - those aggregates can remain as they are until >> someone wants to add an inverse transfer function. Once that happens, >> there's a choice between living with needless rescans on trailing NULL >> inputs or changing the state type. > > Also, it might be reasonable for both the regular and the inverse > transition functions to be strict. If a null entering the window > does not matter, then a null exiting the window doesn't either, no? That's not true, I think, unless we're special-casing strict transition functions somewhere. AFAICS, an aggregate with a strict transition function will produce the state NULL whenever any of the inputs was NULL, i.e. we won't ever transition out of the NULL state once we got there. The inverse transition function, however, would *have* to be able to transition away from the NULL state, which requires it to be non-strict. Otherwise, how would the state ever become non-NULL after the last NULL input leaves the window? best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > On Jan10, 2014, at 15:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, it might be reasonable for both the regular and the inverse >> transition functions to be strict. If a null entering the window >> does not matter, then a null exiting the window doesn't either, no? > That's not true, I think, unless we're special-casing strict transition > functions somewhere. AFAICS, an aggregate with a strict transition function > will produce the state NULL whenever any of the inputs was NULL, i.e. we won't > ever transition out of the NULL state once we got there. Nope, not the case; read xaggr.sgml and/or the CREATE AGGREGATE reference page. An aggregate with a strict transition function essentially just ignores null input rows. I suspect the inverse transition function could just be made strict with a similar special-case rule (viz, keep the old transition value when deleting a null input from the window); but maybe I'm missing something and it has to work harder than that anyway. regards, tom lane
Florian Pflug <fgp@phlo.org> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Florian Pflug <fgp@phlo.org> writes: >>> For float4 and float8, wasn't the consensus that the potential >>> lossy-ness of addition makes this impossible anyway, even >>> without the NaN issue? But... >> >> Well, that was my opinion, I'm not sure if it was consensus ;-). > > I'd say your example showing how it could produce completely > bogus results was pretty convincing... Aggregates on approximate (floating-point) numbers are not nearly as consistent as many people probably assume. Picture for a minute a table where a column contains positive floating point numbers that happen to be located in the heap in increasing order, perhaps as the result of a CLUSTER on an index on the column. SELECT sum(colname) FROM tablename; would tend to give the most accurate answer possible when a sequential scan was run -- unless there happened to be a seqscan already half-way through the heap. Then the result would be different. So the exact same query against the exact same data, with no intervening modifications or maintenance activity, could give one answer most of the time, and give various other answers depending on concurrent SELECT queries. Given that this is already the case with aggregates on floating point approximate numbers, why should we rule out an optimization which only makes rounding errors more likely to be visible? The real issue here is that if you are using an approximate data type and expecting exact answers, you will have problems. That's not to say that approximations are useless. If you represent the circumference of the earth with a double precision number you're dealing with an expected rounding error of about a foot. That's close enough for many purposes. The mistake is assuming that it will be exact or that rounding errors cannot accumulate. In situations where SQL does not promise particular ordering of operations, it should not be assumed; so any expectations of a specific or repeatable result from a sum or average of approximate numbers is misplaced. >> But NaN is an orthogonal problem I think. Agreed. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
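Kevin's order-sensitivity point is easy to demonstrate outside the database; this small stand-alone C program (illustrative, not from the thread) sums the same three doubles in two different orders:

    #include <stdio.h>

    int
    main(void)
    {
        double a = 1e20, b = 1.0, c = -1e20;

        /* 1.0 is below the spacing of doubles near 1e20, so it is absorbed */
        printf("%g\n", (a + b) + c);    /* prints 0 */

        /* cancel the large terms first and the 1.0 survives */
        printf("%g\n", (a + c) + b);    /* prints 1 */
        return 0;
    }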
Kevin Grittner <kgrittn@ymail.com> writes: > Aggregates on approximate (floating-point) numbers are not nearly > as consistent as many people probably assume. Picture for a minute > a table where a column contains positive floating point numbers > that happen to be located in the heap in increasing order, perhaps as > the result of a CLUSTER on an index on the column. SELECT > sum(colname) FROM tablename; would tend to give the most accurate > answer possible when a sequential scan was run -- unless there > happened to be a seqscan already half-way through the heap. Then > the result would be different. I don't think that argument holds any water. In the first place, somebody could turn off synchronize_seqscans if they needed to have the calculation done the same way every time (and I recall questions from users who ended up doing exactly that, shortly after we introduced synchronize_seqscans). In the second place, for most use-cases it'd be pretty foolish to rely on physical heap order, so somebody who was really trying to sum float8s accurately would likely do select sum(x order by x) from ... This is a well-defined, numerically stable calculation, and I don't want to see us put in non-defeatable optimizations that break it. > The real issue here is that if you are using an approximate data type > and expecting exact answers, you will have problems. That's a canard. People who know what they're doing (admittedly a minority) do not expect exact answers, but they do expect to be able to specify how to do the calculation in a way that minimizes roundoff errors. The inverse-transition-function approach breaks that, and it does so at a level where the user can't work around it, short of building his own aggregates. regards, tom lane
I wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> The real issue here is that if you are using an approximate data type >> and expecting exact answers, you will have problems. > That's a canard. People who know what they're doing (admittedly a > minority) do not expect exact answers, but they do expect to be able to > specify how to do the calculation in a way that minimizes roundoff errors. > The inverse-transition-function approach breaks that, and it does so at a > level where the user can't work around it, short of building his own > aggregates. Although, having said that ... maybe "build your own aggregate" would be a reasonable suggestion for people who need this? I grant that it's going to be a minority requirement, maybe even a small minority requirement. People who have the chops to get this sort of thing right can probably manage a custom aggregate definition. The constraint this would pose on the float4 and float8 implementations is that it be possible to use their transition and final functions in a custom aggregate declaration while leaving off the inverse function; or, if that combination doesn't work for some reason, we have to continue to provide the previous transition/final functions for use in user aggregates. Suitable documentation would be needed too, of course. regards, tom lane
On Jan10, 2014, at 18:14 , Kevin Grittner <kgrittn@ymail.com> wrote: > Given that this is already the case with aggregates on floating > point approximate numbers, why should we rule out an optimization > which only makes rounding errors more likely to be visible? The > real issue here is that if you are using an approximate data type > and expecting exact answers, you will have problems. Because without the optimization, only the values which you *actually* process for a given result determine whether you lose precision or not. With the optimization, OTOH, values which have *nothing* to do with the result in question can nevertheless make it completely bogus. SUM() is a good example. As long as all your values are positive, the amount of precision you lose is bound by the number of input values. If I sum over 10 values, the worst that can happen is that the first value is large enough to prevent the other 9 values from influencing the result. That limits the relative error to something like 9*epsilon, where epsilon is the relative precision of the floating point type, i.e. 1e-15 or so for double. In other words, as long as your frames are less than 10e13 rows long, the relative error will stay below 1%. But with the optimization, that is no longer true. If you sum from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error of the result in one row now depends on the magnitude of values *preceding* that row, even though those values aren't in the frame. And since we now internally subtract, not only add, the relative error is no longer bound by the number of rows in the frame. Here's the corresponding SELECT (which is basically the same as Tom's example upthread):

select n, x::float, sum(x::float) over (
  order by n rows between current row and unbounded following
)
from (values (1, 1e20), (2, 1), (3, 2)) as t(n, x)
order by n;

Currently that returns

 n |   x   |  sum
---+-------+-------
 1 | 1e+20 | 1e+20
 2 |     1 |     3
 3 |     2 |     2

but with an inverse transfer function, it may very well return

 n |   x   |  sum
---+-------+-------
 1 | 1e+20 | 1e+20
 2 |     1 |     0
 3 |     2 |    -1

> That's not to say that approximations are useless. If you > represent the circumference of the earth with a double precision > number you're dealing with an expected rounding error of about a > foot. That's close enough for many purposes. The mistake is > assuming that it will be exact or that rounding errors cannot > accumulate. In situations where SQL does not promise particular > ordering of operations, it should not be assumed; so any > expectations of a specific or repeatable result from a sum or > average of approximate numbers is misplaced. But this isn't about ordering, it's replacing one computation with a completely different one that just happens to be equivalent *algebraically*. To me, the proposed optimization for float is akin to a C compiler which decided to evaluate a + b + c + … + z as -a + (2a - b) + (2b - c) + … + (2y - z) + 2z. Algebraically, these are the same, but it'd still be insane to do that. best regards, Florian Pflug
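The failure mode Florian describes can be reproduced with a plain running sum that adds and subtracts, which is what an inverse transition function for sum(float8) would amount to; this illustrative C fragment mirrors the query above:

    #include <stdio.h>

    int
    main(void)
    {
        double state = 0.0;

        state += 1e20;  /* row 1 enters the frame */
        state += 1.0;   /* row 2: absorbed, state is still 1e20 */
        state += 2.0;   /* row 3: absorbed as well */
        state -= 1e20;  /* inverse transition: row 1 leaves the frame */

        printf("%g\n", state);  /* prints 0, not the true remaining sum 3 */
        return 0;
    }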
On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Kevin Grittner <kgrittn@ymail.com> writes: >>> The real issue here is that if you are using an approximate data type >>> and expecting exact answers, you will have problems. > >> That's a canard. People who know what they're doing (admittedly a >> minority) do not expect exact answers, but they do expect to be able to >> specify how to do the calculation in a way that minimizes roundoff errors. >> The inverse-transition-function approach breaks that, and it does so at a >> level where the user can't work around it, short of building his own >> aggregates. > > Although, having said that ... maybe "build your own aggregate" would > be a reasonable suggestion for people who need this? I grant that > it's going to be a minority requirement, maybe even a small minority > requirement. People who have the chops to get this sort of thing right > can probably manage a custom aggregate definition. So we'd put a footgun into the hands of people who don't know what they're doing, to be fired for performance's sake, and leave it to the people who know what they are doing to put the safety on? best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Although, having said that ... maybe "build your own aggregate" would >> be a reasonable suggestion for people who need this? I grant that >> it's going to be a minority requirement, maybe even a small minority >> requirement. People who have the chops to get this sort of thing right >> can probably manage a custom aggregate definition. > So we'd put a footgun into the hands of people who don't know what they're > doing, to be fired for performance's sake, and leave it to the people > who know what they are doing to put the safety on? If I may put words in Kevin's mouth, I think his point is that having float8 sum() at all is a foot-gun, and that's hard to deny. You need to know how to use it safely. A compromise might be to provide these alternative "safer" aggregates built-in. Or, depending on what color you like your bikeshed, leave the standard aggregates alone and define "fast_sum" etc for the less safe versions. In any case it'd be incumbent on us to document the tradeoffs. regards, tom lane
Conceptually, when aggregating over floating point numbers, there is some infinitely precise theoretical value, and the computation is approximating it. Call the infinitely precise value 'r'. Call the computed value 'c', which is the result of the aggregation function. (For float4_agg and float8_agg, typeof(c) == float8).
The problem you have with rewinding an aggregate is that you don't know if you are getting the same value of c that you would have gotten from a rescan. But if you have a type that tracks a margin [min,max] where typeof(min) == typeof(max) is higher precision than typeof(c), then you can track:
min <= r <= max
By setting the rounding mode down, then up, when computing the next value of min and max, respectively. (Extra flag bits or booleans could track whether you've encountered +inf, -inf, Nan, and any other oddball cases, with corresponding special logic that has been discussed already upthread.) In many but not all cases:
min != max
but
(typeof(c))min == (typeof(c))max
Because the margin of error is small enough not to render different values when cast to the lower precision typeof(c).
You could rewind the aggregation whenever this second case holds, and only force a rescan when it does not. This would render the same results for queries whether they were performed with rewinds or with rescans. The results might differ from older versions of postgres, but only in that they might be more accurate, with less accumulated rounding errors, owing to the higher precision state transition variable.
For many modern platforms, typeof(min) could be __float128 using libquadmath, or something similar to that. If not available at compile time, it could be float64 instead. Even then, you'd still know that rewinding was possible when min == max and not otherwise, which is useful for cases of aggregation over exact values.
I admit I've done a bit of handwaving on the computation of the margin and the handling of floating-point rounding issues, but I believe the implementation details are tractable.
mark
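A rough sketch of the margin tracking Mark proposes, using long double as the stand-in higher-precision type and C99 rounding-mode control; it assumes the platform supports FE_DOWNWARD and FE_UPWARD, and all names here are illustrative:

    #include <fenv.h>

    #pragma STDC FENV_ACCESS ON

    typedef struct
    {
        long double min;  /* rounded down at every step, so min <= r */
        long double max;  /* rounded up at every step, so r <= max */
    } MarginState;

    static void
    margin_accum(MarginState *state, double val)
    {
        int saved = fegetround();

        fesetround(FE_DOWNWARD);
        state->min += val;
        fesetround(FE_UPWARD);
        state->max += val;
        fesetround(saved);
    }

    static int
    rewind_is_safe(const MarginState *state)
    {
        /* both bounds collapse to the same double: the result is unambiguous */
        return (double) state->min == (double) state->max;
    }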
On 1/10/14, 1:07 PM, Tom Lane wrote: > Florian Pflug <fgp@phlo.org> writes: >> On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Although, having said that ... maybe "build your own aggregate" would >>> be a reasonable suggestion for people who need this? I grant that >>> it's going to be a minority requirement, maybe even a small minority >>> requirement. People who have the chops to get this sort of thing right >>> can probably manage a custom aggregate definition. >> So we'd put a footgun into the hands of people who don't know what they're >> doing, to be fired for performance's sake, and leave it to the people >> who know what they are doing to put the safety on? > If I may put words in Kevin's mouth, I think his point is that having > float8 sum() at all is a foot-gun, and that's hard to deny. You need > to know how to use it safely. And IMHO if you've got something that's going to produce bad data if you do it wrong, I'd rather that the error be as large as possible so that you're more likely to discover it and fix it... -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Jan10, 2014, at 17:46 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > Florian Pflug <fgp@phlo.org> writes: >> On Jan10, 2014, at 15:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Also, it might be reasonable for both the regular and the inverse >>> transition functions to be strict. If a null entering the window >>> does not matter, then a null exiting the window doesn't either, no? > >> That's not true, I think, unless we're special-casing strict transition >> functions somewhere. AFAICS, an aggregate with a strict transition function >> will produce the state NULL whenever any of the inputs was NULL, i.e. we won't >> ever transition out of the NULL state once we got there. > > Nope, not the case; read xaggr.sgml and/or the CREATE AGGREGATE reference > page. An aggregate with a strict transition function essentially just > ignores null input rows. I wasn't aware of that, sorry for the noise. > I suspect the inverse transition function could > just be made strict with a similar special-case rule (viz, keep the old > transition value when deleting a null input from the window); but maybe > I'm missing something and it has to work harder than that anyway. This seems to hold at least if the state never becomes NULL, i.e. if there's a non-NULL initial state (S) and the transition function never returns NULL. If the transition function (T) and inverse transition function (I) obey that I(T(...T(T(S,a),b)...,z), a) = T(...T(S,b)...,z) for a non-NULL a, then setting T(s,NULL) = I(s,NULL) = s makes the same thing work for NULL and non-NULL values of a. If, however, a strict transition function T ever returns NULL, things break. If T(S,a) is NULL in the above, the whole expression becomes NULL because T is strict (we only special-case the *value* input, not the state input). But T(...T(S,b)...,z) might very well be non-NULL. That's pretty much the same situation, unsurprisingly, as with NaNs. A variant of that problem arises also if the initial state is NULL. We then wait for the first non-NULL value, and use that as the initial state. When we later apply the inverse transition function, we might or might not have to reset the state back to NULL, depending on whether there are other non-NULL values in the frame or not. This case can be solved by tracking the number of non-NULL rows in the current frame - whenever that number is 0, the output value is NULL, otherwise it's determined by the state. So in conclusion, things work for pairs of strict transition and inverse transition functions AFAICS, as long as we do the non-null tracking and complain should the transition function ever return NULL. best regards, Florian Pflug
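A toy model of the non-NULL tracking described above; the counter name notnullcount is the one the patch uses, the rest is illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct
    {
        double  state;         /* stand-in for the transition state */
        int64_t notnullcount;  /* non-NULL rows currently in the frame */
    } PerAgg;

    static void
    add_row(PerAgg *agg, bool isnull, double val)
    {
        if (isnull)
            return;  /* strict behaviour: NULLs never enter the state */
        agg->notnullcount++;
        agg->state += val;
    }

    static void
    remove_row(PerAgg *agg, bool isnull, double val)
    {
        if (isnull)
            return;  /* ... so removing them is a no-op too */
        agg->notnullcount--;
        agg->state -= val;
    }

    static bool
    result_is_null(const PerAgg *agg)
    {
        /* the frame holds only NULLs (or nothing): the output must be NULL */
        return agg->notnullcount == 0;
    }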
On Jan11, 2014, at 01:24 , Jim Nasby <jim@nasby.net> wrote: > On 1/10/14, 1:07 PM, Tom Lane wrote: >> Florian Pflug <fgp@phlo.org> writes: >>> On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Although, having said that ... maybe "build your own aggregate" would >>>> be a reasonable suggestion for people who need this? I grant that >>>> it's going to be a minority requirement, maybe even a small minority >>>> requirement. People who have the chops to get this sort of thing right >>>> can probably manage a custom aggregate definition. >>> So we'd put a footgun into the hands of people who don't know what they're >>> doing, to be fired for performance's sake, and leave it to the people >>> who know what they are doing to put the safety on? >> If I may put words in Kevin's mouth, I think his point is that having >> float8 sum() at all is a foot-gun, and that's hard to deny. You need >> to know how to use it safely. > > And IMHO if you've got something that's going to produce bad data if you do it > wrong, I'd rather that the error be as large as possible so that you're more > likely to discover it and fix it... To that principle, I agree, I just don't think it applies here. An inverse transition function greatly *increases* the chance of bogus results for sum(float). It also breaks some rather natural assumptions that one might make about sum(float)'s behaviour. For example, sums over single-element frames currently return the one row's value unchanged. That's no longer universally true with an inverse transition function. Even for an approximate type, that's a bit too weird for my taste. best regards, Florian Pflug
On Fri, Jan 10, 2014 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Florian Pflug <fgp@phlo.org> writes: >> On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Although, having said that ... maybe "build your own aggregate" would >>> be a reasonable suggestion for people who need this? I grant that >>> it's going to be a minority requirement, maybe even a small minority >>> requirement. People who have the chops to get this sort of thing right >>> can probably manage a custom aggregate definition. > >> So we'd put a footgun into the hands of people who don't know what they're >> doing, to be fired for performance's sake, and leave it to the people >> who know what they are doing to put the safety on? > > If I may put words in Kevin's mouth, I think his point is that having > float8 sum() at all is a foot-gun, and that's hard to deny. You need > to know how to use it safely. Yeah, but Florian's point is that not all foot-guns are created equal. The fact that we're walking around with a loaded BB-gun in our hip pocket is not a good reason to replace it with a howitzer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jan10, 2014, at 22:27 , David Rowley <dgrowleyml@gmail.com> wrote: > As the patch stands at the moment, I currently have a regression test > which currently fails due to these extra zeros after the decimal point: > > -- This test currently fails due to extra trailing 0 digits. > SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) > FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n); > > Patched produces: > 6.01 > 5.00 > 3.00 > Unpatched produces: > 6.01 > 5 > 3 Hm, that's annoying. I checked the standard to see if it offers any guidance here, but it doesn't look like it does - the standard just says that SUM() ought to return a type with the same scale as the input type has. That's fine if every NUMERIC type has a fixed scale, but that's not the case in postgres - we allow values of unconstrained NUMERIC types (i.e., numeric types without an explicitly specified precision or scale) to each define their own scale. To fix this, we'd have to track the maximum scale within the current frame. That's easier than the general problem of providing an inverse transition function for MAX, because AFAIK we limit the scale to at most 1000. So it'd be sufficient to track the number of times we saw each scale, and also the current maximum. Once we reduce the current maximum's count back to zero in the inverse transition function, we'd scan from that value to the left to find the next non-zero entry. We could also choose to ignore this, although I'm not sure I really like that. It seems entirely OK at first sight - after all, the values all stay the same, we just emit a different number of trailing zeros. But it still causes results to be affected by values which lie outside the frame, even if only in the number of trailing zeros. That seems like more non-determinism than a database should show. > With inverse transitions this query still produces correct results, it just does > not produce the numeric in the same format as it does without performing inverse > transitions. Personally I'd rather focus on trying to get SUM(numeric) in there > for 9.4 I think it'd be worthwhile to get this into 9.4, if that's still an option, even if we only support COUNT. best regards, Florian Pflug
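The scale tracking Florian sketches could look roughly like this, assuming the dscale cap of 1000 he mentions; all names are illustrative:

    #include <stdint.h>

    #define MAX_DSCALE 1000  /* assumed cap on numeric display scale */

    typedef struct
    {
        int64_t count[MAX_DSCALE + 1];  /* occurrences of each dscale in the frame */
        int     max_dscale;             /* current maximum */
    } DScaleTracker;

    static void
    dscale_add(DScaleTracker *t, int dscale)
    {
        t->count[dscale]++;
        if (dscale > t->max_dscale)
            t->max_dscale = dscale;
    }

    static void
    dscale_remove(DScaleTracker *t, int dscale)
    {
        t->count[dscale]--;
        /* once the maximum's count hits zero, scan left for the next non-zero entry */
        while (t->max_dscale > 0 && t->count[t->max_dscale] == 0)
            t->max_dscale--;
    }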
Florian Pflug <fgp@phlo.org> writes: > I think it'd be worthwile to get this into 9.4, if that's still an option, > even if we only support COUNT. My thought is (1) we can certainly implement inverse transitions for COUNT() and the integer variants of SUM(), and that alone would be a killer feature for some people. (2) the float and numeric variants should be implemented under nondefault names (I'm thinking FAST_SUM(), but bikeshed away). People who need extra speed and don't mind the slightly different results can alter their queries to use these variants. One reason I'm thinking this is that whatever we do to ameliorate the semantic issues is going to slow down the forward transition function --- to no benefit unless the aggregate is being used as a window function in a moving window. So I'm less than convinced that we *should* implement any of these designs in the default aggregates, even if we get to the point where we arguably *could* do it with little risk of functional differences. regards, tom lane
On 14/01/14 14:29, Tom Lane wrote: [...] > (2) the float and numeric variants should be implemented under > nondefault names (I'm thinking FAST_SUM(), but bikeshed away). People > who need extra speed and don't mind the slightly different results can > alter their queries to use these variants. One reason I'm thinking > this is that whatever we do to ameliorate the semantic issues is going > to slow down the forward transition function --- to no benefit unless > the aggregate is being used as a window function in a moving window. > So I'm less than convinced that we *should* implement any of these > designs in the default aggregates, even if we get to the point where > we arguably *could* do it with little risk of functional differences. > regards, tom lane How about SUM_FAST() instead? Then it will be more likely to be close to SUM() in an index. Cheers, Gavin
I think unless anyone has some objections I'm going to remove the inverse transition for SUM(numeric) and modify the documents to tell the user how to build their own FAST_SUM(numeric) using the built in functions to do it. I'm starting to think that playing around with resetting numeric scale will turn a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there so far, minus sum(numeric), can't go in...
If so then there's 36 aggregate functions ticked off my "create an inverse transition function for" list! I personally think that's a pretty good win. I'd rather do this than battle and miss deadlines for 9.4. I'd find that pretty annoying.
Regards

David Rowley
Attachment
On Jan14, 2014, at 11:06 , David Rowley <dgrowleyml@gmail.com> wrote: > Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using sum(numeric) to workaround the fact that there's no inverse transitions for sum(float). I also made a small change in the aggregates.sql tests to check that the aggfnoid <= 9999. I've looked over the patch, here are a few comments. For STRICT pairs of transfer and inverse transfer functions we should complain if any of them ever return NULL. That can never be correct anyway, since a STRICT function cannot undo a non-NULL -> NULL transition. The same goes for non-STRICT, at least if we ever want to allow an inverse transfer function to indicate "Sorry, cannot undo in this case, please rescan". If we allowed NULL as just another state value now, we couldn't easily undo that later, so we'd rob ourselves of the obvious way for the inverse transfer function to indicate this condition to its caller. The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL inputs however they please, no? The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before the code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function? Also, could we easily check whether the frames corresponding to the individual rows are all either identical or disjoint, and don't use the inverse transfer function then? Currently, for a frame which contains either just the current row, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old frame, and then add back the new frame. best regards, Florian Pflug
On 1/13/14, 7:41 PM, Gavin Flower wrote: > On 14/01/14 14:29, Tom Lane wrote: > [...] >> (2) the float and numeric variants should be implemented under nondefault names (I'm thinking FAST_SUM(), but bikeshed away). People who need extra speed and don't mind the slightly different results can alter their queries to use these variants. One reason I'm thinking this is that whatever we do to ameliorate the semantic issues is going to slow down the forward transition function --- to no benefit unless the aggregate is being used as a window function in a moving window. So I'm less than convinced that we *should* implement any of these designs in the default aggregates, even if we get to the point where we arguably *could* do it with little risk of functional differences. regards, tom lane > How about SUM_FAST() instead? Then it will be more likely to be close to SUM() in an index. +1. That's what I do in cases like this. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Tue, Jan 14, 2014 at 4:16 AM, David Rowley <dgrowleyml@gmail.com> wrote: > On Tue, Jan 14, 2014 at 9:09 PM, David Rowley <dgrowleyml@gmail.com> wrote: >> >> I think unless anyone has some objections I'm going to remove the inverse >> transition for SUM(numeric) and modify the documents to tell the user how to >> build their own FAST_SUM(numeric) using the built in functions to do it. I'm >> starting to think that playing around with resetting numeric scale will turn >> a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there >> so far, minus sum(numeric), can't go in... >> > > Of course it's only now that I discover that it's not possible to do this: > > CREATE AGGREGATE fast_sum (numeric) > ( > stype = numeric, > sfunc = numeric_avg_accum, > invfunc = numeric_avg_accum_inv, > finalfunc = numeric_sum > ); > > because SUM(numeric) uses an internal type to store the transition state. > > hmmm, built-in fast_sum anyone? > Is there any simple way to limit these to only be used in the context of a > window? If so is it worth it? > Would we want fast_sum() for float too? Maybe these additional "fast" functions (one might also say "inaccurate") should live in an extension, in contrib. It strikes me that for numeric what you really need is to just tell the sum operation, whether through a parameter or otherwise, how many decimal places to show in the output. Obviously that's not a practical change for sum() itself, but if we're inventing new stuff it can be done. For floats, things are not so good. The data type is inexact by nature, and I think cases where you get substantially wrong answers will be common enough that people who attempt to use whatever we devise in this area will have sum trouble. *ducks* -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jan15, 2014, at 19:56 , Robert Haas <robertmhaas@gmail.com> wrote: > It strikes me that for numeric what you really need is to just tell > the sum operation, whether through a parameter or otherwise, how many > decimal places to show in the output. Obviously that's not a > practical change for sum() itself, but if we're inventing new stuff it > can be done. You can already do that, just cast the result of SUM(numeric) to an appropriately constrained NUMERIC type, i.e. to NUMERIC(prec, scale). BTW, AVG() and STDDEV() have the same issue. The problem is just partially masked by the division by N (or N-1) at the end, because we always emit at least 16 fractional digits when dividing. So you have to have an input value with a larger scale than that to trigger it. For the following query

select avg(x) over (order by i rows between current row and 1 following)
from (values
  (1,1), (2,1), (3,0.000000000000000000000000000000001), (4,1), (5,1)
) t(i,x);

9.3 returns

                 avg
-------------------------------------
 1.00000000000000000000
 0.500000000000000000000000000000001
 0.500000000000000000000000000000001
 1.00000000000000000000
 1.00000000000000000000

but HEAD+patch returns

                 avg
-------------------------------------
 1.00000000000000000000
 0.500000000000000000000000000000001
 0.500000000000000000000000000000001
 1.000000000000000000000000000000000
 1.000000000000000000000000000000000

I have to admit that I'm *very* tempted to suggest we simply ignore this - but that *would* mean accepting that windowed aggregates are non-deterministic in the sense that their result (even if only in the number of trailing zeros) depends on values outside of the frame. Which, I guess, is a box that best stays closed... I'm currently thinking the best way forward is to get a basic patch without any NUMERIC stuff committed, and to revisit this after that's done. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > I'm currently thinking the best way forward is to get a basic patch > without any NUMERIC stuff committed, and to revisit this after that's done. +1. I liked Robert's suggestion that the "fast" aggregates be implemented as a contrib module rather than directly in core, too. If we do that then it's perfectly reasonable to handle that as a separate patch. regards, tom lane
On Jan14, 2014, at 17:39 , Florian Pflug <fgp@phlo.org> wrote: > On Jan14, 2014, at 11:06 , David Rowley <dgrowleyml@gmail.com> wrote: >> Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using sum(numeric) to workaround the fact that there's no inverse transitions for sum(float). I also made a small change in the aggregates.sql tests to check that the aggfnoid <= 9999. > > I've looked over the patch, here are a few comments. > > For STRICT pairs of transfer and inverse transfer functions we should complain if any of them ever return NULL. That can never be correct anyway, since a STRICT function cannot undo a non-NULL -> NULL transition. > > The same goes for non-STRICT, at least if we ever want to allow an inverse transfer function to indicate "Sorry, cannot undo in this case, please rescan". If we allowed NULL as just another state value now, we couldn't easily undo that later, so we'd rob ourselves of the obvious way for the inverse transfer function to indicate this condition to its caller. > > The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL inputs however they please, no? I modified the patch to do this, and ran into a problem. Currently, aggregates with state type "internal" cannot have a strict transfer function, even if they behave like they did, i.e. ignore NULL inputs. This is because the only possible initial value for state type "internal" is NULL, and it's up to the transfer function to create the state - usually upon seeing the first non-NULL input. Now, currently that isn't a huge problem - the transfer function simply has to check for NULL input values itself, and if it's indeed conceptually strict, it just returns in this case. With inverse transfer functions, however, each such pair of forward and inverse transfer functions would also need to duplicate the NULL-counting logic from nodeWindowAgg.c if it wants to be conceptually strict, i.e. ignore NULL inputs, but return NULL if there are *only* NULL inputs (or no inputs at all). That seems like a lot of duplicated code in the long run. In essence, what we'd want for strict pairs of forward and inverse transfer functions is for the forward transfer function to be strict in all arguments except the state, and the inverse to be strict according to the usual definition. We can't express that property of the forward transfer function within pg_proc, but we don't really have to - a local hack in nodeWindowAgg.c suffices. So what I'm proposing is: We allow the forward transfer function to be non-strict even if the inverse is strict, but only if the initial value is NULL. In that case we behave as if the forward transfer function was strict, except that upon seeing the first non-NULL input we call it with a NULL state. The return value must still be non-NULL in all cases. Thoughts? best regards, Florian Pflug
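A toy model of the proposed rule, in which the forward function is strict in the inputs but accepts a NULL state for the first non-NULL input; all names here are illustrative:

    #include <stdbool.h>

    typedef struct
    {
        bool   isnull;
        double value;
    } MaybeDouble;

    static MaybeDouble
    forward(MaybeDouble state, MaybeDouble input)
    {
        if (input.isnull)
            return state;  /* the executor skips NULL inputs entirely */
        if (state.isnull)
            return input;  /* first non-NULL input builds the state */
        state.value += input.value;
        return state;      /* must stay non-NULL from here on */
    }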
On Jan16, 2014, at 02:32 , Florian Pflug <fgp@phlo.org> wrote: > On Jan14, 2014, at 17:39 , Florian Pflug <fgp@phlo.org> wrote: >> On Jan14, 2014, at 11:06 , David Rowley <dgrowleyml@gmail.com> wrote: >>> Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using sum(numeric) to workaround the fact that there's no inverse transitions for sum(float). I also made a small change in the aggregates.sql tests to check that the aggfnoid <= 9999. >> >> I've looked over the patch, here are a few comments. >> >> For STRICT pairs of transfer and inverse transfer functions we should complain if any of them ever return NULL. That can never be correct anyway, since a STRICT function cannot undo a non-NULL -> NULL transition. >> >> The same goes for non-STRICT, at least if we ever want to allow an inverse transfer function to indicate "Sorry, cannot undo in this case, please rescan". If we allowed NULL as just another state value now, we couldn't easily undo that later, so we'd rob ourselves of the obvious way for the inverse transfer function to indicate this condition to its caller. >> >> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL inputs however they please, no? > > I modified the patch to do this, and ran into a problem. Currently, aggregates with state type "internal" cannot have a strict transfer function, even if they behave like they did, i.e. ignore NULL inputs. This is because the only possible initial value for state type "internal" is NULL, and it's up to the transfer function to create the state - usually upon seeing the first non-NULL input. Now, currently that isn't a huge problem - the transfer function simply has to check for NULL input values itself, and if it's indeed conceptually strict, it just returns in this case. > > With inverse transfer functions, however, each such pair of forward and inverse transfer functions would also need to duplicate the NULL-counting logic from nodeWindowAgg.c if it wants to be conceptually strict, i.e. ignore NULL inputs, but return NULL if there are *only* NULL inputs (or no inputs at all). That seems like a lot of duplicated code in the long run. > > In essence, what we'd want for strict pairs of forward and inverse transfer functions is for the forward transfer function to be strict in all arguments except the state, and the inverse to be strict according to the usual definition. We can't express that property of the forward transfer function within pg_proc, but we don't really have to - a local hack in nodeWindowAgg.c suffices. So what I'm proposing is: > > We allow the forward transfer function to be non-strict even if the inverse is strict, but only if the initial value is NULL. In that case we behave as if the forward transfer function was strict, except that upon seeing the first non-NULL input we call it with a NULL state. The return value must still be non-NULL in all cases. Ok, I tried this and it worked out quite OK. Updated patch is attached. It passes regression tests, but those currently don't seem to include any aggregates which *don't* ignore NULL values, so that case is probably untested. Also, it allows non-strict forward transfer functions together with strict inverse transfer functions even for non-NULL initial states now. It seemed rather pedantic to forbid this.
BTW, as it stands, the patch currently uses the inverse transition function only when *all* the aggregates belonging to one window clause provide one. After working with the code a bit, I think that a bit of reshuffling would allow that distinction to be made on a per-aggregate basis. The question is, is it worth it? best regards, Florian Pflug
Attachment
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jan 14, 2014 at 2:29 PM, Tom Lane <span dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br/></div> One reason I'm thinking this is that whatever we do to ameliorate<br /> the semantic issues is goingto slow down the forward transition<br /> function --- to no benefit unless the aggregate is being used as a<br /> windowfunction in a moving window. So I'm less than convinced<br /> that we *should* implement any of these designs in thedefault<br /> aggregates, even if we get to the point where we arguably *could*<br /> do it with little risk of functionaldifferences.<br /><br /> regards, tom lane<br /></blockquote></div><br /></div><div class="gmail_extra">Iwas thinking about this earlier today and came up with an idea that perhaps aggregates could support2 transition functions. 1 for normal aggregation and for windows with UNBOUNDED PRECEDING. The 2nd transition functioncould be used when there is a possibility that we would need to perform an inverse transition. This 2nd transitionfunction could do all the extra tracking it needed without having to worry that it would slow down normal aggregation.With numeric that might be tracking each numeric's scale as it enters and exits the window frame. It might evenbe possible to perform inverse transitions with float and double here, we could just internally use numeric, and havethe final function convert back to the original type. Though perhaps that might have the side effect of making floatingpoint calculations too accurate? Or at least not matching the IEEE standards.</div><div class="gmail_extra"><br /></div><divclass="gmail_extra">Of course, I'm not thinking this could be part of this patch, but I thought that I'd postthe idea while all this is fresh in people's heads.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">David</div><divclass="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br/></div></div>
On Jan16, 2014, at 09:07 , David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug <fgp@phlo.org> wrote: >> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL inputs however they please, no? > > The reason I had to track the notnullcount for non-strict was because: > > select sum(n) over (order by i rows between current row and unbounded following) from (values(1,1),(2,NULL)) v(i,n); > > would otherwise return > 1 > 0 > > sum is not strict, so I guess we need to track notnullcount for non-strict functions. > See the code around if (peraggstate->notnullcount == 0) in retreat_windowaggregate(). Yeah, I figured that was the reason, but you can't fix it that way. See http://www.postgresql.org/message-id/8E857D95-CBA4-4974-A238-9DD7F61BEA48@phlo.org for a detailed explanation why. My 2.2 patch allows pairs of non-strict forward and strict inverse transfer functions exactly because of this - i.e., basically it decrees that if there's an inverse transfer function, the strict setting of the *inverse* function determines the aggregate's NULL behaviour. The forward transfer function is then never called for NULL inputs, but it *is* called with the NULL state for the first non-NULL input, and *must* then return a non-NULL state (hence it's technically not strict, it's strict only in the inputs, not in the state). BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did that in 2.2. That doesn't matter for the built-in aggregates since these aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed patch. >> The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before the code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function? > I had a look at this and even tried moving the code to before the inverse transitions, but it looks like that would only work if I added more tests to the if condition to ensure that we're not about to perform inverse transitions. To me it just seemed more bullet proof the way it is rather than duplicating logic and having to ensure that it stays duplicated. But saying that, I don't think I've fully got my head around why the original code is valid in the first place. I would have imagined that it should contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that sounds like complete rubbish then I'll put it down to my head still being fried from work. Ok, I get this now. That code really just is asking "is the previous row's frame the same as the current row's frame" in that "if" where you added movedagg. What confused me was the rather weird way that condition is expressed, which, it turns out, is due to the fact that at the point of the if, we don't know the new frame's end. Now, we could use update_frametailpos() to find that, but that's potentially costly, especially if the tuple store was spilled to disk. So instead, the code relies on the fact that only if the frame end is "n FOLLOWING/PRECEDING" can the current row lie within the previous row's frame without the two frames' ends being necessarily the same.
For added confusion, that "if" never explicitly checks whether the frames' *heads* coincide - the previous code seems to have relied on the impossibility of having "aggregatedbase <= current < aggregatedupto" after re-initializing the aggregation, because then aggregatedbase = aggregatedupto = 0. That's why you can't just move the "if" before the "frameheadpos == aggregatedbase" check. But you can *if* you also check whether "aggregatedbase == frameheadpos" in the if - which is clearer than relying on that rather subtle assumption anyway.

BTW, your patch will also fail badly if the frame head ever moves backwards - the invariant that "aggregatedbase == frameheadpos" holds after the inverse transition loop will then be violated. Now, this should never happen, because we require that the offset in "n PRECEDING/FOLLOWING" is constant, but we should probably still check for this and elog().

That check was implicit in the old code, because it advanced the tuplestore mark, so if the frame head moved backwards, re-scanning the frame would have failed. That brings me to another todo - as it stands, that mark never gets advanced if we're doing inverse aggregation. To fix that, we need a call to WinSetMarkPosition() somewhere in eval_windowaggregates().

After doing these things, eval_windowaggregates() ended up calling initialize_windowframeaggregates() at a single place again, so I inlined it back into eval_windowaggregates(). I also made it use temp_slot_1 in the inverse aggregation loop, since that seemed safe - the code assumes some invariants about aggregatedbase and agg_row_slot.

Updated patch is attached (2.4_fgp).
Attachment
On 01/16/2014 01:02 PM, David Rowley wrote:
> sum(bigint) became a bit weird as it uses numeric types internally, so I
> had to keep the do_numeric_discard() function to support it.

It's pretty weird that we have implemented sum(bigint) that way. I understand that the result is a numeric so that it won't overflow, but implementing it by converting every value to numeric is naive.

I propose that we reimplement sum(bigint) in a more efficient way: For the internal state, let's use an int8 and a numeric overflow field. The transition function adds to the int8 variable, and checks for overflow. On overflow, increment the numeric field by one. In the final function, multiply the numeric by 2^64, and add the residual int8 value.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> I propose that we reimplement sum(bigint) in a more efficient way: For
> the internal state, let's use an int8 and a numeric overflow field. The
> transition function adds to the int8 variable, and checks for overflow.
> On overflow, increment the numeric field by one. In the final function,
> multiply the numeric by 2^64, and add the residual int8 value.

It'd probably be sufficient to handle it as two int64 fields (handmade 128-bit arithmetic, or maybe even not so handmade if that ever gets reasonably common among C compilers).

You're assuming the final output is still numeric, right?

regards, tom lane
On 01/16/2014 08:59 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> I propose that we reimplement sum(bigint) in a more efficient way: For
>> the internal state, let's use an int8 and a numeric overflow field. The
>> transition function adds to the int8 variable, and checks for overflow.
>> On overflow, increment the numeric field by one. In the final function,
>> multiply the numeric by 2^64, and add the residual int8 value.
>
> It'd probably be sufficient to handle it as two int64 fields (handmade
> 128-bit arithmetic, or maybe even not so handmade if that ever gets
> reasonably common among C compilers).

True. That would be sufficient for summing 2^64 int8s of INT64_MAX. That sounds like enough, especially considering that count() will overflow after that too.

> You're assuming the final output is still numeric, right?

Yep.

- Heikki
On 2014-01-16 21:07:33 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 08:59 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>> I propose that we reimplement sum(bigint) in a more efficient way: For
>>> the internal state, let's use an int8 and a numeric overflow field. The
>>> transition function adds to the int8 variable, and checks for overflow.
>>> On overflow, increment the numeric field by one. In the final function,
>>> multiply the numeric by 2^64, and add the residual int8 value.
>>
>> It'd probably be sufficient to handle it as two int64 fields (handmade
>> 128-bit arithmetic, or maybe even not so handmade if that ever gets
>> reasonably common among C compilers).
>
> True. That would be sufficient for summing 2^64 int8s of INT64_MAX. That
> sounds like enough, especially considering that count() will overflow
> after that too.

You'll have to handle adding negative values and underflow as well. Maybe it's instead sufficient to just have a flag indicating that you're working with a state that hasn't overflowed so far and use plain int8 math as long as that's the case, and entirely fall back to the current path once overflowed. That will probably be slightly faster and easily handle the majority of cases, since overflowing int8 ought to be pretty rare in the real world.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
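[A minimal sketch of the flag-based fallback Andres describes. All names here are hypothetical, and it assumes a compiler providing __builtin_add_overflow; otherwise a manual check against INT64_MAX/INT64_MIN works just as well:]

    typedef struct Int8SumState
    {
        bool    overflowed;     /* true once we've switched to the numeric path */
        int64   sum;            /* running sum, only valid while !overflowed */
        /* the Numeric accumulator for the overflowed case is elided here */
    } Int8SumState;

    static void
    int8_sum_step(Int8SumState *state, int64 value)
    {
        int64   newsum;

        if (!state->overflowed)
        {
            if (!__builtin_add_overflow(state->sum, value, &newsum))
            {
                state->sum = newsum;    /* fast path: plain int64 math */
                return;
            }
            /* first overflow: promote the old int64 sum to numeric ... */
            state->overflowed = true;
        }
        /* slow path: add value to the numeric accumulator, as the code does today */
    }

[One wrinkle in the windowed-aggregate context: once the flag is set there is no cheap way to clear it again, so an inverse transition would simply keep such a state on the slow path.]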
Andres Freund <andres@2ndquadrant.com> writes:
> You'll have to handle adding negative values and underflow as well.

Right.

> Maybe it's instead sufficient to just have a flag indicating that you're
> working with a state that hasn't overflowed so far and use plain int8
> math as long as that's the case, and entirely fall back to the current
> path once overflowed. That will probably be slightly faster and easily
> handle the majority of cases, since overflowing int8 ought to be pretty
> rare in the real world.

Dunno, I think that a transition state containing both an int64 and a (presumably separately palloc'd) numeric will be a real PITA. And it will not be faster, because the principal drag on performance will just be the overflow test, which you have to do either way.

regards, tom lane
On 2014-01-16 14:23:47 -0500, Tom Lane wrote:
>> Maybe it's instead sufficient to just have a flag indicating that you're
>> working with a state that hasn't overflowed so far and use plain int8
>> math as long as that's the case, and entirely fall back to the current
>> path once overflowed. That will probably be slightly faster and easily
>> handle the majority of cases, since overflowing int8 ought to be pretty
>> rare in the real world.
>
> Dunno, I think that a transition state containing both an int64 and
> a (presumably separately palloc'd) numeric will be a real PITA.

Yea, not sure myself. I just dislike the idea of having a good part of a 128bit math implementation for a single transition function. Another alternative would be a configure check for compiler/native 128bit math and falling back to the current implementation if none is provided... That should give decent performance with a pretty low amount of code for most platforms.

> And it will not be faster, because the principal drag on performance
> will just be the overflow test, which you have to do either way.

Well, you don't need to check the second variable for lots of operations. Say, the current sum is 0 and you add a -1. With the two variables scheme that requires checking the second variable, manipulating it, etc.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-01-16 14:23:47 -0500, Tom Lane wrote:
>> Dunno, I think that a transition state containing both an int64 and
>> a (presumably separately palloc'd) numeric will be a real PITA.
>
> Yea, not sure myself. I just dislike the idea of having a good part of a
> 128bit math implementation for a single transition function.

Not sure how you figure that we need very much new code beyond the overflow test.

> Well, you don't need to check the second variable for lots of
> operations. Say, the current sum is 0 and you add a -1. With the two
> variables scheme that requires checking the second variable,
> manipulating it etc.

I'm envisioning just

    state->lowhalf += input;
    if (overflowed_up)
        state->highhalf++;
    else if (overflowed_down)
        state->highhalf--;

The only thing that might take a moment's thought, or extra cycles in the normal case, is extending the overflow test so that it can tell whether we need to increment or decrement the upper half.

[ thinks a bit... ] Note that I'm supposing that the state is defined as (highhalf * 2^64) + lowhalf, not that we need the two variables to be exactly a 128-bit twos-complement value, which is what I think you're talking about.

regards, tom lane
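[Made concrete, Tom's scheme could look roughly like this - a sketch with hypothetical names, keeping the low word unsigned so that carry and borrow can be detected portably. The represented value is always highhalf * 2^64 + lowhalf:]

    typedef struct Int128SumState
    {
        int64   highhalf;   /* signed upper word */
        uint64  lowhalf;    /* unsigned lower word, wraps modulo 2^64 */
    } Int128SumState;

    /* forward transition: state += input */
    static void
    int128_add(Int128SumState *state, int64 input)
    {
        uint64  oldlow = state->lowhalf;

        state->lowhalf += (uint64) input;       /* wraps modulo 2^64 */
        if (input >= 0)
        {
            if (state->lowhalf < oldlow)        /* overflowed up: carry */
                state->highhalf++;
        }
        else
        {
            if (state->lowhalf > oldlow)        /* overflowed down: borrow */
                state->highhalf--;
        }
    }

    /* inverse transition: state -= input; exact, so it never forces a restart */
    static void
    int128_sub(Int128SumState *state, int64 input)
    {
        uint64  oldlow = state->lowhalf;

        state->lowhalf -= (uint64) input;
        if (input >= 0)
        {
            if (state->lowhalf > oldlow)        /* borrow */
                state->highhalf--;
        }
        else
        {
            if (state->lowhalf < oldlow)        /* carry */
                state->highhalf++;
        }
    }

[The final function would then convert the word pair back to numeric. Since this subtraction is exact, such a state would also suit the inverse transition functions this thread is about.]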
I had some more fun with this; the result is v2.5 of the patch (attached). Changes are explained below.

On Jan16, 2014, at 19:10 , Florian Pflug <fgp@phlo.org> wrote:
> On Jan16, 2014, at 09:07 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL inputs however they please, no?
>>
>> The reason I had to track the notnullcount for non-strict functions was because:
>>
>> select sum(n) over (order by i rows between current row and unbounded following) from (values(1,1),(2,NULL)) v(i,n);
>>
>> would otherwise return
>> 1
>> 0
>>
>> sum is not strict, so I guess we need to track notnullcount for non-strict functions.
>> See the code around if (peraggstate->notnullcount == 0) in retreat_windowaggregate().
>
> Yeah, I figured that was the reason, but you can't fix it that way. See http://www.postgresql.org/message-id/8E857D95-CBA4-4974-A238-9DD7F61BEA48@phlo.org for a detailed explanation why. My 2.2 patch allows pairs of non-strict forward and strict inverse transfer functions exactly because of this - i.e., basically it decrees that if there's an inverse transfer function, the strict setting of the *inverse* function determines the aggregate's NULL behaviour. The forward transfer function is then never called for NULL inputs, but it *is* called with the NULL state for the first non-NULL input, and *must* then return a non-NULL state (hence it's technically not strict - it's strict only in the inputs, not in the state).
>
> BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did that in 2.2. That doesn't matter for the built-in aggregates since these aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed patch.

Fixed now.

>>> The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before the code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function?
>
>> I had a look at this and even tried moving the code to before the inverse transitions, but it looks like that would only work if I added more tests to the if condition to ensure that we're not about to perform inverse transitions. To me it just seemed more bullet-proof the way it is rather than duplicating logic and having to ensure that it stays duplicated. But saying that, I don't think I've fully got my head around why the original code is valid in the first place. I would have imagined that it should contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that sounds like complete rubbish then I'll put it down to my head still being fried from work.
>
> Ok, I get this now. That code really just is asking "is the previous row's frame the same as the current row's frame" in that "if" where you added movedagg. What confused me was the rather weird way that condition is expressed, which turns out is due to the fact that at the point of the if, we don't know the new frame's end. Now, we could use update_frametailpos() to find that, but that's potentially costly, especially if the tuple store was spilled to disk. So instead, the code relies on the fact that only if the frame end is "n FOLLOWING/PRECEDING" can the current row lie within the previous row's frame without the two frames' ends being necessarily the same.
> For added confusion, that "if" never explicitly checks whether the frames' *heads* coincide - the previous code seems to have relied on the impossibility of having "aggregatedbase <= current < aggregatedupto" after re-initializing the aggregation, because then aggregatedbase = aggregatedupto = 0. That's why you can't just move the "if" before the "frameheadpos == aggregatedbase" check. But you can *if* you also check whether "aggregatedbase == frameheadpos" in the if - which is clearer than relying on that rather subtle assumption anyway.
>
> BTW, your patch will also fail badly if the frame head ever moves backwards - the invariant that "aggregatedbase == frameheadpos" holds after the inverse transition loop will then be violated. Now, this should never happen, because we require that the offset in "n PRECEDING/FOLLOWING" is constant, but we should probably still check for this and elog().
>
> That check was implicit in the old code, because it advanced the tuplestore mark, so if the frame head moved backwards, re-scanning the frame would have failed. That brings me to another todo - as it stands, that mark never gets advanced if we're doing inverse aggregation. To fix that, we need a call to WinSetMarkPosition() somewhere in eval_windowaggregates().
>
> After doing these things, eval_windowaggregates() ended up calling initialize_windowframeaggregates() at a single place again, so I inlined it back into eval_windowaggregates(). I also made it use temp_slot_1 in the inverse aggregation loop, since that seemed safe - the code assumes some invariants about aggregatedbase and agg_row_slot.
>
> Updated patch is attached (2.4_fgp).

I've now shuffled things around so that we can use inverse transition functions even if only some aggregates provide them, and to allow inverse transition functions to force a restart by returning NULL. The rules regarding NULL handling are now the following:

(A) Transition functions without an inverse behave as they always did.

(B) Forward transition functions with an inverse may not return NULL, not even for all-NULL inputs.

(C) If the inverse transition function returns NULL, the aggregation is restarted from the beginning, just as if no inverse transition function had been provided.

(D) If the transition function is strict, the inverse must also be.

(E) NULLs are skipped if the inverse transition function is strict, regardless of whether the forward transition function is.

The rules may seem a bit weird, but they're the only solution I found which

* Doesn't require every invertible transition function with "internal" state to implement its own NULL counting. This requires (E), since transition functions with state type "internal" need to be callable with a NULL state - since nobody else can allocate the state object.

* Still allows transition functions to handle NULL values however they please - they just have to have a non-strict inverse to opt out of (E). They cannot use NULL states however they please, though.

* Allows partial inverse transition functions, i.e. inverse transition functions which sometimes report "sorry, please restart". The only easy way to report this is to have the inverse transition function return NULL, which means we cannot allow NULL as just another state value, i.e. this mandates (B) and (C).

(D) is purely to avoid having to deal with any special cases that might arise from allowing this.

I've also moved the volatile-functions check into initialize_peragg() - having it in ExecWindowAgg() made no sense to me since it's dealing with static information.
Also, ExecWindowAgg() may get called multiple times, e.g. if the window functions are on the inner side of a nested loop join or within a subselect.

>>> Also, could we easily check whether the frames corresponding to the individual rows are all either identical or disjoint, and don't use the inverse transfer function then? Currently, for a frame which contains either just the current row, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old frame, and then add back the new frame.
>>
>> I didn't know there was a situation where this could happen. Could you give me an example query and I'll run it through the debugger to have a look at what's going on. But sure, if this is possible and I understand what you mean then I guess it would be a good optimisation to detect this and throw away the previous results and start fresh.
>
> The case I had in mind there was "RANGE BETWEEN CURRENT ROW AND CURRENT ROW", i.e. each row's frame consists of all its ordering peers. Haven't looked into this yet.

It now resets the aggregation when the new frame doesn't overlap the old one, i.e. if frameheadpos > aggregatedupto.

The patch passes the regression tests, but I haven't done any other testing yet. Also, the documentation doesn't explain the NULL and STRICT semantics yet.

best regards,
Florian Pflug
Attachment
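[To make rules (B)-(E) concrete, here is a minimal sketch of a forward/inverse pair under the proposed semantics - hypothetical names, not from the patch. The forward function is declared non-strict so it can receive the initial NULL state, yet per (B) always returns a non-NULL state; the inverse is strict, so per (E) NULL inputs are skipped entirely:]

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(mysum4_fwd);
    PG_FUNCTION_INFO_V1(mysum4_inv);

    /* forward transition: int8 state, int4 input; declared non-strict */
    Datum
    mysum4_fwd(PG_FUNCTION_ARGS)
    {
        /* called with a NULL state for the first non-NULL input (rule B) */
        int64   state = PG_ARGISNULL(0) ? 0 : PG_GETARG_INT64(0);

        /* NULL inputs never reach us, because the inverse is strict (rule E) */
        PG_RETURN_INT64(state + (int64) PG_GETARG_INT32(1));
    }

    /* inverse transition: strict; returning NULL would force a restart
     * (rule C), which an exact subtraction never needs */
    Datum
    mysum4_inv(PG_FUNCTION_ARGS)
    {
        PG_RETURN_INT64(PG_GETARG_INT64(0) - (int64) PG_GETARG_INT32(1));
    }

[Such a pair would be wired up with the patch's extended CREATE AGGREGATE syntax (invfunc = mysum4_inv), as in the myand_agg example later in this thread.]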
> I had some more fun with this, the result is v2.5 of the patch (attached). Changes are explained below.

If you make any more changes would it be possible for you to base them on the attached patch instead of the last one you sent? Perhaps if there's much more hacking to do we could start pushing to a github repo to make it easier to work together on this.
Attachment
On Jan17, 2014, at 15:14 , David Rowley <dgrowleyml@gmail.com> wrote:
> If you make any more changes would it be possible for you to base them on the attached patch instead of the last one you sent?

Sure! The only reason I didn't rebase my last patch onto yours was that having the numeric stuff in there meant potentially better test coverage. Thanks for doing the merge!

> Perhaps if there's much more hacking to do we could start pushing to a github repo to make it easier to work together on this.

Yeah, that seems like a good idea. Easier to keep track of than a bunch of patches floating around. Could you push your latest version?

best regards,
Florian Pflug
> I've now shuffled things around so that we can use inverse transition functions
> even if only some aggregates provide them, and to allow inverse transition
> functions to force a restart by returning NULL. The rules regarding NULL handling
> are now the following

Maybe this is me thinking out loud, but I'm just thinking about the numeric case again. Since the patch can now handle inverse transition functions returning NULL when they fail to perform inverse transitions, I'm wondering if we could add an "expectedscale" to NumericAggState, set it to -1 initially, when we get the first value set the expectedscale to the dscale of that numeric, then if we get anything but that value we'll set the expectedscale back to -1 again, if we are asked to perform an inverse transition with a expectedscale as -1 we'll return null, otherwise we can perform the inverse transition...
On Jan17, 2014, at 20:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp@phlo.org> wrote:
>> I've now shuffled things around so that we can use inverse transition functions
>> even if only some aggregates provide them, and to allow inverse transition
>> functions to force a restart by returning NULL. The rules regarding NULL handling
>> are now the following
>
> Maybe this is me thinking out loud, but I'm just thinking about the numeric case again.
> Since the patch can now handle inverse transition functions returning NULL when they
> fail to perform inverse transitions, I'm wondering if we could add an "expectedscale"
> to NumericAggState, set it to -1 initially, when we get the first value set the
> expectedscale to the dscale of that numeric, then if we get anything but that value
> we'll set the expectedscale back to -1 again, if we are asked to perform an inverse
> transition with a expectedscale as -1 we'll return null, otherwise we can perform
> the inverse transition...

You could do better than that - the numeric problem amounts to tracking the maximum scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale equals the current maximum. But if we do SUM(numeric) at all, I think we should do so without requiring restarts - I still think the overhead of tracking the maximum scale within the aggregated values isn't that bad. If we zero the dscale counters lazily, the number of entries we have to zero is bound by the maximum dscale we encounter. Since actually summing N digits is probably more expensive than zeroing them, and since we pay the zeroing overhead only once per aggregation but save potentially many summations, I'm pretty sure we come out ahead by quite some margin.

It'd be interesting to handle float similarly to the way you describe, though. We might not be able to guarantee that we yield exactly the same result as without inverse aggregation, but we might be able to bound the error. That might make it acceptable to do this - as Kevin pointed out, float is always an approximation anyway. I haven't really thought that through, though...

Anyway, with time running out fast if we want to get this into 9.4, I think we should focus on getting this into a committable state right now. I've started to look over what you've pushed to github, and it looks mostly fine. I have a few comments - mostly cosmetic stuff - that I'll post once I've finished reading through it. I also plan to do some basic performance testing to verify that my reshuffling of eval_windowaggregates() doesn't hurt aggregates without an inverse transition function.

best regards,
Florian Pflug
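[A sketch of the lazily-zeroed dscale counters proposed above - hypothetical names and an illustrative array bound; the fields would live in NumericAggState. The maximum dscale stays exact under both addition and removal, so the inverse transition never has to force a restart:]

    #define DSCALE_LIMIT 128    /* illustrative; numeric dscales can be larger */

    typedef struct DScaleTracker
    {
        int     maxDScale;              /* largest dscale among aggregated values */
        int     zeroedUpTo;             /* counts[0 .. zeroedUpTo-1] are initialized */
        int64   counts[DSCALE_LIMIT];   /* values seen per dscale, zeroed lazily */
    } DScaleTracker;

    static void
    dscale_track_add(DScaleTracker *t, int dscale)
    {
        /* lazy zeroing: total cost is bounded by the largest dscale encountered */
        while (t->zeroedUpTo <= dscale)
            t->counts[t->zeroedUpTo++] = 0;

        t->counts[dscale]++;
        if (dscale > t->maxDScale)
            t->maxDScale = dscale;
    }

    static void
    dscale_track_remove(DScaleTracker *t, int dscale)
    {
        t->counts[dscale]--;

        /* if that was the last value at the current maximum, walk downwards */
        while (t->maxDScale > 0 && t->counts[t->maxDScale] == 0)
            t->maxDScale--;
    }

[do_numeric_discard() could then clamp the result's dscale to maxDScale after a removal, so the inverse path yields the same trailing zeroes as recomputing from scratch would.]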
> You could do better than that - the numeric problem amounts to tracking the maximum
> scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale equals
> the current maximum. But if we do SUM(numeric) at all, I think we should do so
> without requiring restarts - I still think the overhead of tracking the maximum scale
> within the aggregated values isn't that bad. If we zero the dscale counters lazily,
> the number of entries we have to zero is bound by the maximum dscale we encounter.
> Since actually summing N digits is probably more expensive than zeroing them, and since
> we pay the zeroing overhead only once per aggregation but save potentially many
> summations, I'm pretty sure we come out ahead by quite some margin.

We'll work that out; I don't think it will take very long to code up your idea either. I just thought that my idea was good enough and very cheap too - won't all numerics that are actually stored in a column have the same scale anyway? Has it only been a problem because we've been testing with random numeric literals the whole time?

The test turned out to become:

    if (state->expectedScale == -1)
        state->expectedScale = X.dscale;
    else if (state->expectedScale != X.dscale)
        state->expectedScale = -2;

In do_numeric_accum, then when we do the inverse I just check if expectedScale < 0 and return NULL.

I'm not set on it, and I'm willing to try the lazy zeroing of the scale tracker array, but I'm just not quite sure what extra real use cases it covers that the one above does not. Perhaps my way won't perform inverse transitions if the query did sum(numericCol / 10) or so.

I'll be committing this to the github repo very soon, so we can hack away at the scale tracking code once it's back in.
Attachment
David Rowley <dgrowleyml@gmail.com> writes:
> I just thought that my idea was good enough and very cheap too - won't all
> numerics that are actually stored in a column have the same scale anyway?

No; unconstrained numeric columns (ie, if you just say "numeric") don't force their contents to any particular scale. It might be that we don't have to optimize for that case, since it's not in the SQL spec, but it is definitely supported by PG.

regards, tom lane
First, I've got the feeling that I should somehow update the commitfest app,
but I don't really know in which way. Should I put myself in as a reviewer,
or as a second author? Or neither? Suggestions welcome...
On Jan17, 2014, at 23:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> The test turned out to become:
> if (state->expectedScale == -1)
> state->expectedScale = X.dscale;
> else if (state->expectedScale != X.dscale)
> state->expectedScale = -2;
>
> In do_numeric_accum, then when we do the inverse I just check if
> expectedScale < 0 then return NULL.
Ok, so this will rescan if and only if the dscales of all inputs match.
I still think that's overly pessimistic - we've only got a problem when we
removed the input with the largest dscale, no? So my suggestion would be
    state->maxDScale = MAX(X.dscale, state->maxDScale);

in do_numeric_accum, and in the inverse

    if (state->maxDScale == X.dscale)
        PG_RETURN_NULL();
I'd think that this avoids more restarts with about the same effort,
but I haven't tried this though, so maybe I'm missing something.
> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> tracker array, but I'm just not quite sure what extra real use cases it
> covers that the one above does not. Perhaps my way won't perform inverse
> transitions if the query did sum(numericCol / 10) or so.
Dunno how many people SUM over numerics with different dscales. Easily
possible that it's few enough to not be worth fussing over.
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 124ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 254ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 108156.917 ms
>
> The divide by 1 case is slow because of that weird 20 trailing zero
> instead of 16 when dividing a numeric by 1 and that causes the inverse
> transition function to return NULL because the scale changes.
>
> I've not tested an unpatched version yet to see how that divide by 1 query
> performs on that but I'll get to that soon.
So without the patch, all three queries should perform similarly, i.e. take
about 10 seconds, right? If so, the improvement is fantastic!
> I'm thinking that the idea about detecting the numeric range with floating
> point types and performing an inverse transition providing the range has
> not gone beyond some defined danger zone is not material for this patch...
> I think it would be not a great deal of work to code, but the testing involved
> would probably make this patch not possible for 9.4

Yeah, I never imagined that this would happen for 9.4.
> The latest version of the patch is attached.

OK, there are a few more comments:
* build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid,
I think. I don't quite like that we construct that even for plain aggregates,
and avoiding that requires just an additional if.
* Don't we need to check for volatile function in the filter expression too?
* As it stands, I don't think intXand_inv and intXor_inv are worth it, since
the case where they return non-NULL is awfully slim (only for inputs
containing only ones respectively only zeros). We should either track the number
of zeros and ones per bit, which would allow us to always perform inverse
transitions, or just drop them.
* Quite a few of the inverse transition functions are marked strict, yet
contain code to handle NULL inputs. You can just remove that code - the system
makes sure that strict functions never receive NULL arguments. Affected are,
AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, int4_sum_inv,
int8_sum_inv. Not sure that list is exhaustive...
* For any of the new inverse transition functions, I'd be inclined to just
elog() if they're called directly and not as an aggregate. In particular
those which check for that anyway, plus the *smaller_inv and *larger_inv
ones. I don't see why anyone would ever want to call these directly - and
if we elog() now, we can easily change these functions later, because no external
code can depend on them. E.g., maybe someone wants to keep the top N elements
in the MIN/MAX aggregates one day...
* The number of new regression tests seems a bit excessive. I don't think there's
really a policy on what to test and what not, but in this case I think it suffices
if we test the basic machinery, plus the more complex functions. But e.g. most
of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally,
and the wrapper code basically just does datatype conversion, so testing a few
cases seems enough there. What I think we *should* test, but don't do currently,
is whether the core machinery performs the expected calls of the forward and
reverse transition function. I was thinking about creating an aggregate in the
regression tests which simply concatenates all the calls into a string, e.g.
you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed 1.
I think that should be possible with an SQL-language forward and inverse
transfer function, but I haven't tried. I can try, if you want.
best regards,
Florian Pflug
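[For reference, the call-logging aggregate suggested above is easy to build; here is one possible sketch in C - an SQL-language pair would do just as well, and every name here is made up for the test:]

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* append "F:<v> " (forward) or "I:<v> " (inverse) to the text state */
    static text *
    log_call(text *state, char kind, int32 value)
    {
        char    buf[32];
        int     addlen = snprintf(buf, sizeof(buf), "%c:%d ", kind, value);
        int     oldlen = state ? VARSIZE_ANY_EXHDR(state) : 0;
        text   *result = (text *) palloc(VARHDRSZ + oldlen + addlen);

        SET_VARSIZE(result, VARHDRSZ + oldlen + addlen);
        if (oldlen > 0)
            memcpy(VARDATA(result), VARDATA_ANY(state), oldlen);
        memcpy(VARDATA(result) + oldlen, buf, addlen);
        return result;
    }

    PG_FUNCTION_INFO_V1(logagg_fwd);
    Datum
    logagg_fwd(PG_FUNCTION_ARGS)    /* forward transition, non-strict */
    {
        /* NULL inputs are skipped before we're called, since the inverse is strict */
        text   *state = PG_ARGISNULL(0) ? NULL : PG_GETARG_TEXT_PP(0);

        PG_RETURN_TEXT_P(log_call(state, 'F', PG_GETARG_INT32(1)));
    }

    PG_FUNCTION_INFO_V1(logagg_inv);
    Datum
    logagg_inv(PG_FUNCTION_ARGS)    /* inverse transition, strict */
    {
        PG_RETURN_TEXT_P(log_call(PG_GETARG_TEXT_PP(0), 'I', PG_GETARG_INT32(1)));
    }

[With a final function that simply returns the state, aggregating 1, 2, 3 and then removing 1 should produce "F:1 F:2 F:3 I:1 ", and an unexpected restart would show up as the "F:" sequence starting over.]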
On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug <fgp@phlo.org> wrote:
> * Don't we need to check for volatile function in the filter expression too?

I did manual testing on this before and the volatility test for the aggregate arguments seems to cover this. I didn't look into why, but it just did. I've not tested this again since your refactoring. I could test this easily before, when my numeric case was changing the results because of the dscale problem: I noticed that if I did FILTER(WHERE random() > 0) the extra trailing zeros would disappear.

The problem now is that it's pretty hard to determine if an inverse transition took place; the only way we can really tell is performance. I'll see if I can invent a new test case for this by creating a user-defined aggregate as you described. I'm thinking just append '+' to a string for transitions and '-' to a string for inverse transitions, then just make sure we only have a string of '+'s when doing something like filter(where random() >= 0).

David Rowley
On Jan18, 2014, at 06:15 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan17, 2014, at 23:34 , David Rowley <dgrowleyml@gmail.com> wrote:
>>> The test turned out to become:
>>>
>>>     if (state->expectedScale == -1)
>>>         state->expectedScale = X.dscale;
>>>     else if (state->expectedScale != X.dscale)
>>>         state->expectedScale = -2;
>>>
>>> In do_numeric_accum, then when we do the inverse I just check if
>>> expectedScale < 0 and return NULL.
>>
>> Ok, so this will rescan if and only if the dscales of all inputs match.
>> I still think that's overly pessimistic - we've only got a problem when we
>> removed the input with the largest dscale, no? So my suggestion would be
>>
>> <snipped>
>>
>> I'd think that this avoids more restarts with about the same effort,
>> but I haven't tried this though, so maybe I'm missing something.
>
> This is not quite right as it means if all the values are the same then
> we reject inverse transitions since state->maxScale will always be equal
> to X.dscale.
> But you are right about the overly strict code I've put in - we should allow
> values with less than the maximum dscale to be unaggregated without
> complaint. To implement this I needed a maxScaleCounter variable too, so we
> only reject when the maxScaleCounter gets back to 0 again.

Ups, sorry, yeah. Sounds sensible.
BTW, this made me realize that MIN and MAX currently have the same issue -
they'll rescan if the inputs are all equal. We could avoid that by doing what
you did with dscale - track the number of times we've seen the maximum. I
wonder if that would be worth it - it would, unfortunately, require us to
use state type "internal" there too, and hence to add final functions for all
the MIN/MAX aggregates. But that seems excessive. So for now, let's just
live with that.
If we really *do* want to optimize this case, we could
come to it from a completely different angle. Aggregates already special-case
MIN and MAX to be able to use an index to evaluate SELECT MAX(c) FROM t.
If we provided a way for the transition function to call the sort operator
specified by SORTOP in CREATE AGGREGATE, one generic triple of forward and
inverse transition function and final function would work for all the
MIN and MAX aggregates. But that's material for a separate patch for 9.5.
> Note that after this fix the results for my quick benchmark now look like:
>
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms
>
> So it seems to be much less prone to falling back to brute force forward
> transitions.
> It also seems the / 10 version must have had to previously do 1 brute
> force rescan but now it looks like it can do it in 1 scan.
>
>>> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
>>> tracker array, but I'm just not quite sure what extra real use cases it
>>> covers that the one above does not. Perhaps my way won't perform inverse
>>> transitions if the query did sum(numericCol / 10) or so.
>>
>> Dunno how many people SUM over numerics with different dscales. Easily
>> possible that it's few enough to not be worth fussing over.
>
> Going by Tom's comments in the post above this is possible just by having an
> unconstrained numeric column, but I guess there's still a good chance that
> even those unconstrained numbers have the same scale or at least the scale
> will likely not vary wildly enough to make us have to perform brute force
> forward transitions for each row.
Yeah, I'm convinced by now that your approach is the right trade-off there.
Those who do have values with wildly different dscales in their columns can
always add a cast to normalize them, if they experience a lot of restarts.
So let's just add a sentence or two to the SUM(numeric) documentation about
this, and be done.

>> * build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid,
>> I think. I don't quite like that we construct that even for plain aggregates,
>> and avoiding that requires just an additional if.
>
> I'm not quite sure what you mean on this. It passes InvalidOid in normal
> aggregate calls (search for: "InvalidOid, /* invtrans is not needed here */")
> and only looks up the function in build_aggregate_fnexprs if
> (OidIsValid(invtransfn_oid)) is true. I'm not sure how this can be improved
> since that function is used for window aggregates and normal aggregates.

I was thinking about checking for **invtransfnexpr = NULL, and not assigning if it is. But on second thought, you're right - the additional variable doesn't really hurt. So let's leave it as it is.

>> * Don't we need to check for volatile function in the filter expression too?
>
> I did manual testing on this before and the volatility test for the aggregate
> arguments seems to cover this. I didn't look into why but it just did. I've
> not tested this again since your refactoring. I could test this easily before
> when my numeric case was changing the results because of the dscale problem:
> I noticed that if I did FILTER(WHERE random() > 0) the extra trailing
> zeros would disappear. The problem now is that it's pretty hard to determine
> if an inverse transition took place; the only way we can really tell is
> performance. I'll see if I can invent a new test case for this by creating a
> user defined aggregate as you described. I'm thinking just append '+' to a
> string for transitions and '-' to a string for inverse transitions, then
> just make sure we only have a string of '+'s when doing something like
> filter(where random() >= 0).

From your other mail I gather that this works as expected. Thanks for testing this!

>> * As it stands, I don't think intXand_inv and intXor_inv are worth it, since
>> the case where they return non-NULL is awfully slim (only for inputs
>> containing only ones respectively only zeros). We should either track the number
>> of zeros and ones per bit, which would allow us to always perform inverse
>> transitions, or just drop them.
> I did think of this when I wrote them. I thought that the removing 0 case might
> be quite common and worth it, but I thought the ~0 case would be less common,
> but I just thought it was weird to do one without the other.
> To do more tracking on these it looks like we'd need to change those aggregates
> to use a state type that is internal and I think the extra tracking would mean
> looping over a 8, 32 or 64 element array of int64's for each value, I just don't
> think that would be a winner performance wise since the code that's there is
> pretty much a handful of CPU cycles.
Yeah, this is similar to the SUM(numeric) problem in that we *could* avoid
all restarts, but the overhead of doing so is quite high. But whereas in the
SUM(numeric) case we manage to reduce the overhead while still optimizing most
cases, here I think we optimize nearly none. My vote is for dropping these
functions entirely, but I don't feel particularly strongly about this...
>> It's probably far more worth it for the bool and/or aggregates. We could just
>> keep track of the values aggregated and the count of values as "true" and return
>> true if those are the same in the case of "AND", then check the true count
>> is > 0 in the case of "OR". I'd feel more strongly to go and do that if I'd
>> actually ever used those aggregates for anything.
That, OTOH, would be worthwhile I think. I'll go do that, though probably
not today. I hope to get to it sometime tomorrow.
>> * Quite a few of the inverse transition functions are marked strict, yet
>> contain code to handle NULL inputs. You can just remove that code - the system
>> makes sure that strict functions never receive NULL arguments. Affected are,
>> AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
>> int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, int4_sum_inv,
>> int8_sum_inv. Not sure that list is exhaustive...
>
> Should be able to get a list from:
> select proname,proisstrict from pg_proc where proisstrict = true and oid in(select
> agginvtransfn from pg_aggregate);
>
> I might not have time for this today though, so if you feel like checking these
> that would be really helpful.

Yeah, I'll do that, also tomorrow hopefully.
>> * For any of the new inverse transition functions, I'd be inclined to just
>> elog() if they're called directly and not as an aggregate. In particular
>> those which check for that anyway, plus the *smaller_inv and *larger_inv
>> ones. I don't see why anyone would ever want to call these directly - and
>> if we elog() now, we can easily change these functions later, because no external
>> code can depend on them. E.g., maybe someone wants to keep the top N elements
>> in the MIN/MAX aggregates one day...
>
> Yeah I guess the way it is now may mean we'd need to support legacy functions
> for ever and a day if we changed the way they worked, but I'm not sure if adding
> a check to see if it was used in an aggregate function changes that, as a user
> could be using the built in function in their own user defined aggregate, so
> there could still be complaints if we removed them from a major version. What
> would be needed is some way to have functions internally but publish these
> functions to the user, say only visible from initdb or something. I don't think
> that's part of this patch though. Maybe just the fact that they're undocumented
> helps give them more ability to be removed later.
Hm, true. Still, I think I'd prefer us to elog() for those new functions which
explicitly check whether there's an aggregation context anyway. Can do that if you
want.
>> * The number of new regression tests seems a bit excessive. I don't think there's
>> really a policy on what to test and what not, but in this case I think it suffices
>> if we test the basic machinery, plus the more complex functions. But e.g. most
>> of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally,
>> and the wrapper code basically just does datatype conversion, so testing a few
>> cases seems enough there. What I think we *should* test, but don't do currently,
>> is whether the core machinery performs the expected calls of the forward and
>> reverse transition function. I was thinking about creating an aggregate in the
>> regression tests which simply concatenates all the calls into a string, e.g.
>> you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed 1.
>> I think that should be possible with an SQL-language forward and inverse
>> transition function, but I haven't tried. I can try, if you want.
>
> I agree that there are quite a lot of tests and I think that's why I took a different
> approach when it came to all these little *larger_inv and *smaller_inv functions - there
> were just so many! I thought the aggregate tests would run in the blink of an eye
> anyway, but perhaps I should consider other things than just processing time. I
> created most of the aggregate call tests by writing a whole load of queries on an
> unpatched version, then ran the tests and took the output of that as my expected
> results for my patched copy. These were really useful to check for regressions when
> I was working hard on nodeWindowAgg.c. I'd find it hard to pick and choose what to
> remove given that they all test something slightly different, even if it's just a
> different final function. They did pick up some failures earlier when I forgot to
> change the strict property on int8_avg_accum_inv. Maybe someone else has an opinion
> on the number of tests?

I think the basic guideline for whether to test something in the regression tests
or not is not so much "did it catch an error during development", but rather
"could this catch errors inadvertently introduced later". That's not to say you
shouldn't use more tests during development - your procedure there is fine - the
question is just whether to cut them down once the patch is done. For all these
rather trivial inverse functions, I think we can trust that if they work once,
they're not going to break
unless someone changes the function itself - they don't really have any outside
dependencies. That's also what the window functions regression test does today, I
think - it doesn't really test all possible cases exhaustively.
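
Something along these lines ought to work for the call-logging aggregate - an
untested sketch, with made-up names, using the patch's invfunc option (the
transition functions can't be strict, since the state starts out NULL):

create or replace function logging_sfunc(state text, val int4)
returns text as $$
-- append a forward-transition record; state is NULL before the first call
select coalesce(state || ' ', '') || 'F:' || coalesce(val::text, 'NULL')
$$ language sql immutable;

create or replace function logging_inv_sfunc(state text, val int4)
returns text as $$
-- append an inverse-transition record
select state || ' I:' || coalesce(val::text, 'NULL')
$$ language sql immutable;

create aggregate logging_agg(int4) (
  stype = text,
  sfunc = logging_sfunc,
  invfunc = logging_inv_sfunc
);

-- aggregating 1,2,3 over ROWS BETWEEN 1 PRECEDING AND CURRENT ROW should
-- yield 'F:1', 'F:1 F:2', 'F:1 F:2 I:1 F:3' if the inverse is used:
select logging_agg(i) over (order by i rows between 1 preceding and current row)
from generate_series(1, 3) i;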
best regards,
Florian Pflug

>> It's probably far more worth it for the bool and/or aggregates. We could just
>> keep track of the values aggregated and the count of values as "true" and return
>> true if those are the same in the case of "AND", then check the true count
>> is > 0 in the case of "OR". I'd feel more strongly to go and do that if I'd
>> actually ever used those aggregates for anything.
>
> That, OTOH, would be worthwhile I think. I'll go do that, though probably
> not today. I hope to get to it sometime tomorrow.

I've committed a patch to the github repo to do this, but I'm not sure if we
can keep it as I had to remove the sort op, as I explained above.

The latest version of the patch is attached.

Regards

David Rowley
Attachment
On Jan19, 2014, at 05:27 , David Rowley <dgrowleyml@gmail.com> wrote:
>> I just finished implementing the inverse transition functions for bool_and
>> and bool_or, these aggregates had a sort operator which I assume would have
>> allowed an index scan to be performed, but since I had to change the first
>> argument of these aggregates to internal, that meant I had to get rid of
>> the sort operator...

Why does having transition type "internal" prevent you from specifying a
sort operator? The sort operator's argument types must match the *input*
type of the aggregate, not the transition type.
Here's a pure SQL implementation of an optimized bool_and called myand_agg
that uses state type bigint[] and specifies a sort operator.
-- state is {count of false inputs, count of true inputs}
create or replace function myboolagg_fwd(counts bigint[], value bool)
returns bigint[] as $$
select array[
counts[1] + case value when true then 0 else 1 end,
counts[2] + case value when true then 1 else 0 end
]
$$ language sql strict immutable;
create or replace function myboolagg_inv(counts bigint[], value bool)
returns bigint[] as $$
select array[
counts[1] - case value when true then 0 else 1 end,
counts[2] - case value when true then 1 else 0 end
]
$$ language sql strict immutable;
create or replace function myboolagg_and(counts bigint[])
returns bool as $$
-- AND is true iff no false input was seen
select case counts[1] when 0 then true else false end
$$ language sql strict immutable;
create aggregate myand_agg (bool) (
stype = bigint[],
sfunc = myboolagg_fwd,
invfunc = myboolagg_inv,
finalfunc = myboolagg_and,
sortop = <,
initcond = '{0,0}'
);
With this, doing
create table boolvals as
select i, random() < 0.5 as v from generate_series(1,10000) i;
create index on boolvals(v);
explain analyze select myand_agg(v) from boolvals;
yields
Result (cost=0.33..0.34 rows=1 width=0) (actual time=0.067..0.067 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.29..0.33 rows=1 width=1) (actual time=0.061..0.061 rows=1 loops=1)
-> Index Only Scan using boolvals_v_idx on boolvals (cost=0.29..474.41 rows=9950 width=1) (actual time=0.061..0.061 rows=1 loops=1)
Index Cond: (v IS NOT NULL)
Heap Fetches: 1
Total runtime: 0.100 ms
which looks fine, no?
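
(An analogous final function for bool_or would just check the true count - an
untested sketch:)

create or replace function myboolagg_or(counts bigint[])
returns bool as $$
-- OR is true iff at least one true input was seen
select counts[2] > 0
$$ language sql strict immutable;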
best regards,
Florian Pflug
On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
> * EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
> transitions per row and aggregate. It's a bit imprecise, because it doesn't
> track the count per aggregate, but it's still a good metric for how well
> the inverse transition functions work. If the number is close to one, you
> know that very few rescans are happening.

I've not looked at this yet and I don't think I'll have time tonight, but it
sounds interesting. I guess it might be quite nice to have a way to see this,
especially with the way the numeric stuff works; it might otherwise be pretty
hard to know how many inverse transition "failures" there had been. Do you
think it's also worth tracking the inverse transition failures too?
On Jan19, 2014, at 20:00 , David Rowley <dgrowleyml@gmail.com> wrote:
> I've applied that patch again and put in the sort operators.

I've pushed a new version to https://github.com/fgp/postgres/tree/invtrans
which includes
* A bunch of missing declarations for *_inv functions
* An assert that the frame end doesn't move backwards - I realized that
it is after all easy to do that, if it's done after the loop which adds
the new values, not before.
* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
transitions per row and aggregate. It's a bit imprecise, because it doesn't
track the count per aggregate, but it's still a good metric for how well
the inverse transition functions work. If the number is close to one, you
know that very few rescans are happening.
* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
it's the last commit, so if you object to that, then you can merge up to
eafa72330f23f7c970019156fcc26b18dd55be27 instead of
de3d9148be9732c4870b76af96c309eaf1d613d7.
A few more things I noticed, all minor stuff
* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the
inverse transition function returns NULL once, we never call it again, so the
flag won't have any practical effect. And second, assume we ever called the
forward transition function after the inverse failed, and then retried the inverse.
In the case of do_numeric_discard(), that actually *could* allow the inverse
to suddenly succeed - if the call to the forward function increased the dscale
beyond that of the element we tried to remove, removal would suddenly be
possible again. We never do that, of course, and it seems unlikely we ever
will. But it's still weird to have code which serves no other purpose than to
pessimize a case which would otherwise just work fine.
* The state == NULL checks in all the strict inverse transition functions are
redundant.
I haven't taken a close look at the documentation yet, I hope to be able to
do that tomorrow. Otherwise, things look good as far as I'm concerned.
best regards,
Florian Pflug
On Jan20, 2014, at 08:42 , David Rowley <dgrowleyml@gmail.com> wrote: >> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote: >> * An assert that the frame end doesn't move backwards - I realized that >> it is after all easy to do that, if it's done after the loop which adds >> the new values, not before. > > I've applied this too, but I'm wondering why an elog for if the head moves > back, but an assert if the tail moves back? When I put the frame head check in, I was concerned that the code might crash or loop endlessly if aggregatedbase was ever larger than frameheadpos, so I made it elog(), just for safety. The frame end check, OTOH, is done at the very end, so it doesn't protect against much, it just documents that it's not supposed to happen. But yeah, it's bit weird. Feel free to turn the frame end check into an elog() too if you want to. >> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and >> it's the last commit, so if you object to that, then you can merge up to >> eafa72330f23f7c970019156fcc26b18dd55be27 instead of >> de3d9148be9732c4870b76af96c309eaf1d613d7. > > > Seems like sfunc really should be tfunc then we could have invtfunc. I'd probably > understand this better if I knew what the 's' was for in sfunc. I've not applied > this just yet. Do you have a reason why you think it's better? My issue with just "invfunc" is mainly that it's too generic - it doesn't tell you what it's supposed to be the inverse of. I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that the naming is inspired by control theory, where the function which acts on the state space is often called S. > Thanks, yeah those really do need a review. I've lost a bit of direction with > them and I'm not quite sure just how much detail to go in to with it. I'd like > to explain a bit that users who need to use their numeric columns in window > aggregates might want to think about having a defined scale to the numeric rather > than an undefined scale and explain that this is because the inverse transition > function for numeric bails out if it loses the maximum seen dscale. Though it > does seem generally a good idea to have a defined scale, but then I guess you've > got to have a bit of knowledge about the numbers you're storing in that case. > I'm not quite sure how to put that into words friendly enough for the documents > just yet and or exactly where to put the words. So any ideas or patches you have > around that would be great. Here's what I image the docs should look like, very roughly * In CREATE AGGREGATE, we should state the precise axioms we assume about forward and inverse transition functions. The last time I read the text there, it was a bit ambiguous about whether inverse transition functions assume commutativity, i.e. whether we assume that we can remove inputs other than the first one from the aggregation. Currently, all the inverse transition functions we have are, in fact, commutative, because all the forward transition function are also. But we could e.g. add an inverse to array_agg and string_agg, and those would obviously need this ordering guarantee. I'd also like us to explain that the "inverse" in "inverse transition function" shouldn't be taken too literally. For forward transition function F, inverse transition function G, and inputs a,b,... we *don't require that G(F(s,a),a) == s. We we do require is that if I is the initial state, then G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z). 
(Well, actually we don't strictly require even that, the two states don't need to be identical, they just need to produce identical outputs when passed to the final function) * In CREATE AGGREGATE, we should also explain the NULL semantics you get with various combinations of strict and non-strict forward and inverse transition functions. I think a table listing all the combinations is in order there, with a column explaining the semantics you get. We should also clearly state that once you provide an inverse transition function, NULL isn't a valid state value anymore, except as an initial state, i.e. that the forward transition function must never return NULL in this case. * The window function page should explain the performance hazards when a moving frame head is involved. Ideally, we'd list which aggregates never have to restart, and which sometimes might, and what you can do to avoid that. We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether restarts are occurring. This is were we'd tell people to cast their numeric operands to a type with defined scale to avoid restarts, and were we'd state the SUM(float) *does* restart always due to precision loss issues. BTW, something that we haven't addressed at all is how inverse transition functions interact with what is called "ordered-set aggregates". I haven't wrapped my head around these fully, I think, so I'm still not sure if there's anything to do there or not. Can those even be used as window functions? Should we forbid ordered-set aggregates from specifying an inverse transition function? best regards, Florian Pflug
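
(To make the numeric dscale hazard concrete - plain SQL showing stock numeric
arithmetic, not code from the patch:)

-- the dscale of a numeric sum is the maximum dscale of its inputs:
select 2.1 + 1.001;    -- 3.101
-- subtracting that dscale-3 input afterwards cannot shrink the dscale again:
select 3.101 - 1.001;  -- 2.100, whereas SUM over just 2.1 would show 2.1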
On Jan20, 2014, at 15:20 , Florian Pflug <fgp@phlo.org> wrote:
> * In CREATE AGGREGATE, we should state the precise axioms we assume about forward
> and inverse transition functions. The last time I read the text there, it was
> a bit ambiguous about whether inverse transition functions assume commutativity,
> i.e. whether we assume that we can remove inputs other than the first one from
> the aggregation. Currently, all the inverse transition functions we have are,
> in fact, commutative, because all the forward transition functions are also. But
> we could e.g. add an inverse to array_agg and string_agg, and those would
> obviously need this ordering guarantee. I'd also like us to explain that the
> "inverse" in "inverse transition function" shouldn't be taken too literally. For
> forward transition function F, inverse transition function G, and inputs a,b,...
> we *don't* require that G(F(s,a),a) == s. What we do require is that if I is the
> initial state, then
>
>   G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).
>
> (Well, actually we don't strictly require even that, the two states don't
> need to be identical, they just need to produce identical outputs when passed
> to the final function)
>
> * In CREATE AGGREGATE, we should also explain the NULL semantics you get
> with various combinations of strict and non-strict forward and inverse
> transition functions. I think a table listing all the combinations is in order
> there, with a column explaining the semantics you get. We should also clearly
> state that once you provide an inverse transition function, NULL isn't a valid
> state value anymore, except as an initial state, i.e. that the forward transition
> function must never return NULL in this case.

I gave that a shot, the results are at
https://github.com/fgp/postgres/tree/invtrans

> * The window function page should explain the performance hazards when
> a moving frame head is involved. Ideally, we'd list which aggregates never
> have to restart, and which sometimes might, and what you can do to avoid that.
> We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
> restarts are occurring. This is where we'd tell people to cast their numeric
> operands to a type with defined scale to avoid restarts, and where we'd state
> that SUM(float) *does* always restart due to precision loss issues.
>
> BTW, something that we haven't addressed at all is how inverse transition
> functions interact with what is called "ordered-set aggregates". I haven't
> wrapped my head around these fully, I think, so I'm still not sure if there's
> anything to do there or not. Can those even be used as window functions?
> Should we forbid ordered-set aggregates from specifying an inverse transition
> function?

It seems that these aren't valid window functions anyway, so there's nothing
to do for now, I think.

best regards,
Florian Pflug
On Jan18, 2014, at 06:15 , David Rowley <dgrowleyml@gmail.com> wrote:
> Note that after this fix the results for my quick benchmark now look like:
>
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156 ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms
>
> So it seems to be much less prone to falling back to brute force forward
> transitions.
> It also seems the / 10 version must have had to previously do 1 brute
> force rescan but now it looks like it can do it in 1 scan.
>
Attachment
On Jan21, 2014, at 10:53 , David Rowley <dgrowleyml@gmail.com> wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions.
>
> I've written the test to do this as:
>
> IF state.value + value = state.value AND value <> 0 THEN
>   newstate.precision_lost := true;
>   newstate.value := state.value;
> ELSE
>   newstate.precision_lost := false;
>   newstate.value := state.value + value;
> END IF;
>
>
> The inverse transition function checks the precision_lost and if it's true it
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value
and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.
"SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet
"SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where "1.01"
would have been correct. That's still too much precision loss.
I'm quite certain that the general idea has merit, but the actual
invertibility condition is going to be more involved. If you want to play with
this, I think the first step has to be to find a set of guarantees that
SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
summands all have the same sign, the error is bound by C * N, where C is some
(machine-specific?) constant (about 1e-15 or so), and N is the number of input
rows. Or at least so I think from looking at SUMs over floats in descending
order, which I guess is the worst case. You could, for example, try to see if
you can find an invertibility condition which guarantees the same, but allows
C to be larger. That would put a bound on the number of digits lost by the new
SUM(float) compared to the old one.
I don't have high hopes for this getting into 9.4, though.
> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types, but even
> if it did add too much overhead to forward aggregation it might be worth
> allowing aggregates to have 2 forward transition functions and if the 2nd
> one exists then it could be used in windowing functions where the frame
> does not have "unbounded following".

I don't think adding yet another type of aggregation function is the
solution here.
best regards,
Florian Pflug
On Jan23, 2014, at 01:07 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, Jan 21, 2014 at 3:20 AM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan20, 2014, at 08:42 , David Rowley <dgrowleyml@gmail.com> wrote:
>> >> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
>> >> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
>> >> it's the last commit, so if you object to that, then you can merge up to
>> >> eafa72330f23f7c970019156fcc26b18dd55be27 instead of
>> >> de3d9148be9732c4870b76af96c309eaf1d613d7.
>> >
>> > Seems like sfunc really should be tfunc then we could have invtfunc. I'd probably
>> > understand this better if I knew what the 's' was for in sfunc. I've not applied
>> > this just yet. Do you have a reason why you think it's better?
>>
>> My issue with just "invfunc" is mainly that it's too generic - it doesn't tell
>> you what it's supposed to be the inverse of.
>>
>> I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
>> the naming is inspired by control theory, where the function which acts on the
>> state space is often called S.
>
> Ok, that makes more sense now and it seems like a reasonable idea. I'm not quite
> sure yet, as when someone said upthread that these "negative transition functions" as
> I was calling them at the time should really be called "inverse transition functions",
> I then posted that I was going to call the create aggregate option "invfunc", which
> nobody seemed to object to. I just don't want to go and change that now. It is very
> possible this will come up again when the committer is looking at the patch. It would
> be a waste if it ended up back at invfunc after we changed it to invsfunc.

Since we already settled on "inverse transition function", I kinda doubt that
calling the parameter invsfunc is going to meet a lot of resistance. But we
can put that off a little longer still...

I've pushed a few additional things to
https://github.com/fgp/postgres/tree/invtrans.

* I updated the CREATE AGGREGATE documentation, trying to include the
description of the various modes of inverse transition functions into the
paragraphs which already explained about STRICT for transition functions
and such.

* I've also updated the list of window functions to include a list of those
aggregates which potentially need to restart the computation, i.e. MIN/MAX
and the like.

* I've changed nodeWindowAgg.c to use per-aggregate aggregation contexts for
the invertible aggregates. Without that, the aggregate context is potentially
never reset, because that previously required *all* the aggregates to restart
at the same time. That would be OK if we were sure not to leak unbounded
amounts of stuff stored in that context, but unfortunately we sometimes do.
For example, whenever a strict, invertible aggregate ends up with only NULL
inputs, we re-initialize the aggregation, which leaks the old state value. We
could pfree() that of course, but that state value might reference other
stuff that we don't know about and thus cannot free. Separating the
aggregation contexts is the only solution I came up with, so I did that.

* I've also tweaked an if to flag aggregates as invertible only if the frame
head can actually move, i.e. if the frame start clause is something other
than UNBOUNDED PRECEDING. Since the choice of whether to use a private
aggregation context is driven by that flag, that also makes the above apply
only to aggregates where the inverse transition function is actually used.

I hope to find some time tomorrow or so to complete my pass through the
documentation - what's still missing is an explanation of the EXPLAIN VERBOSE
ANALYZE field and maybe some cleanup of xaggr.sgml.

Do you have any additional things pending?

best regards,
Florian Pflug
On 22 January 2014 09:54, David Rowley <dgrowleyml@gmail.com> wrote:
> I've performed some more benchmarks on this patch tonight. The results and
> full recreation scripts are attached along with the patch it was tested
> against.

I noticed that the rate of changes to this patch has dropped off, which I
took as a sign that it is ready for review, so I started doing that.

My first thought was wow, this is a big patch!

I think it should probably be broken up. It might be overly ambitious to try
to get all of this committed during this commitfest, and in any case, I
suspect that the committer would probably choose to commit it in stages.
Perhaps something like:

Patch 1
- Basic support for inverse transition functions, CREATE AGGREGATE support
and doc updates. This should include test cases to validate that the
underlying executor changes are correct, by defining custom aggregates such
as sum(int) and array_agg() using inverse transition functions.

Patch 2
- Add built-in inverse transition functions for count, sum(int), and friends.

Patch 3, 4...
- Other related groups of built-in aggregates. By this point, it should be a
fairly mechanical process.

Splitting it up this way now should help to focus on getting patch 1 correct,
without being distracted by all the other aggregates that may or may not
usefully be made to have inverse transition functions. I think the value of
the feature has been proved, and it is good to see that it can be applied to
so many aggregates, but let's not try to do it all at once.

Regarding what would be in patch 1, I've only given it a cursory look, but I
did immediately notice a problem with the nodeWindowAgg.c changes --- for
aggregates with non-strict transition functions, something equivalent to the
not null counting code is needed to make it work correctly with FILTER. For
example, the following query gives the wrong results with this patch:

SELECT array_agg(i) FILTER (WHERE i%3=0)
       OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM generate_series(1,10) g(i);

Perhaps it should just always count the non-skipped values added after
removal of filtered and null values. Then you might be able to simplify
things by getting rid of ignore_nulls from WindowStatePerAggData and
replacing notnullcount with a more general valueCount to track the number of
values actually added to transValue.

I haven't read through nodeWindowAgg.c in any detail yet (I think it will
take me a while to fully get my head round that code) but I think it needs
some more test cases to verify that the various corner cases are covered.

Regards,
Dean
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I think it should probably be broken up. It might be overly ambitious
> to try to get all of this committed during this commitfest, and in any
> case, I suspect that the committer would probably choose to commit it
> in stages. Perhaps something like:
>
> Patch 1
> - Basic support for inverse transition functions, CREATE AGGREGATE
> support and doc updates. This should include test cases to validate
> that the underlying executor changes are correct, by defining custom
> aggregates such as sum(int) and array_agg() using inverse transition
> functions.
>
> Patch 2
> - Add built-in inverse transition functions for count, sum(int), and friends.
>
> Patch 3, 4...
> - Other related groups of built-in aggregates. By this point, it
> should be a fairly mechanical process.
>
> Splitting it up this way now should help to focus on getting patch 1
> correct, without being distracted by all the other aggregates that may
> or may not usefully be made to have inverse transition functions. I
> think the value of the feature has been proved, and it is good to see
> that it can be applied to so many aggregates, but let's not try to do
> it all at once.

Working on that now, will post individual patches later today.

best regards,
Florian Pflug
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> My first thought was wow, this is a big patch!
>
> I think it should probably be broken up. It might be overly ambitious
> to try to get all of this committed during this commitfest, and in any
> case, I suspect that the committer would probably choose to commit it
> in stages. Perhaps something like:
>
> <snipped>
>
> Splitting it up this way now should help to focus on getting patch 1
> correct, without being distracted by all the other aggregates that may
> or may not usefully be made to have inverse transition functions. I
> think the value of the feature has been proved, and it is good to see
> that it can be applied to so many aggregates, but let's not try to do
> it all at once.

I've now split this up into

invtrans_base:       Basic machinery plus tests with SQL-language aggregates
invtrans_arith:      COUNT(), SUM(), AVG(), STDDEV() and the like
invtrans_minmax:     MIN(), MAX(), BOOL_AND(), BOOL_OR()
invtrans_collecting: ARRAY_AGG(), STRING_AGG()
invtrans_docs:       Documentation

Each of these corresponds to one of the invtrans_* branches in my github
repo at https://github.com/fgp/postgres

Attached are the latest versions of these patches.

> Regarding what would be in patch 1, I've only given it a cursory look,
> but I did immediately notice a problem with the nodeWindowAgg.c
> changes --- for aggregates with non-strict transition functions,
> something equivalent to the not null counting code is needed to make
> it work correctly with FILTER. For example, the following query gives
> the wrong results with this patch:
>
> SELECT array_agg(i) FILTER (WHERE i%3=0)
>        OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
> FROM generate_series(1,10) g(i);
>
> Perhaps it should just always count the non-skipped values added after
> removal of filtered and null values. Then you might be able to
> simplify things by getting rid of ignore_nulls from
> WindowStatePerAggData and replacing notnullcount with a more general
> valueCount to track the number of values actually added to transValue.

Ugh, very true. Though I don't see how to get rid of ignore_nulls, unless we
rewrite the filter clause to

  (filter_clause) AND arg1 IS NOT NULL AND ... argN IS NOT NULL

for strict aggregates, which would probably be much slower and wouldn't be
right if any of the args call volatile functions. But I've generalized
notnullcount to transValueCount as you suggested, and got rid of
noTransValue, since that's now equivalent to transValueCount == 0 &&
transValueIsNull.

string_agg() was also severely broken - it needs to track the number of
non-NULL strings within the state, since it skips rows with a NULL string,
but not with a NULL delimiter, which means we can't mark it strict to make
nodeWindowAgg.c take care of that.

> I haven't read through nodeWindowAgg.c in any detail yet (I think it
> will take me a while to fully get my head round that code) but I think
> it needs some more test cases to verify that the various corner cases
> are covered.

Yeah, I hadn't really gotten around to doing that yet :-(

The base patch now contains tests of the various strict and non-strict
cases, both in the filtered and non-filtered cases. The invtrans_arith patch
contains David's original tests, minus the one with the logging inverse
transition function, since that's tested by the base patch already. It
doesn't test sum(interval) though, I think (though maybe I missed it, it's
quite late already...). I think it maybe should - not sure if there are any
gotchas with SUM(interval) and inverse transition functions.

The invtrans_collecting patch contains tests of the NULL behaviour of
string_agg() and array_agg() - I hope I've covered all the cases.

The invtrans_minmax patch doesn't contain any tests yet - David, could you
provide some for these functions, and also for bool_and and bool_or? We
don't need to test every datatype, but a few would be nice.

I've also run the regression tests of invtrans_arith and invtrans_collecting
against invtrans_base (i.e., without the actual inverse transition functions)
and they still succeed, meaning the inverse transition functions don't change
the results. Doing the same against HEAD instead of invtrans_base yields a
few diffs, but all of them look OK - the stuff that explicitly tests inverse
transition functions via those "logging" aggregations (which simply log all
calls to both the forward and inverse transition function in a long string)
obviously yields different results if inverse transition functions aren't
used.

best regards,
Florian Pflug
Attachment
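
(The string_agg() NULL semantics referred to above, illustrated on stock
PostgreSQL:)

-- NULL values are skipped:
select string_agg(v, ',') from (values ('a'), (null), ('b')) t(v);  -- a,b
-- but a NULL delimiter does not cause the row to be skipped:
select string_agg(v, null) from (values ('a'), ('b')) t(v);         -- ab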
On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> If you want to play with
>> this, I think the first step has to be to find a set of guarantees that
>> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
>> summands all have the same sign, the error is bound by C * N, where C is some
>> (machine-specific?) constant (about 1e-15 or so), and N is the number of input
>> rows. Or at least so I think from looking at SUMs over floats in descending
>> order, which I guess is the worst case. You could, for example, try to see if
>> you can find an invertibility condition which guarantees the same, but allows
>> C to be larger. That would put a bound on the number of digits lost by the new
>> SUM(float) compared to the old one.
>
> I guess if the case is that normal (non-window) sum(float) results are undefined
> unless you add an order by to the aggregate then I guess there is no possible
> logic to put in for inverse transitions that will make them behave the same as
> an undefined behaviour.

Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
inverse transition function - just make it a no-op. Heck, you could then even
make the forward transition function a no-op, since the very definition of
"undefined behaviour" is "result can be anything, including setting your house
on fire". The point is, it's *not* undefined - it's just imprecise. And the
imprecision can even be quantified, it just depends on the number of
input rows (the equal-sign requirement is mostly there to make "imprecision"
equivalent to "relative error").
>> > If it seems sound enough, then I may implement it in C to see how much
>> > overhead it adds to forward aggregation for floating point types, but even
>> > if it did add too much overhead to forward aggregation it might be worth
>> > allowing aggregates to have 2 forward transition functions and if the 2nd
>> > one exists then it could be used in windowing functions where the frame
>> > does not have "unbounded following".
>>
>> I don't think adding yet another type of aggregation function is the
>> solution here.
>
> Why not?
>
> I thought, if in the cases where we need to burden the forward transition
> functions with extra code to make inverse transitions possible, then why
> not have an extra transition function so that it does not slow down normal
> aggregation?
>
> My testing of sum(numeric) last night does not show any slow down by that
> extra dscale tracking code that was added, but if it did then I'd be thinking
> seriously about 2 forward transition functions to get around the problem.
> I don't understand what would be wrong with that. The core code could just
> make use of the 2nd function if it existed and inverse transitions were
> thought to be required, i.e. in a window context that does not
> have UNBOUNDED PRECEDING.

First, every additional function increases the maintenance burden, and at
some point the expected benefit simply isn't worth it. IMHO at least.
Secondly, you'd really need a second aggregate definition - usually, the
non-invertible aggregate will get away with a much smaller state representation.
Aggregates with state type "internal" could hack their way around that by
simply not using the additional parts of their state structure, but e.g.
plpgsql aggregates would still incur overhead due to repeated copying of
an unnecessarily large state value. If it's possible to represent the
forward-only state, but not the invertible state, with a copy-by-value type,
that overhead could easily be a large part of the total overhead you paid
for having an inverse transition function.
And lastly, we could achieve much of the benefit of a second transition
function by simply telling the forward transition function whether to
expect inverse transition function calls. We already expose things like
the aggregation memory context to C-language transition functions, so the
basic mechanism is already there. In fact, I pondered whether to do this -
but then figured I'd leave it to whoever needs that facility first. Though
it wouldn't be much code - basically, setting a flag in the WindowAggState,
and exporting a function to be used by aggregates to read it, similar
to what AggCheckCallContext does.
best regards,
Florian Pflug
David Rowley <dgrowleyml@gmail.com> writes:
> My point was more that since sum(float) can give different results if it
> used an index scan rather than a seq scan, trying to get the inverse
> transition to match something that gives varying results sounds like a
> tricky task to take on.

This is just a variant of the same excuse we heard before. The question is
not whether sum(float8) can give bad results; the question is whether we are
going to break applications that are using it carefully (ie with an
appropriate ORDER BY) and expecting to get good results.

>> Secondly, you'd really need a second aggregate definition - usually, the
>> non-invertible aggregate will get away with a much smaller state
>> representation.

Yeah. I think the idea of multiple transition functions in a single
aggregate definition is pretty broken to start with, but the likelihood that
they couldn't share aggregate state types puts it completely beyond sanity.
We're not going to start inventing "stype2" etc.

regards, tom lane
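
(The order dependence at issue here is easy to demonstrate with stock float8
arithmetic:)

-- the same three values, added in two different orders:
select (1e16::float8 + 1.01::float8) + (-1e16)::float8;  -- 2
select (1e16::float8 + (-1e16)::float8) + 1.01::float8;  -- 1.01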
On Jan25, 2014, at 09:50 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote:
>> > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> >> If you want to play with
>> >> this, I think the first step has to be to find a set of guarantees that
>> >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
>> >> summands all have the same sign, the error is bound by C * N, where C is some
>> >> (machine-specific?) constant (about 1e-15 or so), and N is the number of input
>> >> rows. Or at least so I think from looking at SUMs over floats in descending
>> >> order, which I guess is the worst case. You could, for example, try to see if
>> >> you can find an invertibility condition which guarantees the same, but allows
>> >> C to be larger. That would put a bound on the number of digits lost by the new
>> >> SUM(float) compared to the old one.
>> >
>> > I guess if the case is that normal (non-window) sum(float) results are undefined
>> > unless you add an order by to the aggregate then I guess there is no possible
>> > logic to put in for inverse transitions that will make them behave the same as
>> > an undefined behaviour.
>>
>> Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
>> inverse transition function - just make it a no-op. Heck, you could then even
>> make the forward transition function a no-op, since the very definition of
>> "undefined behaviour" is "result can be anything, including setting your house
>> on fire". The point is, it's *not* undefined - it's just imprecise. And the
>> imprecision can even be quantified, it just depends on the number of
>> input rows (the equal-sign requirement is mostly there to make "imprecision"
>> equivalent to "relative error").
>
> My apologies, I meant to use the term nondeterministic rather than undefined.
> There's really not any need that I can see to turn things silly here.

I wasn't trying to be absurd, I was trying to get a point across.

> My point was more that since sum(float) can give different results if it used
> an index scan rather than a seq scan, trying to get the inverse transition to
> match something that gives varying results sounds like a tricky task to take on.

You don't have to match it digit-by-digit! In that I fully agree with Kevin -
floats are *always* an approximation, and so is thus SUM(float).

Summation order is BTW not the only source of non-determinism for SUM(float) -
the exact result can vary between architectures, and for some architectures
between compilers. (Intel is one of these, due to their 80-bit extended
precision format that gets truncated to 64-bit when stored to main memory.)

But in a large number of cases, they won't vary by *much*, which is *the*
reason why SUM(float) is *not* totally useless. And reasoning about SUM(float)
which ignores that by simply calling it "non-deterministic", "undefined" or
whatever, without any quantification of the possible error, has thus zero
chance of leading to interesting conclusions.

>> Secondly, you'd really need a second aggregate definition - usually, the
>> non-invertible aggregate will get away with a much smaller state representation.
>> Aggregates with state type "internal" could hack their way around that by
>> simply not using the additional parts of their state structure, but e.g.
>> plpgsql aggregates would still incur overhead due to repeated copying of
>> an unnecessarily large state value. If it's possible to represent the
>> forward-only state, but not the invertible state, with a copy-by-value type,
>> that overhead could easily be a large part of the total overhead you paid
>> for having an inverse transition function.
>
> I had imagined they keep the same state type and don't use any extra variables
> that are defined for the forward transition function that is invertible.

Yeah, and the above explains that at least for non-C-language aggregates,
passing around that unnecessarily large state may very well prove to be the
source of a large part, if not almost all, of the overhead. So having just a
separate forward transition function will buy you almost nothing in some cases.

I just tried this. I defined two aggregates mymax(int4) and myfastmax(int4),
both with just a forward transition function, both SQL-language functions.
mymax uses a composite type for the state containing an int4 field holding
the current maximum, and a dummy int8 field. myfastmax uses a plain int4 for
the state, holding the current maximum. Both forward transition functions
essentially do

  case when current_max > v then current_max else v end

On my laptop, computing the maximum of 1e6 rows takes about 4.5 seconds with
myfastmax and 7.8 seconds with mymax. If I make mymax's transition function
increment the dummy field on every transition, the time increases from 7.8 to
8.2 seconds. So here, using a composite type for the state accounts for about
3.3 seconds, or 40%, of the total runtime, whereas the increment costs about
0.4 seconds or 5% of the total runtime.

>> And lastly, we could achieve much of the benefit of a second transition
>> function by simply telling the forward transition function whether to
>> expect inverse transition function calls. We already expose things like
>> the aggregation memory context to C-language transition functions, so the
>> basic mechanism is already there. In fact, I pondered whether to do this -
>> but then figured I'd leave it to whoever needs that facility first. Though
>> it wouldn't be much code - basically, setting a flag in the WindowAggState,
>> and exporting a function to be used by aggregates to read it, similar
>> to what AggCheckCallContext does.
>
> Sounds like a good idea, but what would the solution be for aggregate
> transition functions that are not written in C?

See above - non-C aggregates are exactly the ones that benefit least from a
specialized non-invertible forward transition function, because they have to
serialize and deserialize the state type on every transition...

best regards,
Florian Pflug
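
(For reference, a reconstruction of roughly what those two test aggregates
might look like - the function bodies are guessed from the description above,
not Florian's actual code:)

create type mymax_state as (cur_max int4, dummy int8);

-- fast variant: plain int4 state
create or replace function myfastmax_fwd(cur_max int4, v int4)
returns int4 as $$
select case when cur_max > v then cur_max else v end
$$ language sql immutable;

create aggregate myfastmax(int4) (
  stype = int4,
  sfunc = myfastmax_fwd
);

-- slow variant: composite state carrying an unused int8 field
create or replace function mymax_fwd(s mymax_state, v int4)
returns mymax_state as $$
-- 'cur_max > v' is NULL for the initial (,0) state, so v is picked
select row(case when s.cur_max > v then s.cur_max else v end, s.dummy)::mymax_state
$$ language sql immutable;

create or replace function mymax_final(s mymax_state)
returns int4 as $$ select s.cur_max $$ language sql immutable;

create aggregate mymax(int4) (
  stype = mymax_state,
  sfunc = mymax_fwd,
  finalfunc = mymax_final,
  initcond = '(,0)'
);

-- timed with e.g.: select myfastmax(i) from generate_series(1, 1000000) i;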
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

The invtrans_minmax patch doesn't contain any tests yet - David, could you provide some for these functions, and also for bool_and and bool_or?
We don't need to test every datatype, but a few would be nice.
David Rowley
On Jan26, 2014, at 00:24 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Sat, Jan 25, 2014 at 3:21 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> The invtrans_minmax patch doesn't contain any tests yet - David, could
>> you provide some for these functions, and also for bool_and and bool_or?
>> We don't need to test every datatype, but a few would be nice.
>
> I've added a few regression tests for min, max and bool_or and bool_and.
> I've pushed these up to here:
>
> https://github.com/david-rowley/postgres/commits/invtrans_minmax

OK, I've pushed this to github. I haven't produced new patches yet - I think it's probably better to wait for a few things to pile up, to make this less of a moving target. But ultimately, this is up to Dean - Dean, I'll do whatever makes your life as a reviewer easier.

> I did wonder if you'd want to see uses of FILTER in there as I'm thinking
> that should really be covered by the core patch and the tests here really
> should just be checking the inverse transition functions for min and max.

I don't mind the FILTER - when this gets committed, the distinction between what was in which patch goes away anyway.

What I did realize when merging this is that we currently don't test the case of a window which lies fully after the current row (i.e. BETWEEN n FOLLOWING AND m FOLLOWING). We do test BETWEEN n PRECEDING AND m PRECEDING, because the array_agg and string_agg tests do that. So maybe some of the MIN/MAX or arithmetic tests could exercise that case?

> As for the data types tested, I ended up just adding tests for int and text
> for min and max. Let me know if you think that more should be tested.

That's sufficient for the regression tests, I think.

> As for bool_or and bool_and. I didn't think there was much extra that would
> need to be tested after I added 1 test. It's pretty simple code and adding
> anything extra seems like it would be testing something else.

Sounds fine to me.

best regards,
Florian Pflug
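A query exercising the case Florian points out as untested - a frame lying entirely after the current row - could be as simple as the following (the actual regression tests may of course look different):

    -- MIN/MAX over a frame entirely after the current row
    SELECT i,
           max(i) OVER (ORDER BY i ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING)
    FROM generate_series(1, 10) i;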
On 25 January 2014 02:21, Florian Pflug <fgp@phlo.org> wrote:
> I've now split this up into
>
> invtrans_base: Basic machinery plus tests with SQL-language aggregates
> invtrans_arith: COUNT(), SUM(), AVG(), STDDEV() and the like
> invtrans_minmax: MIN(), MAX(), BOOL_AND(), BOOL_OR()
> invtrans_collecting: ARRAY_AGG(), STRING_AGG()
> invtrans_docs: Documentation
>

Thanks. That makes it a bit easier to review, and hopefully easier to commit.

The thing that I'm currently trying to get my head round is the null/not null, strict/not strict handling in this patch, which I find a bit odd, particularly when transfn and inv_transfn have different strictness settings:

strict transfn vs non-strict inv_transfn
----------------------------------------

This case is explicitly forbidden, both in CREATE AGGREGATE and in the executor. To me, that seems overly restrictive --- if transfn is strict, then you know for sure that no NULL values are added to the aggregate state, so surely it doesn't matter whether or not inv_transfn is strict. It will never be asked to remove NULL values.

I think there are definite use-cases where a user might want to use a pre-existing function as the inverse transition function, so it seems harsh to force them to wrap it in a strict function in this case.

AFAICS, the code in advance/retreat_windowaggregate should just work if those checks are removed.

non-strict transfn vs strict inv_transfn
----------------------------------------

At first this seems as though it must be an error --- the forwards transition function allows NULL values into the aggregate state, and the inverse transition function is strict, so it cannot remove them.

But actually what the patch is doing in this case is treating the forwards transition function as partially strict --- it won't be passed NULL values (at least in a window context), but it may be passed a NULL state in order to build the initial state when the first non-NULL value arrives.

It looks like this behaviour is intended to support aggregates that ignore NULL values, but cannot actually be made to have a strict transition function. I think typically this is because the aggregate's initial state is NULL and its state type differs from the type of the values being aggregated, and so can't be automatically created from the first non-NULL value.

That all seems quite ugly though, because now you have a transition function that is not strict, which is passed NULL values when used in a normal aggregate context, and not passed NULL values when used in a window context (whether sliding or not), except for the NULL state for the first non-NULL value.

I'm not sure if there is a better way to do it though. If we disallow this case, these aggregates would have to use non-strict functions for both the forward and inverse transition functions, which means they would have to implement their own non-NULL value counting. Alternatively, allowing strict transition functions for these aggregates would require that we provide some other way to initialise the state from the first non-NULL input, such as a new initfn.

Thoughts?

Regards,
Dean
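To make the first case concrete: under the patch, a definition along the following lines is rejected, even though the strict forward function guarantees that no NULL ever enters the state. This is a sketch - the CREATE AGGREGATE option for the inverse transition function is assumed to be spelled invfunc here, and all function names are made up:

    -- A pre-existing, deliberately non-strict function a user might want to reuse:
    CREATE FUNCTION my_loose_minus(state int8, v int4) RETURNS int8 AS
    $$ SELECT state - coalesce(v, 0) $$
    LANGUAGE sql;

    -- A strict forward transition function: NULL inputs are simply skipped.
    CREATE FUNCTION my_plus(state int8, v int4) RETURNS int8 AS
    $$ SELECT state + v $$
    LANGUAGE sql STRICT;

    -- Rejected by the patch as posted, because the strictness settings differ,
    -- although my_loose_minus could never be asked to remove a NULL value:
    CREATE AGGREGATE mysum(int4) (
        sfunc    = my_plus,
        stype    = int8,
        initcond = '0',
        invfunc  = my_loose_minus
    );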
On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> strict transfn vs non-strict inv_transfn
> ----------------------------------------
>
> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
> executor. To me, that seems overly restrictive --- if transfn is
> strict, then you know for sure that no NULL values are added to the
> aggregate state, so surely it doesn't matter whether or not
> inv_transfn is strict. It will never be asked to remove NULL values.
>
> I think there are definite use-cases where a user might want to use a
> pre-existing function as the inverse transition function, so it seems
> harsh to force them to wrap it in a strict function in this case.
>
> AFAICS, the code in advance/retreat_windowaggregate should just work
> if those checks are removed.

True. It didn't use to in earlier versions of the patch because the advance and retreat functions looked at the strict settings directly, but now that there's an ignore_nulls flag in the executor node, the only requirement should be that ignore_nulls is set if either of the transition functions is strict.

I'm not sure that the likelihood of someone wanting to combine a strict forward with a non-strict inverse function is very high, but neither

> non-strict transfn vs strict inv_transfn
> ----------------------------------------
>
> At first this seems as though it must be an error --- the forwards
> transition function allows NULL values into the aggregate state, and
> the inverse transition function is strict, so it cannot remove them.
>
> But actually what the patch is doing in this case is treating the
> forwards transition function as partially strict --- it won't be
> passed NULL values (at least in a window context), but it may be
> passed a NULL state in order to build the initial state when the first
> non-NULL value arrives.

Exactly.

> It looks like this behaviour is intended to support aggregates that
> ignore NULL values, but cannot actually be made to have a strict
> transition function. I think typically this is because the aggregate's
> initial state is NULL and its state type differs from the type of the
> values being aggregated, and so can't be automatically created from
> the first non-NULL value.

Yes. I added this because the alternative would have been to count non-NULL values in most of the existing SUM() aggregates.

> That all seems quite ugly though, because now you have a transition
> function that is not strict, which is passed NULL values when used in
> a normal aggregate context, and not passed NULL values when used in a
> window context (whether sliding or not), except for the NULL state for
> the first non-NULL value.

Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does and skip NULL inputs if the aggregate has a strict inverse transition function. The fact that we never actually *call* the inverse doesn't matter. Will fix that once we've decided on the various strictness issues.

> I'm not sure if there is a better way to do it though. If we disallow
> this case, these aggregates would have to use non-strict functions for
> both the forward and inverse transition functions, which means they
> would have to implement their own non-NULL value counting.
> Alternatively, allowing strict transition functions for these
> aggregates would require that we provide some other way to initialise
> the state from the first non-NULL input, such as a new initfn.

I'm not sure an initfn would really help. It would allow us to return to the initial requirement that the strict settings match - but you already deem the check that the forward transition function can only be strict if the inverse is also overly harsh, so would that really be an improvement? It'd also cost us an additional pg_proc entry per aggregate.

Another idea would be to have an explicit nulls=ignore|process option for CREATE AGGREGATE. If nulls=process and either of the transition functions are strict, we could either error out, or simply do what normal function calls do and pretend they return NULL for NULL inputs. Not sure how the rule that forward transition functions may not return NULL if there's an inverse transition function would fit in if we do the latter, though.

The question is - is it worth the effort to add that flag?

best regards,
Florian Pflug
On 28 January 2014 20:16, Florian Pflug <fgp@phlo.org> wrote:
> On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> strict transfn vs non-strict inv_transfn
>> ----------------------------------------
>>
>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>> executor. To me, that seems overly restrictive --- if transfn is
>> strict, then you know for sure that no NULL values are added to the
>> aggregate state, so surely it doesn't matter whether or not
>> inv_transfn is strict. It will never be asked to remove NULL values.
>>
>> I think there are definite use-cases where a user might want to use a
>> pre-existing function as the inverse transition function, so it seems
>> harsh to force them to wrap it in a strict function in this case.
>>
>> AFAICS, the code in advance/retreat_windowaggregate should just work
>> if those checks are removed.
>
> True. It didn't use to in earlier versions of the patch because the advance
> and retreat functions looked at the strict settings directly, but now that
> there's an ignore_nulls flag in the executor node, the only requirement
> should be that ignore_nulls is set if either of the transition functions
> is strict.
>
> I'm not sure that the likelihood of someone wanting to combine a strict
> forward with a non-strict inverse function is very high, but neither
>

Me neither, but the checks to forbid it aren't adding anything, and I think it's best to make it as flexible as possible.

>> non-strict transfn vs strict inv_transfn
>> ----------------------------------------
>>
>> At first this seems as though it must be an error --- the forwards
>> transition function allows NULL values into the aggregate state, and
>> the inverse transition function is strict, so it cannot remove them.
>>
>> But actually what the patch is doing in this case is treating the
>> forwards transition function as partially strict --- it won't be
>> passed NULL values (at least in a window context), but it may be
>> passed a NULL state in order to build the initial state when the first
>> non-NULL value arrives.
>
> Exactly.
>
>> It looks like this behaviour is intended to support aggregates that
>> ignore NULL values, but cannot actually be made to have a strict
>> transition function. I think typically this is because the aggregate's
>> initial state is NULL and its state type differs from the type of the
>> values being aggregated, and so can't be automatically created from
>> the first non-NULL value.
>
> Yes. I added this because the alternative would have been to count
> non-NULL values in most of the existing SUM() aggregates.
>
>> That all seems quite ugly though, because now you have a transition
>> function that is not strict, which is passed NULL values when used in
>> a normal aggregate context, and not passed NULL values when used in a
>> window context (whether sliding or not), except for the NULL state for
>> the first non-NULL value.
>
> Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
> and skip NULL inputs if the aggregate has a strict inverse transition
> function. The fact that we never actually *call* the inverse doesn't
> matter. Will fix that once we've decided on the various strictness issues.
>

Yuk!

>> I'm not sure if there is a better way to do it though. If we disallow
>> this case, these aggregates would have to use non-strict functions for
>> both the forward and inverse transition functions, which means they
>> would have to implement their own non-NULL value counting.
>> Alternatively, allowing strict transition functions for these
>> aggregates would require that we provide some other way to initialise
>> the state from the first non-NULL input, such as a new initfn.
>
> I'm not sure an initfn would really help. It would allow us to return
> to the initial requirement that the strict settings match - but you
> already deem the check that the forward transition function can only
> be strict if the inverse is also overly harsh, so would that really be
> an improvement? It'd also cost us an additional pg_proc entry per aggregate.
>
> Another idea would be to have an explicit nulls=ignore|process option
> for CREATE AGGREGATE. If nulls=process and either of the transition
> functions are strict, we could either error out, or simply do what
> normal function calls do and pretend they return NULL for NULL inputs.
> Not sure how the rule that forward transition functions may not return
> NULL if there's an inverse transition function would fit in if we do
> the latter, though.
>
> The question is - is it worth the effort to add that flag?
>

Yeah, I thought about a flag too. I think that could work quite nicely. Basically where I'm coming from is trying to make this as flexible as possible, without pre-judging the kinds of aggregates that users may want to add.

One use-case I had in mind upthread was suppose you wanted to write a custom version of array_agg that only collected non-NULL values. With such a flag, that would be trivial, but with the current patch you'd have to (counter-intuitively) wrap the inverse transition function in a strict function.

Another use-case I can imagine is suppose you wanted a custom version of sum() that returned NULL if any of the input values were NULL. If you knew that most of your data was non-NULL, you might still wish to benefit from an inverse transition function. Right now the patch won't allow that because of the error in advance_windowaggregate(), but possibly that could be relaxed by forcing a restart in that case. If I've understood correctly, that should give performance similar to the current behaviour if NULLs are present, and enhanced performance as the window moves over non-NULL data.

In that second case, it would also be nice if you could simply re-use the existing sum forward and inverse transition functions, with a different null-handling flag.

Also, in cases where the forwards transition function cannot be made strict (e.g., its state type is internal) and there is no inverse transition function, there might be a small performance gain to be had from not calling the transition function with NULL values, rather than having it ignore them.

So I think an ignore-nulls flag would have real benefits, as well as being a cleaner design than relying on a strict inverse transition function.

What do others think?

Regards,
Dean
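The wrapping Dean mentions would look something like this (my_inv stands in for whatever pre-existing non-strict function the user wants to reuse; both names are made up):

    -- Some pre-existing, non-strict candidate inverse function:
    CREATE FUNCTION my_inv(state int8, v int4) RETURNS int8 AS
    $$ SELECT state - coalesce(v, 0) $$
    LANGUAGE sql;

    -- The counter-intuitive wrapper the current rules force on the user:
    -- identical behaviour for non-NULL inputs, but declared STRICT so it
    -- satisfies the strictness requirement on the inverse function.
    CREATE FUNCTION my_inv_strict(state int8, v int4) RETURNS int8 AS
    $$ SELECT my_inv(state, v) $$
    LANGUAGE sql STRICT;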
On Jan29, 2014, at 09:59 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 28 January 2014 20:16, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>>> executor. To me, that seems overly restrictive --- if transfn is
>>> strict, then you know for sure that no NULL values are added to the
>>> aggregate state, so surely it doesn't matter whether or not
>>> inv_transfn is strict. It will never be asked to remove NULL values.
>>>
>>> I think there are definite use-cases where a user might want to use a
>>> pre-existing function as the inverse transition function, so it seems
>>> harsh to force them to wrap it in a strict function in this case.
>>
>> I'm not sure that the likelihood of someone wanting to combine a strict
>> forward with a non-strict inverse function is very high, but neither
>>
>
> Me neither, but the checks to forbid it aren't adding anything, and I
> think it's best to make it as flexible as possible.

Ok.

>> Another idea would be to have an explicit nulls=ignore|process option
>> for CREATE AGGREGATE. If nulls=process and either of the transition
>> functions are strict, we could either error out, or simply do what
>> normal function calls do and pretend they return NULL for NULL inputs.
>> Not sure how the rule that forward transition functions may not return
>> NULL if there's an inverse transition function would fit in if we do
>> the latter, though.
>>
>> The question is - is it worth the effort to add that flag?
>
> One use-case I had in mind upthread was suppose you wanted to write a
> custom version of array_agg that only collected non-NULL values. With
> such a flag, that would be trivial, but with the current patch you'd
> have to (counter-intuitively) wrap the inverse transition function in a
> strict function.

I'd be more convinced by that if doing so was actually possible for non-superusers. But it isn't, because the aggregate uses "internal" as its state type and it's thus entirely up to the user not to crash the database by mixing transition and final functions with incompatible state data. Plus, instead of creating a custom aggregate, one can just use a FILTER clause to get rid of the NULL values.

> Another use-case I can imagine is suppose you wanted a custom version
> of sum() that returned NULL if any of the input values were NULL. If
> you knew that most of your data was non-NULL, you might still wish to
> benefit from an inverse transition function. Right now the patch won't
> allow that because of the error in advance_windowaggregate(), but
> possibly that could be relaxed by forcing a restart in that case.

That's not really true - the patch only forbids that if you insist on representing the state "I have seen a NULL input" with a NULL state value. But if you instead just count the number of NULLs in your transition functions, all you need to do is to have your final function return NULL if that count is not zero.

> If I've understood correctly, that should give performance similar to the
> current behaviour if NULLs are present, and enhanced performance as the
> window moves over non-NULL data.

Exactly - and this makes defining a NULL-sensitive SUM() this way rather silly - a simple counter has very nearly zero overhead, and avoids all rescans.

> In that second case, it would also be nice if you could simply re-use
> the existing sum forward and inverse transition functions, with a
> different null-handling flag.

Even if we had a nulls=process|ignore flag, SUM's transition functions would still need to take that use-case into account explicitly to make this work - at the very least, the forward transition function would need to return NULL if the input is NULL instead of just skipping it as it does now. But that would lead to completely unnecessary rescans, so what we actually ought to do then is to make it track whether there have been NULL inputs and make the finalfunc return NULL in this case. Which would border on ironic, since the whole raison d'etre for this is to *avoid* spreading NULL-counting logic around...

Plus, to make this actually usable, we'd have to document this, and tell people how to define such a SUM aggregate. But I don't want to go there - we really mustn't encourage people to mix-and-match built-in aggregates with state type "internal", since whether they work or crash depends on factors outside the control of the user.

And finally, you can get that behaviour quite easily by doing

  CASE WHEN bool_and(input IS NOT NULL) THEN whatever_agg(input) ELSE NULL END

> So I think an ignore-nulls flag would have real benefits, as well as
> being a cleaner design than relying on a strict inverse transition
> function.

The more I think about this, the less convinced I am. In fact, I'm currently leaning towards just forbidding non-strict forward transition functions with strict inverses, and adding non-NULL counters to the aggregates that then require them. It's really only the SUM() aggregates that are affected by this, I think.

For an argument in favour of that, look at how we handle *non*-NULL initial states. For these, even aggregates with nulls=ignore would require some tracking, because otherwise they'd simply return the initial state if the input contained no non-NULL value. Which quite probably isn't what you'd want - you'd usually want to return NULL in this case (as all the built-in aggregates do). Currently, all built-in aggregates with non-NULL initial states compute an expected value of some kind (average, std. deviation, regressions), and thus need to count the number of inputs anyway. But that's just coincidence - if it wasn't for state type "internal", having SUM start from 0 instead of NULL would be entirely reasonable. So if we have a nulls=ignore flag, we ought to have a nullif=only_nulls flag too to cover all the ground. But at that point, just making this the transition function's responsibility seems more sensible.

best regards,
Florian Pflug
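Both alternatives Florian mentions are plain SQL. For example, with a column x that is sometimes NULL (a sketch; FILTER is new in 9.4):

    SELECT i,
           -- NULL-ignoring collection without any custom aggregate:
           array_agg(x) FILTER (WHERE x IS NOT NULL) OVER w AS non_nulls,
           -- NULL-sensitive sum built from bool_and, as suggested above:
           CASE WHEN bool_and(x IS NOT NULL) OVER w
                THEN sum(x) OVER w
           END AS strict_sum
    FROM (SELECT i, CASE WHEN i % 4 = 0 THEN NULL ELSE i END AS x
          FROM generate_series(1, 10) i) AS t
    WINDOW w AS (ORDER BY i ROWS BETWEEN 2 PRECEDING AND CURRENT ROW);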
On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
> In fact, I'm currently leaning towards just forbidding non-strict forward
> transition functions with strict inverses, and adding non-NULL counters to
> the aggregates that then require them. It's really only the SUM()
> aggregates that are affected by this, I think.

I finally got around to doing that, and the results aren't too bad. The attached patches require that the strictness settings of the forward and inverse transition functions agree, and employ exactly the same NULL-skipping logic we always had.

The only aggregates seriously affected by that change were SUM(int2) and SUM(int4).

The SUM, AVG and STDDEV aggregates which use NumericAggState were already mostly prepared for this - all they required were a few adjustments to correctly handle the last non-NULL, non-NaN input being removed, and a few additional PG_ARGISNULL calls for the inverse transition functions since they're now non-strict. I've also modified them to unconditionally allocate the state at the first call, instead of upon seeing the first non-NULL input, but that isn't strictly required. But without that, the state can have three classes of values - SQL-NULL, NULL pointer and valid pointer, and that's just confusing...

SUM(int2) and SUM(int4) now simply use the same transition functions as AVG(int2) and AVG(int4), which use an int8 array to track the sum of the inputs and the number of inputs, plus a new final function int2int4_sum(). Previously, they used a single int8 as their state type.

Since I was touching the code anyway, I removed some unnecessary inverse transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv. These are completely identical to their non-avg cousins - the only difference between the corresponding forward transition functions is whether they request computation of sumX2 (i.e. the sum of squares of the inputs) or not.

I haven't yet updated the docs - I'll do that if and when there's consensus about whether this is the way to go or not.

best regards,
Florian Pflug
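Expressed as CREATE AGGREGATE rather than as a catalog change, the new SUM(int4) described above amounts to roughly the following. int4_avg_accum is the existing AVG(int4) transition function; int4_avg_accum_inv and int2int4_sum are the functions the patch adds, and the spelling of the inverse-function option is an assumption:

    -- Sketch only: the real change edits the catalog entries, not SQL.
    CREATE AGGREGATE mysum(int4) (
        sfunc     = int4_avg_accum,      -- shared with AVG(int4): {count, sum}
        stype     = int8[],
        initcond  = '{0,0}',
        invfunc   = int4_avg_accum_inv,  -- added by the patch (assumed option name)
        finalfunc = int2int4_sum         -- added by the patch; returns NULL
                                         -- when no values were aggregated
    );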
Attachment
On Feb20, 2014, at 02:48 , Florian Pflug <fgp@phlo.org> wrote:
> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>> In fact, I'm currently leaning towards just forbidding non-strict forward
>> transition functions with strict inverses, and adding non-NULL counters to
>> the aggregates that then require them. It's really only the SUM()
>> aggregates that are affected by this, I think.
>
> I finally got around to doing that, and the results aren't too bad. The
> attached patches require that the strictness settings of the forward and
> inverse transition functions agree, and employ exactly the same
> NULL-skipping logic we always had.
>
> The only aggregates seriously affected by that change were SUM(int2) and
> SUM(int4).
>
> The SUM, AVG and STDDEV aggregates which use NumericAggState were
> already mostly prepared for this - all they required were a few adjustments
> to correctly handle the last non-NULL, non-NaN input being removed, and a few
> additional PG_ARGISNULL calls for the inverse transition functions since
> they're now non-strict. I've also modified them to unconditionally allocate
> the state at the first call, instead of upon seeing the first non-NULL input,
> but that isn't strictly required. But without that, the state can have three
> classes of values - SQL-NULL, NULL pointer and valid pointer, and that's just
> confusing...
>
> SUM(int2) and SUM(int4) now simply use the same transition functions as
> AVG(int2) and AVG(int4), which use an int8 array to track the sum of the
> inputs and the number of inputs, plus a new final function int2int4_sum().
> Previously, they used a single int8 as their state type.
>
> Since I was touching the code anyway, I removed some unnecessary inverse
> transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv.
> These are completely identical to their non-avg cousins - the only difference
> between the corresponding forward transition functions is whether they
> request computation of sumX2 (i.e. the sum of squares of the inputs) or not.
>
> I haven't yet updated the docs - I'll do that if and when there's consensus
> about whether this is the way to go or not.

I realized only after posting this that the patches no longer apply to current HEAD. Attached are rebased patches.

best regards,
Florian Pflug
Attachment
On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote:
> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>> In fact, I'm currently leaning towards just forbidding non-strict forward
>> transition functions with strict inverses, and adding non-NULL counters to
>> the aggregates that then require them. It's really only the SUM()
>> aggregates that are affected by this, I think.
>
> I finally got around to doing that, and the results aren't too bad. The
> attached patches require that the strictness settings of the forward and
> inverse transition functions agree, and employ exactly the same
> NULL-skipping logic we always had.
>
> The only aggregates seriously affected by that change were SUM(int2) and
> SUM(int4).
>
> The SUM, AVG and STDDEV aggregates which use NumericAggState were
> already mostly prepared for this - all they required were a few adjustments
> to correctly handle the last non-NULL, non-NaN input being removed, and a few
> additional PG_ARGISNULL calls for the inverse transition functions since
> they're now non-strict. I've also modified them to unconditionally allocate
> the state at the first call, instead of upon seeing the first non-NULL input,
> but that isn't strictly required. But without that, the state can have three
> classes of values - SQL-NULL, NULL pointer and valid pointer, and that's just
> confusing...
>
> SUM(int2) and SUM(int4) now simply use the same transition functions as
> AVG(int2) and AVG(int4), which use an int8 array to track the sum of the
> inputs and the number of inputs, plus a new final function int2int4_sum().
> Previously, they used a single int8 as their state type.
>
> Since I was touching the code anyway, I removed some unnecessary inverse
> transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv.
> These are completely identical to their non-avg cousins - the only difference
> between the corresponding forward transition functions is whether they
> request computation of sumX2 (i.e. the sum of squares of the inputs) or not.
>
> I haven't yet updated the docs - I'll do that if and when there's consensus
> about whether this is the way to go or not.
>

I haven't looked at this in any detail yet, but that seems much neater to me. It seems perfectly sensible that the forward and inverse transition functions should have the same strictness settings, and enforcing that keeps the logic simple, as well as hopefully making it easier to document.

It's a shame that more transition functions cannot be made strict, when they actually ignore null values, but I think trying to solve that can be regarded as outside the scope of this patch.

Regards,
Dean
On Feb24, 2014, at 17:50 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>>> In fact, I'm currently leaning towards just forbidding non-strict forward
>>> transition functions with strict inverses, and adding non-NULL counters
>>> to the aggregates that then require them. It's really only the SUM()
>>> aggregates that are affected by this, I think.
>>
>> I finally got around to doing that, and the results aren't too bad. The
>> attached patches require that the strictness settings of the forward and
>> inverse transition functions agree, and employ exactly the same
>> NULL-skipping logic we always had.
>>
>> The only aggregates seriously affected by that change were SUM(int2) and
>> SUM(int4).
>
> I haven't looked at this in any detail yet, but that seems much neater
> to me. It seems perfectly sensible that the forward and inverse
> transition functions should have the same strictness settings, and
> enforcing that keeps the logic simple, as well as hopefully making it
> easier to document.

Good to hear that you agree! I'll try to find some time to update the docs.

best regards,
Florian Pflug
On 25 February 2014 12:33, Florian Pflug <fgp@phlo.org> wrote:
> On Feb24, 2014, at 17:50 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote:
>>> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>>>> In fact, I'm currently leaning towards just forbidding non-strict forward
>>>> transition functions with strict inverses, and adding non-NULL counters
>>>> to the aggregates that then require them. It's really only the SUM()
>>>> aggregates that are affected by this, I think.
>>>
>>> I finally got around to doing that, and the results aren't too bad. The
>>> attached patches require that the strictness settings of the forward and
>>> inverse transition functions agree, and employ exactly the same
>>> NULL-skipping logic we always had.
>>>
>>> The only aggregates seriously affected by that change were SUM(int2) and
>>> SUM(int4).
>>
>> I haven't looked at this in any detail yet, but that seems much neater
>> to me. It seems perfectly sensible that the forward and inverse
>> transition functions should have the same strictness settings, and
>> enforcing that keeps the logic simple, as well as hopefully making it
>> easier to document.
>
> Good to hear that you agree! I'll try to find some time to update the docs.
>

I finally got round to looking at this in more detail. Sorry for the delay. Here is my more detailed review of the base patch. Overall, I think that it is in reasonable shape, and as I said I think the approach of enforcing matching strictness settings on the forward and inverse transition functions is much simpler and neater.

I have a few comments, some cosmetic, and a couple more substantive:

* In a couple of places:

    errmsg("stricness of forward and reverse transition functions must match")

  - misspelling: "stricness".
  - "reverse" should be "inverse" to match the terminology used elsewhere.

* Grammatical error in the comment for lookup_agg_function() - you should drop the word "both".

* In show_windowagg_info(), this calculation looks suspicious to me:

    double tperrow = winaggstate->aggfwdtrans /
        (inst->nloops * inst->ntuples);

  If the node is executed multiple times, aggfwdtrans will be reset in each loop, so the transitions per row figure will be under-estimated. ISTM that if you want to report on this, you'd need aggfwdtrans to be reset once per query, but I'm not sure exactly how to do that.

  Here's a test case:

    explain (verbose, analyse)
      select sum(i) over (rows between 4 preceding and current row)
      from generate_series(1, 10) i;

  which outputs 10 rows with an average of 1 transition per row, but doing the same window aggregate twice in a nested loop:

    explain (verbose, analyse)
      select * from (values (10), (10)) v(x),
      lateral (select sum(i) over (rows between 4 preceding and current row)
               from generate_series(1, x) i) t;

  outputs 20 rows, but only reports 0.5 transitions per row.

  Actually, I think it's misleading to only count forward transition function calls, because a call to the inverse transition function still represents a state transition, and is likely to be around the same cost. For a window of size 2, there would not be much advantage to using inverse transition functions, because it would be around 2 transitions per row either way.

* The function comment for build_aggregate_fnexprs() needs to be updated to reference the inverse transition function. I'd also be tempted to have it allow invtransfnexpr to be a NULL pointer, if the inverse transition function expression tree is not required. Then ExecInitAgg() could simply pass NULL, instead of having the local variable invtransfnexpr with the slightly cryptic comment "needed but never used".

* In struct WindowStatePerAggData, I think you should change the field order to transfn_oid, invtransfn_oid and then finalfn_oid. It's only a small thing, but that's the order those 3 functions are referred to everywhere else.

* In struct WindowStatePerAggData, the comment for transValueCount should read "number of aggregated values".

* If AggCheckCallContext() is called from a window function, and it asks for an aggcontext, it will fail because calledaggno will be -1. That can't currently happen for any of our built-in window functions, and I'm not sure if it's likely to happen in the future, but I think it would be better to defend against that possibility just in case. So I think it ought to return the shared context in that case, as the original code would have done.

* In advance_windowaggregate(), this code

    if (peraggstate->transfn.fn_strict) {

  is against the project style, which is to have curly braces on new lines. But also, that test condition is the same as the preceding block, so the 2 blocks could just be merged.

* I was wondering about the case of a forward transition function returning NULL in the presence of an inverse transition function. In this patch there are 3 pieces of code that test for that:

  1). advance_windowaggregate() errors out if the forward transition function returns NULL and there is an inverse transition function.

  2). an Assert in advance_windowaggregate() fires if a prior call made the state NULL and there is an inverse transition function (should be impossible due to the above error).

  3). retreat_windowaggregate() errors out if it sees a NULL state (which ought to be impossible due to both of the above).

  I find the resulting error "transition function with an inverse returned NULL" surprising. Why shouldn't a transition function return NULL if it wants to? It can if it's used as a regular aggregate, so it seems somewhat odd that it can't if it's used in a window context, and it has an inverse.

  Would it not be simpler and more flexible to just allow the forward transition function to return NULL, and treat a NULL state as non-invertible, requiring a restart? So delete check (1) and just allow the forward transition function to return NULL, delete Assert (2) so that it propagates a NULL state to the end as it would do in the absence of an inverse transition function, and modify check (3) to return false forcing a restart if the state is NULL. So then if the forward transition function did return NULL, the inverse transition function would not actually be called, and it would compute the answer the hard way, rather than erroring out.

* In retreat_windowaggregate(), the comment before the check on transValueCount is missing a "be".

* I think the function comment for eval_windowaggregates() should be updated to mention that it may also use an inverse transition function to remove aggregated data from a transition value.

* I found the guts of eval_windowaggregates() a little hard to follow, although I think the code is correct. It could perhaps use a little tidying up. Here are a few ideas:

  - Maybe numaggs_restart would be better called numtorestart.

  - is_first isn't adding much, and it would probably be easier to read if it were eliminated by inlining it in the one place it is used.

  - Similarly I would move the variable "ok" to the block that uses it.

  - The variable pos could be eliminated by having the retreat loop increment winstate->aggregatedbase in each pass, which would better match its comment which says it updates aggregatedbase, which it currently doesn't do. For consistency, perhaps this loop should be written more in the style of the advance loop. The loop should ensure that, on exit, aggregatedbase is equal to frameheadpos.

  - aggregatedupto_nonrestarted is a bit of a mouthful, but I can't immediately think of a better name. However, there's one comment that refers to it as aggregatedupto_previous.

  - The comment starting "After this, non-restarted aggregated..." is a bit confusing. What the code following it really seems to be doing is more along the lines of "If there are aggregates to restart, rewind aggregatedupto back to frameheadpos so that we can re-aggregate those values in the aggregates to be restarted...".

* The pg_dump modifications don't look correct to me. I didn't test it, but I think you need to modify the pre-9.4 queries to return "-" for agginvtransfn, otherwise I think it will be NULL and will seg-fault if you dump a pre-9.4 database containing custom aggregates.

* The last regression test in aggregates.sql should test for unequal strictness values, rather than just strict forward transition functions with non-strict inverses.

I think that's everything for the base patch.

Regards,
Dean
On Mar2, 2014, at 20:39 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> * In a couple of places:
>
>     errmsg("stricness of forward and reverse transition functions must match")
>
>   - misspelling: "stricness".
>   - "reverse" should be "inverse" to match the terminology used elsewhere.

Done.

> * Grammatical error in the comment for lookup_agg_function() - you
>   should drop the word "both".

Done.

> * In show_windowagg_info(), this calculation looks suspicious to me:
>
>     double tperrow = winaggstate->aggfwdtrans /
>         (inst->nloops * inst->ntuples);
>
>   If the node is executed multiple times, aggfwdtrans will be reset in
>   each loop, so the transitions per row figure will be under-estimated.
>   ISTM that if you want to report on this, you'd need aggfwdtrans to be
>   reset once per query, but I'm not sure exactly how to do that.
>
>   ...
>
>   Actually, I think it's misleading to only count forward transition
>   function calls, because a call to the inverse transition function
>   still represents a state transition, and is likely to be around the
>   same cost. For a window of size 2, there would not be much advantage
>   to using inverse transition functions, because it would be around 2
>   transitions per row either way.

True. In fact, I pondered whether to avoid using the inverse transition function for windows of 2 rows. In the end, I didn't because I felt that it makes custom aggregates harder to test.

On the question of whether to count inverse transition function calls - the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the number of state transitions, but rather to show whether the aggregation has O(n) or O(n^2) behaviour. The idea being that a value close to "1" means "inverse transition function works as expected", and larger values mean "not working so well".

Regarding multiple evaluations - I think I based the behaviour on how ntuples works, which also only reports the value of the last evaluation I think. But maybe I'm confused about this.

> * The function comment for build_aggregate_fnexprs() needs to be
>   updated to reference the inverse transition function. I'd also be
>   tempted to have it allow invtransfnexpr to be a NULL pointer, if the
>   inverse transition function expression tree is not required.

Hm, I suggested that to David Rowley initially, and he wasn't particularly convinced - after all, there aren't *that* many callers of that function, and there isn't much overhead - we just set the value to NULL if an invalid OID is passed. But now that you brought it up too, it's two yeas against one nay, so I changed it.

> * In struct WindowStatePerAggData, I think you should change the field
>   order to transfn_oid, invtransfn_oid and then finalfn_oid. It's only a
>   small thing, but that's the order those 3 functions are referred to
>   everywhere else.

Done.

> * In struct WindowStatePerAggData, the comment for transValueCount
>   should read "number of aggregated values".

Done.

> * If AggCheckCallContext() is called from a window function, and it
>   asks for an aggcontext, it will fail because calledaggno will be -1.
>   That can't currently happen for any of our built-in window functions,
>   and I'm not sure if it's likely to happen in the future, but I think
>   it would be better to defend against that possibility just in case. So
>   I think it ought to return the shared context in that case, as the
>   original code would have done.

I did it this way on purpose, because it was never actually safe to use the shared context from window functions - they won't know if and when that context is reset. I added a note to the function comment explaining that this function isn't meant to be used by true window functions.

> * In advance_windowaggregate(), this code
>
>     if (peraggstate->transfn.fn_strict) {
>
>   is against the project style, which is to have curly braces on new
>   lines. But also, that test condition is the same as the preceding
>   block, so the 2 blocks could just be merged.

Done. The two conditions used to be different, I think, but requiring the strictness settings to match made them the same.

> * I was wondering about the case of a forward transition function
>   returning NULL in the presence of an inverse transition function. In
>   this patch there are 3 pieces of code that test for that:
>
>   1). advance_windowaggregate() errors out if the forward transition
>   function returns NULL and there is an inverse transition function.
>
>   2). an Assert in advance_windowaggregate() fires if a prior call made
>   the state NULL and there is an inverse transition function (should be
>   impossible due to the above error).

Yeah, that assert is just there to document that "yes, this cannot be".

>   3). retreat_windowaggregate() errors out if it sees a NULL state
>   (which ought to be impossible due to both of the above).

This too is simply meant to document "No worries, cannot be".

>   I find the resulting error "transition function with an inverse
>   returned NULL" surprising. Why shouldn't a transition function return
>   NULL if it wants to? It can if it's used as a regular aggregate, so it
>   seems somewhat odd that it can't if it's used in a window context, and
>   it has an inverse.

Because transition functions cannot use NULL as a state value freely if they have an inverse, since if the inverse returns NULL that means "sorry, cannot invert, please restart aggregation".

>   Would it not be simpler and more flexible to just allow the forward
>   transition function to return NULL, and treat a NULL state as
>   non-invertible, requiring a restart? So delete check (1) and just
>   allow the forward transition function to return NULL, delete Assert
>   (2) so that it propagates a NULL state to the end as it would do in
>   the absence of an inverse transition function, and modify check (3) to
>   return false forcing a restart if the state is NULL. So then if the
>   forward transition function did return NULL, the inverse transition
>   function would not actually be called, and it would compute the answer
>   the hard way, rather than erroring out.

Though that would be weird too - if the inverse transition function is non-strict, we actually can *call* it with a NULL state, it just cannot really return a NULL state (well it can, but that means "restart the aggregation, please"). I don't think there's anything to be gained by allowing this, so I'd prefer if we don't.

> * In retreat_windowaggregate(), the comment before the check on
>   transValueCount is missing a "be".

Done.

> * I think the function comment for eval_windowaggregates() should be
>   updated to mention that it may also use an inverse transition function
>   to remove aggregated data from a transition value.

Done.

> * I found the guts of eval_windowaggregates() a little hard to follow,
>   although I think the code is correct. It could perhaps use a little
>   tidying up. Here are a few ideas:
>
>   - Maybe numaggs_restart would be better called numtorestart.

Hm, I'm not particularly fond of that - it doesn't really emphasize the connection to numaggs.

>   - is_first isn't adding much, and it would probably be easier to read
>   if it were eliminated by inlining it in the one place it is used.

Done.

>   - Similarly I would move the variable "ok" to the block that uses it.

Done.

>   - The variable pos could be eliminated by having the retreat loop
>   increment winstate->aggregatedbase in each pass, which would better
>   match its comment which says it updates aggregatedbase, which it
>   currently doesn't do. For consistency, perhaps this loop should be
>   written more in the style of the advance loop. The loop should ensure
>   that, on exit, aggregatedbase is equal to frameheadpos.

Done, mostly. The loops still look a bit different, but that's mostly because of the tuple-slot optimization in the advance loop (it re-uses the last row it fetched during the *previous* call to eval_windowaggregates if possible), and because the retreat loop already knows how far to retreat, while the advance loop figures that out as it goes.

>   - aggregatedupto_nonrestarted is a bit of a mouthful, but I can't
>   immediately think of a better name. However, there's one comment that
>   refers to it as aggregatedupto_previous.

I don't have a better name either, so I kept the current name.

>   - The comment starting "After this, non-restarted aggregated..." is a
>   bit confusing. What the code following it really seems to be doing is
>   more along the lines of "If there are aggregates to restart, rewind
>   aggregatedupto back to frameheadpos so that we can re-aggregate those
>   values in the aggregates to be restarted...".

I re-wrote that comment, and also moved the block that updates the tuplestore mark up above the restart-block. Thus, the block that updates aggregatedupto is now immediately before the advance loop, which hopefully makes the connection more obvious.

> * The pg_dump modifications don't look correct to me. I didn't test
>   it, but I think you need to modify the pre-9.4 queries to return "-"
>   for agginvtransfn, otherwise I think it will be NULL and will seg-fault
>   if you dump a pre-9.4 database containing custom aggregates.

Hm, I haven't touched this code, it comes from David's original patch. I agree that it looks wrong - I've updated it per your suggestion, and checked that it dumps 9.3 and HEAD+patch correctly.

> * The last regression test in aggregates.sql should test for unequal
>   strictness values, rather than just strict forward transition
>   functions with non-strict inverses.

Done. That was a leftover from when only one case was rejected.

Attached are new versions of all 5 patches. Only the base patch has actually changed, the others are just included for completeness' sake.

best regards,
Florian Pflug
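For reference, the shape of the pg_dump change being discussed, sketched as a standalone query rather than the actual pg_dump source (the WHERE clause here just picks an arbitrary aggregate):

    SELECT aggtransfn,
           aggfinalfn,
           '-' AS agginvtransfn  -- no inverse transition functions before 9.4
    FROM pg_catalog.pg_aggregate
    WHERE aggfnoid = 'pg_catalog.max(int4)'::pg_catalog.regprocedure;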
Attachment
On 3 March 2014 23:00, Florian Pflug <fgp@phlo.org> wrote:
>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>
>>     double tperrow = winaggstate->aggfwdtrans /
>>         (inst->nloops * inst->ntuples);
>>
>> If the node is executed multiple times, aggfwdtrans will be reset in
>> each loop, so the transitions per row figure will be under-estimated.
>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>> reset once per query, but I'm not sure exactly how to do that.
>>
>> ...
>>
>> Actually, I think it's misleading to only count forward transition
>> function calls, because a call to the inverse transition function
>> still represents a state transition, and is likely to be around the
>> same cost. For a window of size 2, there would not be much advantage
>> to using inverse transition functions, because it would be around 2
>> transitions per row either way.
>
> True. In fact, I pondered whether to avoid using the inverse transition
> function for windows of 2 rows. In the end, I didn't because I felt that
> it makes custom aggregates harder to test.
>
> On the question of whether to count inverse transition function calls -
> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
> number of state transitions, but rather to show whether the aggregation
> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
> means "inverse transition function works as expected", and larger values
> mean "not working so well".
>
> Regarding multiple evaluations - I think I based the behaviour on how
> ntuples works, which also only reports the value of the last evaluation
> I think. But maybe I'm confused about this.
>

No, it doesn't look like that's correct for multiple loops. Consider this example:

  explain (verbose, analyse)
    select * from (values (10), (20), (30), (40)) v(x),
    lateral (select sum(i) over (rows between 4 preceding and current row)
             from generate_series(1, x) i) t;

                             QUERY PLAN
  ---------------------------------------------------------------------------
   Nested Loop  (cost=0.00..170.06 rows=4000 width=12) (actual time=0.027..0.414 rows=100 loops=1)
     Output: "*VALUES*".column1, (sum(i.i) OVER (?))
     ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=4 width=4) (actual time=0.002..0.006 rows=4 loops=1)
           Output: "*VALUES*".column1
     ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual time=0.019..0.094 rows=25 loops=4)
           Output: sum(i.i) OVER (?)
           Transitions Per Row: 0.2
           ->  Function Scan on pg_catalog.generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.010..0.015 rows=25 loops=4)
                 Output: i.i
                 Function Call: generate_series(1, "*VALUES*".column1)

It turns out that show_windowagg_info() is only called once at the end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get 1, you'd have to use this formula:

    double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

I'm still not convinced that's the most useful thing to report though. Personally, I'd prefer to just see the separate counts, e.g.:

    ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual time=0.019..0.094 rows=25 loops=4)
          Output: sum(i.i) OVER (?)
          Forward transitions: 25  Inverse transitions: 25

IMO that gives a clearer picture of what's going on.

Thoughts?

Regards,
Dean
On Mar4, 2014, at 21:09 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 3 March 2014 23:00, Florian Pflug <fgp@phlo.org> wrote:
>>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>>
>>>     double tperrow = winaggstate->aggfwdtrans /
>>>         (inst->nloops * inst->ntuples);
>>>
>>> If the node is executed multiple times, aggfwdtrans will be reset in
>>> each loop, so the transitions per row figure will be under-estimated.
>>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>>> reset once per query, but I'm not sure exactly how to do that.
>>>
>>> ...
>>>
>>> Actually, I think it's misleading to only count forward transition
>>> function calls, because a call to the inverse transition function
>>> still represents a state transition, and is likely to be around the
>>> same cost. For a window of size 2, there would not be much advantage
>>> to using inverse transition functions, because it would be around 2
>>> transitions per row either way.
>>
>> True. In fact, I pondered whether to avoid using the inverse transition
>> function for windows of 2 rows. In the end, I didn't because I felt that
>> it makes custom aggregates harder to test.
>>
>> On the question of whether to count inverse transition function calls -
>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
>> number of state transitions, but rather to show whether the aggregation
>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
>> means "inverse transition function works as expected", and larger values
>> mean "not working so well".
>>
>> Regarding multiple evaluations - I think I based the behaviour on how
>> ntuples works, which also only reports the value of the last evaluation
>> I think. But maybe I'm confused about this.
>>
>
> No, it doesn't look like that's correct for multiple loops. Consider
> this example:
>
> ...
>
> It turns out that show_windowagg_info() is only called once at the
> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
> 1, you'd have to use this formula:
>
>     double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

Hm, so I *was* confused - seems I mixed up ntuples (which counts the total number of tuples over all loops) with what we report as "rows" (i.e. the average number of rows per loop). Thanks for clearing that up!

> I'm still not convinced that's the most useful thing to report though.
> Personally, I'd prefer to just see the separate counts, e.g.:
>
>     ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual
>           time=0.019..0.094 rows=25 loops=4)
>           Output: sum(i.i) OVER (?)
>           Forward transitions: 25  Inverse transitions: 25
>
> IMO that gives a clearer picture of what's going on.

My problem with this is that if there are multiple aggregates, the numbers don't really count the number of forward and inverse transition function invocations. What nodeWindowAgg.c really counts is the number of *rows* it has to fetch from the tuple store to perform forward aggregations - the relevant code snippet is

    if (numaggs_restart > 0)
        winstate->aggfwdtrans += (winstate->aggregatedupto -
                                  winstate->frameheadpos);
    else
        winstate->aggfwdtrans += (winstate->aggregatedupto -
                                  aggregatedupto_nonrestarted);

Now we could of course report these figures per aggregate instead of only once per aggregation node. But as I said earlier, my guess is that people usually won't be interested in the precise counts - most likely, what you want to know is "how much do the restarts cost me?"

When I added the EXPLAIN stuff, I initially simply reported the number of times nodeWindowAgg has to restart the aggregation. The problem with that approach is that not all restarts carry the same cost. If the frames either don't overlap at all or are identical, restarts don't cause any additional work. And for fixed-length frames (BETWEEN n PRECEDING AND m FOLLOWING), the performance effects of restarts depend on m-n.

Which is why I made it count the number of aggregated input rows instead.

Having said that, I'm not really 100% happy with the name "Transitions Per Row" for this - it was simply the best I could come up with that was reasonably short. And I'm certainly open to reporting the absolute count instead of a factor relative to ntuples.

If we made it an absolute count, would calling this "Aggregated Rows" work for you?

best regards,
Florian Pflug
On 5 March 2014 14:35, Florian Pflug <fgp@phlo.org> wrote: > On Mar4, 2014, at 21:09 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 3 March 2014 23:00, Florian Pflug <fgp@phlo.org> wrote: >>>> * In show_windowagg_info(), this calculation looks suspicious to me: >>>> >>>> double tperrow = winaggstate->aggfwdtrans / >>>> (inst->nloops * inst->ntuples); >>>> >>>> If the node is executed multiple times, aggfwdtrans will be reset in >>>> each loop, so the transitions per row figure will be under-estimated. >>>> ISTM that if you want to report on this, you'd need aggfwdtrans to be >>>> reset once per query, but I'm not sure exactly how to do that. >>>> >>>> ... >>>> >>>> Actually, I think it's misleading to only count forward transition >>>> function calls, because a call to the inverse transition function >>>> still represents a state transition, and is likely to be around the >>>> same cost. For a window of size 2, there would not be much advantage >>>> to using inverse transition functions, because it would be around 2 >>>> transitions per row either way. >>> >>> True. In fact, I pondered whether to avoid using the inverse transition >>> function for windows of 2 rows. In the end, I didn't because I felt that >>> it makes custom aggregates harder to test. >>> >>> On the question of whether to count inverse transition function calls - >>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the >>> number of state transitions, but rather to show whether the aggregation >>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1" >>> means "inverse transition function works as expected", and larger values >>> mean "not working so well". >>> >>> Regarding multiple evaluations - I think I based the behaviour on how >>> ntuples works, which also only reports the value of the last evaluation >>> I think. But maybe I'm confused about this. >>> >> >> No, it doesn't look like that's correct for multiple loops. Consider >> this example: >> >> ... >> >> It turns out that show_windowagg_info() is only called once at the >> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing >> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get >> 1, you'd have to use this formula: >> >> double tperrow = winaggstate->aggfwdtrans / inst->ntuples; > > Hm, so I *was* confused - seems I mixed up ntuples (which counts the > total number of tuples over all loops) with what we report as "rows" > (i.e. the average number of rows per loop). Thanks for clearing that up! > >> I'm still not convinced that's the most useful thing to report though. >> Personally, I'd prefer to just see the separate counts, e.g.: >> >> -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual >> time=0.019..0.094 rows=25 loops=4) >> Output: sum(i.i) OVER (?) >> Forward transitions: 25 Inverse transitions: 25 >> >> IMO that gives a clearer picture of what's going on. > > My problem with this is that if there are multiple aggregates, the > numbers don't really count the number of forward and reverse transfer > function invocations. 
What nodeWindowAgg.c really counts is the number > of *rows* it has to fetch from the tuple store to perform forward > aggregations - the relevant code snippet is > > if (numaggs_restart > 0) > winstate->aggfwdtrans += (winstate->aggregatedupto > - winstate->frameheadpos); > else > winstate->aggfwdtrans += (winstate->aggregatedupto > - aggregatedupto_nonrestarted); > > Now we could of course report these figures per aggregate instead of > only once per aggregation node. But as I said earlier, my guess is that > people usually won't be interested in the precise counts - most likely, > what you want to know is "how much do the restarts cost me?". > The problem I have with the single "Transitions Per Row" figure, and the idea that a value close to 1.0 is supposed to be good, is that it's not really true. For example, with a window size of 2 and a "perfect" invertible aggregate, you'd get a value of 1.0, but with a non-invertible aggregate you'd get a value of 2.0, when actually it wouldn't do any better if it had an inverse. I think trying to represent this as a single number is too simplistic. > When I added the EXPLAIN stuff, I initially simply reported the number > of times nodeWindowAgg has to restart the aggregation. The problem with > that approach is that not all restarts carry the same cost. If the frames > either don't overlap at all or are identical, restarts don't cause any > additional work. And for fixed-length frames (BETWEEN n PRECEDING AND > m FOLLOWING), the performance effect of restarts depends on m-n. > > Which is why I made it count the number of aggregated input rows instead. > > Having said that, I'm not really 100% happy with the name > "Transitions Per Row" for this - it was simply the best I could come up with > that was reasonably short. And I'm certainly open to reporting the absolute > count instead of a factor relative to ntuples. > > If we made it an absolute count, would calling this "Aggregated Rows" work > for you? > I'm not sure about naming, but I think my preference would be to output the correct absolute counts for both the forward and inverse transitions (i.e. multiply by the right combinations of numaggs and numaggs_restart). The EXPLAIN output already tells you how many aggregates there are, and how many rows there are, so you'd be able to work out from that how much extra work it's doing. I think we really need a larger consensus on this though, so I'd be interested to hear what others think. Regards, Dean
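For readers following the arithmetic: a standalone C sketch of the two formulas under discussion, using the numbers from Dean's example (4 loops of 25 rows each). The variable names mirror the discussion, but the program is purely illustrative - it is not PostgreSQL code.

#include <stdio.h>

/*
 * In PostgreSQL's instrumentation, ntuples accumulates across all
 * loops, and (in the patch) so does aggfwdtrans - which is why also
 * dividing by nloops under-estimates the transitions-per-row figure.
 */
int main(void)
{
    int    nloops = 4;
    double ntuples = 100;     /* total rows over all 4 loops */
    double aggfwdtrans = 100; /* rows fetched for forward aggregation */

    /* the suspicious formula: 100 / (4 * 100) = 0.25 */
    printf("%.2f\n", aggfwdtrans / (nloops * ntuples));

    /* Dean's corrected formula: 100 / 100 = 1.00 */
    printf("%.2f\n", aggfwdtrans / ntuples);
    return 0;
}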
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > I think we really need a larger consensus on this though, so I'd be > interested to hear what others think. My advice is to lose the EXPLAIN output entirely. If the authors of the patch can't agree on what it means, what hope have everyday users got of making sense of it? regards, tom lane
<p dir="ltr">Tom,<p dir="ltr">I did not follow the thread very close, so I need to look what the ambiguity is there, howeverI would love to see window rescans in explain analyze.<br /><p dir="ltr">I have great experience in tuning Oraclequeries.<br /> There are features in PostgreSQL's explain analyze that I miss badly in Oracle: 'rows removed by filter'is my favourite one. Improving explain analyze is great for performance analysis.<p dir="ltr">I would say target audiencefor 'explain analyze' is not all the users, but someone closer to 'performance engineers'. Those beings are usedto triple-check results and build/validate hypothesis since all the counters tend to lie, so 'a bit misleading counter'is not a showstopper.<p dir="ltr">I did not think of 'non-being-able-to-use-negative-transition-since-floats-do-not-commute'before.<br /> Thanks to this discussion I see whatkind of dragons live here.<p dir="ltr">I would vote (if I had any vote at all) for the inclusion of performance statisticsto explain analyze (e.g. number of rescans and number of rows fed to aggregate) provided performance impact istolerable in both regular and explain analyze mode.<p dir="ltr">I wonder how Oracle handles negative transition (does it?),however that is a different discussion.<p dir="ltr">Regards,<br /> Vladimir Sitnikov
On Mar5, 2014, at 18:37 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> I think we really need a larger consensus on this though, so I'd be >> interested to hear what others think. > > My advice is to lose the EXPLAIN output entirely. If the authors of > the patch can't agree on what it means, what hope have everyday users > got of making sense of it? The question isn't what the current output means, but whether it's a good metric to report or not. If we don't report anything, then how would a user check whether a query is slow because of O(n^2) behaviour of a windowed aggregate, or because of some other reason? If invertibility were a purely static property, then maybe we could get away with that, and say "check whether your aggregates are invertible or not". But since we have partially invertible aggregates, the performance characteristics depend on the input data, so we IMHO need some way for users to check what's actually happening. best regards, Florian Pflug
On Mar5, 2014, at 18:27 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 5 March 2014 14:35, Florian Pflug <fgp@phlo.org> wrote: >> When I added the EXPLAIN stuff, I initially simply reported the number >> of times nodeWindowAgg has to restart the aggregation. The problem with >> that approach is that not all restarts carry the same cost. If the frames >> either don't overlap at all or are identical, restarts don't cause any >> additional work. And for fixed-length frames (BETWEEN n PRECEEDING AND >> m FOLLOWING), the performance effects of restarts depends on m-n. >> >> Which is why I made it count the number of aggregated input rows instead. >> >> Having said that, I' not really 100% happy with the name >> "Transitions Per Row" for this - it was simply the best I could come up with >> that was reasonably short. And I'm certainly open to reporting the absolute >> count instead of a factor relative to ntuples. >> >> If we made it an absolute count, would calling this "Aggregated Rows" work >> for you? > > I'm not sure about naming, but I think my preference would be to > output the correct absolute counts for both the forward and inverse > transitions (i.e. multiply by the right combinations of numaggs and > numaggs_restart). The EXPLAIN output already tells you how many > aggregates there are, and how many rows there are, so you'd be able to > work out from that how much extra work it's doing. Hm, if we do that we might as well go all the way and simply report these numbers per aggregate, instead of once per window aggregation node. That'd provide the maximum amount of information, and be quite unambiguous. best regards, Florian Pflug
On Wed, Mar 5, 2014 at 10:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > [ shrug... ] They can see whether the Window plan node is where the time > is going. It's not apparent to me that the extra numbers you propose to > report will edify anybody. Perhaps just saying "Incremental Window Function" versus "Iterated Window Function" or something like that would be sufficient? At least that way query tuning guidelines have a keyword they can say to watch out for. And someone trying to figure out *why* the time is being spent in this node has something they might notice a correlation with. -- greg
Florian Pflug <fgp@phlo.org> writes: > On Mar5, 2014, at 18:37 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My advice is to lose the EXPLAIN output entirely. If the authors of >> the patch can't agree on what it means, what hope have everyday users >> got of making sense of it? > The question isn't what the current output means, but whether it's a > good metric to report or not. If you can't agree, then it isn't. > If we don't report anything, then how would a user check whether a query > is slow because of O(n^2) behaviour of a windowed aggregate, or because > of some other reasons? [ shrug... ] They can see whether the Window plan node is where the time is going. It's not apparent to me that the extra numbers you propose to report will edify anybody. regards, tom lane
On Mar5, 2014, at 23:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > Florian Pflug <fgp@phlo.org> writes: >> On Mar5, 2014, at 18:37 , Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> My advice is to lose the EXPLAIN output entirely. If the authors of >>> the patch can't agree on what it means, what hope have everyday users >>> got of making sense of it? > >> The question isn't what the current output means, but whether it's a >> good metric to report or not. > > If you can't agree, then it isn't. Probably, yes, so let's find something that *is* a good metric. (BTW, it's not the authors who disagree here. It was me who put the EXPLAIN feature in, and Dean reviewed it and found it confusing. The original author David seems to have run out of time to work on this, and AFAIK hasn't weighed in on that particular feature at all) >> If we don't report anything, then how would a user check whether a query >> is slow because of O(n^2) behaviour of a windowed aggregate, or because >> of some other reason? > > [ shrug... ] They can see whether the Window plan node is where the time > is going. It's not apparent to me that the extra numbers you propose to > report will edify anybody. By the same line of reasoning, every metric except execution time is superfluous. Comparing execution times really is a horrible way to measure this - not only does it include all kinds of things that have nothing to do with the restart behaviour of aggregates, you'd also have to construct a base-line query first which is guaranteed to not restart. best regards, Florian Pflug
Attachment
David Rowley <dgrowleyml@gmail.com> writes: > I've attached an updated invtrans_strictstrict_base patch which has the > feature removed. What is the state of play on this patch? Is the latest version what's in http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org plus this sub-patch? Is everybody reasonably happy with it? I don't see it marked "ready for committer" in the CF app, but time is running out. regards, tom lane
On 26 March 2014 19:43, David Rowley <dgrowleyml@gmail.com> wrote: > On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> David Rowley <dgrowleyml@gmail.com> writes: >> > I've attached an updated invtrans_strictstrict_base patch which has the >> > feature removed. >> >> What is the state of play on this patch? Is the latest version what's in >> >> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org >> plus this sub-patch? Is everybody reasonably happy with it? I don't >> see it marked "ready for committer" in the CF app, but time is running >> out. >> > > As far as I know the only concern left was around the extra stats in the > explain output, which I removed in the patch I attached in the previous > email. > Agreed. That was my last concern regarding the base patch, and I agree that removing the new explain output is probably the best course of action, given that we haven't reached consensus as to what the most useful output would be. > The invtrans_strictstrict_base.patch in my previous email replaces the > invtrans_strictstrict_base_038070.patch that Florian sent here > http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org > all of the other patches are unchanged so it's safe to use Florian's latest > ones > > Perhaps Dean can confirm that there's nothing else outstanding? > Florian mentioned upthread that the docs hadn't been updated to reflect the latest changes, so I think they need a little attention. Regards, Dean
First, sorry guys for letting this slide - I was overwhelmed by other work, and this kind of slipped my mind :-( On Mar27, 2014, at 09:04 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 26 March 2014 19:43, David Rowley <dgrowleyml@gmail.com> wrote: >> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> >>> David Rowley <dgrowleyml@gmail.com> writes: >>>> I've attached an updated invtrans_strictstrict_base patch which has the >>>> feature removed. >>> >>> What is the state of play on this patch? Is the latest version what's in >>> >>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org >>> plus this sub-patch? Is everybody reasonably happy with it? I don't >>> see it marked "ready for committer" in the CF app, but time is running >>> out. >>> >> >> As far as I know the only concern left was around the extra stats in the >> explain output, which I removed in the patch I attached in the previous >> email. >> > > Agreed. That was my last concern regarding the base patch, and I agree > that removing the new explain output is probably the best course of > action, given that we haven't reached consensus as to what the most > useful output would be. After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e. simply reporting the total number of invocations of the forward transition functions, and the total number of invocations of the reverse transition function, over reporting nothing. The labels of the two counts would simply be "Forward Transitions" and "Reverse Transitions". But I don't want this issue to prevent us from getting this patch into 9.4, so if there are objections to this, I'll rip out the EXPLAIN stuff altogether. >> The invtrans_strictstrict_base.patch in my previous email replaces the >> invtrans_strictstrict_base_038070.patch that Florian sent here >> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org >> all of the other patches are unchanged so it's safe to use Florian's latest >> ones >> >> Perhaps Dean can confirm that there's nothing else outstanding? >> > > Florian mentioned upthread that the docs hadn't been updated to > reflect the latest changes, so I think they need a little attention. I'll see to updating the docs, and will post a final patch within the next few days. Dean, have you by chance looked at the other patches yet? best regards, Florian Pflug
On 27 March 2014 21:01, Florian Pflug <fgp@phlo.org> wrote: > First, sorry guys for letting this slide - I was overwhelmed by other works, > and this kind of slipped my mind :-( > > On Mar27, 2014, at 09:04 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 26 March 2014 19:43, David Rowley <dgrowleyml@gmail.com> wrote: >>> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> >>>> David Rowley <dgrowleyml@gmail.com> writes: >>>>> I've attached an updated invtrans_strictstrict_base patch which has the >>>>> feature removed. >>>> >>>> What is the state of play on this patch? Is the latest version what's in >>>> >>>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org >>>> plus this sub-patch? Is everybody reasonably happy with it? I don't >>>> see it marked "ready for committer" in the CF app, but time is running >>>> out. >>>> >>> >>> As far as I know the only concern left was around the extra stats in the >>> explain output, which I removed in the patch I attached in the previous >>> email. >>> >> >> Agreed. That was my last concern regarding the base patch, and I agree >> that removing the new explain output is probably the best course of >> action, given that we haven't reached consensus as to what the most >> useful output would be. > > After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e. > simply reporting the total number of invocations of the forward transition > functions, and the total number of invocations of the reverse transition > function, over reporting nothing. The labels of the two counts would simply > be "Forward Transitions" and "Reverse Transitions". > That should be "Inverse" not "Reverse" according to the terminology agreed upthread. Personally, I'm not a big fan of that terminology because "forward" and "inverse" aren't natural antonyms. But actually I think that it's "forward" that is the wrong word to use, because they actually both move (different ends of) the frame forwards. The only alternatives I can think of are "direct" and "inverse", which are natural antonyms, but I don't want to hold up this patch bikeshedding over this. OTOH this is not the first time on this thread that someone has slipped into calling them "forward" and "reverse" transitions. > But I don't want this issue to prevent us from getting this patch into 9.4, > so if there are objections to this, I'll rip out the EXPLAIN stuff all > together. > >>> The invtrans_strictstrict_base.patch in my previous email replaces the >>> invtrans_strictstrict_base_038070.patch in that Florian sent here >>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org >>> all of the other patches are unchanged so it's save to use Florian's latest >>> ones >>> >>> Perhaps Dean can confirm that there's nothing else outstanding? >>> >> >> Florian mentioned upthread that the docs hadn't been updated to >> reflect the latest changes, so I think they need a little attention. > > I'll see to updating the docs, and will post a final patch within the next > few days. > > Dean, have you by chance looked at the other patches yet? > No, sorry. I too have been swamped by other work. I will try to look at them over the next few days. I don't anticipate that they will be as complex as the base patch, so I hope that this can be finished in time for 9.4. If there are any other reviewers with spare cycles, feel free to jump in too. Regards, Dean
On Mar28, 2014, at 08:58 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 27 March 2014 21:01, Florian Pflug <fgp@phlo.org> wrote: >> After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e. >> simply reporting the total number of invocations of the forward transition >> functions, and the total number of invocations of the reverse transition >> function, over reporting nothing. The labels of the two counts would simply >> be "Forward Transitions" and "Reverse Transitions". >> > > That should be "Inverse" not "Reverse" according to the terminology > agreed upthread. Oops, yeah. > Personally, I'm not a big fan of that terminology because "forward" > and "inverse" aren't natural antonyms. But actually I think that it's > "forward" that is the wrong word to use, because they actually both > move (different ends of) the frame forwards. The only alternatives I > can think of are "direct" and "inverse", which are natural antonyms, > but I don't want to hold up this patch bikeshedding over this. OTOH > this is not the first time on this thread that someone has slipped > into calling them "forward" and "reverse" transitions. Head and tail come to mind. But having said that, I'd strongly prefer not to start bikeshedding over this either. It's not user-visible enough to fuss about it... Also, for order-independent transition functions, it really *is* an inverse function, in the sense that g(f(x,a),a) = x. >>> Florian mentioned upthread that the docs hadn't been updated to >>> reflect the latest changes, so I think they need a little attention. >> >> I'll see to updating the docs, and will post a final patch within the next >> few days. Attached are updated patches that include the EXPLAIN changes mentioned above and updated docs. best regards, Florian Pflug
Attachment
On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote: > Attached are updated patches that include the EXPLAIN changes mentioned > above and updated docs. > These patches need re-basing --- they no longer apply to HEAD. Regards, Dean
On Apr1, 2014, at 10:08 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote: >> Attached are updated patches that include the EXPLAIN changes mentioned >> above and updated docs. >> > > These patches need re-basing --- they no longer apply to HEAD. Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to re-assign OIDs in the dependent patches again (those which actually add the various inverse transition functions). best regards, Florian Pflug
Attachment
On 1 April 2014 20:58, Florian Pflug <fgp@phlo.org> wrote: > On Apr1, 2014, at 10:08 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote: >>> Attached are updated patches that include the EXPLAIN changes mentioned >>> above and updated docs. >>> >> >> These patches need re-basing --- they no longer apply to HEAD. > > Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to > re-assign OIDs in the dependent patches again (those which actually > add the various inverse transition functions). > Looking at the new explain output, that is not exactly what I was suggesting upthread. In particular, in cases where the WindowAgg node is executed more than once, I think that the reported transition counts should be the averages per-execution of the node. That way the number of transition function calls reported for the node is directly comparable with its "rows" value. Thus I think the output should be divided by nloops, which would be more consistent with the way other similar values are reported in explain output (cf. show_instrumentation_count). I started looking at the "arith" patch, and I got as far as looking at the changes to count(*) and count(val), which seem reasonable. But then I started testing performance, and I found cases where the improvement is not nearly what I expected. The example cited at the start of this thread is indeed orders of magnitude faster than HEAD: SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM generate_series(1,20000) g(n); This executes in 20ms on my box, vs 30sec on HEAD, which reflects the change from an O(n^2) to an O(n) algorithm. However, if both ends of the frame move, the benefits are far less impressive: SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) FROM generate_series(1,20000) g(n); -- 33ms SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING) FROM generate_series(1,20000) g(n); -- 173ms SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING) FROM generate_series(1,20000) g(n); -- 1467ms SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10000 FOLLOWING) FROM generate_series(1,20000) g(n); -- 7709ms This is still exhibiting the behaviour of an O(n^2) algorithm. The problem lies in window_gettupleslot() which steps one row at a time from the last position to the new required position. So if both ends of the frame are moving, it needs to step forwards and backwards through the entire window, for each row processed - hence the O(n^2) behaviour. Looking at window_gettupleslot, there is an obvious potential micro-optimisation that can be made if there are multiple rows to be skipped --- instead of calling tuplestore_gettupleslot() multiple times, call tuplestore_advance() multiple times, then call tuplestore_gettupleslot() once to fetch the final required tuple, thus avoiding a lot of unnecessary tuple copying. That of course won't eliminate the O(n^2) behaviour, but it will reduce the overall factor, and so is probably worth doing. However, to fix the fundamental O(n^2) problem, I think there needs to be separate tuplestore read pointers for the head and tail of the frame. There may also be a case for another read pointer for the current row too, and possibly one for general purpose user-triggered fetches. One approach might be to have up to a small number N (say 3 or 4) of read pointers, with window_gettupleslot() automatically choosing the one nearest to the requested row.
Possibly there are better approaches. I think a little more investigation is needed. I'm not sure how much additional work is required to sort this out, but to me it looks more realistic to target 9.5 than 9.4, so at this point I tend to think that the patch ought to be marked as returned with feedback. Thoughts? Regards, Dean
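To make the proposed skip-ahead concrete, here is a standalone C sketch of the idea: advance the read position cheaply past the rows that aren't needed, and pay the copying cost only for the tuple actually requested. The ToyStore type and the toy_* functions are invented stand-ins for illustration - they are not PostgreSQL's tuplestore API - and the sketch only handles forward seeks.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct { char payload[64]; } Tuple;

typedef struct
{
    Tuple rows[1000];
    int   nrows;
    int   readpos;
} ToyStore;

/* cheap: only moves the read pointer, no tuple copy */
static bool toy_advance(ToyStore *ts)
{
    if (ts->readpos >= ts->nrows)
        return false;
    ts->readpos++;
    return true;
}

/* expensive: copies the current tuple out, then advances */
static bool toy_gettuple(ToyStore *ts, Tuple *out)
{
    if (ts->readpos >= ts->nrows)
        return false;
    memcpy(out, &ts->rows[ts->readpos], sizeof(Tuple));
    ts->readpos++;
    return true;
}

/* seek to absolute position pos, copying only the target tuple */
static bool toy_seek(ToyStore *ts, int pos, Tuple *out)
{
    while (ts->readpos < pos)
        if (!toy_advance(ts))   /* skip intermediate rows without copying */
            return false;
    return toy_gettuple(ts, out);
}

int main(void)
{
    static ToyStore ts = { .nrows = 1000 };
    Tuple t;

    if (toy_seek(&ts, 500, &t))
        printf("fetched one tuple; read position is now %d\n", ts.readpos);
    return 0;
}

The separate-read-pointers idea goes further: with one pointer per frame edge, each pointer only ever advances, so neither edge re-traverses rows on every output row - which is what would remove the O(n^2) term rather than just shrinking its constant.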
> On 04.04.2014, at 09:40, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > I'm not sure how much additional work is required to sort this out, > but to me it looks more realistic to target 9.5 than 9.4, so at this > point I tend to think that the patch ought to be marked as returned > with feedback. I think the patch is worthwhile, even without this additional optimization. In fact, if the optimization was part of the patch, there would probably be calls to factor it out, on the grounds that the patch is already rather large. I don't see what bumping the whole thing to 9.5 buys, compared to applying what we have now, and optimizing further in 9.5. best regards, Florian Pflug PS: Sorry for the broken mail I sent earlier - mis-touched on my phone ;-(
On 2014-04-04 12:56:55 +0200, Florian Pflug wrote: > > > On 04.04.2014, at 09:40, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > I'm not sure how much additional work is required to sort this out, > > but to me it looks more realistic to target 9.5 than 9.4, so at this > > point I tend to think that the patch ought to be marked as returned > > with feedback. > > I think the patch is worthwhile, even without this additional > optimization. In fact, if the optimization was part of the patch, > there would probably be calls to factor it out, on the grounds that the > patch is already rather large. > > I don't see what bumping the whole thing to 9.5 buys, compared to > applying what we have now, and optimizing further in 9.5. From my POV applying this patch can't be considered a very high priority for 9.4. It came *really* late to the game for a relatively complex patch. A significant portion of the development only happened *after* the start of the last commitfest. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 4 April 2014 11:56, Florian Pflug <fgp@phlo.org> wrote: > >> On 04.04.2014, at 09:40, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> >> I'm not sure how much additional work is required to sort this out, >> but to me it looks more realistic to target 9.5 than 9.4, so at this >> point I tend to think that the patch ought to be marked as returned >> with feedback. > Just doing the first optimisation I recommended (which I pessimistically referred to as a "micro-optimisation") actually gives up to a 4x performance boost relative to the current patch for the queries above: SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING) FROM generate_series(1,20000) g(n); -- 20ms (was 33ms) SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING) FROM generate_series(1,20000) g(n); -- 54ms (was 173ms) SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING) FROM generate_series(1,20000) g(n); -- 368ms (was 1467ms) SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10000 FOLLOWING) FROM generate_series(1,20000) g(n); -- 1866ms (was 7709ms) See the attached patch, which applies on top of yours but is actually independent of it. IMO, this doesn't go far enough though. We should be aiming to eliminate the O(n^2) behaviour completely and have all the above queries take roughly the same amount of time. > I think the patch is worthwhile, even without this additional optimization. In fact, if the optimization was part of the patch, there would probably be calls to factor it out, on the grounds that the patch is already rather large. > > I don't see what bumping the whole thing to 9.5 buys, compared to applying what we have now, and optimizing further in 9.5. > The problem with the current state of the patch is that we're going to a lot of effort to add this new inverse aggregate function, documenting its benefits and revealing via EXPLAIN how it can reduce by several orders of magnitude the number of transition function calls, but then not giving a commensurate performance boost unless the window is UNBOUNDED FOLLOWING. That's going to be awkward to explain to users, and so releasing it in this state feels a little half-baked to me. Regards, Dean
Attachment
On 5 April 2014 08:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > [snip] releasing it in this state feels a little half-baked > to me. > I regret writing that almost as soon as I sent it. The last of those queries is now over 10 times faster than HEAD, which is certainly worthwhile. What bugs me is that there is so much more potential in this patch. I think it's pretty close to being committable, but I fear that time is running out for 9.4. Regards, Dean
On Apr5, 2014, at 09:55 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On 5 April 2014 08:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> [snip] releasing it in this state feels a little half-baked >> to me. >> > > I regret writing that almost as soon as I sent it. The last of those > queries is now over 10 times faster than HEAD, which is certainly > worthwhile. What bugs me is that there is so much more potential in > this patch. Well, the perfect is the enemy of the good as they say. By all means, let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic version into 9.4. > I think it's pretty close to being committable, but I fear that time > is running out for 9.4. I'm not aware of any open issues in the basic patch, other than scaling the reported calling statistics by nloops, which should be trivial. I'll post an updated patch later today. I don't really expect all the add-on patches to make it into 9.4 - they don't seem to have gotten much attention yet - but at least the inverse transition functions for the basic arithmetic aggregates should be doable I hope. best regards, Florian Pflug
On 7 April 2014 14:09, Florian Pflug <fgp@phlo.org> wrote: > On Apr5, 2014, at 09:55 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> On 5 April 2014 08:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> [snip] releasing it in this state feels a little half-baked >>> to me. >>> >> >> I regret writing that almost as soon as I sent it. The last of those >> queries is now over 10 times faster than HEAD, which is certainly >> worthwhile. What bugs me is that there is so much more potential in >> this patch. > > Well, the perfect is the enemy of the good as they say. By all means, > let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic > version into 9.4. > >> I think it's pretty close to being committable, but I fear that time >> is running out for 9.4. > > I'm not aware of any open issues in the basic patch, other than > scaling the reported calling statistics by nloops, which should be > trivial. I'll post an updated patch later today. > > I don't really expect all the add-on patches to make it into 9.4 - > they don't seem to have gotten much attention yet - but at least > the inverse transition functions for the basic arithmetic aggregates > should be doable I hope. > I've just finished reading through all the other patches, and they all look OK to me. It's mostly straightforward stuff, so despite the size it's hopefully all committable once the base patch goes in. I think that you're probably right that optimising window_gettupleslot() to eliminate the O(n^2) behaviour can be left to a later patch --- the existing performance benefits of this patch are enough to justify its inclusion IMO. It would be nice to include the trivial optimisation to window_gettupleslot() that I posted upthread, since it gives such a big improvement for only a few lines of code changed. If you post a new patch set, I'll mark it as ready for committer attention. Regards, Dean
On Apr7, 2014, at 17:41 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > I've just finished reading through all the other patches, and they all > look OK to me. It's mostly straightforward stuff, so despite the size > it's hopefully all committable once the base patch goes in. Hm, I'm starting to have second thoughts about the minmax patch. The inverse transition functions for MIN and MAX have a non-trivial probability of failure - they trigger a rescan whenever the value that is removed isn't strictly smaller (respectively strictly larger) than the current maximum (respectively minimum). Thus, whenever that happens, we both call the inverse transition function *and* (since it fails) restart the aggregation. For windows based on ROWS, this isn't too bad - even if we fail every second time, we still avoid half the rescans, which should be a net win if the average window size is > 2. But for RANGE-based windows, more than one call of the inverse transition function must succeed for us to save anything, since we must successfully remove *all* peers to avoid one restart. This greatly increases the chance that using the inverse transition function hurts rather than helps - the situation is worse the larger the average number of peers is. I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch invtrans_bool - it previously was part of invtrans_minmax. Given the performance risk involved, I think that we probably shouldn't commit invtrans_minmax at this time. I should have brought this up earlier, but the issue had slipped my mind :-( Sorry for that. > I think that you're probably right that optimising > window_gettupleslot() to eliminate the O(n^2) behaviour can be left to > a later patch --- the existing performance benefits of this patch are > enough to justify its inclusion IMO. It would be nice to include the > trivial optimisation to window_gettupleslot() that I posted upthread, > since it gives such a big improvement for only a few lines of code > changed. Agreed, but since it's independent from the rest of the base patch, I kept it as a separate patch (invtrans_optimize) instead of merging it into the base patch. It should probably be committed separately too. > If you post a new patch set, I'll mark it as ready for committer attention. New patch set is attached. The only difference to the previous one is that "Forward Transitions" and "Inverse Transitions" are now scaled with nloops, and that it includes your window_gettupleslot patch under the name invtrans_optimize. Your nested loop query explain (verbose, analyse) select * from (values (10), (20), (30), (40)) v(x), lateral (select sum(i) over (rows between 4 preceding and current row) from generate_series(1, x) i) t now produces QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..170.06 rows=4000 width=12) (actual time=0.092..0.257 rows=100 loops=1) Output: "*VALUES*".column1, (sum(i.i) OVER (?)) -> Values Scan on "*VALUES*" (cost=0.00..0.05 rows=4 width=4) (actual time=0.003..0.007 rows=4 loops=1) Output: "*VALUES*".column1 -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual time=0.027..0.055 rows=25 loops=4) Output: sum(i.i) OVER (?)
Forward Transitions: 25.0 Inverse Transitions: 20.0 -> Function Scan on pg_catalog.generate_series i (cost=0.00..10.00 rows=1000 width=4) (actual time=0.013..0.015 rows=25 loops=4) Output: i.i Function Call: generate_series(1, "*VALUES*".column1) Planning time: 0.359 ms Total runtime: 0.544 ms The patch dependencies are as follows: invtrans_{optimize,docs} are independent from the rest invtrans_{arith,bool,minmax,collecting} are pairwise independent and all depend on invtrans_base. As explained above, invtrans_bool is a bit problematic, since it carries a real risk of performance regressions. It's included for completeness' sake, and should probably not be committed at this time. invtrans_optimize was authored by Dean Rasheed. best regards, Florian Pflug
Attachment
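To illustrate the failure mode described above, here is a standalone C sketch (simplified stand-ins, not the patch's fmgr-based code) of an inverse transition for MAX: removal can only succeed when the departing value is strictly smaller than the running maximum; otherwise the function signals failure - the analogue of the patch's inverse transition function returning NULL - and the aggregation must restart.

#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    int  max;
    long count;
} MaxState;

static void max_fwd(MaxState *s, int v)
{
    if (s->count == 0 || v > s->max)
        s->max = v;
    s->count++;
}

/*
 * Inverse transition: if the removed value is not strictly smaller
 * than the current maximum, the new maximum is unknown without a
 * rescan of the window, so report failure.
 */
static bool max_inv(MaxState *s, int v)
{
    if (v >= s->max)
        return false;       /* caller must restart the aggregation */
    s->count--;
    return true;
}

int main(void)
{
    MaxState s = { 0, 0 };

    max_fwd(&s, 3);
    max_fwd(&s, 7);
    max_fwd(&s, 5);

    printf("remove 3: %s\n", max_inv(&s, 3) ? "ok" : "restart");
    printf("remove 7: %s\n", max_inv(&s, 7) ? "ok" : "restart");
    return 0;
}

Under a RANGE frame, a call like max_inv would have to succeed for every peer in the departing group before a single restart is saved, which is the point made above about the increased risk for RANGE-based windows.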
On Apr9, 2014, at 02:55 , David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug <fgp@phlo.org> wrote: > > As explained above, invtrans_bool is a bit problematic, since it carries > a real risk of performance regressions. It's included for completeness' > sake, and should probably not be committed at this time. > > Did you mean to write invtrans_minmax? Otherwise you didn't explain > your concerns with bool. Grmpf. Should have re-read that once more before sending :-( Yes, I meant invtrans_minmax is problematic! invtrans_bool is fine, the inverse transition function never fails for BOOL_AND and BOOL_OR. This is why I factored it out into a separate patch, to make it easy to not apply the MIN/MAX stuff, while still applying the BOOL stuff. Sorry for the confusion. best regards, Florian Pflug
On 8 April 2014 21:48, Florian Pflug <fgp@phlo.org> wrote: > On Apr7, 2014, at 17:41 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> I've just finished reading through all the other patches, and they all >> look OK to me. It's mostly straightforward stuff, so despite the size >> it's hopefully all committable once the base patch goes in. > > Hm, I'm starting to have second thoughts about the minmax patch. The > inverse transition functions for MIN and MAX have a non-trivial probability > of failure - they trigger a rescan whenever the value that is removed isn't > strictly smaller (respectively strictly larger) than the current maximum > (respectively minimum). Thus, whenever that happens, we both call the > inverse transition function *and* (since it fails) restart the aggregation. > > For windows based on ROWS, this isn't too bad - even if we fail every second > time, we still avoid half the rescans, which should be a net win if the > average window size is > 2. > > But for RANGE-based windows, more than one call of the inverse transition > function must succeed for us to save anything, since we must successfully > remove *all* peers to avoid one restart. This greatly increases the chance > that using the inverse transition function hurts rather than helps - the > situation is worse the larger the average number of peers is. > Argh, I hadn't really considered that case. I suppose any imperfect inverse transition function has the potential to make performance worse rather than better. But working out the likelihood of that isn't necessarily straightforward. It might be possible to include some sort of heuristic based on the known information --- the number of rows P in the peer group about to be removed vs the total number N of rows aggregated so far. If the data were fairly random, then a quick back-of-the-envelope calculation suggests that trying the inverse min/max functions would be worthwhile on average if P were less than around 0.4N, but of course different data distributions could make that much worse. Even a perfect inverse transition function isn't going to be much use if P > N/2 (e.g., imagine a case where the peer groups decrease in size exponentially), so perhaps we should be including such a check anyway. That's also assuming that the cost of the inverse transition function is about the same as the cost of the forward function, which is not necessarily the case. Perhaps imperfect inverse transition functions should be assigned a higher cost, and that should be factored into the decision as to whether they are likely to be worth trying. All that feels very speculative though, and I think it's too late to be considering that for 9.4, so yes, let's leave out the min/max aggregates for now. > I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch > invtrans_bool - it previously was part of invtrans_minmax. Given the > performance risk involved, I think that we probably shouldn't commit > invtrans_minmax at this time. I should have brought this up earlier, but > the issue had slipped my mind :-( Sorry for that. > >> I think that you're probably right that optimising >> window_gettupleslot() to eliminate the O(n^2) behaviour can be left to >> a later patch --- the existing performance benefits of this patch are >> enough to justify its inclusion IMO. It would be nice to include the >> trivial optimisation to window_gettupleslot() that I posted upthread, >> since it gives such a big improvement for only a few lines of code >> changed.
> > Agreed, but since it's independent from the rest of the base patch, > I kept it as a separate patch (invtrans_optimize) instead of merging it > into the base patch. It should probably be committed separately too. > It would be good to commit at least the "base", "arith" and "optimize" patches for 9.4. I think the "collecting" and "bool" patches are also committable, but I also suspect that those aggregates are less well used, so they could be considered lower priority. >> If you post a new patch set, I'll mark it as ready for committer attention. > > New patch set is attached. The only difference to the previous one is that > "Forward Transitions" and "Inverse Transitions" are now scaled with nloops, > and that it includes your window_gettupleslot patch under the name > invtrans_optimize. > OK, I'm marking this ready for committer attention, on the understanding that that doesn't include the invtrans_minmax patch. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > OK, I'm marking this ready for committer attention, on the > understanding that that doesn't include the invtrans_minmax patch. I've started to look at this patch set. After rereading the thread, I'm thinking that it's a mistake to just add the inverse transition function and require it to work with the standard forward transition function for the aggregate. There was discussion upthread of providing two separate forward transition functions, but Florian argued that that would do nothing that you couldn't accomplish with a runtime check in the forward function. I think that's nonsense though, because one of the key points here is that an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I think what'd make sense is to have a separate forward function *and* separate state datatype to be used when we want invertible aggregation (hm, and I guess a separate final function too). So an aggregate definition would include two entirely independent implementations. If they chance to share sfunc and stype, fine, but they don't have to. This would mean we'd need some new names for the doppelgangers of the CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond (but not sortop). I guess that'd open up a chance to use a more uniform naming scheme, but I'm not too sure what would be good. Comments? regards, tom lane
I wrote: > ... an invertible aggregate may require a more > complex transition state data structure --- in particular, if you're > forced to go from a pass-by-value to a pass-by-reference data type, right > there you are going to take a big hit in aggregate performance, and there > is no way for the forward transition function to avoid it. The patch > has in fact already done that to a couple of basic aggregates like > sum(int4). Has anyone bothered to test what side-effects that has on > non-windowed aggregation performance? As a quick check, I compared aggregation performance in HEAD, non-assert builds, with and without --disable-float8-byval on a 64-bit machine. So this tests replacing a pass-by-val transition datatype with a pass-by-ref one without any other changes. There's essentially no difference in performance of sum(int4), AFAICT, but that's because int4_sum goes out of its way to cheat and avoid palloc overhead. I looked to the bit_and() aggregates to see what would happen to an aggregate not thus optimized. As expected, int4 and int8 bit_and are just about the same speed if int8 is pass by value ... but if it's pass by ref, the int8 case is a good 60% slower. So added palloc overhead, at least, is a no-go. I see that the patched version of sum(int4) avoids that trap, but nonetheless it's replaced a pretty cheap transition function with a less cheap function, namely the function previously used for avg(int4). A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. regards, tom lane
On Apr9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > As a quick check, I compared aggregation performance in HEAD, non-assert > builds, with and without --disable-float8-byval on a 64-bit machine. > So this tests replacing a pass-by-val transition datatype with a > pass-by-ref one without any other changes. There's essentially no > difference in performance of sum(int4), AFAICT, but that's because > int4_sum goes out of its way to cheat and avoid palloc overhead. > I looked to the bit_and() aggregates to see what would happen to > an aggregate not thus optimized. As expected, int4 and int8 bit_and > are just about the same speed if int8 is pass by value ... but if it's > pass by ref, the int8 case is a good 60% slower. True, but that just means that aggregate transition functions really ought to update the state in-place, no? > So added palloc overhead, at least, is a no-go. I see that the patched > version of sum(int4) avoids that trap, but nonetheless it's replaced a > pretty cheap transition function with a less cheap function, namely the > function previously used for avg(int4). A quick test says that avg(int4) > is about five percent slower than sum(int4), so that's the kind of hit > we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there isn't much difference between the two transition functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite):
1. +) int4_avg_accum calls AggCheckCallContext
2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries
4. -) int4_sum checks if the input is NULL
The number of conditional branches should be about the same (and all are seldom taken). The validity checks on the state array, i.e. (3), should be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but not so much more expensive either. That leaves the AggCheckCallContext call. If that call costs us 5%, maybe we can find a way to make it faster, or get rid of it entirely? Still seems a lot for a call of a not-very-complex function, though... I'll go and check the disassembly - maybe something in int4_avg_accum turns out to be more complex than is immediately obvious. I'll also try to create a call profile, unless you already have one from your test runs. best regards, Florian Pflug
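The in-place update pattern under discussion can be sketched in standalone C (the in_agg_context flag is a stand-in for what AggCheckCallContext() establishes; none of this is PostgreSQL's actual code): the aggregate path mutates the by-reference state directly, so no per-row allocation is needed, while a direct user call pays for a defensive copy.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    long sum;
    long count;
} AvgState;

/*
 * Transition step for an integer average. When invoked by the
 * aggregate machinery (in_agg_context), the state is private to the
 * aggregation and may safely be updated in place. A user-initiated
 * call must not modify its argument, so it works on a copy instead.
 */
static AvgState *avg_accum_sketch(AvgState *state, int input,
                                  bool in_agg_context)
{
    if (!in_agg_context)
    {
        AvgState *copy = malloc(sizeof(AvgState));
        *copy = *state;
        state = copy;
    }
    state->sum += input;    /* the hot path allocates nothing */
    state->count++;
    return state;
}

int main(void)
{
    AvgState s = { 0, 0 };

    for (int i = 1; i <= 100; i++)
        (void) avg_accum_sketch(&s, i, true);

    printf("sum=%ld count=%ld\n", s.sum, s.count);
    return 0;
}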
On Apr9, 2014, at 20:20 , Tom Lane <tgl@sss.pgh.pa.us> wrote: > There was discussion upthread of providing > two separate forward transition functions, but Florian argued that that > would do nothing that you couldn't accomplish with a runtime check in > the forward function. I think that's nonsense though, because one of > the key points here is that an invertible aggregate may require a more > complex transition state data structure --- in particular, if you're > forced to go from a pass-by-value to a pass-by-reference data type, right > there you are going to take a big hit in aggregate performance, and there > is no way for the forward transition function to avoid it. To be precise, my point was that *only* having a separate non-invertible forward transition function is pointless, exactly because of the reason you gave - that won't allow a cheaper state representation for the non-invertible case. So all such a non-invertible forward transition function can do is to skip some bookkeeping that's required by the inverse transition function - and *that* can just as easily be accomplished by a runtime check. I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. > The patch has in fact already done that to a couple of basic aggregates like > sum(int4). Has anyone bothered to test what side-effects that has on > non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere I think, but they currently evade me... Maybe David can re-post, if he's following this... > I think what'd make sense is to have a separate forward function *and* > separate state datatype to be used when we want invertible aggregation > (hm, and I guess a separate final function too). So an aggregate > definition would include two entirely independent implementations. > If they chance to share sfunc and stype, fine, but they don't have to. > > This would mean we'd need some new names for the doppelgangers of the > CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond > (but not sortop). I guess that'd open up a chance to use a more uniform > naming scheme, but I'm not too sure what would be good. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. The non-invertible aggregates would have something like "_noninv" or so appended to their name, and we'd automatically use them if we know we won't need to remove entries from the aggregation state. That would also allow the users to *force* the non-invertible aggregate to be used by simply saying "SUM_NONINV" instead of "SUM". Then all we'd need would be an additional OID field that links the invertible to the non-invertible definition. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes:
> I was (and still am) not in favour of duplicating the whole quadruple of
> (state, initialvalue, transferfunction, finalfunction) because it seems
> excessive. In fact, I believed that doing this would probably be grounds for
> outright rejection of the patch, on the basis of catalog bloat. And your
> initial response to this suggestion seemed to confirm this.

Well, I think it's much more likely that causing a performance penalty for cases unrelated to window aggregates would lead to outright rejection :-(. The majority of our users probably don't ever use window functions, but for sure they've heard of SUM(). We can't penalize the non-window case.

Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this suggestion) is a little annoying, but it doesn't sound like a show-stopper. It seems reasonable to assume that the extra initval would be NULL in most cases, so it's probably a net addition of 12 bytes per row.

> On Apr 9, 2014, at 20:20 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The patch has in fact already done that to a couple of basic aggregates like
>> sum(int4). Has anyone bothered to test what side-effects that has on
>> non-windowed aggregation performance?

> I'm pretty sure David Rowley did some benchmarking. The results should be
> in this thread somewhere I think, but they currently evade me... Maybe David
> can re-post, if he's following this...

I saw benchmarks addressing window aggregation, but none looking for side-effects on plain aggregation.

> If we really go down that road (and I'm far from convinced), then maybe
> instead of having a bunch of additional fields, we could have separate
> entries in pg_aggregate for the two cases, with links between them.

That seems like a complete mess; in particular it would break the primary key for pg_aggregate (aggfnoid), and probably break every existing query that looks at pg_aggregate. Some extra fields would not break such expectations (in fact we've added fields to pg_aggregate in the past).

regards, tom lane
On Apr 9, 2014, at 23:17 , Florian Pflug <fgp@phlo.org> wrote:
> On Apr 9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A quick test says that avg(int4)
>> is about five percent slower than sum(int4), so that's the kind of hit
>> we'd be taking on non-windowed aggregations if we do it like this.
>
> That's rather surprising, though. AFAICS, there isn't much difference
> between the two transfer functions int4_sum and int4_avg_accum at all.
>
> The differences come down to (+ denoting things which ought to make
> int4_avg_accum slower compared to int4_sum, - denoting the opposite)
>
> 1. +) int4_avg_accum calls AggCheckCallContext
> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
>       to verify that the state is a 2-element array without NULL entries
> 4. -) int4_sum checks if the input is NULL

I've done a bit of profiling on this (using Instruments.app on OS X). One thing I missed is that int4_avg_accum also calls pg_detoast_datum - that call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly, since we know that the datum cannot possibly be toasted, I think (or if it was, nodeAgg.c should do the de-toasting).

The profile also attributes a rather large percentage of the total runtime of int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting rid of that would help quite a few transition functions, invertible or not. That certainly seems doable - we'd need a way to mark functions as internal support functions, and prevent user-initiated calls of such functions. Transition functions marked that way could then safely scribble over their state arguments without having to consult AggCheckCallContext() first.

I've also compared the disassemblies of int4_sum and int4_avg_accum, and apart from these differences, they are pretty similar.

Also, the *only* reason that SUM(int2|int4) cannot use int8 as its transition type is that it needs to return NULL, not 0, if zero rows were aggregated. It might seem that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs...

That problem would go away though if we had support for an initial function, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state.

However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principled reason why it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment.

best regards,
Florian Pflug
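As a concrete illustration of that initial-function idea, the function for SUM(int4) would be nearly trivial - a sketch with a made-up name, assuming the executor calls it exactly once, on the first non-NULL input:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(int4_sum_init);

/*
 * Hypothetical strict "initial function" for sum(int4): invoked once, on
 * the first non-NULL input value, to manufacture the int8 transition
 * state. Being strict, it is never reached if every input is NULL, so
 * the aggregate still returns NULL over zero non-NULL rows.
 */
Datum
int4_sum_init(PG_FUNCTION_ARGS)
{
    PG_RETURN_INT64((int64) PG_GETARG_INT32(0));
}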
On 9 April 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> I was (and still am) not in favour of duplicating the whole quadruple of
>> (state, initialvalue, transferfunction, finalfunction) because it seems
>> excessive. In fact, I believed that doing this would probably be grounds for
>> outright rejection of the patch, on the basis of catalog bloat. And your
>> initial response to this suggestion seemed to confirm this.
>
> Well, I think it's much more likely that causing a performance penalty for
> cases unrelated to window aggregates would lead to outright rejection :-(.
> The majority of our users probably don't ever use window functions, but
> for sure they've heard of SUM(). We can't penalize the non-window case.
>
> Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this
> suggestion) is a little annoying, but it doesn't sound like a show-stopper.
> It seems reasonable to assume that the extra initval would be NULL in most
> cases, so it's probably a net addition of 12 bytes per row.
>
>> On Apr 9, 2014, at 20:20 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The patch has in fact already done that to a couple of basic aggregates like
>>> sum(int4). Has anyone bothered to test what side-effects that has on
>>> non-windowed aggregation performance?
>
>> I'm pretty sure David Rowley did some benchmarking. The results should be
>> in this thread somewhere I think, but they currently evade me... Maybe David
>> can re-post, if he's following this...
>
> I saw benchmarks addressing window aggregation, but none looking for
> side-effects on plain aggregation.
>
>> If we really go down that road (and I'm far from convinced), then maybe
>> instead of having a bunch of additional fields, we could have separate
>> entries in pg_aggregate for the two cases, with links between them.
>
> That seems like a complete mess; in particular it would break the primary
> key for pg_aggregate (aggfnoid), and probably break every existing query
> that looks at pg_aggregate. Some extra fields would not break such
> expectations (in fact we've added fields to pg_aggregate in the past).
>

This may initially sound unrelated, but I think it might address some of these issues.

Suppose we added a 'firsttrans' function that took a single argument (the first value to be aggregated) and was responsible for creating the initial state from that first value. This would apply to aggregates that ignore null values, but whose transition function cannot currently be declared strict (either because the state type is internal, or because it is not the same as the aggregate's argument type).

I think quite a lot of the existing aggregates fall into this category, and this would allow their transition functions to be made strict and simplified --- no more testing if the state is null, and then building it, and no more testing if the argument is null and ignoring it. That might give a noticeable performance boost in the regular aggregate case, especially over data containing nulls.

But in addition, it would help with writing inverse transition functions, because if the transition functions could be made strict, they wouldn't need to do null-counting, which would mean that their state types would not need to be expanded. So for example sum(int4) could continue to have int8 as its state type, it could use int8(int4) as its firsttrans function, and int4_sum() could become strict and lose all its null-handling logic.

Then int4_sum_inv() would be trivial to write - just doing the same in reverse (see the sketch below). I'm not sure it helps for all aggregates, but there are certainly some where it would seem to simplify things.

Regards,
Dean
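For concreteness, a sketch of that inverse function under the scheme just described - int4_sum_inv is Dean's hypothetical name, and the strict, plain-int8-state arrangement is the one proposed above, not existing code:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(int4_sum_inv);

/*
 * Hypothetical inverse transition function for a strict sum(int4) with a
 * plain int8 state: undo one forward transition by subtracting the
 * departing input value from the running sum.
 */
Datum
int4_sum_inv(PG_FUNCTION_ARGS)
{
    int64       state = PG_GETARG_INT64(0);

    PG_RETURN_INT64(state - (int64) PG_GETARG_INT32(1));
}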
On 10 April 2014 01:13, Florian Pflug <fgp@phlo.org> wrote:
> Also, the *only* reason that SUM(int2|int4) cannot use int8 as its
> transition type is that it needs to return NULL, not 0, if zero rows
> were aggregated. It might seem that it could just use int8 as state
> with initial value NULL, but that only works if the transition functions
> are non-strict (since the special case of state type == input type doesn't
> apply here). And non-strict transition functions need to deal with
> NULL inputs themselves, which means counting the number of non-NULL inputs...
>
> That problem would go away though if we had support for an initial function,
> which would receive the first input value and initialize the state. In
> the case of SUM(), the initial function would be strict, and thus would be
> called on the first non-NULL input value, which it'd convert to int8 and
> return as the new state.
>

Ah snap!

> However, I still believe the best approach at this point is to just work
> on making int4_avg_accum faster. I still see no principled reason why it
> has to be noticeably slower - the only additional work it absolutely *has*
> to perform is *one* 64-bit increment.
>

In the best case that would make sum() not noticeably slower than avg(), whereas using a firsttrans/initial function would potentially make both of them faster than they currently are, and not just in window queries. Also, IMO a first/initial function leads to a cleaner separation of logic, allowing some of the transition functions to be simplified rather than becoming more complex.

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 01:13, Florian Pflug <fgp@phlo.org> wrote:
>> However, I still believe the best approach at this point is to just work
>> on making int4_avg_accum faster. I still see no principled reason why it
>> has to be noticeably slower - the only additional work it absolutely *has*
>> to perform is *one* 64-bit increment.

> In the best case that would make sum() not noticeably slower than
> avg(), whereas using a firsttrans/initial function would potentially
> make both of them faster than they currently are, and not just in
> window queries.

I'm still of the opinion that we should separate the transfn for invertible cases from the normal one, and allow for two separate state types. One of the things that helps with is the strictness consideration: you no longer have to have the same strictness setting for the plain and invertible forward transfns.

This idea of a separate firsttrans function is interesting, but perhaps orthogonal to the current patch. Also, I don't quite understand how it would work for aggregates with null initvalues; don't you end up with exactly the same conflict about how you can't mark the transfn strict? Or is the idea that firsttrans would *only* apply to aggregates with null initvalue, and so you wouldn't even pass the previous state value to it?

regards, tom lane
On 10 April 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 10 April 2014 01:13, Florian Pflug <fgp@phlo.org> wrote:
>>> However, I still believe the best approach at this point is to just work
>>> on making int4_avg_accum faster. I still see no principled reason why it
>>> has to be noticeably slower - the only additional work it absolutely *has*
>>> to perform is *one* 64-bit increment.
>
>> In the best case that would make sum() not noticeably slower than
>> avg(), whereas using a firsttrans/initial function would potentially
>> make both of them faster than they currently are, and not just in
>> window queries.
>
> I'm still of the opinion that we should separate the transfn for
> invertible cases from the normal one, and allow for two separate
> state types. One of the things that helps with is the strictness
> consideration: you no longer have to have the same strictness
> setting for the plain and invertible forward transfns.
>

Yes, I can imagine that there would be some aggregates for which the plain forward transition function would be simpler than the invertible one, with a simpler state type. You'd still be left with quite a large number of existing aggregates having non-strict plain transition functions, in addition to a bunch of new non-strict invertible transition functions that had to do null counting.

> This idea of a separate firsttrans function is interesting but perhaps
> orthogonal to the current patch. Also, I don't quite understand how
> it would work for aggregates with null initvalues; don't you end up
> with exactly the same conflict about how you can't mark the transfn
> strict? Or is the idea that firsttrans would *only* apply to aggregates
> with null initvalue, and so you wouldn't even pass the previous state
> value to it?
>

I was imagining that firsttrans would only be passed the first value to be aggregated, not any previous state, and that it would be illegal to specify both an initcond and a firsttrans function. The forward transition function would only be called for values after the first, by which point the state would be non-null, and so it could be made strict in most cases. The same would apply to the invertible transition functions, so they wouldn't have to do null counting, which in turn would make their state types simpler.

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This idea of a separate firsttrans function is interesting but perhaps
>> orthogonal to the current patch. Also, I don't quite understand how
>> it would work for aggregates with null initvalues; don't you end up
>> with exactly the same conflict about how you can't mark the transfn
>> strict? Or is the idea that firsttrans would *only* apply to aggregates
>> with null initvalue, and so you wouldn't even pass the previous state
>> value to it?

> I was imagining that firsttrans would only be passed the first value
> to be aggregated, not any previous state, and that it would be illegal
> to specify both an initcond and a firsttrans function.

Got it. So the existing behavior, where we insert the first non-null value, could be seen as a degenerate case in which the firsttrans function is an identity.

> The forward transition function would only be called for values after
> the first, by which point the state would be non-null, and so it could
> be made strict in most cases. The same would apply to the invertible
> transition functions, so they wouldn't have to do null counting, which
> in turn would make their state types simpler.

Makes sense to me. We need names for these things though. I don't think abbreviating to "ffunc" is a good idea because of the likelihood of confusion with the finalfunc (indeed I see the CREATE AGGREGATE ref page is already using "ffunc" as a short form for finalfunc). Maybe "initfunc", which would parallel "initcond"?

What about names for the invertible-aggregate infrastructure? I'm tempted to prefix "inv" to all the existing names, but then "invsfunc" means the alternate forward function ... can we use "invifunc" for the inverse transition function? Or maybe the prefix should be just "i".

regards, tom lane
On 10 April 2014 19:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 10 April 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This idea of a separate firsttrans function is interesting but perhaps
>>> orthogonal to the current patch. Also, I don't quite understand how
>>> it would work for aggregates with null initvalues; don't you end up
>>> with exactly the same conflict about how you can't mark the transfn
>>> strict? Or is the idea that firsttrans would *only* apply to aggregates
>>> with null initvalue, and so you wouldn't even pass the previous state
>>> value to it?
>
>> I was imagining that firsttrans would only be passed the first value
>> to be aggregated, not any previous state, and that it would be illegal
>> to specify both an initcond and a firsttrans function.
>
> Got it. So the existing behavior, where we insert the first non-null
> value, could be seen as a degenerate case in which the firsttrans function
> is an identity.
>

Right.

>> The forward transition function would only be called for values after
>> the first, by which point the state would be non-null, and so it could
>> be made strict in most cases. The same would apply to the invertible
>> transition functions, so they wouldn't have to do null counting, which
>> in turn would make their state types simpler.
>
> Makes sense to me. We need names for these things though. I don't
> think abbreviating to "ffunc" is a good idea because of the likelihood
> of confusion with the finalfunc (indeed I see the CREATE AGGREGATE
> ref page is already using "ffunc" as a short form for finalfunc).
> Maybe "initfunc", which would parallel "initcond"?
>

Yes, I was already thinking that "initfunc" is actually a much better name for that.

> What about names for the invertible-aggregate infrastructure?
> I'm tempted to prefix "inv" to all the existing names, but then
> "invsfunc" means the alternate forward function ... can we use
> "invifunc" for the inverse transition function? Or maybe the
> prefix should be just "i".
>

Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to denote a sliding window? Or just "m" for "moving aggregate".

Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 19:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What about names for the invertible-aggregate infrastructure?
>> I'm tempted to prefix "inv" to all the existing names, but then
>> "invsfunc" means the alternate forward function ... can we use
>> "invifunc" for the inverse transition function? Or maybe the
>> prefix should be just "i".

> Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to
> denote a sliding window? Or just "m" for "moving aggregate".

Hmm ... "moving aggregate" is actually a pretty good name for this whole feature -- better than "invertible aggregate" anyway. I can feel a global-search-and-replace coming on.

So if we go with that terminology, perhaps these names for the new CREATE AGGREGATE parameters:

    initfunc      applies to plain aggregation, mutually exclusive with initcond
    msfunc        (or just mfunc?) forward transition for moving-agg mode
    mifunc        inverse transition for moving-agg mode
    mstype        state datatype for moving-agg mode
    msspace       space estimate for mstype
    mfinalfunc    final function for moving-agg mode
    minitfunc     "firsttrans" for moving-agg mode
    minitcond     mutually exclusive with minitfunc

That takes us up to 16 columns in pg_aggregate, but it's still not going to be a very voluminous catalog --- there's only 171 rows there today. So I'm not particularly concerned about space, and if there's a chance of squeezing out cycles, I think we should seize it.

regards, tom lane
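Viewed as data, the proposal amounts to each aggregate optionally carrying a second, self-contained implementation. A standalone sketch of that structure - all type and field names here are illustrative, not the actual catalog definition:

#include <stdint.h>

typedef uint32_t Oid;       /* stand-in for PostgreSQL's Oid typedef */

/*
 * One self-contained aggregate implementation: transition machinery,
 * state type, and initial state. Mirrors the parameter list above.
 */
typedef struct AggImplSketch
{
    Oid         sfunc;      /* forward transition function */
    Oid         stype;      /* state datatype */
    int32_t     sspace;     /* state space estimate */
    Oid         finalfunc;  /* 0 if none */
    Oid         initfunc;   /* "firsttrans"; 0 if none */
    const char *initcond;   /* NULL if none; mutually exclusive w/ initfunc */
} AggImplSketch;

/*
 * An aggregate definition under the proposal: one implementation for
 * plain aggregation, plus an optional, entirely independent one for
 * moving-aggregate mode, which additionally has an inverse transition.
 */
typedef struct AggDefSketch
{
    AggImplSketch plain;    /* sfunc, stype, sspace, finalfunc, ... */
    AggImplSketch moving;   /* msfunc, mstype, msspace, mfinalfunc, ... */
    Oid         mifunc;     /* inverse transition for moving-agg mode */
} AggDefSketch;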
On 10 April 2014 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 10 April 2014 19:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> What about names for the invertible-aggregate infrastructure?
>>> I'm tempted to prefix "inv" to all the existing names, but then
>>> "invsfunc" means the alternate forward function ... can we use
>>> "invifunc" for the inverse transition function? Or maybe the
>>> prefix should be just "i".
>
>> Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to
>> denote a sliding window? Or just "m" for "moving aggregate".
>
> Hmm ... "moving aggregate" is actually a pretty good name for this
> whole feature -- better than "invertible aggregate" anyway. I can
> feel a global-search-and-replace coming on.
>
> So if we go with that terminology, perhaps these names for the
> new CREATE AGGREGATE parameters:
>
>     initfunc      applies to plain aggregation, mutually exclusive with initcond
>     msfunc        (or just mfunc?) forward transition for moving-agg mode
>     mifunc        inverse transition for moving-agg mode
>     mstype        state datatype for moving-agg mode
>     msspace       space estimate for mstype
>     mfinalfunc    final function for moving-agg mode
>     minitfunc     "firsttrans" for moving-agg mode
>     minitcond     mutually exclusive with minitfunc
>

Yeah, those work for me. I think I prefer "mfunc" to "msfunc", but perhaps that's just my natural aversion to the "ms" prefix :-)

Also, perhaps "minvfunc" rather than "mifunc", because "i" by itself could mean "initial".

Regards,
Dean
On Apr 10, 2014, at 02:13 , Florian Pflug <fgp@phlo.org> wrote:
> On Apr 9, 2014, at 23:17 , Florian Pflug <fgp@phlo.org> wrote:
>> On Apr 9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> A quick test says that avg(int4)
>>> is about five percent slower than sum(int4), so that's the kind of hit
>>> we'd be taking on non-windowed aggregations if we do it like this.
>>
>> That's rather surprising, though. AFAICS, there isn't much difference
>> between the two transfer functions int4_sum and int4_avg_accum at all.
>>
>> The differences come down to (+ denoting things which ought to make
>> int4_avg_accum slower compared to int4_sum, - denoting the opposite)
>>
>> 1. +) int4_avg_accum calls AggCheckCallContext
>> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
>> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
>>       to verify that the state is a 2-element array without NULL entries
>> 4. -) int4_sum checks if the input is NULL
>
> I've done a bit of profiling on this (using Instruments.app on OS X). One
> thing I missed is that int4_avg_accum also calls pg_detoast_datum - that
> call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
> since we know that the datum cannot possibly be toasted, I think (or if it
> was, nodeAgg.c should do the de-toasting).
>
> The profile also attributes a rather large percentage of the total runtime of
> int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
> rid of that would help quite a few transition functions, invertible or not.
> That certainly seems doable - we'd need a way to mark functions as internal
> support functions, and prevent user-initiated calls of such functions.
> Transition functions marked that way could then safely scribble over their
> state arguments without having to consult AggCheckCallContext() first.
>
> ...
>
> However, I still believe the best approach at this point is to just work
> on making int4_avg_accum faster. I still see no principled reason why it
> has to be noticeably slower - the only additional work it absolutely *has*
> to perform is *one* 64-bit increment.

I played with this a bit. Currently, int4_avg_accum invokes AggCheckCallContext every time, and also repeatedly checks whether the array has the required dimension - which, incidentally, is the only big difference between int4_avg_accum and int4_sum.

To avoid that, I added a flags field to fcinfo which nodeAgg uses to tell transition functions whether they're being called for the first time, or with whatever state they themselves returned last time, i.e. for the n-th time.

If the n-th-time flag is set, int4_avg_accum simply retrieves the state with PG_GETARG_DATUM() instead of PG_GETARG_ARRAYTYPE_P(), relying on the fact that it never returns toasted datums itself, and also doesn't bother to validate the array size, for the same reason. If the flag is not set, it uses PG_GETARG_ARRAYTYPE_P_COPY and does validate the array size. In theory, it could further distinguish between a first call in an aggregation context (where the copy is unnecessary) and a user-initiated call (i.e. outside an aggregation), but that seems unnecessary - one additional copy per aggregation group is unlikely to be a problem.

With this in place, instruction-level profiling using Apple's Instruments.app shows that int4_avg_accum takes about 1.5% of the total runtime of a simple aggregation, while int4_sum takes about 1.2%. A (very rough) patch is attached.

I haven't been able to repeat Tom's measurement which shows a 5% difference in performance between the invertible and the non-invertible versions of SUM(int4), so I cannot say whether this removes that difference. But the profiling I've done certainly indicates that it should.

best regards,
Florian Pflug
Attachment
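The transition-function side of that mechanism would look something like the sketch below. Note the heavy assumptions: fcinfo has no flags field in the stock fmgr interface - that field, the AGG_CALL_CONTINUED bit, and the function and struct names are all hypothetical, standing in for Florian's rough patch:

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

/* Hypothetical: set by nodeAgg when the incoming state is a datum this
 * same function returned on an earlier call (the "n-th time" case). */
#define AGG_CALL_CONTINUED  0x01

typedef struct
{
    int64       count;
    int64       sum;
} AvgStateSketch;           /* payload of the 2-element int8 state array */

PG_FUNCTION_INFO_V1(my_int4_avg_accum);

Datum
my_int4_avg_accum(PG_FUNCTION_ARGS)
{
    ArrayType  *transarray;
    AvgStateSketch *state;

    if (fcinfo->flags & AGG_CALL_CONTINUED)     /* hypothetical field */
    {
        /*
         * The state came from us: it is not toasted and already validated,
         * so skip pg_detoast_datum and the shape checks entirely.
         */
        transarray = (ArrayType *) DatumGetPointer(PG_GETARG_DATUM(0));
    }
    else
    {
        /* First call, or user-initiated: copy and validate the array */
        transarray = PG_GETARG_ARRAYTYPE_P_COPY(0);
        if (ARR_HASNULL(transarray) ||
            ARR_SIZE(transarray) != ARR_OVERHEAD_NONULLS(1) + sizeof(AvgStateSketch))
            elog(ERROR, "expected 2-element int8 array");
    }

    state = (AvgStateSketch *) ARR_DATA_PTR(transarray);
    state->count++;
    state->sum += PG_GETARG_INT32(1);

    PG_RETURN_ARRAYTYPE_P(transarray);
}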
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I was imagining that firsttrans would only be passed the first value
> to be aggregated, not any previous state, and that it would be illegal
> to specify both an initcond and a firsttrans function.
> The forward transition function would only be called for values after
> the first, by which point the state would be non-null, and so it could
> be made strict in most cases. The same would apply to the invertible
> transition functions, so they wouldn't have to do null counting, which
> in turn would make their state types simpler.

I put together a very fast proof-of-concept patch for this (attached). It has a valid execution path for an aggregate with initfunc, but I didn't bother writing the CREATE AGGREGATE support yet. I made sum(int4) work as you suggest, marking the transfn strict and ripping out int4_sum's internal support for null inputs.

The result seems to be about a 4% or so improvement in the overall aggregation speed, for a simple "SELECT sum(int4col) FROM table" query. So from a performance standpoint this seems only marginally worth doing.

The real problem is not that 4% isn't worth the trouble, it's that AFAICS the only built-in aggregates that can benefit are sum(int2) and sum(int4). So that looks like a rather narrow use-case. You had suggested upthread that we could use this idea to make the transition functions strict for aggregates using "internal" transition datatypes, but that does not work because the initfunc would violate the safety rule that a function returning internal must take at least one internal-type argument. That puts a pretty strong damper on the usefulness of the approach, given how many internal-transtype aggregates we have (and the moving-aggregate case is not going to be better, is it?)

So at this point I'm feeling unexcited about the initfunc idea. Unless it does something really good for the moving-aggregate case, I think we should drop it.

regards, tom lane

diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index fe6dc8a..1ca5c8f 100644
*** a/src/backend/catalog/pg_aggregate.c
--- b/src/backend/catalog/pg_aggregate.c
*************** AggregateCreate(const char *aggName,
*** 390,395 ****
--- 390,396 ----
      values[Anum_pg_aggregate_aggfnoid - 1] = ObjectIdGetDatum(procOid);
      values[Anum_pg_aggregate_aggkind - 1] = CharGetDatum(aggKind);
      values[Anum_pg_aggregate_aggnumdirectargs - 1] = Int16GetDatum(numDirectArgs);
+     values[Anum_pg_aggregate_agginitfn - 1] = ObjectIdGetDatum(InvalidOid);
      values[Anum_pg_aggregate_aggtransfn - 1] = ObjectIdGetDatum(transfn);
      values[Anum_pg_aggregate_aggfinalfn - 1] = ObjectIdGetDatum(finalfn);
      values[Anum_pg_aggregate_aggsortop - 1] = ObjectIdGetDatum(sortop);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 7e4bca5..2671c49 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** typedef struct AggStatePerAggData
*** 152,157 ****
--- 152,158 ----
      int         numTransInputs;

      /* Oids of transfer functions */
+     Oid         initfn_oid;     /* may be InvalidOid */
      Oid         transfn_oid;
      Oid         finalfn_oid;    /* may be InvalidOid */
*************** typedef struct AggStatePerAggData
*** 160,165 ****
--- 161,167 ----
       * corresponding oid is not InvalidOid. Note in particular that fn_strict
       * flags are kept here.
       */
+     FmgrInfo    initfn;
      FmgrInfo    transfn;
      FmgrInfo    finalfn;
*************** typedef struct AggHashEntryData
*** 296,301 ****
--- 298,306 ----
  static void initialize_aggregates(AggState *aggstate,
                        AggStatePerAgg peragg,
                        AggStatePerGroup pergroup);
+ static void initialize_transition_value(AggState *aggstate,
+                             AggStatePerAgg peraggstate,
+                             AggStatePerGroup pergroupstate);
  static void advance_transition_function(AggState *aggstate,
                              AggStatePerAgg peraggstate,
                              AggStatePerGroup pergroupstate);
*************** initialize_aggregates(AggState *aggstate
*** 403,408 ****
--- 408,498 ----
  }

  /*
+  * Initialize the transition value upon reaching the first non-NULL input(s).
+  *
+  * We use this code when the initcond is NULL and the transfn is strict,
+  * which otherwise would mean the transition value can never become non-null.
+  * If an initfn has been provided, call it on the current input value(s);
+  * otherwise, take the current input value as the new transition value.
+  * (In the latter case, we already checked that this is okay datatype-wise.)
+  */
+ static void
+ initialize_transition_value(AggState *aggstate,
+                             AggStatePerAgg peraggstate,
+                             AggStatePerGroup pergroupstate)
+ {
+     FunctionCallInfo tfcinfo = &peraggstate->transfn_fcinfo;
+     MemoryContext oldContext;
+ 
+     if (OidIsValid(peraggstate->initfn_oid))
+     {
+         FunctionCallInfoData fcinfo;
+         int         numInitArgs;
+         Datum       newVal;
+ 
+         /* We run the transition functions in per-input-tuple memory context */
+         oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+ 
+         /*
+          * Set up an fcinfo for calling the initfn.  It takes one fewer
+          * argument than the transfn: the old transValue is not passed, since
+          * by definition it's NULL at this point.  We know that the other
+          * transfn arguments are all non-null, so no need to check.
+          */
+         numInitArgs = peraggstate->numTransInputs;
+         InitFunctionCallInfoData(fcinfo, &(peraggstate->initfn),
+                                  numInitArgs,
+                                  peraggstate->aggCollation,
+                                  (void *) aggstate, NULL);
+         /* Skip tfcinfo->arg[0], which would be the transValue */
+         memcpy(fcinfo.arg, tfcinfo->arg + 1, numInitArgs * sizeof(Datum));
+         memset(fcinfo.argnull, 0, numInitArgs * sizeof(bool));
+ 
+         /* set up aggstate->curperagg for AggGetAggref() */
+         aggstate->curperagg = peraggstate;
+ 
+         newVal = FunctionCallInvoke(&fcinfo);
+ 
+         aggstate->curperagg = NULL;
+ 
+         /*
+          * If pass-by-ref datatype, must copy the new value into aggcontext.
+          * Needn't pfree the prior transValue, since it's NULL.
+          */
+         if (!peraggstate->transtypeByVal && !fcinfo.isnull)
+         {
+             MemoryContextSwitchTo(aggstate->aggcontext);
+             newVal = datumCopy(newVal,
+                                peraggstate->transtypeByVal,
+                                peraggstate->transtypeLen);
+         }
+ 
+         pergroupstate->transValue = newVal;
+         pergroupstate->transValueIsNull = fcinfo.isnull;
+ 
+         MemoryContextSwitchTo(oldContext);
+     }
+     else
+     {
+         /*
+          * No initfn, so use the current input value as the new transValue.
+          *
+          * We must copy the datum into aggcontext if it is pass-by-ref. We do
+          * not need to pfree the old transValue, since it's NULL.
+          */
+         oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
+         pergroupstate->transValue = datumCopy(tfcinfo->arg[1],
+                                               peraggstate->transtypeByVal,
+                                               peraggstate->transtypeLen);
+         pergroupstate->transValueIsNull = false;
+         MemoryContextSwitchTo(oldContext);
+     }
+ 
+     /* Reset state so we won't do this again in the current group */
+     pergroupstate->noTransValue = false;
+ }
+ 
+ /*
   * Given new input value(s), advance the transition function of an aggregate.
   *
   * The new values (and null flags) have been preloaded into argument positions
*************** advance_transition_function(AggState *ag
*** 438,458 ****
      if (pergroupstate->noTransValue)
      {
          /*
!          * transValue has not been initialized. This is the first non-NULL
!          * input value. We use it as the initial value for transValue. (We
!          * already checked that the agg's input type is binary-compatible
!          * with its transtype, so straight copy here is OK.)
!          *
!          * We must copy the datum into aggcontext if it is pass-by-ref. We
!          * do not need to pfree the old transValue, since it's NULL.
           */
!         oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
!         pergroupstate->transValue = datumCopy(fcinfo->arg[1],
!                                               peraggstate->transtypeByVal,
!                                               peraggstate->transtypeLen);
!         pergroupstate->transValueIsNull = false;
!         pergroupstate->noTransValue = false;
!         MemoryContextSwitchTo(oldContext);
          return;
      }
      if (pergroupstate->transValueIsNull)
--- 528,538 ----
      if (pergroupstate->noTransValue)
      {
          /*
!          * We had a NULL initcond, and this is the first non-NULL input
!          * value(s). Perform special initialization, using the initfn if
!          * any.
           */
!         initialize_transition_value(aggstate, peraggstate, pergroupstate);
          return;
      }
      if (pergroupstate->transValueIsNull)
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1676,1682 ****
              Form_pg_aggregate aggform;
              Oid         aggtranstype;
              AclResult   aclresult;
!             Oid         transfn_oid,
                          finalfn_oid;
              Expr       *transfnexpr,
                         *finalfnexpr;
--- 1756,1763 ----
              Form_pg_aggregate aggform;
              Oid         aggtranstype;
              AclResult   aclresult;
!             Oid         initfn_oid,
!                         transfn_oid,
                          finalfn_oid;
              Expr       *transfnexpr,
                         *finalfnexpr;
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1728,1733 ****
--- 1809,1815 ----
                                 get_func_name(aggref->aggfnoid));
              InvokeFunctionExecuteHook(aggref->aggfnoid);

+             peraggstate->initfn_oid = initfn_oid = aggform->agginitfn;
              peraggstate->transfn_oid = transfn_oid = aggform->aggtransfn;
              peraggstate->finalfn_oid = finalfn_oid = aggform->aggfinalfn;
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1802,1808 ****
                                      &transfnexpr,
                                      &finalfnexpr);

!             /* set up infrastructure for calling the transfn and finalfn */
              fmgr_info(transfn_oid, &peraggstate->transfn);
              fmgr_info_set_expr((Node *) transfnexpr, &peraggstate->transfn);
--- 1884,1896 ----
                                      &transfnexpr,
                                      &finalfnexpr);

!             /* set up infrastructure for calling the aggregate's functions */
!             if (OidIsValid(initfn_oid))
!             {
!                 fmgr_info(initfn_oid, &peraggstate->initfn);
!                 // fmgr_info_set_expr((Node *) initfnexpr, &peraggstate->initfn);
!             }
! 
              fmgr_info(transfn_oid, &peraggstate->transfn);
              fmgr_info_set_expr((Node *) transfnexpr, &peraggstate->transfn);
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1849,1855 ****
               * transValue. This should have been checked at agg definition time,
               * but just in case...
               */
!             if (peraggstate->transfn.fn_strict && peraggstate->initValueIsNull)
              {
                  if (numArguments <= numDirectArgs ||
                      !IsBinaryCoercible(inputTypes[numDirectArgs], aggtranstype))
--- 1937,1944 ----
               * transValue. This should have been checked at agg definition time,
               * but just in case...
               */
!             if (peraggstate->transfn.fn_strict && peraggstate->initValueIsNull &&
!                 !OidIsValid(initfn_oid))
              {
                  if (numArguments <= numDirectArgs ||
                      !IsBinaryCoercible(inputTypes[numDirectArgs], aggtranstype))
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 64eb0f8..cc2a55d 100644
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
*************** int4_sum(PG_FUNCTION_ARGS)
*** 3021,3036 ****
  {
      int64       newval;

-     if (PG_ARGISNULL(0))
-     {
-         /* No non-null input seen so far... */
-         if (PG_ARGISNULL(1))
-             PG_RETURN_NULL();   /* still no non-null */
-         /* This is the first non-null input. */
-         newval = (int64) PG_GETARG_INT32(1);
-         PG_RETURN_INT64(newval);
-     }
- 
      /*
       * If we're invoked as an aggregate, we can cheat and modify our first
       * parameter in-place to avoid palloc overhead. If not, we need to return
--- 3021,3026 ----
*************** int4_sum(PG_FUNCTION_ARGS)
*** 3043,3051 ****
      {
          int64      *oldsum = (int64 *) PG_GETARG_POINTER(0);

!         /* Leave the running sum unchanged in the new input is null */
!         if (!PG_ARGISNULL(1))
!             *oldsum = *oldsum + (int64) PG_GETARG_INT32(1);
          PG_RETURN_POINTER(oldsum);
      }
--- 3033,3039 ----
      {
          int64      *oldsum = (int64 *) PG_GETARG_POINTER(0);

!         *oldsum = *oldsum + (int64) PG_GETARG_INT32(1);
          PG_RETURN_POINTER(oldsum);
      }
*************** int4_sum(PG_FUNCTION_ARGS)
*** 3054,3064 ****
      {
          int64       oldsum = PG_GETARG_INT64(0);

-         /* Leave sum unchanged if new input is null. */
-         if (PG_ARGISNULL(1))
-             PG_RETURN_INT64(oldsum);
- 
-         /* OK to do the addition. */
          newval = oldsum + (int64) PG_GETARG_INT32(1);

          PG_RETURN_INT64(newval);
--- 3042,3047 ----
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index f189998..79d327c 100644
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
***************
*** 30,35 ****
--- 30,36 ----
  *    aggfnoid            pg_proc OID of the aggregate itself
  *    aggkind             aggregate kind, see AGGKIND_ categories below
  *    aggnumdirectargs    number of arguments that are "direct" arguments
+ *    agginitfn           first-transition function (0 if none)
  *    aggtransfn          transition function
  *    aggfinalfn          final function (0 if none)
  *    aggsortop           associated sort operator (0 if none)
*************** CATALOG(pg_aggregate,2600) BKI_WITHOUT_O
*** 45,50 ****
--- 46,52 ----
      regproc     aggfnoid;
      char        aggkind;
      int16       aggnumdirectargs;
+     regproc     agginitfn;
      regproc     aggtransfn;
      regproc     aggfinalfn;
      Oid         aggsortop;
*************** typedef FormData_pg_aggregate *Form_pg_a
*** 68,83 ****
  * ----------------
  */

! #define Natts_pg_aggregate                  9
  #define Anum_pg_aggregate_aggfnoid          1
  #define Anum_pg_aggregate_aggkind           2
  #define Anum_pg_aggregate_aggnumdirectargs  3
! #define Anum_pg_aggregate_aggtransfn        4
! #define Anum_pg_aggregate_aggfinalfn        5
! #define Anum_pg_aggregate_aggsortop         6
! #define Anum_pg_aggregate_aggtranstype      7
! #define Anum_pg_aggregate_aggtransspace     8
! #define Anum_pg_aggregate_agginitval        9

  /*
  * Symbolic values for aggkind column.  We distinguish normal aggregates
--- 70,86 ----
  * ----------------
  */

! #define Natts_pg_aggregate                  10
  #define Anum_pg_aggregate_aggfnoid          1
  #define Anum_pg_aggregate_aggkind           2
  #define Anum_pg_aggregate_aggnumdirectargs  3
! #define Anum_pg_aggregate_agginitfn         4
! #define Anum_pg_aggregate_aggtransfn        5
! #define Anum_pg_aggregate_aggfinalfn        6
! #define Anum_pg_aggregate_aggsortop         7
! #define Anum_pg_aggregate_aggtranstype      8
! #define Anum_pg_aggregate_aggtransspace     9
! #define Anum_pg_aggregate_agginitval        10

  /*
  * Symbolic values for aggkind column.  We distinguish normal aggregates
*************** typedef FormData_pg_aggregate *Form_pg_a
*** 101,277 ****
  */

  /* avg */
! DATA(insert ( 2100 n 0 int8_avg_accum numeric_avg 0 2281 128 _null_ ));
! DATA(insert ( 2101 n 0 int4_avg_accum int8_avg 0 1016 0 "{0,0}" ));
! DATA(insert ( 2102 n 0 int2_avg_accum int8_avg 0 1016 0 "{0,0}" ));
! DATA(insert ( 2103 n 0 numeric_avg_accum numeric_avg 0 2281 128 _null_ ));
! DATA(insert ( 2104 n 0 float4_accum float8_avg 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2105 n 0 float8_accum float8_avg 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2106 n 0 interval_accum interval_avg 0 1187 0 "{0 second,0 second}" ));

  /* sum */
! DATA(insert ( 2107 n 0 int8_avg_accum numeric_sum 0 2281 128 _null_ ));
! DATA(insert ( 2108 n 0 int4_sum - 0 20 0 _null_ ));
! DATA(insert ( 2109 n 0 int2_sum - 0 20 0 _null_ ));
! DATA(insert ( 2110 n 0 float4pl - 0 700 0 _null_ ));
! DATA(insert ( 2111 n 0 float8pl - 0 701 0 _null_ ));
! DATA(insert ( 2112 n 0 cash_pl - 0 790 0 _null_ ));
! DATA(insert ( 2113 n 0 interval_pl - 0 1186 0 _null_ ));
! DATA(insert ( 2114 n 0 numeric_avg_accum numeric_sum 0 2281 128 _null_ ));

  /* max */
! DATA(insert ( 2115 n 0 int8larger - 413 20 0 _null_ ));
! DATA(insert ( 2116 n 0 int4larger - 521 23 0 _null_ ));
! DATA(insert ( 2117 n 0 int2larger - 520 21 0 _null_ ));
! DATA(insert ( 2118 n 0 oidlarger - 610 26 0 _null_ ));
! DATA(insert ( 2119 n 0 float4larger - 623 700 0 _null_ ));
! DATA(insert ( 2120 n 0 float8larger - 674 701 0 _null_ ));
! DATA(insert ( 2121 n 0 int4larger - 563 702 0 _null_ ));
! DATA(insert ( 2122 n 0 date_larger - 1097 1082 0 _null_ ));
! DATA(insert ( 2123 n 0 time_larger - 1112 1083 0 _null_ ));
! DATA(insert ( 2124 n 0 timetz_larger - 1554 1266 0 _null_ ));
! DATA(insert ( 2125 n 0 cashlarger - 903 790 0 _null_ ));
! DATA(insert ( 2126 n 0 timestamp_larger - 2064 1114 0 _null_ ));
! DATA(insert ( 2127 n 0 timestamptz_larger - 1324 1184 0 _null_ ));
! DATA(insert ( 2128 n 0 interval_larger - 1334 1186 0 _null_ ));
! DATA(insert ( 2129 n 0 text_larger - 666 25 0 _null_ ));
! DATA(insert ( 2130 n 0 numeric_larger - 1756 1700 0 _null_ ));
! DATA(insert ( 2050 n 0 array_larger - 1073 2277 0 _null_ ));
! DATA(insert ( 2244 n 0 bpchar_larger - 1060 1042 0 _null_ ));
! DATA(insert ( 2797 n 0 tidlarger - 2800 27 0 _null_ ));
! DATA(insert ( 3526 n 0 enum_larger - 3519 3500 0 _null_ ));

  /* min */
! DATA(insert ( 2131 n 0 int8smaller - 412 20 0 _null_ ));
! DATA(insert ( 2132 n 0 int4smaller - 97 23 0 _null_ ));
! DATA(insert ( 2133 n 0 int2smaller - 95 21 0 _null_ ));
! DATA(insert ( 2134 n 0 oidsmaller - 609 26 0 _null_ ));
! DATA(insert ( 2135 n 0 float4smaller - 622 700 0 _null_ ));
! DATA(insert ( 2136 n 0 float8smaller - 672 701 0 _null_ ));
! DATA(insert ( 2137 n 0 int4smaller - 562 702 0 _null_ ));
! DATA(insert ( 2138 n 0 date_smaller - 1095 1082 0 _null_ ));
! DATA(insert ( 2139 n 0 time_smaller - 1110 1083 0 _null_ ));
! DATA(insert ( 2140 n 0 timetz_smaller - 1552 1266 0 _null_ ));
! DATA(insert ( 2141 n 0 cashsmaller - 902 790 0 _null_ ));
! DATA(insert ( 2142 n 0 timestamp_smaller - 2062 1114 0 _null_ ));
! DATA(insert ( 2143 n 0 timestamptz_smaller - 1322 1184 0 _null_ ));
! DATA(insert ( 2144 n 0 interval_smaller - 1332 1186 0 _null_ ));
! DATA(insert ( 2145 n 0 text_smaller - 664 25 0 _null_ ));
! DATA(insert ( 2146 n 0 numeric_smaller - 1754 1700 0 _null_ ));
! DATA(insert ( 2051 n 0 array_smaller - 1072 2277 0 _null_ ));
! DATA(insert ( 2245 n 0 bpchar_smaller - 1058 1042 0 _null_ ));
! DATA(insert ( 2798 n 0 tidsmaller - 2799 27 0 _null_ ));
! DATA(insert ( 3527 n 0 enum_smaller - 3518 3500 0 _null_ ));

  /* count */
! DATA(insert ( 2147 n 0 int8inc_any - 0 20 0 "0" ));
! DATA(insert ( 2803 n 0 int8inc - 0 20 0 "0" ));

  /* var_pop */
! DATA(insert ( 2718 n 0 int8_accum numeric_var_pop 0 2281 128 _null_ ));
! DATA(insert ( 2719 n 0 int4_accum numeric_var_pop 0 2281 128 _null_ ));
! DATA(insert ( 2720 n 0 int2_accum numeric_var_pop 0 2281 128 _null_ ));
! DATA(insert ( 2721 n 0 float4_accum float8_var_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2722 n 0 float8_accum float8_var_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2723 n 0 numeric_accum numeric_var_pop 0 2281 128 _null_ ));

  /* var_samp */
! DATA(insert ( 2641 n 0 int8_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2642 n 0 int4_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2643 n 0 int2_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2644 n 0 float4_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2645 n 0 float8_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2646 n 0 numeric_accum numeric_var_samp 0 2281 128 _null_ ));

  /* variance: historical Postgres syntax for var_samp */
! DATA(insert ( 2148 n 0 int8_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2149 n 0 int4_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2150 n 0 int2_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2151 n 0 float4_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2152 n 0 float8_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2153 n 0 numeric_accum numeric_var_samp 0 2281 128 _null_ ));

  /* stddev_pop */
! DATA(insert ( 2724 n 0 int8_accum numeric_stddev_pop 0 2281 128 _null_ ));
! DATA(insert ( 2725 n 0 int4_accum numeric_stddev_pop 0 2281 128 _null_ ));
! DATA(insert ( 2726 n 0 int2_accum numeric_stddev_pop 0 2281 128 _null_ ));
! DATA(insert ( 2727 n 0 float4_accum float8_stddev_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2728 n 0 float8_accum float8_stddev_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2729 n 0 numeric_accum numeric_stddev_pop 0 2281 128 _null_ ));

  /* stddev_samp */
! DATA(insert ( 2712 n 0 int8_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2713 n 0 int4_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2714 n 0 int2_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2715 n 0 float4_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2716 n 0 float8_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2717 n 0 numeric_accum numeric_stddev_samp 0 2281 128 _null_ ));

  /* stddev: historical Postgres syntax for stddev_samp */
! DATA(insert ( 2154 n 0 int8_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2155 n 0 int4_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2156 n 0 int2_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2157 n 0 float4_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2158 n 0 float8_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2159 n 0 numeric_accum numeric_stddev_samp 0 2281 128 _null_ ));

  /* SQL2003 binary regression aggregates */
! DATA(insert ( 2818 n 0 int8inc_float8_float8 - 0 20 0 "0" ));
! DATA(insert ( 2819 n 0 float8_regr_accum float8_regr_sxx 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2820 n 0 float8_regr_accum float8_regr_syy 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2821 n 0 float8_regr_accum float8_regr_sxy 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2822 n 0 float8_regr_accum float8_regr_avgx 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2823 n 0 float8_regr_accum float8_regr_avgy 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2824 n 0 float8_regr_accum float8_regr_r2 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2825 n 0 float8_regr_accum float8_regr_slope 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2826 n 0 float8_regr_accum float8_regr_intercept 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2827 n 0 float8_regr_accum float8_covar_pop 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2828 n 0 float8_regr_accum float8_covar_samp 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2829 n 0 float8_regr_accum float8_corr 0 1022 0 "{0,0,0,0,0,0}" ));

  /* boolean-and and boolean-or */
! DATA(insert ( 2517 n 0 booland_statefunc - 58 16 0 _null_ ));
! DATA(insert ( 2518 n 0 boolor_statefunc - 59 16 0 _null_ ));
! DATA(insert ( 2519 n 0 booland_statefunc - 58 16 0 _null_ ));

  /* bitwise integer */
! DATA(insert ( 2236 n 0 int2and - 0 21 0 _null_ ));
! DATA(insert ( 2237 n 0 int2or - 0 21 0 _null_ ));
! DATA(insert ( 2238 n 0 int4and - 0 23 0 _null_ ));
! DATA(insert ( 2239 n 0 int4or - 0 23 0 _null_ ));
! DATA(insert ( 2240 n 0 int8and - 0 20 0 _null_ ));
! DATA(insert ( 2241 n 0 int8or - 0 20 0 _null_ ));
! DATA(insert ( 2242 n 0 bitand - 0 1560 0 _null_ ));
! DATA(insert ( 2243 n 0 bitor - 0 1560 0 _null_ ));

  /* xml */
! DATA(insert ( 2901 n 0 xmlconcat2 - 0 142 0 _null_ ));

  /* array */
! DATA(insert ( 2335 n 0 array_agg_transfn array_agg_finalfn 0 2281 0 _null_ ));

  /* text */
! DATA(insert ( 3538 n 0 string_agg_transfn string_agg_finalfn 0 2281 0 _null_ ));

  /* bytea */
! DATA(insert ( 3545 n 0 bytea_string_agg_transfn bytea_string_agg_finalfn 0 2281 0 _null_ ));

  /* json */
! DATA(insert ( 3175 n 0 json_agg_transfn json_agg_finalfn 0 2281 0 _null_ ));
! DATA(insert ( 3197 n 0 json_object_agg_transfn json_object_agg_finalfn 0 2281 0 _null_ ));

  /* ordered-set and hypothetical-set aggregates */
! DATA(insert ( 3972 o 1 ordered_set_transition percentile_disc_final 0 2281 0 _null_ ));
! DATA(insert ( 3974 o 1 ordered_set_transition percentile_cont_float8_final 0 2281 0 _null_ ));
! DATA(insert ( 3976 o 1 ordered_set_transition percentile_cont_interval_final 0 2281 0 _null_ ));
! DATA(insert ( 3978 o 1 ordered_set_transition percentile_disc_multi_final 0 2281 0 _null_ ));
! DATA(insert ( 3980 o 1 ordered_set_transition percentile_cont_float8_multi_final 0 2281 0 _null_ ));
! DATA(insert ( 3982 o 1 ordered_set_transition percentile_cont_interval_multi_final 0 2281 0 _null_ ));
! DATA(insert ( 3984 o 0 ordered_set_transition mode_final 0 2281 0 _null_ ));
! DATA(insert ( 3986 h 1 ordered_set_transition_multi rank_final 0 2281 0 _null_ ));
! DATA(insert ( 3988 h 1 ordered_set_transition_multi percent_rank_final 0 2281 0 _null_ ));
! DATA(insert ( 3990 h 1 ordered_set_transition_multi cume_dist_final 0 2281 0 _null_ ));
! DATA(insert ( 3992 h 1 ordered_set_transition_multi dense_rank_final 0 2281 0 _null_ ));

  /*
--- 104,280 ----
  */

  /* avg */
! DATA(insert ( 2100 n 0 - int8_avg_accum numeric_avg 0 2281 128 _null_ ));
! DATA(insert ( 2101 n 0 - int4_avg_accum int8_avg 0 1016 0 "{0,0}" ));
! DATA(insert ( 2102 n 0 - int2_avg_accum int8_avg 0 1016 0 "{0,0}" ));
! DATA(insert ( 2103 n 0 - numeric_avg_accum numeric_avg 0 2281 128 _null_ ));
! DATA(insert ( 2104 n 0 - float4_accum float8_avg 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2105 n 0 - float8_accum float8_avg 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2106 n 0 - interval_accum interval_avg 0 1187 0 "{0 second,0 second}" ));

  /* sum */
! DATA(insert ( 2107 n 0 - int8_avg_accum numeric_sum 0 2281 128 _null_ ));
! DATA(insert ( 2108 n 0 481 int4_sum - 0 20 0 _null_ ));
! DATA(insert ( 2109 n 0 - int2_sum - 0 20 0 _null_ ));
! DATA(insert ( 2110 n 0 - float4pl - 0 700 0 _null_ ));
! DATA(insert ( 2111 n 0 - float8pl - 0 701 0 _null_ ));
! DATA(insert ( 2112 n 0 - cash_pl - 0 790 0 _null_ ));
! DATA(insert ( 2113 n 0 - interval_pl - 0 1186 0 _null_ ));
! DATA(insert ( 2114 n 0 - numeric_avg_accum numeric_sum 0 2281 128 _null_ ));

  /* max */
! DATA(insert ( 2115 n 0 - int8larger - 413 20 0 _null_ ));
! DATA(insert ( 2116 n 0 - int4larger - 521 23 0 _null_ ));
! DATA(insert ( 2117 n 0 - int2larger - 520 21 0 _null_ ));
! DATA(insert ( 2118 n 0 - oidlarger - 610 26 0 _null_ ));
! DATA(insert ( 2119 n 0 - float4larger - 623 700 0 _null_ ));
! DATA(insert ( 2120 n 0 - float8larger - 674 701 0 _null_ ));
! DATA(insert ( 2121 n 0 - int4larger - 563 702 0 _null_ ));
! DATA(insert ( 2122 n 0 - date_larger - 1097 1082 0 _null_ ));
! DATA(insert ( 2123 n 0 - time_larger - 1112 1083 0 _null_ ));
! DATA(insert ( 2124 n 0 - timetz_larger - 1554 1266 0 _null_ ));
! DATA(insert ( 2125 n 0 - cashlarger - 903 790 0 _null_ ));
! DATA(insert ( 2126 n 0 - timestamp_larger - 2064 1114 0 _null_ ));
! DATA(insert ( 2127 n 0 - timestamptz_larger - 1324 1184 0 _null_ ));
! DATA(insert ( 2128 n 0 - interval_larger - 1334 1186 0 _null_ ));
! DATA(insert ( 2129 n 0 - text_larger - 666 25 0 _null_ ));
! DATA(insert ( 2130 n 0 - numeric_larger - 1756 1700 0 _null_ ));
! DATA(insert ( 2050 n 0 - array_larger - 1073 2277 0 _null_ ));
! DATA(insert ( 2244 n 0 - bpchar_larger - 1060 1042 0 _null_ ));
! DATA(insert ( 2797 n 0 - tidlarger - 2800 27 0 _null_ ));
! DATA(insert ( 3526 n 0 - enum_larger - 3519 3500 0 _null_ ));

  /* min */
! DATA(insert ( 2131 n 0 - int8smaller - 412 20 0 _null_ ));
! DATA(insert ( 2132 n 0 - int4smaller - 97 23 0 _null_ ));
! DATA(insert ( 2133 n 0 - int2smaller - 95 21 0 _null_ ));
! DATA(insert ( 2134 n 0 - oidsmaller - 609 26 0 _null_ ));
! DATA(insert ( 2135 n 0 - float4smaller - 622 700 0 _null_ ));
! DATA(insert ( 2136 n 0 - float8smaller - 672 701 0 _null_ ));
! DATA(insert ( 2137 n 0 - int4smaller - 562 702 0 _null_ ));
! DATA(insert ( 2138 n 0 - date_smaller - 1095 1082 0 _null_ ));
! DATA(insert ( 2139 n 0 - time_smaller - 1110 1083 0 _null_ ));
! DATA(insert ( 2140 n 0 - timetz_smaller - 1552 1266 0 _null_ ));
! DATA(insert ( 2141 n 0 - cashsmaller - 902 790 0 _null_ ));
! DATA(insert ( 2142 n 0 - timestamp_smaller - 2062 1114 0 _null_ ));
! DATA(insert ( 2143 n 0 - timestamptz_smaller - 1322 1184 0 _null_ ));
! DATA(insert ( 2144 n 0 - interval_smaller - 1332 1186 0 _null_ ));
! DATA(insert ( 2145 n 0 - text_smaller - 664 25 0 _null_ ));
! DATA(insert ( 2146 n 0 - numeric_smaller - 1754 1700 0 _null_ ));
! DATA(insert ( 2051 n 0 - array_smaller - 1072 2277 0 _null_ ));
! DATA(insert ( 2245 n 0 - bpchar_smaller - 1058 1042 0 _null_ ));
! DATA(insert ( 2798 n 0 - tidsmaller - 2799 27 0 _null_ ));
! DATA(insert ( 3527 n 0 - enum_smaller - 3518 3500 0 _null_ ));

  /* count */
! DATA(insert ( 2147 n 0 - int8inc_any - 0 20 0 "0" ));
! DATA(insert ( 2803 n 0 - int8inc - 0 20 0 "0" ));

  /* var_pop */
! DATA(insert ( 2718 n 0 - int8_accum numeric_var_pop 0 2281 128 _null_ ));
! DATA(insert ( 2719 n 0 - int4_accum numeric_var_pop 0 2281 128 _null_ ));
! DATA(insert ( 2720 n 0 - int2_accum numeric_var_pop 0 2281 128 _null_ ));
! DATA(insert ( 2721 n 0 - float4_accum float8_var_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2722 n 0 - float8_accum float8_var_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2723 n 0 - numeric_accum numeric_var_pop 0 2281 128 _null_ ));

  /* var_samp */
! DATA(insert ( 2641 n 0 - int8_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2642 n 0 - int4_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2643 n 0 - int2_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2644 n 0 - float4_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2645 n 0 - float8_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2646 n 0 - numeric_accum numeric_var_samp 0 2281 128 _null_ ));

  /* variance: historical Postgres syntax for var_samp */
! DATA(insert ( 2148 n 0 - int8_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2149 n 0 - int4_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2150 n 0 - int2_accum numeric_var_samp 0 2281 128 _null_ ));
! DATA(insert ( 2151 n 0 - float4_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2152 n 0 - float8_accum float8_var_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2153 n 0 - numeric_accum numeric_var_samp 0 2281 128 _null_ ));

  /* stddev_pop */
! DATA(insert ( 2724 n 0 - int8_accum numeric_stddev_pop 0 2281 128 _null_ ));
! DATA(insert ( 2725 n 0 - int4_accum numeric_stddev_pop 0 2281 128 _null_ ));
! DATA(insert ( 2726 n 0 - int2_accum numeric_stddev_pop 0 2281 128 _null_ ));
! DATA(insert ( 2727 n 0 - float4_accum float8_stddev_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2728 n 0 - float8_accum float8_stddev_pop 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2729 n 0 - numeric_accum numeric_stddev_pop 0 2281 128 _null_ ));

  /* stddev_samp */
! DATA(insert ( 2712 n 0 - int8_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2713 n 0 - int4_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2714 n 0 - int2_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2715 n 0 - float4_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2716 n 0 - float8_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2717 n 0 - numeric_accum numeric_stddev_samp 0 2281 128 _null_ ));

  /* stddev: historical Postgres syntax for stddev_samp */
! DATA(insert ( 2154 n 0 - int8_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2155 n 0 - int4_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2156 n 0 - int2_accum numeric_stddev_samp 0 2281 128 _null_ ));
! DATA(insert ( 2157 n 0 - float4_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2158 n 0 - float8_accum float8_stddev_samp 0 1022 0 "{0,0,0}" ));
! DATA(insert ( 2159 n 0 - numeric_accum numeric_stddev_samp 0 2281 128 _null_ ));

  /* SQL2003 binary regression aggregates */
! DATA(insert ( 2818 n 0 - int8inc_float8_float8 - 0 20 0 "0" ));
! DATA(insert ( 2819 n 0 - float8_regr_accum float8_regr_sxx 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2820 n 0 - float8_regr_accum float8_regr_syy 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2821 n 0 - float8_regr_accum float8_regr_sxy 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2822 n 0 - float8_regr_accum float8_regr_avgx 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2823 n 0 - float8_regr_accum float8_regr_avgy 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2824 n 0 - float8_regr_accum float8_regr_r2 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2825 n 0 - float8_regr_accum float8_regr_slope 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2826 n 0 - float8_regr_accum float8_regr_intercept 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2827 n 0 - float8_regr_accum float8_covar_pop 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2828 n 0 - float8_regr_accum float8_covar_samp 0 1022 0 "{0,0,0,0,0,0}" ));
! DATA(insert ( 2829 n 0 - float8_regr_accum float8_corr 0 1022 0 "{0,0,0,0,0,0}" ));

  /* boolean-and and boolean-or */
! DATA(insert ( 2517 n 0 - booland_statefunc - 58 16 0 _null_ ));
! DATA(insert ( 2518 n 0 - boolor_statefunc - 59 16 0 _null_ ));
! DATA(insert ( 2519 n 0 - booland_statefunc - 58 16 0 _null_ ));

  /* bitwise integer */
! DATA(insert ( 2236 n 0 - int2and - 0 21 0 _null_ ));
! DATA(insert ( 2237 n 0 - int2or - 0 21 0 _null_ ));
! DATA(insert ( 2238 n 0 - int4and - 0 23 0 _null_ ));
! DATA(insert ( 2239 n 0 - int4or - 0 23 0 _null_ ));
! DATA(insert ( 2240 n 0 - int8and - 0 20 0 _null_ ));
! DATA(insert ( 2241 n 0 - int8or - 0 20 0 _null_ ));
! DATA(insert ( 2242 n 0 - bitand - 0 1560 0 _null_ ));
! DATA(insert ( 2243 n 0 - bitor - 0 1560 0 _null_ ));

  /* xml */
! DATA(insert ( 2901 n 0 - xmlconcat2 - 0 142 0 _null_ ));

  /* array */
! DATA(insert ( 2335 n 0 - array_agg_transfn array_agg_finalfn 0 2281 0 _null_ ));

  /* text */
! DATA(insert ( 3538 n 0 - string_agg_transfn string_agg_finalfn 0 2281 0 _null_ ));

  /* bytea */
! DATA(insert ( 3545 n 0 - bytea_string_agg_transfn bytea_string_agg_finalfn 0 2281 0 _null_ ));

  /* json */
! DATA(insert ( 3175 n 0 - json_agg_transfn json_agg_finalfn 0 2281 0 _null_ ));
! DATA(insert ( 3197 n 0 - json_object_agg_transfn json_object_agg_finalfn 0 2281 0 _null_ ));

  /* ordered-set and hypothetical-set aggregates */
! DATA(insert ( 3972 o 1 - ordered_set_transition percentile_disc_final 0 2281 0 _null_ ));
! DATA(insert ( 3974 o 1 - ordered_set_transition percentile_cont_float8_final 0 2281 0 _null_ ));
! DATA(insert ( 3976 o 1 - ordered_set_transition percentile_cont_interval_final 0 2281 0 _null_ ));
! DATA(insert ( 3978 o 1 - ordered_set_transition percentile_disc_multi_final 0 2281 0 _null_ ));
! DATA(insert ( 3980 o 1 - ordered_set_transition percentile_cont_float8_multi_final 0 2281 0 _null_ ));
! DATA(insert ( 3982 o 1 - ordered_set_transition percentile_cont_interval_multi_final 0 2281 0 _null_ ));
! DATA(insert ( 3984 o 0 - ordered_set_transition mode_final 0 2281 0 _null_ ));
! DATA(insert ( 3986 h 1 - ordered_set_transition_multi rank_final 0 2281 0 _null_ ));
! DATA(insert ( 3988 h 1 - ordered_set_transition_multi percent_rank_final 0 2281 0 _null_ ));
! DATA(insert ( 3990 h 1 - ordered_set_transition_multi cume_dist_final 0 2281 0 _null_ ));
! DATA(insert ( 3992 h 1 - ordered_set_transition_multi dense_rank_final 0 2281 0 _null_ ));

  /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 7b9c587..0bcb76d 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 1839 ( numeric_stddev
*** 2445,2451 ****
  DESCR("aggregate final function");
  DATA(insert OID = 1840 ( int2_sum PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 20 "20 21" _null_ _null_ _null_ _null_ int2_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
! DATA(insert OID = 1841 ( int4_sum PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 20 "20 23" _null_ _null_ _null_ _null_ int4_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
  DATA(insert OID = 1842 ( int8_sum PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 1700 "1700 20" _null_ _null_ _null_ _null_ int8_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
--- 2445,2451 ----
  DESCR("aggregate final function");
  DATA(insert OID = 1840 ( int2_sum PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 20 "20 21" _null_ _null_ _null_ _null_ int2_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
! DATA(insert OID = 1841 ( int4_sum PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 20 "20 23" _null_ _null_ _null_ _null_ int4_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
  DATA(insert OID = 1842 ( int8_sum PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 1700 "1700 20" _null_ _null_ _null_ _null_ int8_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
On Apr 10, 2014, at 21:34 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 10 April 2014 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So if we go with that terminology, perhaps these names for the
>> new CREATE AGGREGATE parameters:
>>
>>   initfunc                  applies to plain aggregation, mutually exclusive with initcond
>>   msfunc (or just mfunc?)   forward transition for moving-agg mode
>>   mifunc                    inverse transition for moving-agg mode
>>   mstype                    state datatype for moving-agg mode
>>   msspace                   space estimate for mstype
>>   mfinalfunc                final function for moving-agg mode
>>   minitfunc                 "firsttrans" for moving-agg mode
>>   minitcond                 mutually exclusive with minitfunc
>
> I think I prefer "mfunc" to "msfunc", but perhaps that's just my
> natural aversion to the "ms" prefix :-)
>
> Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself
> could mean "initial".

I still think you're getting ahead of yourselves here. The number of
aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
for the others, the state type is already pass-by-ref.

I don't think we should be adding that much additional machinery until
it has been conclusively demonstrated that the performance regression
cannot be fixed any other way. Which, quite frankly, I don't believe.
Nothing in int4_avg_accum looks particularly expensive, and the things
that *do* seem to cost something certainly can be made cheaper. c.f.
the patch I just posted.

Another reason I'm so opposed to this is that inverse transition
functions might not be the last kind of transition functions we ever
add. For example, if we ever get ROLLUP/CUBE, we might want to have a
mergefunc which takes two aggregation states and combines them into one
(see the sketch below). What do we do if we add those? Add yet another
set of "mergeable" transition functions? What about the various
combinations of invertible/non-invertible, mergeable/non-mergeable that
could result? The opportunity cost seems pretty high here...

best regards,
Florian Pflug
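To make the mergefunc idea concrete, here is a minimal SQL-level sketch,
assuming a simple sum kept in an int8 state. The function is entirely
hypothetical - no such support-function slot exists in the patch under
discussion:

    -- Hypothetical merge function: given two partial aggregation
    -- states, produce the state that would have resulted from
    -- aggregating both inputs together. For a plain sum, merging
    -- two partial sums is just addition.
    CREATE FUNCTION my_sum_merge(int8, int8) RETURNS int8
        AS 'SELECT $1 + $2' LANGUAGE SQL STRICT;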
Florian Pflug <fgp@phlo.org> writes:
> I still think you're getting ahead of yourselves here. The number of
> aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
> BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
> for the others, the state type is already pass-by-ref.

That argument is reasonable for the initfunc idea, but it doesn't apply
otherwise.

> Another reason I'm so opposed to this is that inverse transition
> functions might not be the last kind of transition functions we ever
> add. For example, if we ever get ROLLUP/CUBE, we might want to have
> a mergefunc which takes two aggregation states and combines them
> into one. What do we do if we add those?

Make more pg_aggregate columns. What exactly do you think is either
easier or harder about such cases? Also, maybe I'm misremembering
the spec, but ROLLUP/CUBE wouldn't apply to window functions, would
they? So if your argument is based on the assumption that these
features need to combine, I'm not sure it's true.

The bottom line for me is that it seems conceptually far cleaner to make
the moving-aggregate support be independent than to insist that it share
an stype and sfunc with the plain case. Furthermore, I do not buy the
argument that if we hack hard enough, we can make the performance cost
of forcing the sfunc to do double duty negligible. In the first place,
that argument is unsupported by much evidence, and in the second place,
it will certainly cost us complexity to make the performance issue go
away. Instead we can just design the problem out, for nothing that I
see as a serious drawback.

			regards, tom lane
On Apr 11, 2014, at 00:07 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> I still think you're getting ahead of yourselves here. The number of
>> aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
>> BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
>> for the others, the state type is already pass-by-ref.
>
> That argument is reasonable for the initfunc idea, but it doesn't apply
> otherwise.

Why not? AFAICS, the increase in cost comes from going from a by-value
to a by-reference state type. Once you're using a by-reference type, you
already pay the overhead of the additional dereferences, and for calling
AggCheckCallContext() or some equivalent.

>> Another reason I'm so opposed to this is that inverse transition
>> functions might not be the last kind of transition functions we ever
>> add. For example, if we ever get ROLLUP/CUBE, we might want to have
>> a mergefunc which takes two aggregation states and combines them
>> into one. What do we do if we add those?
>
> Make more pg_aggregate columns. What exactly do you think is either
> easier or harder about such cases? Also, maybe I'm misremembering
> the spec, but ROLLUP/CUBE wouldn't apply to window functions, would
> they? So if your argument is based on the assumption that these
> features need to combine, I'm not sure it's true.

Well, it was just an example - there might be other future extensions
which *do* need to combine. And we've been known to go beyond the spec
sometimes...

> Furthermore, I do not buy the argument that if we hack hard enough,
> we can make the performance cost of forcing the sfunc to do double duty
> negligible. In the first place, that argument is unsupported by much
> evidence, and in the second place, it will certainly cost us complexity
> to make the performance issue go away. Instead we can just design the
> problem out, for nothing that I see as a serious drawback.

My argument is that it costs us more complexity to duplicate everything
for the invertible case, *and* the result seems less flexible - not
from the POV of aggregate implementations, but from the POV of future
extensions.

As for evidence - have you looked at the patch I posted? I'd be very
interested to know if it removes the performance differences you saw.

best regards,
Florian Pflug
Florian Pflug <fgp@phlo.org> writes:
> My argument is that it costs us more complexity to duplicate everything
> for the invertible case, *and* the result seems less flexible - not
> from the POV of aggregate implementations, but from the POV of future
> extensions.

[ shrug... ] You can argue against any feature whatsoever by claiming
that it might possibly conflict with something we would wish to do
sometime in the future. I think you need to have a much more concrete
argument about specific issues this will cause in order to be
convincing. For all we know about ROLLUP/CUBE implementation issues
right now, doing this feature with separate implementations might make
that easier, not harder. (I note that the crux of my complaint right
now is that we're asking sfuncs to serve two masters --- how's it going
to be better when they have to serve three or four?)

> As for evidence - have you looked at the patch I posted? I'd be very
> interested to know if it removes the performance differences you saw.

(1) You can't really prove the absence of a performance issue by showing
that one specific aggregate doesn't have an issue. (2) These results
(mine as well as yours) seem mighty platform-dependent, so the fact you
don't see an issue doesn't mean someone else won't. (3) A new
FunctionCallInfoData field just for this? Surely not. There's got to be
a distributed cost to that (although I notice you didn't bother
initializing the field in most places, which is cheating).

Now point 3 could be addressed by doing the signaling in some other way
with the existing context field. But it's still the case that you're
trying to band-aid a bad design. There's no good reason to make the
sfunc do extra work to be invertible in contexts where we know, with
certainty, that that work is useless. Especially not when we know that
even a few instructions of extra work can be performance-significant.

			regards, tom lane
On Apr 11, 2014, at 01:30 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> As for evidence - have you looked at the patch I posted? I'd be very
>> interested to know if it removes the performance differences you saw.
>
> (1) You can't really prove the absence of a performance issue by showing
> that one specific aggregate doesn't have an issue.

I'm claiming that SUM(int4) is about as simple as it gets, so if the
effect can be mitigated there, it can be mitigated everywhere. The more
complex a forward-only transition function is, the less will an added
if or two hurt.

> (2) These results
> (mine as well as yours) seem mighty platform-dependent, so the fact you
> don't see an issue doesn't mean someone else won't.

Yeah, though *any* change - mine, the one you propose, and any other -
has the potential to hurt some platform due to weird interactions (say,
cache thrashing).

> (3) A new
> FunctionCallInfoData field just for this? Surely not. There's got to be
> a distributed cost to that (although I notice you didn't bother
> initializing the field in most places, which is cheating).

I think the field doesn't actually increase the size of the structure
at all - at least not if the bool before it has size 1 and the short
following it is 2-byte aligned. Or at least that was why I made it a
char, and added it at the place I did. But I might be missing
something...

Also, InitFunctionCallInfoData *does* initialize the flags to zero.
Though maybe not everybody uses that - I didn't check, this was just a
quick hack.

> Now point 3 could be addressed by doing the signaling in some other way
> with the existing context field. But it's still the case that you're
> trying to band-aid a bad design. There's no good reason to make the sfunc
> do extra work to be invertible in contexts where we know, with certainty,
> that that work is useless.

This is the principal point where we disagree, I think. From my POV,
the problem isn't invertibility here at all. Heck, SUM(int4) wouldn't
need *any* extra state at all to be invertible, if it weren't for those
pesky issues surrounding NULL handling. In fact, an earlier version of
the invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions!
The only reason this doesn't work nowadays is that Dean didn't like the
forward-nonstrict-but-inverse-strict special case that made this work.

The way I see it, the main problem is the drop in performance that comes
from using a pass-by-ref state type. Which IMHO happens because this
*already* is a heavily band-aided design - all the state validation and
"if (AggCheckCallContext(,NULL))" stuff really works around the fact
that transition functions *aren't* supposed to be user-called, yet they
are, and must deal.

Which is why I feel that having two separate sets of transition
functions and state types solves the wrong problem. If we find a way to
prevent transition functions from being called directly, we'll shave a
few cycles off quite a few existing aggregates, invertible or not. If
we find a way around the initfunc-for-internal-statetype problem you
discovered, the implementation would get much simpler, since we could
then make nearly all of them strict. And this would again shave off a
few cycles - for lots of NULL inputs, the effect could even be large.

Compared to that, the benefit of having a completely separate set of
transition functions and state types for the invertible case is much
less tangible, IMHO.
> Especially not when we know that even a few instructions of extra work
> can be performance-significant.

But where do we draw that line? nodeWindowAgg.c quite certainly wastes
about as many cycles as int4_avg_accum does on various checks that are
unnecessary in the non-sliding-window case. Do we duplicate those
functions too?

best regards,
Florian Pflug
On 10 April 2014 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I was imagining that firsttrans would only be passed the first value
>> to be aggregated, not any previous state, and that it would be illegal
>> to specify both an initcond and a firsttrans function.
>
>> The forward transition function would only be called for values after
>> the first, by which point the state would be non-null, and so it could
>> be made strict in most cases. The same would apply to the invertible
>> transition functions, so they wouldn't have to do null counting, which
>> in turn would make their state types simpler.
>
> I put together a very fast proof-of-concept patch for this (attached).
> It has a valid execution path for an aggregate with initfunc, but I didn't
> bother writing the CREATE AGGREGATE support yet. I made sum(int4) work
> as you suggest, marking the transfn strict and ripping out int4_sum's
> internal support for null inputs. The result seems to be about a 4% or so
> improvement in the overall aggregation speed, for a simple "SELECT
> sum(int4col) FROM table" query. So from a performance standpoint this
> seems only marginally worth doing. The real problem is not that 4% isn't
> worth the trouble, it's that AFAICS the only built-in aggregates that
> can benefit are sum(int2) and sum(int4). So that looks like a rather
> narrow use-case.
>
> You had suggested upthread that we could use this idea to make the
> transition functions strict for aggregates using "internal" transition
> datatypes, but that does not work because the initfunc would violate
> the safety rule that a function returning internal must take at least
> one internal-type argument. That puts a pretty strong damper on the
> usefulness of the approach, given how many internal-transtype aggregates
> we have (and the moving-aggregate case is not going to be better, is it?)

Ah, that's disappointing. If it can't be made to work for aggregates
with internal state types, then I think it loses most of its value, and
I don't think it will be of much more use in the moving aggregate case
either.

> So at this point I'm feeling unexcited about the initfunc idea.
> Unless it does something really good for the moving-aggregate case,
> I think we should drop it.

Agreed. Thanks for prototyping it though.

Regards,
Dean
Florian Pflug <fgp@phlo.org> writes:
> Which is why I feel that having two separate sets of transition functions
> and state types solves the wrong problem. If we find a way to prevent
> transition functions from being called directly, we'll shave a few cycles
> off quite a few existing aggregates, invertible or not. If we find a way
> around the initfunc-for-internal-statetype problem you discovered, the
> implementation would get much simpler, since we could then make nearly
> all of them strict. And this would again shave off a few cycles - for lots
> of NULL inputs, the effect could even be large.

Since neither of those latter things seems likely to happen, I don't
find this argument convincing at all. Even if they did happen, they
would represent an incremental improvement that would be equally useful
with either one or two sfuncs.

Basically, this comes down to a design judgment call, and my judgment
is differing from yours. In the absence of opinions from others,
I'm going to do it my way.

However, quite independently of how many sfuncs there are, I'm
interested about this:

> ... SUM(int4) wouldn't need
> *any* extra state at all to be invertible, if it weren't for those pesky
> issues surrounding NULL handling. In fact, an earlier version of the
> invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
> only reason this doesn't work nowadays is that Dean didn't like the
> forward-nonstrict-but-inverse-strict special case that made this work.

Tell me about the special case here again? Should we revisit the issue?

			regards, tom lane
On Apr 11, 2014, at 17:09 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Basically, this comes down to a design judgment call, and my judgment
> is differing from yours. In the absence of opinions from others,
> I'm going to do it my way.

Ok. Are you going to do the necessary changes, or shall I? I'm happy to
leave it to you, but if you lack the time I can try to find some.

>> ... SUM(int4) wouldn't need
>> *any* extra state at all to be invertible, if it weren't for those pesky
>> issues surrounding NULL handling. In fact, an earlier version of the
>> invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
>> only reason this doesn't work nowadays is that Dean didn't like the
>> forward-nonstrict-but-inverse-strict special case that made this work.
>
> Tell me about the special case here again? Should we revisit the issue?

My original coding allowed the combination of a non-strict forward with
a strict reverse transition function. The combination behaved like a
strict aggregate regarding NULL handling - i.e., neither the forward nor
the reverse transition function received NULL inputs. But if the initial
state was NULL, the forward transition function *would* be called with
that NULL state value upon seeing the first non-NULL input. In the
window case, the aggregation machinery would also take care to reset the
state to its initial value when it removed the last non-NULL input from
the aggregation state (just like it does for strict aggregates today).

This had two advantages:

1) It allows strict aggregates to use state type internal. As it stands
   now, aggregates with state type internal must implement their own
   NULL handling, even if they behave exactly like most standard
   aggregates do, namely ignore NULLs and return NULL only if there
   were no non-NULL inputs.

2) It allows strict aggregates whose state type and input type aren't
   binary coercible to return NULL if all inputs were NULL without any
   special coding. As it stands today, this doesn't work without some
   kind of counter in the state, because the final function otherwise
   won't know if there were non-NULL inputs or not. Aggregates whose
   state and input types are binary coercible get around that by
   setting the initial value to NULL while *still* being strict, and
   relying on the state replacement behaviour of the aggregate
   machinery.

It, however, also has a few disadvantages:

3) It means that one needs to look at the inverse transition function's
   strictness setting even if that function is never used.

4) It feels a bit hacky.

(2) is what affects SUM(int4). The only reason it tracks the number of
inputs is to be able to return NULL instead of 0 if no inputs remain
(a SQL-level sketch of that counter approach follows below).

One idea to fix (3) and (4) was *explicitly* flagging aggregates as
NULL-handling or NULL-ignoring, and also as what one might call "weakly
strict", i.e. as returning NULL exactly if there were no non-NULL
inputs. There are various variations of that theme possible - one flag
for each behaviour, or simply a single "common behaviour" flag.

In the end, I decided not to pursue that, mostly because the aggregates
affected by that issue turned out to be relatively easy to fix. For the
ones with state type internal, I simply added a counter, and I made
SUM(int4) use AVG's transition function.

I don't feel too strongly either way on this one - I initially
implemented the exception because I noticed that the NULL handling of
some aggregates was broken otherwise, and it seemed simpler to fix this
in one place than going over all the aggregates separately.
OTOH, when I wrote the docs, I noticed how hard it was to describe the
behaviour accurately, which made me like it less and less. And Dean
wasn't happy with it at all, so that finally settled it.

best regards,
Florian Pflug
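To make point (2) above concrete, here is a minimal SQL-level sketch of
the counter approach - roughly analogous to what AVG's transition
function keeps internally, though all names here are invented and the
real implementations are in C:

    -- State carries a running sum plus a count of non-NULL inputs.
    CREATE TYPE my_sum_state AS (s int8, n int8);

    -- Forward transition: fold one value into the state.
    CREATE FUNCTION my_sum_add(my_sum_state, int4) RETURNS my_sum_state
        AS 'SELECT ROW($1.s + $2, $1.n + 1)::my_sum_state'
        LANGUAGE SQL STRICT;

    -- Inverse transition: take one value back out of the state.
    CREATE FUNCTION my_sum_inv(my_sum_state, int4) RETURNS my_sum_state
        AS 'SELECT ROW($1.s - $2, $1.n - 1)::my_sum_state'
        LANGUAGE SQL STRICT;

    -- Only the count lets the final function distinguish "no non-NULL
    -- inputs remain" (result NULL) from "the remaining values sum to 0".
    CREATE FUNCTION my_sum_final(my_sum_state) RETURNS int8
        AS 'SELECT CASE WHEN $1.n > 0 THEN $1.s END'
        LANGUAGE SQL STRICT;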
Florian Pflug <fgp@phlo.org> writes:
> On Apr 11, 2014, at 17:09 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Basically, this comes down to a design judgment call, and my judgment
>> is differing from yours. In the absence of opinions from others,
>> I'm going to do it my way.

> Ok. Are you going to do the necessary changes, or shall I? I'm happy to
> leave it to you, but if you lack the time I can try to find some.

Nah, I'm happy to do the work, since it's me insisting on changing it.

>> Tell me about the special case here again? Should we revisit the issue?

> ...
> I don't feel too strongly either way on this one - I initially implemented the
> exception because I noticed that the NULL handling of some aggregates was
> broken otherwise, and it seemed simpler to fix this in one place than going
> over all the aggregates separately. OTOH, when I wrote the docs, I noticed
> how hard it was to describe the behaviour accurately, which made me like it
> less and less. And Dean wasn't happy with it at all, so that finally settled it.

Yeah, if you can't document the design nicely, it's probably not a good
idea :-(. Thanks for the summary.

It strikes me that your second point

> 2) It allows strict aggregates whose state type and input type aren't
> binary coercible to return NULL if all inputs were NULL without any
> special coding. As it stands today, this doesn't work without some
> kind of counter in the state, because the final function otherwise
> won't know if there were non-NULL inputs or not. Aggregates whose state
> and input types are binary coercible get around that by setting the
> initial value to NULL while *still* being strict, and relying on the
> state replacement behaviour of the aggregate machinery.

could be addressed by the initfunc idea, but I'm still not sufficiently
excited by that one. Given the problem with internal-type transition
values, I think this could only win if there are cases where it would let
us use a regular SQL type instead of internal for the transition value;
and I'm not sure that there are many/any such cases.

			regards, tom lane
On Apr 11, 2014, at 19:04 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It strikes me that your second point
>
>> 2) It allows strict aggregates whose state type and input type aren't
>> binary coercible to return NULL if all inputs were NULL without any
>> special coding. As it stands today, this doesn't work without some
>> kind of counter in the state, because the final function otherwise
>> won't know if there were non-NULL inputs or not. Aggregates whose state
>> and input types are binary coercible get around that by setting the
>> initial value to NULL while *still* being strict, and relying on the
>> state replacement behaviour of the aggregate machinery.
>
> could be addressed by the initfunc idea, but I'm still not sufficiently
> excited by that one. Given the problem with internal-type transition
> values, I think this could only win if there are cases where it would let
> us use a regular SQL type instead of internal for the transition value;
> and I'm not sure that there are many/any such cases.

Yes, the idea had come up at some point during the review discussion. I
agree that it's only worthwhile if it works for state type internal -
though I think there ought to be a way to allow that. We could for
example simply decree that the initfunc's first parameter must be of
type internal if its return type is, and then just pass NULL for that
parameter.

What I like about the initfunc idea is that it also naturally extends
to ordered-set aggregates; I think it'd be very useful for some possible
ordered-set aggregates to receive their direct arguments beforehand and
not afterwards. But that all seems largely orthogonal to the invtrans
patch.

best regards,
Florian Pflug
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So if we go with that terminology, perhaps these names for the
>> new CREATE AGGREGATE parameters:
>>
>>   initfunc                  applies to plain aggregation, mutually exclusive with initcond
>>   msfunc (or just mfunc?)   forward transition for moving-agg mode
>>   mifunc                    inverse transition for moving-agg mode
>>   mstype                    state datatype for moving-agg mode
>>   msspace                   space estimate for mstype
>>   mfinalfunc                final function for moving-agg mode
>>   minitfunc                 "firsttrans" for moving-agg mode
>>   minitcond                 mutually exclusive with minitfunc

> Yeah, those work for me.

> I think I prefer "mfunc" to "msfunc", but perhaps that's just my
> natural aversion to the "ms" prefix :-)

Meh. We've got mstype, and I don't think leaving out the "s" there
feels right.

> Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself
> could mean "initial".

Good point. So with initfuncs out of the picture, we have new CREATE
AGGREGATE parameter names

    msfunc        forward transition for moving-agg mode
    minvfunc      inverse transition for moving-agg mode
    mfinalfunc    final function for moving-agg mode
    mstype        state datatype for moving-agg mode
    msspace       space estimate for mstype
    minitcond     initial state value for moving-agg mode

and new pg_aggregate columns

    aggmtransfn     | regproc | not null
    aggminvtransfn  | regproc | not null
    aggmfinalfn     | regproc | not null
    aggmtranstype   | oid     | not null
    aggmtransspace  | integer | not null
    aggminitval     | text    |

It's a bit unfortunate that the catalog column names aren't quite on
the same page as CREATE AGGREGATE, but it doesn't seem like a good idea
to try to fix that now.

			regards, tom lane
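Putting the pieces together, a complete definition under the parameter
names settled on above might look like the following sketch. The
my_int4_add/my_int4_sub helpers are invented stand-ins for illustration;
the built-in sum(int4) uses C-level support functions instead:

    -- Forward and inverse transitions over an int8 state, so removed
    -- inputs can simply be subtracted back out.
    CREATE FUNCTION my_int4_add(int8, int4) RETURNS int8
        AS 'SELECT $1 + $2' LANGUAGE SQL STRICT;
    CREATE FUNCTION my_int4_sub(int8, int4) RETURNS int8
        AS 'SELECT $1 - $2' LANGUAGE SQL STRICT;

    CREATE AGGREGATE my_sum(int4) (
        sfunc     = my_int4_add,   -- plain-aggregation transition
        stype     = int8,
        initcond  = '0',
        msfunc    = my_int4_add,   -- forward transition, moving-agg mode
        minvfunc  = my_int4_sub,   -- inverse transition, moving-agg mode
        mstype    = int8,
        minitcond = '0'
    );

(Unlike the built-in sum(), this sketch returns 0 rather than NULL over
an empty or all-NULL frame, which keeps it free of the NULL-counting
discussed upthread.)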
Florian Pflug <fgp@phlo.org> writes:
> Yes, the idea had come up at some point during the review discussion. I
> agree that it's only worthwhile if it works for state type internal - though
> I think there ought to be a way to allow that. We could for example simply
> decree that the initfunc's first parameter must be of type internal if its
> return type is, and then just pass NULL for that parameter.

I had thought about that, but it doesn't really work since it'd be
violating the strictness spec of the function.

			regards, tom lane
On Apr 11, 2014, at 19:42 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> Yes, the idea had come up at some point during the review discussion. I
>> agree that it's only worthwhile if it works for state type internal - though
>> I think there ought to be a way to allow that. We could for example simply
>> decree that the initfunc's first parameter must be of type internal if its
>> return type is, and then just pass NULL for that parameter.
>
> I had thought about that, but it doesn't really work since it'd be
> violating the strictness spec of the function.

Only if we insist on passing SQL NULL, not if we passed a non-NULL
value that happens to be (char*)0. Or we could simply require the
initfunc to be non-strict in the case of state type internal.

best regards,
Florian Pflug
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> OK, I'm marking this ready for committer attention, on the
> understanding that that doesn't include the invtrans_minmax patch.

I've finished reviewing/revising/committing this submission.
Some notes:

* I left out the EXPLAIN ANALYZE statistics, as I still feel that
they're of little if any use to regular users, and neither very well
defined nor well documented. We can revisit that later of course.

* I also left out the table documenting which aggregates have this
optimization. That's not the kind of thing we ordinarily document,
and it seems inevitable to me that such a table would be noteworthy
mostly for wrong/incomplete/obsolete information in the future.

* I rejected the invtrans_collecting sub-patch altogether. I didn't
like anything about the idea of adding a poorly-documented field to
ArrayBuildState and then teaching some random subset of the functions
using that struct what to do with it. I think it would've been simpler,
more reliable, and not that much less efficient to just do memmove's in
shiftArrayResult. The string-aggregate changes were at least more
localized, but they still seemed awfully messy. And there's a bigger
issue: these aggregates have to do O(N) work per row for a frame of
length N anyway, so it's not clear to me that there's any big-O gain to
be had from making them into moving aggregates. I doubt people are
going to be using these aggregates with very wide frames, just because
they're going to be horribly expensive no matter what we do.

Anyway, this is nice forward progress for 9.4, even if we get no
further.

			regards, tom lane
On Apr 13, 2014, at 20:23 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> OK, I'm marking this ready for committer attention, on the
>> understanding that that doesn't include the invtrans_minmax patch.
>
> I've finished reviewing/revising/committing this submission.

Cool! Thanks!

> Some notes:
>
> * I left out the EXPLAIN ANALYZE statistics, as I still feel that they're
> of little if any use to regular users, and neither very well defined nor
> well documented. We can revisit that later of course.
>
> * I also left out the table documenting which aggregates have this
> optimization. That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I can live with leaving each of these out individually, but leaving
them both out means there's now no way of determining how a particular
sliding window aggregation will behave. That leaves our users a bit out
in the cold, IMHO.

For example, with this patch you'd usually want to write
SUM(numeric_value::numeric(n,m)) instead of
SUM(numeric_value)::numeric(n,m) in a sliding window aggregation, and
there should be *some* way for users to figure this out (see the
example below).

We have exhaustive tables of all the functions, operators and
aggregates we support, and maintenance of those seems to work well for
the most part. Why would a table of invertible aggregates be any
different? If it's the non-locality that bothers you - I guess the same
information could be added to the descriptions of the individual
aggregates, instead of having one big table.

> * I rejected the invtrans_collecting sub-patch altogether. I didn't
> like anything about the idea of adding a poorly-documented field to
> ArrayBuildState and then teaching some random subset of the functions
> using that struct what to do with it. I think it would've been simpler,
> more reliable, and not that much less efficient to just do memmove's in
> shiftArrayResult. The string-aggregate changes were at least more
> localized, but they still seemed awfully messy. And there's a bigger
> issue: these aggregates have to do O(N) work per row for a frame of length
> N anyway, so it's not clear to me that there's any big-O gain to be had
> from making them into moving aggregates.

Yeah, those probably aren't super-useful. I initially added array_agg
because with the nonstrict-forward/strict-reverse rule still in place,
there wouldn't otherwise have been an in-core aggregate which exercises
the non-strict case, which seemed like a bad idea. For the string case -
I didn't expect that to turn out to be *quite* this messy when I started
implementing it.

> Anyway, this is nice forward progress for 9.4, even if we get no further.

Yup! Thanks to everybody who made this happen!

best regards,
Florian Pflug
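For anyone following that SUM(numeric) advice, the difference looks
like this in practice (table and column names are invented for
illustration):

    -- Cast the inputs: the moving aggregate then sees values of one
    -- fixed scale, so removing rows from the frame cannot change the
    -- result's scale.
    SELECT ts,
           SUM(amount::numeric(10,2)) OVER w AS running_total
    FROM payments
    WINDOW w AS (ORDER BY ts ROWS BETWEEN 99 PRECEDING AND CURRENT ROW);

    -- Cast the result: the aggregate itself still works on the raw,
    -- possibly mixed-scale values.
    SELECT ts,
           (SUM(amount) OVER w)::numeric(10,2) AS running_total
    FROM payments
    WINDOW w AS (ORDER BY ts ROWS BETWEEN 99 PRECEDING AND CURRENT ROW);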
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 14 April 2014 06:23
> To: Dean Rasheed
> Cc: Florian Pflug; David Rowley; Kevin Grittner; Josh Berkus; Greg Stark;
> PostgreSQL-development
> Subject: Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions
> (WIP)
>
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > OK, I'm marking this ready for committer attention, on the
> > understanding that that doesn't include the invtrans_minmax patch.
>
> I've finished reviewing/revising/committing this submission.

That's great news!

Tom, thanks for taking the time to make all the modifications and
commit this.

Dean, thank you for all your hard work reviewing the patch. The detail
of the review was quite impressive.

I'm really happy to see this make it in for 9.4.

Regards

David Rowley
On Sun, Apr 13, 2014 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * I also left out the table documenting which aggregates have this
> optimization. That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I tend to think that not documenting such things is an error. Sure,
the documentation could become obsolete, but I don't see why it's
particularly more likely with this table than anywhere else, and if it
does happen, and people care, they'll submit patches to fix it.

More to the point, when we don't document things like this, it doesn't
cause the knowledge not to be important to end-users; it just means
that the knowledge lives on wiki pages, support fora, and the minds of
people who are "in the know" rather than being easily and generally
accessible. I'd rather have documentation on this topic that was, say,
80% correct than have none at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company