Thread: [PATCH] Negative Transition Aggregate Functions (WIP)

[PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
I've been working on speeding up aggregate functions when used in the context of windows with non-fixed frame heads.

Currently, if the top of the window frame is not in a fixed position, then each time the frame moves down the aggregates must be recalculated by looping over every record in the frame's new scope.

Take the following as an example:

SELECT SUM(n::int) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,20000) g(n);

Here the frame head moves along with the current row, and the frame always finishes with the last row in the partition (or the last row in the result set if no partitioning exists).
Currently PostgreSQL must recalculate the aggregate each time, which means looping from the current row to the end of the frame for each row produced. So the first row needs 20000 iterations, the 2nd row 19999, the 3rd row 19998, and so on; that's n(n+1)/2 iterations in total, or around 200 million for 20,000 rows.

The above query on the unpatched head, on my laptop, takes 36676 ms. With the attached patch it takes 73 ms, a speed increase of about 500 times for 20,000 rows. Of course this performance gap will increase with more rows and decrease with fewer rows, but there's a measurable improvement with as few as 50 rows.


The attached patch is not quite finished yet, I've not yet fully covered SUM and AVG for all types. I just need to look into numeric a bit more before I figure that one out.
Here is a list of things still todo:

1. Fully implement negative transition functions for SUM and AVG.
2. CREATE AGGREGATE support for user defined aggregates.
3. Documentation.
4. Look to see which other aggregates, apart from COUNT, SUM and AVG, can make use of this.

Please note that if the aggregate function does not have a negative transition function set up, e.g. MAX and MIN, then the old behaviour takes place.

One thing that is currently on my mind is what to do when passing volatile functions to the aggregate. Since the number of times we execute a volatile function will depend greatly on the window frame options, I think we should include some sort of warning in the documentation that "the number of times that the expression is evaluated within the aggregate function, when used in the context of a WINDOW, is undefined". The other option would be to disable this optimisation if the aggregate expression contained a volatile function, but doing so seems a bit weird to me; is anyone actually going to depend on a volatile function being executed that many times?

If anyone can see any road-blocking issues with this optimisation, please let me know.

Otherwise I'll continue to work on this so that it is ready for CF4.

Regards

David Rowley
Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> The attached patch is not quite finished yet, I've not yet fully covered
> SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Greg Stark
Date:
<p dir="ltr"><br /> On 14 Dec 2013 15:40, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/> ><br /> > David Rowley <<a href="mailto:dgrowleyml@gmail.com">dgrowleyml@gmail.com</a>>
writes:<br/> > > The attached patch is not quite finished yet, I've not yet fully covered<br /> > > SUM and
AVGfor all types.<br /> ><br /> > I think you *can't* cover them for the float types; roundoff error<br /> >
wouldmean you don't get the same answers as before.<br /><p dir="ltr">I was going to say the same thing. But then I
startedto wonder.... What's so special about the answers we used to give? They are also subject to round off and the
resultsare already quite questionable in those cases. 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Dec 15, 2013 at 12:48 PM, Greg Stark <span
dir="ltr"><<ahref="mailto:stark@mit.edu" target="_blank">stark@mit.edu</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div
class="h5"><pdir="ltr"><br /> On 14 Dec 2013 15:40, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us"
target="_blank">tgl@sss.pgh.pa.us</a>>wrote:<br /> ><br /> > David Rowley <<a
href="mailto:dgrowleyml@gmail.com"target="_blank">dgrowleyml@gmail.com</a>> writes:<br /> > > The attached
patchis not quite finished yet, I've not yet fully covered<br /> > > SUM and AVG for all types.<br /> ><br />
>I think you *can't* cover them for the float types; roundoff error<br /> > would mean you don't get the same
answersas before.<br /></div></div><p dir="ltr">I was going to say the same thing. But then I started to wonder....
What'sso special about the answers we used to give? They are also subject to round off and the results are already
quitequestionable in those cases.</blockquote></div>I guess they probably shouldn't be, subject to rounding /
influencedby errors from tuples that are out of scope of the aggregate context.</div><div class="gmail_extra">Though
sayingthat it would be a shame to have this optimisation for all but float and double. I can imagine the questions in
[GENERAL]..Why is SUM(<int>) OVER ().. fast but SUM(<float>) OVER () slow? I wonder what other RDBMS' do
here...</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Regards</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">David Rowley</div></div> 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I think you *can't* cover them for the float types; roundoff error
>> would mean you don't get the same answers as before.

> I was going to say the same thing. But then I started to wonder.... What's
> so special about the answers we used to give? They are also subject to
> round off and the results are already quite questionable in those cases.

Well, we can't easily do better than the old answers, and the new ones
might be arbitrarily worse.  Example: sum or average across single-row
windows ought to be exact in any case, but it might be arbitrarily wrong
with the negative-transition technique.

More generally, this is supposed to be a performance enhancement only;
it's not supposed to change the results.

This consideration also makes me question whether we should apply the
method for NUMERIC.  Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code.  I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.
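
For example (a minimal illustration of the point, not part of the patch): numeric addition keeps the larger display scale, so subtracting a value back out can leave trailing zeroes behind that a from-scratch computation would not have:

SELECT (1.00 + 2) - 1.00 AS inverse_style,  -- yields 2.00
       2::numeric        AS recomputed;     -- yields 2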
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Josh Berkus
Date:
On 12/14/2013 05:00 PM, Tom Lane wrote:
> This consideration also makes me question whether we should apply the
> method for NUMERIC.  Although in principle numeric addition/subtraction
> is exact, such a sequence could leave us with a different dscale than
> is returned by the existing code.  I'm not sure if changing the number of
> trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration.  What's the
worst-case drift?  In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:
If we're going to disqualify NUMERIC too, we might as well bounce the
feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.


I don't agree with this. I'm going with the opinion that the more types and aggregates we can support (properly) the better. I'd rather delay implementations of the ones which could change results than see them as a roadblock for the ones we can implement today without this danger.

I think the feature is worth it alone if we could improve COUNT(*). 
It's a bit silly to have to loop through all the tuples in a tuplestore to see how many there are after removing one, when we already knew the count before removing it. Something like: 10 - 1... ummm, I dunno, let's count again... 1, 2, 3, 4, 5, 6, 7, 8, 9... it's 9! With this patch it's just 10 - 1, result: 9. It feels a bit like asking a kid: if you have 10 beans and you take 1 away, how many are left? You and I know it's 9, but the kid might have to count them again. PostgreSQL counts them again.
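
To make that concrete, here's a small example (the counts themselves are standard behaviour; the patch only changes how they are arrived at):

SELECT n,
       COUNT(*) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,5) g(n);

The count simply walks down 5, 4, 3, 2, 1. Unpatched, each of those is a fresh scan of the remaining rows; with the patch, each is just the previous count minus one.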

Regards

David Rowley 


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I think even the FLOAT case deserves some consideration.  What's the
> worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s).  Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
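
(In plain float8 arithmetic: SELECT 1e20::float8 + 1 - 1e20::float8 gives 0, while SELECT (1e20::float8 - 1e20::float8) + 1 gives 1.)
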
It's not so good with two-row windows though:
    Table    correct sum of       negative-transition
             this + next value    result
    1e20     1e20                 1e20 + 1 = 1e20
    1        1                    1e20 - 1e20 + 0 = 0
    0

> In general, folks who do aggregate operations on
> FLOATs aren't expecting an exact answer, or one which is consistent
> beyond a certain number of significant digits.

Au contraire.  People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation.  They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

> And Dave is right: how many bug reports would we get about "NUMERIC is
> fast, but FLOAT is slow"?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer.  I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> If we're going to disqualify NUMERIC too, we might as well bounce the
>> feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
>> target audience.

> I think the feature is worth it alone if we could improve COUNT(*).

Agreed, count(*) and operations on integers are alone enough to be worth
the trouble.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
I wrote:
> It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:
    Table       correct sum of          negative-transition
                this + next value       result
    1e20        1e20                    1e20 + 1 = 1e20
    1           1                       1e20 - 1e20 + 0 = 0
    0           0                       0 - 1 + 0 = -1
    0           1                       -1 - 0 + 1 = 0
    1

Those last few answers are completely corrupt.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

    Table       correct sum of          negative-transition
                this + next value       result
    1e20        1e20                    1e20 + 1 = 1e20
    1           1                       1e20 - 1e20 + 0 = 0
    0           0                       0 - 1 + 0 = -1
    0           1                       -1 - 0 + 1 = 0
    1

Those last few answers are completely corrupt.


I guess the answer for the people who complain about slowness could be that they create their own aggregate which uses float4pl as the transition function and float4mi as the negative transition function. They can call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in the documentation for this patch?
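
Such an aggregate might look something like this (a sketch only, using the INVFUNC option name settled on later in this thread; float4pl and float4mi are the ordinary float4 addition and subtraction functions):

CREATE AGGREGATE sumfastbutwrong (float4)
(
    SFUNC = float4pl,    -- forward transition: add the incoming value
    STYPE = float4,
    INVFUNC = float4mi,  -- inverse transition: subtract the departing value
    INITCOND = '0'
);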

I've removed the negative trans function setups for float4 and float8 with SUM and AVG stuff in my working copy now.

As for numeric, I did start working on this just after I posted the original patch and before I saw your comment about it. I did end up making do_numeric_desperse() which was to be the reverse of do_numeric_accum(), but I got stuck on the equivalent of when do_numeric_accum() does mul_var(&X, &X, &X2, X.dscale * 2); 

If it is decided that we don't want to implement a negative trans function for numeric, then I guess I could leave in my trans function to allow users to create their own fast version, maybe?

Regards

David Rowley
 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I guess the answer for the people that complain about slowness could be
> that they create their own aggregate function which implements float4pl as
> the trans function and float4mi as the negative trans function. They can
> call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
> the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).

> As for numeric, I did start working on this just after I posted the
> original patch and before I saw your comment about it. I did end up making
> do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
> but I got stuck on the equivalent of when do_numeric_accum()
> does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

    Table       correct sum of          negative-transition
                this + next value       result
    1e20        1e20                    1e20 + 1 = 1e20
    1           1                       1e20 - 1e20 + 0 = 0
    0           0                       0 - 1 + 0 = -1
    0           1                       -1 - 0 + 1 = 0
    1

Those last few answers are completely corrupt.


For the sake of the archives, I just wanted to reproduce this.
I used the following query, with the patch attached upthread, to confirm it:

SELECT sum(n::float8) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
FROM (VALUES(1,1e20),(2,1)) n(i,n);

  sum
--------
 1e+020
      0
(2 rows)

SUM(1) should equal 1 not 0.

But unpatched I get:

  sum
--------
 1e+020
      1
(2 rows)

This discovery seems like good information to keep around, so I've added a regression test to my local copy of the patch to help make sure nobody adds a negative transition function for float or double in the future.

Regards

David Rowley

 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> I guess the answer for the people that complain about slowness could be
> that they create their own aggregate function which implements float4pl as
> the trans function and float4mi as the negative trans function. They can
> call it SUMFASTBUTWRONG() if they like. Perhaps it would be worth a note in
> the documents for this patch?

I think it would be an absolutely perfect documentation example to show
how to set up such an aggregate (and then point out the risks, of course).


I've attached an updated patch which includes some documentation.
I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's an ok name for the option, but if anyone has any better ideas please let them be known.

For the checks performed before the aggregate is created: I put these together quite quickly, and I think I'm still missing a check to ensure that the strict property of the trans and negtrans functions is set to the same thing. I do have a runtime check for this, but it's likely bad to wait until then to tell the user about it.

> As for numeric, I did start working on this just after I posted the
> original patch and before I saw your comment about it. I did end up making
> do_numeric_desperse() which was to be the reverse of do_numeric_accum(),
> but I got stuck on the equivalent of when do_numeric_accum()
> does mul_var(&X, &X, &X2, X.dscale * 2);

Ummm ... why doesn't it work to just use numeric_add and numeric_sub,
exactly parallel to the float case?


I've not quite got back to this yet, and I actually pulled out my initial try at it, thinking that we didn't want it because it was affecting the scale. The transition function for SUM on numeric does not seem to use numeric_add; it uses numeric_avg_accum as the transition function, which lets do_numeric_accum do the hard work, and that just does add_var. I changed these add_var's to sub_var's and altered the initial value to flip the sign, so that NULL,10 would be -10 instead of 10. I think that's all it needs, and I guess I leave the dscale as-is in this situation then?

Regards

David Rowley
 

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I've attached an updated patch which includes some documentation.
> I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
> an ok name for the option, but if anyone has any better ideas please let
> them be known.

I'd be a bit inclined to build the terminology around "reverse" instead of
"negative" --- the latter seems a bit too arithmetic-centric.  But that's
just MHO.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Ants Aasma
Date:
<p dir="ltr">On Dec 15, 2013 6:44 PM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/> > David Rowley <<a href="mailto:dgrowleyml@gmail.com">dgrowleyml@gmail.com</a>> writes:<br /> >
>I've attached an updated patch which includes some documentation.<br /> > > I've also added support for
negfuncin CREATE AGGREGATE. Hopefully that's<br /> > > an ok name for the option, but if anyone has any better
ideasplease let<br /> > > them be known.<br /> ><br /> > I'd be a bit inclined to build the terminology
around"reverse" instead of<br /> > "negative" --- the latter seems a bit too arithmetic-centric.  But that's<br />
>just MHO.<p dir="ltr">To contribute to the bike shedding, inverse is often used in similar contexts.<p
dir="ltr">--<br/> Ants Aasma 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > I've attached an updated patch which includes some documentation.
> > I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
> > an ok name for the option, but if anyone has any better ideas please let
> > them be known.
>
> I'd be a bit inclined to build the terminology around "reverse" instead of
> "negative" --- the latter seems a bit too arithmetic-centric.  But that's
> just MHO.

To contribute to the bike shedding, inverse is often used in similar contexts.

I guess it's not really bike shedding; most of the work, I hope, is done, so I might as well try to get the docs polished up, and we'd need a consensus on what we're going to call them before I can get that done.
 
I like both of these better than negative transition function and I agree negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess that would make it "inverse aggregate transition function". 
Would that make the CREATE AGGREGATE option be INVFUNC ?
 
Any other ideas or +1's for any of the existing ones?

Regards

David Rowley

--
Ants Aasma


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Fetter
Date:
On Mon, Dec 16, 2013 at 08:39:33PM +1300, David Rowley wrote:
> On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> 
> > On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> > > David Rowley <dgrowleyml@gmail.com> writes:
> > > > I've attached an updated patch which includes some documentation.
> > > > I've also added support for negfunc in CREATE AGGREGATE. Hopefully
> > that's
> > > > an ok name for the option, but if anyone has any better ideas please
> > let
> > > > them be known.
> > >
> > > I'd be a bit inclined to build the terminology around "reverse" instead
> > of
> > > "negative" --- the latter seems a bit too arithmetic-centric.  But that's
> > > just MHO.
> >
> > To contribute to the bike shedding, inverse is often used in similar
> > contexts.
> >
> I guess it's not really bike shedding, most of the work I hope is done, so
> I might as well try to get the docs polished up and we'd need a consensus
> on what we're going to call them before I can get that done.
> 
> I like both of these better than negative transition function and I agree
> negative implies arithmetic rather than opposite.
> Out of these 2 I do think inverse fits better than reverse, so I guess that
> would make it "inverse aggregate transition function".
> Would that make the CREATE AGGREGATE option be INVFUNC ?
> 
> Any other ideas or +1's for any of the existing ones?

+1 for inverse.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Hannu Krosing
Date:
On 12/16/2013 08:39 AM, David Rowley wrote:
On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Dec 15, 2013 6:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > I've attached an updated patch which includes some documentation.
> > I've also added support for negfunc in CREATE AGGREGATE. Hopefully that's
> > an ok name for the option, but if anyone has any better ideas please let
> > them be known.
>
> I'd be a bit inclined to build the terminology around "reverse" instead of
> "negative" --- the latter seems a bit too arithmetic-centric.  But that's
> just MHO.

To contribute to the bike shedding, inverse is often used in similar contexts.

I guess it's not really bike shedding; most of the work, I hope, is done, so I might as well try to get the docs polished up, and we'd need a consensus on what we're going to call them before I can get that done.
 
I like both of these better than negative transition function and I agree negative implies arithmetic rather than opposite.
Out of these 2 I do think inverse fits better than reverse, so I guess that would make it "inverse aggregate transition function". 
Would that make the CREATE AGGREGATE option be INVFUNC ?
 
Any other ideas or +1's for any of the existing ones?
+1, inverse good :)





-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Heikki Linnakangas
Date:
On 12/15/2013 03:57 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> I think even the FLOAT case deserves some consideration.  What's the
>> worst-case drift?
>
> Complete loss of all significant digits.
>
> The case I was considering earlier of single-row windows could be made
> safe (I think) if we apply the negative transition function first, before
> incorporating the new row(s).  Then for example if you've got float8 1e20
> followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
> It's not so good with two-row windows though:
>
>      Table    correct sum of       negative-transition
>               this + next value    result
>      1e20     1e20                 1e20 + 1 = 1e20
>      1        1                    1e20 - 1e20 + 0 = 0
>      0
>
>> In general, folks who do aggregate operations on
>> FLOATs aren't expecting an exact answer, or one which is consistent
>> beyond a certain number of significant digits.
>
> Au contraire.  People who know what they're doing expect the results
> to be what an IEEE float arithmetic unit would produce for the given
> calculation.  They know how the roundoff error ought to behave, and they
> will not thank us for doing a calculation that's not the one specified.
> I will grant you that there are plenty of clueless people out there
> who *don't* know this, but they shouldn't be using float arithmetic
> anyway.
>
>> And Dave is right: how many bug reports would we get about "NUMERIC is
>> fast, but FLOAT is slow"?
>
> I've said this before, but: we can make it arbitrarily fast if we don't
> have to get the right answer.  I'd rather get "it's slow" complaints
> than "this is the wrong answer" complaints.

There's another technique we could use which doesn't need a negative
transition function, assuming the order you feed the values to the
aggregate function doesn't matter: keep subtotals. For example, if the
window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and
then 1 + 2 + 7 = 10. Next, 1 leaves the window, and 5 enters it. Now you
calculate 2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around,
you saved one addition compared to calculating 2 + 3 + 4 + 5 from scratch.

The negative transition function is a lot simpler and faster for 
count(*) and integer operations, so we probably should implement that 
anyway. But the subtotals technique could be very useful for other data 
types.

- Heikki



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Dec 16, 2013 at 9:39 PM, Heikki Linnakangas
<spandir="ltr"><<a href="mailto:hlinnakangas@vmware.com" target="_blank">hlinnakangas@vmware.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div
class=""><divclass="h5"><br /></div></div> There's another technique we could use which doesn't need a negative
transitionfunction, assuming the order you feed the values to the aggreate function doesn't matter: keep subtotals. For
example,if the window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and then 1 + 2 + 7 = 10. Next, 1
leavesthe window, and 5 enters it. Now you calculate  2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around, you
savedone addition compared to calculating 2 + 3 + 4 + 5 from scratch.<br /><br /> The negative transition function is a
lotsimpler and faster for count(*) and integer operations, so we probably should implement that anyway. But the
subtotalstechnique could be very useful for other data types.<span class=""><font color="#888888"><br /><br /> -
Heikki<br/></font></span></blockquote></div><br /></div><div class="gmail_extra">That's quite interesting. I guess we
wouldneed another flag in pg_aggregate to mark if the order of the tuples matters, string_agg would be an example of
onethat would have to skip this.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">At least for ROWS
BETWEENCURRENT ROW AND UNBOUNDED FOLLOWING if the aggregate order did not matter than it would likely be quite
efficientjust to aggregate from the bottom up and materialise the results at each tuple and store it in the tuple
store,then just use that materialised value when that tuple is processed. This method won't work when the frame base is
notfixed.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">I had been thinking ahead on how to
improveMIN and MAX cases too. I came up with something called "tuple store indexes" that could be build as binary
searchtrees with a composite index on the tuple position and the aggregate's sort operator... Something similar to how
thefollowing query could use an index on (id,value) to calculate max()</div><div class="gmail_extra">select max(value)
fromtest where id between 1 and 100;<br /></div><div class="gmail_extra">It's certainly not something for this patch,
butit was an idea I came up with which I think would be possible without adding any more columns to
pg_aggregate.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Regards</div><div
class="gmail_extra"><br/></div><div class="gmail_extra">David Rowley</div><div class="gmail_extra"><br /></div></div> 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 12:17 PM, David Rowley <dgrowleyml@gmail.com> wrote:

One thing that is currently on my mind is what to do when passing volatile functions to the aggregate. Since the number of times we execute a volatile function will depend greatly on the window frame options, I think we should include some sort of warning in the documentation that "the number of times that the expression is evaluated within the aggregate function, when used in the context of a WINDOW, is undefined". The other option would be to disable this optimisation if the aggregate expression contained a volatile function, but doing so seems a bit weird to me; is anyone actually going to depend on a volatile function being executed that many times?



I just wanted to bring this back into people's minds.
While writing this patch I found and removed a comment which was a todo item to implement these negative transition functions. This comment mentioned maybe disallowing their use if the function's expression contained a volatile function. I wondered why this was important, and I still don't really see why we would disallow this only to enforce that the function be called an undefined number of times anyway.

nextval was the only volatile function I could think of that allows an example where it's easy to understand what is going on here.

CREATE SEQUENCE test_seq;
SELECT currval('test_seq'),
       COUNT(*) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
       SUM(nextval('test_seq')) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,2) x (x);
DROP SEQUENCE test_seq;

The results are:
 currval | count | sum
---------+-------+-----
       2 |     2 |   3
       3 |     1 |   3

I've not looked to see if the spec has anything about this, but the first row's sum is 1+2, and the 2nd row has just 1 row to aggregate, with the value being 3 due to nextval returning 3. I could see an argument that the results should actually be:

 currval | count | sum
---------+-------+-----
       2 |     2 |   3
       3 |     1 |   2 

If it was, then the solution would have to be to materialise the expression by evaluating it once for each tuple, which sounds like a big change. I thought that if we're going to be playing around with the number of times these expressions are evaluated, then we should stick a note in the docs to tell our users not to depend on this.

Something like the attached maybe.
Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
>> One thing that is currently on my mind is what to do when passing volatile
>> functions to the aggregate. Since the number of times we execute a volatile
>> function will depend greatly on the window frame options, I think we should
>> include some sort of warning in the documentation that "the number of times
>> that the expression is evaluated within the aggregate function, when used in
>> the context of a WINDOW, is undefined". The other option would be to disable
>> this optimisation if the aggregate expression contained a volatile function,
>> but doing so seems a bit weird to me; is anyone actually going to depend on
>> a volatile function being executed that many times?

Once again: this patch has no business changing any user-visible behavior.
That would include not changing the number of evaluations of volatile
functions.  The planner is full of places where optimizations are disabled
for volatile subexpressions, and I don't see why this should be different.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Robert Haas
Date:
On Sat, Dec 14, 2013 at 8:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>> I think you *can't* cover them for the float types; roundoff error
>>> would mean you don't get the same answers as before.
>
>> I was going to say the same thing. But then I started to wonder.... What's
>> so special about the answers we used to give? They are also subject to
>> round off and the results are already quite questionable in those cases.
>
> Well, we can't easily do better than the old answers, and the new ones
> might be arbitrarily worse.  Example: sum or average across single-row
> windows ought to be exact in any case, but it might be arbitrarily wrong
> with the negative-transition technique.
>
> More generally, this is supposed to be a performance enhancement only;
> it's not supposed to change the results.
>
> This consideration also makes me question whether we should apply the
> method for NUMERIC.  Although in principle numeric addition/subtraction
> is exact, such a sequence could leave us with a different dscale than
> is returned by the existing code.  I'm not sure if changing the number of
> trailing zeroes is a big enough behavior change to draw complaints.

I tend to think it is.  I'm not sure if it's worth it, but you could
finesse this problem with a more complex transition state - keep track
of how many values with any given scale are part of the current
window.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Dec 17, 2013 at 1:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Once again: this patch has no business changing any user-visible behavior.
That would include not changing the number of evaluations of volatile
functions.  The planner is full of places where optimizations are disabled
for volatile subexpressions, and I don't see why this should be different.


My point was more that I thought it was already broken and perhaps should be fixed, or at the very least that we could warn users about it.
I would imagine that most of those other places in the planner are to prevent extra evaluations of volatile functions? In this particular case we're already evaluating these multiple extra times when a tuple moves off the top of the frame. I would have thought that we should only evaluate the volatile function once per tuple, and this is not what the current implementation does.

I don't have an issue with skipping this optimisation when the aggregate's expression contains any volatile functions. I just wanted to raise my concerns about the current behaviour, which I find a bit bizarre.

Regards

David Rowley
 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Dec 17, 2013 at 11:06 AM, David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, Dec 17, 2013 at 1:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Once again: this patch has no business changing any user-visible behavior.
That would include not changing the number of evaluations of volatile
functions.  The planner is full of places where optimizations are disabled
for volatile subexpressions, and I don't see why this should be different.


My point was more that I thought it was already broken and perhaps should be fixed, or at the very least that we could warn users about it.
I would imagine that most of those other places in the planner are to prevent extra evaluations of volatile functions? In this particular case we're already evaluating these multiple extra times when a tuple moves off the top of the frame. I would have thought that we should only evaluate the volatile function once per tuple, and this is not what the current implementation does.

I don't have an issue with skipping this optimisation when the aggregate's expression contains any volatile functions. I just wanted to raise my concerns about the current behaviour, which I find a bit bizarre.



To improve on the example I used to try and get my point across:

These were all run on an unpatched copy of HEAD. Ignore the actual results from sum() and look at what currval() is set to after each query.

create sequence seq;
select sum(nextval('seq')) over (order by n rows between current row and unbounded following)
from generate_series(1,10) n(n);
select currval('seq');

drop sequence seq;
create sequence seq;
select sum(nextval('seq')) over (order by n)
from generate_series(1,10) n(n);
select currval('seq');

nextval() is executed 55 times with the first query (10 + 9 + ... + 1, one full re-aggregation per row) and 10 times with the 2nd query. Of course this is because the current implementation requires that, when a tuple moves out of scope, the whole frame be re-aggregated.
I had thought that all of the places that disable optimisations due to a volatile expression being present were there to stop the number of executions from being undefined; this case seems undefined already, or at least I can't find anywhere in the docs that says how many times the expression will be executed.
 
Once again, I'm not fighting to have inverse transitions used when volatile functions are involved; I'll happily disable that. I just wanted to raise this to find out whether it's intended or not, and this seemed like a good thread to do it on.

Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Mon, Dec 16, 2013 at 9:36 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
On 12/16/2013 08:39 AM, David Rowley wrote:
 
Any other ideas or +1's for any of the existing ones?
+1, inverse good :)


In the attached patch I've renamed negative to inverse. I've also disabled the inverse functions when an expression in an aggregate contains a volatile function.

Regards

David Rowley

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Dec 17, 2013 at 10:51 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, Dec 16, 2013 at 9:36 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
On 12/16/2013 08:39 AM, David Rowley wrote:
 
Any other ideas or +1's for any of the existing ones?
+1, inverse good :)


In the attached patch I've renamed negative to inverse. I've also disabled the inverse functions when an expression in an aggregate contains a volatile function.


I've attached an updated patch in which I've done some more work on the documentation and also added some more sanity checking around CREATE AGGREGATE. The previous patch allowed CREATE AGGREGATE to accept a transition function which was strict and an inverse transition function which was not, or vice versa.

I've also added a bunch of regression tests around create aggregate which were missing in the previous patch.

Regards

David Rowley 

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Sat, December 21, 2013 10:53, David Rowley wrote:

>  [inverse_transition_functions_v1.1.patch.gz ]

Hi,

I know, $subject says "WIP", but I assumed it's supposed to compile, so I tried to get this to run on linux (Centos 5.0).
gcc 4.8.2,
with

./configure  --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse --with-pgport=6554  --enable-depend
--enable-cassert --enable-debug --with-openssl --with-perl --with-libxml --with-libxslt --with-ossp-uuid

but although applying the patch and configure were OK, the compile went awry:

[...]
make[4]: Entering directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend'
make[4]: Nothing to be done for `submake-errcodes'.
make[4]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend'
make[3]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/common'
make -C catalog schemapg.h
make[3]: Entering directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend/catalog'
cd ../../../src/include/catalog && '/home/aardvark/perl-5.19/bin/perl' ./duplicate_oids
1220
1221
1800
1801
1970
1971
make[3]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend/catalog'
make[2]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src/backend'
make[1]: Leaving directory `/home/aardvark/pg_stuff/pg_sandbox/pgsql.inverse/src'
-- [2013.12.21 11:35:29 inverse] make contrib
dblink.c:55:28: fatal error: utils/fmgroids.h: No such file or directory
 #include "utils/fmgroids.h"
                            ^
compilation terminated.
make[1]: *** [dblink.o] Error 1
make: *** [all-dblink-recurse] Error 2
-- make returned 2 - abort

(Although that output mentions perl 5.19, I also tried with regular perl 5.18, cassert on/off, etc.; every combination yielded this same compile error.)


Thanks,

Erik Rijkers

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Dec 21, 2013 at 11:47 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Sat, December 21, 2013 10:53, David Rowley wrote:

>  [inverse_transition_functions_v1.1.patch.gz ]

Hi,

I know, $subject says "WIP", but I assumed it's supposed to compile, so I tried to get this to run on linux (Centos 5.0).
gcc 4.8.2,
with

<snip>
 
cd ../../../src/include/catalog && '/home/aardvark/perl-5.19/bin/perl' ./duplicate_oids
1220
1221
1800
1801
1970
1971
 
Thanks for the report. It seems the Windows build does not check for duplicate OIDs but the Linux one does.
I'll just need to change the OIDs of the new functions I've added; only, it seems the unused_oids perl script does not like Windows.

I'll try to get you a working patch shortly.

Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 22, 2013 at 12:12 AM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, Dec 21, 2013 at 11:47 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Sat, December 21, 2013 10:53, David Rowley wrote:

>  [inverse_transition_functions_v1.1.patch.gz ]

Hi,

I know, $subject says "WIP", but I assumed it's supposed to compile, so I tried to get this to run on linux (Centos 5.0).
gcc 4.8.2,
with

<snip>
 
cd ../../../src/include/catalog && '/home/aardvark/perl-5.19/bin/perl' ./duplicate_oids
1220
1221
1800
1801
1970
1971
 
Thanks for the report. It seems the Windows build does not check for duplicate OIDs but the Linux one does.
I'll just need to change the OIDs of the new functions I've added; only, it seems the unused_oids perl script does not like Windows.

I'll try to get you a working patch shortly.


Please find attached an updated patch which should remove the duplicate OID problem you saw.

Regards

David Rowley
 


Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Sat, December 21, 2013 12:38, David Rowley wrote:
> [ inverse_transition_functions_v1.2.patch.gz  ]
> Please find attached an updated patch which should remove the duplicate OID
> problem you saw.

That fixes it, thanks

There is 1 of 141 failed tests:
   window                   ... FAILED

but that's easily ignored for now.

I'll do some testing later,

Thanks

Erik Rijkers


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 22, 2013 at 12:49 AM, Erik Rijkers <er@xs4all.nl> wrote:
On Sat, December 21, 2013 12:38, David Rowley wrote:
> [ inverse_transition_functions_v1.2.patch.gz  ]
> Please find attached an updated patch which should remove the duplicate OID
> problem you saw.

That fixes it, thanks

There is 1 of 141 failed tests:

    window                   ... FAILED


That's strange, it passes here.

Would you be able to send me the regression diff file?

Regards

David Rowley
 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Sat, December 21, 2013 12:52, David Rowley wrote:
> On Sun, Dec 22, 2013 at 12:49 AM, Erik Rijkers <er@xs4all.nl> wrote:
>
>> On Sat, December 21, 2013 12:38, David Rowley wrote:
>> > [ inverse_transition_functions_v1.2.patch.gz  ]
>> > Please find attached an updated patch which should remove the duplicate
>> OID
>> > problem you saw.
>>
>> That fixes it, thanks
>>
>> There is 1 of 141 failed tests:
>>
>>     window                   ... FAILED
>>
>>
> That's strange, it passes here.
>
> Would you be able to send me the regression diff file?
>

attached...

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 22, 2013 at 1:01 AM, Erik Rijkers <er@xs4all.nl> wrote:
On Sat, December 21, 2013 12:52, David Rowley wrote:
> On Sun, Dec 22, 2013 at 12:49 AM, Erik Rijkers <er@xs4all.nl> wrote:
>
>> On Sat, December 21, 2013 12:38, David Rowley wrote:
>> > [ inverse_transition_functions_v1.2.patch.gz  ]
>> > Please find attached an updated patch which should remove the duplicate
>> OID
>> > problem you saw.
>>
>> That fixes it, thanks
>>
>> There is 1 of 141 failed tests:
>>
>>     window                   ... FAILED
>>
>>
> That's strange, it passes here.
>
> Would you be able to send me the regression diff file?
>

attached...

Thanks

This was just down to some missing trailing white space on the expected results. I've fixed these up and attached another patch.

Regards

David Rowley

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 22, 2013 at 2:42 AM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, Dec 22, 2013 at 1:01 AM, Erik Rijkers <er@xs4all.nl> wrote:
On Sat, December 21, 2013 12:52, David Rowley wrote:
> On Sun, Dec 22, 2013 at 12:49 AM, Erik Rijkers <er@xs4all.nl> wrote:
>
>> On Sat, December 21, 2013 12:38, David Rowley wrote:
>> > [ inverse_transition_functions_v1.2.patch.gz  ]
>> > Please find attached an updated patch which should remove the duplicate
>> OID
>> > problem you saw.
>>
>> That fixes it, thanks
>>
>> There is 1 of 141 failed tests:
>>
>>     window                   ... FAILED
>>
>>
> That's strange, it passes here.
>
> Would you be able to send me the regression diff file?
>

attached...

Thanks

This was just down to some missing trailing white space on the expected results. I've fixed these up and attached another patch.


I've attached an updated patch which is re-merged to current head.

I've also done a bit more work to the patch:
Added inverse transition functions for all the stddev() type aggregates for all int types.
Added inverse transition function for SUM with bigint
Added pg_dump support.

I've yet to do SUM or AVG for numeric types due to the concern about having more digits after the decimal point after an inverse transition takes place.

Regards

David Rowley




Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Wed, December 25, 2013 14:49, David Rowley wrote:

> [ inverse_transition_functions_v1.5.patch.gz ]

I ran into the following problem which is, I think, NOT a problem with your patch but with my setup. Still, if anyone can enlighten me on its cause I'd be thankful (it shows up every now and then for me).

I have a 200 GB dev database running under 9.4devel that I thought I could now, for test purposes, compile a patched postgres binary for (i.e.: a HEAD + inverse_transition_functions_v1.5.patch.gz binary), so as to avoid an initdb and use the existing 200 GB data.

I know the patch is valid, because a separately built, newly initdb-ed instance runs the below statement without fail.

But running it from the existing 200 GB dev database I get:

$  echo "\\set VERBOSITY verbose \\\\ SELECT SUM(n::integer) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING) FROM generate_series(1, 100) g(n)" | psql
ERROR:  XX000: cache lookup failed for type 0
LOCATION:  get_typlenbyval, lsyscache.c:1895
Time: 2.752 ms

Is there a way I can use the existing database and avoid both initdb and this error?

Thanks,

Erik Rijkers


See also:
[1]

http://www.postgresql.org/message-id/flat/E5F4C5A18CAB7A4DA23080DE9CE8158603E67AA1@eaubrmw001.eapac.ericsson.se#E5F4C5A18CAB7A4DA23080DE9CE8158603E67AA1@eaubrmw001.eapac.ericsson.se


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
"Erik Rijkers" <er@xs4all.nl> writes:
> I have a 200 GB dev database running under 9.4devel that I thought I could
> now, for test purposes, compile a patched postgres binary for (i.e.: a HEAD
> + inverse_transition_functions_v1.5.patch.gz binary), so as to avoid an
> initdb and use the existing 200 GB data.

That's not going to work for any patch that touches the initial contents
of the system catalogs.

> Is there a way I can use the existing database and avoid both initdb and this error?

Use pg_upgrade to move the data into a newly initdb'd directory.  Note
you will need both unpatched and patched executables for this.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 12/14/2013 05:00 PM, Tom Lane wrote:
> This consideration also makes me question whether we should apply the
> method for NUMERIC.  Although in principle numeric addition/subtraction
> is exact, such a sequence could leave us with a different dscale than
> is returned by the existing code.  I'm not sure if changing the number of
> trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration.  What's the
worst-case drift?  In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?


From what I can tell adding an inverse transition function to support AVG for numeric does not affect the number of trailing zeros in the results, so I've attached a patch which now has inverse transition functions to support numeric types in AVG and all of the STDDEV* aggregates. 

The function numeric_avg_accum_inv is the inverse transition function for AVG. In the attached patch I've set this to be the inverse transition function for SUM too. I know it was mentioned that having extra trailing zeros here is a change of results and could be unwanted, but I've left it in for now and I've included a failing regression test to demonstrate exactly what has changed in the hope that it may seed a discussion on what the best solution is.

Here's a quick example of what I'm talking about:

With this query I'll include a call to MAX to force the executor to not use inverse transitions.
SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
MAX(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM (VALUES(1,1.00000000000001),(2,2),(3,3)) v(i,n);
       sum        | max
------------------+-----
 6.00000000000001 |   3
                5 |   3
                3 |   3
(3 rows)

Notice the lack of trailing zeros in the sum results for rows 2 and 3.

Here an inverse transition will be performed... Notice the extra trailing zeros on rows 2 and 3.
SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM (VALUES(1,1.00000000000001),(2,2),(3,3)) v(i,n);
       sum
------------------
 6.00000000000001
 5.00000000000000
 3.00000000000000
(3 rows)


My ideas so far on what to do about this:

1. Don't allow inverse transition for SUM with numeric, but since numeric_avg_accum_inv is already written for AVG numeric support we could include a note in the docs to tell users how to create a SUM aggregate for numeric that supports inverse transitions.
2. Don't worry about the trailing zeros... After all, SELECT AVG(n) FROM (VALUES(2::numeric)) v(n); adds a whole bunch of trailing zeros. Only the above 2 queries show that this behaviour can be a bit weird.
3. Invent some way to remove trailing zeros, perhaps in numeric_out. Is there a reason they are there? Note that this would affect numeric globally.

What I don't currently know is: Are there any rules about trailing zeros on numeric? I see that AVG with a single numeric produces many trailing zeros after the decimal point. Are these meant to be there or is it just a side effect of dividing on numerics?
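
For reference, the single-value AVG case mentioned above looks like this (standard, unpatched behaviour):

SELECT AVG(n) FROM (VALUES(2::numeric)) v(n);
        avg
--------------------
 2.0000000000000000
(1 row)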

Please note that I'm not trying to push for any of the above points. I just want to get the information I currently have out there as perhaps someone can think of something better or show a good reason for or against any of the 3 points.
 
Regards

David Rowley


Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Dec 27, 2013 at 1:36 AM, David Rowley <dgrowleyml@gmail.com> wrote:
From what I can tell adding an inverse transition function to support AVG for numeric does not affect the number of trailing zeros in the results, so I've attached a patch which now has inverse transition functions to support numeric types in AVG and all of the STDDEV* aggregates. 

Here's a slightly updated patch in which I've fixed up a comment that had become outdated because of the patch. I've also changed the wording in quite a few of my comments and made some changes to the docs. The only other change is an extra error check, just in case window_gettupleslot were to fail to fetch a tuple that should always be in the tuple store.

I'm no longer classing the patch as WIP. I think it's ready for the commitfest.

Regards

David Rowley
Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Thu, January 2, 2014 13:05, David Rowley wrote:
> here's a slightly updated patch
> [inverse_transition_functions_v1.8.patch.gz ]

patch applies, and compiles (although with new warnings).
But make check complains loudly: see attached.


warnings:


In file included from gram.y:13871:0:
scan.c: In function '...':
scan.c:10167:23: warning: unused variable 'yyg' [-Wunused-variable]
 struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be unused depending upon options. */
                   ^
In file included from ../../../../src/include/utils/probes.h:6:0,
                 from ../../../../src/include/pg_trace.h:15,
                 from tuplesort.c:109:
tuplesort.c: In function '...':
tuplesort.c:935:44: warning: comparison between pointer and integer [enabled by default]
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);
                                            ^
tuplesort.c:935:2: note: in expansion of macro 'TRACE_POSTGRESQL_SORT_DONE'
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);



I haven't yet tried the patch you sent after this one.


Thanks,

Erik Rijkers




Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Nicolas Barbier
Date:
2013/12/15 David Rowley <dgrowleyml@gmail.com>:

> I've been working on speeding up aggregate functions when used in the
> context of a window's with non fixed frame heads.

> 1. Fully implement negative transition functions for SUM and AVG.

I would like to mention that this functionality is also extremely
useful to have for the incremental maintenance of materialized views
that use aggregation (which IMHO is one of the most useful kinds).

(Simply imagine a view of the form “SELECT a, agg_function(b) FROM
table GROUP BY a”, a lot of rows in the table, a lot of rows in each
group, and changes that both remove and add new rows.)

For this to work, two things are needed:

(1) A way to apply a value normally (already supported) and inversely
(i.e., this patch) to the current “internal state” of an aggregation.

(2) A way to store the “internal state” of an aggregation in the
materialized view’s “extent” (i.e., the physical rows that represent
the view’s contents, which may or may not be slightly different from
what you get when you do SELECT * FROM matview). As (AFAIK) that state
is stored as a normal value, the maintenance code could just take the
value, store it in the extent, and next time retrieve it again and
perform normal or inverse transitions. When selecting from the
matview, the state could be retrieved, and the final function applied
to it to yield the value to be returned.

To understand (2), assume that one wants to store an AVG() in a
materialized view; to be able to update the value incrementally, one
needs to actually store the SUM() and COUNT(), and perform the
division when selecting from the materialized view. Or it could
(initially) be decided to define AVG() as “not supporting fast
incremental maintenance,” and require the user (if he/she wants fast
incremental maintenance) to put SUM() and COUNT() in the materialized
view manually, and perform the division manually when wanting to
retrieve the average.
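
To make the AVG() example concrete, here's a minimal SQL sketch of the manual variant (assuming a hypothetical base table t(a int, b numeric); all names are illustrative):

CREATE MATERIALIZED VIEW t_by_a AS
  SELECT a,
         sum(b)   AS sum_b,
         count(b) AS count_b
    FROM t
   GROUP BY a;

-- the "perform the division manually" step:
SELECT a, sum_b / count_b AS avg_b
  FROM t_by_a
 WHERE count_b > 0;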

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Thu, January 2, 2014 13:36, Erik Rijkers wrote:
> On Thu, January 2, 2014 13:05, David Rowley wrote:
>> here's a slightly updated patch
>> [inverse_transition_functions_v1.8.patch.gz ]
>
> patch applies, and compiles (although with new warnings).
> But make check complains loudly: see attached.
>
> warnings:

The TRACE_POSTGRESQL_SORT_DONE warnings were not from your patch; sorry about that. They occur on HEAD too (with a debug compile).

tuplesort.c:935:44: warning: comparison between pointer and integer [enabled by default]
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);
                                            ^
tuplesort.c:935:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_DONE’
  TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);



The 'make check' failure remains a problem.

The output I sent earlier today was for this configure:

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.inverse --with-pgport=6594 \
--bindir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/bin \
--libdir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/lib \
--quiet --enable-depend --enable-cassert --enable-debug --with-perl \
--with-openssl --with-libxml --enable-dtrace

(and that's still repeatable)


Perhaps this helps:

with another configure:

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.inverse --with-pgport=6594 \
--bindir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/bin.fast \
--libdir=/var/data1/pg_stuff/pg_installations/pgsql.inverse/lib.fast --quiet --enable-depend --with-perl \
--with-openssl --with-libxml

I get only this single 'make check' error:


*** /var/data1/pg_stuff/pg_sandbox/pgsql.inverse/src/test/regress/expected/window.out    2014-01-02 16:19:48.000000000 +0100
--- /var/data1/pg_stuff/pg_sandbox/pgsql.inverse/src/test/regress/results/window.out    2014-01-02 16:21:43.000000000 +0100
***************
*** 1188,1195 ****
   sum
  ------
   6.01
!     5
!     3
  (3 rows)
  
  SELECT i,COUNT(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
--- 1188,1195 ----
   sum
  ------
   6.01
!  5.00
!  3.00
  (3 rows)
  
  SELECT i,COUNT(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

======================================================================


Centos 5.7
gcc 4.8.2



Thanks; and Happy New Year

Erik Rijkers




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
"Erik Rijkers" <er@xs4all.nl> writes:
> The TRACE_POSTGRESQL_SORT_DONE warnings were not from your patch; sorry about that. They occur on HEAD too (with a
debug
> compile).

> tuplesort.c:935:44: warning: comparison between pointer and integer [enabled by default]
>   TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);
>                                             ^
> tuplesort.c:935:2: note: in expansion of macro �
>   TRACE_POSTGRESQL_SORT_DONE(state->tapeset != NULL, spaceUsed);


FWIW, I don't see any such warnings on either of the machines I have that
will accept --enable-dtrace.  state->tapeset is a pointer, so these
warnings look a tad bogus.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Jan 3, 2014 at 5:33 AM, Erik Rijkers <er@xs4all.nl> wrote:
*** /var/data1/pg_stuff/pg_sandbox/pgsql.inverse/src/test/regress/expected/window.out   2014-01-02 16:19:48.000000000 +0100
--- /var/data1/pg_stuff/pg_sandbox/pgsql.inverse/src/test/regress/results/window.out    2014-01-02 16:21:43.000000000 +0100
***************
*** 1188,1195 ****
   sum
  ------
   6.01
!     5
!     3
  (3 rows)

  SELECT i,COUNT(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
--- 1188,1195 ----
   sum
  ------
   6.01
!  5.00
!  3.00
  (3 rows)


I've left those failures in for now in the hope of generating some discussion on whether we can use inverse transitions for sum(numeric). Please see my email before the previous one for details. To fix it, pg_aggregate.h just needs to be changed to remove the inverse transition for sum(numeric).

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Thu, January 2, 2014 17:33, Erik Rijkers wrote:
> On Thu, January 2, 2014 13:36, Erik Rijkers wrote:
>> On Thu, January 2, 2014 13:05, David Rowley wrote:
>>> here's a slightly updated patch
>>> [inverse_transition_functions_v1.8.patch.gz ]
>>
>> patch applies, and compiles (although with new warnings).
>> But make check complains loudly

To figure out where this 'make check' failed, I lifted a few statements from the offending sql:
src/test/regress/sql/window.sql (see snafu.sh below).

That reliably crashes the server.  It is caused by a SUM, but only when configured like this (i.e. *not* configured for speed):

$ pg_config --configure
'--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse'
'--bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse/bin'
'--libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.inverse/lib' '--with-pgport=6554' '--enable-depend'
'--enable-cassert' '--enable-debug' '--with-openssl' '--with-perl' '--with-libxml' '--with-libxslt' '--with-ossp-uuid'
'--with-zlib'


$ cat snafu.sh
#!/bin/sh

echo "
--
-- WINDOW FUNCTIONS
--
  drop TABLE if exists empsalary ;

-- CREATE TEMPORARY TABLE empsalary (
  CREATE TABLE empsalary (
    depname varchar,
    empno bigint,
    salary int,
    enroll_date date
);

INSERT INTO empsalary VALUES
('develop', 10, 5200, '2007-08-01'),
('sales', 1, 5000, '2006-10-01'),
('personnel', 5, 3500, '2007-12-10'),
('sales', 4, 4800, '2007-08-08'),
('personnel', 2, 3900, '2006-12-23'),
('develop', 7, 4200, '2008-01-01'),
('develop', 9, 4500, '2008-01-01'),
('sales', 3, 4800, '2007-08-01'),
('develop', 8, 6000, '2006-10-01'),
('develop', 11, 5200, '2007-08-15');

-- SELECT depname, empno, salary, sum(salary) OVER (PARTITION BY depname) FROM empsalary ORDER BY depname, salary;

" | psql

echo "select * from empsalary;" | psql

echo "  SELECT depname, empno, salary, sum(salary) OVER (PARTITION BY depname) FROM empsalary ORDER BY depname,
salary;
" | psql

$ ./snafu.sh
Timing is on.
DROP TABLE
Time: 1.093 ms
CREATE TABLE
Time: 80.161 ms
INSERT 0 10
Time: 11.964 ms
Timing is on.
  depname  | empno | salary | enroll_date
-----------+-------+--------+-------------
 develop   |    10 |   5200 | 2007-08-01
 sales     |     1 |   5000 | 2006-10-01
 personnel |     5 |   3500 | 2007-12-10
 sales     |     4 |   4800 | 2007-08-08
 personnel |     2 |   3900 | 2006-12-23
 develop   |     7 |   4200 | 2008-01-01
 develop   |     9 |   4500 | 2008-01-01
 sales     |     3 |   4800 | 2007-08-01
 develop   |     8 |   6000 | 2006-10-01
 develop   |    11 |   5200 | 2007-08-15
(10 rows)

Time: 1.854 ms
Timing is on.
connection to server was lost

So, to repeat, this runs fine on a server compiled for speed.

I haven't looked any further (to see whether more statements are faulty).








Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"Erik Rijkers"
Date:
On Fri, January 3, 2014 00:09, Erik Rijkers wrote:


> connection to server was lost
>
> So, to repeat, this runs fine on a server compiled for speed.
>

I forgot to append the log messages:

2014-01-03 00:19:17.073 CET 14054 LOG:  database system is ready to accept connections
TRAP: FailedAssertion("!(!((bool) ((invtransfn_oid) != ((Oid) 0))))", File: "parse_agg.c", Line: 1255)
2014-01-03 00:19:29.605 CET 14054 LOG:  server process (PID 14143) was terminated by signal 6: Aborted
2014-01-03 00:19:29.605 CET 14054 DETAIL:  Failed process was running: SELECT depname, empno, salary, sum(salary) OVER
(PARTITION BY depname) FROM empsalary ORDER BY depname, salary;
2014-01-03 00:19:29.605 CET 14054 LOG:  terminating any other active server processes
2014-01-03 00:19:29.607 CET 14054 LOG:  all server processes terminated; reinitializing
etc. etc.





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Jan 3, 2014 at 12:23 PM, Erik Rijkers <er@xs4all.nl> wrote:
On Fri, January 3, 2014 00:09, Erik Rijkers wrote:


> connection to server was lost
>
> So, to repeat, this runs fine on a server compiled for speed.
>

I forgot to append the log messages:

2014-01-03 00:19:17.073 CET 14054 LOG:  database system is ready to accept connections
TRAP: FailedAssertion("!(!((bool) ((invtransfn_oid) != ((Oid) 0))))", File: "parse_agg.c", Line: 1255)
2014-01-03 00:19:29.605 CET 14054 LOG:  server process (PID 14143) was terminated by signal 6: Aborted
2014-01-03 00:19:29.605 CET 14054 DETAIL:  Failed process was running: SELECT depname, empno, salary, sum(salary) OVER
(PARTITION BY depname) FROM empsalary ORDER BY depname, salary;
2014-01-03 00:19:29.605 CET 14054 LOG:  terminating any other active server processes
2014-01-03 00:19:29.607 CET 14054 LOG:  all server processes terminated; reinitializing
etc. etc.



hmm, yeah, compiling and testing a build with asserts enabled... That's a good idea!
I probably should have tried that :)
I've attached another patch which should fix this problem.

The single failing SUM(numeric) regression test is still in there, and it is a known failure due to the extra trailing zeros that it can now produce, which would not be present in an unpatched version. It's there purely in the hope of generating some discussion to find out whether we can use inverse transitions for sum(numeric) or not.

Regards

David Rowley
Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 15 December 2013 01:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> I think even the FLOAT case deserves some consideration.  What's the
>> worst-case drift?
>
> Complete loss of all significant digits.
>
> The case I was considering earlier of single-row windows could be made
> safe (I think) if we apply the negative transition function first, before
> incorporating the new row(s).  Then for example if you've got float8 1e20
> followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
> It's not so good with two-row windows though:
>
>     Table       correct sum of          negative-transition
>                 this + next value       result
>     1e20        1e20                    1e20 + 1 = 1e20
>     1           1                       1e20 - 1e20 + 0 = 0
>     0
>
>> In general, folks who do aggregate operations on
>> FLOATs aren't expecting an exact answer, or one which is consistent
>> beyond a certain number of significant digits.
>
> Au contraire.  People who know what they're doing expect the results
> to be what an IEEE float arithmetic unit would produce for the given
> calculation.  They know how the roundoff error ought to behave, and they
> will not thank us for doing a calculation that's not the one specified.
> I will grant you that there are plenty of clueless people out there
> who *don't* know this, but they shouldn't be using float arithmetic
> anyway.
>
>> And Dave is right: how many bug reports would we get about "NUMERIC is
>> fast, but FLOAT is slow"?
>
> I've said this before, but: we can make it arbitrarily fast if we don't
> have to get the right answer.  I'd rather get "it's slow" complaints
> than "this is the wrong answer" complaints.
>

Hi,

Reading over this, I realised that there is a problem with NaN
handling --- once the state becomes NaN, it can never recover. So the
results using the inverse transition function don't match HEAD in
cases like this:

create table t(a int, b numeric);
insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
select a, b,
       sum(b) over(order by a rows between 1 preceding and current row)
  from t;

which in HEAD produces:
 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 |   7
(5 rows)

but with this patch produces:
 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 | NaN
(5 rows)

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Reading over this, I realised that there is a problem with NaN
> handling --- once the state becomes NaN, it can never recover. So the
> results using the inverse transition function don't match HEAD in
> cases like this:

Ouch!  That takes out numeric, float4, and float8 in one fell swoop.

Given the relative infrequency of NaNs in most data, it seems like
it might still be possible to get a speedup if we could use inverse
transitions until we hit a NaN, then do it the hard way until the
NaN is outside the window, then go back to inverse transitions.
I'm not sure though if this is at all practical from an implementation
standpoint.  We certainly don't want the core code knowing about
anything as datatype-specific as a NaN, but maybe the inverse transition
function could have an API that allows reporting "I can't do it here,
fall back to the hard way".
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan9, 2014, at 17:15 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> Reading over this, I realised that there is a problem with NaN
>> handling --- once the state becomes NaN, it can never recover. So the
>> results using the inverse transition function don't match HEAD in
>> cases like this:
> 
> Ouch!  That takes out numeric, float4, and float8 in one fell swoop.

For numeric, it seems that this could be overcome by having the state
be a pair (s numeric, n numeric). s would track the sum of non-NaN
summands and n would track the number of NaN summands. The final
function would return NaN if n > 0 and s otherwise. The pair could
be represented as a value of type numeric[] to avoid having to invent
a new type for this.
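
That scheme can even be mocked up at the SQL level with today's CREATE AGGREGATE (all names below are invented; a real implementation would be in C, and the inverse direction still needs the machinery this patch adds):

-- state[1] = sum of the non-NaN inputs, state[2] = count of NaN inputs
CREATE FUNCTION nan_sum_trans(state numeric[], val numeric)
RETURNS numeric[] AS $$
  SELECT CASE WHEN val = 'NaN'::numeric
              THEN ARRAY[state[1], state[2] + 1]
              ELSE ARRAY[state[1] + val, state[2]]
         END;
$$ LANGUAGE sql STRICT IMMUTABLE;

CREATE FUNCTION nan_sum_final(state numeric[])
RETURNS numeric AS $$
  SELECT CASE WHEN state[2] > 0 THEN 'NaN'::numeric ELSE state[1] END;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE nan_aware_sum(numeric) (
  SFUNC     = nan_sum_trans,
  STYPE     = numeric[],
  FINALFUNC = nan_sum_final,
  INITCOND  = '{0,0}'
);

The inverse transition would then just decrement state[2] for a departing NaN and subtract from state[1] otherwise. (Unlike the real sum(), this toy returns 0 rather than NULL over zero rows.)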

For float4 and float8, wasn't the consensus that the potential
lossy-ness of addition makes this impossible anyway, even without the
NaN issue? But...

> Given the relative infrequency of NaNs in most data, it seems like
> it might still be possible to get a speedup if we could use inverse
> transitions until we hit a NaN, then do it the hard way until the
> NaN is outside the window, then go back to inverse transitions.
> I'm not sure though if this is at all practical from an implementation
> standpoint.  We certainly don't want the core code knowing about
> anything as datatype-specific as a NaN, but maybe the inverse transition
> function could have an API that allows reporting "I can't do it here,
> fall back to the hard way".

that sounds like it might be possible to make things work for float4
and float8 after all, if we can determine whether a particular addition
was lossy or not.

best regards,
Florian Pflug







Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> For float 4 and float8, wasn't the consensus that the potential
> lossy-ness of addition makes this impossible anyway, even without the
> NaN issue? But...

Well, that was my opinion, I'm not sure if it was consensus ;-).
But NaN is an orthogonal problem I think.  I'm not sure whether it
has analogues in other data types.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan9, 2014, at 18:09 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> For float 4 and float8, wasn't the consensus that the potential
>> lossy-ness of addition makes this impossible anyway, even without the
>> NaN issue? But...
> 
> Well, that was my opinion, I'm not sure if it was consensus ;-).

I'd say your example showing how it could produce completely bogus
results was pretty convincing...

> But NaN is an orthogonal problem I think.  I'm not sure whether it
> has analogues in other data types.

Transfer functions which are partially invertible are not that
uncommon, I'd say. Browsing through 9.3's list of aggregate functions,
the following come to mind

max()
  Values smaller than the maximum can be removed, removing the current
  maximum requires a rescan. By remembering the N largest values, the
  number of required rescans can be reduced, but never fully eliminated.
  Same works for min().

bool_or()
  FALSE can be removed, removing TRUE requires a rescan. Could be made
  fully invertible by counting the number of TRUE and FALSE values,
  similar to my suggestion for how to handle NaN for sum(numeric).
  Same works for bool_and().

bit_or()
  Like bool_or(), 0 can be removed, everything else requires a rescan.
  Same works for bit_and().

Plus, any aggregate with a strict transfer function would be in
exactly the same situation regarding NULL as sum(numeric) is regarding
NaN. AFAIK we don't have any such aggregate in core, though.

best regards,
Florian Pflug





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
Hi,

Reading over this, I realised that there is a problem with NaN
handling --- once the state becomes NaN, it can never recover. So the
results using the inverse transition function don't match HEAD in
cases like this:

create table t(a int, b numeric);
insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
select a, b,
       sum(b) over(order by a rows between 1 preceding and current row)
  from t;

which in HEAD produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 |   7
(5 rows)

but with this patch produces:

 a |  b  | sum
---+-----+-----
 1 |   1 |   1
 2 |   2 |   3
 3 | NaN | NaN
 4 |   3 | NaN
 5 |   4 | NaN
(5 rows)


Nice catch! Thanks for having a look at the patch.

Ok, so I thought about this and I don't think it's too big a problem to fix at all. I think it can be handled very similarly to how I'm taking care of NULL values in the window frame. For these, I simply keep a count of them in an int64, and when the last one leaves the aggregate context things can continue as normal.

Lucky for us, all numeric aggregation (and now inverse aggregation) goes through 2 functions: do_numeric_accum() and its new inverse, do_numeric_discard(). Both of these functions operate on a NumericAggState, in which (in the attached) I've changed the isNaN bool field to a NaNCount int64 field. I'm just doing NaNCount++ when we get a NaN value in do_numeric_accum() and NaNCount-- in do_numeric_discard(); in the final functions I'm just checking whether NaNCount > 0.

Though this implementation does fix the reported problem, unfortunately it may have an undesired performance impact for numeric aggregate functions when not used in the context of a window. Let me explain what I mean:

Previously there was some code in do_numeric_accum() which did:

if (state->isNaN || NUMERIC_IS_NAN(newval))
{
  state->isNaN = true;
  return;
}

Which meant that it didn't bother adding new perfectly valid numerics to the aggregate totals when a NaN had been encountered previously. I had to change this to continue on regardless, as we still need to keep the totals just in case all the NaN values are removed and the totals are required once again. This means that the non-window versions of SUM(numeric), AVG(numeric) and the stddev aggregates for numeric pay a price and have to keep on totalling after encountering NaN values. :(

If there were a way to know whether the function was being called in a window context or a normal aggregate context then we could probably almost completely remove that possible performance regression just by skipping the totalling when not in a window context. I really don't know how common NaN values are in the real world, so I can't say how much this matters. I'd hazard a guess that more people would benefit from inverse transitions on numeric types, but I have nothing to back that up.
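
To make that concrete, here's a toy model of the counting logic just described (double stands in for the numeric totals; a sketch following the prose above, not the literal patch code):

#include <math.h>
#include <stdint.h>

typedef struct
{
    double   sumX;      /* running sum of the non-NaN inputs */
    int64_t  N;         /* number of non-NaN inputs in the frame */
    int64_t  NaNcount;  /* number of NaN inputs in the frame */
} ToyAggState;

static void
toy_accum(ToyAggState *state, double newval)
{
    if (isnan(newval))
    {
        state->NaNcount++;
        return;         /* a NaN adds nothing to the totals */
    }
    /* keep totalling even while NaNcount > 0: every NaN may later
     * leave the frame, and the totals must still be right then */
    state->sumX += newval;
    state->N++;
}

static void
toy_discard(ToyAggState *state, double oldval)
{
    if (isnan(oldval))
    {
        state->NaNcount--;
        return;
    }
    state->sumX -= oldval;
    state->N--;
}

static double
toy_sum_final(const ToyAggState *state)
{
    return (state->NaNcount > 0) ? NAN : state->sumX;
}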

I've attached version 2 of the patch which fixes the NaN problem and adds a regression test to cover it.

Thanks again for testing this and finding the problem.

Regards

David Rowley
 
Regards,
Dean

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Jan 10, 2014 at 5:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> Reading over this, I realised that there is a problem with NaN
> handling --- once the state becomes NaN, it can never recover. So the
> results using the inverse transition function don't match HEAD in
> cases like this:

Ouch!  That takes out numeric, float4, and float8 in one fell swoop.

Given the relative infrequency of NaNs in most data, it seems like
it might still be possible to get a speedup if we could use inverse
transitions until we hit a NaN, then do it the hard way until the
NaN is outside the window, then go back to inverse transitions.
I'm not sure though if this is at all practical from an implementation
standpoint.  We certainly don't want the core code knowing about
anything as datatype-specific as a NaN, but maybe the inverse transition
function could have an API that allows reporting "I can't do it here,
fall back to the hard way".


I had thought about that API, though not for numeric, as I think I've managed to find another solution there; it was for MAX and MIN.

but it didn't generate much interest at the time, and I didn't have any ideas on how the inverse aggregate functions would communicate this inability to remove the value to the caller. Perhaps it would still be worth pursuing, but I had put it to the back of my mind in favour of tuplestore indexes that could be created on the fly, based on the row position within the frame and the aggregate's sort operator on the aggregate value.
This would mean that MAX and MIN values could be found quickly all the time, rather than just when the value being removed happened not to affect the current maximum or minimum. It's not something I have planned for this patch though, and I'd have lots of questions around memory allocation before I'd want to start any work on it.

Regards

David Rowley
 
                        regards, tom lane

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:

bool_or()
  FALSE can be removed, removing TRUE requires a rescan. Could be made
  fully invertible by counting the number of TRUE and FALSE values,
  similar to my suggestion for how to handle NaN for sum(numeric).
  Same works for bool_and().

bit_or()
  Like bool_or(), 0 can be removed, everything else requires a rescan.
  Same works for bit_and()


Interesting, I'd not thought of any way to optimise these ones, but I had originally thought about allowing the inverse transition functions to report whether they could perform the inverse transition based on the value they received, and if they reported failure, then performing the rescan.
I just don't quite know the best way for the inverse transition function to communicate this to the caller yet. If you have any ideas on the best way to do this then I'd really like to hear them.

Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 10 January 2014 08:12, David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed <dean.a.rasheed@gmail.com>
> wrote:
>>
>> Hi,
>>
>> Reading over this, I realised that there is a problem with NaN
>> handling --- once the state becomes NaN, it can never recover. So the
>> results using the inverse transition function don't match HEAD in
>> cases like this:
>>
>> create table t(a int, b numeric);
>> insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
>> select a, b,
>>        sum(b) over(order by a rows between 1 preceding and current row)
>>   from t;
>>
>> which in HEAD produces:
>>
>>  a |  b  | sum
>> ---+-----+-----
>>  1 |   1 |   1
>>  2 |   2 |   3
>>  3 | NaN | NaN
>>  4 |   3 | NaN
>>  5 |   4 |   7
>> (5 rows)
>>
>> but with this patch produces:
>>
>>  a |  b  | sum
>> ---+-----+-----
>>  1 |   1 |   1
>>  2 |   2 |   3
>>  3 | NaN | NaN
>>  4 |   3 | NaN
>>  5 |   4 | NaN
>> (5 rows)
>>
>
> Nice catch! Thanks for having a look at the patch.
>
> Ok, so I thought about this and I don't think it's too big a problem at to
> fix it all. I think it can be handled very similar to how I'm taking care of
> NULL values in window frame. For these, I simply keep a count of them in an
> int64 and when the last one leaves the aggregate context things can continue
> as normal.
>
> Lucky for us that all numeric aggregation (and now inverse aggregation) goes
> through 2 functions. do_numeric_accum() and the new inverse version of it
> do_numeric_discard(), both these functions operate on a NumericAggState
> which in the attached I've changed the isNaN bool field to a NaNCount int64
> field. I'm just doing NaNCount++ when we get a NaN value in do_numeric_accum
> and NaNCount-- in do_numeric_discard(), in the final functions I'm just
> checking if NaNCount > 0.
>

Cool, that sounds like a neat fix.


> Though this implementation does fix the reported problem unfortunately it
> may have an undesired performance impact for numeric aggregate functions
> when not uses in the context of a window.. Let me explain what I mean:
>
> Previously there was some code in do_numeric_accum() which did:
>
> if (state->isNaN || NUMERIC_IS_NAN(newval))
> {
>   state->isNaN = true;
>   return;
> }
>
> Which meant that it didn't bother adding new perfectly valid numerics to the
> aggregate totals when there was an NaN encountered previously. I had to
> change this to continue on regardless as we still need to keep the totals
> just in case all the NaN values are removed and the totals are required once
> again. This means that the non-window version of SUM(numeric) and
> AVG(numeric) and and the stddev aggregates for numeric pay a price and have
> to keep on totaling after encountering NaN values. :(
>

I suspect that NaNs almost never occur in practice, so the fact that
it might now take longer to tell you that the sum is NaN doesn't worry
me. More important is that it always gives the right answer.

Note, if anyone ever did this for floats, +/- Infinity would also need
to be handled, so you'd have to maintain 3 counts and deal with logic
like Infinity - Infinity = NaN.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan10, 2014, at 09:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> I just don't quite know the best way for the inverse transition function to
> communicate this to the caller yet. If you have any ideas on the best way to
> do this then I'd really like to hear them.

Could they maybe just return NULL as the new state? It would mean that
aggregates that do want to provide an inverse transition function couldn't
use NULL as a valid aggregate state, but do we need to support that?

Looking at the code it seems that for quite a few existing aggregates,
the state remains NULL until the first non-NULL input is processed. But
that doesn't hurt much - those aggregates can remain as they are until
someone wants to add an inverse transfer function. Once that happens,
there's a choice between living with needless rescans on trailing NULL
inputs or changing the state type.

This solution isn't particularly pretty, but I don't currently see a good
alternative that allows implementing inverse transfer functions in something
other than C and avoids needless overhead for those which are written in C.
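
For illustration, under that convention an inverse transition function for something like max(int4) could look roughly like this (entirely hypothetical code; nothing like it exists in the patch):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(int4_max_inv);

/* Hypothetical inverse transition for max(int4). Returning NULL
 * would mean "cannot invert this removal, rescan the frame". */
Datum
int4_max_inv(PG_FUNCTION_ARGS)
{
    int32   state = PG_GETARG_INT32(0);
    int32   leaving = PG_GETARG_INT32(1);

    if (leaving < state)
        PG_RETURN_INT32(state); /* a smaller value left: max unchanged */

    PG_RETURN_NULL();           /* the current max left: force a rescan */
}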

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Jan10, 2014, at 09:34 , David Rowley <dgrowleyml@gmail.com> wrote:
>> I just don't quite know the best way for the inverse transition function to
>> communicate this to the caller yet. If you have any ideas on the best way to
>> do this then I'd really like to hear them.

> Could they maybe just return NULL as the new state? It would mean that
> aggregates that do want to provide an inverse transition function couldn't
> use NULL as a valid aggregate state, but do we need to support that?

Yeah, I was going to suggest the same.  Seems like it wouldn't be that
difficult to come up with some alternative representation for "everything
seen so far was NULL", if you needed to.

> Looking at the code it seems that for quite a few existing aggregates,
> the state remains NULL until the first non-NULL input is processed. But
> that doesn't hurt much - those aggregates can remain as they are until
> someone wants to add an inverse transfer function. Once that happens,
> there's a choice between living with needless rescans on trailing NULL
> inputs or changing the state type.

Also, it might be reasonable for both the regular and the inverse
transition functions to be strict.  If a null entering the window
does not matter, then a null exiting the window doesn't either, no?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan10, 2014, at 15:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> Looking at the code it seems that for quite a few existing aggregates,
>> the state remains NULL until the first non-NULL input is processed. But
>> that doesn't hurt much - those aggregates can remain as they are until
>> someone wants to add an inverse transfer function. Once that happens,
>> there's a choice between living with needless rescans on trailing NULL
>> inputs or changing the state type.
>
> Also, it might be reasonable for both the regular and the inverse
> transition functions to be strict.  If a null entering the window
> does not matter, then a null exiting the window doesn't either, no?

That's not true, I think, unless we're special-casing strict transition
functions somewhere. AFAICS, an aggregate with a strict transition function
will produce the state NULL whenever any of the inputs was NULL, i.e. we won't
ever transition out of the NULL state once we got there. The inverse
transition function, however, would *have* to be able to transition away
from the NULL state, which requires it to be non-strict. Otherwise, how would
the state ever become non-NULL after the last NULL input leaves the window?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Jan10, 2014, at 15:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, it might be reasonable for both the regular and the inverse
>> transition functions to be strict.  If a null entering the window
>> does not matter, then a null exiting the window doesn't either, no?

> That's not true, I think, unless we're special-casing strict transition
> functions somewhere. AFAICS, an aggregate with a strict transition function
> will produce the state NULL whenever any of the inputs was NULL, i.e. we won't
> ever transition out of the NULL state once we got there.

Nope, not the case; read xaggr.sgml and/or the CREATE AGGREGATE reference
page.  An aggregate with a strict transition function essentially just
ignores null input rows.  I suspect the inverse transition function could
just be made strict with a similar special-case rule (viz, keep the old
transition value when deleting a null input from the window); but maybe
I'm missing something and it has to work harder than that anyway.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Kevin Grittner
Date:
Florian Pflug <fgp@phlo.org> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Florian Pflug <fgp@phlo.org> writes:
>>> For float 4 and float8, wasn't the consensus that the potential
>>> lossy-ness of addition makes this impossible anyway, even
>>> without the NaN issue? But...
>>
>> Well, that was my opinion, I'm not sure if it was consensus ;-).
>
> I'd say your example showing how it could produce completely
> bogus results was pretty convincing...

Aggregates on approximate (floating-point) numbers are not nearly
as consistent as many people probably assume.  Picture for a minute
a table where a column contains positive floating point numbers
that happen to be located in the heap in increasing order, perhaps as
the result of a CLUSTER on an index on the column.  SELECT
sum(colname) FROM tablename; would tend to give the most accurate
answer possible when a sequence scan was run -- unless there
happened to be a seqscan already half-way through the heap.  Then
the result would be different.  So the exact same query against the
exact same data, with no intervening modifications or maintenance
activity could give one answer most of the time, and give various
other answers depending on concurrent SELECT queries.

Given that this is already the case with aggregates on floating
point approximate numbers, why should we rule out an optimization
which only makes rounding errors more likely to be visible?  The
real issue here is that if you are using an approximate data type
and expecting exact answers, you will have problems.

That's not to say that approximations are useless.  If you
represent the circumference of the earth with a double precision
number you're dealing with an expected rounding error of about a
foot.  That's close enough for many purposes.  The mistake is
assuming that it will be exact or that rounding errors cannot
accumulate.  In situations where SQL does not promise particular
ordering of operations, it should not be assumed; so any
expectations of a specific or repeatable result from a sum or
average of approximate numbers is misplaced.

>> But NaN is an orthogonal problem I think.

Agreed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Aggregates on approximate (floating-point) numbers are not nearly
> as consistent as many people probably assume.  Picture for a minute
> a table where a column contains positive floating point numbers that
> happen to be located in the heap in increasing order, perhaps as
> the result of a CLUSTER on an index on the column.  SELECT
> sum(colname) FROM tablename; would tend to give the most accurate
> answer possible when a sequence scan was run -- unless there
> happened to be a seqscan already half-way through the heap.  Then
> the result would be different.

I don't think that argument holds any water.  In the first place, somebody
could turn off synchronize_seqscans if they needed to have the calculation
done the same way every time (and I recall questions from users who ended
up doing exactly that, shortly after we introduced synchronize_seqscans).
In the second place, for most use-cases it'd be pretty foolish to rely on
physical heap order, so somebody who was really trying to sum float8s
accurately would likely do
      select sum(x order by x) from ...

This is a well-defined, numerically stable calculation, and I don't want
to see us put in non-defeatable optimizations that break it.

> The real issue here is that if you are using an approximate data type
> and expecting exact answers, you will have problems.

That's a canard.  People who know what they're doing (admittedly a
minority) do not expect exact answers, but they do expect to be able to
specify how to do the calculation in a way that minimizes roundoff errors.
The inverse-transition-function approach breaks that, and it does so at a
level where the user can't work around it, short of building his own
aggregates.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
I wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> The real issue here is that if you are using an approximate data type
>> and expecting exact answers, you will have problems.

> That's a canard.  People who know what they're doing (admittedly a
> minority) do not expect exact answers, but they do expect to be able to
> specify how to do the calculation in a way that minimizes roundoff errors.
> The inverse-transition-function approach breaks that, and it does so at a
> level where the user can't work around it, short of building his own
> aggregates.

Although, having said that ... maybe "build your own aggregate" would
be a reasonable suggestion for people who need this?  I grant that
it's going to be a minority requirement, maybe even a small minority
requirement.  People who have the chops to get this sort of thing right
can probably manage a custom aggregate definition.

The constraint this would pose on the float4 and float8 implementations
is that it be possible to use their transition and final functions in
a custom aggregate declaration while leaving off the inverse function;
or, if that combination doesn't work for some reason, we have to continue
to provide the previous transition/final functions for use in user
aggregates.
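
For example, something along these lines ought to give such a user back today's behaviour (a sketch, assuming the built-in float8pl remains usable as a plain transition function the way it is for the current sum(float8); the aggregate and table names are invented):

CREATE AGGREGATE sum_noinv(float8) (
    SFUNC = float8pl,
    STYPE = float8
);

-- combined with an explicit ordering to control roundoff:
SELECT sum_noinv(x ORDER BY x) FROM mytable;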

Suitable documentation would be needed too, of course.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan10, 2014, at 18:14 , Kevin Grittner <kgrittn@ymail.com> wrote:
> Given that this is already the case with aggregates on floating
> point approximate numbers, why should we rule out an optimization
> which only makes rounding errors more likely to be visible?  The
> real issue here is that if you are using an approximate data type
> and expecting exact answers, you will have problems.

Because without the optimization, only the values which you
*actually* process for a given result determine whether you lose
precision or not. With the optimization, OTOH, values which have
*nothing* to do with the result in question can nevertheless
make it completely bogus.

SUM() is a good example. As long as all your values are positive,
the amount of precision you lose is bound by the number of input
values. If I sum over 10 values, the worst that can happen is that
the first values is large enough to prevent the other 9 values from
influencing the result. That limits the relative error to something
like 9*epsilon, where epsilon is the relative precision of the
floating point type, i.e. 1e-15 or so for double. In other words,
as long as your frames are less than 10e13 rows long, the relative
error will stay below 1%.

But with the optimization, that is no longer true. If you sum
from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error
of the result in one row now depends on the magnitude of values
*preceding* that row, even though that value isn't in the frame.
And since we now internally subtract, not only add, the relative
error is no longer bound by the number of rows in the frame.

Here's the corresponding SELECT (which is basically the same
as Tom's example upthread):

select
  n,
  x::float,
  sum(x::float) over (
    order by n rows between current row and unbounded following
  )
  from (values
    (1, 1e20),
    (2, 1),
    (3, 2)
  ) as t(n, x)
  order by n;

Currently that returns

  n |   x   |  sum
 ---+-------+-------
  1 | 1e+20 | 1e+20
  2 |     1 |     3
  3 |     2 |     2

but with an inverse transfer function, it may very well return

  n |   x   |  sum
 ---+-------+-------
  1 | 1e+20 | 1e+20
  2 |     1 |     0
  3 |     2 |    -1

> That's not to say that approximations are useless.  If you
> represent the circumference of the earth with a double precision
> number you're dealing with an expected rounding error of about a
> foot.  That's close enough for many purposes.  The mistake is
> assuming that it will be exact or that rounding errors cannot
> accumulate.  In situations where SQL does not promise particular
> ordering of operations, it should not be assumed; so any
> expectations of a specific or repeatable result from a sum or
> average of approximate numbers is misplaced.

But this isn't about ordering, it's replacing one computation
with a completely different one that just happens to be equivalent
*algebraically*.

To me, the proposed optimization for float is akin to a C compiler
which decided to evaluate
  a + b + c + … + z
as
  -a + (2a - b) + (2b - c) + … + (2y - z) + 2z
Algebraically, these are the same, but it'd still be insane to
do that.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Kevin Grittner <kgrittn@ymail.com> writes:
>>> The real issue here is that if you are using an approximate data type
>>> and expecting exact answers, you will have problems.
> 
>> That's a canard.  People who know what they're doing (admittedly a
>> minority) do not expect exact answers, but they do expect to be able to
>> specify how to do the calculation in a way that minimizes roundoff errors.
>> The inverse-transition-function approach breaks that, and it does so at a
>> level where the user can't work around it, short of building his own
>> aggregates.
> 
> Although, having said that ... maybe "build your own aggregate" would
> be a reasonable suggestion for people who need this?  I grant that
> it's going to be a minority requirement, maybe even a small minority
> requirement.  People who have the chops to get this sort of thing right
> can probably manage a custom aggregate definition.

So we'd put a footgun into the hands of people who don't know what they're
doing, to be fired for performance's sake, and leave it to the people
who know what they are doing to put the safety on?

best regards,
Florian Pflug



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Although, having said that ... maybe "build your own aggregate" would
>> be a reasonable suggestion for people who need this?  I grant that
>> it's going to be a minority requirement, maybe even a small minority
>> requirement.  People who have the chops to get this sort of thing right
>> can probably manage a custom aggregate definition.

> So we'd put a footgun into the hands of people who don't know what they're
> doing, to be fired for performance's sake, and leave it to the people
> who know what they are doing to put the safety on?

If I may put words in Kevin's mouth, I think his point is that having
float8 sum() at all is a foot-gun, and that's hard to deny.  You need
to know how to use it safely.

A possible compromise might be to provide these alternative "safer"
aggregates built-in.  Or, depending on what color you like your bikeshed,
leave the standard aggregates alone and define "fast_sum" etc for the less
safe versions.  In any case it'd be incumbent on us to document the
tradeoffs.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Mark Dilger
Date:
FYI, I'm using the verb "rewind" to talk about using the negative transition aggregation function to get a prior value.  I don't know if this is the right verb.

Conceptually, when aggregating over floating point numbers, there is some infinitely precise theoretical value, and the computation is approximating it.  Call the infinitely precise value 'r'.  Call the computed value 'c', which is the result of the aggregation function.  (For float4_agg and float8_agg, typeof(c) == float8).

The problem you have with rewinding an aggregate is that you don't know if you are getting the same value of c that you would have gotten from a rescan.  But if you have a type that tracks a margin [min,max] where typeof(min) == typeof(max) is higher precision than typeof(c), then you can track:

min <= r <= max

By setting the rounding mode down, then up, when computing the next value of min and max, respectively.  (Extra flag bits or booleans could track whether you've encountered +inf, -inf, NaN, and any other oddball cases, with corresponding special logic that has been discussed already upthread.)  In many but not all cases:

min != max

but

(typeof(c))min == (typeof(c))max

Because the margin of error is small enough not to render different values when cast to the lower precision typeof(c).

You could rewind the aggregation whenever this second case holds, and only force a rescan when it does not.  This would render the same results for queries whether they were performed with rewinds or with rescans.  The results might differ from older versions of postgres, but only in that they might be more accurate, with less accumulated rounding errors, owing to the higher precision state transition variable.

For many modern platforms, typeof(min) could be __float128 using libquadmath, or something similar to that.  If not available at compile time, it could be float64 instead.  Even then, you'd still know that rewinding was possible when min == max and not otherwise, which is useful for cases of aggregation over exact values.

I admit I've done a bit of handwaving on the computation of the margin and the handling of floating-point rounding issues, but I believe the implementation details are tractable.
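
For what it's worth, here is a toy version of the margin bookkeeping for a float4 sum, using double-precision bounds and C99 directed rounding (invented names, and glossing over the FENV_ACCESS pragma and error handling a real version would need):

#include <fenv.h>
#include <stdbool.h>

typedef struct
{
    double  min;    /* lower bound on the infinitely precise sum r */
    double  max;    /* upper bound on r */
} MarginState;

static void
margin_add(MarginState *st, float val)
{
    int     saved = fegetround();

    fesetround(FE_DOWNWARD);
    st->min += val;     /* rounded toward -infinity */
    fesetround(FE_UPWARD);
    st->max += val;     /* rounded toward +infinity */
    fesetround(saved);
}

/* The "rewind" step maintains the bounds the same way. */
static void
margin_remove(MarginState *st, float val)
{
    int     saved = fegetround();

    fesetround(FE_DOWNWARD);
    st->min -= val;
    fesetround(FE_UPWARD);
    st->max -= val;
    fesetround(saved);
}

/* A rewound result is trusted only when both bounds collapse to the
 * same float4 value; otherwise fall back to a rescan. */
static bool
margin_rewindable(const MarginState *st)
{
    return (float) st->min == (float) st->max;
}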


mark


On Friday, January 10, 2014 10:10 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan10, 2014, at 18:14 , Kevin Grittner <kgrittn@ymail.com> wrote:
> Given that this is already the case with aggregates on floating
> point approximate numbers, why should we rule out an optimization
> which only makes rounding errors more likely to be visible?  The
> real issue here is that if you are using an approximate data type
> and expecting exact answers, you will have problems.

Because without the optimization, only the values which you
*actually* process for a given result determine whether you lose
precision or not. With the optimization, OTOH, values which have
*nothing* to do with the result in question can nevertheless
make it completely bogus.

SUM() is a good example. As long as all your values are positive,
the amount of precision you lose is bound by the number of input
values. If I sum over 10 values, the worst that can happen is that
the first value is large enough to prevent the other 9 values from
influencing the result. That limits the relative error to something
like 9*epsilon, where epsilon is the relative precision of the
floating point type, i.e. 1e-15 or so for double. In other words,
as long as your frames are less than 1e13 rows long, the relative
error will stay below 1%.

But with the optimization, that is no longer true. If you sum
from, say, CURRENT ROW to UNBOUNDED FOLLOWING, the relative error
of the result in one row now depends on the magnitude of values
*preceding* that row, even though that value isn't in the frame.
And since we now internally subtract, not only add, the relative
error is no longer bound by the number of rows in the frame.

Here's the corresponding SELECT (which is basically the same
as Tom's example upthread):
select
  n,
  x::float,
  sum(x::float) over (
    order by n rows between current row and unbounded following
  )
  from (values
    (1, 1e20),
    (2, 1),
    (3, 2)
  ) as t(n, x)
  order by n;

Currently that returns
  n |  x  |  sum 
  ---+-------+-------
  1 | 1e+20 | 1e+20
  2 |    1 |    3
  3 |    2 |    2

but with an inverse transfer function, it may very well return
  n |  x  |  sum 
  ---+-------+-------
  1 | 1e+20 | 1e+20
  2 |    1 |    0
  3 |    2 |    -1

> That's not to say that approximations are useless.  If you
> represent the circumference of the earth with a double precision
> number you're dealing with an expected rounding error of about a
> foot.  That's close enough for many purposes.  The mistake is
> assuming that it will be exact or that rounding errors cannot
> accumulate.  In situations where SQL does not promise particular
> ordering of operations, it should not be assumed; so any
> expectations of a specific or repeatable result from a sum or
> average of approximate numbers is misplaced.

But this isn't about ordering, it's replacing one computation
with a completely different one that just happens to be equivalent
*algebraically*.

To me, the proposed optimization for float is akin to a C compiler
which decided to evaluate
  a + b + c + … z
as
  -a + (2a - b) + (2b - c) + … + (2y - z) + 2z
Algebraically, these are the same, but it'd still be insane to
do that.

best regards,

Florian Pflug





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 11, 2014 at 7:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Although, having said that ... maybe "build your own aggregate" would
be a reasonable suggestion for people who need this?  I grant that
it's going to be a minority requirement, maybe even a small minority
requirement.  People who have the chops to get this sort of thing right
can probably manage a custom aggregate definition.


I more or less wrote off the idea of inverse transition functions for the float types after your example upthread. I had thought that perhaps if we could get inverse transitions in there for SUM(numeric) then people who need more speed could just cast their value to numeric then back to float or double precision after aggregation takes place. I had to delay writing any documentation around that as I'm still not sure if we can have sum(numeric) use an inverse transition function, due to the fact that it can introduce extra zeros after the decimal point.

As the patch stands at the moment, I currently have a regression test which currently fails due to these extra zeros after the decimal point:

-- This test currently fails due to extra trailing 0 digits.
SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

Patched produces:
 6.01
 5.00
 3.00
Unpatched produces:
 6.01
 5
 3


With inverse transitions this query still produces correct results, it just does not produce the numeric in the same format as it does without performing inverse transitions. Personally I'd rather focus on trying to get SUM(numeric) in there for 9.4 and maybe look at the floating point stuff at a later date, as casting to numeric can be the workaround for users who complain about the speed. Or if they really want they can create their own aggregate, using an existing built-in function as the inverse transition, like float8_mi.

There are certain things that currently seem a bit magical to me when it comes to numeric; for example, I've no idea why the following query produces 20 zeros after the decimal point for 1 and only 16 for 2.

select n::numeric / 1 from generate_series(1,2) g(n);

To me it does not look very consistent at all and I'm really wondering if there is some special reason why we bother including the useless zeros at the end at all. I've written a patch which gets rid of them in numeric_out, but I had not planned on posting it here in case it gets laughed off stage due to some special reason we have for keeping those zeros that I don't know about.

Can anyone explain to me why we have these unneeded zeros in numeric when the precision is not supplied?

Regards

David Rowley
 
The constraint this would pose on the float4 and float8 implementations
is that it be possible to use their transition and final functions in
a custom aggregate declaration while leaving off the inverse function;
or, if that combination doesn't work for some reason, we have to continue
to provide the previous transition/final functions for use in user
aggregates.

Suitable documentation would be needed too, of course.

                        regards, tom lane

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Jim Nasby
Date:
On 1/10/14, 1:07 PM, Tom Lane wrote:
> Florian Pflug<fgp@phlo.org>  writes:
>> >On Jan10, 2014, at 19:08 , Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> >>Although, having said that ... maybe "build your own aggregate" would
>>> >>be a reasonable suggestion for people who need this?  I grant that
>>> >>it's going to be a minority requirement, maybe even a small minority
>>> >>requirement.  People who have the chops to get this sort of thing right
>>> >>can probably manage a custom aggregate definition.
>> >So we'd put a footgun into the hands of people who don't know what they're
>> >doing, to be fired for performance's sake, and leave it to the people
>> >who know what they are doing to put the safety on?
> If I may put words in Kevin's mouth, I think his point is that having
> float8 sum() at all is a foot-gun, and that's hard to deny.  You need
> to know how to use it safely.

And IMHO if you've got something that's going to produce bad data if you do it wrong, I'd rather that the error be as large as possible so that you're more likely to discover it and fix it...
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan10, 2014, at 17:46 , Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Florian Pflug <fgp@phlo.org> writes:
>> On Jan10, 2014, at 15:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Also, it might be reasonable for both the regular and the inverse
>>> transition functions to be strict.  If a null entering the window
>>> does not matter, then a null exiting the window doesn't either, no?
>
>> That's not true, I think, unless we're special-casing strict transition
>> functions somewhere. AFAICS, an aggregate with a strict transition function
>> will produce the state NULL whenever any of the inputs was NULL, i.e. we won't
>> ever transition out of the NULL state once we got there.
>
> Nope, not the case; read xaggr.sgml and/or the CREATE AGGREGATE reference
> page.  An aggregate with a strict transition function essentially just
> ignores null input rows.

I wasn't aware of that, sorry for the noise.

> I suspect the inverse transition function could
> just be made strict with a similar special-case rule (viz, keep the old
> transition value when deleting a null input from the window); but maybe
> I'm missing something and it has to work harder than that anyway.

This seems to hold at least if the state never becomes NULL, i.e. if there's
a non-NULL initial state (S) and the transfer function never returns NULL. If
the transfer function (T) and inverse transfer function (I) obey

  I(T(...T(T(S,a),b)...,z), a) = T(...T(S,b)...,z)

for a non-NULL a, then setting T(s,NULL) = I(s,NULL) = s makes the same
thing work for NULL and non-NULL values of a.

If, however, a strict transfer function T ever returns NULL, things break.
If T(S,a) is NULL in the above, the whole expression becomes NULL because
T is strict (we only special-case the *value* input, not the state input).
But T(...T(S,b)...,z) might very well be non-NULL. That's pretty much
the same situation, unsurprisingly, as with NaNs.

A variant of that problem arises also if the initial state is NULL.
We then wait for the first non-NULL value, and use that as the initial
state. When we later apply the inverse transition function, we might or
might not have to reset the state back to NULL, depending on whether there
are other non-NULL values in the frame or not. This case can be solved by
tracking the number of non-NULL rows in the current frame - whenever
that number is 0, the output value is NULL; otherwise it's determined
by the state.

So in conclusion, things work for pairs of strict transfer and inverse
transfer functions AFAICS, as long as we do the non-null tracking and
complain should the transfer function ever return NULL.
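For example, with that tracking in place one would expect (a minimal illustrative query, not from the patch's regression tests):

  SELECT sum(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM (VALUES (1,1),(2,NULL),(3,NULL)) v(i,n);

to return 1, NULL, NULL - the frames of the last two rows contain only NULL inputs, so the tracked non-NULL count drops to 0 there.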

best regards,
Florian Pflug






Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan11, 2014, at 01:24 , Jim Nasby <jim@nasby.net> wrote:
> On 1/10/14, 1:07 PM, Tom Lane wrote:
>> Florian Pflug<fgp@phlo.org>  writes:
>>> On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Although, having said that ... maybe "build your own aggregate" would
>>>> be a reasonable suggestion for people who need this?  I grant that
>>>> it's going to be a minority requirement, maybe even a small minority
>>>> requirement.  People who have the chops to get this sort of thing right
>>>> can probably manage a custom aggregate definition.
>>> So we'd put a footgun into the hands of people who don't know what they're
>>> doing, to be fired for performance's sake, and leave it to the people
>>> who know what they are doing to put the safety on?
>> If I may put words in Kevin's mouth, I think his point is that having
>> float8 sum() at all is a foot-gun, and that's hard to deny.  You need
>> to know how to use it safely.
>
> And IMHO if you've got something that's going to produce bad data if you do it
> wrong, I'd rather that the error be as large as possible so that you're more
> likely to discover it and fix it...

To that principle, I agree, I just don't think it applies here. An inverse
transition function greatly *increases* the chance of bogus results for
sum(float). It also breaks some rather natural assumptions that one might
make about sum(float)'s behaviour. For example, sums over single-element
frames currently return the one row's value unchanged. That's no longer
universally true with an inverse transition function. Even for an approximate
type, that's a bit too weird for my taste.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Robert Haas
Date:
On Fri, Jan 10, 2014 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Jan10, 2014, at 19:08 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Although, having said that ... maybe "build your own aggregate" would
>>> be a reasonable suggestion for people who need this?  I grant that
>>> it's going to be a minority requirement, maybe even a small minority
>>> requirement.  People who have the chops to get this sort of thing right
>>> can probably manage a custom aggregate definition.
>
>> So we'd put a footgun into the hands of people who don't know what they're
>> doing, to be fired for performance's sake, and leave it to the people
>> who know what they are doing to put the safety on?
>
> If I may put words in Kevin's mouth, I think his point is that having
> float8 sum() at all is a foot-gun, and that's hard to deny.  You need
> to know how to use it safely.

Yeah, but Florian's point is that not all foot-guns are created equal.
The fact that we're walking around with a loaded BB-gun in our hip
pocket is not a good reason to replace it with a howitzer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan10, 2014, at 22:27 , David Rowley <dgrowleyml@gmail.com> wrote:
> As the patch stands at the moment, I currently have a regression test
> which currently fails due to these extra zeros after the decimal point:
>
> -- This test currently fails due to extra trailing 0 digits.
> SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
>   FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);
>
> Patched produces:
>  6.01
>  5.00
>  3.00
> Unpatched produces:
>  6.01
>  5
>  3

Hm, that's annoying. I checked the standard to see if it offers any guidance
here, but it doesn't look like it does - the standard just says that SUM() ought
to return a type with the same scale as the input type has. That's fine if
every NUMERIC type has a fixed scale, but that's not the case in postgres -
we allow values of unconstrained NUMERIC types (i.e., numeric types without an
explicitly specified precision or scale) to each define their own scale.

To fix this, we'd have to track the maximum scale within the current frame.
That's easier than the general problem of providing an inverse transition
function for MAX, because AFAIK we limit the scale to at most 1000. So it'd
be sufficient to track the number of times we saw each scale, and also the
current maximum. Once we reduce the current maximum's count back to zero
in the inverse transition function, we'd scan from that value to the left to
find the next non-zero entry.

We could also choose to ignore this, although I'm not sure I really like that.
It seems entirely OK at first sight - after all, the values all stay the same,
we just emit a different number of trailing zeros. But it still causes results
to be affected by values, even if only in the number of trailing zeros, which
lie outside the frame's range. That seems like more non-determinism than a
database should show.

> With inverse transitions this query still produces correct results, it just does
> not produce the numeric in the same format as it does without performing inverse
> transitions. Personally I'd rather focus on trying to get SUM(numeric) in there
> for 9.4

I think it'd be worthwhile to get this into 9.4, if that's still an option,
even if we only support COUNT.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> I think it'd be worthwhile to get this into 9.4, if that's still an option,
> even if we only support COUNT.

My thought is

(1) we can certainly implement inverse transitions for COUNT() and the
integer variants of SUM(), and that alone would be a killer feature for
some people.

(2) the float and numeric variants should be implemented under nondefault
names (I'm thinking FAST_SUM(), but bikeshed away).  People who need
extra speed and don't mind the slightly different results can alter
their queries to use these variants.

One reason I'm thinking this is that whatever we do to ameliorate
the semantic issues is going to slow down the forward transition
function --- to no benefit unless the aggregate is being used as a
window function in a moving window.  So I'm less than convinced
that we *should* implement any of these designs in the default
aggregates, even if we get to the point where we arguably *could*
do it with little risk of functional differences.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Gavin Flower
Date:
On 14/01/14 14:29, Tom Lane wrote:
[...]
> (2) the float and numeric variants should be implemented under 
> nondefault names (I'm thinking FAST_SUM(), but bikeshed away). People 
> who need extra speed and don't mind the slightly different results can 
> alter their queries to use these variants. One reason I'm thinking 
> this is that whatever we do to ameliorate the semantic issues is going 
> to slow down the forward transition function --- to no benefit unless 
> the aggregate is being used as a window function in a moving window. 
> So I'm less than convinced that we *should* implement any of these 
> designs in the default aggregates, even if we get to the point where 
> we arguably *could* do it with little risk of functional differences. 
> regards, tom lane 
How about SUM_FAST() instead? Then it will be more likely to be close to
SUM() in an index?


Cheers,
Gavin



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Jan 14, 2014 at 2:00 PM, Florian Pflug <fgp@phlo.org> wrote:
On Jan10, 2014, at 22:27 , David Rowley <dgrowleyml@gmail.com> wrote:
> As the patch stands at the moment, I currently have a regression test
> which currently fails due to these extra zeros after the decimal point:
>
> -- This test currently fails due to extra trailing 0 digits.
> SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
>   FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);
>
> Patched produces:
>  6.01
>  5.00
>  3.00
> Unpatched produces:
>  6.01
>  5
>  3

Hm, that's annoying. I checked the standard to see if it offers any guidance
here, but it doesn't look like it does - the standard just says that SUM() ought
to return a type with the same scale as the input type has. That's fine if
every NUMERIC type has a fixed scale, but that's not the case in postgres -
we allow values of unconstrained NUMERIC types (i.e., numeric types without an
explicitly specified precision or scale) to each define their own scale.


Thanks for digging that out of the standard, and thanks for all the information you supplied here http://www.postgresql.org/message-id/0FA6C08E-2166-405B-83F7-63B196B88CA3@phlo.org too. Sorry I've not had the chance to reply yet. I was kind of hoping that the answer would be more in my favour, to help with inverse transitions for sum(numeric).

 
To fix this, we'd have to track the maximum scale within the current frame.
That's easier than the general problem of providing an inverse transition
function for MAX, because AFAIK we limit the scale to at most 1000. So it'd
be sufficient to track the number of times we saw each scale, and also the
current maximum. Once we reduce the current maximum's count back to zero
in the inverse transition function, we'd scan from that value to the left to
find the next non-zero entry.


I've been thinking about this, but I had thought that the maximum dscale was bigger than 1000. The docs seem to claim 16383 here --> http://www.postgresql.org/docs/devel/static/datatype-numeric.html. I'd go ahead and implement this if that number were smaller, but I'm thinking that zeroing out an array of 16383 elements on the first call to do_numeric_accum might be too big an overhead to write off as "background noise". If it were 20 or even 100 then I'd probably try for that.

I think the overhead for each call after that would likely be ok as it would probably just be an operation like
state->scaleCount[X.dscale]++; which I would imagine would be a very small percentage overhead on normal aggregate functions. Of course, the inverse would have to do the harder work of looping backwards over the array until it found an element with a count above 0, and setting that as the current maximum. I think this would be a winner if it wasn't for the high initial hit of zeroing that 16383 element array... or 1000, whichever it is.

 
We could also choose to ignore this, although I'm not sure I really like that.
It seems entirely OK at first sight - after all, the values all stay the same,
we just emit a different number of trailing zeros. But it still causes results
to be affected by values, even if only in the number of trailing zeros, which
lie outside the frame's range. That seems like more non-determinism than a
database should show.

> With inverse transitions this query still produces correct results, it just does
> not produce the numeric in the same format as it does without performing inverse
> transitions. Personally I'd rather focus on trying to get SUM(numeric) in there
> for 9.4

I think it'd be worthwhile to get this into 9.4, if that's still an option,
even if we only support COUNT.

best regards,
Florian Pflug


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Jan 14, 2014 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Florian Pflug <fgp@phlo.org> writes:
> I think it'd be worthwhile to get this into 9.4, if that's still an option,
> even if we only support COUNT.

My thought is

(1) we can certainly implement inverse transitions for COUNT() and the
integer variants of SUM(), and that alone would be a killer feature for
some people.


100% agreed about sum(int) and count, but so far I've only managed to find problems with SUM(numeric); the other inverse transition functions seem just fine with numeric types, e.g. AVG(numeric) and STDDEV(numeric). I can't produce any results that differ from the unpatched head... but perhaps someone can give me a case where things could vary?
 
I've been testing with something like:
select avg(n) over w, SUM(3::float) over w
from (values(1,10::numeric),(2,0),(3,0)) v(i,n) 
window w as (order by i rows between current row and unbounded following);

where I included the sum(float) to force the old school brute force transitions to be performed. I then compared the results to the ones given without the sum(float) tagged on the end. I've so far not found anything out of place.

(2) the float and numeric variants should be implemented under nondefault
names (I'm thinking FAST_SUM(), but bikeshed away).  People who need
extra speed and don't mind the slightly different results can alter
their queries to use these variants.

I'm not as keen on this idea: if someone later thought of a nice way to allow the inverse functions to provide exactly the same result as if they were not used, then these variants would become redundant legacy that would likely have to be supported forever and a day. I know we already have things like that, but of course we want to minimise it as much as possible.

If AVG(numeric) and the stddev numeric aggregates turn out to be valid, then numeric_avg_accum_inv() must remain in core to support them, and that is exactly the function a user needs to define their own SUM(numeric). We could leave it at that, document what a user must do to create their own FAST_SUM, and perhaps defer the decision on including these in core until later, based on the number of complaints about SUM(numeric) being slow when used in windows with a moving frame head.

(For me) It does not seem too unreasonable to think that one day we might decide that:

SELECT AVG(n) FROM (VALUES(10::NUMERIC(10,3)),(0),(0)) v(n);

should return 3.333 instead of 3.3333333333333333 like it does now.

If we ever did that then we could support these numeric inverse transitions on SUM() without having to worry about weird extra trailing zeros. Instead we'd just give the user what they asked for, when they asked for it.
Reasons like that make me think that we might be jumping the gun a little on FAST_SUM() as it could end up redundant more quickly than we might initially think.


One reason I'm thinking this is that whatever we do to ameliorate
the semantic issues is going to slow down the forward transition
function --- to no benefit unless the aggregate is being used as a
window function in a moving window.  So I'm less than convinced
that we *should* implement any of these designs in the default
aggregates, even if we get to the point where we arguably *could*
do it with little risk of functional differences.


So far there's only 1 case of possible slowdown of forward transitions and that's with numeric types.
I had to change do_numeric_accum() to add a NaN counter increment and also change the logic so that it continues summing non-NaN values even after the first NaN is encountered.

Would you see this as too big a performance risk to include in core?

If not then I think we might be able to support AVG(numeric) and STDDEV(numeric) functions as the patch sits without having to worry that there might be an extra overhead on forward transitions that include NaN values.


Florian has done some really good work researching the standards around this area. His research seems to indicate that the results should be of the same scale as the inputs, which the current patch does not do, providing you assume that the user is showing us the intended scale based on the numbers that we're working with rather than a scale specified by the column or cast. Florian's idea of how to fix the scale on inverse transition seems pretty valid to me and I think the overhead of it could be made pretty minimal. I'm just not sure that I can implement it in such a way as to make the overhead small enough to look like background noise. But I'm very willing to give it a try and see...

*** some moments pass ***

I think unless anyone has some objections I'm going to remove the inverse transition for SUM(numeric) and modify the documents to tell the user how to build their own FAST_SUM(numeric) using the built in functions to do it. I'm starting to think that playing around with resetting numeric scale will turn a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there so far, minus sum(numeric), can't go in... 

If so then there's 36 aggregate functions ticked off my "create an inverse transition function for" list! I personally think that's a pretty good win. I'd rather do this than battle and miss deadlines for 9.4. I'd find that pretty annoying.

Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Jan 14, 2014 at 9:09 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I think unless anyone has some objections I'm going to remove the inverse transition for SUM(numeric) and modify the documents to tell the user how to build their own FAST_SUM(numeric) using the built in functions to do it. I'm starting to think that playing around with resetting numeric scale will turn a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there so far, minus sum(numeric), can't go in... 


Of course it's only now that I discover that it is not possible to do this:

CREATE AGGREGATE fast_sum (numeric)
(
    stype = numeric,
    sfunc = numeric_avg_accum,
    invfunc = numeric_avg_accum_inv,
    finalfunc = numeric_sum
);

because SUM(numeric) uses an internal type to store the transition state.
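The mismatch is easy enough to see in the catalogs (a quick diagnostic query; on the development branch this thread targets I'd expect aggtranstype to come back as "internal"):

  SELECT aggtransfn, aggtranstype::regtype
    FROM pg_aggregate
   WHERE aggfnoid = 'sum(numeric)'::regprocedure;

so the stype = numeric declaration above simply doesn't line up with the state type that numeric_avg_accum actually works with.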

hmmm, built-in fast_sum anyone? 
Is there any simple way to limit these to only be used in the context of a window? If so is it worth it?
Would we want fast_sum() for float too?

Regards

David Rowley
 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Jan 14, 2014 at 9:09 PM, David Rowley <dgrowleyml@gmail.com> wrote:
I think unless anyone has some objections I'm going to remove the inverse transition for SUM(numeric) and modify the documents to tell the user how to build their own FAST_SUM(numeric) using the built in functions to do it. I'm starting to think that playing around with resetting numeric scale will turn a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there so far, minus sum(numeric), can't go in... 

If so then there's 36 aggregate functions ticked off my "create an inverse transition function for" list! I personally think that's a pretty good win. I'd rather do this than battle and miss deadlines for 9.4. I'd find that pretty annoying.


Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using sum(numeric) to workaround the fact that there's no inverse transitions for sum(float). I also made a small change in the aggregates.sql tests to check that the aggfnoid <= 9999.

I've also pulled the regression tests that I had added for sum(numeric) as they no longer test anything new. All tests are now passing.

With the attached patch I feel like I've left users a bit high and dry for their sum(numeric) needs. I guess there is no true workaround, as even if they created their functions in SQL using simple + and - arithmetic, they would likely suffer from NaN recovery problems. I'm starting to come around to Tom's FAST_SUM idea, as I simply can't see any foolproof workaround that could be created without writing things in C.

The only problem I see with the FAST_SUM idea is that the number of trailing zeros may appear a little random depending on whether or not inverse transitions are used... Keep in mind that inverse transitions are not performed if any aggregate in the window does not support them, or if any aggregate in the frame contains a volatile function in the aggregate's parameters or in its FILTER (WHERE ...) clause. Does this matter, or can we just add a documentation warning about it?
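For example (a hypothetical query; it's the volatile random() call that would disable inverse transitions here):

  SELECT SUM(n * random()) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM generate_series(1,100) g(n);

would always fall back to recalculating each frame from scratch.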

If there are a few more +1s for FAST_SUM(numeric) then let me know and I'll add it.
If anyone feels strongly against adding FAST_SUM then please make the reasons for that known too.
Or failing that, if anyone has any other ideas that have not yet been written on this thread, please post them so we can discuss.


Regards

David Rowley
 


Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan14, 2014, at 11:06 , David Rowley <dgrowleyml@gmail.com> wrote:
> Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using
> sum(numeric) to work around the fact that there's no inverse transitions for sum(float). I also made a small change
> in the aggregates.sql tests to check that the aggfnoid <= 9999.

I've looked over the patch, here a few comments.

For STRICT pairs of transfer and inverse transfer functions we should complain if any of them ever return NULL. That
can never be correct anyway, since a STRICT function cannot undo a non-NULL -> NULL transition.

The same goes for non-STRICT, at least if we ever want to allow an inverse transfer function to indicate "Sorry, cannot
undo in this case, please rescan". If we allowed NULL as just another state value now, we couldn't easily undo that
later, so we'd rob ourselves of the obvious way for the inverse transfer function to indicate this condition to its
caller.

The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be
constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with
NULL inputs however they please, no?

The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before the
code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function?

Also, could we easily check whether the frames corresponding to the individual rows are all either identical or
disjoint, and don't use the inverse transfer function then? Currently, for a frame which contains either just the
current row, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old
frame, and then add back the new frame.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Jim Nasby
Date:
On 1/13/14, 7:41 PM, Gavin Flower wrote:
> On 14/01/14 14:29, Tom Lane wrote:
> [...]
>> (2) the float and numeric variants should be implemented under nondefault names (I'm thinking FAST_SUM(), but
>> bikeshed away). People who need extra speed and don't mind the slightly different results can alter their queries
>> to use these variants. One reason I'm thinking this is that whatever we do to ameliorate the semantic issues is
>> going to slow down the forward transition function --- to no benefit unless the aggregate is being used as a window
>> function in a moving window. So I'm less than convinced that we *should* implement any of these designs in the
>> default aggregates, even if we get to the point where we arguably *could* do it with little risk of functional
>> differences. regards, tom lane
> How about SUM_FAST() instead? Then it will be more likely to be close to SUM() in an index?

+1. That's what I do in cases like this.
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Robert Haas
Date:
On Tue, Jan 14, 2014 at 4:16 AM, David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, Jan 14, 2014 at 9:09 PM, David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> I think unless anyone has some objections I'm going to remove the inverse
>> transition for SUM(numeric) and modify the documents to tell the user how to
>> build their own FAST_SUM(numeric) using the built in functions to do it. I'm
>> starting to think that playing around with resetting numeric scale will turn
>> a possible 9.4 patch into a 9.5/10.0 patch. I see no reason why what's there
>> so far, minus sum(numeric), can't go in...
>>
>
> Of course its only now that I discover that this is not possible to do this:
>
> CREATE AGGREGATE fast_sum (numeric)
> (
>     stype = numeric,
>     sfunc = numeric_avg_accum,
>     invfunc = numeric_avg_accum_inv,
>     finalfunc = numeric_sum
> );
>
> because SUM(numeric) uses an internal type to store the transition state.
>
> hmmm, built-in fast_sum anyone?
> Is there any simple way to limit these to only be used in the context of a
> window? If so is it worth it?
> Would we want fast_sum() for float too?

Maybe these additional "fast" functions (one might also say
"inaccurate") should live in an extension, in contrib.

It strikes me that for numeric what you really need is to just tell
the sum operation, whether through a parameter or otherwise, how many
decimal places to show in the output.  Obviously that's not a
practical change for sum() itself, but if we're inventing new stuff it
can be done.

For floats, things are not so good.  The data type is inexact by
nature, and I think cases where you get substantially wrong answers
will be common enough that people who attempt to use whatever we
devise in this area will have sum trouble.  *ducks*

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan15, 2014, at 19:56 , Robert Haas <robertmhaas@gmail.com> wrote:
> It strikes me that for numeric what you really need is to just tell
> the sum operation, whether through a parameter or otherwise, how many
> decimal places to show in the output.  Obviously that's not a
> practical change for sum() itself, but if we're inventing new stuff it
> can be done.

You can already do that, just cast the result of SUM(numeric) to
an appropriately constrained NUMERIC type, i.e. to NUMERIC(prec, scale).
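For example, reusing the earlier failing regression test case (the cast is plain SQL, so the result is the same whether or not inverse transitions are used):

  SELECT (SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))::numeric(10,2)
    FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

This always yields 6.01, 5.00, 3.00, since the cast forces every result to scale 2.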

BTW, AVG() and STDDEV() have the same issue. The problem is just partially
masked by the division by N (or N-1) at the end, because we always emit as
least 16 fractional digits when dividing. So you have to have an input 
value with a larger scale than that to trigger it.

For the following query

  select avg(x) over (order by i rows between current row and 1 following)
  from (values
    (1,1), (2,1), (3,0.000000000000000000000000000000001), (4,1), (5,1)
  ) t(i,x);

9.3 returns
                 avg
-------------------------------------
              1.00000000000000000000
 0.500000000000000000000000000000001
 0.500000000000000000000000000000001
              1.00000000000000000000
              1.00000000000000000000

but HEAD+patch returns
                 avg
-------------------------------------
              1.00000000000000000000
 0.500000000000000000000000000000001
 0.500000000000000000000000000000001
 1.000000000000000000000000000000000
 1.000000000000000000000000000000000

I have to admit that I'm *very* tempted to suggest we simply ignore this -
but that *would* mean accepting that windowed aggregates are non-
deterministic in the sense that their result (even if only in the number
of trailing zeros) depends on values outside of the frame. Which, I guess,
is a box that best stays closed...

I'm currently thinking the best way forward is to get a basic patch
without any NUMERIC stuff committed, and to revisit this after that's done.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> I'm currently thinking the best way forward is to get a basic patch
> without any NUMERIC stuff committed, and to revisit this after that's done.

+1.  I liked Robert's suggestion that the "fast" aggregates be implemented
as a contrib module rather than directly in core, too.  If we do that then
it's perfectly reasonable to handle that as a separate patch.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan14, 2014, at 17:39 , Florian Pflug <fgp@phlo.org> wrote:
> On Jan14, 2014, at 11:06 , David Rowley <dgrowleyml@gmail.com> wrote:
>> Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using
>> sum(numeric) to work around the fact that there's no inverse transitions for sum(float). I also made a small change
>> in the aggregates.sql tests to check that the aggfnoid <= 9999.
>
> I've looked over the patch, here a few comments.
>
> For STRICT pairs of transfer and inverse transfer functions we should complain if any of them ever return NULL. That
> can never be correct anyway, since a STRICT function cannot undo a non-NULL -> NULL transition.
>
> The same goes for non-STRICT, at least if we ever want to allow an inverse transfer function to indicate "Sorry,
> cannot undo in this case, please rescan". If we allowed NULL as just another state value now, we couldn't easily undo
> that later, so we'd rob ourselves of the obvious way for the inverse transfer function to indicate this condition to
> its caller.
>
> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be
> constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with
> NULL inputs however they please, no?

I modified the patch to do this, and ran into a problem. Currently, aggregates with state type "internal" cannot have a
strict transfer function, even if they behave as if they had one, i.e. ignore NULL inputs. This is because the only
possible initial value for state type "internal" is NULL, and it's up to the transfer function to create the state -
usually upon seeing the first non-NULL input. Now, currently that isn't a huge problem - the transfer function simply
has to check for NULL input values itself, and if it's indeed conceptually strict, it just returns in this case.

With inverse transfer functions, however, each such pair of forward and inverse transfer functions would also need to
duplicate the NULL-counting logic from nodeWindowAgg.c if it wants to be conceptually strict, i.e. ignore NULL inputs,
but return NULL if there are *only* NULL inputs (or no inputs at all). That seems like a lot of duplicated code in the
long run.

In essence, what we'd want for strict pairs of forward and inverse transfer functions is for the forward transfer
function to be strict in all arguments except the state, and the inverse to be strict according to the usual definition.
We can't express that property of the forward transfer function within pg_proc, but we don't really have to - a local
hack in nodeWindowAgg.c suffices. So what I'm proposing is:

We allow the forward transfer function to be non-strict even if the inverse is strict, but only if the initial value is
NULL. In that case we behave as if the forward transfer function was strict, except that upon seeing the first non-NULL
input we call it with a NULL state. The return value must still be non-NULL in all cases.

Thoughts?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan16, 2014, at 02:32 , Florian Pflug <fgp@phlo.org> wrote:
> On Jan14, 2014, at 17:39 , Florian Pflug <fgp@phlo.org> wrote:
>> On Jan14, 2014, at 11:06 , David Rowley <dgrowleyml@gmail.com> wrote:
>>> Here's a patch which removes sum(numeric) and changes the documents a little to remove a reference to using
>>> sum(numeric) to work around the fact that there's no inverse transitions for sum(float). I also made a small change
>>> in the aggregates.sql tests to check that the aggfnoid <= 9999.
>>
>> I've looked over the patch, here a few comments.
>>
>> For STRICT pairs of transfer and inverse transfer functions we should complain if any of them ever return NULL. That
>> can never be correct anyway, since a STRICT function cannot undo a non-NULL -> NULL transition.
>>
>> The same goes for non-STRICT, at least if we ever want to allow an inverse transfer function to indicate "Sorry,
>> cannot undo in this case, please rescan". If we allowed NULL as just another state value now, we couldn't easily
>> undo that later, so we'd rob ourselves of the obvious way for the inverse transfer function to indicate this
>> condition to its caller.
>>
>> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be
>> constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with
>> NULL inputs however they please, no?
>
> I modified the patch to do this, and ran into a problem. Currently, aggregates with state type "internal" cannot have
> a strict transfer function, even if they behave as if they had one, i.e. ignore NULL inputs. This is because the only
> possible initial value for state type "internal" is NULL, and it's up to the transfer function to create the state -
> usually upon seeing the first non-NULL input. Now, currently that isn't a huge problem - the transfer function simply
> has to check for NULL input values itself, and if it's indeed conceptually strict, it just returns in this case.
>
> With inverse transfer functions, however, each such pair of forward and inverse transfer functions would also need to
> duplicate the NULL-counting logic from nodeWindowAgg.c if it wants to be conceptually strict, i.e. ignore NULL inputs,
> but return NULL if there are *only* NULL inputs (or no inputs at all). That seems like a lot of duplicated code in the
> long run.
>
> In essence, what we'd want for strict pairs of forward and inverse transfer functions is for the forward transfer
> function to be strict in all arguments except the state, and the inverse to be strict according to the usual
> definition. We can't express that property of the forward transfer function within pg_proc, but we don't really have
> to - a local hack in nodeWindowAgg.c suffices. So what I'm proposing is:
>
> We allow the forward transfer function to be non-strict even if the inverse is strict, but only if the initial value
> is NULL. In that case we behave as if the forward transfer function was strict, except that upon seeing the first
> non-NULL input we call it with a NULL state. The return value must still be non-NULL in all cases.

Ok, I tried this and it worked out quite OK.

Updated patch is attached. It passes regression tests, but those currently don't seem to include any aggregates which
*don't* ignore NULL values, so that case is probably untested. Also, it allows non-strict forward transfer functions
together with strict inverse transfer functions even for non-NULL initial states now. It seemed rather pedantic to
forbid this.

BTW, as it stands, the patch currently uses the inverse transition function only when *all* the aggregates belonging to
one window clause provide one. After working with the code a bit, I think that a bit of reshuffling would allow that
distinction to be made on a per-aggregate basis. The question is, is it worth it?

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Jan 16, 2014 at 3:47 PM, Florian Pflug <fgp@phlo.org> wrote:
BTW, as it stands, the patch currently uses the inverse transition function only when *all* the aggregates belonging to one window clause provide one. After working with the code a bit, I think that a bit of reshuffling would allow that distinction to be made on a per-aggregate basis. The question is, is it worth it?


I didn't think it was all that worthwhile, since we'd still need to process every tuple in the frame's scope for each aggregate which has no inverse transition function - though of course there would be slightly less work to do in such cases. I guess I just assumed that the workloads that benefit from inverse transitions have many rows rather than many aggregates over few rows.
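To illustrate (a hypothetical query; "tbl" is just a stand-in table):

  select sum(n) over w, max(n) over w
    from tbl
    window w as (order by i rows between current row and unbounded following);

max() has no inverse transition function, so each row's frame must be rescanned for it anyway; handling sum() per-aggregate would only save re-adding values during that rescan.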

I guess that to implement this, the "aggregated up to" marker variables would just need to be per-aggregate rather than per-window.

If you think it would be worth it I can modify the patch to work that way.

Regards

David Rowley

 
best regards,
Florian Pflug

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Jan 16, 2014 at 9:45 AM, Florian Pflug <fgp@phlo.org> wrote:
BTW, AVG() and STDDEV() have the same issue. The problem is just partially
masked by the division by N (or N-1) at the end, because we always emit at
least 16 fractional digits when dividing. So you have to have an input
value with a larger scale than that to trigger it.

For the following query

  select avg(x) over (order by i rows between current row and 1 following)
  from (values
    (1,1), (2,1), (3,0.000000000000000000000000000000001), (4,1), (5,1)
  ) t(i,x);

9.3 returns
                 avg
-------------------------------------
              1.00000000000000000000
 0.500000000000000000000000000000001
 0.500000000000000000000000000000001
              1.00000000000000000000
              1.00000000000000000000

but HEAD+patch returns
                 avg
-------------------------------------
              1.00000000000000000000
 0.500000000000000000000000000000001
 0.500000000000000000000000000000001
 1.000000000000000000000000000000000
 1.000000000000000000000000000000000


Uhhh, that's bad news indeed. That means that I'll need to remove not only all inverse transition functions for aggregates on numeric types, but also avg for the int types and the stddev* functions for everything, since they use numeric internally. I guess that only leaves SUM for smallint, int, bigint, cash and interval, along with count(exp) and count(*)...

 
I have to admit that I'm *very* tempted to suggest we simply ignore this -
but that *would* mean accepting that windowed aggregates are non-
deterministic in the sense that their result (even if only in the number
of trailing zeros) depends on values outside of the frame. Which, I guess,
is a box that best stays closed...


Yeah, I can understand the temptation but I agree we can't go changing results.
 
I'm currently thinking the best way forward is to get a basic patch
without any NUMERIC stuff committed, and to revisit this after that's done.


Agreed... I'll warm up my delete key.

Regards

David Rowley


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jan 14, 2014 at 2:29 PM, Tom Lane <span
dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div
class="im"><br/></div> One reason I'm thinking this is that whatever we do to ameliorate<br /> the semantic issues is
goingto slow down the forward transition<br /> function --- to no benefit unless the aggregate is being used as a<br />
windowfunction in a moving window.  So I'm less than convinced<br /> that we *should* implement any of these designs in
thedefault<br /> aggregates, even if we get to the point where we arguably *could*<br /> do it with little risk of
functionaldifferences.<br /><br />                         regards, tom lane<br /></blockquote></div><br /></div><div
class="gmail_extra">Iwas thinking about this earlier today and came up with an idea that perhaps aggregates could
support2 transition functions. 1 for normal aggregation and for windows with UNBOUNDED PRECEDING. The 2nd transition
functioncould be used when there is a possibility that we would need to perform an inverse transition. This 2nd
transitionfunction could do all the extra tracking it needed without having to worry that it would slow down normal
aggregation.With numeric that might be tracking each numeric's scale as it enters and exits the window frame. It might
evenbe possible to perform inverse transitions with float and double here, we could just internally use numeric, and
havethe final function convert back to the original type. Though perhaps that might have the side effect of making
floatingpoint calculations too accurate? Or at least not matching the IEEE standards.</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">Of course, I'm not thinking this could be part of this patch, but I thought that I'd
postthe idea while all this is fresh in people's heads.</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">David</div><divclass="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div></div> 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug <fgp@phlo.org> wrote:

The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with NULL inputs however they please, no?


The reason I had to track the notnullcount for non-strict functions was that:

select sum(n) over (order by i rows between current row and unbounded following) from (values(1,1),(2,NULL)) v(i,n);

would otherwise return

1
0

sum is not strict, so I guess we need to track notnullcount for non-strict functions - the correct results here are 1 and NULL.

See the code around if (peraggstate->notnullcount == 0) in retreat_windowaggregate(). 
 
The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before the code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function?


I had a look at this and even tried moving the code to before the inverse transitions, but it looks like that would only work if I added more tests to the if condition to ensure that we're not about to perform inverse transitions. To me it just seemed more bulletproof the way it is, rather than duplicating logic and having to ensure that it stays duplicated. That said, I don't think I've fully got my head around why the original code is valid in the first place. I would have imagined that it should contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that sounds like complete rubbish then I'll put it down to my head still being fried from work.

 
Also, could we easily check whether the frames corresponding to the individual rows are all either identical or disjoint, and don't use the inverse transfer function then? Currently, for a frame which contains either just the current row, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old frame, and then add back the new frame.


I didn't know there was a situation where this could happen. Could you give me an example query and I'll run it through the debugger to have a look at what's going on. But sure, if this is possible and I understand what you mean then I guess it would be a good optimisation to detect this and throw away the previous results and start fresh.

Thanks for all of your reviewing on this so far.

Regards

David Rowley
 


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Jan 16, 2014 at 9:45 AM, Florian Pflug <fgp@phlo.org> wrote:
I'm currently thinking the best way forward is to get a basic patch
without any NUMERIC stuff committed, and to revisit this after that's done.


Attached is a version to that effect. The number of inverse transition functions is down to 8 from 36.

I managed to keep avg(interval) in there as it's all integer-based division.
sum() remains for all the int types, plus sum for interval and cash; count(exp) and count(*) also naturally remain.

sum(bigint) became a bit weird as it uses numeric types internally, so I had to keep the do_numeric_discard() function to support it. Previously this function did NaNCount-- if it was removing a NaN numeric, but since I got rid of the NaNCounter as it was no longer required I just decided to throw an error if we encounter a numeric with NaN. I thought this was ok as the only place the function is being called the numeric value being passed is built from a bigint which obviously can't be NaN. It may be better to just get rid of do_numeric_discard() and stick the sub_var(&(state->sumX), newval, &(state->sumX)); into int8_avg_accum_inv(). I don't have a preference as to which as I think there's reasons for both ways. Perhaps even the name int8_avg_accum_inv() is misleading as we never use it for avg(bigint). Anyway, I'll leave it as is for now as I just can't decide which way is better.

Florian, I've not had a chance to look at your updated v2.2 patch yet sorry. I'll try and get some time tomorrow evening.

Regards

David Rowley
 


Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan16, 2014, at 09:07 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be
>>> constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with
>>> NULL inputs however they please, no?
>
> The reason I had to track the notnullcount for non-strict functions was that:
>
> select sum(n) over (order by i rows between current row and unbounded following) from (values(1,1),(2,NULL)) v(i,n);
>
> would otherwise return
> 1
> 0
>
> sum is not strict, so I guess we need to track notnullcount for non-strict functions.
> See the code around if (peraggstate->notnullcount == 0) in retreat_windowaggregate().

Yeah, I figured that was the reason, but you can't fix it that way. See
http://www.postgresql.org/message-id/8E857D95-CBA4-4974-A238-9DD7F61BEA48@phlo.org for a detailed explanation why. My
2.2 patch allows pairs of non-strict forward and strict inverse transfer functions exactly because of this - i.e.,
basically it decrees that if there's an inverse transfer function, the strict setting of the *inverse* function
determines the aggregate's NULL behaviour. The forward transfer function is then never called
for NULL inputs, but it *is* called with the NULL state for the first non-NULL input, and *must* then return a non-NULL
state (hence it's technically not strict - it's strict only in the inputs, not in the state).

BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did that in 2.2. That doesn't matter for the
built-in aggregates since these aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed
patch.

>> The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before
>> the code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function?

> I had a look at this and even tried moving the code to before the inverse transitions, but it looks like that would
> only work if I added more tests to the if condition to ensure that we're not about to perform inverse transitions.
> To me it just seemed more bulletproof the way it is, rather than duplicating logic and having to ensure that it stays
> duplicated. That said, I don't think I've fully got my head around why the original code is valid in the first place.
> I would have imagined that it should contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that sounds
> like complete rubbish then I'll put it down to my head still being fried from work.

Ok, I get this now. That code really just is asking "is the previous row's frame the same as the current row's frame"
in that "if" where you added movedagg. What confused me was the rather weird way that condition is expressed, which
turns out to be due to the fact that at the point of the if, we don't know the new frame's end. Now, we could use
update_frametailpos() to find that, but that's potentially costly, especially if the tuple store was spilled to disk.
So instead, the code relies on the fact that only if the frame end is "n FOLLOWING/PRECEDING" can the current row lie
within the previous row's frame without the two frames' ends being necessarily the same.

For added confusion, that "if" never explicitly checks whether the frames' *heads* coincide - the previous code seems
to have relied on the impossibility of having "aggregatedbase <= current < aggregatedupto" after re-initializing the
aggregation, because then aggregatedbase = aggregatedupto = 0. That's why you can't just move the "if" before the
"frameheadpos == aggregatedbase" check. But you can *if* you also check whether "aggregatedbase == frameheadpos" in the
if - which is clearer than relying on that rather subtle assumption anyway.

BTW, your patch will also fail badly if the frame head ever moves backwards - the invariant that "aggregatedbase ==
frameheadpos" after the inverse transition loop will then be violated. Now, this should never happen, because we
require that the offset in "n PRECEDING/FOLLOWING" is constant, but we should probably still check for this and elog().

That check was implicit in the old code, because it advanced the tuplestore mark, so if the frame head moved backwards,
re-scanning the frame would have failed. That brings me to another todo - as it stands, that mark never gets advanced
if we're doing inverse aggregation. To fix that, we need a call to WinSetMarkPosition() somewhere in
eval_windowaggregates().

After doing these things, eval_windowaggregates() ended up calling initialize_windowframeaggregates() at a single place
again, so I inlined it back into eval_windowaggregates(). I also made it use temp_slot_1 in the inverse aggregation
loop, since that seemed safe - the code assumes some invariants about aggregatedbase and agg_row_slot.

Updated patch is attached (2.4_fgp).

>> Also, could we easily check whether the frames corresponding to the individual rows are all either identical or
>> disjoint, and don't use the inverse transfer function then? Currently, for a frame which contains either just the
>> current row, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old
>> frame, and then add back the new frame.
>
> I didn't know there was a situation where this could happen. Could you give me an example query and I'll run it
> through the debugger to have a look at what's going on. But sure, if this is possible and I understand what you mean
> then I guess it would be a good optimisation to detect this and throw away the previous results and start fresh.

The case I had in mind there was "RANGE BETWEEN CURRENT ROW AND CURRENT ROW", i.e. each row's frame consists of all
its ordering peers. Haven't looked into this yet.
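For example (a minimal illustration; the values are chosen so that the first two rows are ordering peers):

  select sum(n) over (order by i range between current row and current row)
    from (values (1,10),(1,20),(2,5)) v(i,n);

The two i=1 rows share the frame {10,20} and both return 30, while the i=2 row's frame is just {5}. Consecutive frames are thus either identical or disjoint, so fully un-adding the previous frame and re-adding the new one does strictly more work than simply restarting the aggregation.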

best regards,
Florian Pflug



Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Heikki Linnakangas
Date:
On 01/16/2014 01:02 PM, David Rowley wrote:
> sum(bigint) became a bit weird as it uses numeric types internally, so I
> had to keep the do_numeric_discard() function to support it.

It's pretty weird that we have implemented sum(bigint) that way. I 
understand that the result is a numeric so that it won't overflow, but 
implementing it by converting every value to numeric is naive.

I propose that we reimplement sum(bigint) in a more efficient way: For 
the internal state, let's use an int8 and a numeric overflow field. The 
transition function adds to the int8 variable, and checks for overflow. 
On overflow, increment the numeric field by one. In the final function, 
multiply the numeric by 2^64, and add the residual int8 value.

- Heikki



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> I propose that we reimplement sum(bigint) in a more efficient way: For 
> the internal state, let's use an int8 and a numeric overflow field. The 
> transition function adds to the int8 variable, and checks for overflow. 
> On overflow, increment the numeric field by one. In the final function, 
> multiply the numeric by 2^64, and add the residual int8 value.

It'd probably be sufficient to handle it as two int64 fields (handmade
128-bit arithmetic, or maybe even not so handmade if that ever gets
reasonably common among C compilers).  You're assuming the final output
is still numeric, right?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Heikki Linnakangas
Date:
On 01/16/2014 08:59 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> I propose that we reimplement sum(bigint) in a more efficient way: For
>> the internal state, let's use an int8 and a numeric overflow field. The
>> transition function adds to the int8 variable, and checks for overflow.
>> On overflow, increment the numeric field by one. In the final function,
>> multiply the numeric by 2^64, and add the residual int8 value.
>
> It'd probably be sufficient to handle it as two int64 fields (handmade
> 128-bit arithmetic, or maybe even not so handmade if that ever gets
> reasonably common among C compilers).

True. That would be sufficient for summing 2^64 int8s of INT64_MAX. That 
sounds like enough, especially considering that count() will
overflow after that too.

>  You're assuming the final output is still numeric, right?

Yep.

- Heikki



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Andres Freund
Date:
On 2014-01-16 21:07:33 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 08:59 PM, Tom Lane wrote:
> >Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> >>I propose that we reimplement sum(bigint) in a more efficient way: For
> >>the internal state, let's use an int8 and a numeric overflow field. The
> >>transition function adds to the int8 variable, and checks for overflow.
> >>On overflow, increment the numeric field by one. In the final function,
> >>multiply the numeric by 2^64, and add the residual int8 value.
> >
> >It'd probably be sufficient to handle it as two int64 fields (handmade
> >128-bit arithmetic, or maybe even not so handmade if that ever gets
> >reasonably common among C compilers).
> 
> True. That would be sufficient for summing 2^64 int8s of INT64_MAX. That
> sounds like enough, especially considering that that count() will overflow
> after that too.

You'll have to handle adding negative values and underflow as
well.
Maybe it's instead sufficient to just have a flag indicating that you're
working with a state that hasn't overflowed so far, and use plain int8
math as long as that's the case, and entirely fall back to the current
path once overflowed. That will probably be slightly faster and easily
handle the majority of cases since overflowing int8 ought to be pretty
rare in the real world.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> You'll have to handle adding negative values and underflow as
> well.

Right.

> Maybe it's instead sufficient to just have flag indicating that you're
> working with a state that hasn't overflowed so far and just plain int8
> math as long as that's the case, and entirely fall back to the current
> path once overflowed. That will probably be slightly faster and easily
> handle the majority of cases since overflowing int8 ought to be pretty
> rare in the real world.

Dunno, I think that a transition state containing both an int64 and
a (presumably separately palloc'd) numeric will be a real PITA.
And it will not be faster, because the principal drag on performance
will just be the overflow test, which you have to do either way.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Andres Freund
Date:
On 2014-01-16 14:23:47 -0500, Tom Lane wrote:
> > Maybe it's instead sufficient to just have flag indicating that you're
> > working with a state that hasn't overflowed so far and just plain int8
> > math as long as that's the case, and entirely fall back to the current
> > path once overflowed. That will probably be slightly faster and easily
> > handle the majority of cases since overflowing int8 ought to be pretty
> > rare in the real world.
> 
> Dunno, I think that a transition state containing both an int64 and
> a (presumably separately palloc'd) numeric will be a real PITA.

Yea, not sure myself. I just dislike the idea of having a good part of a
128bit math implementation for a single transition function.

Another alternative would be a configure check for compiler/native
128bit math and fall back to the current implementation if none is
provided... That should give decent performance with a pretty low amount
of code for most platforms.

> And it will not be faster, because the principal drag on performance
> will just be the overflow test, which you have to do either way.

Well, you don't need to check the second variable for lots of
operations. Say, the current sum is 0 and you add a -1. With the
two-variable scheme that requires checking the second variable,
manipulating it etc.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-01-16 14:23:47 -0500, Tom Lane wrote:
>> Dunno, I think that a transition state containing both an int64 and
>> a (presumably separately palloc'd) numeric will be a real PITA.

> Yea, not sure myself. I just dislike the idea of having a good part of a
> 128bit math implementation for a single transition function.

Not sure how you figure that we need very much new code beyond the
overflow test.

> Well, you don't need to check the second variable for lots of
> operations. Say, the current sum is 0 and you add a -1. With the two
> variables scheme that requires checking the second variable,
> manipulating it etc.

I'm envisioning just
    state->lowhalf += input;
    if (overflowed_up)
        state->highhalf++;
    else if (overflowed_down)
        state->highhalf--;

The only thing that might take a moment's thought, or extra cycles in the
normal case, is extending the overflow test so that it can tell whether
we need to increment or decrement the upper half.

[ thinks a bit... ]  Note that I'm supposing that the state is defined
as (highhalf * 2^64) + lowhalf, not that we need the two variables to
be exactly a 128-bit twos-complement value, which is what I think
you're talking about.
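
For what it's worth, the low-half update with carry/borrow could look
roughly like this as standalone C (names invented; an unsigned low half
gives well-defined wraparound):

    #include <stdint.h>

    typedef struct Int128SumState
    {
        int64_t  highhalf;  /* state value = highhalf * 2^64 + lowhalf */
        uint64_t lowhalf;
    } Int128SumState;

    static void
    int128_sum_add(Int128SumState *state, int64_t input)
    {
        uint64_t oldlow = state->lowhalf;

        state->lowhalf += (uint64_t) input; /* wraps modulo 2^64 */
        if (input > 0 && state->lowhalf < oldlow)
            state->highhalf++;              /* carried past 2^64 */
        else if (input < 0 && state->lowhalf > oldlow)
            state->highhalf--;              /* borrowed below zero */
    }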
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
I had some more fun with this, the result is v2.5 of the patch (attached). Changes are explained below.

On Jan16, 2014, at 19:10 , Florian Pflug <fgp@phlo.org> wrote:
> On Jan16, 2014, at 09:07 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Wed, Jan 15, 2014 at 5:39 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> The notnullcount machinery seems to apply to both STRICT and non-STRICT transfer function pairs. Shouldn't that be
>>> constrained to STRICT transfer function pairs? For non-STRICT pairs, it's up to the transfer functions to deal with
>>> NULL inputs however they please, no?
>>
>> The reason I had to track the notnullcount for non-strict aggregates was because:
>>
>> select sum(n) over (order by i rows between current row and unbounded following) from (values(1,1),(2,NULL)) v(i,n);
>>
>> would otherwise return
>> 1
>> 0
>>
>> sum is not strict, so I guess we need to track notnullcount for non-strict functions.
>> See the code around if (peraggstate->notnullcount == 0) in retreat_windowaggregate().
>
> Yeah, I figured that was the reason, but you can't fix it that way. See
> http://www.postgresql.org/message-id/8E857D95-CBA4-4974-A238-9DD7F61BEA48@phlo.org for a detailed explanation why.
> My 2.2 patch allows pairs of non-strict forward and strict inverse transfer functions exactly because of this - i.e.,
> basically it decrees that if there's an inverse transfer function, the strict setting of the *inverse* function
> determines the aggregate's NULL behaviour. The forward transfer function is then never called
> for NULL inputs, but it *is* called with the NULL state for the first non-NULL input, and *must* then return a
> non-NULL state (hence it's technically not strict, it's strict only in the inputs, not in the state).
>
> BTW, I just realized I failed to update CREATE AGGREGATE's logic when I did that in 2.2. That doesn't matter for the
> built-in aggregates since these aren't created with CREATE AGGREGATE, but it's obviously wrong. I'll post a fixed
> patch.

Fixed now.

>>> The logic around movedaggbase in eval_windowaggregates() seems a bit convoluted. Couldn't the if be moved before
>>> the code that pulls aggregatedbase up to frameheadpos using the inverse aggregation function?
>
>> I had a look at this and even tried moving the code to before the inverse transitions, but it looks like that would
>> only work if I added more tests to the if condition to ensure that we're not about to perform inverse transitions.
>> To me it just seemed more bullet-proof the way it is rather than duplicating logic and having to ensure that it
>> stays duplicated. That said, I don't think I've fully got my head around why the original code is valid in the first
>> place. I would have imagined that it should contain a check for FRAMEOPTION_START_UNBOUNDED_FOLLOWING, but if that
>> sounds like complete rubbish then I'll put it down to my head still being fried from work.
>
> Ok, I get this now. That code really is just asking "is the previous row's frame the same as the current row's frame"
> in that "if" where you added movedagg. What confused me was the rather weird way that condition is expressed, which
> it turns out is due to the fact that at the point of the if, we don't know the new frame's end. Now, we could use
> update_frametailpos() to find that, but that's potentially costly, especially if the tuple store was spilled to disk.
> So instead, the code relies on the fact that only if the frame end is "n FOLLOWING/PRECEDING" can the current row lie
> within the previous row's frame without the two frames' ends necessarily being the same.
>
> For added confusion, that "if" never explicitly checks whether the frames' *heads* coincide - the previous code seems
> to have relied on the impossibility of having "aggregatedbase <= current < aggregatedupto" after re-initializing the
> aggregation, because then aggregatedbase = aggregatedupto = 0. That's why you can't just move the "if" before the
> "frameheadpos == aggregatedbase" check. But you can *if* you also check whether "aggregatedbase == frameheadpos" in
> the if - which is clearer than relying on that rather subtle assumption anyway.
>
> BTW, your patch will also fail badly if the frame head ever moves backwards - the invariant that "aggregatedbase
> == frameheadpos" after the inverse transition loop will then be violated. Now, this should never happen, because we
> require that the offset in "n PRECEDING/FOLLOWING" is constant, but we should probably still check for this and elog().
>
> That check was implicit in the old code, because it advanced the tuplestore mark, so if the frame head moved
> backwards, re-scanning the frame would have failed. That brings me to another todo - as it stands, that mark never
> gets advanced if we're doing inverse aggregation. To fix that, we need a call to WinSetMarkPosition() somewhere in
> eval_windowaggregates().
>
> After doing these things, eval_windowaggregates ended up calling initialize_windowframeaggregates at a single place
> again, so I inlined it back into eval_windowaggregates(). I also made it use temp_slot_1 in the inverse aggregation
> loop, since that seemed safe - the code assumes some invariants about aggregatedbase and agg_row_slot.
>
> Updated patch is attached (2.4_fgp).

I've now shuffled things around so that we can use inverse transition functions
even if only some aggregates provide them, and to allow inverse transition
functions to force a restart by returning NULL. The rules regarding NULL handling
are now the following:

 (A) Transition functions without an inverse behave as they always did
 (B) Forward transition functions with an inverse may not return NULL,
     not even for all-NULL inputs.
 (C) If the inverse transition function returns NULL, the aggregation is
     restarted from the beginning, just as if no inverse transition
     function had been provided.
 (D) If the transition function is strict, the inverse must also be.
 (E) NULLs are skipped if the inverse transition function is strict,
     regardless of whether the forward transition function is.

The rules may seem a bit weird, but they're the only solution I found which

 * Doesn't require every invertible transition function with "internal"
   state to implement its own NULL counting. This requires (E), since
   transition functions with state type "internal" need to be callable
   with a NULL state - since nobody else can allocate the state object.

 * Still allows transition functions to handle NULL values however they
   please - they just have to have a non-strict inverse to opt out of (E).
   They cannot use NULL states however they please, though.

 * Allows partial inverse transition functions, i.e. inverse transition
   functions which sometimes report "sorry, please restart". The only
   easy way to report this is to have the inverse transition function
   return NULL, which means we cannot allow NULL as just another state
   value, i.e. this mandates (B,C)

(D) is purely to avoid having to deal with any special-cases that might
arise from allowing this.
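
As a rough standalone sketch of the protocol behind (B) and (C) - hypothetical
names, not the actual nodeWindowAgg.c code - a NULL result from the inverse is
reserved as the "please restart" signal, which is exactly why a forward
function that has an inverse must never produce a NULL state of its own:

    #include <stdbool.h>
    #include <stddef.h>

    /* an inverse transition function: NULL result means "please restart" (C) */
    typedef void *(*inverse_fn)(void *state, void *removed_input);

    typedef struct WindowAggSketch
    {
        void *transvalue;   /* never NULL while an inverse exists, rule (B) */
        bool  restart;      /* set when the frame must be recomputed */
    } WindowAggSketch;

    static void
    retreat_window_agg(WindowAggSketch *agg, inverse_fn inv, void *removed)
    {
        void *newstate = inv(agg->transvalue, removed);

        if (newstate == NULL)
            agg->restart = true;        /* rule (C): rescan the frame */
        else
            agg->transvalue = newstate;
    }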

I've also moved the volatile functions check into initialize_peragg() -
having it in ExecWindowAgg() made no sense to me since it's dealing
with static information. Also, ExecWindowAgg() may get called multiple
times, e.g. if the window functions are on the inner side of a
nested loop join or within a subselect.


>>> Also, could we easily check whether the frames corresponding to the individual rows are all either identical or
>>> disjoint, and not use the inverse transfer function then? Currently, for a frame which contains either just the
>>> current row, or all the current row's peers, I think we'd use the inverse transfer function to fully un-add the old
>>> frame, and then add back the new frame.
>>
>> I didn't know there was a situation where this could happen. Could you give me an example query and I'll run it
>> through the debugger to have a look at what's going on. But sure, if this is possible and I understand what you mean
>> then I guess it would be a good optimisation to detect this and throw away the previous results and start fresh.
>
> The case I had in mind there was "RANGE BETWEEN CURRENT ROW AND CURRENT ROW", i.e. each row's frame consists of all
> its ordering peers. Haven't looked into this yet.

It now resets the aggregation when the new frame doesn't overlap the old one, i.e. if frameheadpos > aggregatedupto.

The patch passes the regression tests, but I haven't done any other testing yet.
Also, the documentation doesn't explain the NULL and STRICT semantics yet.

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp@phlo.org> wrote:
I had some more fun with this, the result is v2.5 of the patch (attached). Changes are explained below.


Looks like you've been busy on this! Thank you for implementing all the changes you talked about.
I've now started working on the 2.5 patch you sent. I've ripped out all the numeric inverse transition functions in that patch and implemented inverse transitions for max and min using the new NULL-returning method that you've implemented. I've also made a very fast pass and fixed up a few comments and some random whitespace. I've not gotten to the docs yet.

I did add a whole bunch of regression tests for all of the inverse transition functions that are used for max and min. I'm just calling these like:
SELECT int4larger_inv(3,2),int4larger_inv(3,3),int4larger_inv(3,4);
rather than using the aggregate as a window function each time. I thought it was better to validate each of those functions individually and then put in various tests that target a smaller number of aggregates with various inputs.

A few things still to do that I'll try and get to soon.
1. More testing
2. Docs updated.
3. Perhaps I'll look at adding bitand and bitor inverse transition functions
4. ?

Thanks again for putting so much effort into this.
If you make any more changes, would it be possible for you to base them on the attached patch instead of the last one you sent?

Perhaps if there's much more hacking to do we could start pushing to a github repo to make it easier to work together on this.

Regards

David Rowley

 
Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan17, 2014, at 15:14 , David Rowley <dgrowleyml@gmail.com> wrote:
> If you make any more changes, would it be possible for you to base them on the attached patch instead of the last
> one you sent?

Sure! The only reason I didn't rebase my last patch onto yours was that having the numeric stuff in there meant
potentially better test coverage. Thanks for doing the merge!

> Perhaps if there's much more hacking to do we could start pushing to a github repo to make it easier to work
> together on this.

Yeah, that seems like a good idea. Easier to keep track of than a bunch of patches floating around. Could you push
your latest version?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp@phlo.org> wrote:

I've now shuffled things around so that we can use inverse transition functions
even if only some aggregates provide them, and to allow inverse transition
functions to force a restart by returning NULL. The rules regarding NULL handling
are now the following


Maybe this is me thinking out loud, but I'm just thinking about the numeric case again.
Since the patch can now handle inverse transition functions returning NULL when they fail to perform inverse transitions, I'm wondering if we could add an "expectedscale" to NumericAggState and set it to -1 initially. When we get the first value, we set the expectedscale to the dscale of that numeric; then, if we get anything but that value, we set the expectedscale back to -1 again. If we are asked to perform an inverse transition while expectedscale is -1, we return NULL; otherwise we can perform the inverse transition...

Thoughts?

Regards

David Rowley 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan17, 2014, at 20:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp@phlo.org> wrote:
>
>> I've now shuffled things around so that we can use inverse transition functions
>> even if only some aggregates provide them, and to allow inverse transition
>> functions to force a restart by returning NULL. The rules regarding NULL handling
>> are now the following
>
> Maybe this is me thinking out loud, but I'm just thinking about the numeric case again.
> Since the patch can now handle inverse transition functions returning NULL when they
> fail to perform inverse transitions, I'm wondering if we could add an "expectedscale"
> to NumericAggState, set it to -1 initially, when we get the first value set the
> expectedscale to the dscale of that numeric, then if we get anything but that value
> we'll set the expectedscale back to -1 again, if we are asked to perform an inverse
> transition with a expectedscale as -1 we'll return null, otherwise we can perform
> the inverse transition...

You could do better than that - the numeric problem amounts to tracking the maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum scale
within the aggregated values isn't that bad. If we zero the dscale counters lazily,
the number of entries we have to zero is bounded by the maximum dscale we encounter.
Since actually summing N digits is probably more expensive than zeroing them, and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.
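
For concreteness, the lazily-zeroed counters could look something like this
standalone sketch (names invented; it assumes dscale has a fixed upper bound,
as it does for numeric, and that the state starts zeroed):

    #define MAX_DSCALE 1000     /* assumed upper bound on numeric dscale */

    typedef struct DScaleTracker
    {
        int  zeroed_upto;               /* count[0 .. zeroed_upto-1] are valid */
        long count[MAX_DSCALE + 1];     /* inputs currently at each dscale */
    } DScaleTracker;

    /* forward transition: zero counters lazily, then count this dscale */
    static void
    dscale_add(DScaleTracker *t, int dscale)
    {
        while (t->zeroed_upto <= dscale)
            t->count[t->zeroed_upto++] = 0;
        t->count[dscale]++;
    }

    /*
     * inverse transition: un-count, then report the remaining maximum dscale
     * (the scan only changes the answer when the removed value carried the
     * previous maximum)
     */
    static int
    dscale_remove(DScaleTracker *t, int dscale)
    {
        t->count[dscale]--;
        for (int i = t->zeroed_upto - 1; i >= 0; i--)
            if (t->count[i] > 0)
                return i;
        return -1;                      /* no inputs remain */
    }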

It'd be interesting to handle the float types in a way similar to what you describe, though. We might
not be able to guarantee that we yield exactly the same result as without inverse
aggregation, but we might be able to bound the error. That might make it acceptable
to do this - as Kevin pointed out, float is always an approximation anyway. I haven't
really thought that through, though...

Anyway, with time running out fast if we want to get this into 9.4, I think we should
focus on getting this into a committable state right now.

I've started to look over what you've pushed to github, and it looks mostly fine.
I have a few comments - mostly cosmetic stuff - that I'll post once I've finished reading
through it. I also plan to do some basic performance testing to verify that my
reshuffling of eval_windowaggregates() doesn't hurt aggregates without an inverse
transition function.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 18, 2014 at 10:17 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan17, 2014, at 20:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp@phlo.org> wrote:
>
>> I've now shuffled things around so that we can use inverse transition functions
>> even if only some aggregates provide them, and to allow inverse transition
>> functions to force a restart by returning NULL. The rules regarding NULL handling
>> are now the following
>
> Maybe this is me thinking out loud, but I'm just thinking about the numeric case again.
> Since the patch can now handle inverse transition functions returning NULL when they
> fail to perform inverse transitions, I'm wondering if we could add an "expectedscale"
> to NumericAggState, set it to -1 initially, when we get the first value set the
> expectedscale to the dscale of that numeric, then if we get anything but that value
> we'll set the expectedscale back to -1 again, if we are asked to perform an inverse
> transition with a expectedscale as -1 we'll return null, otherwise we can perform
> the inverse transition...

You could do better than that - the numeric problem amounts to tracking the maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum scale
within the aggregated values isn't that bad. If we zero the dscale counters lazily,
the number of entries we have to zero is bounded by the maximum dscale we encounter.
Since actually summing N digits is probably more expensive than zeroing them, and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.


We'll work that out; I don't think it will take very long to code up your idea either.
I just thought that my idea was good enough and very cheap too - won't all numerics that are actually stored in a column have the same scale anyway? Hasn't it only been a problem because we've been testing with random numeric literals the whole time?

The test turned out to become:
    if (state->expectedScale == -1)
        state->expectedScale = X.dscale;
    else if (state->expectedScale != X.dscale)
        state->expectedScale = -2;

That goes in do_numeric_accum; then, when we do the inverse, I just check whether expectedScale < 0 and return NULL if so.

I'm not set on it, and I'm willing to try the lazy zeroing of the scale tracker array, but I'm just not quite sure what extra real use cases it covers that the one above does not. Perhaps my way won't perform inverse transitions if the query did sum(numericCol / 10) or so.

I'll be committing this to the github repo very soon, so we can hack away at the scale tracking code once it's back in.

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 18, 2014 at 10:42 AM, David Rowley <dgrowleyml@gmail.com> wrote:

You could do better than that - the numeric problem amounts to tracking the maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum scale
within the aggregated values isn't that bad. If we zero the dscale counters lazily,
the number of entries we have to zero is bounded by the maximum dscale we encounter.
Since actually summing N digits is probably more expensive than zeroing them, and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.


We'll work that out; I don't think it will take very long to code up your idea either.
I just thought that my idea was good enough and very cheap too - won't all numerics that are actually stored in a column have the same scale anyway? Hasn't it only been a problem because we've been testing with random numeric literals the whole time?

The test turned out to become:
    if (state->expectedScale == -1)
        state->expectedScale = X.dscale;
    else if (state->expectedScale != X.dscale)
        state->expectedScale = -2;

That goes in do_numeric_accum; then, when we do the inverse, I just check whether expectedScale < 0 and return NULL if so.

I'm not set on it, and I'm willing to try the lazy zeroing of the scale tracker array, but I'm just not quite sure what extra real use cases it covers that the one above does not. Perhaps my way won't perform inverse transitions if the query did sum(numericCol / 10) or so.

I'll be committing this to the github repo very soon, so we can hack away at the scale tracking code once it's back in.


Ok, we're back up to 86 aggregate function / type combinations with inverse transition functions.
I've committed my latest work up to github, and here's a fresh patch which I will need to do more tests on.

The test (below) that used to fail a few versions ago is back in there and it's now passing.

SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

In this case it won't use inverse transitions because the forward transition function detects that the scale is not fixed.

I tested throwing some numerics into a table and I'm pretty happy with the results.

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,20000);
select sum(num) over(order by num rows between current row and unbounded following) from num; -- 124ms
select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 254ms
select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 108156.917 ms

The divide-by-1 case is slow because of that weird result of 20 trailing zeros instead of 16 when dividing a numeric by 1, which causes the inverse transition function to return NULL because the scale changes.

I've not tested an unpatched version yet to see how that divide-by-1 query performs there, but I'll get to that soon.

I'm thinking that the idea of tracking the numeric range of floating point types, and performing an inverse transition provided the range has not gone beyond some defined danger zone, is not material for this patch... I think it would not be a great deal of work to code, but the testing involved would probably make this patch not possible for 9.4.

The latest version of the patch is attached.

Regards

David Rowley
 
Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I just thought that my idea was good enough and very cheap too, won't all
> numerics that are actually stored in a column have the same scale anyway?

No; unconstrained numeric columns (ie, if you just say "numeric") don't
force their contents to any particular scale.  It might be that we don't
have to optimize for that case, since it's not in the SQL spec, but it
is definitely supported by PG.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
First, I've got the feeling that I should somehow update the commitfest app,
but I don't really know in which way. Should I put myself in as a reviewer,
or as a second author? Or neither? Suggestions welcome...

On Jan17, 2014, at 23:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> The test turned out to become:
>     if (state->expectedScale == -1)
>         state->expectedScale = X.dscale;
>     else if (state->expectedScale != X.dscale)
>         state->expectedScale = -2;
>
> In do_numeric_accum, then when we do the inverse I just check if
> expectedScale < 0 then return NULL.

Ok, so this will use inverse transitions if and only if the dscales of all inputs match.
I still think that's overly pessimistic - we've only got a problem when we
remove the input with the largest dscale, no? So my suggestion would be
    state->maxDScale = MAX(X.dscale, state->maxDScale);

in do_numeric_accum, and in the inverse

    if (state->maxDScale == X.dscale)
        PG_RETURN_NULL();

I'd think that this avoids more restarts with about the same effort,
but I haven't tried it, so maybe I'm missing something.

> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> tracker array, but I'm just not quite sure what extra real use cases it
> covers that the one above does not. Perhaps my way won't perform inverse
> transitions if the query did sum(numericCol / 10) or so.

Dunno how many people SUM over numerics with different dscales. Easily
possible that it's few enough to not be worth fussing over.

> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 124ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 254ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 108156.917 ms
>
> The divide by 1 case is slow because of that weird 20 trailing zero
> instead of 16 when dividing a numeric by 1 and that causes the inverse
> transition function to return NULL because the scale changes.
>
> I've not tested an unpatched version yet to see how that divide by 1 query
> performs on that but I'll get to that soon.

So without the patch, all three queries should perform similarly, i.e. take
about 10 seconds, right? If so, the improvement is fantastic!

> I'm thinking that the idea about detecting the numeric range with floating
> point types and performing an inverse transition providing the range has
> not gone beyond some defined danger zone is not material for this patch...
> I think it would be not a great deal of work to code, but the testing involved
> would probably make this patch not possible for 9.4

Yeah, I never imagined that this would happen for 9.4.

> The latest version of the patch is attached.

OK, there are a few more comments

* build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid, I think. I don't quite like that we
construct that even for plain aggregates, and avoiding it requires just an additional if.

* Don't we need to check for volatile functions in the filter expression too?

* As it stands, I don't think intXand_inv and intXor_inv are worth it, since the case where they return non-NULL is
awfully slim (only for inputs containing only ones, respectively only zeros). We should either track the number of
zeros and ones per bit, which would allow us to always perform inverse transitions, or just drop them.

* Quite a few of the inverse transition functions are marked strict, yet contain code to handle NULL inputs. You can
just remove that code - the system makes sure that strict functions never receive NULL arguments. Affected are, AFAICS,
numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv, int4_accum_inv, int8_accum_inv, int8_avg_accum_inv,
int2_sum_inv, int4_sum_inv, int8_sum_inv. Not sure that list is exhaustive...

* For any of the new inverse transition functions, I'd be inclined to just elog() if they're called directly and not
as an aggregate. In particular those which check for that anyway, plus the *smaller_inv and *larger_inv ones. I don't
see why anyone would ever want to call these directly - and if we elog() now, we can easily change these functions
later, because no external code can depend on them. E.g., maybe someone wants to keep the top N elements in the
MIN/MAX aggregates one day...

* The number of new regression tests seems a bit excessive. I don't think there's really a policy on what to test and
what not, but in this case I think it suffices if we test the basic machinery, plus the more complex functions. But
e.g. most of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally, and the wrapper code
basically just does datatype conversion, so testing a few cases seems enough there. What I think we *should* test, but
don't do currently, is whether the core machinery performs the expected calls of the forward and inverse transition
functions. I was thinking about creating an aggregate in the regression tests which simply concatenates all the calls
into a string, e.g. you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed 1. I think that should be
possible with an SQL-language forward and inverse transfer function, but I haven't tried. I can try, if you want.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug <fgp@phlo.org> wrote:
First, I've go the feeling that I should somehow update the commitfest app,
but I don't really know in which way. Should I put myself in as a reviewer,
or as a second author? Or neither? Suggestions welcome...


Well, I guess you're both now, but it's a bit weird to be author and reviewer, so I've put your name against author too. Hopefully Dean can review our combined results and we can review each other's work at the same time.
 
On Jan17, 2014, at 23:34 , David Rowley <dgrowleyml@gmail.com> wrote:
> The test turned out to become:
>       if (state->expectedScale == -1)
>               state->expectedScale = X.dscale;
>       else if (state->expectedScale != X.dscale)
>               state->expectedScale = -2;
>
> In do_numeric_accum, then when we do the inverse I just check if
> expectedScale < 0 then return NULL.

Ok, so this will use inverse transitions if and only if the dscales of all inputs match.
I still think that's overly pessimistic - we've only got a problem when we
remove the input with the largest dscale, no? So my suggestion would be

  state->maxDScale = MAX(X.dscale, state->maxDScale);

in do_numeric_accum, and in the inverse

  if (state->maxDScale == X.dscale)
      PG_RETURN_NULL();

I'd think that this avoids more restarts with about the same effort,
but I haven't tried it, so maybe I'm missing something.


This is not quite right, as it means that if all the values are the same then
we reject inverse transitions, since state->maxScale will always be equal to X.dscale.
But you are right about the overly strict code I've put in: we should allow values with less than the maximum dscale to be unaggregated without complaint. To implement this I needed a maxScaleCounter variable too, so that we only reject when the maxScaleCounter gets back to 0 again.
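
In standalone form the tracking amounts to something like this (field names
assumed, not necessarily what's in the patch):

    #include <stdbool.h>

    typedef struct ScaleTracker
    {
        int  maxScale;          /* largest dscale aggregated so far */
        long maxScaleCounter;   /* how many current inputs have that dscale */
    } ScaleTracker;

    /* forward transition: track the maximum dscale and its multiplicity */
    static void
    scale_add(ScaleTracker *t, int dscale)
    {
        if (dscale > t->maxScale)
        {
            t->maxScale = dscale;
            t->maxScaleCounter = 1;
        }
        else if (dscale == t->maxScale)
            t->maxScaleCounter++;
    }

    /* inverse transition: returns false when a restart is required */
    static bool
    scale_remove(ScaleTracker *t, int dscale)
    {
        if (dscale == t->maxScale && --t->maxScaleCounter == 0)
            return false;   /* removed the last max-scale input: rescan */
        return true;
    }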

Note that after this fix the results for my quick benchmark now look like:

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,20000);
select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms
select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156ms
select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms

So it seems to be much less prone to falling back to brute force forward transitions. 
It also seems the / 10 version must previously have had to do one brute-force rescan, but now it looks like it can do it in one scan.

> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> tracker array, but I'm just not quite sure what extra real use cases it
> covers that the one above does not. Perhaps my way won't perform inverse
> transitions if the query did sum(numericCol / 10) or so.

Dunno how many people SUM over numerics with different dscales. Easily
possible that it's few enough to not be worth fussing over.


Going by Tom's comments in the post above, this is possible just by having an unconstrained numeric column, but I guess there's still a good chance that even those unconstrained numbers have the same scale, or at least that the scale will not vary wildly enough to make us have to perform brute-force forward transitions for each row.
 
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 124ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 254ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 108156.917 ms
>
> The divide by 1 case is slow because of that weird 20 trailing zero
> instead of 16 when dividing a numeric by 1 and that causes the inverse
> transition function to return NULL because the scale changes.
>
> I've not tested an unpatched version yet to see how that divide by 1 query
> performs on that but I'll get to that soon.

So without the patch, all three queries should perform similarly, i.e. take
about 10 seconds, right? If so, the improvement is fantastic!


Well, it's actually 100 seconds, not 10. I tested the worst-case performance against an unpatched head and got 107 seconds instead of the 108. So I'm guessing that's pretty good, as the worst case is not really any worse and is pretty hard to get to. I guess the results would all have to have a different scale, with the biggest scale on the first aggregated values... Reaching that worst case just seems impossible in a real-world workload.
 
> I'm thinking that the idea about detecting the numeric range with floating
> point types and performing an inverse transition providing the range has
> not gone beyond some defined danger zone is not material for this patch...
> I think it would be not a great deal of work to code, but the testing involved
> would probably make this patch not possible for 9.4

Yeah, I never imagined that this would happen for 9.4.

> The latest version of the patch is attached.

OK, there are a few more comments

* build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid,
  I think. I don't quite like that we construct that even for plain aggregates,
  and avoiding it requires just an additional if.

I'm not quite sure what you mean on this. It passes InvalidOid in normal aggregate calls (search for: "InvalidOid, /* invtrans is not needed here */") and only looks up the function in build_aggregate_fnexprs if (OidIsValid(invtransfn_oid)) is true. I'm not sure how this can be improved since that function is used for window aggregates and normal aggregates.
 
* Don't we need to check for volatile functions in the filter expression too?


I did manual testing on this before and the volatility test for the aggregate arguments seemed to cover it. I didn't look into why, but it just did. I've not tested this again since your refactoring. I could test this easily before, when my numeric case was changing the results because of the dscale problem: I noticed that if I did FILTER(WHERE random() > 0), the extra trailing zeros would disappear.
The problem now is that it's pretty hard to determine if an inverse transition took place; the only way we can really tell is performance. I'll see if I can invent a new test case for this by creating a user-defined aggregate as you described. I'm thinking of just appending '+' to a string for transitions and '-' for inverse transitions, then making sure we only have a string of '+'s when doing something like filter(where random() >= 0).

 
* As it stands, I don't think intXand_inv and intXor_inv are worth it, since
  the case where they return non-NULL is awfully slim (only for inputs
  containing only ones, respectively only zeros). We should either track the number
  of zeros and ones per bit, which would allow us to always perform inverse
  transitions, or just drop them.


I did think of this when I wrote them. I thought that the removing-0 case might be quite common and worth it, and the ~0 case less common, but it seemed weird to do one without the other.
To do more tracking on these it looks like we'd need to change those aggregates to use a state type that is internal, and I think the extra tracking would mean looping over an 8-, 32- or 64-element array of int64s for each value. I just don't think that would be a winner performance-wise, since the code that's there now is pretty much a handful of CPU cycles. It's probably far more worth it for the bool and/or aggregates. We could just keep track of the number of values aggregated and the count of values that were "true", and return true if those are the same in the case of "AND", then check the true count is > 0 in the case of "OR". I'd feel more strongly about going and doing that if I'd ever actually used those aggregates for anything.
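
For what it's worth, that bookkeeping is tiny - roughly this standalone
sketch (invented names; the real thing would be transition functions over an
internal state type):

    #include <stdbool.h>

    typedef struct BoolAggState
    {
        long aggcount;      /* non-NULL inputs currently aggregated */
        long truecount;     /* how many of those are true */
    } BoolAggState;

    /* forward transition */
    static void
    bool_agg_add(BoolAggState *state, bool input)
    {
        state->aggcount++;
        if (input)
            state->truecount++;
    }

    /* inverse transition - never needs to force a restart */
    static void
    bool_agg_remove(BoolAggState *state, bool input)
    {
        state->aggcount--;
        if (input)
            state->truecount--;
    }

    /* finals: BOOL_AND is true iff every input was true, BOOL_OR iff any was */
    static bool bool_and_final(BoolAggState *s) { return s->truecount == s->aggcount; }
    static bool bool_or_final(BoolAggState *s)  { return s->truecount > 0; }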
 
* Quite a few of the inverse transition functions are marked strict, yet
  contain code to handle NULL inputs. You can just remove that code - the system
  makes sure that strict functions never receive NULL arguments. Affected are,
  AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
  int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, int4_sum_inv,
  int8_sum_inv. Not sure that list is exhaustive...


Should be able to get a list from:
select proname, proisstrict from pg_proc where proisstrict = true and oid in (select agginvtransfn from pg_aggregate);

I might not have time for this today though, so if you feel like checking these that would be really helpful.
 
* For any of the new inverse transition functions, I'd be inclined to just
  elog() if they're called directly and not as an aggregate. In particular
  those which check for that anyway, plus the *smaller_inv and *larger_inv
  ones. I don't see why anyone would ever want to call these directly - and
  if we elog() now, we can easily change these functions later, because no external
  code can depend on them. E.g., maybe someone wants to keep the top N elements
  in the MIN/MAX aggregates one day...


Yeah, I guess the way it is now may mean we'd need to support legacy functions for ever and a day if we changed the way they worked, but I'm not sure adding a check that they're being used in an aggregate context changes that, as a user could be using the built-in function in their own user-defined aggregate, so there could still be complaints if we removed them in a major version. What would be needed is some way to have functions internally without publishing them to the user - say, only visible from initdb or something. I don't think that's part of this patch though. Maybe just the fact that they're undocumented gives them more ability to be removed later.
 
* The number of new regression tests seems a bit excessive. I don't think there's
  really a policy on what to test and what not, but in this case I think it suffices
  if we test the basic machinery, plus the more complex functions. But e.g. most
  of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally,
  and the wrapper code basically just does datatype conversion, so testing a few
  cases seems enough there. What I think we *should* test, but don't do currently,
  is whether the core machinery performs the expected calls of the forward and
  reverse transition function. I was thinking about creating an aggregate in the
  regression tests which simply concatenates all the calls into a string, e.g.
  you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed 1.
  I think that should be possible with an SQL-language forward and inverse
  transfer function, but I haven't tried. I can try, if you want.


I agree that there are quite a lot of tests, and I think that's why I took a different approach when it came to all these little *larger_inv and *smaller_inv functions - there were just so many! I thought the aggregate tests would run in the blink of an eye anyway, but perhaps I should consider other things than just processing time. I created most of the aggregate call tests by writing a whole load of queries on an unpatched version, then ran the tests and took the output of that as my expected results for my patched copy. These were really useful to check for regressions when I was working hard on nodeWindowAgg.c. I'd find it hard to pick and choose what to remove, given that they all test something slightly different, even if it's just a different final function. They did pick up some failures earlier, when I forgot to change the strict property on int8_avg_accum_inv. Maybe someone else has an opinion on the number of tests?

Overall, I think it's really starting to take shape now, and the list of things to do is pretty small. I'm really happy to see so many aggregate functions with inverse transition functions now!

Regards

David Rowley
 
best regards,
Florian Pflug


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 18, 2014 at 6:15 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug <fgp@phlo.org> wrote:
* Don't we need to check for volatile functions in the filter expression too?


I did manual testing on this before and the volatility test for the aggregate arguments seemed to cover it. I didn't look into why, but it just did. I've not tested this again since your refactoring. I could test this easily before, when my numeric case was changing the results because of the dscale problem: I noticed that if I did FILTER(WHERE random() > 0), the extra trailing zeros would disappear.
The problem now is that it's pretty hard to determine if an inverse transition took place; the only way we can really tell is performance. I'll see if I can invent a new test case for this by creating a user-defined aggregate as you described. I'm thinking of just appending '+' to a string for transitions and '-' for inverse transitions, then making sure we only have a string of '+'s when doing something like filter(where random() >= 0).

 
I've added a test case for this and it seems to work as expected.

Regards

David Rowley
 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan18, 2014, at 06:15 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan17, 2014, at 23:34 , David Rowley <dgrowleyml@gmail.com> wrote:
>>> The test turned out to become:
>>>      if (state->expectedScale == -1)
>>>              state->expectedScale = X.dscale;
>>>      else if (state->expectedScale != X.dscale)
>>>              state->expectedScale = -2;
>>>
>>> In do_numeric_accum, then when we do the inverse I just check if
>>> expectedScale < 0 then return NULL.
>>
>> Ok, so this will rescan if and only if the dscales of all inputs match.
>> I still that's overly pessimistic - we've only got a problem when we
>> removed the input with the largest dscale, no? So my suggestion would be
>>
>> <sniped>
>>
>> I'd think that this avoids more restarts without about the same effort,
>> but I haven't tried this though, so maybe I'm missing something.
>
> This is not quite right as it means if all the values are the same then
> we reject inverse transitions since state->maxScale will always be equal
> to X.dscale.
> But you are right about the overly strict code I've put in, we should allow
> values with a less than the maximum dscale to be unaggregated without
> complaint. To implement this I needed a maxScaleCounter variable too so we
> only reject when the maxScaleCounter gets back to 0 again.

Oops, sorry, yeah. Sounds sensible.

BTW, this made me realize that MIN and MAX currently have the same issue -
they'll rescan if the inputs are all equal. We could avoid that by doing what
you did with dscale - track the number of times we've seen the maximum. I
wonder if that would be worth it - it would, unfortunately, require us to
use state type "internal" there too, and hence to add final functions for all
the MIN/MAX aggregates. But that seems excessive. So for now, let's just
live with that.

If we really *do* want to optimize this case, we could
come to it from a completely different angle. Aggregates already special-case
MIN and MAX to be able to use an index to evaluate SELECT MAX(c) FROM t.
If we provided a way for the transition function to call the sort operator
specified by SORTOP in CREATE AGGREGATE, one generic triple of forward and
inverse transition function and final function would work for all the
MIN and MAX aggregates. But that's material for a separate patch, for 9.5.

> Note that after this fix the results for my quick benchmark now look like:
>
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms
>
> So it seems to be much less prone to falling back to brute force forward
> transitions.
> It also seems the / 10 version must have had to previously do 1 brute
> force rescan but now it looks like it can do it in 1 scan.
>
>>> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
>>> tracker array, but I'm just not quite sure what extra real use cases it
>>> covers that the one above does not. Perhaps my way won't perform inverse
>>> transitions if the query did sum(numericCol / 10) or so.
>>
>> Dunno how many people SUM over numerics with different dscales. Easily
>> possible that it's few enough to not be worth fussing over.
>
> Going by Tom's comments in the post above this is possible just by having an
> unconstrained numeric column, but I guess there's still a good chance that
> even those unconstrained numbers have the same scale or at least the scale
> will likely not vary wildly enough to make us have to perform brute force
> forward transitions for each row.

Yeah, I'm convinced by now that your approach is the right trade-off there.
Those who do have values with wildly different dscales in their columns can
always add a cast to normalize them, if they experience a lot of restarts.

So let's just add a sentence or two to the SUM(numeric) documentation about
this, and be done.

>> * build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid,
>>  I think. I don't quite like that we construct that even for plain aggregates,
>>  and avoiding requires just an additional if.
>>
>> I'm not quite sure what you mean on this. It passes InvalidOid in normal
>> aggregate calls (search for: "InvalidOid, /* invtrans is not needed here */")
>> and only looks up the function in build_aggregate_fnexprs if
>> (OidIsValid(invtransfn_oid)) is true. I'm not sure how this can be improved
>> since that function is used for window aggregates and normal aggregates.

I was thinking about checking for invtransfnexpr == NULL, and not assigning
if it is. But on second thought, you're right - the additional variable doesn't
really hurt. So let's leave it as it is.

>> * Don't we need to check for volatile function in the filter expression too?
>
> I did manual testing on this before and the volatility test for the aggregate
> arguments seems to cover this. I didn't look into why but it just did. I've
> not test this again since your refactoring. I could test this easily before
> when my numeric case was changing the results because of the dscale problem,
> I noticed that if I did FILTER(WHERE random() > 0) that the extra trailing
> zeros would disappear. The problem now is that it's pretty hard to determine
> if an inverse transition took place, the only way we can really tell is
> performance. I'll see if I can invent a new test case for this by creating a
> user defined aggregate as you described. I'm thinking just append '+' to a
> string for transitions and '-' to a string for inverse transitions, then
> just make sure we only have a string of '+'s when doing something like
> filter(where random() >= 0).

From your other mail I gather that this works as expected. Thanks for testing this!

>> * As it stands, I don't think intXand_inv and intXor_inv are worth it, since
>>  the case where they return non-NULL is awefully slim (only for inputs
>>  containing only 1 respectively only zeros). We should either track the number
>>  of zeros and ones per bit, which would allow us to always perform inverse
>>  transitions, or just drop them.
>
> I did think of this when I wrote them. I thought that the removing 0 case might
> be quite common and worth it, but I thought the ~0 case would be less common,
> but I just thought it was weird to do one without the other.
> To do more tracking on these it looks like we'd need to change those aggregates
> to use an state type that is internal and I think the extra tracking would mean
> looping over a 8, 32 or 64 element array of int64's for each value, I just don't
> think that would be a winner performance wise since the code that's there is
> pretty much a handful of CPU cycles.

Yeah, this is similar to the SUM(numeric) problem in that we *could* avoid
all restarts, but the overhead of doing so is quite high. But whereas in the
SUM(numeric) case we manage to reduce the overhead while still optimizing most
cases, here I think we optimize nearly none. My vote is for dropping these
functions entirely, but I don't feel particularly strongly about this...

>> It's probably far more worth it for the bool and/or aggregates. We could just
>> keep track of the values aggregated and the count of values as "true" and return
>> true if those are the same in the case of "AND", then check the true count
>> is > 0 in the case of "OR". I'd feel more strongly to go and do that if I'd
>> actually ever used those aggregates for anything.

That, OTOH, would be worthwhile I think. I'll go do that, though probably
not today. I hope to get to it sometime tomorrow.

>> * Quite a few of the inverse transition functions are marked strict, yet
>>  contain code to handle NULL inputs. You can just remove that code - the system
>>  makes sure that strict functions never receive NULL arguments. Affected are,
>>  AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
>>  int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, int4_sum_inv,
>>  int8_sum_inv. Not sure that list is exhaustive...
>
> Should be able to get a list from:
> select proname,proisstrict from pg_proc where proisstrict = true and oid in(select
> agginvtransfn from pg_aggregate);
>
> I might not have time for this today though, so if you feel like checking these
> that would be really helpful.

Yeah, I'll do that, also tomorrow hopefully.

>> * For any of the new inverse transition functions, I'd be inclined to just
>>  elog() if they're called directly and not as an aggregate. In particular
>>  those which check for that anyway, plus the *smaller_inv and *larger_inv
>>  ones. I don't see why anyone would ever want to call these directly - and
>>  if we elog() now, we can easy change these functions later, because no external
>>  code can depend on them. E.g., maybe someone wants to keep the top N elements
>>  in the MIN/MAX aggregates one day...
>
> Yeah I guess the way it is now may mean we'd need to support legacy functions
> for ever and a day if we changed the way they worked, but I'm not sure if adding
> a check to see if it was used in an aggregate function changes that, as a user
> could be using the built in function in their own user defined aggregate, so
> there could still be complaints if we removed them from a major version. What
> would be needed is some way to have functions internally but publish these
> functions to the user, say only visible from initdb or something. I don't think
> that's part of this patch though. Maybe just the fact that they're undocumented
> helps give them more ability to be removed later.

Hm, true. Still, I think I'd prefer us to elog() for those new functions which
explicitly check whether there's an aggregation context anyway. Can do that if you
want.

>> * The number of new regression tests seems a bit excessive. I don't think there
>>  really a policy what to test and what not, but in this case I think it suffices
>>  if we test the basic machinery, plus the more complex functions. But e.g. most
>>  of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally,
>>  and the wrapper code basically just does datatype conversion, so testing a few
>>  cases seems enough there. What I think we *should* test, but don't do currently,
>>  is whether the core machinery performs the expected calls of the forward and
>>  reverse transition function. I was thinking about creating an aggregate in the
>>  regression tests which simply concatenates all the calls into a string, e.g.
>>  you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed 1.
>>  I think that should be possible with an SQL-language forward and inverse
>>  transfer function, but I haven't tried. I can try, if you want.
>
> I agree that there are quite a lot of tests and I think that's why I took a different
> approach when it came to all this little *larger_inv and smaller_inv functions, there
> were just so many!  I thought the aggregate tests would run in the blank of a eye
> anyway, but perhaps I should consider other things than just processing time. I
> created most of the aggregate call tests by writing a whole load of queries on an
> unpatched version then ran the tests and took the output of that as my expected
> results for my patched copy. These were really useful to check for regression when
> I was working hard on nodeWindowAgg.c. I'd find it hard to pick and choose what to
> remove giving that they all test something slightly different, even if it's just a
> different final function. They did pick up some failures earlier when I forgot to
> change the strict property on int8_avg_accum_inv. Maybe someone else has an opinion
> on that the number of tests?

I think the basic guideline for whether to test something in the regression test or
not is not so much "did it catch an error during development", but rather "could this
catch errors inadvertently introduced later". That's not to say you shouldn't use
more tests during development - your procedure there is fine - the question is just
whether to cut them down when the patch's done. For all these rather trivial inverse
functions, I think we can trust that if they work once, they're not going to break
unless someone changes the function itself - they don't really have any outside
dependencies. That's also what the window functions regression test does today, I
think - it doesn't really test all possible cases exhaustively.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Jan 19, 2014 at 3:22 AM, Florian Pflug <fgp@phlo.org> wrote:

BTW, this made me realize that MIN and MAX currently have the same issue -
they'll rescan if the inputs are all equal. We could avoid that by doing what
you did with dscale - track the number of times we've seen the maximum. I
wonder if that would be worth it - it would, unfortunately, require us to
use state type "internal" there too, and hence to add final functions for all
the MIN/MAX aggregates. But that seems excessive. So for now, let's just
live with that.


Yeah, it's an idea... I had actually talked a bit about it before, when I first posted about inverse transition functions.


But I've only realised today that it might be a no go. Let me explain...

I just finished implementing the inverse transition functions for bool_and and bool_or. These aggregates had a sort operator, which I assume would have allowed an index scan to be performed, but since I had to change the first argument of these aggregates to internal, I had to get rid of the sort operator... So I'm not actually sure that we really should implement inverse transition functions for bool_and and bool_or because of this. Nevertheless, I committed a patch to the github repo which implements them. I guess this sort operator problem completely writes off doing something similar for MAX and MIN, as that would mean no index scan would be possible for these aggregates!

 
If we really *do* want to optimize this case, we could
come to it from a completely different angle. Aggregates already special-case
MIN and MAX to be able to use an index to evaluate SELECT MAX(c) FROM t.
If we provided a way for the transition function to call the sort operator
specified by SORTOP in CREATE AGGREGATE, one generic triple of forward and
inverse transition function and final function would work for all the
MIN and MAX aggregates. But that's material for a separate patch for 9.5.

> Note that after this fix the results for my quick benchmark now look like:
>
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms
>
> So it seems to be much less prone to falling back to brute force forward
> transitions.
> It also seems the / 10 version must have had to previously do 1 brute
> force rescan but now it looks like it can do it in 1 scan.
>
>>> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
>>> tracker array, but I'm just not quite sure what extra real use cases it
>>> covers that the one above does not. Perhaps my way won't perform inverse
>>> transitions if the query did sum(numericCol / 10) or so.
>>
>> Dunno how many people SUM over numerics with different dscales. Easily
>> possible that it's few enough to not be worth fussing over.
>
> Going by Tom's comments in the post above this is possible just by having an
> unconstrained numeric column, but I guess there's still a good chance that
> even those unconstrained numbers have the same scale or at least the scale
> will likely not vary wildly enough to make us have to perform brute force
> forward transitions for each row.

Yeah, I'm convinced by now that your approach is the right trade-off there.
Those who do have values with wildly different dscales in their columns can
always add a cast to normalize them, if they experience a lot of restarts.
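
For example, for the sum(num / 10) case from your benchmark above, something like this (just to illustrate the idea, and assuming one can live with whatever rounding the cast introduces) should keep the dscale of the summands constant, so the inverse transition never has to bail out:

  select sum((num / 10)::numeric(10,2))
    over (order by num rows between current row and unbounded following)
  from num;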

So let's just add a sentence or two to the SUM(numeric) documentation about
this, and be done.


I had a quick look and I couldn't decide the best place to write about specific details on inverse transition functions. The best place I could see was to add a note under the aggregates table here:

 
>
> I did think of this when I wrote them. I thought that the removing 0 case might
> be quite common and worth it, but I thought the ~0 case would be less common,
> but I just thought it was weird to do one without the other.
> To do more tracking on these it looks like we'd need to change those aggregates
> to use a state type that is internal and I think the extra tracking would mean
> looping over an 8, 32 or 64 element array of int64's for each value, I just don't
> think that would be a winner performance wise since the code that's there is
> pretty much a handful of CPU cycles.

Yeah, this is similar to the SUM(numeric) problem in that we *could* avoid
all restarts, but the overhead of doing so is quite high. But whereas in the
SUM(numeric) case we manage to reduce the overhead while still optimizing most
cases, here I think we optimize nearly none. My vote is for dropping these
functions entirely, but I don't feel particularly strongly about this...


Ok, I've reverted the patch which implemented the bitwise aggregate inverse transition functions.
 
>> It's probably far more worth it for the bool and/or aggregates. We could just
>> keep track of the values aggregated and the count of values as "true" and return
>> true if those are the same in the case of "AND", then check the true count
>> is > 0 in the case of "OR". I'd feel more strongly to go and do that if I'd
>> actually ever used those aggregates for anything.

That, OTOH, would be worthwhile I think. I'll go do that, though probably
not today. I hope to get to it sometime tomorrow.

I've committed a patch to the github repo to do this,
but I'm not sure if we can keep it, as I had to remove the sort op as I explained above.

>> * Quite a few of the inverse transition functions are marked strict, yet
>>  contain code to handle NULL inputs. You can just remove that code - the system
>>  makes sure that strict functions never receive NULL arguments. Affected are,
>>  AFAICS numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
>>  int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv, int4_sum_inv,
>>  int8_sum_inv. Not sure that list is exhaustive...
>
> Should be able to get a list from:
> select proname,proisstrict from pg_proc where proisstrict = true and oid in(select
> agginvtransfn from pg_aggregate);
>
> I might not have time for this today though, so if you feel like checking these
> that would be really helpful.

Yeah, I'll do that, also tomorrow hopefully.

I think I beat you to it. But I'm not quite sure what to do in numeric_avg_accum_inv() where it does:
    if (state == NULL)
        state = makeNumericAggState(fcinfo, false);

Perhaps we should just add an Assert(state != NULL); here, since we shouldn't be calling inverse transitions without actually calling the transition function, and the transition function should initialise the state. I think this is valid as we can't actually call that function manually because of the internal type. So it won't be possible to cause an assert failure by doing select numeric_avg_accum_inv(NULL,20);
 

>> * For any of the new inverse transition functions, I'd be inclined to just
>>  elog() if they're called directly and not as an aggregate. In particular
>>  those which check for that anyway, plus the *smaller_inv and *larger_inv
>>  ones. I don't see why anyone would ever want to call these directly - and
>>  if we elog() now, we can easy change these functions later, because no external
>>  code can depend on them. E.g., maybe someone wants to keep the top N elements
>>  in the MIN/MAX aggregates one day...
>
> Yeah I guess the way it is now may mean we'd need to support legacy functions
> for ever and a day if we changed the way they worked, but I'm not sure if adding
> a check to see if it was used in an aggregate function changes that, as a user
> could be using the built in function in their own user defined aggregate, so
> there could still be complaints if we removed them from a major version. What
> would be needed is some way to have functions internally but publish these
> functions to the user, say only visible from initdb or something. I don't think
> that's part of this patch though. Maybe just the fact that they're undocumented
> helps give them more ability to be removed later.

Hm, true. Still, I think I'd prefer us to elog() for those new functions which
explicitly check whether there's an aggregation context anyway. Can do that if you
want.


I've not done this yet, but may look at it later. I'd quite like a 2nd opinion on it before I go and implement this, as I'm not quite sure what value it will bring. I don't think it means that one day we can remove these functions if we find a better way of doing things, as it would still be possible to use them in a user defined aggregate. It just probably narrows the chances, but it does stick a small overhead in each function to test for this.
 
>> * The number of new regression tests seems a bit excessive. I don't think there's
>>  really a policy on what to test and what not, but in this case I think it suffices
>>  if we test the basic machinery, plus the more complex functions. But e.g. most
>>  of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum internally,
>>  and the wrapper code basically just does datatype conversion, so testing a few
>>  cases seems enough there. What I think we *should* test, but don't do currently,
>>  is whether the core machinery performs the expected calls of the forward and
>>  reverse transition function. I was thinking about creating an aggregate in the
>>  regression tests which simply concatenates all the calls into a string, e.g.
>>  you might get "F:1 F:2 F:3 I:1" if we aggregated 1,2,3 and then removed 1.
>>  I think that should be possible with an SQL-language forward and inverse
>>  transfer function, but I haven't tried. I can try, if you want.
>
> I agree that there are quite a lot of tests and I think that's why I took a different
> approach when it came to all this little *larger_inv and smaller_inv functions, there
> were just so many!  I thought the aggregate tests would run in the blink of an eye
> anyway, but perhaps I should consider other things than just processing time. I
> created most of the aggregate call tests by writing a whole load of queries on an
> unpatched version then ran the tests and took the output of that as my expected
> results for my patched copy. These were really useful to check for regression when
> I was working hard on nodeWindowAgg.c. I'd find it hard to pick and choose what to
> remove given that they all test something slightly different, even if it's just a
> different final function. They did pick up some failures earlier when I forgot to
> change the strict property on int8_avg_accum_inv. Maybe someone else has an opinion
> on that the number of tests?

I think the basic guideline for whether to test something in the regression test or
not is not so much "did it catch an error during development", but rather "could this
catch errors inadvertently introduced later". That's not to say you shouldn't use
more tests during development - your procedure there is fine - the question is just
whether to cut them down when the patch's done. For all these rather trivial inverse
functions, I think we can trust that if they work once, they're not going to break
unless someone changes the function itself - they don't really have any outside
dependencies. That's also what the window functions regression test does today, I
think - it doesn't really test all possible cases exhaustively.


Ok, we can think about that later then. I don't want to remove any just yet; even if we think the patch is getting close, another review from someone else could mean it needs more big changes before it's ready to go.
 
best regards,
Florian Pflug


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Jan 19, 2014 at 5:27 PM, David Rowley <dgrowleyml@gmail.com> wrote:

>> It's probably far more worth it for the bool and/or aggregates. We could just
>> keep track of the values aggregated and the count of values as "true" and return
>> true if those are the same in the case of "AND", then check the true count
>> is > 0 in the case of "OR". I'd feel more strongly to go and do that if I'd
>> actually ever used those aggregates for anything.

That, OTOH, would be worthwhile I think. I'll go do that, though probably
not today. I hope to get to it sometime tomorrow.

I've committed a patch to the github repo to do this,
but I'm not sure if we can keep it, as I had to remove the sort op as I explained above.


I think I'm going to have to revert the patch which implements the inverse transition function for bool_and and bool_or.
I tested on an instance of 9.3.2 and the following queries use index scans.

create table booltest (b boolean not null);
insert into booltest (b) select false from generate_series(1,20000) g(n);
insert into booltest (b) values(true);

create index booltest_b_idx ON booltest(b);
vacuum analyze booltest;

explain select bool_or(b) from booltest;
explain select bool_and(b) from booltest;

I'm guessing there is no way to have an internal state type on the aggregate and a sort operator on the aggregate.

I wonder if it is worth creating naive inverse transition functions similar to max()'s and min()'s inverse transition functions. I guess on average they've got about a 50% chance of being used, and likely for some workloads it would be a win.
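
Something like this perhaps - just a sketch for bool_or with a made-up name, relying on the core code re-aggregating when the inverse transition function returns NULL:

  create or replace function bool_or_inv(state bool, value bool)
  returns bool as $$
    -- removing a false never changes an OR; removing a true might,
    -- so return NULL to force a rescan of the frame
    select case when value then null else state end
  $$ language sql strict immutable;

For bool_and it would be the other way around: removing a true is always safe, removing a false forces the rescan.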

What are your thoughts?

Regards
David Rowley
 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 18, 2014 at 11:34 AM, David Rowley <dgrowleyml@gmail.com> wrote:

The latest version of the patch is attached.


I've attached an updated version of the patch.

I'm now using github to track the changes to the patch, so I've included in the file name the sha of the latest commit that this patch includes, along with the date.

Please see https://github.com/david-rowley/postgres/commits/invtrans for what's been changed.

Right now I don't think there is very much left to do. Perhaps the documents need some examples of creating inverse transition functions, I was not sure, so I left them out for now.

Regards

David Rowley

 

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan19, 2014, at 05:27 , David Rowley <dgrowleyml@gmail.com> wrote:
>> I just finished implementing the inverse transition functions for bool_and
>> and bool_or, these aggregates had a sort operator which I assume would have
>> allowed an index scan to be performed, but since I had to change the first
>> argument of these aggregates to internal, which meant I had to get rid of
>> the sort operator...

Why does having transition type "internal" prevent you from specifying a
sort operator? The sort operator's argument types must match the *input*
type of the aggregate, not the transition type.

Here's a pure SQL implementation of an optimized bool_and called myand_agg
that uses state type bigint[] and specifies a sort operator.
  create or replace function myboolagg_fwd(counts bigint[], value bool)
  returns bigint[] as $$
    select array[
      counts[1] + case value when true then 0 else 1 end,
      counts[2] + case value when true then 1 else 0 end
    ]
  $$ language sql strict immutable;

  create or replace function myboolagg_inv(counts bigint[], value bool)
  returns bigint[] as $$
    select array[
      counts[1] - case value when true then 0 else 1 end,
      counts[2] - case value when true then 1 else 0 end
    ]
  $$ language sql strict immutable;

  create or replace function myboolagg_and(counts bigint[])
  returns bool as $$
    select case counts[1] when 0 then true else false end
  $$ language sql strict immutable;

  create aggregate myand_agg (bool) (
    stype = bigint[],
    sfunc = myboolagg_fwd,
    invfunc = myboolagg_inv,
    finalfunc = myboolagg_and,
    sortop = <,
    initcond = '{0,0}'
  );

With this, doing

  create table boolvals as
    select i, random() < 0.5 as v from generate_series(1,10000) i;
  create index on boolvals(v);

  explain analyze select myand_agg(v) from boolvals;

yields

 Result  (cost=0.33..0.34 rows=1 width=0) (actual time=0.067..0.067 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.29..0.33 rows=1 width=1) (actual time=0.061..0.061 rows=1 loops=1)
           ->  Index Only Scan using boolvals_v_idx on boolvals  (cost=0.29..474.41 rows=9950 width=1) (actual time=0.061..0.061 rows=1 loops=1)
                 Index Cond: (v IS NOT NULL)
                 Heap Fetches: 1
 Total runtime: 0.100 ms

which looks fine, no?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Mon, Jan 20, 2014 at 5:53 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan19, 2014, at 05:27 , David Rowley <dgrowleyml@gmail.com> wrote:
>> I just finished implementing the inverse transition functions for bool_and
>> and bool_or, these aggregates had a sort operator which I assume would have
>> allowed an index scan to be performed, but since I had to change the first
>> argument of these aggregates to internal, which meant I had to get rid of
>> the sort operator...

Why does having transition type "internal" prevent you from specifying a
sort operator? The sort operator's argument types must match the *input*
type of the aggregate, not the transition type.

Here's a pure SQL implementation of an optimized bool_and called myand_agg
that uses state type bigint[] and specifies a sort operator.

  create or replace function myboolagg_fwd(counts bigint[], value bool)
  returns bigint[] as $$
    select array[
      counts[1] + case value when true then 0 else 1 end,
      counts[2] + case value when true then 1 else 0 end
    ]
  $$ language sql strict immutable;

  create or replace function myboolagg_inv(counts bigint[], value bool)
  returns bigint[] as $$
    select array[
      counts[1] - case value when true then 0 else 1 end,
      counts[2] - case value when true then 1 else 0 end
    ]
  $$ language sql strict immutable;

  create or replace function myboolagg_and(counts bigint[])
  returns bool as $$
    select case counts[1] when 0 then true else false end
  $$ language sql strict immutable;

  create aggregate myand_agg (bool) (
    stype = bigint[],
    sfunc = myboolagg_fwd,
    invfunc = myboolagg_inv,
    finalfunc = myboolagg_and,
    sortop = <,
    initcond = '{0,0}'
  );

With this, doing

  create table boolvals as
    select i, random() < 0.5 as v from generate_series(1,10000) i;
  create index on boolvals(v);

  explain analyze select myand_agg(v) from boolvals;

yields

 Result  (cost=0.33..0.34 rows=1 width=0) (actual time=0.067..0.067 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.29..0.33 rows=1 width=1) (actual time=0.061..0.061 rows=1 loops=1)
           ->  Index Only Scan using boolvals_v_idx on boolvals  (cost=0.29..474.41 rows=9950 width=1) (actual time=0.061..0.061 rows=1 loops=1)
                 Index Cond: (v IS NOT NULL)
                 Heap Fetches: 1
 Total runtime: 0.100 ms

which looks fine, no?


Hmm, yeah, you're right. I guess I didn't quite think through what the sort comparison was comparing with; for some reason I had it in my head that it was the aggregate state and not another value in a btree index.

I've applied that patch again and put in the sort operators.

Thanks for looking at that.

Regards

David Rowley
 
best regards,
Florian Pflug


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Mon, Jan 20, 2014 at 8:42 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
  transitions per row and aggregate. It's a bit imprecise, because it doesn't
  track the count per aggregate, but it's still a good metric for how well
  the inverse transition functions work. If the number is close to one, you
  know that very few rescans are happening.


I've not looked at this yet and I don't think I'll have time tonight, but it sounds interesting. I guess it might be quite nice to have a way to see this, especially with the way the numeric stuff works; it might otherwise be pretty hard to know how many inverse transition "failures" there had been. Do you think it's worth tracking the inverse transition failures too?


I've merged this patch but I attempted to get it into a bit more of a ready state by moving the code out into a helper function the same way as the other explain stuff is done. I've not touched explain before so do let me know if I've made it worse.


Regards

David Rowley 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan19, 2014, at 20:00 , David Rowley <dgrowleyml@gmail.com> wrote:
> I've applied that patch again and put in the sort operators.

I've push a new version to https://github.com/fgp/postgres/tree/invtrans
which includes

* A bunch of missing declaration for *_inv functions

* An assert that the frame end doesn't move backwards - I realized that it is after all easy to do that, if it's done after the loop which adds the new values, not before.

* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate transitions per row and aggregate. It's a bit imprecise, because it doesn't track the count per aggregate, but it's still a good metric for how well the inverse transition functions work. If the number is close to one, you know that very few rescans are happening.

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and it's the last commit, so if you object to that, then you can merge up to eafa72330f23f7c970019156fcc26b18dd55be27 instead of de3d9148be9732c4870b76af96c309eaf1d613d7.

A few more things I noticed, all minor stuff:

* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the inverse transition function returns NULL once, we never call it again, so the flag won't have any practical effect. And second, assume we ever called the forward transition function after the inverse failed, and then retried the inverse. In the case of do_numeric_discard(), that actually *could* allow the inverse to suddenly succeed - if the call to the forward function increased the dscale beyond that of the element we tried to remove, removal would suddenly be possible again. We never do that, of course, and it seems unlikely we ever will. But it's still weird to have code which serves no other purpose than to pessimize a case which would otherwise just work fine.

* The state == NULL checks in all the strict inverse transition functions are redundant.

I haven't taken a close look at the documentation yet, I hope to be able to
do that tomorrow. Otherwise, things look good as far as I'm concerned.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
On Jan19, 2014, at 20:00 , David Rowley <dgrowleyml@gmail.com> wrote:
> I've applied that patch again and put in the sort operators.

I've push a new version to https://github.com/fgp/postgres/tree/invtrans
which includes

* A bunch of missing declaration for *_inv functions


Thanks, I've applied that.
 
* An assert that the frame end doesn't move backwards - I realized that
  it is after all easy to do that, if it's done after the loop which adds
  the new values, not before.

 
I've applied this too, but I'm wondering why an elog if the head moves back, but an assert if the tail moves back?
 
* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
  transitions per row and aggregate. It's a bit imprecise, because it doesn't
  track the count per aggregate, but it's still a good metric for how well
  the inverse transition functions work. If the number is close to one, you
  know that very few rescans are happening.


I've not looked at this yet and I don't think I'll have time tonight, but it sounds interesting. I guess it might be quite nice to have a way to see this, especially with the way the numeric stuff works; it might otherwise be pretty hard to know how many inverse transition "failures" there had been. Do you think it's worth tracking the inverse transition failures too?

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
  it's the last commit, so if you object to that, then you can merge up to
  eafa72330f23f7c970019156fcc26b18dd55be27 instead of
  de3d9148be9732c4870b76af96c309eaf1d613d7.


Seems like sfunc really should be tfunc; then we could have invtfunc. I'd probably understand this better if I knew what the 's' was for in sfunc. I've not applied this just yet. Do you have a reason why you think it's better?
 
A few more things I noticed, all minor stuff

* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the
  inverse transition function returns NULL once, we never call it again, so the
  flag won't have any practical effect. And second, assume we ever called the
  forward transition function after the inverse fail, and then retried the inverse.
  In the case of do_numeric_discard(), that actually *could* allow the inverse
  to suddenly succeed - if the call to the forward function increased the dscale
  beyond that of the element we tried to remove, removal would suddenly be
  possible again. We never do that, of course, and it seems unlikely we ever
  will. But it's still weird to have code which serves no other purpose than to
  pessimize a case which would otherwise just work fine.


hmm, yeah of course, you are right. I've removed this now.
 
* The state == NULL checks in all the strict inverse transition functions are
  redundant.


ok, I've removed these and added comments to note that these functions should be declared strict.
 
I haven't taken a close look at the documentation yet, I hope to be able to
do that tomorrow. Otherwise, things look good as far as I'm concerned.


Thanks, yeah, those really do need a review. I've lost a bit of direction with them and I'm not quite sure just how much detail to go into. I'd like to explain a bit that users who use numeric columns in window aggregates might want to think about giving the numeric a defined scale rather than an undefined one, and explain that this is because the inverse transition function for numeric bails out if it loses the maximum seen dscale. It does seem generally a good idea to have a defined scale anyway, but then you've got to have a bit of knowledge about the numbers you're storing in that case. I'm not quite sure how to put that into words friendly enough for the documents just yet, or exactly where to put them. So any ideas or patches you have around that would be great.

Once again thanks for all your work on this.

Regards

David Rowley
 
best regards,
Florian Pflug


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan20, 2014, at 08:42 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
>> * An assert that the frame end doesn't move backwards - I realized that
>>   it is after all easy to do that, if it's done after the loop which adds
>>   the new values, not before.
>
> I've applied this too, but I'm wondering why an elog for if the head moves
> back, but an assert if the tail moves back?

When I put the frame head check in, I was concerned that the code might crash
or loop endlessly if aggregatedbase was ever larger than frameheadpos, so
I made it elog(), just for safety.

The frame end check, OTOH, is done at the very end, so it doesn't protect
against much, it just documents that it's not supposed to happen.

But yeah, it's a bit weird. Feel free to turn the frame end check into an
elog() too if you want to.

>> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
>>   it's the last commit, so if you object to that, then you can merge up to
>>   eafa72330f23f7c970019156fcc26b18dd55be27 instead of
>>   de3d9148be9732c4870b76af96c309eaf1d613d7.
>
>
> Seems like sfunc really should be tfunc; then we could have invtfunc. I'd probably
> understand this better if I knew what the 's' was for in sfunc. I've not applied
> this just yet. Do you have a reason why you think it's better?

My issue with just "invfunc" is mainly that it's too generic - it doesn't tell
you what it's supposed to be the inverse of.

I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
the naming is inspired by control theory, where the function which acts on the
state space is often called S.

> Thanks, yeah those really do need a review. I've lost a bit of direction with
> them and I'm not quite sure just how much detail to go into. I'd like
> to explain a bit that users who use their numeric columns in window
> aggregates might want to think about giving the numeric a defined scale rather
> than an undefined one, and explain that this is because the inverse transition
> function for numeric bails out if it loses the maximum seen dscale. It
> does seem generally a good idea to have a defined scale, but then I guess you've
> got to have a bit of knowledge about the numbers you're storing in that case.
> I'm not quite sure how to put that into words friendly enough for the documents
> just yet, or exactly where to put them. So any ideas or patches you have
> around that would be great.

Here's what I imagine the docs should look like, very roughly:

* In CREATE AGGREGATE, we should state the precise axioms we assume about forward
and inverse transition functions. The last time I read the text there, it was
a bit ambiguous about whether inverse transition functions assume commutativity,
i.e. whether we assume that we can remove inputs other than the first one from
the aggregation. Currently, all the inverse transition functions we have are,
in fact, commutative, because all the forward transition functions are also. But
we could e.g. add an inverse to array_agg and string_agg, and those would
obviously need this ordering guarantee. I'd also like us to explain that the
"inverse" in "inverse transition function" shouldn't be taken too literally. For
forward transition function F, inverse transition function G, and inputs a,b,...
we *don't* require that G(F(s,a),a) == s. What we do require is that if I is the
initial state, then
 G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).

(Well, actually we don't strictly require even that, the two states don't
need to be identical, they just need to produce identical outputs when passed
to the final function)
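
(As a trivial concrete check of that axiom: for SUM over integers, I = 0, F(s,a) = s+a and G(s,a) = s-a, so G(F(F(I,a),b),a) = ((0+a)+b)-a = b = F(I,b), as required.)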

* In CREATE AGGREGATE, we should also explain the NULL semantics you get
with various combinations of strict and non-strict forward and inverse
transition functions. I think a table listing all the combinations is in order
there, with a column explaining the semantics you get. We should also clearly
state that once you provide an inverse transition function, NULL isn't a valid
state value anymore, except as an initial state, i.e. that the forward transition
function must never return NULL in this case.

* The window function page should explain the performance hazards when
a moving frame head is involved. Ideally, we'd list which aggregates never
have to restart, and which sometimes might, and what you can do to avoid that.
We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
restarts are occurring. This is where we'd tell people to cast their numeric
operands to a type with defined scale to avoid restarts, and where we'd state
that SUM(float) *does* always restart due to precision loss issues.

BTW, something that we haven't addressed at all is how inverse transition
functions interact with what is called "ordered-set aggregates". I haven't
wrapped my head around these fully, I think, so I'm still not sure if there's
anything to do there or not. Can those even be used as window functions?
Should we forbid ordered-set aggregates from specifying an inverse transition
function?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Dec 15, 2013 at 2:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Stark <stark@mit.edu> writes:
> On 14 Dec 2013 15:40, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> I think you *can't* cover them for the float types; roundoff error
>> would mean you don't get the same answers as before.

> I was going to say the same thing. But then I started to wonder.... What's
> so special about the answers we used to give? They are also subject to
> round off and the results are already quite questionable in those cases.

Well, we can't easily do better than the old answers, and the new ones
might be arbitrarily worse.  Example: sum or average across single-row
windows ought to be exact in any case, but it might be arbitrarily wrong
with the negative-transition technique.

More generally, this is supposed to be a performance enhancement only;
it's not supposed to change the results.


It came to me that it might be possible to implement inverse transitions for floating point aggregates by just detecting if precision has been lost during forward transitions.

I've written the test to do this as:

  IF state.value + value = state.value AND value <> 0 THEN
      newstate.precision_lost := true;
      newstate.value := state.value;
  ELSE
      newstate.precision_lost := false;
      newstate.value := state.value + value;
  END IF;
The inverse transition function checks precision_lost and, if it's true, returns NULL. The core code is now implemented (thanks to Florian) to re-aggregate when NULL is returned from the inverse transition function.

I've attached an implementation of this with the transition functions written in plpgsql. 
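
Roughly, the attached does the following (a sketch only - the state type and function names here are made up and may differ from the actual attachment):

  create type float_sum_state as (value float8, precision_lost bool);

  create function mysum_sfunc(state float_sum_state, value float8)
  returns float_sum_state as $$
  begin
    if state.value + value = state.value and value <> 0 then
      state.precision_lost := true;   -- the addend vanished entirely
    else
      state.precision_lost := false;
      state.value := state.value + value;
    end if;
    return state;
  end;
  $$ language plpgsql strict immutable;

  create function mysum_inv(state float_sum_state, value float8)
  returns float_sum_state as $$
  begin
    if state.precision_lost then
      return null;                    -- force the core code to rescan
    end if;
    state.value := state.value - value;
    return state;
  end;
  $$ language plpgsql strict immutable;

  create function mysum_final(state float_sum_state)
  returns float8 as $$ select (state).value $$ language sql strict immutable;

  create aggregate mysum(float8) (
    stype = float_sum_state,
    sfunc = mysum_sfunc,
    invfunc = mysum_inv,
    finalfunc = mysum_final,
    initcond = '(0,f)'
  );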
I don't really know for sure yet if it can handle all cases and give the exact same results as it would without inverse transitions, but it certainly fixes the error case which was presented:


explain (analyze, verbose)
select mysum(v) over (order by i rows between current row and unbounded following) from (values(1,1e20),(2,1)) b(i,v);

This gives me the expected results of 1e20 and 1, whereas my original attempt gave 1e20 and 0.

I guess the extra tracking on forward transition might mean this would not be practical to implement in C for sum(float), but I just wanted to run the idea through a few heads to see if anyone can present a case where it can still produce wrong results.

If it seems sound enough, then I may implement it in C to see how much overhead it adds to forward aggregation for floating point types, but even if it did add too much overhead to forward aggregation it might be worth allowing aggregates to have 2 forward transition functions and if the 2nd one exists then it could be used in windowing functions where the frame does not have "unbounded following".

Any thoughts?

Regards

David Rowley

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan21, 2014, at 10:53 , David Rowley <dgrowleyml@gmail.com> wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions.
>
> I've written the test to do this as:
>
> IF state.value + value = state.value AND value <> 0 THEN
>         newstate.precision_lost := true;
>         newstate.value := state.value;
>     ELSE
>         newstate.precision_lost := false;
>         newstate.value := state.value + value;
>     END IF;
>
>
> The inverse transition function checks the precision_lost and if it's true it
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value
and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.
"SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet
"SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where "1.01"
would have been correct. That's still too much precision loss.

I'm quite certain that the general idea has merit, but the actual
invertibility condition is going to be more involved. If you want to play with
this, I think the first step has to be to find a set of guarantees that
SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
summands all have the same sign, the error is bound by C * N, where C is some
(machine-specific?) constant (about 1e-15 or so), and N is the number of input
rows. Or at least so I think from looking at SUMs over floats in descending
order, which I guess is the worst case. You could, for example, try to see if
you can find an invertibility condition which guarantees the same, but allows
C to be larger. That would put a bound on the number of digits lost by the new
SUM(float) compared to the old one.
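
(For reference - this is from memory, so take it with a grain of salt - the usual statement of that bound for recursive summation of x_1 .. x_N is |S_computed - S_exact| <= (N-1) * u * SUM |x_i|, to first order, where u is the unit roundoff, about 1.1e-16 for float8. With all summands of the same sign, SUM |x_i| = |S_exact|, which makes that exactly a relative error bound of the form C * N.)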

I don't have high hopes for this getting into 9.4, though.

> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types, but even
> if it did add too much overhead to forward aggregation it might be worth
> allowing aggregates to have 2 forward transition functions and if the 2nd
> one exists then it could be used in windowing functions where the frame
> does not have "unbounded following".

I don't think adding yet another type of aggregation function is the
solution here.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan20, 2014, at 15:20 , Florian Pflug <fgp@phlo.org> wrote:
> * In CREATE AGGREGATE, we should state the precise axioms we assume about forward
> and inverse transition functions. The last time I read the text there, it was
> a bit ambiguous about whether inverse transition functions assume commutativity,
> i.e. whether we assume that we can remove inputs other than the first one from
> the aggregation. Currently, all the inverse transition functions we have are,
> in fact, commutative, because all the forward transition function are also. But
> we could e.g. add an inverse to array_agg and string_agg, and those would
> obviously need this ordering guarantee. I'd also like us to explain that the
> "inverse" in "inverse transition function" shouldn't be taken too literally. For
> forward transition function F, inverse transition function G, and inputs a,b,...
> we *don't* require that G(F(s,a),a) == s. What we do require is that if I is the
> initial state, then
>
>  G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).
>
> (Well, actually we don't strictly require even that, the two states don't
> need to be identical, they just need to produce identical outputs when passed
> to the final function)
>
> * In CREATE AGGREGATE, we should also explain the NULL semantics you get
> with various combinations of strict and non-strict forward and inverse
> transition functions. I think a table listing all the combinations is in order
> there, with a column explaining the semantics you get. We should also clearly
> state that once you provide an inverse transition function, NULL isn't a valid
> state value anymore, except as an initial state, i.e. that the forward transition
> function must never return NULL in this case.

I gave that a shot; the results are at https://github.com/fgp/postgres/tree/invtrans

> * The window function page should explain the performance hazards when
> a moving frame head is involved. Ideally, we'd list which aggregates never
> have to restart, and which sometimes might, and what you can do to avoid that.
> We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
> restarts are occurring. This is where we'd tell people to cast their numeric
> operands to a type with defined scale to avoid restarts, and where we'd state
> that SUM(float) *does* always restart due to precision loss issues.


> BTW, something that we haven't addressed at all is how inverse transition
> functions interact with what is called "ordered-set aggregates". I haven't
> wrapped my head around these fully, I think, so I'm still not sure if there's
> anything to do there or not. Can those even be used as window functions?
> Should we forbid ordered-set aggregates from specifying an inverse transition
> function?

It seems that these aren't valid window functions anyway, so there's nothing
to do for now, I think.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sun, Jan 19, 2014 at 3:22 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan18, 2014, at 06:15 , David Rowley <dgrowleyml@gmail.com> wrote:
> Note that after this fix the results for my quick benchmark now look like:
>
> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,20000);
> select sum(num) over(order by num rows between current row and unbounded following) from num; -- 113 ms
> select sum(num / 10) over(order by num rows between current row and unbounded following) from num; -- 156ms
> select sum(num / 1) over(order by num rows between current row and unbounded following) from num; -- 144 ms
>
> So it seems to be much less prone to falling back to brute force forward
> transitions.
> It also seems the / 10 version must have had to previously do 1 brute
> force rescan but now it looks like it can do it in 1 scan.
>

I've performed some more benchmarks on this patch tonight. The results and full recreation scripts are attached along with the patch it was tested against.

The benchmarks contain a bit of a mix of different possible workloads; I've tried to show the worst and best case for everything I can think of that has a best and worst case. The best case with the patched version is naturally very good when the frame is unbounded following and the frame head moves down on each row, provided the inverse transition can actually take place.

I've noticed a slight performance regression in the worst case for max() and min() aggregates. Note that these inverse transition functions are implemented to return failure when the value being removed is equal to the current state. E.g., if max() is 10 and we remove 9, then we can report success, but if max is 10 and we remove 10, we report failure. The failed attempt to remove in this case will be costing us a little bit more than it was previously.
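
The logic behind these is essentially the following - an SQL sketch of the idea only, with a made-up name; the real functions in the patch are C:

  create or replace function mymax_inv(state int, value int)
  returns int as $$
    -- the departing value was below the current maximum, so the maximum
    -- is unchanged; otherwise the new maximum is unknown and returning
    -- NULL makes the core code rescan the frame
    select case when value < state then state else null end
  $$ language sql strict immutable;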

I'm not quite sure how likely it is to be a realistic workload, but a query such as:

select max(id) over (order by id DESC rows between current row and unbounded following) from test_numeric;

will fail to perform inverse transitions on *EVERY* row. 
For int types this regression is only about 2-3%, but it will likely increase as the comparison function gets more expensive. I wondered if it would be worth attempting to check the window's order by and compare that to the aggregate's sort operator and not perform inverse transitions in this case. Although a query such as this:

select first_value(id) over (order by id DESC rows between current row and unbounded following) from test_numeric;

would give the same results to the user and will likely be much much faster anyway, so perhaps detecting that is not worth it.

The performance regression seems a bit unavoidable in the case where the order by col is not the same as the max's column, but the 2 columns happen to sort the results in the same way. I can't really think of a way to detect this. But then I think you'll get back what you lost if you can perform even just 1 inverse transition in the entire processing of the window, so that case is probably quite unlikely to hurt much when compared to the cases that it would improve.

I focused quite a bit on testing the performance of SUM(numeric) due to a few extra lines of code that I had to add into do_numeric_accum() to allow tracking of the maximum dscale'd numeric. My plain aggregate test on this showed no measurable regression, though there will be a couple of extra CPU cycles in there on each call to the function.

I didn't really bother doing much performance testing on aggregates like count(*) and sum(int) as these can always perform inverse transitions; the best cases for each of the other queries should be used as a guide for how much performance will increase.

I'm pretty happy to see that for processing as little as 20 thousand rows, the patch can increase performance by up to 1000 times! And this increase will just get bigger with more rows.

Here's some sample results from the attached txt file:

select sum(num_fixed_scale) over(order by id rows between current row and unbounded following) from test_numeric;
Results in milliseconds

Patched: 
50.958
63.820
64.719
52.860
64.921

Unpatched:
61358.856
61180.370
62642.495
61416.385
62383.012

 Comp.     Patched    Unpatched
 average   59.4556    61796.2236    Avg Increase (times):    1039.367589  (103936.76%)
 median    63.82      61416.385     Median Increase (times):  962.3375901  (96233.76%)

If anyone can think of any other workloads that they would like tested, please let me know.

Regards

David Rowley.

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Tue, Jan 21, 2014 at 3:20 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan20, 2014, at 08:42 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
>> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
>>   it's the last commit, so if you object to that, then you can merge up to
>>   eafa72330f23f7c970019156fcc26b18dd55be27 instead of
>>   de3d9148be9732c4870b76af96c309eaf1d613d7.
>
>
> Seems like sfunc really should be tfunc; then we could have invtfunc. I'd probably
> understand this better if I knew what the 's' was for in sfunc. I've not applied
> this just yet. Do you have a reason why you think it's better?

My issue with just "invfunc" is mainly that it's too generic - it doesn't tell
you what it's supposed to be the inverse of.

I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
the naming is inspired by control theory, where the function which acts on the
state space is often called S.

 
Ok, that makes more sense now and it seems like a reasonable idea. I'm just not quite sure yet, as when someone said upthread that these "negative transition functions", as I was calling them at the time, should really be called "inverse transition functions", I then posted that I was going to call the create aggregate option "invfunc", which nobody seemed to object to. I just don't want to go and change that now. It is very possible this will come up again when the committer is looking at the patch. It would be a waste if it ended up back at invfunc after we changed it to invsfunc.

Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jan21, 2014, at 10:53 , David Rowley <dgrowleyml@gmail.com> wrote:
> It came to me that it might be possible to implement inverse transitions
> for floating point aggregates by just detecting if precision has been
> lost during forward transitions.
>
> I've written the test to do this as:
>
> IF state.value + value = state.value AND value <> 0 THEN
>               newstate.precision_lost := true;
>               newstate.value := state.value;
>       ELSE
>               newstate.precision_lost := false;
>               newstate.value := state.value + value;
>       END IF;
>
>
> The inverse transition function checks the precision_lost and if it's true it
> returns NULL. The core code is now implemented (thanks to Florian) to
> re-aggregate when NULL is returned from the inverse transition function.

That's not sufficient, I fear. You can lose all significant digits of the value
and still have precision_lost = false afterwards. Try summing over 1e16, 1.01.
"SELECT 1e16::float8 + 1.01::float8 = 1e16::float8" returns FALSE, yet
"SELECT 1e16::float8 + 1.01::float8 - 1e16::float8" returns "2" where "1.01"
would have been correct. That's still too much precision loss.

I'm quite certain that the general idea has merit, but the actual
invertibility condition is going to be more involved. If you want to play with
this, I think the first step has to be to find a set of guarantees that
SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
summands all have the same sign, the error is bound by C * N, where C is some
(machine-specific?) constant (about 1e-15 or so), and N is the number of input
rows. Or at least so I think from looking at SUMs over floats in descending
order, which I guess is the worst case. You could, for example, try to see if
>> you can find an invertibility condition which guarantees the same, but allows
C to be larger. That would put a bound on the number of digits lost by the new
SUM(float) compared to the old one.


I guess if normal (non-window) sum(float) results are undefined unless you add an order by to the aggregate, then there is no possible logic to put in for inverse transitions that will make them behave the same as an undefined behaviour.
 
I don't have high hopes for this getting into 9.4, though.


Yeah I'm going to drop it. I need to focus on documents and performance testing.
 
> If it seems sound enough, then I may implement it in C to see how much
> overhead it adds to forward aggregation for floating point types, but even
> if it did add too much overhead to forward aggregation it might be worth
> allowing aggregates to have 2 forward transition functions and if the 2nd
> one exists then it could be used in windowing functions where the frame
> does not have "unbounded following".

I don't think adding yet another type of aggregation function is the
solution here.


Why not?
 
I thought that in the cases where we need to burden the forward transition functions with extra code to make inverse transitions possible, why not have an extra transition function so that it does not slow down normal aggregation?

My testing of sum(numeric) last night does not show any slowdown from that extra dscale tracking code that was added, but if it did then I'd be thinking seriously about 2 forward transition functions to get around the problem. I don't understand what would be wrong with that. The core code could just make use of the 2nd function if it existed and inverse transitions were thought to be required, i.e. in a window context where the frame does not have UNBOUNDED PRECEDING.
 
Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> If you want to play with
>> this, I think the first step has to be to find a set of guarantees that
>> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
>> summands all have the same sign, the error is bound by C * N, where C is some
>> (machine-specific?) constant (about 1e-15 or so), and N is the number of input
>> rows. Or at least so I think from looking at SUMs over floats in descending
>> order, which I guess is the worst case. You could, for example, try to see if
>> you can find an invertibility condition which guarantees the same, but allows
>> C to be larger. That would put a bound on the number of digits lost by the new
>> SUM(float) compared to the old one.
>
> I guess if normal (non-window) sum(float) results are undefined unless you
> add an order by to the aggregate, then there is no possible logic to put in
> for inverse transitions that will make them behave the same as an undefined
> behaviour.

Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
inverse transition function - just make it a no-op. Heck, you could then even
make the forward transition function a no-op, since the very definition of
"undefined behaviour" is "result can be anything, including setting your house
on fire". The point is, it's *not* undefined - it's just imprecise. And the
imprecision can even be quantified, it just depends on the number of
input rows (the equal-sign requirement is mostly there to make "imprecision"
equivalent to "relative error").

>> > If it seems sound enough, then I may implement it in C to see how much
>> > overhead it adds to forward aggregation for floating point types, but even
>> > if it did add too much overhead to forward aggregation it might be worth
>> > allowing aggregates to have 2 forward transition functions and if the 2nd
>> > one exists then it could be used in windowing functions where the frame
>> > does not have "unbounded following".
>>
>> I don't think adding yet another type of aggregation function is the
>> solution here.
>
> Why not?
>
> I thought that in the cases where we need to burden the forward transition
> functions with extra code to make inverse transitions possible, why not
> have an extra transition function so that it does not slow down normal
> aggregation?
>
> My testing of sum(numeric) last night does not show any slowdown from that
> extra dscale tracking code that was added, but if it did then I'd be thinking
> seriously about 2 forward transition functions to get around the problem.
> I don't understand what would be wrong with that. The core code could just
> make use of the 2nd function if it existed and inverse transitions were
> thought to be required, i.e. in a window context where the frame does not
> have UNBOUNDED PRECEDING.

First, every additional function increases the maintenance burden, and at
some point the expected benefit simply isn't worth it. IMHO at least.

Secondly, you'd really need a second aggregate definition - usually, the
non-invertible aggregate will get away with a much smaller state representation.
Aggregates with state type "internal" could hack their way around that by
simply not using the additional parts of their state structure, but e.g.
plpgsql aggregates would still incur overhead due to repeated copying of
a unnecessary large state value. If it's possible to represent the
forward-only but not the invertible state state with a copy-by-value type,
that overhead could easily be a large part of the total overhead you paid
for having an inverse transition function.

And lastly, we could achieve much of the benefit of a second transition
function by simply telling the forward transition function whether to
expect inverse transition function calls. We already expose things like
the aggregation memory context to C-language transition functions, so the
basic mechanism is already there. In fact, I pondered whether to do this -
but then figured I'd leave it to whoever needs that facility first. Though
it wouldn't be much code - basically, setting a flag in the WindowAggState,
and exporting a function to be used by aggregates to read it, similar
to what AggCheckCallContext does.

best regards,
Florian Pflug







Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan23, 2014, at 01:07 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, Jan 21, 2014 at 3:20 AM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan20, 2014, at 08:42 , David Rowley <dgrowleyml@gmail.com> wrote:
>> >> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp@phlo.org> wrote:
>> >> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
>> >>   it's the last commit, so if you object to that, then you can merge up to
>> >>   eafa72330f23f7c970019156fcc26b18dd55be27 instead of
>> >>   de3d9148be9732c4870b76af96c309eaf1d613d7.
>> >
>> >
>> > Seems like sfunc really should be tfunc then we could have invtfunc. I'd probably
>> > understand this better if I knew what the 's' was for in sfunc. I've not applied
>> > this just yet. Do you have a reason why you think it's better?
>>
>> My issue with just "invfunc" is mainly that it's too generic - it doesn't tell
>> you what it's supposed to be the inverse of.
>>
>> I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
>> the naming is inspired by control theory, where the function which acts on the
>> state space is often called S.
>
> Ok, that makes more sense now and it seems like a reasonable idea. I'm not quite
> sure yet as when someone said upthread that these "negative transition functions" as
> I was calling them at the time should really be called "inverse transition functions",
> I then posted that I was going to call the create aggregate option "invfunc" which
> nobody seemed to object to. I just don't want to go and change that now. It is very
> possible this will come up again when the committer is looking at the patch. It would
> be a waste if it ended up back at invfunc after we changed it to invsfunc.

Since we already settled on "inverse transition function", I kinda doubt that
calling the parameter invsfunc is going to meet a lot of resistance. But we can put
that off a little longer still...

I've pushed a few additional things to https://github.com/fgp/postgres/tree/invtrans.

* I updated the CREATE AGGREGATE documentation, trying to include the
description of the various modes of inverse transition functions into the
paragraphs which already explained about STRICT for transition functions
and such.

* I've also updated the list of window functions to include a list of those
aggregates which potentially need to restart the computation, i.e. MIN/MAX
and the like.

* I've changed nodeWindowAgg.c to use per-aggregate aggregation contexts for
the invertible aggregates. Without that, the aggregate context is potentially
never reset, because that previously required *all* the aggregates to restart
at the same time. That would be OK if we were sure not to leak unbounded
amounts of stuff stored in that context, but unfortunately we sometimes do.
For example, whenever a strict, invertible aggregate ends up with only NULL
inputs, we re-initialize the aggregation, which leaks the old state value.
We could pfree() that of course, but that state value might reference other
stuff that we don't know about and thus cannot free. Separating the
aggregation contexts is the only solution I came up with, so I did that.

* I've also tweaked an if to flag aggregates as invertible only if the frame
head can actually move, i.e. if the frame start clause is something other
than UNBOUNDED PRECEDING. Since the choice of whether to use a private
aggregation context is driven by that flag, that also makes the above apply
only to aggregates where the inverse transition function is actually used.

I hope to find some time tomorrow or so to complete my pass through the documentation -
what's still missing as an explanation of the EXPLAIN VERBOSE ANALYZE field and maybe
some cleanup of xaggr.sgml.

Do you have any additional things pending?

best regards,
Florian Pflug








Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 22 January 2014 09:54, David Rowley <dgrowleyml@gmail.com> wrote:
> I've performed some more benchmarks on this patch tonight. The results and
> full recreation scripts are attached along with the patch it was tested
> against.
>

I noticed that the rate of changes to this patch has dropped off,
which I took as sign that it is ready for review, so I started doing
that.

My first thought was wow, this is a big patch!

I think it should probably be broken up. It might be overly ambitious
to try to get all of this committed during this commitfest, and in any
case, I suspect that the committer would probably choose to commit it
in stages. Perhaps something like:

Patch 1
- Basic support for inverse transition functions, CREATE AGGREGATE
support and doc updates. This should include test cases to validate
that the underlying executor changes are correct, by defining custom
aggregates such as sum(int) and array_agg() using inverse transition
functions.

Patch 2
- Add built-in inverse transition functions for count, sum(int), and friends.

Patch 3, 4...
- Other related groups of built-in aggregates. By this point, it
should be a fairly mechanical process.

Splitting it up this way now should help to focus on getting patch 1
correct, without being distracted by all the other aggregates that may
or may not usefully be made to have inverse transition functions. I
think the value of the feature has been proved, and it is good to see
that it can be applied to so many aggregates, but let's not try to do
it all at once.
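
As a concrete example of the kind of custom aggregate patch 1's tests could
define (a sketch only - I'm reusing the built-in int4pl/int4mi functions and
the invfunc option name from the current patch):

CREATE AGGREGATE my_sum (int4) (
    sfunc    = int4pl,
    stype    = int4,
    invfunc  = int4mi,
    initcond = '0'
);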


Regarding what would be in patch 1, I've only given it a cursory look,
but I did immediately notice a problem with the nodeWindowAgg.c
changes --- for aggregates with non-strict transition functions,
something equivalent to the not null counting code is needed to make
it work correctly with FILTER. For example, the following query gives
the wrong results with this patch:

SELECT array_agg(i) FILTER (WHERE i%3=0)
       OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM generate_series(1,10) g(i);

Perhaps it should just always count the non-skipped values added after
removal of filtered and null values. Then you might be able to
simplify things by getting rid of ignore_nulls from
WindowStatePerAggData and replacing notnullcount with a more general
valueCount to track the number of values actually added to transValue.

I haven't read through nodeWindowAgg.c in any detail yet (I think it
will take me a while to fully get my head round that code) but I think
it needs some more test cases to verify that the various corner cases
are covered.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I think it should probably be broken up. It might be overly ambitious
> to try to get all of this committed during this commitfest, and in any
> case, I suspect that the committer would probably choose to commit it
> in stages. Perhaps something like:
>
> Patch 1
> - Basic support for inverse transition functions, CREATE AGGREGATE
> support and doc updates. This should include test cases to validate
> that the underlying executor changes are correct, by defining custom
> aggregates such as sum(int) and array_agg() using inverse transition
> functions.
>
> Patch 2
> - Add built-in inverse transition functions for count, sum(int), and friends.
>
> Patch 3, 4...
> - Other related groups of built-in aggregates. By this point, it
> should be a fairly mechanical process.
>
> Splitting it up this way now should help to focus on getting patch 1
> correct, without being distracted by all the other aggregates that may
> or may not usefully be made to have inverse transition functions. I
> think the value of the feature has been proved, and it is good to see
> that it can be applied to so many aggregates, but let's not try to do
> it all at once.

Working on that now, will post individual patches later today.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> My first thought was wow, this is a big patch!
>
> I think it should probably be broken up. It might be overly ambitious
> to try to get all of this committed during this commitfest, and in any
> case, I suspect that the committer would probably choose to commit it
> in stages. Perhaps something like:
>
> <snipped>
>
> Splitting it up this way now should help to focus on getting patch 1
> correct, without being distracted by all the other aggregates that may
> or may not usefully be made to have inverse transition functions. I
> think the value of the feature has been proved, and it is good to see
> that it can be applied to so many aggregates, but let's not try to do
> it all at once.

I've now split this up into

invtrans_base: Basic machinery plus tests with SQL-language aggregates
invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like
invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR()
invtrans_collecting: ARRAY_AGG(), STRING_AGG()
invtrans_docs: Documentation

Each of these corresponds to one of the invtrans_* branches in my
github repo at https://github.com/fgp/postgres

Attached are the latest versions of these patches.

> Regarding what would be in patch 1, I've only given it a cursory look,
> but I did immediately notice a problem with the nodeWindowAgg.c
> changes --- for aggregates with non-strict transition functions,
> something equivalent to the not null counting code is needed to make
> it work correctly with FILTER. For example, the following query gives
> the wrong results with this patch:
>
> SELECT array_agg(i) FILTER (WHERE i%3=0)
>       OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
>  FROM generate_series(1,10) g(i);
>
> Perhaps it should just always count the non-skipped values added after
> removal of filtered and null values. Then you might be able to
> simplify things by getting rid of ignore_nulls from
> WindowStatePerAggData and replacing notnullcount with a more general
> valueCount to track the number of values actually added to transValue.

Ugh, very true. Though I don't see how to get rid of ignore_nulls, unless
we rewrite the filter clause to

  (filter_clause) AND arg1 IS NOT NULL AND ... argN IS NOT NULL

for strict aggregates, which would probably be much slower and wouldn't
be right if any of the args call volatile functions.

But I've generalized notnullcount to transValueCount as you suggested,
and got rid of noTransValue, since that's now equivalent to
transValueCount == 0 && transValueIsNull.

string_agg() was also severely broken - it needs to track the number of
non-NULL strings within the state, since it skips rows with a NULL
string, but not with a NULL delimiter, which means we can't mark it
strict to make nodeWindowAgg.c take care of that.
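
A quick illustration of the behaviour in question, using stock string_agg
(just a sanity-check query, nothing patch-specific):

SELECT string_agg(s, ',')
FROM (VALUES ('a'), (NULL), ('b')) v(s);
-- returns 'a,b': the NULL string is skipped, but only because the
-- transition function checks for it. It cannot be declared strict,
-- since a NULL delimiter alone must not cause the row to be skipped.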

> I haven't read through nodeWindowAgg.c in any detail yet (I think it
> will take me a while to fully get my head round that code) but I think
> it needs some more test cases to verify that the various corner cases
> are covered.

Yeah, I hadn't really gotten around to doing that yet :-(

The base patch now contains tests of the various strict and non-strict
cases, both in the filtered and non-filtered cases.

The invtrans_arith patch contains David's original tests, minus the one
with the logging inverse transition function, since that's tested by the
base patch already. It doesn't test sum(interval) though, I think (though
maybe I missed it, it's quite late already...). I think it maybe should -
not sure if there are any gotchas with SUM(interval) and inverse
transition functions.

The invtrans_collecting patch contains tests of the NULL behaviour of
string_agg() and array_agg() - I hope I've covered all the cases.

The invtrans_minmax patch doesn't contain any tests yet - David, could
you provide some for these functions, and also for bool_and and bool_or?
We don't need to test every datatype, but a few would be nice.

I've also run the regression tests of invtrans_arith and
invtrans_collecting against invtrans_base (i.e, without the actual
inverse transition functions) and they still succeed, meaning the
inverse transition functions don't change the results.

Doing the same against HEAD instead of invtrans_base yields a few
diffs, but all of them look OK - the stuff that explicitly tests
inverse transition functions via those "logging" aggregations that
simply log all calls to both the forward and inverse transition
function in a long string obviously yields different results if
inverse transition functions aren't used.

best regards,
Florian Pflug




Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug <fgp@phlo.org> wrote:
On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> If you want to play with
>> this, I think the first step has to be to find a set of guarantees that
>> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
>> summands all have the same sign, the error is bound by C * N, where C is some
>> (machine-specific?) constant (about 1e-15 or so), and N is the number of input
>> rows. Or at least so I think from looking at SUMs over floats in descending
>> order, which I guess is the worst case. You could, for example, try to see if
>> you can find an invertibility condition which guarantees the same, but allows
>> C to be larger. That would put a bound on the number of digits lost by the new
>> SUM(float) compared to the old one.
>
> I guess if the case is that normal (non-window) sum(float) results are undefined
> unless you add an order by to the aggregate then I guess there is no possible
> logic to put in for inverse transitions that will make them behave the same as
> an undefined behaviour.

Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
inverse transition function - just make it a no-op. Heck, you could then even
make the forward transition function a no-op, since the very definition of
"undefined behaviour" is "result can be anything, including setting your house
on fire". The point is, it's *not* undefined - it's just imprecise. And the
imprecision can even be quantified, it just depends on the number of
input rows (the equal-sign requirement is mostly there to make "imprecision"
equivalent to "relative error").


My apologies, I meant to use the term nondeterministic rather than undefined. There's really no need that I can see to turn things silly here.

My point was more that since sum(float) can give different results if it used an index scan rather than a seq scan, trying to get the inverse transition to match something that gives varying results sounds like a tricky task to take on.
 
>> > If it seems sound enough, then I may implement it in C to see how much
>> > overhead it adds to forward aggregation for floating point types, but even
>> > if it did add too much overhead to forward aggregation it might be worth
>> > allowing aggregates to have 2 forward transition functions and if the 2nd
>> > one exists then it could be used in windowing functions where the frame
>> > does not have "unbounded following".
>>
>> I don't think adding yet another type of aggregation function is the
>> solution here.
>
> Why not?
>
> I thought, if in the cases where we need to burden the forward transition
> functions with extra code to make inverse transitions possible, then why
> not have an extra transition function so that it does not slow down normal
> aggregation?
>
> My testing of sum(numeric) last night does not show any slow down by that
> extra dscale tracking code that was added, but if it did then I'd be thinking
> seriously about 2 forward transition functions to get around the problem.
> I don't understand what would be wrong with that. The core code could just
> make use of the 2nd function if it existed and inverse transitions were
> thought to be required. i.e in a window context and does not
> have UNBOUNDED PRECEDING.

First, every additional function increases the maintenance burden, and at
some point the expected benefit simply isn't worth it. IMHO at least.


There's little point in arguing for this, as we've managed to narrow any forward transition overhead down to background noise so far. My point was more that, if there were no other way to maintain performance while having an inverse transition function, this might be a solution to that problem.
 
Secondly, you'd really need a second aggregate definition - usually, the
non-invertible aggregate will get away with a much smaller state representation.
Aggregates with state type "internal" could hack their way around that by
simply not using the additional parts of their state structure, but e.g.
plpgsql aggregates would still incur overhead due to repeated copying of
an unnecessarily large state value. If it's possible to represent the
forward-only but not the invertible state with a copy-by-value type,
that overhead could easily be a large part of the total overhead you paid
for having an inverse transition function.


I had imagined they would keep the same state type and simply not use any extra variables that are defined for the forward transition function that is invertible.
 
And lastly, we could achieve much of the benefit of a second transition
function by simply telling the forward transition function whether to
expect inverse transition function calls. We already expose things like
the aggregation memory context to C-language transition functions, so the
basic mechanism is already there. In fact, I pondered whether to do this -
but then figured I'd leave it to whoever needs that facility first. Though
it wouldn't be much code - basically, setting a flag in the WindowAggState,
and exporting a function to be used by aggregates to read it, similar
to what AggCheckCallContext does.


Sounds like a good idea, but what would the solution be for aggregate transition functions that are not written in C?
 
best regards,
Florian Pflug





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> My point was more that since sum(float) can give different results if it
> used an index scan rather than a seq scan, trying to get the inverse
> transition to match something that gives varying results sounds like a
> tricky task to take on.

This is just a variant of the same excuse we heard before.  The question
is not whether sum(float8) can give bad results; the question is whether
we are going to break applications that are using it carefully (ie with
an appropriate ORDER BY) and expecting to get good results.

>> Secondly, you'd really need a second aggregate definition - usually, the
>> non-invertible aggregate will get away with a much smaller state
>> representation.

Yeah.  I think the idea of multiple transition functions in a single
aggregate definition is pretty broken to start with, but the likelihood
that they couldn't share aggregate state types puts it completely beyond
sanity.  We're not going to start inventing "stype2" etc.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan25, 2014, at 09:50 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, Jan 23, 2014 at 1:57 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote:
>> > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> >> If you want to play with
>> >> this, I think the first step has to be to find a set of guarantees that
>> >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the
>> >> summands all have the same sign, the error is bound by C * N, where C is some
>> >> (machine-specific?) constant (about 1e-15 or so), and N is the number of input
>> >> rows. Or at least so I think from looking at SUMs over floats in descending
>> >> order, which I guess is the worst case. You could, for example, try to see if
>> >> you can find an invertibility condition which guarantees the same, but allows
>> >> C to be larger. That would put a bound on the number of digits lost by the new
>> >> SUM(float) compared to the old one.
>> >
>> > I guess if the case is that normal (non-window) sum(float) results are undefined
>> > unless you add an order by to the aggregate then I guess there is no possible
>> > logic to put in for inverse transitions that will make them behave the same as
>> > an undefined behaviour.
>>
>> Actually, if sum(float) was really undefined, it'd be *very* easy to provide an
>> inverse transition function - just make it a no-op. Heck, you could then even
>> make the forward transition function a no-op, since the very definition of
>> "undefined behaviour" is "result can be anything, including setting your house
>> on fire". The point is, it's *not* undefined - it's just imprecise. And the
>> imprecision can even be quantified, it just depends on the number of
>> input rows (the equal-sign requirement is mostly there to make "imprecision"
>> equivalent to "relative error").
>
> My apologies, I meant to use the term nondeterministic rather than undefined.
> There's really not any need that I can see to turn things silly here.

I wasn't trying to be absurd, I was trying to get a point across.

> My point was more that since sum(float) can give different results if it used
> an index scan rather than a seq scan, trying to get the inverse transition to
> match something that gives varying results sounds like a tricky task to take on.

You don't have to match it digit-by-digit! In that I fully agree with Kevin -
floats are *always* an approximation, and thus so is SUM(float). Summation
order is BTW not the only source of non-determinism for SUM(float) - the exact
result can vary between architectures, and for some architectures between
compilers. (Intel is one of these, due to their 80-bit extended precision format
that gets truncated to 64-bit when stored to main memory).

But in a large number of cases, they won't vary by *much*, which is *the* reason
why SUM(float) is *not* totally useless. And reasoning about SUM(float) which
ignores that by simply calling it "non-deterministic", "undefined" or whatever,
without any quantification of the possible error, has thus zero chance of
leading to interesting conclusions.


>> Secondly, you'd really need a second aggregate definition - usually, the
>> non-invertible aggregate will get away with a much smaller state representation.
>> Aggregates with state type "internal" could hack their way around that by
>> simply not using the additional parts of their state structure, but e.g.
>> plpgsql aggregates would still incur overhead due to repeated copying of
>> an unnecessarily large state value. If it's possible to represent the
>> forward-only but not the invertible state with a copy-by-value type,
>> that overhead could easily be a large part of the total overhead you paid
>> for having an inverse transition function.
>
> I had imagined they keep the same state type and don't use any extra variables
> that are defined for the forward transition function that is invertible.

Yeah, and the above explains that at least for non-C-language aggregates,
passing around that unnecessarily large state may very well prove to be the
source of a large part, if not almost all, of the overhead. So having just
a separate forward transition function will buy you almost nothing in some
cases.

I just tried this. I defined two aggregates mymax(int4) and myfastmax(int4),
both with just a forward transition function, both SQL-language functions.
mymax uses a composite type for the state containing an int4 field holding
the current maximum, and a dummy int8 field. myfastmax uses a plain int4
for the state, holding the current maximum. Both forward transition
functions essentially do
 case when current_max > v then current_max else v end

On my laptop, computing the maximum of 1e6 rows takes about 4.5 seconds with
myfastmax and 7.8 seconds with mymax. If I make mymax's transition function
increment the dummy field on every transition, the time increases from 7.8
to 8.2 seconds. So here, using a composite type for the state accounts for
about 3.3 seconds, or 40%, of the total runtime of 8.2 seconds, whereas the
increment costs about 0.4 seconds or 5% of the total runtime.
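
For reference, the definitions were roughly as follows (a sketch from
memory - names and exact bodies may differ from what I actually ran):

CREATE TYPE mymax_state AS (current_max int4, dummy int8);

CREATE FUNCTION mymax_sfunc(s mymax_state, v int4) RETURNS mymax_state
LANGUAGE sql STRICT AS $$
    SELECT ROW(CASE WHEN s.current_max > v THEN s.current_max ELSE v END,
               s.dummy)::mymax_state
$$;

CREATE FUNCTION mymax_final(s mymax_state) RETURNS int4
LANGUAGE sql AS $$ SELECT s.current_max $$;

CREATE AGGREGATE mymax(int4) (
    sfunc     = mymax_sfunc,
    stype     = mymax_state,
    finalfunc = mymax_final,
    initcond  = '(,0)'    -- current_max starts out NULL
);

CREATE FUNCTION myfastmax_sfunc(s int4, v int4) RETURNS int4
LANGUAGE sql STRICT AS $$
    SELECT CASE WHEN s > v THEN s ELSE v END
$$;

-- strict sfunc with stype = input type, so the first input seeds the state
CREATE AGGREGATE myfastmax(int4) (
    sfunc = myfastmax_sfunc,
    stype = int4
);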

>> And lastly, we could achieve much of the benefit of a second transition
>> function by simply telling the forward transition function whether to
>> expect inverse transition function calls. We already expose things like
>> the aggregation memory context to C-language transition functions, so the
>> basic mechanism is already there. In fact, I pondered whether to do this -
>> but then figured I'd leave it to whoever needs that facility first. Though
>> it wouldn't be much code - basically, setting a flag in the WindowAggState,
>> and exporting a function to be used by aggregates to read it, similar
>> to what AggCheckCallContext does.
>
> Sounds like a good idea, but what would the solution aggregate transition
> functions that are not written in C?

See above - non-C aggregates are exactly the ones that benefit least from
a specialized non-invertible forward transition function, because the have
to serialize and deserialize the state type on every transition...

best regards,
Florian Pflug










Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Sat, Jan 25, 2014 at 3:21 PM, Florian Pflug <fgp@phlo.org> wrote:
On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
The invtrans_minmax patch doesn't contain any tests yet - David, could
you provide some for these functions, and also for bool_and and bool_or?
We don't need to test every datatype, but a few would be nice.


I've added a few regression tests for min, max, bool_or and bool_and. I've pushed these up to here:

https://github.com/david-rowley/postgres/commits/invtrans_minmax

I did wonder if you'd want to see uses of FILTER in there as I'm thinking that should really be covered by the core patch and the tests here really should just be checking the inverse transition functions for min and max.

As for the data types tested, I ended up just adding tests for int and text for min and max. Let me know if you think that more should be tested.

As for bool_or and bool_and, I didn't think there was much extra that would need testing after I added one test. It's pretty simple code and adding anything extra seems like it would be testing something else.
 
Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan26, 2014, at 00:24 , David Rowley <dgrowleyml@gmail.com> wrote:
>> On Sat, Jan 25, 2014 at 3:21 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan24, 2014, at 08:47 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> The invtrans_minmax patch doesn't contain any tests yet - David, could
>> you provide some for these functions, and also for bool_and and bool_or?
>> We don't need to test every datatype, but a few would be nice.
>
> I've added a few regression tests for min, max, bool_or and bool_and.
> I've pushed these up to here:
>
> https://github.com/david-rowley/postgres/commits/invtrans_minmax

OK, I've pushed this to github. I haven't produced new patches yet - I
think it's probably better to wait for a few things to pile up, to make
this less of a moving target. But ultimately, this is up to Dean - Dean,
I'll do whatever makes your life as a reviewer easier.

> I did wonder if you'd want to see uses of FILTER in there as I'm thinking
> that should really be covered by the core patch and the tests here really
> should just be checking the inverse transition functions for min and max.

I don't mind the FILTER - when this gets committed, the distinction between
what was in which patch goes away anyway. What I did realize when merging
this is that we currently don't test the case of a window which lies
fully after the current row (i.e. BETWEEN n FOLLOWING AND m FOLLOWING). We
do test BETWEEN n PRECEDING AND m PRECEDING, because the array_agg and
string_agg tests do that. So maybe some of the MIN/MAX or arithmetic tests
could exercise that case?
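
Something of this shape would do, I think (just a sketch):

SELECT i, max(i) OVER (ORDER BY i ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)
FROM generate_series(1, 5) g(i);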

> As for the data types tested, I ended just adding tests for int and text
> for min and max. Let me know if you think that more should be tested.

That's sufficient for the regression tests, I think.

> As for bool_or and bool_and. I didn't think there was much extra that would
> need tested after I added 1 test. It's pretty simple code and adding anything
> extra seems like it would be testing something else.

Sounds fine to me.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 25 January 2014 02:21, Florian Pflug <fgp@phlo.org> wrote:
> I've now split this up into
>
> invtrans_base: Basic machinery plus tests with SQL-language aggregates
> invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like
> invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR()
> invtrans_collecting: ARRAY_AGG(), STRING_AGG()
> invtrans_docs: Documentation
>

Thanks. That makes it a bit easier to review, and hopefully easier to commit.

The thing that I'm currently trying to get my head round is the
null/not null, strict/not strict handling in this patch, which I find
a bit odd, particularly when transfn and inv_transfn have different
strictness settings:


strict transfn vs non-strict inv_transfn
----------------------------------------

This case is explicitly forbidden, both in CREATE AGGREGATE and in the
executor. To me, that seems overly restrictive --- if transfn is
strict, then you know for sure that no NULL values are added to the
aggregate state, so surely it doesn't matter whether or not
inv_transfn is strict. It will never be asked to remove NULL values.

I think there are definite use-cases where a user might want to use a
pre-existing function as the inverse transition function, so it seems
harsh to force them to wrap it in a strict function in this case.

AFAICS, the code in advance/retreat_windowaggregate should just work
if those checks are removed.
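
For instance, something like the following looks perfectly safe to me, yet
is currently rejected (a sketch - my_inv is a made-up, deliberately
non-strict function):

CREATE FUNCTION my_inv(int4, int4) RETURNS int4
LANGUAGE sql AS $$ SELECT $1 - $2 $$;  -- deliberately not STRICT

CREATE AGGREGATE my_sum (int4) (
    sfunc   = int4pl,  -- built-in and strict, so NULLs never enter the state
    stype   = int4,
    invfunc = my_inv   -- never asked to remove a NULL, so strictness is moot
);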


non-strict transfn vs strict inv_transfn
----------------------------------------

At first this seems as though it must be an error --- the forwards
transition function allows NULL values into the aggregate state, and
the inverse transition function is strict, so it cannot remove them.

But actually what the patch is doing in this case is treating the
forwards transition function as partially strict --- it won't be
passed NULL values (at least in a window context), but it may be
passed a NULL state in order to build the initial state when the first
non-NULL value arrives.

It looks like this behaviour is intended to support aggregates that
ignore NULL values, but cannot actually be made to have a strict
transition function. I think typically this is because the aggregate's
initial state is NULL and its state type differs from the type of the
values being aggregated, and so can't be automatically created from
the first non-NULL value.

That all seems quite ugly though, because now you have a transition
function that is not strict, which is passed NULL values when used in
a normal aggregate context, and not passed NULL values when used in a
window context (whether sliding or not), except for the NULL state for
the first non-NULL value.

I'm not sure if there is a better way to do it though. If we disallow
this case, these aggregates would have to use non-strict functions for
both the forward and inverse transition functions, which means they
would have to implement their own non-NULL value counting.
Alternatively, allowing strict transition functions for these
aggregates would require that we provide some other way to initialise
the state from the first non-NULL input, such as a new initfn.

Thoughts?

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> strict transfn vs non-strict inv_transfn
> ----------------------------------------
> 
> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
> executor. To me, that seems overly restrictive --- if transfn is
> strict, then you know for sure that no NULL values are added to the
> aggregate state, so surely it doesn't matter whether or not
> inv_transfn is strict. It will never be asked to remove NULL values.
> 
> I think there are definite use-cases where a user might want to use a
> pre-existing function as the inverse transition function, so it seems
> harsh to force them to wrap it in a strict function in this case.
> 
> AFAICS, the code in advance/retreat_windowaggregate should just work
> if those checks are removed.

True. It didn't use to work in earlier versions of the patch because the advance
and retreat functions looked at the strict settings directly, but now that
there's an ignore_nulls flag in the executor node, the only requirement
should be that ignore_nulls is set if either of the transition functions
is strict.

I'm not sure that the likelihood of someone wanting to combine a strict
forward with a non-strict inverse function is very high, but neither

> non-strict transfn vs strict inv_transfn
> ----------------------------------------
> 
> At first this seems as though it must be an error --- the forwards
> transition function allows NULL values into the aggregate state, and
> the inverse transition function is strict, so it cannot remove them.
> 
> But actually what the patch is doing in this case is treating the
> forwards transition function as partially strict --- it won't be
> passed NULL values (at least in a window context), but it may be
> passed a NULL state in order to build the initial state when the first
> non-NULL value arrives.

Exactly.

> It looks like this behaviour is intended to support aggregates that
> ignore NULL values, but cannot actually be made to have a strict
> transition function. I think typically this is because the aggregate's
> initial state is NULL and it's state type differs from the type of the
> values being aggregated, and so can't be automatically created from
> the first non-NULL value.

Yes. I added this because the alternative would have been to count
non-NULL values in most of the existing SUM() aggregates.

> That all seems quite ugly though, because now you have a transition
> function that is not strict, which is passed NULL values when used in
> a normal aggregate context, and not passed NULL values when used in a
> window context (whether sliding or not), except for the NULL state for
> the first non-NULL value.

Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
and skip NULL inputs if the aggregate has a strict inverse transition
function. The fact that we never actually *call* the inverse doesn't
matter. Will fix that once we've decided on the various strictness issues.

> I'm not sure if there is a better way to do it though. If we disallow
> this case, these aggregates would have to use non-strict functions for
> both the forward and inverse transition functions, which means they
> would have to implement their own non-NULL value counting.
> Alternatively, allowing strict transition functions for these
> aggregates would require that we provide some other way to initialise
> the state from the first non-NULL input, such as a new initfn.

I'm not sure an initfn would really help. It would allow us to return
to the initial requirement that the strict settings match - but you
already deem the check that the forward transition function can only
be strict if the inverse is also overly harsh, so would that really be
an improvement? It'd also cost us an additional pg_proc entry per aggregate.

Another idea would be to have an explicit nulls=ignore|process option
for CREATE AGGREGATE. If nulls=process and either of the transition
functions are strict, we could either error out, or simply do what
normal function calls do and pretend they return NULL for NULL inputs.
Not sure how the rule that forward transition functions may not return
NULL if there's an inverse transition function would fit in if we do
the latter, though.

The question is - is it worth the effort to add that flag?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 28 January 2014 20:16, Florian Pflug <fgp@phlo.org> wrote:
> On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> strict transfn vs non-strict inv_transfn
>> ----------------------------------------
>>
>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>> executor. To me, that seems overly restrictive --- if transfn is
>> strict, then you know for sure that no NULL values are added to the
>> aggregate state, so surely it doesn't matter whether or not
>> inv_transfn is strict. It will never be asked to remove NULL values.
>>
>> I think there are definite use-cases where a user might want to use a
>> pre-existing function as the inverse transition function, so it seems
>> harsh to force them to wrap it in a strict function in this case.
>>
>> AFAICS, the code in advance/retreat_windowaggregate should just work
>> if those checks are removed.
>
> True. It didn't use to work in earlier versions of the patch because the advance
> and retreat functions looked at the strict settings directly, but now that
> there's an ignore_nulls flag in the executor node, the only requirement
> should be that ignore_nulls is set if either of the transition functions
> is strict.
>
> I'm not sure that the likelihood of someone wanting to combine a strict
> forward with a non-strict inverse function is very high, but neither
>

Me neither, but the checks to forbid it aren't adding anything, and I
think it's best to make it as flexible as possible.


>> non-strict transfn vs strict inv_transfn
>> ----------------------------------------
>>
>> At first this seems as though it must be an error --- the forwards
>> transition function allows NULL values into the aggregate state, and
>> the inverse transition function is strict, so it cannot remove them.
>>
>> But actually what the patch is doing in this case is treating the
>> forwards transition function as partially strict --- it won't be
>> passed NULL values (at least in a window context), but it may be
>> passed a NULL state in order to build the initial state when the first
>> non-NULL value arrives.
>
> Exactly.
>
>> It looks like this behaviour is intended to support aggregates that
>> ignore NULL values, but cannot actually be made to have a strict
>> transition function. I think typically this is because the aggregate's
>> initial state is NULL and its state type differs from the type of the
>> values being aggregated, and so can't be automatically created from
>> the first non-NULL value.
>
> Yes. I added this because the alternative would have been to count
> non-NULL values in most of the existing SUM() aggregates.
>
>> That all seems quite ugly though, because now you have a transition
>> function that is not strict, which is passed NULL values when used in
>> a normal aggregate context, and not passed NULL values when used in a
>> window context (whether sliding or not), except for the NULL state for
>> the first non-NULL value.
>
> Ugh, true. Clearly, nodeAgg.c needs to follow what nodeWindowAgg.c does
> and skip NULL inputs if the aggregate has a strict inverse transition
> function. The fact that we never actually *call* the inverse doesn't
> matter. Will fix that once we've decided on the various strictness issues.
>

Yuk!


>> I'm not sure if there is a better way to do it though. If we disallow
>> this case, these aggregates would have to use non-strict functions for
>> both the forward and inverse transition functions, which means they
>> would have to implement their own non-NULL value counting.
>> Alternatively, allowing strict transition functions for these
>> aggregates would require that we provide some other way to initialise
>> the state from the first non-NULL input, such as a new initfn.
>
> I'm not sure an initfn would really help. It would allow us to return
> to the initial requirement that the strict settings match - but you
> already deem the check that the forward transition function can only
> be strict if the inverse is also overly harsh, so would that really be
> an improvement? It'd also cost us an additional pg_proc entry per aggregate.
>
> Another idea would be to have an explicit nulls=ignore|process option
> for CREATE AGGREGATE. If nulls=process and either of the transition
> functions are strict, we could either error out, or simply do what
> normal function calls do and pretend they return NULL for NULL inputs.
> Not sure how the rule that forward transition functions may not return
> NULL if there's an inverse transition function would fit in if we do
> the latter, though.
>
> The question is - is it worth the effort to add that flag?
>

Yeah, I thought about a flag too. I think that could work quite nicely.

Basically where I'm coming from is trying to make this as flexible as
possible, without pre-judging the kinds of aggregates that users may
want to add.

One use-case I had in mind upthread was suppose you wanted to write a
custom version of array_agg that only collected non-NULL values. With
such a flag, that would be trivial, but with the current patch you'd
have to (counter-intuitively) wrap the inverse transition function in a
strict function.
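
By "wrap" I mean a shim along these lines, whose only purpose is to satisfy
the strictness check (hypothetical names throughout - assume some non-strict
inverse my_inv(state, val) already exists):

CREATE FUNCTION my_inv_strict(s int4[], v int4) RETURNS int4[]
LANGUAGE sql STRICT AS $$ SELECT my_inv(s, v) $$;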

Another use-case I can imagine is suppose you wanted a custom version
of sum() that returned NULL if any of the input values were NULL. If
you knew that most of your data was non-NULL, you might still wish to
benefit from an inverse transition function. Right now the patch won't
allow that because of the error in advance_windowaggregate(), but
possibly that could be relaxed by forcing a restart in that case. If
I've understood correctly, that should give performance similar to the
current performance if NULLs are present, and enhanced performance as the
window moves over non-NULL data.

In that second case, it would also be nice if you could simply re-use
the existing sum forward and inverse transition functions, with a
different null-handling flag.

Also, in cases where the forwards transition function cannot be made
strict (e.g., its state type is internal) and there is no inverse
transition function, there might be a small performance gain to be had
from not calling the transition function with NULL values, rather than
have it ignore them.

So I think an ignore-nulls flag would have real benefits, as well as
being a cleaner design than relying on a strict inverse transition
function.

What do others think?

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan29, 2014, at 09:59 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 28 January 2014 20:16, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan27, 2014, at 23:28 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>>> executor. To me, that seems overly restrictive --- if transfn is
>>> strict, then you know for sure that no NULL values are added to the
>>> aggregate state, so surely it doesn't matter whether or not
>>> inv_transfn is strict. It will never be asked to remove NULL values.
>>> 
>>> I think there are definite use-cases where a user might want to use a
>>> pre-existing function as the inverse transition function, so it seems
>>> harsh to force them to wrap it in a strict function in this case.
>> 
>> I'm not sure that the likelihood of someone wanting to combine a strict
>> forward with a non-strict inverse function is very high, but neither
>> 
> 
> Me neither, but the checks to forbid it aren't adding anything, and I
> think it's best to make it as flexible as possible.

Ok.

>> Another idea would be to have an explicit nulls=ignore|process option
>> for CREATE AGGREGATE. If nulls=process and either of the transition
>> functions are strict, we could either error out, or simply do what
>> normal function calls do and pretend they return NULL for NULL inputs.
>> Not sure how the rule that forward transition functions may not return
>> NULL if there's an inverse transition function would fit in if we do
>> the latter, though.
>> 
>> The question is - is it worth the effort to add that flag?
> 
> One use-case I had in mind upthread was suppose you wanted to write a
> custom version of array_agg that only collected non-NULL values. With
> such a flag, that would be trivial, but with the current patch you'd
> have to (counter-intuitively) wrap the inverse transition function in a
> strict function.

I'd be more convinced by that if doing so was actually possible for
non-superusers. But it isn't, because the aggregate uses "internal" as
its state type and it's thus entirely up to the user to not crash the
database by mixing transfer and final functions with incompatible
state data. Plus, instead of creating a custom aggregate, one can just
use a FILTER clause to get rid of the NULL values.
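
E.g. (a sketch, with a made-up table and frame):

SELECT array_agg(x) FILTER (WHERE x IS NOT NULL)
       OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
FROM tbl;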

> Another use-case I can imagine is suppose you wanted a custom version
> of sum() that returned NULL if any of the input values were NULL. If
> you knew that most of your data was non-NULL, you might still wish to
> benefit from an inverse transition function. Right now the patch won't
> allow that because of the error in advance_windowaggregate(), but
> possibly that could be relaxed by forcing a restart in that case.

That's not really true - the patch only forbids it if you insist on
representing the state "i have seen a NULL input" with a NULL state value.
But if you instead just count the number of NULLS in your transition
functions, all you need to do is to have your final function return NULL
if that count is not zero.

> If I've understood correctly, that should give performance similar to the
> current performance if NULLs are present, and enhanced performance as the
> window moves over non-NULL data.

Exactly - and this makes defining a NULL-sensitive SUM() this way
rather silly - a simple counter has very nearly zero overhead, and avoids
all rescans.

> In that second case, it would also be nice if you could simply re-use
> the existing sum forward and inverse transition functions, with a
> different null-handling flag.

Even if we had a nulls=process|ignore flag, SUM's transition functions
would still need to take that use-case into account explicitly to make
this work - at the very least, the forward transition function would
need to return NULL if the input is NULL instead of just skipping it as
it does now. But that would lead to completely unnecessary rescans, so
what we actually ought to do then is to make it track whether there have
been NULL inputs and make the finalfunc return NULL in this case. Which
would border on ironic, since the whole raison d'etre for this is to
*avoid* spreading NULL-counting logic around...

Plus, to make this actually useable, we'd have to document this, and tell
people how to define such a SUM aggregate. But I don't want to go there -
we really mustn't encourage people to mix-and-match built-in aggregates
with state type "internal", since whether they work or crash depends
on factors outside the control of the user.

And finally, you can get that behaviour quite easily by doing
 CASE WHEN bool_and(input IS NOT NULL) THEN whatever_agg(input) ELSE NULL END
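
which in a window context would look something like this (a sketch - table,
column and frame made up):

SELECT CASE WHEN bool_and(x IS NOT NULL) OVER w
            THEN sum(x) OVER w
       END
FROM tbl
WINDOW w AS (ORDER BY ts ROWS BETWEEN 9 PRECEDING AND CURRENT ROW);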

> So I think an ignore-nulls flag would have real benefits, as well as
> being a cleaner design than relying on a strict inverse transition
> function.

The more I think about this, the less convinced I am. In fact, I'm
currently leaning towards just forbidding non-strict forward transition
functions with strict inverses, and adding non-NULL counters to the
aggregates that then require them. It's really only the SUM() aggregates
that are affected by this, I think.

For an argument in favour of that, look at how we handle *non*-NULL
initial states. For these, even aggregates with nulls=ignore would
require some tracking, because otherwise they'd simply return the
initial state if the input contained no non-NULL value. Which quite
probably isn't what you'd want - you'd usually want to return NULL
in this case (as all the built-in aggregates do).

Currently, all built-in aggregates with non-NULL initial states compute
an expected value of some kind (average, std. deviation, regressions),
and thus need to count the number of inputs anyway. But that's just
coincidence - if it wasn't for state type "internal", having SUM start
from 0 instead of NULL would be entirely reasonable.

So if we have a nulls=ignore flag, we ought to have a nullif=only_nulls
flag too to cover all the ground. But at that point, just making this
the transition function's responsibility seems more sensible.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
> In fact, I'm
> currently leaning towards just forbidding non-strict forward transition
> functions with strict inverses, and adding non-NULL counters to the
> aggregates that then require them. It's really only the SUM() aggregates
> that are affected by this, I think.

I finally got around to doing that, and the results aren't too bad. The
attached patches required that the strictness settings of the forward and
reverse transition functions agree, and employ exactly the same NULL-skipping
logic we always had.

The only aggregates seriously affected by that change were SUM(int2) and
SUM(int4).

The SUM, AVG and STDDEV aggregates which use NumericAggState were
already mostly prepared for this - all they required were a few adjustments
to correctly handle the last non-NULL, non-NaN input being removed, and a few
additional PG_ARGISNULL calls for the inverse transition functions since they're
now non-strict. I've also modified them to unconditionally allocate the state
at the first call, instead of upon seeing the first non-NULL input, but that isn't
strictly required. But without that, the state can have three classes of values -
SQL-NULL, NULL pointer and valid pointer, and that's just confusing...

SUM(int2) and SUM(int4) now simply use the same transition functions as
AVG(int2) and AVG(int4), which use an int8 array to track the sum of the inputs
and the number of inputs, plus a new final function int2int4_sum(). Previously,
they used a single int8 as their state type.
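
In CREATE AGGREGATE terms, the new definition amounts to roughly this (a
sketch only - the real change is to the catalog entries, and I'm guessing
at the inverse function's name here):

CREATE AGGREGATE sum (int4) (
    sfunc     = int4_avg_accum,   -- now shared with avg(int4)
    stype     = int8[],           -- {count, sum}
    invfunc   = int4_avg_accum_inv,
    finalfunc = int2int4_sum,
    initcond  = '{0,0}'
);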

Since I was touching the code anyway, I removed some unnecessary inverse
transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv. These
are completely identical to their non-avg cousins - the only difference between
the corresponding forward transition functions is whether they request computation
of sumX2 (i.e. the sum of squares of the inputs) or not.

I haven't yet updated the docs - I'll do that if and when there's consensus
about whether this is the way to go or not.

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Feb20, 2014, at 02:48 , Florian Pflug <fgp@phlo.org> wrote:
> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>> In fact, I'm
>> currently leaning towards just forbidding non-strict forward transition
>> functions with strict inverses, and adding non-NULL counters to the
>> aggregates that then require them. It's really only the SUM() aggregates
>> that are affected by this, I think.
>
> I finally got around to doing that, and the results aren't too bad. The
> attached patches required that the strictness settings of the forward and
> reverse transition functions agree, and employ exactly the same NULL-skipping
> logic we always had.
>
> The only aggregates seriously affected by that change were SUM(int2) and
> SUM(int4).
>
> The SUM, AVG and STDDEV aggregates which use NumericAggState were
> already mostly prepared for this - all they required were a few adjustments
> to correctly handle the last non-NULL, non-NaN input being removed, and a few
> additional PG_ARGISNULL calls for the inverse transition functions since they're
> now non-strict. I've also modified them to unconditionally allocate the state
> at the first call, instead upon seeing the first non-NULL input, but that isn't
> strictly required. But without that, the state can have three classes of values -
> SQL-NULL, NULL pointer and valid pointer, and that's just confusing...
>
> SUM(int2) and SUM(int4) now simply use the same transition functions as
> AVG(int2) and AVG(int4), which use an int8 array to track the sum of the inputs
> and the number of inputs, plus a new final function int2int4_sum(). Previously,
> they used a single int8 as their state type.
>
> Since I was touching the code anyway, I removed some unnecessary inverse
> transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv. These
> are completely identical to their non-avg cousins - the only difference between
> the corresponding forward transition functions is whether they request computation
> of sumX2 (i.e. the sum of squares of the inputs) or not.
>
> I haven't yet updated the docs - I'll do that if and when there's consensus
> about whether this is the way to go or not.

I realized only after posting this that the patches no longer apply to current HEAD.

Attached are rebased patches.

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote:
> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>> In fact, I'm
>> currently leaning towards just forbidding non-strict forward transition
>> functions with strict inverses, and adding non-NULL counters to the
>> aggregates that then require them. It's really only the SUM() aggregates
>> that are affected by this, I think.
>
> I finally got around to doing that, and the results aren't too bad. The
> attached patches required that the strictness settings of the forward and
> reverse transition functions agree, and employ exactly the same NULL-skipping
> logic we always had.
>
> The only aggregates seriously affected by that change were SUM(int2) and
> SUM(int4).
>
> The SUM, AVG and STDDEV aggregates which use NumericAggState were
> already mostly prepared for this - all they required were a few adjustments
> to correctly handle the last non-NULL, non-NaN input being removed, and a few
> additional PG_ARGISNULL calls for the inverse transition functions since they're
> now non-strict. I've also modified them to unconditionally allocate the state
> at the first call, instead upon seeing the first non-NULL input, but that isn't
> strictly required. But without that, the state can have three classes of values -
> SQL-NULL, NULL pointer and valid pointer, and that's just confusing...
>
> SUM(int2) and SUM(int4) now simply use the same transition functions as
> AVG(int2) and AVG(int4), which use an int8 array to track the sum of the inputs
> and the number of inputs, plus a new final function int2int4_sum(). Previously,
> they used a single int8 as their state type.
>
> Since I was touching the code anyway, I removed some unnecessary inverse
> transition functions - namely, int8_avg_accum_inv and numeric_avg_accum_inv. These
> are completely identical to their non-avg cousins - the only difference between
> the corresponding forward transition functions is whether they request computation
> of sumX2 (i.e. the sum of squares of the inputs) or not.
>
> I haven't yet updated the docs - I'll do that if and when there's consensus
> about whether this is the way to go or not.
>

I haven't looked at this in any detail yet, but that seems much neater
to me. It seems perfectly sensible that the forward and inverse
transition functions should have the same strictness settings, and
enforcing that keeps the logic simple, as well as hopefully making it
easier to document.
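
For anyone following along, here is roughly what the counter-based
approach amounts to for a SUM-style aggregate. This is only an
illustrative sketch with hypothetical names, written against
moving-aggregate CREATE AGGREGATE options spelled MSFUNC / MINVFUNC /
MSTYPE / MINITCOND / MFINALFUNC - the exact syntax and catalog changes
are of course defined by the patches themselves:

    -- state is {sum, count}; forward and inverse transition functions
    -- have matching strictness, so NULL inputs are skipped consistently
    -- in both directions
    CREATE FUNCTION my_sum_sfunc(int8[], int4) RETURNS int8[]
        AS 'SELECT ARRAY[$1[1] + $2, $1[2] + 1]' LANGUAGE sql STRICT;
    CREATE FUNCTION my_sum_invfunc(int8[], int4) RETURNS int8[]
        AS 'SELECT ARRAY[$1[1] - $2, $1[2] - 1]' LANGUAGE sql STRICT;
    -- the count is what lets the final function return NULL again once
    -- every aggregated row has been removed (or none was ever added)
    CREATE FUNCTION my_sum_ffunc(int8[]) RETURNS int8
        AS 'SELECT CASE WHEN $1[2] = 0 THEN NULL ELSE $1[1] END'
        LANGUAGE sql;
    CREATE AGGREGATE my_sum(int4) (
        SFUNC = my_sum_sfunc, STYPE = int8[], INITCOND = '{0,0}',
        FINALFUNC = my_sum_ffunc,
        MSFUNC = my_sum_sfunc, MINVFUNC = my_sum_invfunc,
        MSTYPE = int8[], MINITCOND = '{0,0}', MFINALFUNC = my_sum_ffunc
    );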

It's a shame that more transition functions cannot be made strict,
when they actually ignore null values, but I think trying to solve
that can be regarded as outside the scope of this patch.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Feb24, 2014, at 17:50 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote:
>> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>>> In fact, I'm
>>> currently leaning towards just forbidding non-strict forward transition
>>> function with strict inverses, and adding non-NULL counters to the
>>> aggregates that then require them. It's really only the SUM() aggregates
>>> that are affected by this, I think.
>>
>> I finally got around to doing that, and the results aren't too bad. The
>> attached patches required that the strictness settings of the forward and
>> reverse transition functions agree, and employ exactly the same NULL-skipping
>> logic we always had.
>>
>> The only aggregates seriously affected by that change were SUM(int2) and
>> SUM(int4).
>
> I haven't looked at this in any detail yet, but that seems much neater
> to me. It seems perfectly sensible that the forward and inverse
> transition functions should have the same strictness settings, and
> enforcing that keeps the logic simple, as well as hopefully making it
> easier to document.

Good to hear that you agree! I'll try to find some time to update the docs.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 25 February 2014 12:33, Florian Pflug <fgp@phlo.org> wrote:
> On Feb24, 2014, at 17:50 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 20 February 2014 01:48, Florian Pflug <fgp@phlo.org> wrote:
>>> On Jan29, 2014, at 13:45 , Florian Pflug <fgp@phlo.org> wrote:
>>>> In fact, I'm
>>>> currently leaning towards just forbidding non-strict forward transition
>>>> function with strict inverses, and adding non-NULL counters to the
>>>> aggregates that then require them. It's really only the SUM() aggregates
>>>> that are affected by this, I think.
>>>
>>> I finally got around to doing that, and the results aren't too bad. The
>>> attached patches required that the strictness settings of the forward and
>>> reverse transition functions agree, and employ exactly the same NULL-skipping
>>> logic we always had.
>>>
>>> The only aggregates seriously affected by that change were SUM(int2) and
>>> SUM(int4).
>>
>> I haven't looked at this in any detail yet, but that seems much neater
>> to me. It seems perfectly sensible that the forward and inverse
>> transition functions should have the same strictness settings, and
>> enforcing that keeps the logic simple, as well as hopefully making it
>> easier to document.
>
> Good to hear that you agree! I'll try to find some time to update the docs.
>

I finally got round to looking at this in more detail. Sorry for the
delay. Here is my more detailed review of the base patch.

Overall, I think that it is in reasonable shape, and as I said I think
the approach of enforcing matching strictness settings on the forward
and inverse transition functions is much simpler and neater. I have a
few comments, some cosmetic, and a couple more substantive:


* In a couple of places:

errmsg("stricness of forward and reverse transition functions must match")

- misspelling: "stricness".
- "reverse" should be "inverse" to match the terminology used elsewhere.


* Grammatical error in the comment for lookup_agg_function() - you
should drop the word "both".


* In show_windowagg_info(), this calculation looks suspicious to me:
        double tperrow = winaggstate->aggfwdtrans /
            (inst->nloops * inst->ntuples);

If the node is executed multiple times, aggfwdtrans will be reset in
each loop, so the transitions per row figure will be under-estimated.
ISTM that if you want to report on this, you'd need aggfwdtrans to be
reset once per query, but I'm not sure exactly how to do that.

Here's a test case:

explain (verbose, analyse)
  select sum(i) over (rows between 4 preceding and current row)
  from generate_series(1, 10) i;

which outputs 10 rows with an average of 1 transition per row, but
doing the same window aggregate twice in a nested loop:

explain (verbose, analyse)
  select * from (values (10), (10)) v(x),
    lateral (select sum(i) over (rows between 4 preceding and current row)
             from generate_series(1, x) i) t;

outputs 20 rows, but only reports 0.5 transitions per row.

Actually, I think it's misleading to only count forward transition
function calls, because a call to the inverse transition function
still represents a state transition, and is likely to be around the
same cost. For a window of size 2, there would not be much advantage
to using inverse transition functions, because it would be around 2
transitions per row either way.
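
(With ROWS BETWEEN 1 PRECEDING AND CURRENT ROW, for instance, each
output row costs one forward call to add the incoming row plus one
inverse call to remove the outgoing one, where a full restart would
also only cost two forward calls.)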


* The function comment for build_aggregate_fnexprs() needs to be
updated to reference the inverse transition function. I'd also be
tempted to have it allow invtransfnexpr to be a NULL pointer, if the
inverse transition function expression tree is not required. Then
ExecInitAgg() could simply pass NULL, instead of having the local
variable invtransfnexpr with the slightly cryptic comment "needed but
never used".


* In struct WindowStatePerAggData, I think you should change the field
order to transfn_oid, invtransfn_oid and then finalfn_oid. It's only a
small thing, but that's the order those 3 functions are referred to
everywhere else.


* In struct WindowStatePerAggData, the comment for transValueCount
should read "number of aggregated values".


* If AggCheckCallContext() is called from a window function, and it
asks for an aggcontext, it will fail because calledaggno will be -1.
That can't currently happen for any of our built-in window functions,
and I'm not sure if it's likely to happen in the future, but I think
it would be better to defend against that possibility just in case. So
I think it ought to return the shared context in that case, as the
original code would have done.


* In advance_windowaggregate(), this code
   if (peraggstate->transfn.fn_strict) {

is against the project style, which is to have curly braces on new
lines. But also, that test condition is the same as the preceding
block, so the 2 blocks could just be merged.


* I was wondering about the case of a forward transition function
returning NULL in the presence of an inverse transition function. In
this patch there are 3 pieces of code that test for that:

1). advance_windowaggregate() errors out if the forward transition
function returns NULL and there is an inverse transition function.
2). an Assert in advance_windowaggregate() fires if a prior call made
the state NULL and there is an inverse transition function (should be
impossible due to the above error).
3). retreat_windowaggregate() errors out if it sees a NULL state
(which ought to be impossible due to both of the above).

I find the resulting error "transition function with an inverse
returned NULL" surprising. Why shouldn't a transition function return
NULL if it wants to? It can if it's used as a regular aggregate, so it
seems somewhat odd that it can't if it's used in a window context, and
it has an inverse.

Would it not be simpler and more flexible to just allow the forward
transition function to return NULL, and treat a NULL state as
non-invertible, requiring a restart. So delete check (1) and just
allow the forward transition function to return NULL, delete Assert
(2) so that it propagates a NULL state to the end as it would do in
the absence of an inverse transition function, and modify check (3) to
return false forcing a restart if the state is NULL. So then if the
forward transition function did return NULL, the inverse transition
function would not actually be called, and it would compute the answer
the hard way, rather than erroring out.


* In retreat_windowaggregate(), the comment before the check on
transValueCount is missing a "be".


* I think the function comment for eval_windowaggregates() should be
updated to mention that it may also use an inverse transition function
to remove aggregated data from a transition value.


* I found the guts of eval_windowaggregates() a little hard to follow,
although I think the code is correct. It could perhaps use a little
tidying up. Here are a few ideas:

- Maybe numaggs_restart would be better called numtorestart.

- is_first isn't adding much, and it would probably be easier to read
if it were eliminated by inlining it in the one place it is used.

- Similarly I would move the variable "ok" to the block that uses it.

- The variable pos could be eliminated by having the retreat loop
increment winstate->aggregatedbase in each pass, which would better
match its comment which says it updates aggregatedbase, which it
currently doesn't do. For consistency, perhaps this loop should be
written more in the style of the advance loop. The loop should ensure
that, on exit, aggregatedbase is equal to frameheadpos.

- aggregatedupto_nonrestarted is a bit of a mouthful, but I can't
immediately think of a better name. However, there's one comment that
refers to it as aggregatedupto_previous.

- The comment starting "After this, non-restarted aggregated..." is a
bit confusing. What the code following it really seems to be doing is
more along the lines of "If there are aggregates to restart, rewind
aggregatedupto back to frameheadpos so that we can re-aggregate those
values in the aggregates to be restarted...".


* The pg_dump modifications don't look correct to me. I didn't test
it, but I think you need to modify the pre-9.4 queries to return "-"
for agginvtransfn otherwise I think it will be NULL and will seg-fault
if you dump a pre-9.4 database containing custom aggregates.
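
(That is, the pre-9.4 branches of the query would need to select a
dummy value, something like "'-' AS agginvtransfn" - a sketch of the
idea, not the patch's actual query text.)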


* The last regression test in aggregates.sql should test for unequal
strictness values, rather than just strict forward transition
functions with non-strict inverses.


I think that's everything for the base patch.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Mar2, 2014, at 20:39 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> * In a couple of places:
>
> errmsg("stricness of forward and reverse transition functions must match")
>
> - misspelling: "stricness".
> - "reverse" should be "inverse" to match the terminology used elsewhere.

Done.

> * Grammatical error in the comment for lookup_agg_function() - you
> should drop the word "both".

Done.

> * In show_windowagg_info(), this calculation looks suspicious to me:
>
>        double tperrow = winaggstate->aggfwdtrans /
>            (inst->nloops * inst->ntuples);
>
> If the node is executed multiple times, aggfwdtrans will be reset in
> each loop, so the transitions per row figure will be under-estimated.
> ISTM that if you want to report on this, you'd need aggfwdtrans to be
> reset once per query, but I'm not sure exactly how to do that.
>
> ...
>
> Actually, I think it's misleading to only count forward transition
> function calls, because a call to the inverse transition function
> still represents a state transition, and is likely to be around the
> same cost. For a window of size 2, there would not be much advantage
> to using inverse transition functions, because it would be around 2
> transitions per row either way.

True. In fact, I pondered whether to avoid using the inverse transition
function for windows of 2 rows. In the end, I didn't because I felt that
it makes custom aggregates harder to test.

On the question of whether to count inverse transition function calls -
the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
number of state transitions, but rather to show whether the aggregation
has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
means "inverse transition function works as expected", and larger values
mean "not working so well".

Regarding multiple evaluations - I think I based the behaviour on how
ntuples works, which also only reports the value of the last evaluation
I think. But maybe I'm confused about this.

> * The function comment for build_aggregate_fnexprs() needs to be
> updated to reference the inverse transition function. I'd also be
> tempted to have it allow invtransfnexpr to be a NULL pointer, if the
> inverse transition function expression tree is not required. Then
> ExecInitAgg() could simply pass NULL, instead of having the local
> variable invtransfnexpr with the slightly cryptic comment "needed but
> never used".

Hm, I suggested that to David Rowley initially, and he wasn't particularly
convinced - after all, there aren't *that* many callers of that function,
and there isn't much overhead - we just set the value to NULL if an invalid
OID is passed. But now that you brought it up too, it's two yeas against
one nay, so I changed it.

> * In struct WindowStatePerAggData, I think you should change the field
> order to transfn_oid, invtransfn_oid and then finalfn_oid. It's only a
> small thing, but that's the order those 3 functions are referred to
> everywhere else.

Done.

> * In struct WindowStatePerAggData, the comment for transValueCount
> should read "number of aggregated values".

Done.

> * If AggCheckCallContext() is called from a window function, and it
> asks for an aggcontext, it will fail because calledaggno will be -1.
> That can't currently happen for any of our built-in window functions,
> and I'm not sure if it's likely to happen in the future, but I think
> it would be better to defend against that possibility just in case. So
> I think it ought to return the shared context in that case, as the
> original code would have done.

I did it this way on purpose, because it was never actually safe to
use the shared context from window functions - they won't know if and
when that context is reset. I added a note to the function comment
explaining that this function isn't meant to be used by true window
functions.

> * In advance_windowaggregate(), this code
>
>    if (peraggstate->transfn.fn_strict) {
>
> is against the project style, which is to have curly braces on new
> lines. But also, that test condition is the same as the preceding
> block, so the 2 blocks could just be merged.

Done. The two conditions used to be different, I think, but requiring
the strictness settings to match made them the same.

> * I was wondering about the case of a forward transition function
> returning NULL in the presence of an inverse transition function. In
> this patch there are 3 pieces of code that test for that:
>
> 1). advance_windowaggregate() errors out if the forward transition
> function returns NULL and there is an inverse transition function.
>
> 2). an Assert in advance_windowaggregate() fires if a prior call made
> the state NULL and there is an inverse transition function (should be
> impossible due to the above error).

Yeah, that assert is just there to document that "yes, this cannot be".

> 3). retreat_windowaggregate() errors out if it sees a NULL state
> (which ought to be impossible due to both of the above).

This too is simply meant to document "No worries, cannot be".

> I find the resulting error "transition function with an inverse
> returned NULL" surprising. Why shouldn't a transition function return
> NULL if it wants to? It can if it's used as a regular aggregate, so it
> seems somewhat odd that it can't if it's used in a window context, and
> it has an inverse.

Because transition functions cannot use NULL as a state value freely if
they have an inverse, since if the inverse returns NULL that means
"sorry, cannot invert, please restart aggregation".

> Would it not be simpler and more flexible to just allow the forward
> transition function to return NULL, and treat a NULL state as
> non-invertible, requiring a restart. So delete check (1) and just
> allow the forward transition function to return NULL, delete Assert
> (2) so that it propagates a NULL state to the end as it would do in
> the absence of an inverse transition function, and modify check (3) to
> return false forcing a restart if the state is NULL. So then if the
> forward transition function did return NULL, the inverse transition
> function would not actually be called, and it would compute the answer
> the hard way, rather than erroring out.

Though that would be weird too - if the inverse transition function
is non-strict, we actually can *call* it with a NULL state, it just
cannot really return a NULL state (well it can, but that means
"restart the aggregation, please").

I don't think there's anything to be gained by allowing this, so I'd
prefer if we don't.
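
To spell out that contract with a concrete (and purely hypothetical)
sketch: an aggregate may be only partially invertible, with the inverse
returning NULL whenever it cannot undo a removal. For example, a
MAX-style aggregate whose inverse gives up - forcing a restart -
whenever the removed value could have been the current maximum:

    CREATE FUNCTION my_max_sfunc(int4, int4) RETURNS int4
        AS 'SELECT GREATEST($1, $2)' LANGUAGE sql STRICT;
    -- NULL result means "cannot invert, please restart": only removals
    -- of values strictly below the current maximum can be undone
    CREATE FUNCTION my_max_invfunc(int4, int4) RETURNS int4
        AS 'SELECT CASE WHEN $2 < $1 THEN $1 END' LANGUAGE sql STRICT;

How often such an aggregate restarts then depends on the input data,
not just on the frame definition.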

> * In retreat_windowaggregate(), the comment before the check on
> transValueCount is missing a "be".

Done.

> * I think the function comment for eval_windowaggregates() should be
> updated to mention that it may also use an inverse transition function
> to remove aggregated data from a transition value.

Done.

> * I found the guts of eval_windowaggregates() a little hard to follow,
> although I think the code is correct. It could perhaps use a little
> tidying up. Here are a few ideas:
>
> - Maybe numaggs_restart would be better called numtorestart.

Hm, I'm not particularly fond of that - it doesn't really emphasize the
connection to numaggs.

> - is_first isn't adding much, and it would probably be easier to read
> if it were eliminated by inlining it in the one place it is used.

Done.

>
> - Similarly I would move the variable "ok" to the block that uses it.

Done.

>
> - The variable pos could be eliminated by having the retreat loop
> increment winstate->aggregatedbase in each pass, which would better
> match its comment which says it updates aggregatedbase, which it
> currently doesn't do. For consistency, perhaps this loop should be
> written more in the style of the advance loop. The loop should ensure
> that, on exit, aggregatedbase is equal to frameheadpos.

Done, mostly. The loops still look a bit different, but that's mostly
because of the tuple-slot optimization in the advance loop (it re-uses
the last row it fetched during the *previous* call to eval_windowaggregates
if possible), and because the retreat loop already knows how far to
retreat, while the advance loop figures that out as it goes.

> - aggregatedupto_nonrestarted is a bit of a mouthful, but I can't
> immediately think of a better name. However, there's one comment that
> refers to it as aggregatedupto_previous.

I don't have a better name either, so I kept the current name.

> - The comment starting "After this, non-restarted aggregated..." is a
> bit confusing. What the code following it really seems to be doing is
> more along the lines of "If there are aggregates to restart, rewind
> aggregatedupto back to frameheadpos so that we can re-aggregate those
> values in the aggregates to be restarted...".

I re-wrote that comment, and also moved the block that updates the
tuplestore mark up above the restart-block. Thus, the block that updates
aggregatedupto is now immediately before the advance loop, which hopefully
makes the connection more obvious.

> * The pg_dump modifications don't look correct to me. I didn't test
> it, but I think you need to modify the pre-9.4 queries to return "-"
> for agginvtransfn otherwise I think it will be NULL and will seg-fault
> if you dump a pre-9.4 database containing custom aggregates.

Hm, I haven't touched this code, it comes from David's original patch.
I agree that it looks wrong - I've updated it per your suggestion, and
checked that it dumps 9.3 and HEAD+patch correctly.

> * The last regression test in aggregates.sql should test for unequal
> strictness values, rather than just strict forward transition
> functions with non-strict inverses.

Done. That was a leftover from when only one case was rejected.
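
For reference, the rejected case now looks roughly like this - a sketch
with hypothetical names, assuming moving-aggregate options spelled
MSFUNC/MINVFUNC/MSTYPE, and with the error wording being the corrected
errmsg from above:

    -- strict forward transition function, non-strict inverse
    CREATE FUNCTION bad_sfunc(int8, int4) RETURNS int8
        AS 'SELECT $1 + $2' LANGUAGE sql STRICT;
    CREATE FUNCTION bad_invfunc(int8, int4) RETURNS int8
        AS 'SELECT $1 - $2' LANGUAGE sql;
    CREATE AGGREGATE bad_sum(int4) (
        SFUNC = bad_sfunc, STYPE = int8, INITCOND = '0',
        MSFUNC = bad_sfunc, MINVFUNC = bad_invfunc,
        MSTYPE = int8, MINITCOND = '0'
    );
    ERROR:  strictness of forward and inverse transition functions must match

plus the mirror-image combination (non-strict forward, strict inverse).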

Attached are new versions of all 5 patches. Only the base patch has actually
changed, the others are just included for completeness' sake.

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 3 March 2014 23:00, Florian Pflug <fgp@phlo.org> wrote:
>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>
>>        double tperrow = winaggstate->aggfwdtrans /
>>            (inst->nloops * inst->ntuples);
>>
>> If the node is executed multiple times, aggfwdtrans will be reset in
>> each loop, so the transitions per row figure will be under-estimated.
>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>> reset once per query, but I'm not sure exactly how to do that.
>>
>> ...
>>
>> Actually, I think it's misleading to only count forward transition
>> function calls, because a call to the inverse transition function
>> still represents a state transition, and is likely to be around the
>> same cost. For a window of size 2, there would not be much advantage
>> to using inverse transition functions, because it would be around 2
>> transitions per row either way.
>
> True. In fact, I pondered whether to avoid using the inverse transition
> function for windows of 2 rows. In the end, I didn't because I felt that
> it makes custom aggregates harder to test.
>
> On the question of whether to count inverse transition function calls -
> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
> number of state transitions, but rather to show whether the aggregation
> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
> means "inverse transition function works as expected", and larger values
> mean "not working so well".
>
> Regarding multiple evaluations - I think I based the behaviour on how
> ntuples works, which also only reports the value of the last evaluation
> I think. But maybe I'm confused about this.
>

No, it doesn't look like that's correct for multiple loops. Consider
this example:

explain (verbose, analyse)
  select * from (values (10), (20), (30), (40)) v(x),
    lateral (select sum(i) over (rows between 4 preceding and current row)
             from generate_series(1, x) i) t;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..170.06 rows=4000 width=12) (actual time=0.027..0.414 rows=100 loops=1)
   Output: "*VALUES*".column1, (sum(i.i) OVER (?))
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=4 width=4) (actual time=0.002..0.006 rows=4 loops=1)
         Output: "*VALUES*".column1
   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual time=0.019..0.094 rows=25 loops=4)
         Output: sum(i.i) OVER (?)
         Transitions Per Row: 0.2
         ->  Function Scan on pg_catalog.generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.010..0.015 rows=25 loops=4)
               Output: i.i
               Function Call: generate_series(1, "*VALUES*".column1)


It turns out that show_windowagg_info() is only called once at the
end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
1, you'd have to use this formula:
   double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

I'm still not convinced that's the most useful thing to report though.
Personally, I'd prefer to just see the separate counts, e.g.:
  ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual time=0.019..0.094 rows=25 loops=4)
        Output: sum(i.i) OVER (?)
        Forward transitions: 25  Inverse transitions: 25

IMO that gives a clearer picture of what's going on.

Thoughts?

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Mar4, 2014, at 21:09 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 3 March 2014 23:00, Florian Pflug <fgp@phlo.org> wrote:
>>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>> 
>>>       double tperrow = winaggstate->aggfwdtrans /
>>>           (inst->nloops * inst->ntuples);
>>> 
>>> If the node is executed multiple times, aggfwdtrans will be reset in
>>> each loop, so the transitions per row figure will be under-estimated.
>>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>>> reset once per query, but I'm not sure exactly how to do that.
>>> 
>>> ...
>>> 
>>> Actually, I think it's misleading to only count forward transition
>>> function calls, because a call to the inverse transition function
>>> still represents a state transition, and is likely to be around the
>>> same cost. For a window of size 2, there would not be much advantage
>>> to using inverse transition functions, because it would be around 2
>>> transitions per row either way.
>> 
>> True. In fact, I pondered whether to avoid using the inverse transition
>> function for windows of 2 rows. In the end, I didn't because I felt that
>> it makes custom aggregates harder to test.
>> 
>> On the question of whether to count inverse transition function calls -
>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
>> number of state transitions, but rather to show whether the aggregation
>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
>> means "inverse transition function works as expected", and larger values
>> mean "not working so well".
>> 
>> Regarding multiple evaluations - I think I based the behaviour on how
>> ntuples works, which also only reports the value of the last evaluation
>> I think. But maybe I'm confused about this.
>> 
> 
> No, it doesn't look like that's correct for multiple loops. Consider
> this example:
> 
> ...
> 
> It turns out that show_windowagg_info() is only called once at the
> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
> 1, you'd have to use this formula:
> 
>    double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

Hm, so I *was* confused - seems I mixed up ntuples (which counts the
total number of tuples over all loops) with what we report as "rows"
(i.e. the average number of rows per loop). Thanks for clearing that up!

> I'm still not convinced that's the most useful thing to report though.
> Personally, I'd prefer to just see the separate counts, e.g.:
> 
>   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual
> time=0.019..0.094 rows=25 loops=4)
>         Output: sum(i.i) OVER (?)
>         Forward transitions: 25  Inverse transitions: 25
> 
> IMO that gives a clearer picture of what's going on.

My problem with this is that if there are multiple aggregates, the
numbers don't really count the number of forward and reverse transfer
function invocations. What nodeWindowAgg.c really counts is the number
of *rows* it has to fetch from the tuple store to perform forward
aggregations - the relevant code snippet is
  if (numaggs_restart > 0)
      winstate->aggfwdtrans += (winstate->aggregatedupto
                                - winstate->frameheadpos);
  else
      winstate->aggfwdtrans += (winstate->aggregatedupto
                                - aggregatedupto_nonrestarted);

Now we could of course report these figures per aggregate instead of
only once per aggregation node. But as I said earlier, my guess is that
people usually won't be interested in the precise counts - most likely,
what you want to know is "how much do the restarts cost me"

When I added the EXPLAIN stuff, I initially simply reported the number
of times nodeWindowAgg has to restart the aggregation. The problem with
that approach is that not all restarts carry the same cost. If the frames
either don't overlap at all or are identical, restarts don't cause any
additional work. And for fixed-length frames (BETWEEN n PRECEDING AND
m FOLLOWING), the performance effects of restarts depend on m-n.
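
(Each such restart re-aggregates the rows of the new frame from
scratch, so the wasted work per restart grows with the frame width.)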

Which is why I made it count the number of aggregated input rows instead.

Having said that, I'm not really 100% happy with the name
"Transitions Per Row" for this - it was simply the best I could come up with
that was reasonably short. And I'm certainly open to reporting the absolute
count instead of a factor relative to ntuples.

If we made it an absolute count, would calling this "Aggregated Rows" work
for you?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 5 March 2014 14:35, Florian Pflug <fgp@phlo.org> wrote:
> On Mar4, 2014, at 21:09 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 3 March 2014 23:00, Florian Pflug <fgp@phlo.org> wrote:
>>>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>>>
>>>>       double tperrow = winaggstate->aggfwdtrans /
>>>>           (inst->nloops * inst->ntuples);
>>>>
>>>> If the node is executed multiple times, aggfwdtrans will be reset in
>>>> each loop, so the transitions per row figure will be under-estimated.
>>>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>>>> reset once per query, but I'm not sure exactly how to do that.
>>>>
>>>> ...
>>>>
>>>> Actually, I think it's misleading to only count forward transition
>>>> function calls, because a call to the inverse transition function
>>>> still represents a state transition, and is likely to be around the
>>>> same cost. For a window of size 2, there would not be much advantage
>>>> to using inverse transition functions, because it would be around 2
>>>> transitions per row either way.
>>>
>>> True. In fact, I pondered whether to avoid using the inverse transition
>>> function for windows of 2 rows. In the end, I didn't because I felt that
>>> it makes custom aggregates harder to test.
>>>
>>> On the question of whether to count inverse transition function calls -
>>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
>>> number of state transitions, but rather to show whether the aggregation
>>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
>>> means "inverse transition function works as expected", and larger values
>>> mean "not working so well".
>>>
>>> Regarding multiple evaluations - I think I based the behaviour on how
>>> ntuples works, which also only reports the value of the last evaluation
>>> I think. But maybe I'm confused about this.
>>>
>>
>> No, it doesn't look like that's correct for multiple loops. Consider
>> this example:
>>
>> ...
>>
>> It turns out that show_windowagg_info() is only called once at the
>> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
>> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
>> 1, you'd have to use this formula:
>>
>>    double tperrow = winaggstate->aggfwdtrans / inst->ntuples;
>
> Hm, so I *was* confused - seems I mixed up ntuples (which counts the
> total number of tuples over all loops) with what we report as "rows"
> (i.e. the average number of rows per loop). Thanks for clearing that up!
>
>> I'm still not convinced that's the most useful thing to report though.
>> Personally, I'd prefer to just see the separate counts, e.g.:
>>
>>   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual
>> time=0.019..0.094 rows=25 loops=4)
>>         Output: sum(i.i) OVER (?)
>>         Forward transitions: 25  Inverse transitions: 25
>>
>> IMO that gives a clearer picture of what's going on.
>
> My problem with this is that if there are multiple aggregates, the
> numbers don't really count the number of forward and reverse transfer
> function invocations. What nodeWindowAgg.c really counts is the number
> of *rows* it has to fetch from the tuple store to perform forward
> aggregations - the relevant code snippet is
>
>   if (numaggs_restart > 0)
>     winstate->aggfwdtrans += (winstate->aggregatedupto
>                               - winstate->frameheadpos);
>   else
>     winstate->aggfwdtrans += (winstate->aggregatedupto
>                               - aggregatedupto_nonrestarted);
>
> Now we could of course report these figures per aggregate instead of
> only once per aggregation node. But as I said earlier, my guess is that
> people usually won't be interested in the precise counts - most likely,
> what you want to know is "how much do the restarts cost me"
>

The problem I have with the single "Transitions Per Row" figure, and
the idea that a value close to 1.0 is supposed to be good, is that
it's not really true. For example, with a window size of 2 and a
"perfect" invertible aggregate, you'd get a value of 1.0, but with a
non-invertible aggregate you'd get a value of 2.0, when actually it
wouldn't do any better if it had an inverse. I think trying to
represent this as a single number is too simplistic.


> When I added the EXPLAIN stuff, I initially simply reported the number
> of times nodeWindowAgg has to restart the aggregation. The problem with
> that approach is that not all restarts carry the same cost. If the frames
> either don't overlap at all or are identical, restarts don't cause any
> additional work. And for fixed-length frames (BETWEEN n PRECEDING AND
> m FOLLOWING), the performance effects of restarts depend on m-n.
>
> Which is why I made it count the number of aggregated input rows instead.
>
> Having said that, I'm not really 100% happy with the name
> "Transitions Per Row" for this - it was simply the best I could come up with
> that was reasonably short. And I'm certainly open to reporting the absolute
> count instead of a factor relative to ntuples.
>
> If we made it an absolute count, would calling this "Aggregated Rows" work
> for you?
>

I'm not sure about naming, but I think my preference would be to
output the correct absolute counts for both the forward and inverse
transitions (i.e. multiply by the right combinations of numaggs and
numaggs_restart). The EXPLAIN output already tells you how many
aggregates there are, and how many rows there are, so you'd be able to
work out from that how much extra work it's doing.

I think we really need a larger consensus on this though, so I'd be
interested to hear what others think.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I think we really need a larger consensus on this though, so I'd be
> interested to hear what others think.

My advice is to lose the EXPLAIN output entirely.  If the authors of
the patch can't agree on what it means, what hope have everyday users
got of making sense of it?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Vladimir Sitnikov
Date:
<p dir="ltr">Tom,<p dir="ltr">I did not follow the thread very close, so I need to look what the ambiguity is there,
howeverI would love to see window rescans in explain analyze.<br /><p dir="ltr">I have great experience in tuning
Oraclequeries.<br /> There are features in PostgreSQL's explain analyze that I miss badly in Oracle: 'rows removed by
filter'is my favourite one. Improving explain analyze is great for performance analysis.<p dir="ltr">I would say target
audiencefor 'explain analyze' is not all the users, but someone closer to 'performance engineers'. Those beings are
usedto triple-check results and build/validate hypothesis since all the counters tend to lie, so 'a bit misleading
counter'is not a showstopper.<p dir="ltr">I did not think of
'non-being-able-to-use-negative-transition-since-floats-do-not-commute'before.<br /> Thanks to this discussion I see
whatkind of dragons live here.<p dir="ltr">I would vote (if I had any vote at all) for the inclusion of performance
statisticsto explain analyze (e.g. number of rescans and number of rows fed to aggregate) provided performance impact
istolerable in both regular and explain analyze mode.<p dir="ltr">I wonder how Oracle handles negative transition (does
it?),however that is a different discussion.<p dir="ltr">Regards,<br /> Vladimir Sitnikov 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Mar5, 2014, at 18:37 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I think we really need a larger consensus on this though, so I'd be
>> interested to hear what others think.
> 
> My advice is to lose the EXPLAIN output entirely.  If the authors of
> the patch can't agree on what it means, what hope have everyday users
> got of making sense of it?

The question isn't what the current output means, but whether it's a
good metric to report or not.

If we don't report anything, then how would a user check whether a query
is slow because of O(n^2) behaviour of a windowed aggregate, or because
of some other reason? If invertibility were a purely static property,
then maybe we could get away with that, and say "check whether your
aggregates are invertible or not". But since we have partially invertible
aggregates, the performance characteristics depends on the input data,
so we IMHO need some way for users to check what's actually happening.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Mar5, 2014, at 18:27 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 5 March 2014 14:35, Florian Pflug <fgp@phlo.org> wrote:
>> When I added the EXPLAIN stuff, I initially simply reported the number
>> of times nodeWindowAgg has to restart the aggregation. The problem with
>> that approach is that not all restarts carry the same cost. If the frames
>> either don't overlap at all or are identical, restarts don't cause any
>> additional work. And for fixed-length frames (BETWEEN n PRECEEDING AND
>> m FOLLOWING), the performance effects of restarts depends on m-n.
>>
>> Which is why I made it count the number of aggregated input rows instead.
>>
>> Having said that, I' not really 100% happy with the name
>> "Transitions Per Row" for this - it was simply the best I could come up with
>> that was reasonably short. And I'm certainly open to reporting the absolute
>> count instead of a factor relative to ntuples.
>>
>> If we made it an absolute count, would calling this "Aggregated Rows" work
>> for you?
>
> I'm not sure about naming, but I think my preference would be to
> output the correct absolute counts for both the forward and inverse
> transitions (i.e. multiply by the right combinations of numaggs and
> numaggs_restart). The EXPLAIN output already tells you how many
> aggregates there are, and how many rows there are, so you'd be able to
> work out from that how much extra work it's doing.

Hm, if we do that we might as well go all the way and simply report these
numbers per aggregate, instead of once per window aggregation node. That'd
provide the maximum amount of information, and be quite unambiguous.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Greg Stark
Date:
On Wed, Mar 5, 2014 at 10:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ shrug... ]  They can see whether the Window plan node is where the time
> is going.  It's not apparent to me that the extra numbers you propose to
> report will edify anybody.

Perhaps just saying "Incremental Window Function" versus "Iterated
Window Function" or something like that would be sufficient? At least that
way query tuning guidelines have a keyword they can say to watch out
for. And someone trying to figure out *why* the time is being spent in
this node has something they might notice a correlation with.


-- 
greg



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Mar5, 2014, at 18:37 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My advice is to lose the EXPLAIN output entirely.  If the authors of
>> the patch can't agree on what it means, what hope have everyday users
>> got of making sense of it?

> The question isn't what the current output means, but whether it's a
> good metric to report or not.

If you can't agree, then it isn't.

> If we don't report anything, then how would a user check whether a query
> is slow because of O(n^2) behaviour of a windowed aggregate, or because
> of some other reason?

[ shrug... ]  They can see whether the Window plan node is where the time
is going.  It's not apparent to me that the extra numbers you propose to
report will edify anybody.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Mar5, 2014, at 23:49 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Mar5, 2014, at 18:37 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> My advice is to lose the EXPLAIN output entirely.  If the authors of
>>> the patch can't agree on what it means, what hope have everyday users
>>> got of making sense of it?
> 
>> The question isn't what the current output means, but whether it's a
>> good metric to report or not.
> 
> If you can't agree, then it isn't.

Probably, yes, so let's find something that *is* a good metric.

(BTW, it's not the authors who disagree here. It was me who put the EXPLAIN
feature in, and Dean reviewed it and found it confusing. The original
author David seems to have run out of time to work on this, and AFAIK hasn't
weighed in on that particular feature at all)

>> If we don't report anything, then how would a user check whether a query
>> is slow because of O(n^2) behaviour of a windowed aggregate, or because
>> of some other reason?
> 
> [ shrug... ]  They can see whether the Window plan node is where the time
> is going.  It's not apparent to me that the extra numbers you propose to
> report will edify anybody.

By the same line of reasoning, every metric except execution time is
superfluous. Comparing execution times really is a horrible way to measure
this - not only does it include all kinds of things that have nothing to do
with the restart behaviour of aggregates, you'd also have to construct a
base-line query first which is guaranteed to not restart.

best regards,
Florian Pflug






Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Mar 6, 2014 at 6:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I think we really need a larger consensus on this though, so I'd be
>> interested to hear what others think.
>
> My advice is to lose the EXPLAIN output entirely.  If the authors of
> the patch can't agree on what it means, what hope have everyday users
> got of making sense of it?


Hi All,

I'd just like to thank Florian and Dean for all the hard work that has gone on while I've been away.
I've been reading the thread but I've not had much time to respond to messages and I've lost touch a bit with the progress of the code.

It looks like Florian has addressed everything that Dean pointed out in his detailed review, apart from the explain output.
I've thought about this and I think it would be a shame if the patch got delayed for something so trivial. I thought the stats were quite useful: for things like restarts on SUM(numeric) it could otherwise be quite difficult to tell how many restarts there were. Since I have no extra ideas about what might be more useful, I've just pulled it out of the patch, and I propose that we continue without it until someone comes up with an idea that everyone can agree on.

I've attached an updated invtrans_strictstrict_base patch which has the feature removed.


Regards

David Rowley


 

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I've attached an updated invtrans_strictstrict_base patch which has the
> feature removed.

What is the state of play on this patch?  Is the latest version what's in
http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org
plus this sub-patch?  Is everybody reasonably happy with it?  I don't
see it marked "ready for committer" in the CF app, but time is running
out.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> I've attached an updated invtrans_strictstrict_base patch which has the
>> feature removed.
>
> What is the state of play on this patch?  Is the latest version what's in
> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org
> plus this sub-patch?  Is everybody reasonably happy with it?  I don't
> see it marked "ready for committer" in the CF app, but time is running
> out.


As far as I know the only concern left was around the extra stats in the explain output, which I removed in the patch I attached in the previous email.

The invtrans_strictstrict_base.patch in my previous email replaces the invtrans_strictstrict_base_038070.patch that Florian sent here http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org; all of the other patches are unchanged, so it's safe to use Florian's latest ones.

Perhaps Dean can confirm that there's nothing else outstanding?

 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 26 March 2014 19:43, David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> David Rowley <dgrowleyml@gmail.com> writes:
>> > I've attached an updated invtrans_strictstrict_base patch which has the
>> > feature removed.
>>
>> What is the state of play on this patch?  Is the latest version what's in
>>
>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org
>> plus this sub-patch?  Is everybody reasonably happy with it?  I don't
>> see it marked "ready for committer" in the CF app, but time is running
>> out.
>>
>
> As far as I know the only concern left was around the extra stats in the
> explain output, which I removed in the patch I attached in the previous
> email.
>

Agreed. That was my last concern regarding the base patch, and I agree
that removing the new explain output is probably the best course of
action, given that we haven't reached consensus as to what the most
useful output would be.


> The invtrans_strictstrict_base.patch in my previous email replaces the
> invtrans_strictstrict_base_038070.patch that Florian sent here
> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org;
> all of the other patches are unchanged, so it's safe to use Florian's latest
> ones
>
> Perhaps Dean can confirm that there's nothing else outstanding?
>

Florian mentioned upthread that the docs hadn't been updated to
reflect the latest changes, so I think they need a little attention.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
First, sorry guys for letting this slide - I was overwhelmed by other work,
and this kind of slipped my mind :-(

On Mar27, 2014, at 09:04 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 26 March 2014 19:43, David Rowley <dgrowleyml@gmail.com> wrote:
>> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> David Rowley <dgrowleyml@gmail.com> writes:
>>>> I've attached an updated invtrans_strictstrict_base patch which has the
>>>> feature removed.
>>>
>>> What is the state of play on this patch?  Is the latest version what's in
>>>
>>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org
>>> plus this sub-patch?  Is everybody reasonably happy with it?  I don't
>>> see it marked "ready for committer" in the CF app, but time is running
>>> out.
>>>
>>
>> As far as I know the only concern left was around the extra stats in the
>> explain output, which I removed in the patch I attached in the previous
>> email.
>>
>
> Agreed. That was my last concern regarding the base patch, and I agree
> that removing the new explain output is probably the best course of
> action, given that we haven't reached consensus as to what the most
> useful output would be.

After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e.
simply reporting the total number of invocations of the forward transition
functions, and the total number of invocations of the reverse transition
function, over reporting nothing. The labels of the two counts would simply
be "Forward Transitions" and "Reverse Transitions".

But I don't want this issue to prevent us from getting this patch into 9.4,
so if there are objections to this, I'll rip out the EXPLAIN stuff
altogether.

>> The invtrans_strictstrict_base.patch in my previous email replaces the
>> invtrans_strictstrict_base_038070.patch that Florian sent here
>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org;
>> all of the other patches are unchanged, so it's safe to use Florian's latest
>> ones
>>
>> Perhaps Dean can confirm that there's nothing else outstanding?
>>
>
> Florian mentioned upthread that the docs hadn't been updated to
> reflect the latest changes, so I think they need a little attention.

I'll see to updating the docs, and will post a final patch within the next
few days.

Dean, have you by chance looked at the other patches yet?

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 27 March 2014 21:01, Florian Pflug <fgp@phlo.org> wrote:
> First, sorry guys for letting this slide - I was overwhelmed by other work,
> and this kind of slipped my mind :-(
>
> On Mar27, 2014, at 09:04 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 26 March 2014 19:43, David Rowley <dgrowleyml@gmail.com> wrote:
>>> On Thu, Mar 27, 2014 at 7:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>
>>>> David Rowley <dgrowleyml@gmail.com> writes:
>>>>> I've attached an updated invtrans_strictstrict_base patch which has the
>>>>> feature removed.
>>>>
>>>> What is the state of play on this patch?  Is the latest version what's in
>>>>
>>>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org
>>>> plus this sub-patch?  Is everybody reasonably happy with it?  I don't
>>>> see it marked "ready for committer" in the CF app, but time is running
>>>> out.
>>>>
>>>
>>> As far as I know the only concern left was around the extra stats in the
>>> explain output, which I removed in the patch I attached in the previous
>>> email.
>>>
>>
>> Agreed. That was my last concern regarding the base patch, and I agree
>> that removing the new explain output is probably the best course of
>> action, given that we haven't reached consensus as to what the most
>> useful output would be.
>
> After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e.
> simply reporting the total number of invocations of the forward transition
> functions, and the total number of invocations of the reverse transition
> function, over reporting nothing. The labels of the two counts would simply
> be "Forward Transitions" and "Reverse Transitions".
>

That should be "Inverse" not "Reverse" according to the terminology
agreed upthread.

Personally, I'm not a big fan of that terminology because "forward"
and "inverse" aren't natural antonyms. But actually I think that it's
"forward" that is the wrong word to use, because they actually both
move (different ends of) the frame forwards. The only alternatives I
can think of are "direct" and "inverse", which are natural antonyms,
but I don't want to hold up this patch bikeshedding over this. OTOH
this is not the first time on this thread that someone has slipped
into calling them "forward" and "reverse" transitions.


> But I don't want this issue to prevent us from getting this patch into 9.4,
> so if there are objections to this, I'll rip out the EXPLAIN stuff
> altogether.
>
>>> The invtrans_strictstrict_base.patch in my previous email replaces the
>>> invtrans_strictstrict_base_038070.patch that Florian sent here
>>> http://www.postgresql.org/message-id/64F96FD9-64D1-40B9-8861-E6182029220B@phlo.org;
>>> all of the other patches are unchanged, so it's safe to use Florian's latest
>>> ones
>>>
>>> Perhaps Dean can confirm that there's nothing else outstanding?
>>>
>>
>> Florian mentioned upthread that the docs hadn't been updated to
>> reflect the latest changes, so I think they need a little attention.
>
> I'll see to updating the docs, and will post a final patch within the next
> few days.
>
> Dean, have you by chance looked at the other patches yet?
>

No, sorry. I too have been swamped by other work. I will try to look
at them over the next few days. I don't anticipate that they will be
as complex as the base patch, so I hope that this can be finished in
time for 9.4.

If there are any other reviewers with spare cycles, feel free to jump in too.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Mar28, 2014, at 08:58 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 27 March 2014 21:01, Florian Pflug <fgp@phlo.org> wrote:
>> After re-reading the thread, I'd prefer to go with Dean's suggestion, i.e.
>> simply reporting the total number of invocations of the forward transition
>> functions, and the total number of invocations of the reverse transition
>> function, over reporting nothing. The labels of the two counts would simply
>> be "Forward Transitions" and "Reverse Transitions".
>>
>
> That should be "Inverse" not "Reverse" according to the terminology
> agreed upthread.

Oops, yeah.

> Personally, I'm not a big fan of that terminology because "forward"
> and "inverse" aren't natural antonyms. But actually I think that it's
> "forward" that is the wrong word to use, because they actually both
> move (different ends of) the frame forwards. The only alternatives I
> can think of are "direct" and "inverse", which are natural antonyms,
> but I don't want to hold up this patch bikeshedding over this. OTOH
> this is not the first time on this thread that someone has slipped
> into calling them "forward" and "reverse" transitions.

Head and tail come to mind. But having said that, I'd strongly prefer
not to start bikeshedding over this either. It's not user-visible
enough to fuss about it...

Also, for order-independent transition functions, it really *is* an
inverse function, in the sense that g(f(x,a),a) = x.
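
(For SUM, say, f(x,a) = x + a and g(y,a) = y - a, so indeed
g(f(x,a),a) = (x + a) - a = x.)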

>>> Florian mentioned upthread that the docs hadn't been updated to
>>> reflect the latest changes, so I think they need a little attention.
>>
>> I'll see to updating the docs, and will post a final patch within the next
>> few days.

Attached are updated patches that include the EXPLAIN changes mentioned
above and updated docs.

best regards,
Florian Pflug


Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote:
> Attached are updated patches that include the EXPLAIN changes mentioned
> above and updated docs.
>

These patches need re-basing --- they no longer apply to HEAD.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr1, 2014, at 10:08 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote:
>> Attached are updated patches that include the EXPLAIN changes mentioned
>> above and updated docs.
>>
>
> These patches need re-basing --- they no longer apply to HEAD.

Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to
re-assign OIDs in the dependent patches again (those which actually
add the various inverse transition functions).

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 1 April 2014 20:58, Florian Pflug <fgp@phlo.org> wrote:
> On Apr1, 2014, at 10:08 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 31 March 2014 01:58, Florian Pflug <fgp@phlo.org> wrote:
>>> Attached are updated patches that include the EXPLAIN changes mentioned
>>> above and updated docs.
>>>
>>
>> These patches need re-basing --- they no longer apply to HEAD.
>
> Rebased to head (554bb3beba27bf4a49edecc40f6c0f249974bc7c). I had to
> re-assign OIDs in the dependent patches again (those which actually
> add the various inverse transition functions).
>

Looking at the new explain output, that is not exactly what I was
suggesting upthread. In particular, in cases where the WindowAgg node
is executed more than once, I think that the reported transition
counts should be the averages per-execution of the node. That way the
number of transition function calls reported for the node is directly
comparable with its "rows" value. Thus I think the output should be
divided by nloops, which would be more consistent with the way other
similar values are reported in explain output (c.f.
show_instrumentation_count).

I started looking at the "arith" patch, and I got as far as looking at
the changes to count(*) and count(val), which seem reasonable. But
then I started testing performance, and I found cases where the
improvement is not nearly what I expected.

The example cited at the start of this thread is indeed orders of
magnitude faster than HEAD:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,20000) g(n);

This executes in 20ms on my box, vs 30sec on HEAD, which reflects the
change from an O(n^2) to an O(n) algorithm.

However, if both ends of the frame move, the benefits are far less impressive:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 33ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 173ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 1467ms

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10000 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 7709ms

This is still exhibiting the behaviour of an O(n^2) algorithm.

The problem lies in window_gettupleslot() which steps one row at a
time from the last position to the new required position. So if both
ends of the frame are moving, it needs to step forwards and backwards
through the entire window, for each row processed - hence the O(n^2)
behaviour.

Looking at window_gettupleslot, there is an obvious potential
micro-optimisation that can be made if there are multiple rows to be
skipped --- instead of calling tuplestore_gettupleslot() multiple
times, call tuplestore_advance() multiple times, then call
tuplestore_gettupleslot() once to fetch the final required tuple, thus
avoiding a lot of unnecessary tuple copying. That of course won't
eliminate the O(n^2) behaviour, but it will reduce the overall factor,
and so is probably worth doing.

However, to fix the fundamental O(n^2) problem, I think there needs to
be separate tuplestore read pointers for the head and tail of the
frame. There may also be a case for another read pointer for the
current row too, and possibly one for general purpose user-triggered
fetches. One approach might be to have up to a small number N (say 3
or 4) of read pointers, with window_gettupleslot() automatically
choosing the one nearest to the requested row. Possibly there are
better approaches. I think a little more investigation is needed.

I'm not sure how much additional work is required to sort this out,
but to me it looks more realistic to target 9.5 than 9.4, so at this
point I tend to think that the patch ought to be marked as returned
with feedback.

Thoughts?

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
> On 04.04.2014, at 09:40, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> I'm not sure how much additional work is required to sort this out,
> but to me it looks more realistic to target 9.5 than 9.4, so at this
> point I tend to think that the patch ought to be marked as returned
> with feedback.

I think the patch is worthwhile, even without this additional optimization. In fact, if the optimization were part of
the patch, there would probably be calls to factor it out, on the grounds that the patch is already rather large.

I don't see what bumping the whole thing to 9.5 buys, compared to applying what we have now, and optimizing further in
9.5.

best regards,
Florian Pflug

PS: Sorry for the broken mail I sent earlier - mis-touched on my phone ;-(


Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Andres Freund
Date:
On 2014-04-04 12:56:55 +0200, Florian Pflug wrote:
>
> > On 04.04.2014, at 09:40, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > I'm not sure how much additional work is required to sort this out,
> > but to me it looks more realistic to target 9.5 than 9.4, so at this
> > point I tend to think that the patch ought to be marked as returned
> > with feedback.
>
> I think the patch is worthwhile, even without this additional
> optimization. In fact, If the optimization was part of the patch,
> there would probably be calls to factor it out, on the ground that the
> patch is already rather large.
>
> I don't see what bumping the whole thing to 9.5 buys, compared do
> applying what we have now, and optimizing in 9.5 further.

From my POV applying this patch can't be considered a very high priority
for 9.4. It came *really* late to the game for a relatively complex
patch. A significant portion of the development only happened *after*
the start of the last commitfest.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 4 April 2014 11:56, Florian Pflug <fgp@phlo.org> wrote:
>
>> On 04.04.2014, at 09:40, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>
>> I'm not sure how much additional work is required to sort this out,
>> but to me it looks more realistic to target 9.5 than 9.4, so at this
>> point I tend to think that the patch ought to be marked as returned
>> with feedback.
>

Just doing the first optimisation I recommended (which I
pessimistically referred to as a "micro-optimisation") actually gives
up to a 4x performance boost relative to the current patch for the
queries above:

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 20ms (was 33ms)

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 100 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 54ms (was 173ms)

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 1000 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 368ms (was 1467ms)

SELECT SUM(n::int) OVER (ROWS BETWEEN CURRENT ROW AND 10000 FOLLOWING)
FROM generate_series(1,20000) g(n);
-- 1866ms (was 7709ms)

See the attached patch, which applies on top of yours but is actually
independent of it.

IMO, this doesn't go far enough though. We should be aiming to
eliminate the O(n^2) behaviour completely and have all the above
queries take roughly the same amount of time.


> I think the patch is worthwhile, even without this additional optimization. In fact, if the optimization were part of
> the patch, there would probably be calls to factor it out, on the grounds that the patch is already rather large.
>
> I don't see what bumping the whole thing to 9.5 buys, compared to applying what we have now, and optimizing further in
> 9.5.
>

The problem with the current state of the patch is that we're going to
a lot of effort to add this new inverse aggregate function,
documenting its benefits and revealing via EXPLAIN how it can reduce
by several orders of magnitude the number of transition function
calls, but then not giving a commensurate performance boost unless the
window is UNBOUNDED FOLLOWING. That's going to be awkward to explain
to users, and so releasing it in this state feels a little half-baked
to me.

Regards,
Dean

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 5 April 2014 08:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> [snip] releasing it in this state feels a little half-baked
> to me.
>

I regret writing that almost as soon as I sent it. The last of those
queries is now over 10 times faster than HEAD, which is certainly
worthwhile. What bugs me is that there is so much more potential in
this patch.

I think it's pretty close to being committable, but I fear that time
is running out for 9.4.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr5, 2014, at 09:55 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 5 April 2014 08:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> [snip] releasing it in this state feels a little half-baked
>> to me.
>> 
> 
> I regret writing that almost as soon as I sent it. The last of those
> queries is now over 10 times faster than HEAD, which is certainly
> worthwhile. What bugs me is that there is so much more potential in
> this patch.

Well, the perfect is the enemy of the good as they say. By all means,
let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic
version into 9.4.

> I think it's pretty close to being committable, but I fear that time
> is running out for 9.4.

I'm not aware of any open issues in the basic patch, other than
scaling the reported calling statistics by nloops, which should be
trivial. I'll post an updated patch later today.

I don't really expect all the add-on patches to make it into 9.4  -
they don't seem to have gotten much attention yet - but at least 
the inverse transition functions for the basic arithmetic aggregates
should be doable I hope.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 7 April 2014 14:09, Florian Pflug <fgp@phlo.org> wrote:
> On Apr5, 2014, at 09:55 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 5 April 2014 08:38, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> [snip] releasing it in this state feels a little half-baked
>>> to me.
>>>
>>
>> I regret writing that almost as soon as I sent it. The last of those
>> queries is now over 10 times faster than HEAD, which is certainly
>> worthwhile. What bugs me is that there is so much more potential in
>> this patch.
>
> Well, the perfect is the enemy of the good as they say. By all means,
> let's get rid of the O(n^2) behaviour in 9.5, but let's get a basic
> version into 9.4.
>
>> I think it's pretty close to being committable, but I fear that time
>> is running out for 9.4.
>
> I'm not aware of any open issues in the basic patch, other than
> scaling the reported calling statistics by nloops, which should be
> trivial. I'll post an updated patch later today.
>
> I don't really expect all the add-on patches to make it into 9.4  -
> they don't seem to have gotten much attention yet - but at least
> the inverse transition functions for the basic arithmetic aggregates
> should be doable I hope.
>

I've just finished reading through all the other patches, and they all
look OK to me. It's mostly straightforward stuff, so despite the size
it's hopefully all committable once the base patch goes in.

I think that you're probably right that optimising
window_gettupleslot() to eliminate the O(n^2) behaviour can be left to
a later patch --- the existing performance benefits of this patch are
enough to justify its inclusion IMO. It would be nice to include the
trivial optimisation to window_gettupleslot() that I posted upthread,
since it gives such a big improvement for only a few lines of code
changed.

If you post a new patch set, I'll mark it as ready for committer attention.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr7, 2014, at 17:41 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I've just finished reading through all the other patches, and they all
> look OK to me. It's mostly straightforward stuff, so despite the size
> it's hopefully all committable once the base patch goes in.

Hm, I'm starting to have second thoughts about the minmax patch. The
inverse transition functions for MIN and MAX have a non-trivial probability
of failure - they trigger a rescan whenever the value that is removed isn't
strictly smaller (respectively strictly larger) than the current maximum
(respectively minimum). Thus, whenever that happens, we both call the
inverse transition function *and* (since it fails) restart the aggregation.

For windows based on ROWS, this isn't too bad - even if we fail every second
time, we still avoid half the rescans, which should be a net win if the
average window size is > 2.

But for RANGE-based windows, more than one call of the inverse transition
function must succeed for us to save anything, since we must successfully
remove *all* peers to avoid one restart. This greatly increases the chance
that using the inverse transition function hurts rather than helps - the
situation is worse the larger the average number of peers is.
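To illustrate with a contrived sketch (the exact numbers don't matter):
in the query below each ORDER BY value has roughly 100 peers, and the
first row removed from the frame head is always the current minimum, so
the inverse transition for MIN fails immediately every time and the full
rescan happens anyway - the failed attempt is pure overhead.

SELECT MIN(n) OVER (ORDER BY n / 100
                    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1, 20000) g(n);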

I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch
invtrans_bool - it previously was part of invtrans_minmax. Given the
performance risk involved, I think that we probably shouldn't commit
invtrans_minmax at this time. I should have brought this up earlier, but
the issue had slipped my mind :-( Sorry for that.

> I think that you're probably right that optimising
> window_gettupleslot() to eliminate the O(n^2) behaviour can be left to
> a later patch --- the existing performance benefits of this patch are
> enough to justify its inclusion IMO. It would be nice to include the
> trivial optimisation to window_gettupleslot() that I posted upthread,
> since it gives such a big improvement for only a few lines of code
> changed.

Agreed, but since it's independent from the rest of the base patch,
I kept it as a separate patch (invtrans_optimize) instead of merging it
into the base patch. It should probably be committed separately too.

> If you post a new patch set, I'll mark it as ready for committer attention.

New patch set is attached. The only difference to the previous one is that
"Forward Transitions" and "Inverse Transitions" are now scaled with nloops,
and that it includes your window_gettupleslot patch under the name
invtrans_optimize.

Your nested loop query
explain (verbose, analyse)
select * from
(values (10), (20), (30), (40)) v(x),
lateral
(select sum(i) over (rows between 4 preceding and current row)
from generate_series(1, x) i) t
now produces
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..170.06 rows=4000 width=12) (actual time=0.092..0.257 rows=100 loops=1)
   Output: "*VALUES*".column1, (sum(i.i) OVER (?))
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.05 rows=4 width=4) (actual time=0.003..0.007 rows=4 loops=1)
         Output: "*VALUES*".column1
   ->  WindowAgg  (cost=0.00..22.50 rows=1000 width=4) (actual time=0.027..0.055 rows=25 loops=4)
         Output: sum(i.i) OVER (?)
         Forward Transitions: 25.0
         Inverse Transitions: 20.0
         ->  Function Scan on pg_catalog.generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.013..0.015 rows=25 loops=4)
               Output: i.i
               Function Call: generate_series(1, "*VALUES*".column1)
 Planning time: 0.359 ms
 Total runtime: 0.544 ms

The patch dependencies are as follows:

invtrans_{optimize,docs} are independent of the rest

invtrans_{arith,bool,minmax,collecting} are pairwise independent and all
depend on invtrans_base.

As explained above, invtrans_bool is a bit problematic, since it carries
a real risk of performance regressions. It's included for completeness'
sake, and should probably not be committed at this time.

invtrans_optimize was authored by Dean Rasheed.

best regards,
Florian Pflug

Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:



On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug <fgp@phlo.org> wrote:

> As explained above, invtrans_bool is a bit problematic, since it carries
> a real risk of performance regressions. It's included for completeness'
> sake, and should probably not be committed at this time.


Did you mean to write invtrans_minmax? Otherwise you didn't explain your concerns with bool.

Regards

David Rowley

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr9, 2014, at 02:55 , David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, Apr 9, 2014 at 8:48 AM, Florian Pflug <fgp@phlo.org> wrote:
> 
> As explained above, invtrans_bool is a bit problematic, since it carries
> a real risk of performance regressions. It's included for completeness'
> sake, and should probably not be committed at this time.
> 
> Did you mean to write invtrans_minmax? Otherwise you didn't explain about
> you concerns with bool.

Grmpf. Should have re-read that once more before sending :-(

Yes, I meant invtrans_minmax is problematic! invtrans_bool is fine, the
inverse transition function never fails for BOOL_AND and BOOL_OR. This
is why I factored it out into a separate patch, to make it easy to not
apply the MIN/MAX stuff, while still applying the BOOL stuff. Sorry for
the confusion.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 8 April 2014 21:48, Florian Pflug <fgp@phlo.org> wrote:
> On Apr7, 2014, at 17:41 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> I've just finished reading through all the other patches, and they all
>> look OK to me. It's mostly straightforward stuff, so despite the size
>> it's hopefully all committable once the base patch goes in.
>
> Hm, I'm starting to have second thoughts about the minmax patch. The
> inverse transition functions for MIN and MAX have a non-trivial probability
> of failure - they trigger a rescan whenever the value that is removed isn't
> strictly smaller (respectively strictly larger) than the current maximum
> (respectively minimum). Thus, whenever that happens, we both call the
> inverse transition function *and* (since it fails) restart the aggregation.
>
> For windows based on ROWS, this isn't too bad - even if we fail every second
> time, we still avoid half the rescans, which should be a net win if the
> average window size is > 2.
>
> But for RANGE-based windows, more than one call of the inverse transition
> function must succeed for us to save anything, since we must successfully
> remove *all* peers to avoid one restart. This greatly increases the chance
> that using the inverse transition function hurts rather than helps - the
> situation is worse the larger the average number of peers is.
>

Argh, I hadn't really considered that case. I suppose any imperfect
inverse transition function has the potential to make performance
worse rather than better. But working out the likelihood of that isn't
necessarily straightforward.

It might be possible to include some sort of heuristic based on the
known information --- the number of rows P in the peer group about to
be removed vs the total number N of rows aggregated so far. If the
data were fairly random, then a quick back-of-the-envelope calculation
suggests that trying the inverse min/max functions would be worthwhile
on average if P were less than around 0.4N, but of course different
data distributions could make that much worse. Even a perfect inverse
transition function isn't going to be much use if P > N/2, since
removing P rows takes P inverse calls while a restart costs only the
roughly N - P forward calls needed to re-aggregate the rows that
remain (e.g., imagine a case where the peer groups decrease in size
exponentially), so perhaps we should be including such a check anyway.

That's also assuming that the cost of the inverse transition function
is about the same as the cost of the forward function, which is not
necessarily the case. Perhaps imperfect inverse transition functions
should be assigned a higher cost, and that should be factored into the
decision as to whether they are likely to be worth trying. All that
feels very speculative though, and I think it's too late to be
considering that for 9.4, so yes, let's leave out the min/max
aggregates for now.


> I've factored the BOOL_AND,BOOL_OR stuff out into a separate patch
> invtrans_bool - it previously was part of invtrans_minmax. Given the
> performance risk involved, I think that we probably shouldn't commit
> invtrans_minmax at this time. I should have brought this up earlier, but
> the issue had slipped my mind :-( Sorry for that.
>
>> I think that you're probably right that optimising
>> window_gettupleslot() to eliminate the O(n^2) behaviour can be left to
>> a later patch --- the existing performance benefits of this patch are
>> enough to justify its inclusion IMO. It would be nice to include the
>> trivial optimisation to window_gettupleslot() that I posted upthread,
>> since it gives such a big improvement for only a few lines of code
>> changed.
>
> Agreed, but since it's independent from the rest of the base patch,
> I kept it as a separate patch (invtrans_optimize) instead of merging it
> into the base patch. It should probably be committed separately too.
>

It would be good to commit at least the "base", "arith" and "optimize"
patches for 9.4. I think the "collecting" and "bool" patches are also
committable, but I also suspect that those aggregates are less well
used, so they could be considered lower priority.


>> If you post a new patch set, I'll mark it as ready for committer attention.
>
> New patch set is attached. The only difference to the previous one is that
> "Forward Transitions" and "Inverse Transitions" are now scaled with nloops,
> and that it includes your window_gettupleslot patch under the name
> invtrans_optimize.
>

OK, I'm marking this ready for committer attention, on the
understanding that that doesn't include the invtrans_minmax patch.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> OK, I'm marking this ready for committer attention, on the
> understanding that that doesn't include the invtrans_minmax patch.

I've started to look at this patch set.  After rereading the thread,
I'm thinking that it's a mistake to just add the inverse transition
function and require it to work with the standard forward transition
function for the aggregate.  There was discussion upthread of providing
two separate forward transition functions, but Florian argued that that
would do nothing that you couldn't accomplish with a runtime check in
the forward function.  I think that's nonsense though, because one of
the key points here is that an invertible aggregate may require a more
complex transition state data structure --- in particular, if you're
forced to go from a pass-by-value to a pass-by-reference data type, right
there you are going to take a big hit in aggregate performance, and there
is no way for the forward transition function to avoid it.  The patch
has in fact already done that to a couple of basic aggregates like
sum(int4).  Has anyone bothered to test what side-effects that has on
non-windowed aggregation performance?

I think what'd make sense is to have a separate forward function *and*
separate state datatype to be used when we want invertible aggregation
(hm, and I guess a separate final function too).  So an aggregate
definition would include two entirely independent implementations.
If they chance to share sfunc and stype, fine, but they don't have to.

This would mean we'd need some new names for the doppelgangers of the
CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond
(but not sortop).  I guess that'd open up a chance to use a more uniform
naming scheme, but I'm not too sure what would be good.

Comments?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
I wrote:
> ... an invertible aggregate may require a more
> complex transition state data structure --- in particular, if you're
> forced to go from a pass-by-value to a pass-by-reference data type, right
> there you are going to take a big hit in aggregate performance, and there
> is no way for the forward transition function to avoid it.  The patch
> has in fact already done that to a couple of basic aggregates like
> sum(int4).  Has anyone bothered to test what side-effects that has on
> non-windowed aggregation performance?

As a quick check, I compared aggregation performance in HEAD, non-assert
builds, with and without --disable-float8-byval on a 64-bit machine.
So this tests replacing a pass-by-val transition datatype with a
pass-by-ref one without any other changes.  There's essentially no
difference in performance of sum(int4), AFAICT, but that's because
int4_sum goes out of its way to cheat and avoid palloc overhead.
I looked at the bit_and() aggregates to see what would happen to
an aggregate not thus optimized.  As expected, int4 and int8 bit_and
are just about the same speed if int8 is pass by value ... but if it's
pass by ref, the int8 case is a good 60% slower.

So added palloc overhead, at least, is a no-go.  I see that the patched
version of sum(int4) avoids that trap, but nonetheless it's replaced a
pretty cheap transition function with a less cheap function, namely the
function previously used for avg(int4).  A quick test says that avg(int4)
is about five percent slower than sum(int4), so that's the kind of hit
we'd be taking on non-windowed aggregations if we do it like this.
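
For reference, the comparison can be reproduced with something along
these lines (illustrative only - absolute timings are machine-dependent):

EXPLAIN ANALYZE SELECT sum(n) FROM generate_series(1, 10000000) g(n);
EXPLAIN ANALYZE SELECT avg(n) FROM generate_series(1, 10000000) g(n);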
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As a quick check, I compared aggregation performance in HEAD, non-assert
> builds, with and without --disable-float8-byval on a 64-bit machine.
> So this tests replacing a pass-by-val transition datatype with a
> pass-by-ref one without any other changes.  There's essentially no
> difference in performance of sum(int4), AFAICT, but that's because
> int4_sum goes out of its way to cheat and avoid palloc overhead.

> I looked at the bit_and() aggregates to see what would happen to
> an aggregate not thus optimized.  As expected, int4 and int8 bit_and
> are just about the same speed if int8 is pass by value ... but if it's
> pass by ref, the int8 case is a good 60% slower.

True, but that just means that aggregate transition functions really
ought to update the state in-place, no?

> So added palloc overhead, at least, is a no-go.  I see that the patched
> version of sum(int4) avoids that trap, but nonetheless it's replaced a
> pretty cheap transition function with a less cheap function, namely the
> function previously used for avg(int4).  A quick test says that avg(int4)
> is about five percent slower than sum(int4), so that's the kind of hit
> we'd be taking on non-windowed aggregations if we do it like this.

That's rather surprising though. AFAICS, there isn't much difference
between the two transfer functions int4_sum and int4_avg_accum at all.

The differences come down to (+ denoting things which ought to make
int4_avg_accum slower compared to int4_sum, - denoting the opposite)
1. +) int4_avg_accum calls AggCheckCallContext
2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
      to verify that the state is a 2-element array without NULL entries
4. -) int4_sum checks if the input is NULL
 

The number of conditional branches should be about the same (and all are
seldom taken). The validity checks on the state array, i.e. (3), should
be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but
not so much more expensive either. That leaves the AggCheckCallContext call.
If that call costs us 5%, maybe we can find a way to make it faster, or
get rid of it entirely? Still seems a lot for a call to a not-very-complex
function, though...

I'll go and check the disassembly - maybe something in int4_avg_accum turns
out to be more complex than is immediately obvious. I'll also try to create
a call profile, unless you already have one from your test runs.

best regards,
Florian Pflug





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr9, 2014, at 20:20 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There was discussion upthread of providing
> two separate forward transition functions, but Florian argued that that
> would do nothing that you couldn't accomplish with a runtime check in
> the forward function.  I think that's nonsense though, because one of
> the key points here is that an invertible aggregate may require a more
> complex transition state data structure --- in particular, if you're
> forced to go from a pass-by-value to a pass-by-reference data type, right
> there you are going to take a big hit in aggregate performance, and there
> is no way for the forward transition function to avoid it.

To be precise, my point was that *only* having a separate non-invertible
forward transition function is pointless, exactly because of the reason
you gave - that won't allow a cheaper state representation for the
non-invertible case. So all such a non-invertible forward transition
function can do is to skip some bookkeeping that's required by the
inverse transition function - and *that* can just as easily be accomplished
by a runtime check.

I was (and still am) not in favour of duplicating the whole quadruple of
(state, initialvalue, transferfunction, finalfunction) because it seems
excessive. In fact, I believed that doing this would probably be grounds for
outright rejection of the patch, on the basis of catalog bloat. And your
initial response to this suggestion seemed to confirm this.

> The patch has in fact already done that to a couple of basic aggregates like
> sum(int4).  Has anyone bothered to test what side-effects that has on
> non-windowed aggregation performance?

I'm pretty sure David Rowley did some benchmarking. The results should be
in this thread somewhere I think, but they currently evade me... Maybe David
can re-post, if he's following this...

> I think what'd make sense is to have a separate forward function *and*
> separate state datatype to be used when we want invertible aggregation
> (hm, and I guess a separate final function too).  So an aggregate
> definition would include two entirely independent implementations.
> If they chance to share sfunc and stype, fine, but they don't have to.
>
> This would mean we'd need some new names for the doppelgangers of the
> CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond
> (but not sortop).  I guess that'd open up a chance to use a more uniform
> naming scheme, but I'm not too sure what would be good.

If we really go down that road (and I'm far from convinced), then maybe
instead of having a bunch of additional fields, we could have separate
entries in pg_aggregate for the two cases, with links between them.

The non-invertible aggregates would have something like "_noninv" or so
appended to their name, and we'd automatically use them if we know we
won't need to remove entries from the aggregation state. That would also
allow the users to *force* the non-invertible aggregate to be used by
simply saying "SUM_NONINV" instead of "SUM".
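
For example (hypothetical - no SUM_NONINV aggregate exists, this merely
sketches the naming scheme):

SELECT SUM_NONINV(n) OVER (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM generate_series(1, 10) g(n);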

Then all we'd need would be an additional OID field that links the
invertible to the non-invertible definition.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> I was (and still am) not in favour of duplicating the whole quadruple of
> (state, initialvalue, transferfunction, finalfunction) because it seems
> excessive. In fact, I believed that doing this would probably be grounds for
> outright rejection of the patch, on the basis of catalog bloat. And your
> initial response to this suggestion seemed to confirm this.

Well, I think it's much more likely that causing a performance penalty for
cases unrelated to window aggregates would lead to outright rejection :-(.
The majority of our users probably don't ever use window functions, but
for sure they've heard of SUM().  We can't penalize the non-window case.

Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this
suggestion) is a little annoying but it doesn't sound like a show stopper.
It seems reasonable to assume that the extra initval would be NULL in most
cases, so it's probably a net addition of 12 bytes per row.

> On Apr9, 2014, at 20:20 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The patch has in fact already done that to a couple of basic aggregates like
>> sum(int4).  Has anyone bothered to test what side-effects that has on
>> non-windowed aggregation performance?

> I'm pretty sure David Rowley did some benchmarking. The results should be
> in this thread somewhere I think, but they currently evade me... Maybe David
> can re-post, if he's following this...

I saw benchmarks addressing window aggregation, but none looking for
side-effects on plain aggregation.

> If we really go down that road (and I'm far from convinced), then maybe
> instead of having a bunch of additional fields, we could have separate
> entries in pg_aggregate for the two cases, with links between them.

That seems like a complete mess; in particular it would break the primary
key for pg_aggregate (aggfnoid), and probably break every existing query
that looks at pg_aggregate.  Some extra fields would not break such
expectations (in fact we've added fields to pg_aggregate in the past).
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr9, 2014, at 23:17 , Florian Pflug <fgp@phlo.org> wrote:

> On Apr9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A quick test says that avg(int4)
>> is about five percent slower than sum(int4), so that's the kind of hit
>> we'd be taking on non-windowed aggregations if we do it like this.
> 
> That's rather surprising though. AFAICS, there's isn't much difference
> between the two transfer functions int4_sum and int4_avg_accum at all.
> 
> The differences come down to (+ denoting things which ought to make
> int4_avg_accum slower compared to int4_sum, - denoting the opposite)
> 
> 1. +) int4_avg_accum calls AggCheckCallContext
> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
>       to verify that the state is a 2-element array without NULL entries
> 4. -) int4_sum checks if the input is NULL

I've done a bit of profiling on this (using Instruments.app on OSX). One
thing I missed is that int4_avg_accum also calls pg_detoast_datum - that
call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
since we know that the datum cannot possibly be toasted I think (or if it
was, nodeAgg.c should do the de-toasting).

The profile also attributes a rather large percent of the total runtime of
int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
rid of that would help quite a few transition functions, invertible or not.
That certainly seems doable - we'd need a way to mark functions as internal
support functions, and prevent user-initiated calls of such functions. 
Transition functions marked that way could then safely scribble over their
state arguments without having to consult AggCheckCallContext() first.

I've also compared the disassemblies of int4_sum and int4_avg_accum, and
apart from these differences, they are pretty similar.

Also, the *only* reason that SUM(int2|int4) cannot use int8 as its
transition type is that it needs to return NULL, not 0, if zero rows
were aggregated. It might seem that it could just use int8 as state
with initial value NULL, but that only works if the transition functions
are non-strict (since the special case of state type == input type doesn't
apply here). And non-strict transition functions need to deal with
NULL inputs themselves, which means counting the number of non-NULL inputs...
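
For example:

SELECT sum(n) FROM generate_series(1, 0) g(n);  -- zero rows: result is NULL
SELECT sum(n) FROM generate_series(1, 3) g(n);  -- result is 6

so a plain int8 state initialised to 0 would give the wrong answer for
the zero-row case.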

That problem would go away though if we had support for an initial function,
which would receive the first input value and initialize the state. In
the case of SUM(), the initial function would be strict, and thus would be
called on the first non-NULL input value, which it'd convert to int8 and
return as the new state. 

However, I still believe the best approach at this point is to just work
on making int4_avg_accum faster. I still see no principled reason why it
has to be noticeably slower - the only additional work it absolutely *has*
to perform is *one* 64-bit increment.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 9 April 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> I was (and still am) not in favour of duplicating the whole quadruple of
>> (state, initialvalue, transferfunction, finalfunction) because it seems
>> excessive. In fact, I believed that doing this would probably be grounds for
>> outright rejection of the patch, on the basis of catalog bloat. And your
>> initial response to this suggestion seemed to confirm this.
>
> Well, I think it's much more likely that causing a performance penalty for
> cases unrelated to window aggregates would lead to outright rejection :-(.
> The majority of our users probably don't ever use window functions, but
> for sure they've heard of SUM().  We can't penalize the non-window case.
>
> Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this
> suggestion) is a little annoying but it doesn't sound like a show stopper.
> It seems reasonable to assume that the extra initval would be NULL in most
> cases, so it's probably a net addition of 12 bytes per row.
>
>> On Apr9, 2014, at 20:20 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The patch has in fact already done that to a couple of basic aggregates like
>>> sum(int4).  Has anyone bothered to test what side-effects that has on
>>> non-windowed aggregation performance?
>
>> I'm pretty sure David Rowley did some benchmarking. The results should be
>> in this thread somewhere I think, but they currently evade me... Maybe David
>> can re-post, if he's following this...
>
> I saw benchmarks addressing window aggregation, but none looking for
> side-effects on plain aggregation.
>
>> If we really go down that road (and I'm far from convinced), then maybe
>> instead of having a bunch of additional fields, we could have separate
>> entries in pg_aggregate for the two cases, with links between them.
>
> That seems like a complete mess; in particular it would break the primary
> key for pg_aggregate (aggfnoid), and probably break every existing query
> that looks at pg_aggregate.  Some extra fields would not break such
> expectations (in fact we've added fields to pg_aggregate in the past).
>

This may initially sound unrelated, but I think it might address some
of these issues. Suppose we added a 'firsttrans' function that took a
single argument (the first value to be aggregated) and was responsible
for creating the initial state from that first value.

This would apply to aggregates that ignore null values, but whose
transition function cannot currently be declared strict (either
because the state type is internal, or because it is not the same as
the aggregate's argument type).

I think quite a lot of the existing aggregates fall into this
category, and this would allow their transition functions to be made
strict and simplified --- no more testing if the state is null, and
then building it, and no more testing if the argument is null and
ignoring it. That might give a noticeable performance boost in the
regular aggregate case, especially over data containing nulls.

But in addition, it would help with writing inverse transition
functions because if the transition functions could be made strict,
they wouldn't need to do null-counting, which would mean that their
state types would not need to be expanded.

So for example sum(int4) could continue to have int8 as its state
type, it could use int8(int4) as its firsttrans function, and
int4_sum() could become strict and lose all its null-handling logic.
Then int4_sum_inv() would be trivial to write - just doing the
same in reverse.
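
In CREATE AGGREGATE terms the idea might look something like this (purely
a sketch - "firsttrans" is not an existing parameter, and int4_sum_strict
is a hypothetical strict variant of int4_sum with the null handling
removed):

CREATE AGGREGATE my_sum(int4) (
    stype      = int8,
    firsttrans = int8,            -- hypothetical: the int8(int4) cast builds
                                  -- the state from the first non-NULL input
    sfunc      = int4_sum_strict  -- hypothetical: strict, no null checks
);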

I'm not sure it helps for all aggregates, but there are certainly some
where it would seem to simplify things.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 10 April 2014 01:13, Florian Pflug <fgp@phlo.org> wrote:
> Also, the *only* reason that SUM(int2|int4) cannot use int8 as its
> transition type is that it needs to return NULL, not 0, if zero rows
> were aggregated. It might seem that it could just use int8 as state
> with initial value NULL, but that only works if the transition functions
> are non-strict (since the special case of state type == input type doesn't
> apply here). And non-strict transition functions need to deal with
> NULL inputs themselves, which means counting the number of non-NULL inputs...
>
> That problem would go away though if we had support for an initial function,
> which would receive the first input value and initialize the state. In
> the case of SUM(), the initial function would be strict, and thus would be
> called on the first non-NULL input value, which it'd convert to int8 and
> return as the new state.
>

Ah snap!

> However, I still believe the best approach at this point is to just work
> on making int4_avg_accum faster. I still see no principled reason why it
> has to be noticeably slower - the only additional work it absolutely *has*
> to perform is *one* 64-bit increment.
>

In the best case that would make sum() not noticeably slower than
avg(), whereas using a firsttrans/initialfunction would potentially
make both of them faster than they currently are, and not just in
window queries.

Also, IMO a first/initial function leads to a cleaner separation of
logic, allowing some of the transition functions to be simplified
rather than becoming more complex.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Apr 10, 2014 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> I'm pretty sure David Rowley did some benchmarking. The results should be
>> in this thread somewhere I think, but they currently evade me... Maybe David
>> can re-post, if he's following this...
>
> I saw benchmarks addressing window aggregation, but none looking for
> side-effects on plain aggregation.



I think it was only around SUM(numeric), and you'll need to scroll down a bit to find it as it's mixed in with a load of window agg tests. At the time it was the only forward transition function that I had added any overhead to, so I think it was the only one I bothered to test at the time.
However, I do remember benchmarking the bool_and and bool_or changes after I rewrote the way they worked, and found the same as you... no difference. I just can't find the post with my results... Though it sounds like Tom found the same as me, so I don't think it's worth looking for them any more.

Regards

David Rowley


 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
David Rowley
Date:
On Thu, Apr 10, 2014 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> I was (and still am) not in favour of duplicating the whole quadruple of
>> (state, initialvalue, transferfunction, finalfunction) because it seems
>> excessive. In fact, I believed that doing this would probably be grounds for
>> outright rejection of the patch, on the basis of catalog bloat. And your
>> initial response to this suggestion seemed to confirm this.
>
> Well, I think it's much more likely that causing a performance penalty for
> cases unrelated to window aggregates would lead to outright rejection :-(.
> The majority of our users probably don't ever use window functions, but
> for sure they've heard of SUM().  We can't penalize the non-window case.
>
> Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this
> suggestion) is a little annoying but it doesn't sound like a show stopper.
> It seems reasonable to assume that the extra initval would be NULL in most
> cases, so it's probably a net addition of 12 bytes per row.


I also wouldn't imagine that the overhead of storing that would be too great... And are there really any databases out there that have thousands of custom aggregate functions?

I'm actually quite glad to see someone agrees with me on this. I think it opens up quite a few extra optimisation opportunities with things like MAX and MIN... In these cases we could track the number of occurrences of the current max and reset that count when we see a bigger value. That way we could report the inverse transition as successful if maxcount is still above 0 after the removal of a max value... similar to how I implemented the inverse transition for sum(numeric). In fact, doing it this way would mean that inverse transitions for sum(numeric) would never fail and retry.

I just thought we had gotten to a stage of not requiring this due to the overheads being so low... I was quite surprised to see the count tracking account for 5% for sum(int). What I don't quite understand yet is why we can't just create a new function for int inverse transitions instead of borrowing the inverse transition functions for avg...?

Regards

David Rowley

 

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 01:13, Florian Pflug <fgp@phlo.org> wrote:
>> However, I still believe the best approach at this point is to just work
>> on making int4_avg_accum faster. I still see no principled reason why it
>> has to be noticeably slower - the only additional work it absolutely *has*
>> to perform is *one* 64-bit increment.

> In the best case that would make sum() not noticeably slower than
> avg(), whereas using a firsttrans/initialfunction would potentially
> make both of them faster than they currently are, and not just in
> window queries.

I'm still of the opinion that we should separate the transfn for
invertible cases from the normal one, and allow for two separate
state types.  One of the things that helps with is the strictness
consideration: you no longer have to have the same strictness
setting for the plain and invertible forward transfns.

This idea of a separate firsttrans function is interesting but perhaps
orthogonal to the current patch.  Also, I don't quite understand how
it would work for aggregates with null initvalues; don't you end up
with exactly the same conflict about how you can't mark the transfn
strict?  Or is the idea that firsttrans would *only* apply to aggregates
with null initvalue, and so you wouldn't even pass the previous state
value to it?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 10 April 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 10 April 2014 01:13, Florian Pflug <fgp@phlo.org> wrote:
>>> However, I still believe the best approach at this point is to just work
>>> on making int4_avg_accum faster. I still see no principled reason why it
>>> has to be noticeably slower - the only additional work it absolutely *has*
>>> to perform is *one* 64-bit increment.
>
>> In the best case that would make sum() not noticeably slower than
>> avg(), whereas using a firsttrans/initialfunction would potentially
>> make both of them faster than they currently are, and not just in
>> window queries.
>
> I'm still of the opinion that we should separate the transfn for
> invertible cases from the normal one, and allow for two separate
> state types.  One of the things that helps with is the strictness
> consideration: you no longer have to have the same strictness
> setting for the plain and invertible forward transfns.
>

Yes, I can imagine that there would be some aggregates for which the
plain forward transition function would be simpler than the
invertible one, with a simpler state type. You'd still be left with
quite a large number of existing aggregates having non-strict plain
transition functions, in addition to a bunch of new non-strict
invertible transition functions that had to do null counting.


> This idea of a separate firsttrans function is interesting but perhaps
> orthogonal to the current patch.  Also, I don't quite understand how
> it would work for aggregates with null initvalues; don't you end up
> with exactly the same conflict about how you can't mark the transfn
> strict?  Or is the idea that firsttrans would *only* apply to aggregates
> with null initvalue, and so you wouldn't even pass the previous state
> value to it?
>

I was imagining that firsttrans would only be passed the first value
to be aggregated, not any previous state, and that it would be illegal
to specify both an initcond and a firsttrans function.

The forward transition function would only be called for values after
the first, by which point the state would be non-null, and so it could
be made strict in most cases. The same would apply to the invertible
transition functions, so they wouldn't have to do null counting, which
in turn would make their state types simpler.
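
Concretely, relying only on the existing rule that strict transition
functions are skipped for NULL inputs, a query like

SELECT sum(x) FROM (VALUES (NULL::int4), (1), (NULL), (2)) v(x);  -- 3

would need no null handling inside the aggregate at all: a strict
firsttrans would fire on the first non-NULL value (1), and the strict
sfunc would then only ever see the remaining non-NULL values.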

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This idea of a separate firsttrans function is interesting but perhaps
>> orthogonal to the current patch.  Also, I don't quite understand how
>> it would work for aggregates with null initvalues; don't you end up
>> with exactly the same conflict about how you can't mark the transfn
>> strict?  Or is the idea that firsttrans would *only* apply to aggregates
>> with null initvalue, and so you wouldn't even pass the previous state
>> value to it?

> I was imagining that firsttrans would only be passed the first value
> to be aggregated, not any previous state, and that it would be illegal
> to specify both an initcond and a firsttrans function.

Got it.  So the existing behavior where we insert the first non-null
value could be seen as a degenerate case in which the firsttrans function
is an identity.

> The forward transition function would only be called for values after
> the first, by which point the state would be non-null, and so it could
> be made strict in most cases. The same would apply to the invertible
> transition functions, so they wouldn't have to do null counting, which
> in turn would make their state types simpler.

Makes sense to me.  We need names for these things though.  I don't
think abbreviating to "ffunc" is a good idea because of the likelihood
of confusion with the finalfunc (indeed I see the CREATE AGGREGATE
ref page is already using "ffunc" as a short form for finalfunc).
Maybe "initfunc", which would parallel "initcond"?

What about names for the invertible-aggregate infrastructure?
I'm tempted to prefix "inv" to all the existing names, but then
"invsfunc" means the alternate forward function ... can we use
"invifunc" for the inverse transition function?  Or maybe the
prefix should be just "i".
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 10 April 2014 19:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 10 April 2014 15:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This idea of a separate firsttrans function is interesting but perhaps
>>> orthogonal to the current patch.  Also, I don't quite understand how
>>> it would work for aggregates with null initvalues; don't you end up
>>> with exactly the same conflict about how you can't mark the transfn
>>> strict?  Or is the idea that firsttrans would *only* apply to aggregates
>>> with null initvalue, and so you wouldn't even pass the previous state
>>> value to it?
>
>> I was imagining that firsttrans would only be passed the first value
>> to be aggregated, not any previous state, and that it would be illegal
>> to specify both an initcond and a firsttrans function.
>
> Got it.  So the existing behavior where we insert the first non-null
> value could be seen as a degenerate case in which the firsttrans function
> is an identity.
>

Right.

>> The forward transition function would only be called for values after
>> the first, by which point the state would be non-null, and so it could
>> be made strict in most cases. The same would apply to the invertible
>> transition functions, so they wouldn't have to do null counting, which
>> in turn would make their state types simpler.
>
> Makes sense to me.  We need names for these things though.  I don't
> think abbreviating to "ffunc" is a good idea because of the likelihood
> of confusion with the finalfunc (indeed I see the CREATE AGGREGATE
> ref page is already using "ffunc" as a short form for finalfunc).
> Maybe "initfunc", which would parallel "initcond"?
>

Yes, I was already thinking that "initfunc" is actually a much better
name for that.

> What about names for the invertible-aggregate infrastructure?
> I'm tempted to prefix "inv" to all the existing names, but then
> "invsfunc" means the alternate forward function ... can we use
> "invifunc" for the inverse transition function?  Or maybe the
> prefix should be just "i".
>

Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to
denote a sliding window? Or just "m" for "moving aggregate".

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 19:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What about names for the invertible-aggregate infrastructure?
>> I'm tempted to prefix "inv" to all the existing names, but then
>> "invsfunc" means the alternate forward function ... can we use
>> "invifunc" for the inverse transition function?  Or maybe the
>> prefix should be just "i".

> Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to
> denote a sliding window? Or just "m" for "moving aggregate".

Hmm ... "moving aggregate" is actually a pretty good name for this
whole feature -- better than "invertible aggregate" anyway.  I can
feel a global-search-and-replace coming on.

So if we go with that terminology, perhaps these names for the
new CREATE AGGREGATE parameters:

initfunc    applies to plain aggregation, mutually exclusive with initcond
msfunc        (or just mfunc?) forward transition for moving-agg mode
mifunc        inverse transition for moving-agg mode
mstype        state datatype for moving-agg mode
msspace        space estimate for mstype
mfinalfunc    final function for moving-agg mode
minitfunc    "firsttrans" for moving-agg mode
minitcond    mutually exclusive with minitfunc

That takes us up to 16 columns in pg_aggregate, but it's still not going
to be a very voluminous catalog --- there are only 171 rows there today.
So I'm not particularly concerned about space, and if there's a chance
of squeezing out cycles, I think we should seize it.
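
For illustration, a moving-aggregate version of sum(int4) might then be
declared roughly like this (a sketch only -- int4_sum_inv is a
hypothetical inverse function, and the option names are just the
proposal above):

CREATE AGGREGATE my_sum (int4)
(
    sfunc     = int4_sum,      -- plain aggregation path
    stype     = int8,
    msfunc    = int4_sum,      -- forward transition, moving-agg mode
    mifunc    = int4_sum_inv,  -- hypothetical: subtracts the departing input
    mstype    = int8,
    minitcond = '0'
);
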
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 10 April 2014 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> On 10 April 2014 19:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> What about names for the invertible-aggregate infrastructure?
>>> I'm tempted to prefix "inv" to all the existing names, but then
>>> "invsfunc" means the alternate forward function ... can we use
>>> "invifunc" for the inverse transition function?  Or maybe the
>>> prefix should be just "i".
>
>> Hmm, I'm not a fan of any of those names. Perhaps "win" as a prefix to
>> denote a sliding window? Or just "m" for "moving aggregate".
>
> Hmm ... "moving aggregate" is actually a pretty good name for this
> whole feature -- better than "invertible aggregate" anyway.  I can
> feel a global-search-and-replace coming on.
>
> So if we go with that terminology, perhaps these names for the
> new CREATE AGGREGATE parameters:
>
> initfunc        applies to plain aggregation, mutually exclusive with initcond
> msfunc          (or just mfunc?) forward transition for moving-agg mode
> mifunc          inverse transition for moving-agg mode
> mstype          state datatype for moving-agg mode
> msspace         space estimate for mstype
> mfinalfunc      final function for moving-agg mode
> minitfunc       "firsttrans" for moving-agg mode
> minitcond       mutually exclusive with minitfunc
>

Yeah, those work for me.

I think I prefer "mfunc" to "msfunc", but perhaps that's just my
natural aversion to the "ms" prefix :-)

Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself
could mean "initial".

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr10, 2014, at 02:13 , Florian Pflug <fgp@phlo.org> wrote:
> On Apr9, 2014, at 23:17 , Florian Pflug <fgp@phlo.org> wrote:
>> On Apr9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> A quick test says that avg(int4)
>>> is about five percent slower than sum(int4), so that's the kind of hit
>>> we'd be taking on non-windowed aggregations if we do it like this.
>>
>> That's rather surprising though. AFAICS, there isn't much difference
>> between the two transfer functions int4_sum and int4_avg_accum at all.
>>
>> The differences come down to (+ denoting things which ought to make
>> int4_avg_accum slower compared to int4_sum, - denoting the opposite)
>>
>> 1. +) int4_avg_accum calls AggCheckCallContext
>> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
>> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
>>      to verify that the state is a 2-element array without NULL entries
>> 4. -) int4_sum checks if the input is NULL
>
> I've done a bit of profiling on this (using Instruments.app on OSX). One
> thing I missed is that int4_avg_accum also calls pg_detoast_datum - that
> call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
> since we know that the datum cannot possibly be toasted I think (or if it
> was, nodeAgg.c should do the de-toasting).
>
> The profile also attributes a rather large percent of the total runtime of
> int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
> rid of that would help quite a few transition functions, invertible or not.
> That certainly seems doable - we'd need a way to mark functions as internal
> support functions, and prevent user-initiated calls of such functions.
> Transition functions marked that way could then safely scribble over their
> state arguments without having to consult AggCheckCallContext() first.
>
> ...
>
> However, I still believe the best approach at this point is to just work
> on making int4_avg_accum faster. I still see no principled reason why it
> has to be noticeably slower - the only additional work it absolutely *has*
> to perform is *one* 64-bit increment.

I played with this a bit.

Currently, int4_avg_accum invokes AggCheckCallContext every time, and also
repeatedly checks whether the array has the required dimension - which,
incidentally, is the only big difference between int4_avg_accum and int4_sum.

To avoid that, I added a flags field to fcinfo which nodeAgg uses to tell
transition functions whether they're called for the first time, or are being
called with whatever state they themselves returned last time, i.e the
n-th time.

If the n-th time flag is set, int4_avg_accum simply retrieves the state with
PG_GETARG_DATUM() instead of PG_GETARG_ARRAYTYPE_P(), relying on the fact
that it never returns toasted datums itself, and also doesn't bother to validate
the array size, for the same reason.

If the flag is not set, it uses PG_GETARG_ARRAYTYPE_P_COPY and does validate
the array size. In theory, it could further distinguish between a 1st call
in an aggregation context (where the copy is unnecessary), and a user-initiated
call (i.e. outside an aggregation). But that seems unnecessary - one additional
copy per aggregation group seems unlikely to be a problem.
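
In code, the dispatch looks roughly like this (a sketch only: the flag
name AGG_STATE_IS_OWN_RESULT is made up here, and the patch may spell
the details differently):

Datum
int4_avg_accum(PG_FUNCTION_ARGS)
{
    ArrayType  *transarray;
    Int8TransTypeData *transdata;

    if (fcinfo->flags & AGG_STATE_IS_OWN_RESULT)
    {
        /* n-th call: the state is our own prior result, untoasted and valid */
        transarray = (ArrayType *) PG_GETARG_DATUM(0);
    }
    else
    {
        /* first or user-initiated call: detoast, copy and validate the state */
        transarray = PG_GETARG_ARRAYTYPE_P_COPY(0);
        if (ARR_HASNULL(transarray) ||
            ARR_SIZE(transarray) != ARR_OVERHEAD_NONULLS(1) +
            sizeof(Int8TransTypeData))
            elog(ERROR, "expected 2-element int8 array");
    }

    transdata = (Int8TransTypeData *) ARR_DATA_PTR(transarray);
    transdata->count++;
    transdata->sum += PG_GETARG_INT32(1);

    PG_RETURN_ARRAYTYPE_P(transarray);
}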

With this in place, instruction-level profiling using Apple's Instruments.app
shows that int4_avg_accum takes about 1.5% of the total runtime of a simple
aggregation, while int4_sum takes about 1.2%.

A (very rough) patch is attached.

I haven't been able to repeat Tom's measurement which shows a 5% difference in
performance between the invertible and the non-invertible versions of SUM(int4),
so I cannot say if this removes that. But the profiling I've done would certainly
indicate it should.

best regards,
Florian Pflug


Attachment

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> I was imagining that firsttrans would only be passed the first value
> to be aggregated, not any previous state, and that it would be illegal
> to specify both an initcond and a firsttrans function.

> The forward transition function would only be called for values after
> the first, by which point the state would be non-null, and so it could
> be made strict in most cases. The same would apply to the invertible
> transition functions, so they wouldn't have to do null counting, which
> in turn would make their state types simpler.

I put together a very quick proof-of-concept patch for this (attached).
It has a valid execution path for an aggregate with initfunc, but I didn't
bother writing the CREATE AGGREGATE support yet.  I made sum(int4) work
as you suggest, marking the transfn strict and ripping out int4_sum's
internal support for null inputs.  The result seems to be about a 4%
improvement in the overall aggregation speed, for a simple "SELECT
sum(int4col) FROM table" query.  So from a performance standpoint this
seems only marginally worth doing.  The real problem is not that 4% isn't
worth the trouble, it's that AFAICS the only built-in aggregates that
can benefit are sum(int2) and sum(int4).  So that looks like a rather
narrow use-case.

You had suggested upthread that we could use this idea to make the
transition functions strict for aggregates using "internal" transition
datatypes, but that does not work because the initfunc would violate
the safety rule that a function returning internal must take at least
one internal-type argument.  That puts a pretty strong damper on the
usefulness of the approach, given how many internal-transtype aggregates
we have (and the moving-aggregate case is not going to be any better, is it?)
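
(For illustration, the rule means a declaration of this shape -- the
function name here is hypothetical -- would be rejected outright:

CREATE FUNCTION numeric_avg_initfn (numeric) RETURNS internal ...

since it returns internal without taking any internal argument, which
is exactly what any initfn for an internal-transtype aggregate would
have to look like.)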

So at this point I'm feeling unexcited about the initfunc idea.
Unless it does something really good for the moving-aggregate case,
I think we should drop it.

            regards, tom lane

diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index fe6dc8a..1ca5c8f 100644
*** a/src/backend/catalog/pg_aggregate.c
--- b/src/backend/catalog/pg_aggregate.c
*************** AggregateCreate(const char *aggName,
*** 390,395 ****
--- 390,396 ----
      values[Anum_pg_aggregate_aggfnoid - 1] = ObjectIdGetDatum(procOid);
      values[Anum_pg_aggregate_aggkind - 1] = CharGetDatum(aggKind);
      values[Anum_pg_aggregate_aggnumdirectargs - 1] = Int16GetDatum(numDirectArgs);
+     values[Anum_pg_aggregate_agginitfn - 1] = ObjectIdGetDatum(InvalidOid);
      values[Anum_pg_aggregate_aggtransfn - 1] = ObjectIdGetDatum(transfn);
      values[Anum_pg_aggregate_aggfinalfn - 1] = ObjectIdGetDatum(finalfn);
      values[Anum_pg_aggregate_aggsortop - 1] = ObjectIdGetDatum(sortop);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 7e4bca5..2671c49 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** typedef struct AggStatePerAggData
*** 152,157 ****
--- 152,158 ----
      int            numTransInputs;

      /* Oids of transfer functions */
+     Oid            initfn_oid;        /* may be InvalidOid */
      Oid            transfn_oid;
      Oid            finalfn_oid;    /* may be InvalidOid */

*************** typedef struct AggStatePerAggData
*** 160,165 ****
--- 161,167 ----
       * corresponding oid is not InvalidOid.  Note in particular that fn_strict
       * flags are kept here.
       */
+     FmgrInfo    initfn;
      FmgrInfo    transfn;
      FmgrInfo    finalfn;

*************** typedef struct AggHashEntryData
*** 296,301 ****
--- 298,306 ----
  static void initialize_aggregates(AggState *aggstate,
                        AggStatePerAgg peragg,
                        AggStatePerGroup pergroup);
+ static void initialize_transition_value(AggState *aggstate,
+                             AggStatePerAgg peraggstate,
+                             AggStatePerGroup pergroupstate);
  static void advance_transition_function(AggState *aggstate,
                              AggStatePerAgg peraggstate,
                              AggStatePerGroup pergroupstate);
*************** initialize_aggregates(AggState *aggstate
*** 403,408 ****
--- 408,498 ----
  }

  /*
+  * Initialize the transition value upon reaching the first non-NULL input(s).
+  *
+  * We use this code when the initcond is NULL and the transfn is strict,
+  * which otherwise would mean the transition value can never become non-null.
+  * If an initfn has been provided, call it on the current input value(s);
+  * otherwise, take the current input value as the new transition value.
+  * (In the latter case, we already checked that this is okay datatype-wise.)
+  */
+ static void
+ initialize_transition_value(AggState *aggstate,
+                             AggStatePerAgg peraggstate,
+                             AggStatePerGroup pergroupstate)
+ {
+     FunctionCallInfo tfcinfo = &peraggstate->transfn_fcinfo;
+     MemoryContext oldContext;
+
+     if (OidIsValid(peraggstate->initfn_oid))
+     {
+         FunctionCallInfoData fcinfo;
+         int            numInitArgs;
+         Datum        newVal;
+
+         /* We run the transition functions in per-input-tuple memory context */
+         oldContext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+
+         /*
+          * Set up an fcinfo for calling the initfn.  It takes one fewer
+          * argument than the transfn: the old transValue is not passed, since
+          * by definition it's NULL at this point.  We know that the other
+          * transfn arguments are all non-null, so no need to check.
+          */
+         numInitArgs = peraggstate->numTransInputs;
+         InitFunctionCallInfoData(fcinfo, &(peraggstate->initfn),
+                                  numInitArgs,
+                                  peraggstate->aggCollation,
+                                  (void *) aggstate, NULL);
+         /* Skip tfcinfo->arg[0], which would be the transValue */
+         memcpy(fcinfo.arg, tfcinfo->arg + 1, numInitArgs * sizeof(Datum));
+         memset(fcinfo.argnull, 0, numInitArgs * sizeof(bool));
+
+         /* set up aggstate->curperagg for AggGetAggref() */
+         aggstate->curperagg = peraggstate;
+
+         newVal = FunctionCallInvoke(&fcinfo);
+
+         aggstate->curperagg = NULL;
+
+         /*
+          * If pass-by-ref datatype, must copy the new value into aggcontext.
+          * Needn't pfree the prior transValue, since it's NULL.
+          */
+         if (!peraggstate->transtypeByVal && !fcinfo.isnull)
+         {
+             MemoryContextSwitchTo(aggstate->aggcontext);
+             newVal = datumCopy(newVal,
+                                peraggstate->transtypeByVal,
+                                peraggstate->transtypeLen);
+         }
+
+         pergroupstate->transValue = newVal;
+         pergroupstate->transValueIsNull = fcinfo.isnull;
+
+         MemoryContextSwitchTo(oldContext);
+     }
+     else
+     {
+         /*
+          * No initfn, so use the current input value as the new transValue.
+          *
+          * We must copy the datum into aggcontext if it is pass-by-ref. We do
+          * not need to pfree the old transValue, since it's NULL.
+          */
+         oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
+         pergroupstate->transValue = datumCopy(tfcinfo->arg[1],
+                                               peraggstate->transtypeByVal,
+                                               peraggstate->transtypeLen);
+         pergroupstate->transValueIsNull = false;
+         MemoryContextSwitchTo(oldContext);
+     }
+
+     /* Reset state so we won't do this again in the current group */
+     pergroupstate->noTransValue = false;
+ }
+
+ /*
   * Given new input value(s), advance the transition function of an aggregate.
   *
   * The new values (and null flags) have been preloaded into argument positions
*************** advance_transition_function(AggState *ag
*** 438,458 ****
          if (pergroupstate->noTransValue)
          {
              /*
!              * transValue has not been initialized. This is the first non-NULL
!              * input value. We use it as the initial value for transValue. (We
!              * already checked that the agg's input type is binary-compatible
!              * with its transtype, so straight copy here is OK.)
!              *
!              * We must copy the datum into aggcontext if it is pass-by-ref. We
!              * do not need to pfree the old transValue, since it's NULL.
               */
!             oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
!             pergroupstate->transValue = datumCopy(fcinfo->arg[1],
!                                                   peraggstate->transtypeByVal,
!                                                   peraggstate->transtypeLen);
!             pergroupstate->transValueIsNull = false;
!             pergroupstate->noTransValue = false;
!             MemoryContextSwitchTo(oldContext);
              return;
          }
          if (pergroupstate->transValueIsNull)
--- 528,538 ----
          if (pergroupstate->noTransValue)
          {
              /*
!              * We had a NULL initcond, and this is the first non-NULL input
!              * value(s).  Perform special initialization, using the initfn if
!              * any.
               */
!             initialize_transition_value(aggstate, peraggstate, pergroupstate);
              return;
          }
          if (pergroupstate->transValueIsNull)
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1676,1682 ****
          Form_pg_aggregate aggform;
          Oid            aggtranstype;
          AclResult    aclresult;
!         Oid            transfn_oid,
                      finalfn_oid;
          Expr       *transfnexpr,
                     *finalfnexpr;
--- 1756,1763 ----
          Form_pg_aggregate aggform;
          Oid            aggtranstype;
          AclResult    aclresult;
!         Oid            initfn_oid,
!                     transfn_oid,
                      finalfn_oid;
          Expr       *transfnexpr,
                     *finalfnexpr;
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1728,1733 ****
--- 1809,1815 ----
                             get_func_name(aggref->aggfnoid));
          InvokeFunctionExecuteHook(aggref->aggfnoid);

+         peraggstate->initfn_oid = initfn_oid = aggform->agginitfn;
          peraggstate->transfn_oid = transfn_oid = aggform->aggtransfn;
          peraggstate->finalfn_oid = finalfn_oid = aggform->aggfinalfn;

*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1802,1808 ****
                                  &transfnexpr,
                                  &finalfnexpr);

!         /* set up infrastructure for calling the transfn and finalfn */
          fmgr_info(transfn_oid, &peraggstate->transfn);
          fmgr_info_set_expr((Node *) transfnexpr, &peraggstate->transfn);

--- 1884,1896 ----
                                  &transfnexpr,
                                  &finalfnexpr);

!         /* set up infrastructure for calling the aggregate's functions */
!         if (OidIsValid(initfn_oid))
!         {
!             fmgr_info(initfn_oid, &peraggstate->initfn);
!             // fmgr_info_set_expr((Node *) initfnexpr, &peraggstate->initfn);
!         }
!
          fmgr_info(transfn_oid, &peraggstate->transfn);
          fmgr_info_set_expr((Node *) transfnexpr, &peraggstate->transfn);

*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1849,1855 ****
           * transValue.    This should have been checked at agg definition time,
           * but just in case...
           */
!         if (peraggstate->transfn.fn_strict && peraggstate->initValueIsNull)
          {
              if (numArguments <= numDirectArgs ||
                  !IsBinaryCoercible(inputTypes[numDirectArgs], aggtranstype))
--- 1937,1944 ----
           * transValue.    This should have been checked at agg definition time,
           * but just in case...
           */
!         if (peraggstate->transfn.fn_strict && peraggstate->initValueIsNull &&
!             !OidIsValid(initfn_oid))
          {
              if (numArguments <= numDirectArgs ||
                  !IsBinaryCoercible(inputTypes[numDirectArgs], aggtranstype))
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 64eb0f8..cc2a55d 100644
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
*************** int4_sum(PG_FUNCTION_ARGS)
*** 3021,3036 ****
  {
      int64        newval;

-     if (PG_ARGISNULL(0))
-     {
-         /* No non-null input seen so far... */
-         if (PG_ARGISNULL(1))
-             PG_RETURN_NULL();    /* still no non-null */
-         /* This is the first non-null input. */
-         newval = (int64) PG_GETARG_INT32(1);
-         PG_RETURN_INT64(newval);
-     }
-
      /*
       * If we're invoked as an aggregate, we can cheat and modify our first
       * parameter in-place to avoid palloc overhead. If not, we need to return
--- 3021,3026 ----
*************** int4_sum(PG_FUNCTION_ARGS)
*** 3043,3051 ****
      {
          int64       *oldsum = (int64 *) PG_GETARG_POINTER(0);

!         /* Leave the running sum unchanged if the new input is null */
!         if (!PG_ARGISNULL(1))
!             *oldsum = *oldsum + (int64) PG_GETARG_INT32(1);

          PG_RETURN_POINTER(oldsum);
      }
--- 3033,3039 ----
      {
          int64       *oldsum = (int64 *) PG_GETARG_POINTER(0);

!         *oldsum = *oldsum + (int64) PG_GETARG_INT32(1);

          PG_RETURN_POINTER(oldsum);
      }
*************** int4_sum(PG_FUNCTION_ARGS)
*** 3054,3064 ****
      {
          int64        oldsum = PG_GETARG_INT64(0);

-         /* Leave sum unchanged if new input is null. */
-         if (PG_ARGISNULL(1))
-             PG_RETURN_INT64(oldsum);
-
-         /* OK to do the addition. */
          newval = oldsum + (int64) PG_GETARG_INT32(1);

          PG_RETURN_INT64(newval);
--- 3042,3047 ----
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index f189998..79d327c 100644
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
***************
*** 30,35 ****
--- 30,36 ----
   *    aggfnoid            pg_proc OID of the aggregate itself
   *    aggkind                aggregate kind, see AGGKIND_ categories below
   *    aggnumdirectargs    number of arguments that are "direct" arguments
+  *    agginitfn            first-transition function (0 if none)
   *    aggtransfn            transition function
   *    aggfinalfn            final function (0 if none)
   *    aggsortop            associated sort operator (0 if none)
*************** CATALOG(pg_aggregate,2600) BKI_WITHOUT_O
*** 45,50 ****
--- 46,52 ----
      regproc        aggfnoid;
      char        aggkind;
      int16        aggnumdirectargs;
+     regproc        agginitfn;
      regproc        aggtransfn;
      regproc        aggfinalfn;
      Oid            aggsortop;
*************** typedef FormData_pg_aggregate *Form_pg_a
*** 68,83 ****
   * ----------------
   */

! #define Natts_pg_aggregate                    9
  #define Anum_pg_aggregate_aggfnoid            1
  #define Anum_pg_aggregate_aggkind            2
  #define Anum_pg_aggregate_aggnumdirectargs    3
! #define Anum_pg_aggregate_aggtransfn        4
! #define Anum_pg_aggregate_aggfinalfn        5
! #define Anum_pg_aggregate_aggsortop            6
! #define Anum_pg_aggregate_aggtranstype        7
! #define Anum_pg_aggregate_aggtransspace        8
! #define Anum_pg_aggregate_agginitval        9

  /*
   * Symbolic values for aggkind column.    We distinguish normal aggregates
--- 70,86 ----
   * ----------------
   */

! #define Natts_pg_aggregate                    10
  #define Anum_pg_aggregate_aggfnoid            1
  #define Anum_pg_aggregate_aggkind            2
  #define Anum_pg_aggregate_aggnumdirectargs    3
! #define Anum_pg_aggregate_agginitfn            4
! #define Anum_pg_aggregate_aggtransfn        5
! #define Anum_pg_aggregate_aggfinalfn        6
! #define Anum_pg_aggregate_aggsortop            7
! #define Anum_pg_aggregate_aggtranstype        8
! #define Anum_pg_aggregate_aggtransspace        9
! #define Anum_pg_aggregate_agginitval        10

  /*
   * Symbolic values for aggkind column.    We distinguish normal aggregates
*************** typedef FormData_pg_aggregate *Form_pg_a
*** 101,277 ****
   */

  /* avg */
! DATA(insert ( 2100    n 0 int8_avg_accum    numeric_avg        0    2281    128 _null_ ));
! DATA(insert ( 2101    n 0 int4_avg_accum    int8_avg        0    1016    0    "{0,0}" ));
! DATA(insert ( 2102    n 0 int2_avg_accum    int8_avg        0    1016    0    "{0,0}" ));
! DATA(insert ( 2103    n 0 numeric_avg_accum    numeric_avg 0    2281    128 _null_ ));
! DATA(insert ( 2104    n 0 float4_accum    float8_avg        0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2105    n 0 float8_accum    float8_avg        0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2106    n 0 interval_accum    interval_avg    0    1187    0    "{0 second,0 second}" ));

  /* sum */
! DATA(insert ( 2107    n 0 int8_avg_accum    numeric_sum        0    2281    128 _null_ ));
! DATA(insert ( 2108    n 0 int4_sum        -                0    20        0    _null_ ));
! DATA(insert ( 2109    n 0 int2_sum        -                0    20        0    _null_ ));
! DATA(insert ( 2110    n 0 float4pl        -                0    700        0    _null_ ));
! DATA(insert ( 2111    n 0 float8pl        -                0    701        0    _null_ ));
! DATA(insert ( 2112    n 0 cash_pl            -                0    790        0    _null_ ));
! DATA(insert ( 2113    n 0 interval_pl        -                0    1186    0    _null_ ));
! DATA(insert ( 2114    n 0 numeric_avg_accum    numeric_sum 0    2281    128 _null_ ));

  /* max */
! DATA(insert ( 2115    n 0 int8larger        -                413        20        0    _null_ ));
! DATA(insert ( 2116    n 0 int4larger        -                521        23        0    _null_ ));
! DATA(insert ( 2117    n 0 int2larger        -                520        21        0    _null_ ));
! DATA(insert ( 2118    n 0 oidlarger        -                610        26        0    _null_ ));
! DATA(insert ( 2119    n 0 float4larger    -                623        700        0    _null_ ));
! DATA(insert ( 2120    n 0 float8larger    -                674        701        0    _null_ ));
! DATA(insert ( 2121    n 0 int4larger        -                563        702        0    _null_ ));
! DATA(insert ( 2122    n 0 date_larger        -                1097    1082    0    _null_ ));
! DATA(insert ( 2123    n 0 time_larger        -                1112    1083    0    _null_ ));
! DATA(insert ( 2124    n 0 timetz_larger    -                1554    1266    0    _null_ ));
! DATA(insert ( 2125    n 0 cashlarger        -                903        790        0    _null_ ));
! DATA(insert ( 2126    n 0 timestamp_larger    -            2064    1114    0    _null_ ));
! DATA(insert ( 2127    n 0 timestamptz_larger    -            1324    1184    0    _null_ ));
! DATA(insert ( 2128    n 0 interval_larger -                1334    1186    0    _null_ ));
! DATA(insert ( 2129    n 0 text_larger        -                666        25        0    _null_ ));
! DATA(insert ( 2130    n 0 numeric_larger    -                1756    1700    0    _null_ ));
! DATA(insert ( 2050    n 0 array_larger    -                1073    2277    0    _null_ ));
! DATA(insert ( 2244    n 0 bpchar_larger    -                1060    1042    0    _null_ ));
! DATA(insert ( 2797    n 0 tidlarger        -                2800    27        0    _null_ ));
! DATA(insert ( 3526    n 0 enum_larger        -                3519    3500    0    _null_ ));

  /* min */
! DATA(insert ( 2131    n 0 int8smaller        -                412        20        0    _null_ ));
! DATA(insert ( 2132    n 0 int4smaller        -                97        23        0    _null_ ));
! DATA(insert ( 2133    n 0 int2smaller        -                95        21        0    _null_ ));
! DATA(insert ( 2134    n 0 oidsmaller        -                609        26        0    _null_ ));
! DATA(insert ( 2135    n 0 float4smaller    -                622        700        0    _null_ ));
! DATA(insert ( 2136    n 0 float8smaller    -                672        701        0    _null_ ));
! DATA(insert ( 2137    n 0 int4smaller        -                562        702        0    _null_ ));
! DATA(insert ( 2138    n 0 date_smaller    -                1095    1082    0    _null_ ));
! DATA(insert ( 2139    n 0 time_smaller    -                1110    1083    0    _null_ ));
! DATA(insert ( 2140    n 0 timetz_smaller    -                1552    1266    0    _null_ ));
! DATA(insert ( 2141    n 0 cashsmaller        -                902        790        0    _null_ ));
! DATA(insert ( 2142    n 0 timestamp_smaller    -            2062    1114    0    _null_ ));
! DATA(insert ( 2143    n 0 timestamptz_smaller -            1322    1184    0    _null_ ));
! DATA(insert ( 2144    n 0 interval_smaller    -            1332    1186    0    _null_ ));
! DATA(insert ( 2145    n 0 text_smaller    -                664        25        0    _null_ ));
! DATA(insert ( 2146    n 0 numeric_smaller -                1754    1700    0    _null_ ));
! DATA(insert ( 2051    n 0 array_smaller    -                1072    2277    0    _null_ ));
! DATA(insert ( 2245    n 0 bpchar_smaller    -                1058    1042    0    _null_ ));
! DATA(insert ( 2798    n 0 tidsmaller        -                2799    27        0    _null_ ));
! DATA(insert ( 3527    n 0 enum_smaller    -                3518    3500    0    _null_ ));

  /* count */
! DATA(insert ( 2147    n 0 int8inc_any        -                0        20        0    "0" ));
! DATA(insert ( 2803    n 0 int8inc            -                0        20        0    "0" ));

  /* var_pop */
! DATA(insert ( 2718    n 0 int8_accum    numeric_var_pop 0    2281    128 _null_ ));
! DATA(insert ( 2719    n 0 int4_accum    numeric_var_pop 0    2281    128 _null_ ));
! DATA(insert ( 2720    n 0 int2_accum    numeric_var_pop 0    2281    128 _null_ ));
! DATA(insert ( 2721    n 0 float4_accum    float8_var_pop 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2722    n 0 float8_accum    float8_var_pop 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2723    n 0 numeric_accum    numeric_var_pop 0    2281    128 _null_ ));

  /* var_samp */
! DATA(insert ( 2641    n 0 int8_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2642    n 0 int4_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2643    n 0 int2_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2644    n 0 float4_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2645    n 0 float8_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2646    n 0 numeric_accum    numeric_var_samp 0    2281    128 _null_ ));

  /* variance: historical Postgres syntax for var_samp */
! DATA(insert ( 2148    n 0 int8_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2149    n 0 int4_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2150    n 0 int2_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2151    n 0 float4_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2152    n 0 float8_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2153    n 0 numeric_accum    numeric_var_samp 0    2281    128 _null_ ));

  /* stddev_pop */
! DATA(insert ( 2724    n 0 int8_accum    numeric_stddev_pop        0    2281    128 _null_ ));
! DATA(insert ( 2725    n 0 int4_accum    numeric_stddev_pop        0    2281    128 _null_ ));
! DATA(insert ( 2726    n 0 int2_accum    numeric_stddev_pop        0    2281    128 _null_ ));
! DATA(insert ( 2727    n 0 float4_accum    float8_stddev_pop    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2728    n 0 float8_accum    float8_stddev_pop    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2729    n 0 numeric_accum    numeric_stddev_pop    0    2281    128 _null_ ));

  /* stddev_samp */
! DATA(insert ( 2712    n 0 int8_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2713    n 0 int4_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2714    n 0 int2_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2715    n 0 float4_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2716    n 0 float8_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2717    n 0 numeric_accum    numeric_stddev_samp 0    2281    128 _null_ ));

  /* stddev: historical Postgres syntax for stddev_samp */
! DATA(insert ( 2154    n 0 int8_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2155    n 0 int4_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2156    n 0 int2_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2157    n 0 float4_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2158    n 0 float8_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2159    n 0 numeric_accum    numeric_stddev_samp 0    2281    128 _null_ ));

  /* SQL2003 binary regression aggregates */
! DATA(insert ( 2818    n 0 int8inc_float8_float8        -                0    20        0    "0" ));
! DATA(insert ( 2819    n 0 float8_regr_accum    float8_regr_sxx            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2820    n 0 float8_regr_accum    float8_regr_syy            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2821    n 0 float8_regr_accum    float8_regr_sxy            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2822    n 0 float8_regr_accum    float8_regr_avgx        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2823    n 0 float8_regr_accum    float8_regr_avgy        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2824    n 0 float8_regr_accum    float8_regr_r2            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2825    n 0 float8_regr_accum    float8_regr_slope        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2826    n 0 float8_regr_accum    float8_regr_intercept    0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2827    n 0 float8_regr_accum    float8_covar_pop        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2828    n 0 float8_regr_accum    float8_covar_samp        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2829    n 0 float8_regr_accum    float8_corr                0    1022    0    "{0,0,0,0,0,0}" ));

  /* boolean-and and boolean-or */
! DATA(insert ( 2517    n 0 booland_statefunc    -            58    16        0    _null_ ));
! DATA(insert ( 2518    n 0 boolor_statefunc    -            59    16        0    _null_ ));
! DATA(insert ( 2519    n 0 booland_statefunc    -            58    16        0    _null_ ));

  /* bitwise integer */
! DATA(insert ( 2236    n 0 int2and        -                    0    21        0    _null_ ));
! DATA(insert ( 2237    n 0 int2or        -                    0    21        0    _null_ ));
! DATA(insert ( 2238    n 0 int4and        -                    0    23        0    _null_ ));
! DATA(insert ( 2239    n 0 int4or        -                    0    23        0    _null_ ));
! DATA(insert ( 2240    n 0 int8and        -                    0    20        0    _null_ ));
! DATA(insert ( 2241    n 0 int8or        -                    0    20        0    _null_ ));
! DATA(insert ( 2242    n 0 bitand        -                    0    1560    0    _null_ ));
! DATA(insert ( 2243    n 0 bitor        -                    0    1560    0    _null_ ));

  /* xml */
! DATA(insert ( 2901    n 0 xmlconcat2    -                    0    142        0    _null_ ));

  /* array */
! DATA(insert ( 2335    n 0 array_agg_transfn    array_agg_finalfn    0    2281    0    _null_ ));

  /* text */
! DATA(insert ( 3538    n 0 string_agg_transfn    string_agg_finalfn    0    2281    0    _null_ ));

  /* bytea */
! DATA(insert ( 3545    n 0 bytea_string_agg_transfn    bytea_string_agg_finalfn    0    2281    0    _null_ ));

  /* json */
! DATA(insert ( 3175    n 0 json_agg_transfn    json_agg_finalfn    0    2281    0    _null_ ));
! DATA(insert ( 3197    n 0 json_object_agg_transfn    json_object_agg_finalfn    0    2281    0    _null_ ));

  /* ordered-set and hypothetical-set aggregates */
! DATA(insert ( 3972    o 1 ordered_set_transition            percentile_disc_final                    0    2281    0    _null_ ));
! DATA(insert ( 3974    o 1 ordered_set_transition            percentile_cont_float8_final            0    2281    0    _null_ ));
! DATA(insert ( 3976    o 1 ordered_set_transition            percentile_cont_interval_final            0    2281    0    _null_ ));
! DATA(insert ( 3978    o 1 ordered_set_transition            percentile_disc_multi_final                0    2281    0    _null_ ));
! DATA(insert ( 3980    o 1 ordered_set_transition            percentile_cont_float8_multi_final        0    2281    0    _null_ ));
! DATA(insert ( 3982    o 1 ordered_set_transition            percentile_cont_interval_multi_final    0    2281    0    _null_ ));
! DATA(insert ( 3984    o 0 ordered_set_transition            mode_final                                0    2281    0    _null_ ));
! DATA(insert ( 3986    h 1 ordered_set_transition_multi    rank_final                                0    2281    0    _null_ ));
! DATA(insert ( 3988    h 1 ordered_set_transition_multi    percent_rank_final                        0    2281    0    _null_ ));
! DATA(insert ( 3990    h 1 ordered_set_transition_multi    cume_dist_final                            0    2281    0    _null_ ));
! DATA(insert ( 3992    h 1 ordered_set_transition_multi    dense_rank_final                        0    2281    0    _null_ ));


  /*
--- 104,280 ----
   */

  /* avg */
! DATA(insert ( 2100    n 0 -        int8_avg_accum    numeric_avg        0    2281    128 _null_ ));
! DATA(insert ( 2101    n 0 -        int4_avg_accum    int8_avg        0    1016    0    "{0,0}" ));
! DATA(insert ( 2102    n 0 -        int2_avg_accum    int8_avg        0    1016    0    "{0,0}" ));
! DATA(insert ( 2103    n 0 -        numeric_avg_accum    numeric_avg 0    2281    128 _null_ ));
! DATA(insert ( 2104    n 0 -        float4_accum    float8_avg        0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2105    n 0 -        float8_accum    float8_avg        0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2106    n 0 -        interval_accum    interval_avg    0    1187    0    "{0 second,0 second}" ));

  /* sum */
! DATA(insert ( 2107    n 0 -        int8_avg_accum    numeric_sum        0    2281    128 _null_ ));
! DATA(insert ( 2108    n 0 481        int4_sum        -            0    20        0    _null_ ));
! DATA(insert ( 2109    n 0 -        int2_sum        -            0    20        0    _null_ ));
! DATA(insert ( 2110    n 0 -        float4pl        -            0    700        0    _null_ ));
! DATA(insert ( 2111    n 0 -        float8pl        -            0    701        0    _null_ ));
! DATA(insert ( 2112    n 0 -        cash_pl            -            0    790        0    _null_ ));
! DATA(insert ( 2113    n 0 -        interval_pl        -            0    1186    0    _null_ ));
! DATA(insert ( 2114    n 0 -        numeric_avg_accum    numeric_sum 0    2281    128 _null_ ));

  /* max */
! DATA(insert ( 2115    n 0 -        int8larger        -            413        20        0    _null_ ));
! DATA(insert ( 2116    n 0 -        int4larger        -            521        23        0    _null_ ));
! DATA(insert ( 2117    n 0 -        int2larger        -            520        21        0    _null_ ));
! DATA(insert ( 2118    n 0 -        oidlarger        -            610        26        0    _null_ ));
! DATA(insert ( 2119    n 0 -        float4larger    -            623        700        0    _null_ ));
! DATA(insert ( 2120    n 0 -        float8larger    -            674        701        0    _null_ ));
! DATA(insert ( 2121    n 0 -        int4larger        -            563        702        0    _null_ ));
! DATA(insert ( 2122    n 0 -        date_larger        -            1097    1082    0    _null_ ));
! DATA(insert ( 2123    n 0 -        time_larger        -            1112    1083    0    _null_ ));
! DATA(insert ( 2124    n 0 -        timetz_larger    -            1554    1266    0    _null_ ));
! DATA(insert ( 2125    n 0 -        cashlarger        -            903        790        0    _null_ ));
! DATA(insert ( 2126    n 0 -        timestamp_larger    -        2064    1114    0    _null_ ));
! DATA(insert ( 2127    n 0 -        timestamptz_larger    -        1324    1184    0    _null_ ));
! DATA(insert ( 2128    n 0 -        interval_larger -            1334    1186    0    _null_ ));
! DATA(insert ( 2129    n 0 -        text_larger        -            666        25        0    _null_ ));
! DATA(insert ( 2130    n 0 -        numeric_larger    -            1756    1700    0    _null_ ));
! DATA(insert ( 2050    n 0 -        array_larger    -            1073    2277    0    _null_ ));
! DATA(insert ( 2244    n 0 -        bpchar_larger    -            1060    1042    0    _null_ ));
! DATA(insert ( 2797    n 0 -        tidlarger        -            2800    27        0    _null_ ));
! DATA(insert ( 3526    n 0 -        enum_larger        -            3519    3500    0    _null_ ));

  /* min */
! DATA(insert ( 2131    n 0 -        int8smaller        -            412        20        0    _null_ ));
! DATA(insert ( 2132    n 0 -        int4smaller        -            97        23        0    _null_ ));
! DATA(insert ( 2133    n 0 -        int2smaller        -            95        21        0    _null_ ));
! DATA(insert ( 2134    n 0 -        oidsmaller        -            609        26        0    _null_ ));
! DATA(insert ( 2135    n 0 -        float4smaller    -            622        700        0    _null_ ));
! DATA(insert ( 2136    n 0 -        float8smaller    -            672        701        0    _null_ ));
! DATA(insert ( 2137    n 0 -        int4smaller        -            562        702        0    _null_ ));
! DATA(insert ( 2138    n 0 -        date_smaller    -            1095    1082    0    _null_ ));
! DATA(insert ( 2139    n 0 -        time_smaller    -            1110    1083    0    _null_ ));
! DATA(insert ( 2140    n 0 -        timetz_smaller    -            1552    1266    0    _null_ ));
! DATA(insert ( 2141    n 0 -        cashsmaller        -            902        790        0    _null_ ));
! DATA(insert ( 2142    n 0 -        timestamp_smaller    -        2062    1114    0    _null_ ));
! DATA(insert ( 2143    n 0 -        timestamptz_smaller -        1322    1184    0    _null_ ));
! DATA(insert ( 2144    n 0 -        interval_smaller    -        1332    1186    0    _null_ ));
! DATA(insert ( 2145    n 0 -        text_smaller    -            664        25        0    _null_ ));
! DATA(insert ( 2146    n 0 -        numeric_smaller -            1754    1700    0    _null_ ));
! DATA(insert ( 2051    n 0 -        array_smaller    -            1072    2277    0    _null_ ));
! DATA(insert ( 2245    n 0 -        bpchar_smaller    -            1058    1042    0    _null_ ));
! DATA(insert ( 2798    n 0 -        tidsmaller        -            2799    27        0    _null_ ));
! DATA(insert ( 3527    n 0 -        enum_smaller    -            3518    3500    0    _null_ ));

  /* count */
! DATA(insert ( 2147    n 0 -        int8inc_any        -            0        20        0    "0" ));
! DATA(insert ( 2803    n 0 -        int8inc            -            0        20        0    "0" ));

  /* var_pop */
! DATA(insert ( 2718    n 0 -        int8_accum    numeric_var_pop 0    2281    128 _null_ ));
! DATA(insert ( 2719    n 0 -        int4_accum    numeric_var_pop 0    2281    128 _null_ ));
! DATA(insert ( 2720    n 0 -        int2_accum    numeric_var_pop 0    2281    128 _null_ ));
! DATA(insert ( 2721    n 0 -        float4_accum    float8_var_pop 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2722    n 0 -        float8_accum    float8_var_pop 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2723    n 0 -        numeric_accum    numeric_var_pop 0    2281    128 _null_ ));

  /* var_samp */
! DATA(insert ( 2641    n 0 -        int8_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2642    n 0 -        int4_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2643    n 0 -        int2_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2644    n 0 -        float4_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2645    n 0 -        float8_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2646    n 0 -        numeric_accum    numeric_var_samp 0    2281    128 _null_ ));

  /* variance: historical Postgres syntax for var_samp */
! DATA(insert ( 2148    n 0 -        int8_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2149    n 0 -        int4_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2150    n 0 -        int2_accum    numeric_var_samp    0    2281    128 _null_ ));
! DATA(insert ( 2151    n 0 -        float4_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2152    n 0 -        float8_accum    float8_var_samp 0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2153    n 0 -        numeric_accum    numeric_var_samp 0    2281    128 _null_ ));

  /* stddev_pop */
! DATA(insert ( 2724    n 0 -        int8_accum    numeric_stddev_pop        0    2281    128 _null_ ));
! DATA(insert ( 2725    n 0 -        int4_accum    numeric_stddev_pop        0    2281    128 _null_ ));
! DATA(insert ( 2726    n 0 -        int2_accum    numeric_stddev_pop        0    2281    128 _null_ ));
! DATA(insert ( 2727    n 0 -        float4_accum    float8_stddev_pop    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2728    n 0 -        float8_accum    float8_stddev_pop    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2729    n 0 -        numeric_accum    numeric_stddev_pop    0    2281    128 _null_ ));

  /* stddev_samp */
! DATA(insert ( 2712    n 0 -        int8_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2713    n 0 -        int4_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2714    n 0 -        int2_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2715    n 0 -        float4_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2716    n 0 -        float8_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2717    n 0 -        numeric_accum    numeric_stddev_samp 0    2281    128 _null_ ));

  /* stddev: historical Postgres syntax for stddev_samp */
! DATA(insert ( 2154    n 0 -        int8_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2155    n 0 -        int4_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2156    n 0 -        int2_accum    numeric_stddev_samp        0    2281    128 _null_ ));
! DATA(insert ( 2157    n 0 -        float4_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2158    n 0 -        float8_accum    float8_stddev_samp    0    1022    0    "{0,0,0}" ));
! DATA(insert ( 2159    n 0 -        numeric_accum    numeric_stddev_samp 0    2281    128 _null_ ));

  /* SQL2003 binary regression aggregates */
! DATA(insert ( 2818    n 0 -        int8inc_float8_float8        -            0    20        0    "0" ));
! DATA(insert ( 2819    n 0 -        float8_regr_accum    float8_regr_sxx            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2820    n 0 -        float8_regr_accum    float8_regr_syy            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2821    n 0 -        float8_regr_accum    float8_regr_sxy            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2822    n 0 -        float8_regr_accum    float8_regr_avgx        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2823    n 0 -        float8_regr_accum    float8_regr_avgy        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2824    n 0 -        float8_regr_accum    float8_regr_r2            0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2825    n 0 -        float8_regr_accum    float8_regr_slope        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2826    n 0 -        float8_regr_accum    float8_regr_intercept    0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2827    n 0 -        float8_regr_accum    float8_covar_pop        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2828    n 0 -        float8_regr_accum    float8_covar_samp        0    1022    0    "{0,0,0,0,0,0}" ));
! DATA(insert ( 2829    n 0 -        float8_regr_accum    float8_corr                0    1022    0    "{0,0,0,0,0,0}" ));

  /* boolean-and and boolean-or */
! DATA(insert ( 2517    n 0 -        booland_statefunc    -        58    16        0    _null_ ));
! DATA(insert ( 2518    n 0 -        boolor_statefunc    -        59    16        0    _null_ ));
! DATA(insert ( 2519    n 0 -        booland_statefunc    -        58    16        0    _null_ ));

  /* bitwise integer */
! DATA(insert ( 2236    n 0 -        int2and        -                0    21        0    _null_ ));
! DATA(insert ( 2237    n 0 -        int2or        -                0    21        0    _null_ ));
! DATA(insert ( 2238    n 0 -        int4and        -                0    23        0    _null_ ));
! DATA(insert ( 2239    n 0 -        int4or        -                0    23        0    _null_ ));
! DATA(insert ( 2240    n 0 -        int8and        -                0    20        0    _null_ ));
! DATA(insert ( 2241    n 0 -        int8or        -                0    20        0    _null_ ));
! DATA(insert ( 2242    n 0 -        bitand        -                0    1560    0    _null_ ));
! DATA(insert ( 2243    n 0 -        bitor        -                0    1560    0    _null_ ));

  /* xml */
! DATA(insert ( 2901    n 0 -        xmlconcat2    -                0    142        0    _null_ ));

  /* array */
! DATA(insert ( 2335    n 0 -        array_agg_transfn    array_agg_finalfn    0    2281    0    _null_ ));

  /* text */
! DATA(insert ( 3538    n 0 -        string_agg_transfn    string_agg_finalfn    0    2281    0    _null_ ));

  /* bytea */
! DATA(insert ( 3545    n 0 -        bytea_string_agg_transfn    bytea_string_agg_finalfn    0    2281    0    _null_ ));

  /* json */
! DATA(insert ( 3175    n 0 -        json_agg_transfn            json_agg_finalfn        0    2281    0    _null_ ));
! DATA(insert ( 3197    n 0 -        json_object_agg_transfn        json_object_agg_finalfn 0    2281    0    _null_ ));

  /* ordered-set and hypothetical-set aggregates */
! DATA(insert ( 3972    o 1 -        ordered_set_transition            percentile_disc_final                    0    2281    0    _null_ ));
! DATA(insert ( 3974    o 1 -        ordered_set_transition            percentile_cont_float8_final            0    2281    0    _null_ ));
! DATA(insert ( 3976    o 1 -        ordered_set_transition            percentile_cont_interval_final            0    2281    0    _null_ ));
! DATA(insert ( 3978    o 1 -        ordered_set_transition            percentile_disc_multi_final                0    2281    0    _null_ ));
! DATA(insert ( 3980    o 1 -        ordered_set_transition            percentile_cont_float8_multi_final        0    2281    0    _null_ ));
! DATA(insert ( 3982    o 1 -        ordered_set_transition            percentile_cont_interval_multi_final    0    2281    0    _null_ ));
! DATA(insert ( 3984    o 0 -        ordered_set_transition            mode_final                                0    2281    0    _null_ ));
! DATA(insert ( 3986    h 1 -        ordered_set_transition_multi    rank_final                                0    2281    0    _null_ ));
! DATA(insert ( 3988    h 1 -        ordered_set_transition_multi    percent_rank_final                        0    2281    0    _null_ ));
! DATA(insert ( 3990    h 1 -        ordered_set_transition_multi    cume_dist_final                            0    2281    0    _null_ ));
! DATA(insert ( 3992    h 1 -        ordered_set_transition_multi    dense_rank_final                        0    2281    0    _null_ ));


  /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 7b9c587..0bcb76d 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 1839 (  numeric_stddev
*** 2445,2451 ****
  DESCR("aggregate final function");
  DATA(insert OID = 1840 (  int2_sum           PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 20 "20 21" _null_ _null_ _null_ _null_ int2_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
! DATA(insert OID = 1841 (  int4_sum           PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 20 "20 23" _null_ _null_ _null_ _null_ int4_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
  DATA(insert OID = 1842 (  int8_sum           PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 1700 "1700 20" _null_ _null_ _null_ _null_ int8_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
--- 2445,2451 ----
  DESCR("aggregate final function");
  DATA(insert OID = 1840 (  int2_sum           PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 20 "20 21" _null_ _null_ _null_ _null_ int2_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
! DATA(insert OID = 1841 (  int4_sum           PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 20 "20 23" _null_ _null_ _null_ _null_ int4_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");
  DATA(insert OID = 1842 (  int8_sum           PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 1700 "1700 20" _null_ _null_ _null_ _null_ int8_sum _null_ _null_ _null_ ));
  DESCR("aggregate transition function");

Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr10, 2014, at 21:34 , Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 10 April 2014 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So if we go with that terminology, perhaps these names for the
>> new CREATE AGGREGATE parameters:
>>
>> initfunc        applies to plain aggregation, mutually exclusive with initcond
>> msfunc          (or just mfunc?) forward transition for moving-agg mode
>> mifunc          inverse transition for moving-agg mode
>> mstype          state datatype for moving-agg mode
>> msspace         space estimate for mstype
>> mfinalfunc      final function for moving-agg mode
>> minitfunc       "firsttrans" for moving-agg mode
>> minitcond       mutually exclusive with minitfunc
>
> I think I prefer "mfunc" to "msfunc", but perhaps that's just my
> natural aversion to the "ms" prefix :-)
>
> Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself
> could mean "initial".

I still think you're getting ahead of yourselves here. The number of
aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
for the others, the state type is already pass-by-ref.

I don't think we should be adding that much additional machinery
until it has been conclusively demonstrated that the performance
regression cannot be fixed any other way. Which, quite frankly, I
don't believe. Nothing in int4_avg_accum looks particularly expensive,
and the things that *do* seem to cost something certainly can be made
cheaper. c.f. the patch I just posted.

Another reason I'm so opposed to this is that inverse transition
functions might not be the last kind of transition functions we ever
add. For example, if we ever get ROLLUP/CUBE, we might want to have
a mergefunc which takes two aggregation states and combines them
into one. What do we do if we add those? Add yet a another set of
"mergable" transition functions? What about the various combinations
of invertible/non-invertible mergable/non-mergable that could result?
The opportunity cost seems pretty high here...

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> I still think you're getting ahead of yourselves here. The number of
> aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
> BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
> for the others, the state type is already pass-by-ref.

That argument is reasonable for the initfunc idea, but it doesn't apply
otherwise.

> Another reason I'm so opposed to this is that inverse transition
> functions might not be the last kind of transition functions we ever
> add. For example, if we ever get ROLLUP/CUBE, we might want to have
> a mergefunc which takes two aggregation states and combines them 
> into one. What do we do if we add those?

Make more pg_aggregate columns.  What exactly do you think is either
easier or harder about such cases?  Also, maybe I'm misremembering
the spec, but ROLLUP/CUBE wouldn't apply to window functions would
they?  So if your argument is based on the assumption that these
features need to combine, I'm not sure it's true.

The bottom line for me is that it seems conceptually far cleaner to
make the moving-aggregate support be independent than to insist that
it share an stype and sfunc with the plain case.

Furthermore, I do not buy the argument that if we hack hard enough,
we can make the performance cost of forcing the sfunc to do double duty
negligible.  In the first place, that argument is unsupported by much
evidence, and in the second place, it will certainly cost us complexity
to make the performance issue go away.  Instead we can just design the
problem out, for nothing that I see as a serious drawback.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr11, 2014, at 00:07 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> I still think you're getting ahead of yourselves here. The number of
>> aggregates which benefit from this is tiny: SUM(int2,int4) and maybe
>> BOOL_{AND,OR}. And in the SUM(int2,int4) case *only* on 64-bit archs -
>> for the others, the state type is already pass-by-ref.
> 
> That argument is reasonable for the initfunc idea, but it doesn't apply
> otherwise.

Why not? AFAICS, the increase in cost comes from going from a
by-value to a by-reference state type. Once you're using a by-reference
type, you already pay the overhead of the additional dereferences, and
for calling AggCheckCallContext() or some equivalent. 

>> Another reason I'm so opposed to this is that inverse transition
>> functions might not be the last kind of transition functions we ever
>> add. For example, if we ever get ROLLUP/CUBE, we might want to have
>> a mergefunc which takes two aggregation states and combines them 
>> into one. What do we do if we add those?
> 
> Make more pg_aggregate columns.  What exactly do you think is either
> easier or harder about such cases?  Also, maybe I'm misremembering
> the spec, but ROLLUP/CUBE wouldn't apply to window functions would
> they?  So if your argument is based on the assumption that these
> features need to combine, I'm not sure it's true.

Well, it was just an example - there might be other future extensions
which *do* need to combine. And we've been known to go beyond the spec
sometimes...

> Furthermore, I do not buy the argument that if we hack hard enough,
> we can make the performance cost of forcing the sfunc to do double duty
> negligible.  In the first place, that argument is unsupported by much
> evidence, and in the second place, it will certainly cost us complexity
> to make the performance issue go away.  Instead we can just design the
> problem out, for nothing that I see as a serious drawback.

My argument is that it costs us more complexity to duplicate everything
for the invertible case, *and* the result seems less flexible - not
from the POV of aggregate implementations, but from the POV of future
extensions.

As for evidence - have you looked at the patch I posted? I'd be very
interested to know if it removes the performance differences you saw.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> My argument is that it costs us more complexity to duplicate everything
> for the invertible case, *and* the result seems less flexible - not
> from the POV of aggregate implementations, but from the POV of future
> extensions.

[ shrug... ]  You can argue against any feature whatsoever by claiming
that it might possibly conflict with something we would wish to do
sometime in the future.  I think you need to have a much more concrete
argument about specific issues this will cause in order to be convincing.
For all we know about ROLLUP/CUBE implementation issues right now, doing
this feature with separate implementations might make that easier not
harder.  (I note that the crux of my complaint right now is that we're
asking sfuncs to serve two masters --- how's it going to be better when
they have to serve three or four?)

> As for evidence - have you looked at the patch I posted? I'd be very
> interested to know if it removes the performance differences you saw.

(1) You can't really prove the absence of a performance issue by showing
that one specific aggregate doesn't have an issue.  (2) These results
(mine as well as yours) seem mighty platform-dependent, so the fact you
don't see an issue doesn't mean someone else won't.  (3) A new
FunctionCallInfoData field just for this?  Surely not.  There's got to be
a distributed cost to that (although I notice you didn't bother
initializing the field most places, which is cheating).

Now point 3 could be addressed by doing the signaling in some other way
with the existing context field.  But it's still the case that you're
trying to band-aid a bad design.  There's no good reason to make the sfunc
do extra work to be invertible in contexts where we know, with certainty,
that that work is useless.  Especially not when we know that even a few
instructions of extra work can be performance-significant.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr11, 2014, at 01:30 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> As for evidence - have you looked at the patch I posted? I'd be very
>> interested to know if it removes the performance differences you saw.
> 
> (1) You can't really prove the absence of a performance issue by showing
> that one specific aggregate doesn't have an issue.

I'm claiming that SUM(int4) is about as simple as it gets, so if the
effect can be mitigated there, it can be mitigated everywhere. The more
complex a forward-only transition function is, the less will an added if
or two hurt.

> (2) These results
> (mine as well as yours) seem mighty platform-dependent, so the fact you
> don't see an issue doesn't mean someone else won't.

Yeah, though *any* change - mine, the one you propose, and any other -
has the potential to hurt some platform due to weird interactions (say,
cache thrashing).

> (3) A new
> FunctionCallInfoData field just for this?  Surely not.  There's got to be
> a distributed cost to that (although I notice you didn't bother
> initializing the field most places, which is cheating).

I think the field doesn't actually increase the size of the structure at
all - at least not if the bool before it has size 1 and the short following
it is 2-byte aligned. Or at least that was why I made it a char, and added
it at the place I did. But I might be missing something...

Also, InitFunctionCallInfoData *does* initialize the flags to zero. Though
maybe not everybody uses that - I didn't check, this was just a quick hack.

> Now point 3 could be addressed by doing the signaling in some other way
> with the existing context field.  But it's still the case that you're
> trying to band-aid a bad design.  There's no good reason to make the sfunc
> do extra work to be invertible in contexts where we know, with certainty,
> that that work is useless.

This is the principal point where we disagree, I think. From my POV, the
problem isn't invertibility here at all. Heck, SUM(int4) wouldn't need
*any* extra state at all to be invertible, if it weren't for those pesky
issues surrounding NULL handling. In fact, an earlier version of the
invtrans patch *did* use int4_sum as SUM(int4)'s transition function! The
only reason this doesn't work nowadays is that Dean didn't like the
forward-nonstrict-but-inverse-strict special case that made this work.

The way I see it, the main problem is the drop in performance that comes
from using a pass-by-ref state type. Which IMHO happens because this
*already* is a heavily band-aided design - all the state validation and
"if (AggCheckCallContext(,NULL))" stuff really works around the fact that
transition functions *aren't* supposed to be user-called, yet they are,
and must deal.

Which is why I feel that having two separate sets of transition functions
and state types solves the wrong problem. If we find a way to prevent
transition functions from being called directly, we'll shave a few cycles
off quite a few existing aggregates, invertible or not. If we find a way
around the initfunc-for-internal-statetype problem you discovered, the
implementation would get much simpler, since we could then make nearly
all of them strict. And this would again shave off a few cycles - for lots
of NULL inputs, the effect could even be large.

Compared to that, the benefit of having a completely separate set of
transition functions and state types for the invertible case is much
less tangible, IMHO.

> Especially not when we know that even a few instructions of extra work
> can be performance-significant.

But where do we draw that line? nodeWindowAgg.c quite certainly wastes
about as many cycles as int4_avg_accum does on various checks that are
unnecessary in the non-sliding-window case. Do we duplicate those
functions too?

best regards,
Florian Pflug






Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Dean Rasheed
Date:
On 10 April 2014 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> I was imagining that firsttrans would only be passed the first value
>> to be aggregated, not any previous state, and that it would be illegal
>> to specify both an initcond and a firsttrans function.
>
>> The forward transition function would only be called for values after
>> the first, by which point the state would be non-null, and so it could
>> be made strict in most cases. The same would apply to the invertible
>> transition functions, so they wouldn't have to do null counting, which
>> in turn would make their state types simpler.
>
> I put together a very fast proof-of-concept patch for this (attached).
> It has a valid execution path for an aggregate with initfunc, but I didn't
> bother writing the CREATE AGGREGATE support yet.  I made sum(int4) work
> as you suggest, marking the transfn strict and ripping out int4_sum's
> internal support for null inputs.  The result seems to be about a 4% or so
> improvement in the overall aggregation speed, for a simple "SELECT
> sum(int4col) FROM table" query.  So from a performance standpoint this
> seems only marginally worth doing.  The real problem is not that 4% isn't
> worth the trouble, it's that AFAICS the only built-in aggregates that
> can benefit are sum(int2) and sum(int4).  So that looks like a rather
> narrow use-case.
>
> You had suggested upthread that we could use this idea to make the
> transition functions strict for aggregates using "internal" transition
> datatypes, but that does not work because the initfunc would violate
> the safety rule that a function returning internal must take at least
> one internal-type argument.  That puts a pretty strong damper on the
> usefulness of the approach, given how many internal-transtype aggregates
> we have (and the moving-aggregate case is not going to be better is it?)
>

Ah, that's disappointing. If it can't be made to work for aggregates
with internal state types, then I think it loses most of its value,
and I don't think it will be of much more use in the moving aggregate
case either.

> So at this point I'm feeling unexcited about the initfunc idea.
> Unless it does something really good for the moving-aggregate case,
> I think we should drop it.
>

Agreed. Thanks for prototyping it though.

Regards,
Dean



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> Which is why I feel that having two separate sets of transition functions
> and state types solves the wrong problem. If we find a way to prevent
> transition functions from being called directly, we'll shave a few cycles
> off quite a few existing aggregates, invertible or not. If we find a way
> around the initfunc-for-internal-statetype problem you discovered, the
> implementation would get much simpler, since we could then make nearly
> all of them strict. And this would again shave off a few cycles - for lots
> of NULL inputs, the effect could even be large.

Since neither of those latter things seems likely to happen, I don't
find this argument convincing at all.  Even if they did happen, they
would represent an incremental improvement that would be equally useful
with either one or two sfuncs.

Basically, this comes down to a design judgment call, and my judgment
is differing from yours.  In the absence of opinions from others,
I'm going to do it my way.

However, quite independently of how many sfuncs there are, I'm interested
about this:

> ... SUM(int4) wouldn't need
> *any* extra state at all to be invertible, if it weren't for those pesky
> issues surrounding NULL handling. In fact, an earlier version of the
> invtrans patch *did* use int4_sum as SUM(int4)'s transition function! The
> only reason this doesn't work nowadays is that Dean didn't like the
> forward-nonstrict-but-inverse-strict special case that made this work.

Tell me about the special case here again?  Should we revisit the issue?
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr11, 2014, at 17:09 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Basically, this comes down to a design judgment call, and my judgment
> is differing from yours.  In the absence of opinions from others,
> I'm going to do it my way.

Ok. Are you going to do the necessary changes, or shall I? I'm happy to
leave it to you, but if you lack the time I can try to find some.

>> ... SUM(int4) wouldn't need
>> *any* extra state at all to be invertible, if it weren't for those pesky
>> issues surrounding NULL handling. In fact, an earlier version of the
>> invtrans patch *did* use int4_sum as SUM(int4)'s transition function! The
>> only reason this doesn't work nowadays is that Dean didn't like the
>> forward-nonstrict-but-inverse-strict special case that made this work.
>
> Tell me about the special case here again?  Should we revisit the issue?

My original coding allows the combination of non-strict forward with strict
reverse transition functions. The combination behaved like a strict aggregate
regarding NULL handling - i.e., neither the forward nor the reverse transition
function received NULL inputs. But if the initial state was NULL, the forward
transition function *would* be called with that NULL state value upon seeing
the first non-NULL input. In the window case, the aggregation machinery would
also take care to reset the state type to its initial value when it removed
the last non-NULL input from the aggregation state (just like it does for
strict aggregates today). This had two advantages:

 1) First, it allows strict aggregates to use state type internal. As it
    stands now, aggregates with state type internal must implement their
    own NULL handling, even if they behave exactly like most standard
    aggregates do, namely ignore NULLs and return NULL only if there were
    no non-NULL inputs.

 2) It allows strict aggregates whose state type and input type aren't
    binary coercible to return NULL if all inputs were NULL without any
    special coding. As it stands today, this doesn't work without some
    kind of counter in the state, because the final function otherwise
    won't know if there were non-NULL inputs or not. Aggregates whose state
    and input types are binary coercible get around that by setting the
    initial value to NULL while *still* being strict, and relying on the
    state replacement behaviour of the aggregate machinery.

It, however, also has a few disadvantages:

 3) It means that one needs to look at the inverse transition function's
    strictness setting even if that function is never used.

 4) It feels a bit hacky.

(2) is what affects SUM(int4). The only reason it tracks the number of inputs
is to be able to return NULL instead of 0 if no inputs remain.
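
To illustrate the point with plain SQL (nothing patch-specific here):

SELECT x,
       SUM(x) OVER (ORDER BY x ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING)
FROM (VALUES (1), (2), (3)) v(x);

The first row's frame holds {2,3} and sums to 5, the second's holds {3},
and the third's frame is empty - there the result must be NULL, not 0,
which is exactly what the counter in the state lets the final function
detect.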

One idea to fix (3) and (4) was *explicitly* flagging aggregates as
NULL-handling or NULL-ignoring, and also as what one might call
"weakly strict", i.e. as returning NULL exactly if there were no non-NULL
inputs. There are various variations of that theme possible - one flag for
each behaviour, or simply a single "common behaviour" flag. In the end, I
decided not to pursue that, mostly because the aggregates affected by that
issue turned out to be relatively easy to fix. For the ones with state type
internal, I simply added a counter, and I made SUM(int4) use AVG's transition
function.

I don't feel too strongly either way on this one - I initially implemented the
exception because I noticed that the NULL handling of some aggregates was
broken otherwise, and it seemed simpler to fix this in one place than going
over all the aggregates separately. OTOH, when I wrote the docs, I noticed
how hard it was to describe the behaviour accurately, which made me like it
less and less. And Dean wasn't happy with it at all, so that finally settled it.

best regards,
Florian Pflug





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> On Apr11, 2014, at 17:09 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Basically, this comes down to a design judgment call, and my judgment
>> is differing from yours.  In the absence of opinions from others,
>> I'm going to do it my way.

> Ok. Are you going to do the necessary changes, or shall I? I'm happy to
> leave it to you, but if you lack the time I can try to find some.

Nah, I'm happy to do the work, since it's me insisting on changing it.

>> Tell me about the special case here again?  Should we revisit the issue?

> ...
> I don't feel too strongly either way on this one - I initially implemented the
> exception because I noticed that the NULL handling of some aggregates was
> broken otherwise, and it seemed simpler to fix this in one place than going
> over all the aggregates separately. OTOH, when I wrote the docs, I noticed
> how hard it was to describe the behaviour accurately, which made me like it
> less and less. And Dean wasn't happy with it at all, so that finally settled it.

Yeah, if you can't document the design nicely, it's probably not a good
idea :-(.  Thanks for the summary.

It strikes me that your second point

>   2) It allows strict aggregates whose state type and input type aren't
>      binary coercible to return NULL if all inputs were NULL without any
>      special coding. As it stands today, this doesn't work without some
>      kind of counter in the state, because the final function otherwise
>      won't know if there were non-NULL inputs or not. Aggregates whose state
>      and input types are binary coercible get around that by setting the
>      initial value to NULL while *still* being strict, and relying on the
>      state replacement behaviour of the aggregate machinery.

could be addressed by the initfunc idea, but I'm still not sufficiently
excited by that one.  Given the problem with internal-type transition
values, I think this could only win if there are cases where it would let
us use a regular SQL type instead of internal for the transition value;
and I'm not sure that there are many/any such cases.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr11, 2014, at 19:04 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It strikes me that your second point
>
>>  2) It allows strict aggregates whose state type and input type aren't
>>     binary coercible to return NULL if all inputs were NULL without any
>>     special coding. As it stands today, this doesn't work without some
>>     kind of counter in the state, because the final function otherwise
>>     won't know if there were non-NULL inputs or not. Aggregates whose state
>>     and input types are binary coercible get around that by setting the
>>     initial value to NULL while *still* being strict, and relying on the
>>     state replacement behaviour of the aggregate machinery.
>
> could be addressed by the initfunc idea, but I'm still not sufficiently
> excited by that one.  Given the problem with internal-type transition
> values, I think this could only win if there are cases where it would let
> us use a regular SQL type instead of internal for the transition value;
> and I'm not sure that there are many/any such cases.

Yes, the idea had come up at some point during the review discussion. I
agree that it's only worthwhile if it works for state type internal - though
I think there ought to be a way to allow that. We could for example simply
decree that the initfunc's first parameter must be of type internal if its
return type is, and then just pass NULL for that parameter.

What I like about the initfunc idea is that it also naturally extends to
ordered-set aggregates; I think it'd be very useful for some possible
ordered-set aggregates to receive their direct arguments beforehand and not
afterwards.

But that all seems largely orthogonal to the invtrans patch.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 10 April 2014 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So if we go with that terminology, perhaps these names for the
>> new CREATE AGGREGATE parameters:
>> 
>> initfunc        applies to plain aggregation, mutually exclusive with initcond
>> msfunc          (or just mfunc?) forward transition for moving-agg mode
>> mifunc          inverse transition for moving-agg mode
>> mstype          state datatype for moving-agg mode
>> msspace         space estimate for mstype
>> mfinalfunc      final function for moving-agg mode
>> minitfunc       "firsttrans" for moving-agg mode
>> minitcond       mutually exclusive with minitfunc

> Yeah, those work for me.

> I think I prefer "mfunc" to "msfunc", but perhaps that's just my
> natural aversion to the "ms" prefix :-)

Meh.  We've got mstype, and I don't think leaving out the "s" there
feels right.

> Also, perhaps "minvfunc" rather than "mifunc" because "i" by itself
> could mean "initial".

Good point.  So with initfuncs out of the picture, we have
new CREATE AGGREGATE parameter names

msfunc        forward transition for moving-agg mode
minvfunc    inverse transition for moving-agg mode
mfinalfunc    final function for moving-agg mode
mstype        state datatype for moving-agg mode
msspace        space estimate for mstype
minitcond    initial state value for moving-agg mode

and new pg_aggregate columns

aggmtransfn       | regproc  | not null
aggminvtransfn    | regproc  | not null
aggmfinalfn       | regproc  | not null
aggmtranstype     | oid      | not null
aggmtransspace    | integer  | not null
aggminitval       | text     | 

It's a bit unfortunate that the catalog column names aren't quite on
the same page as CREATE AGGREGATE, but it doesn't seem like a good
idea to try to fix that now.
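
For illustration, a user-defined moving aggregate spelled with those
parameters might look like this (an untested sketch; "moving_sum" is
made up, int4pl/int4mi are the existing built-in functions):

CREATE AGGREGATE moving_sum (int4)
(
    sfunc     = int4pl,    -- plain forward transition
    stype     = int4,
    initcond  = '0',
    msfunc    = int4pl,    -- forward transition, moving-agg mode
    minvfunc  = int4mi,    -- inverse transition: subtracts the departing input
    mstype    = int4,
    minitcond = '0'
);

SELECT moving_sum(n) OVER (ORDER BY n ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM generate_series(1, 10) g(n);

(With minitcond = '0' this returns 0 rather than NULL over an empty
frame, which sidesteps the NULL-counting business discussed upthread.)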
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> Yes, the idea had come up at some point during the review discussion. I
> agree that it's only worthwhile if it works for state type internal - though
> I think there ought to be a way to allow that. We could for example simply
> decree that the initfunc's first parameter must be of type internal if its
> return type is, and then just pass NULL for that parameter.

I had thought about that, but it doesn't really work since it'd be
violating the strictness spec of the function.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr11, 2014, at 19:42 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> Yes, the idea had come up at some point during the review discussion. I
>> agree that it's only worthwhile if it works for state type internal - though
>> I think there ought to be a way to allow that. We could for example simply
>> decree that the initfunc's first parameter must be of type internal if its
>> return type is, and then just pass NULL for that parameter.
>
> I had thought about that, but it doesn't really work since it'd be
> violating the strictness spec of the function.

Only if we insist on passing SQL NULL, not if we passed a non-NULL value
that happens to be (char*)0.

Or we could simply require the initfunc to be non-strict in the case of
state type internal.

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> OK, I'm marking this ready for committer attention, on the
> understanding that that doesn't include the invtrans_minmax patch.

I've finished reviewing/revising/committing this submission.
Some notes:

* I left out the EXPLAIN ANALYZE statistics, as I still feel that they're
of little if any use to regular users, and neither very well defined nor
well documented.  We can revisit that later of course.

* I also left out the table documenting which aggregates have this
optimization.  That's not the kind of thing we ordinarily document,
and it seems inevitable to me that such a table would be noteworthy
mostly for wrong/incomplete/obsolete information in the future.

* I rejected the invtrans_collecting sub-patch altogether.  I didn't
like anything about the idea of adding a poorly-documented field to
ArrayBuildState and then teaching some random subset of the functions
using that struct what to do with it.  I think it would've been simpler,
more reliable, and not that much less efficient to just do memmove's in
shiftArrayResult.  The string-aggregate changes were at least more
localized, but they still seemed awfully messy.  And there's a bigger
issue: these aggregates have to do O(N) work per row for a frame of length
N anyway, so it's not clear to me that there's any big-O gain to be had
from making them into moving aggregates.  I doubt people are going to be
using these aggregates with very wide frames, just because they're going
to be horribly expensive no matter what we do.

Anyway, this is nice forward progress for 9.4, even if we get no further.
        regards, tom lane



Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Florian Pflug
Date:
On Apr13, 2014, at 20:23 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> OK, I'm marking this ready for committer attention, on the
>> understanding that that doesn't include the invtrans_minmax patch.
> 
> I've finished reviewing/revising/committing this submission.

Cool! Thanks!

> Some notes:
> 
> * I left out the EXPLAIN ANALYZE statistics, as I still feel that they're
> of little if any use to regular users, and neither very well defined nor
> well documented.  We can revisit that later of course.
> 
> * I also left out the table documenting which aggregates have this
> optimization.  That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I can live with leaving each of these out individually, but leaving
them both out means there's now no way of determining how a particular
sliding window aggregation will behave. That leaves our users a bit
out in the cold IMHO.
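
(For what it's worth, the catalog at least shows which aggregates got an
inverse transition function, e.g.:

SELECT aggfnoid::regprocedure
FROM pg_aggregate
WHERE aggminvtransfn::oid <> 0;

though that only tells you the function exists, not how the machinery
will behave for a particular frame.)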

For example, with this patch you'd usually want to write SUM(numeric_value::numeric(n,m))
instead of SUM(numeric_value)::numeric(n,m)
in a sliding window aggregation, and there should be *some* way for
users to figure this out.
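
Concretely - "tbl" and its columns here are made up, but as far as I can
see this is the pattern one should write:

-- cast inside the aggregate: all inputs share one scale, so removing
-- a row from the frame can always use the inverse transition function
SELECT SUM(v::numeric(10,2))
       OVER (ORDER BY i ROWS BETWEEN 99 PRECEDING AND CURRENT ROW)
FROM tbl;

-- cast outside: SUM(numeric) must track the maximum input scale seen,
-- and removing the row that supplied it can force a rescan of the frame
SELECT (SUM(v) OVER (ORDER BY i ROWS BETWEEN 99 PRECEDING AND CURRENT ROW))::numeric(10,2)
FROM tbl;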

We have exhaustive tables of all the functions, operators and aggregates
we support, and maintenance of those seems to work well for the most
part. Why would a table of invertible aggregates be any different? If
it's the non-locality that bothers you - I guess the same information
could be added to the descriptions of the individual aggregates, instead
of having one big table.

> * I rejected the invtrans_collecting sub-patch altogether.  I didn't
> like anything about the idea of adding a poorly-documented field to
> ArrayBuildState and then teaching some random subset of the functions
> using that struct what to do with it.  I think it would've been simpler,
> more reliable, and not that much less efficient to just do memmove's in
> shiftArrayResult. The string-aggregate changes were at least more
> localized, but they still seemed awfully messy.  And there's a bigger
> issue: these aggregates have to do O(N) work per row for a frame of length
> N anyway, so it's not clear to me that there's any big-O gain to be had
> from making them into moving aggregates.

Yeah, those probably aren't super-useful. I initially added array_agg
because with the nonstrict-forward/strict-reverse rule still in place,
there wouldn't otherwise have been an in-core aggregate which exercises
the non-strict case, which seemed like a bad idea. For the string case -
I didn't expect that to turn out to be *quite* this messy when I started
implementing it.

> Anyway, this is nice forward progress for 9.4, even if we get no further.

Yup! Thanks to everybody who made this happen!

best regards,
Florian Pflug




Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
"David Rowley"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 14 April 2014 06:23
> To: Dean Rasheed
> Cc: Florian Pflug; David Rowley; Kevin Grittner; Josh Berkus; Greg Stark;
> PostgreSQL-development
> Subject: Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions
> (WIP)
> 
> Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> > OK, I'm marking this ready for committer attention, on the
> > understanding that that doesn't include the invtrans_minmax patch.
> 
> I've finished reviewing/revising/committing this submission.

That's great news!
Tom, thanks for taking the time to make all the modifications and commit
this.

Dean, Thank you for all your hard work reviewing the patch. The detail of
the review was quite impressive.

I'm really happy to see this make it in for 9.4.

Regards

David Rowley





Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From
Robert Haas
Date:
On Sun, Apr 13, 2014 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * I also left out the table documenting which aggregates have this
> optimization.  That's not the kind of thing we ordinarily document,
> and it seems inevitable to me that such a table would be noteworthy
> mostly for wrong/incomplete/obsolete information in the future.

I tend to think that not documenting such things is an error.  Sure,
the documentation could become obsolete, but I don't see why it's
particularly more likely with this table than anywhere else, and if it
does happen, and people care, they'll submit patches to fix it.  More
to the point, when we don't document things like this, it doesn't
cause the knowledge not to be important to end-users; it just means
that the knowledge lives on wiki pages, support fora, and the minds of
people who are "in the know" rather than being easily and generally
accessible.  I'd rather have documentation on this topic that was,
say, 80% correct than have none at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company