Re: [HACKERS] Implementing STDDEV and VARIANCE - Mailing list pgsql-hackers

From Jeroen van Vianen
Subject Re: [HACKERS] Implementing STDDEV and VARIANCE
Date
Msg-id 4.2.2.20000124111157.00aa2530@mail.design.nl
Whole thread Raw
In response to Re: [HACKERS] Implementing STDDEV and VARIANCE  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Implementing STDDEV and VARIANCE
List pgsql-hackers
At 11:27 PM 1/23/00 -0500, Tom Lane wrote:
>Jeroen van Vianen <jeroen@design.nl> writes:
> > aggname
> > aggowner
> > aggtype
> > aggtranstype           [ n, sx, sx2 ]
> > agginitfunction        function that does ( n = 0, sx = 0.0, sx2 = 0.0 )
> > aggtransfunction       function that does ( n = n + 1, sx = sx + x,
> >                          sx2 = sx2 + x * x )
> > aggfinalizefunction    function that returns (sx2 - (1/n) * sx * sx ) /
> > n
>
>Right, that's pretty much what I'm visualizing.  One minor detail: there
>is not an "agginitfunction", there is an "agginitvalue".  So your
>special datatype to hold n/sx/sx2 must have at least a typinput function
>that can convert the text string held in pg_aggregate into the desired
>internal form of the initial state.  (At least, that's how it's done
>now.  Do you want to argue to change it?  As long as we're opening up
>the AGG design for reconsideration, we could revisit that choice too.)

I would suggest supplying an initfunction that initializes the datatype 
that holds n/sx/sx2 so you're able to set individual members to NULL if so 
desired. I also won't need to implement typinput for all required aggregate 
types, one small headache less ;-)

count --> int4
min/max --> basetype
sum --> basetype
avg --> basetype, n
stddev, variance --> n, basetype, basetype

> > Might it be better for me to wait for 7.1 before implementing stddev and
> > variance?
>
>Well, you will need to be pretty fast on the draw if you want to get it
>into 7.0 release, since we will be freezing features for beta in a week.

True.

>But I see no reason that you couldn't implement STDDEV within the
>existing framework; just ignore transfn2 and do it as above.  You might
>have some problems with getting the desired response for zero or one
>tuples, but there isn't any way to fix that within the current
>framework :-(.  We have to do the function manager rewrite before you
>can have control over when to return a NULL.  As long as you are willing
>to live with that, you can have useful functionality now.

The problem with zero or one rows is pretty important IMO if you want to 
implement stddev and variance for both population and sample. You won't be 
able to explain the difference in outcome if you don't do it right.

Let me wait for you to overhaul the fmgr code and do all the aggregate 
stuff right in one sweep. Only thing is: how do we deal with current 
user-defined aggregates?

At 09:02 PM 1/23/00 -0800, Don Baccus wrote:
>At 11:27 PM 1/23/00 -0500, Tom Lane wrote:
> >Right, that's pretty much what I'm visualizing.  One minor detail: there
> >is not an "agginitfunction", there is an "agginitvalue".  So your
> >special datatype to hold n/sx/sx2 must have at least a typinput function
> >that can convert the text string held in pg_aggregate into the desired
> >internal form of the initial state.  (At least, that's how it's done
> >now.  Do you want to argue to change it?  As long as we're opening up
> >the AGG design for reconsideration, we could revisit that choice too.)
>
>At the moment I have a hard time visualizing an aggregate function where
>a constant initializer wouldn't serve, but ... what would be the cost of
>the generalization?  It would only be called once per query or subquery
>containing the aggregate, right?

Initializer functions for count need to return 0, for min, max, avg, sum, 
stddev and variance they need to set individual members to NULL (at least 
that's how I see it now). A function returning this (with the new fmgr 
code) would be very easy to implement (I hope ;-) ).

I'll hold my breath until the dust settles and we're starting 7.1.


Jeroen



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: [HACKERS] Use of Indicies ...
Next
From: Zeugswetter Andreas SB
Date:
Subject: AW: [HACKERS] Implementing STDDEV and VARIANCE