Thread: Standard Deviation function.

Standard Deviation function.

From
Chris Albertson
Date:
Hello,

I really need a Standard Deviation aggregate function so I will
try to write one.

I know about the man pages for "create aggregate" and "create
function".  Is there something else I should look at?
Just a few pointers could save me a few hours of hunting around.
All advice accepted.

It seems kind of hard to do with only two state functions unless
I "cheat".  I need to keep three values, Count, Sum, and Sum of
Squares.  I could use three static variables and have the final
function ignore its input and use the static vars instead.  This
will likely blow up if the new Standard Deviation aggregate is
used twice in the same select.

Any hints or advice??

If someone has this done already let me know.

I may want do a "median" aggregate function too as I'll need that
later.  This would require private storage and a sort.

Could you cc me at both addresses below as I move around between
them

Thanks,

--
--Chris Albertson

  chrisja@jps.net
  chris@topdog.logicon.com                Voice:  626-351-0089  X127
                                            Fax:  626-351-0699

RE: [HACKERS] Standard Deviation function.

From
"Stupor Genius"
Date:
> I really need a Standard Deviation aggregate function...
>
> I know about the man pages for "create aggregate" and "create
> function".  Is there something else I should look at?
>
> It seems kind of hard to do with only two state functions unless
> I "cheat".  I need to keep three values, Count, Sum, and Sum of
> Squares.
>
> Any hints or advice??

I thought about this a long time ago and had an idea but never
got around to trying to implement it.  I was going to have some
functions that worked on a structure of two doubles to track
the sum and square instead of using only one simple type.

darrenk

Re: [HACKERS] Standard Deviation function.

From
Bruce Momjian
Date:
>
> > I really need a Standard Deviation aggregate function...
> >
> > I know about the man pages for "create aggregate" and "create
> > function".  Is there something else I should look at?
> >
> > It seems kind of hard to do with only two state functions unless
> > I "cheat".  I need to keep three values, Count, Sum, and Sum of
> > Squares.
> >
> > Any hints or advice??
>
> I thought about this a long time ago and had an idea but never
> got around to trying to implement it.  I was going to have some
> functions that worked on a structure of two doubles to track
> the sum and square instead of using only one simple type.

I remember talking about this to someone, and the problem is that you
needed the average WHILE scanning through the table, which required two
passes, which the aggregate system is not designed to do.  I may be
wrong on this, though.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

RE: [HACKERS] Standard Deviation function.

From
"Stupor Genius"
Date:
> > > I really need a Standard Deviation aggregate function...
> >
> > I thought about this a long time ago and had an idea but never
> > got around to trying to implement it.  I was going to have some
> > functions that worked on a structure of two doubles to track
> > the sum and square instead of using only one simple type.
>
> I remember talking about this to someone, and the problem is that you
> needed the average WHILE scanning through the table, which required two
> passes, which the aggregate system is not designed to do.  I may be
> wrong on this, though.

I had asked you how to calculate this and the variance early last
year.  One (I think the variance) was two-pass because of the need
for the average, but I thought the StdDev would work with the struct.

Been a while and I still haven't configured #(*&^ FreeBSD ppp yet.

darrenk

Re: [HACKERS] Standard Deviation function.

From
Bruce Momjian
Date:
> I had asked you how to calculate this and the variance early last
> year.  One (I think the variance) was two-pass because of the need
> for the average, but I thought the StdDev would work with the struct.

Variance is just square of std. dev, no?

>
> Been a while and I still haven't configured #(*&^ FreeBSD ppp yet.

Bummer.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] Standard Deviation function.

From
dg@illustra.com (David Gould)
Date:
>
> > > > I really need a Standard Deviation aggregate function...
> > >
> > > I thought about this a long time ago and had an idea but never
> > > got around to trying to implement it.  I was going to have some
> > > functions that worked on a structure of two doubles to track
> > > the sum and square instead of using only one simple type.
> >
> > I remember talking about this to someone, and the problem is that you
> > needed the average WHILE scanning through the table, which required two
> > passes, which the aggregate system is not designed to do.  I may be
> > wrong on this, though.
>
> I had asked you how to calculate this and the variance early last
> year.  One (I think the variance) was two-pass because of the need
> for the average, but I thought the StdDev would work with the struct.
>
> Been a while and I still haven't configured #(*&^ FreeBSD ppp yet.

The Perl Module "Statistics/Descriptive" has on the fly variance calculation.

sub add_data {
  my $self = shift;  ##Myself
  my $oldmean;
  my ($min,$mindex,$max,$maxdex);

  ##Take care of appending to an existing data set
  $min    = (defined ($self->{min}) ? $self->{min} : $_[0]);
  $max    = (defined ($self->{max}) ? $self->{max} : $_[0]);
  $maxdex = $self->{maxdex} || 0;
  $mindex = $self->{mindex} || 0;

  ##Calculate new mean, pseudo-variance, min and max;
  foreach (@_) {
    $oldmean = $self->{mean};
    $self->{sum} += $_;
    $self->{count}++;
    if ($_ >= $max) {
      $max = $_;
      $maxdex = $self->{count}-1;
    }
    if ($_ <= $min) {
      $min = $_;
      $mindex = $self->{count}-1;
    }
    $self->{mean} += ($_ - $oldmean) / $self->{count};
    $self->{pseudo_variance} += ($_ - $oldmean) * ($_ - $self->{mean});
  }

  $self->{min}          = $min;
  $self->{mindex}       = $mindex;
  $self->{max}          = $max;
  $self->{maxdex}       = $maxdex;
  $self->{sample_range} = $self->{max} - $self->{min};
  if ($self->{count} > 1) {
    $self->{variance}     = $self->{pseudo_variance} / ($self->{count} -1);
    $self->{standard_deviation}  = sqrt( $self->{variance});
  }
  return 1;
}


-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Don't worry about people stealing your ideas.  If your ideas are any
 good, you'll have to ram them down people's throats." -- Howard Aiken