Re: [HACKERS] Standard Deviation function. - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Standard Deviation function.
Date
Msg-id 9806050815.AA04919@hawk.illustra.com
Whole thread Raw
In response to RE: [HACKERS] Standard Deviation function.  ("Stupor Genius" <stuporg@erols.com>)
List pgsql-hackers
>
> > > > I really need a Standard Deviation aggregate function...
> > >
> > > I thought about this a long time ago and had an idea but never
> > > got around to trying to implement it.  I was going to have some
> > > functions that worked on a structure of two doubles to track
> > > the sum and square instead of using only one simple type.
> >
> > I remember talking about this to someone, and the problem is that you
> > needed the average WHILE scanning through the table, which required two
> > passes, which the aggregate system is not designed to do.  I may be
> > wrong on this, though.
>
> I had asked you how to calculate this and the variance early last
> year.  One (I think the variance) was two-pass because of the need
> for the average, but I thought the StdDev would work with the struct.
>
> Been a while and I still haven't configured #(*&^ FreeBSD ppp yet.

The Perl Module "Statistics/Descriptive" has on the fly variance calculation.

sub add_data {
  my $self = shift;  ##Myself
  my $oldmean;
  my ($min,$mindex,$max,$maxdex);

  ##Take care of appending to an existing data set
  $min    = (defined ($self->{min}) ? $self->{min} : $_[0]);
  $max    = (defined ($self->{max}) ? $self->{max} : $_[0]);
  $maxdex = $self->{maxdex} || 0;
  $mindex = $self->{mindex} || 0;

  ##Calculate new mean, pseudo-variance, min and max;
  foreach (@_) {
    $oldmean = $self->{mean};
    $self->{sum} += $_;
    $self->{count}++;
    if ($_ >= $max) {
      $max = $_;
      $maxdex = $self->{count}-1;
    }
    if ($_ <= $min) {
      $min = $_;
      $mindex = $self->{count}-1;
    }
    $self->{mean} += ($_ - $oldmean) / $self->{count};
    $self->{pseudo_variance} += ($_ - $oldmean) * ($_ - $self->{mean});
  }

  $self->{min}          = $min;
  $self->{mindex}       = $mindex;
  $self->{max}          = $max;
  $self->{maxdex}       = $maxdex;
  $self->{sample_range} = $self->{max} - $self->{min};
  if ($self->{count} > 1) {
    $self->{variance}     = $self->{pseudo_variance} / ($self->{count} -1);
    $self->{standard_deviation}  = sqrt( $self->{variance});
  }
  return 1;
}


-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Don't worry about people stealing your ideas.  If your ideas are any
 good, you'll have to ram them down people's throats." -- Howard Aiken

pgsql-hackers by date:

Previous
From: Andreas Zeugswetter
Date:
Subject: AW: [HACKERS] Standard Deviation function.
Next
From: Maarten Boekhold
Date:
Subject: Re: [GENERAL] Re: [HACKERS] NEW POSTGRESQL LOGOS