Thread: requesting features in PostgreSQL

requesting features in PostgreSQL

From

Andrew Gould

Date:

26 April 2002, 08:54:29

I would like to request a new aggregate function for
PostgreSQL.

I need an aggregate function that calculates geometric
mean, and can be used in SQL statements the same as
any other aggregate function such as count(), sum()
and avg().  Rather than have a custom function, I'd
like to see it added as a standard PostgreSQL feature.

Is there a specific PostgreSQL developer that I should
contact?  Or is an open request on this list
sufficient for consideration?

The geometric mean function, I'll call it gmean(), is
similar to avg(), except that instead of adding the
individual values, you would multiply them; and
instead of dividing the sum by the sample size, you
would raise the resulting product by a power of (1 /
sample size).

Therefore, the gmean() of the values 1, 2, 3 and 9
would equal:
  = (1 * 2 * 3 * 9) ^ (1 / 4)
  = 54 ^ 0.25
  = 2.710806  # rounded to 6 decimal places

This function differs from arithmetic mean (average)
in that it lessens the affect of outliers without
discounting them altogether.

Thanks,

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/

Re: requesting features in PostgreSQL

From

Tom Lane

Date:

26 April 2002, 11:19:05

Andrew Gould <andrewgould@yahoo.com> writes:
> I need an aggregate function that calculates geometric
> mean, and can be used in SQL statements the same as
> any other aggregate function such as count(), sum()
> and avg().  Rather than have a custom function, I'd
> like to see it added as a standard PostgreSQL feature.

One request does not strike me as sufficient reason to make it a
standard feature.  Extensibility is what Postgres is all about
--- so go ahead and write your own.

AFAICS this should take about ten minutes to prototype (two simple
plpgsql or pltcl functions and a user-defined aggregate).  If you intend
to process very large volumes of data, it might be worth rewriting the
transition function in C for speed.  (You could steal the existing
transition function for avg() as a model.)

            regards, tom lane

Re: requesting features in PostgreSQL

From

Andrew Gould

Date:

26 April 2002, 12:49:25

One request is certainly not sufficient for a new
standard feature.  As I see more industries becoming
data savvy, however, I think it's time will come.
Until then....

Simple python scripts are the extent of my coding
experience.  As I will be processing 2 to 6 million
records within one query, I guess I should learn some
C.

Once I untar PostgreSQL's source, where can I find the
code for avg()?  Will it have it's own file?

Thanks,

Andrew

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Gould <andrewgould@yahoo.com> writes:
> > I need an aggregate function that calculates
> geometric
> > mean, and can be used in SQL statements the same
> as
> > any other aggregate function such as count(),
> sum()
> > and avg().  Rather than have a custom function,
> I'd
> > like to see it added as a standard PostgreSQL
> feature.
>
> One request does not strike me as sufficient reason
> to make it a
> standard feature.  Extensibility is what Postgres is
> all about
> --- so go ahead and write your own.
>
> AFAICS this should take about ten minutes to
> prototype (two simple
> plpgsql or pltcl functions and a user-defined
> aggregate).  If you intend
> to process very large volumes of data, it might be
> worth rewriting the
> transition function in C for speed.  (You could
> steal the existing
> transition function for avg() as a model.)
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the
> unregister command
>     (send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)

__________________________________________________
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/

Re: requesting features in PostgreSQL

From

Tom Lane

Date:

26 April 2002, 14:30:07

Andrew Gould <andrewgould@yahoo.com> writes:
> Once I untar PostgreSQL's source, where can I find the
> code for avg()?  Will it have it's own file?

The C code is in src/backend/utils/adt/float.c --- look at
float8_accum() and float8_avg().

            regards, tom lane

Re: requesting features in PostgreSQL

From

Jeff Davis

Date:

27 April 2002, 04:19:14

I decided to use your problem as a little exercize for myself, so I came up
with a working aggregate for gmean. I posted the code below. I wrote it in
plpgsql. It's possible the "a1" function (the main part of the aggregate)
won't be the bottleneck for performance. You could probably translate to C,
and you can probably solve the problem more gracefully than I, but I have
working code. I couldn't think of what else to use as a state type, so I just
used a two-element array of floats. The first element holds the running
product (i.e. 1*2*3*9) and the second holds the number of records visited (4).

I would appreciate it if you (or anyone else) would post (or direct email)
what changes you make for efficiency, including if you write a c function for
a1 (a2 doesn't really need a C func unless you really want, since it should
only be called once per aggregation). I could probably help you translate to
C if you'd like, but first I'd like to make sure I have the most efficient
algorithm.

Regards,
    Jeff

______________________________________
create function a1f(float[2]) returns float as '
BEGIN
return ($1[1]^(1/($1[2])));
END;
' language 'plpgsql';

create function a1(float[2],float) returns float[2] as '
DECLARE
ret float[2];
BEGIN
ret := ''{'' || (($1[1]) * ($2)) || '','' || (($1[2]) + 1) || ''}'';
RETURN ret;
END;
' language 'plpgsql';

create aggregate a2
(basetype=float,sfunc=a1,stype=float[],finalfunc=a1f,initcond='{1.0,0.0}');

Re: requesting features in PostgreSQL

From

Tom Lane

Date:

27 April 2002, 11:39:26

Jeff Davis <list-pgsql-general@empires.org> writes:
> create function a1(float[2],float) returns float[2] as '
> DECLARE
> ret float[2];
> BEGIN
> ret := ''{'' || (($1[1]) * ($2)) || '','' || (($1[2]) + 1) || ''}'';
> RETURN ret;
> END;
> ' language 'plpgsql';

We really need better support for arrays in plpgsql :-(.  The above will
work, but it invokes conversion of floats to text and back again on
every call; that's slow and will probably cause accumulation of roundoff
errors in the aggregate result.

I tried to do this:

create function a1(float[2],float) returns float[2] as '
declare
ret float[2];
begin
ret := $1;
ret[1] := ret[1] * $2;
ret[2] := ret[2] + 1;
return ret;
end' language plpgsql;

but it failed with syntax errors --- plpgsql doesn't understand the
notation "var[subscript] := something".  Someone oughta dig into it
and fix that.

In the meantime I think the most practical way to do this task in
plpgsql is to abuse the built-in "point" type, which can be treated
as an array of 2 floats:

regression=# create function a1(point,float) returns point as '
regression'# begin
regression'# return point($1[0] * $2, $1[1] + 1);
regression'# end' language plpgsql;
CREATE
regression=# select a1('(2,3)'::point, 44);
   a1
--------
 (88,4)
(1 row)

Note that the subscripts are [0],[1] not [1],[2] ... a bit of legacy
incompatibility ...

            regards, tom lane

Re: requesting features in PostgreSQL

From

Jeff Davis

Date:

27 April 2002, 14:43:35

Yup, I tried the same exact thing :) I wouldn't mind seeing some better
support in plpgsql either. However, anyone can create their own type so I
guess it isn't much of a problem.

I updated my code to use points instead, and more meaningful identifiers.

-----------------------------------------------------------
create function float_gmean(point) returns float as '
BEGIN
return ($1[0]^(1/($1[1])));
END;
' language 'plpgsql';

create function float_gmean_accum(point,float) returns point as '
BEGIN
RETURN point(($1[0]*$2),($1[1]+1));
END;
' language 'plpgsql';

create aggregate gmean
(basetype=float,sfunc=float_gmean_accum,stype=point,finalfunc=float_gmean,initcond='(1.0,0.0)');
-----------------------------------------------------------


> I tried to do this:
>
> create function a1(float[2],float) returns float[2] as '
> declare
> ret float[2];
> begin
> ret := $1;
> ret[1] := ret[1] * $2;
> ret[2] := ret[2] + 1;
> return ret;
> end' language plpgsql;
>
> but it failed with syntax errors --- plpgsql doesn't understand the
> notation "var[subscript] := something".  Someone oughta dig into it
> and fix that.
>
> In the meantime I think the most practical way to do this task in
> plpgsql is to abuse the built-in "point" type, which can be treated
> as an array of 2 floats:
>
> regression=# create function a1(point,float) returns point as '
> regression'# begin
> regression'# return point($1[0] * $2, $1[1] + 1);
> regression'# end' language plpgsql;
> CREATE
> regression=# select a1('(2,3)'::point, 44);
>    a1
> --------
>  (88,4)
> (1 row)
>
> Note that the subscripts are [0],[1] not [1],[2] ... a bit of legacy
> incompatibility ...
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Re: requesting features in PostgreSQL

From

Jeff Davis

Date:

27 April 2002, 14:59:51

I was looking at the code. Why do you calculate "sumX2" in float8_accum?
Later you have a comment "ignore sumX2" in float8_avg().

Regards,
    Jeff

On Friday 26 April 2002 11:30 am, Tom Lane wrote:
> Andrew Gould <andrewgould@yahoo.com> writes:
> > Once I untar PostgreSQL's source, where can I find the
> > code for avg()?  Will it have it's own file?
>
> The C code is in src/backend/utils/adt/float.c --- look at
> float8_accum() and float8_avg().
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Re: requesting features in PostgreSQL

From

Tom Lane

Date:

27 April 2002, 17:38:13

Jeff Davis <list-pgsql-general@empires.org> writes:
> I was looking at the code. Why do you calculate "sumX2" in float8_accum?

Because the same accumulator function is also used for stddev.

            regards, tom lane

Re: requesting features in PostgreSQL

From

Andrew Gould

Date:

27 April 2002, 18:24:51

I want to thank you both for your help on this.  I
read the C source and the plpgsql below.  I think I'm
understanding what I'm seeing.

I will be out of town for several days; but will try
out the plpsql solution next week.  (If my luck in
airports holds true, I'll have plenty of time to read
the documentation.)

Thanks again,

Andrew

--- Jeff Davis <list-pgsql-general@empires.org> wrote:
> Yup, I tried the same exact thing :) I wouldn't mind
> seeing some better
> support in plpgsql either. However, anyone can
> create their own type so I
> guess it isn't much of a problem.
>
> I updated my code to use points instead, and more
> meaningful identifiers.
>
>
-----------------------------------------------------------
> create function float_gmean(point) returns float as
> '
> BEGIN
> return ($1[0]^(1/($1[1])));
> END;
> ' language 'plpgsql';
>
> create function float_gmean_accum(point,float)
> returns point as '
> BEGIN
> RETURN point(($1[0]*$2),($1[1]+1));
> END;
> ' language 'plpgsql';
>
> create aggregate gmean
>
(basetype=float,sfunc=float_gmean_accum,stype=point,finalfunc=float_gmean,initcond='(1.0,0.0)');
>
-----------------------------------------------------------
>
>
> > I tried to do this:
> >
> > create function a1(float[2],float) returns
> float[2] as '
> > declare
> > ret float[2];
> > begin
> > ret := $1;
> > ret[1] := ret[1] * $2;
> > ret[2] := ret[2] + 1;
> > return ret;
> > end' language plpgsql;
> >
> > but it failed with syntax errors --- plpgsql
> doesn't understand the
> > notation "var[subscript] := something".  Someone
> oughta dig into it
> > and fix that.
> >
> > In the meantime I think the most practical way to
> do this task in
> > plpgsql is to abuse the built-in "point" type,
> which can be treated
> > as an array of 2 floats:
> >
> > regression=# create function a1(point,float)
> returns point as '
> > regression'# begin
> > regression'# return point($1[0] * $2, $1[1] + 1);
> > regression'# end' language plpgsql;
> > CREATE
> > regression=# select a1('(2,3)'::point, 44);
> >    a1
> > --------
> >  (88,4)
> > (1 row)
> >
> > Note that the subscripts are [0],[1] not [1],[2]
> ... a bit of legacy
> > incompatibility ...
> >
> >             regards, tom lane
> >
> > ---------------------------(end of
> broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster


__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com