Thread: Multi-parameter aggregates.
I'm interested in defining a covariance aggregate function. (As a refresher, remember that covariance is a little bit like variance, but is between two variables: cov(X,Y) = <XY> - <X><Y>, where the angular brackets in this case denote taking the averag. Variance is a special case when X and Y are the same.) But the whole user-defined aggregate thing is tough to get a handle on. I'm not even sure if the direction I'm heading in below will actually work, but as far as I got, I'm stuck on not knowing how to define a aggregate that takes more that one variable as its argument, so its use in SQL would look like, e.g., SELECT company, COVAR(year, sales) FROM annual_sales GROUP BY company; Here is what I tried, and I'm wondering if the team here can help me make this work (or tell me that the whole approach makes no sense, etc.). All the DDL executes without syntactical errors until the last function definition, and the problem is with the " BASETYPE=numeric" line, i.e., "ERROR: AggregateCreate: function covariance_accum(numeric[], numeric) does not exist" CREATE TYPE public._covariance AS (n integer, x numeric, y numeric, xy numeric); CREATE OR REPLACE FUNCTION public.covariance_accum(_covariance, numeric, numeric) RETURNS _covariance AS ' BEGIN _covariance.n := _covariance.n+1; _covariance.x := _covariance.x+$2; _covariance.y := _covariance.x+$3; _covariance.xy:= _covariance.xy+($1*$2); END; 'LANGUAGE 'plpgsql' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION public.covariance_accum(_covariance, numeric, numeric) TO public; COMMENT ON FUNCTION public.covariance_accum(_covariance, numeric, numeric) IS 'covariance aggregate transition function'; -- Need to include a check for N equal zero data points CREATE OR REPLACE FUNCTION public.numeric_covariance(_covariance) RETURNS numeric AS ' BEGIN (_covariance.xy/_covariance.n) - (_covariance.x/_covariance.n)*(_covariance.y/_covariance.n); END; 'LANGUAGE 'plpgsql' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION public.numeric_covariance(_covariance) TO public; COMMENT ON FUNCTION public.numeric_covariance(_covariance) IS 'covariance aggregate final function'; CREATE AGGREGATE public.covariance( BASETYPE=numeric, SFUNC=covariance_accum, STYPE=numeric[], FINALFUNC=numeric_covariance, INITCOND='{0,0,0,0}' ); /* --I also tried this: CREATE AGGREGATE covariance( BASETYPE='numeric, numeric', SFUNC=covariance_accum, STYPE=numeric[], FINALFUNC=numeric_covariance, INITCOND='{0,0,0,0}' ); -- to no avail. */ Regards, Berend
Attachment
Berend Tober wrote: > I'm stuck on not knowing how to define a > aggregate that takes more that one variable as its argument This is currently unsupported. Joe
Hello Berend, have you considered using pl/r. http://www.joeconway.com/plr/ I think R got a covariance-function. http://www.r-project.org/ Best regards Hakan Kocaman Software-Developer digame.de GmbH Richard-Byrd-Str. 4-8 50829 Köln Tel.: +49 (0) 221 59 68 88 31 Fax: +49 (0) 221 59 68 88 98 Email: hakan.kocaman@digame.de > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Berend Tober > Sent: Monday, November 21, 2005 5:03 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Multi-parameter aggregates. > > > I'm interested in defining a covariance aggregate function. (As a > refresher, remember that covariance is a little bit like > variance, but > is between two variables: > > cov(X,Y) = <XY> - <X><Y>, > > where the angular brackets in this case denote taking the averag. > Variance is a special case when X and Y are the same.) > > But the whole user-defined aggregate thing is tough to get a > handle on. > I'm not even sure if the direction I'm heading in below will actually > work, but as far as I got, I'm stuck on not knowing how to define a > aggregate that takes more that one variable as its argument, > so its use > in SQL would look like, e.g., > > SELECT company, COVAR(year, sales) FROM annual_sales GROUP BY company; > > Here is what I tried, and I'm wondering if the team here can help me > make this work (or tell me that the whole approach makes no sense, > etc.). All the DDL executes without syntactical errors until the last > function definition, and the problem is with the " BASETYPE=numeric" > line, i.e., "ERROR: AggregateCreate: function > covariance_accum(numeric[], numeric) does not exist" > > CREATE TYPE public._covariance AS > (n integer, x numeric, y numeric, xy numeric); > > > CREATE OR REPLACE FUNCTION > public.covariance_accum(_covariance, numeric, > numeric) > RETURNS _covariance AS ' > BEGIN > _covariance.n := _covariance.n+1; > _covariance.x := _covariance.x+$2; > _covariance.y := _covariance.x+$3; > _covariance.xy:= _covariance.xy+($1*$2); > END; > 'LANGUAGE 'plpgsql' IMMUTABLE STRICT; > GRANT EXECUTE ON FUNCTION > public.covariance_accum(_covariance, numeric, > numeric) TO public; > COMMENT ON FUNCTION public.covariance_accum(_covariance, numeric, > numeric) IS 'covariance aggregate transition function'; > > > -- Need to include a check for N equal zero data points > > CREATE OR REPLACE FUNCTION public.numeric_covariance(_covariance) > RETURNS numeric AS ' > BEGIN > (_covariance.xy/_covariance.n) - > (_covariance.x/_covariance.n)*(_covariance.y/_covariance.n); > END; > 'LANGUAGE 'plpgsql' IMMUTABLE STRICT; > GRANT EXECUTE ON FUNCTION > public.numeric_covariance(_covariance) TO public; > COMMENT ON FUNCTION public.numeric_covariance(_covariance) IS > 'covariance aggregate final function'; > > > > CREATE AGGREGATE public.covariance( > BASETYPE=numeric, > SFUNC=covariance_accum, > STYPE=numeric[], > FINALFUNC=numeric_covariance, > INITCOND='{0,0,0,0}' > ); > > > /* > --I also tried this: > > CREATE AGGREGATE covariance( > BASETYPE='numeric, numeric', > SFUNC=covariance_accum, > STYPE=numeric[], > FINALFUNC=numeric_covariance, > INITCOND='{0,0,0,0}' > ); > > -- to no avail. > */ > > Regards, > Berend > > >
Joe Conway wrote: > Berend Tober wrote: > >> I'm stuck on not knowing how to define a aggregate that takes more >> that one variable as its argument But I guess it _could_ take an array as argument, maybe even a record (postgresql pseudonym for what's called a structure in C). You'd use it with the following syntax (array): select covariance(array[x, y]) from t ; or, for a "record"-parameter (I didn't test it - I just guess that it should work...) select covariance((x, y)) from t ; greetings, Florian Pflug
On Mon, Nov 21, 2005 at 11:03:22AM -0500, Berend Tober wrote: > I'm interested in defining a covariance aggregate function. (As a > refresher, remember that covariance is a little bit like variance, but > is between two variables: > > cov(X,Y) = <XY> - <X><Y>, > > where the angular brackets in this case denote taking the averag. > Variance is a special case when X and Y are the same.) > > But the whole user-defined aggregate thing is tough to get a handle on. > I'm not even sure if the direction I'm heading in below will actually > work, but as far as I got, I'm stuck on not knowing how to define a > aggregate that takes more that one variable as its argument, so its use > in SQL would look like, e.g., > > SELECT company, COVAR(year, sales) FROM annual_sales GROUP BY company; I think aggregates must take a single value, so the above won't work as written. However, in PostgreSQL 8.0 or later you could define the aggregate's base type to be a composite type and do something like SELECT company, COVAR(ROW(year, sales)) FROM annual_sales GROUP BY company; You'd create the aggregate like this: CREATE TYPE covar_state AS (...); CREATE TYPE xypair AS (x numeric, y numeric); CREATE FUNCTION covar_accum(covar_state, xypair) RETURNS covar_state AS ... CREATE FUNCTION covar_final(covar_state) RETURNS numeric AS ... CREATE AGGREGATE covar ( BASETYPE = xypair, SFUNC = covar_accum, FINALFUNC = covar_final, STYPE = covar_state, INITCOND = '(...)' ); -- Michael Fuhr
Berend Tober <btober@seaworthysys.com> writes: > I'm stuck on not knowing how to define a > aggregate that takes more that one variable as its argument, That's because there isn't any way to do that. It's on the TODO list I believe. In the meantime, you could possibly kluge it up by defining a composite type to be the aggregate's argument, and calling it like SELECT covariance(row(...)) FROM ... regards, tom lane
Hakan Kocaman wrote: >have you considered using pl/r. >http://www.joeconway.com/plr/ > >I think R got a covariance-function. >http://www.r-project.org/ > > > That would be, like, the easy way. Thanks! Berend
Attachment
Michael Fuhr wrote: >On Mon, Nov 21, 2005 at 11:03:22AM -0500, Berend Tober wrote: > > >>I'm interested in defining a covariance aggregate function. >> >> > >I think aggregates must take a single value, so the above won't >work as written. However, in PostgreSQL 8.0 or later you could >define the aggregate's base type to be a composite type > Thanks. I briefly explored something like that, using the existing POINT data type, but didn't press it too far, pending mailling list advice. Regards, Berend
Attachment
Michael Fuhr wrote: >On Mon, Nov 21, 2005 at 11:03:22AM -0500, Berend Tober wrote: > > >>I'm interested in defining a covariance aggregate function. (As a >> >> >...aggregates must take a single value, so the above won't >work as written. However, in PostgreSQL 8.0 or later you could >define the aggregate's base type to be a composite type ... > > Thanks for you help. Here is what I came up with, using the existing POINT data type, which when you think about it makes a lot of sense, and even thoug having to explicitly cast a pair of columns as a point type is a little cludgy, I'm get the analytical tool I need, regardless. CREATE TYPE public._regression AS ( n integer, x double precision, y double precision, xy double precision ); CREATE OR REPLACE FUNCTION public.weighted_average(double precision, double precision, double precision) RETURNS double precision AS ' BEGIN IF $1>1.0 THEN RAISE EXCEPTION ''Weighted average % coefficient exceeds unity.'', $1; ELSIF $1<0 THEN RAISE EXCEPTION ''Weighted average % coefficient less than zero.'', $1; END IF; RETURN $1*$2 + (1.-$1)*$3; END; 'LANGUAGE 'plpgsql' IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION public.regression_accum(_regression, point) RETURNS _regression AS ' SELECT $1.n+1, weighted_average($1.n/($1.n+1.), $1.x, $2[0]), weighted_average($1.n/($1.n+1.), $1.y, $2[1]), weighted_average($1.n/($1.n+1.), $1.xy, $2[0]*$2[1]); 'LANGUAGE 'sql' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION public.regression_accum(_regression, point) TO public; COMMENT ON FUNCTION public.regression_accum(_regression, point) IS 'regression aggregate transition function'; CREATE OR REPLACE FUNCTION public.covariance(_regression) RETURNS double precision AS ' BEGIN IF $1.n = 0 THEN RAISE EXCEPTION ''No covariance data''; END IF; RETURN $1.xy - $1.x*$1.y; END; 'LANGUAGE 'plpgsql' IMMUTABLE STRICT; GRANT EXECUTE ON FUNCTION public.covariance(_regression) TO public; COMMENT ON FUNCTION public.covariance(_regression) IS 'cov aggregate final function'; CREATE AGGREGATE public.cov( BASETYPE=point, SFUNC=regression_accum, STYPE=_regression, FINALFUNC=covariance, INITCOND='(0,0,0,0)' ); -- Examples CREATE LOCAL TEMPORARY TABLE test ( example integer, x numeric, y numeric ) without oids; -- Hand calculated: cov(x,y)=0.25 INSERT INTO test VALUES (1, 0.0, 0.0); INSERT INTO test VALUES (1, 1.0, 1.0); -- Hand calculated: cov(x,x)=0.389, cov(y,y)=1.556, cov(x,y)=0.778 INSERT INTO test VALUES (2, 0.0, 0.0); INSERT INTO test VALUES (2, 1.0, 2.0); INSERT INTO test VALUES (2, 1.5, 3.0); -- Hand calculated: cov(x,x)=0.264, cov(y,y)=0.192, cov(x,y)=0.214 INSERT INTO test VALUES (3, 0.25, 0.125); INSERT INTO test VALUES (3, 1.0, 1.0); INSERT INTO test VALUES (3, 1.5, 1.1); -- http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtmlnode22.html: cov(x,y) = -80.02 INSERT INTO test VALUES (4, 230, 125); INSERT INTO test VALUES (4, 181, 99); INSERT INTO test VALUES (4, 165, 97); INSERT INTO test VALUES (4, 150, 115); INSERT INTO test VALUES (4, 97, 120); INSERT INTO test VALUES (4, 192, 100); INSERT INTO test VALUES (4, 181, 80); INSERT INTO test VALUES (4, 189, 90); INSERT INTO test VALUES (4, 172, 95); INSERT INTO test VALUES (4, 170, 125); /* Note that the value COV(X,X) and COV(Y,Y) do not equal output from the native Postgresql aggregates VARIANCE(x) and VARIANCE(y), respectively. I cannot explain why they are different, but COV() is consistent with direct calculation. */ SELECT 'Example '||example AS title, COUNT(*) AS N, COV(POINT(x,x)) AS covXX, COV(POINT(y,y)) AS covYY, COV(POINT(x,y)) AS covXY, VARIANCE(x), VARIANCE(y) FROM test GROUP BY 1 ORDER BY 1;