Thread: aggregate question
I posted the questions below a few days ago to the SQL list, but apparently there is nobody there who can help -- maybe I'm luckier in this list? :-) ---------------------------------------------------------------------- I'm a novice to postgresql, so please excuse me if I'm addressing to the wrong mailing list! I need to specify some column functions for statistical analysis, e.g. standard deviation or variance. After reading the user's guide I think that's to realize with aggregates, but with those docs only I cannot figure out how to do it -- does anybody know whether there is a ready-for-use "statistics" package, or how to write something that's usable in a SQL statement like this: select date, avg(temperature), stddev(temperature) from temperatures where date between '01.01.1999' and '15.02.1999' group by date ? Other question: I'm used to use DB2/2, and DB2/2 has functions like "month(some_date)" and "hour(some_timestamp)". Are there equivalents in postgresql? Many thanks in advance for help + patience! Ulf -- ====================================================================== %%%%% Ulf Mehlig <ulf.mehlig@uni-bremen.de> %%%%!%%% Projekt "MADAM" <umehlig@uni-bremen.de> %%%% %!% %%%% ---------------------------------------------------- ---| %%% MADAM: MAngrove | Center for Tropical Marine ||--%!% Dynamics | Biology || And | Fahrenheitstrasse 1 _ /||\_/\_ Management | / / \ \ ~~~~~~~~~~~~~~~~~ | 28359 Bremen/Germany ~~~~~~~~~~~~~~~~~~~~
Hello Ulf, mercoledì, 21 ottobre 98, you wrote: UM> I posted the questions below a few days ago to the SQL list, but UM> apparently there is nobody there who can help -- maybe I'm luckier in UM> this list? :-) UM> ---------------------------------------------------------------------- UM> I'm a novice to postgresql, so please excuse me if I'm addressing to UM> the wrong mailing list! UM> I need to specify some column functions for statistical analysis, UM> e.g. standard deviation or variance. After reading the user's guide I UM> think that's to realize with aggregates, but with those docs only I UM> cannot figure out how to do it -- does anybody know whether there is a UM> ready-for-use "statistics" package, or how to write something that's UM> usable in a SQL statement like this: UM> select date, avg(temperature), stddev(temperature) UM> from temperatures ^^^^^^ UM> where date between '01.01.1999' and '15.02.1999' UM> group by date ? This query is OK, except for STDDEV function, you have to create this function, I think we don't have it on PostgreSQL. UM> Other question: I'm used to use DB2/2, and DB2/2 has functions like UM> "month(some_date)" and "hour(some_timestamp)". Are there equivalents UM> in postgresql? There's the SQL92 EXTRACT function... look, this is the actual date and time: prova=> select current_timestamp as today; today ---------------------- 1998-10-21 17:03:16+02 (1 row) How to display the month: prova=> select extract(month from current_date) as month; month ----- 10 (1 row) Unfortunately, for now, the EXTRACT function works only with DATE, DATETIME and TIMESPAN data types. You have to CAST a TIMESTAMP or a TIME as DATETIME to extract a field from it as in: prova=> select extract(hour from cast(current_timestamp as datetime)) as hour; hour ---- 16 (1 row) Jose'
Hello out there, many thanks for everybody helping with extracting month information and so on from date columns. Realizing that apparently a package with simple statistical functions/aggregates like standard deviation doesn't exist, I have to write one on my own. But, as stated earlier, I have problems understanding the docs (i.e., the manual section "Extending SQL: Aggregates"). This chapter is a little bit laconic (and others are, too), and at least for the newcomers it is sometimes difficult to guess the overall definition principle out of the three examples using three unexplained for two explained keywords or functions. I have the suspection that I will spend some hours in front of the screen before I get a useful do-it-yourself-solution -- if anybody knows something about more fool-proof documentation texts, or could send me her or his own aggregate-defining-statements, I would be very happy ;-) Thanks in advance, Ulf -- ====================================================================== %%%%% Ulf Mehlig <ulf.mehlig@uni-bremen.de> %%%%!%%% Projekt "MADAM" <umehlig@uni-bremen.de> %%%% %!% %%%% ---------------------------------------------------- ---| %%% MADAM: MAngrove | Center for Tropical Marine ||--%!% Dynamics | Biology || And | Fahrenheitstrasse 1 _ /||\_/\_ Management | / / \ \ ~~~~~~~~~~~~~~~~~ | 28359 Bremen/Germany ~~~~~~~~~~~~~~~~~~~~
Ulf, As a partial answer to your question about statistical functions and std. deviation in particular, I have pulled out an example query that I have used in the past as an example: select lab_no, prep_no, avg(hai), max(hai), min(hai), |/((sum(hai^2)-(count(hai)::float8)*(avg(hai)^2))/count(hai)::float8) as stdhai, count(hai), peak_values from hyd_ratios group by peak_values, lab_no, prep_no; The expression on the second line of the query returns the standard deviation of my data "hai" (which is float8) in this case. Note that count returns an integer that must be cast as float8. One could retrieve the variance in a similar way. These are awkward to type, but I usually keep queries in text files that I send to psql on a command line. Of course, a nice set of statistical functions would be welcome. -- William D. McCoy Geosciences University of Massachusetts Amherst, MA 01003