Thread: aggregate question

aggregate question

From
Ulf Mehlig
Date:
I posted the questions below a few days ago to the SQL list, but
apparently there is nobody there who can help -- maybe I'm luckier in
this list?  :-)

----------------------------------------------------------------------
I'm a novice to postgresql, so please excuse me if I'm addressing to
the wrong mailing list!

I need to specify some column functions for statistical analysis,
e.g. standard deviation or variance. After reading the user's guide I
think that's to realize with aggregates, but with those docs only I
cannot figure out how to do it -- does anybody know whether there is a
ready-for-use "statistics" package, or how to write something that's
usable in a SQL statement like this:

          select date, avg(temperature), stddev(temperature)
          from temperatures
          where date between '01.01.1999' and '15.02.1999'
          group by date                                      ?

Other question: I'm used to use DB2/2, and DB2/2 has functions like
"month(some_date)" and "hour(some_timestamp)". Are there equivalents
in postgresql?

Many thanks in advance for help + patience!
Ulf

--
======================================================================
 %%%%%            Ulf Mehlig              <ulf.mehlig@uni-bremen.de>
   %%%%!%%%       Projekt "MADAM"         <umehlig@uni-bremen.de>
%%%% %!% %%%%     ----------------------------------------------------
 ---| %%%         MADAM:  MAngrove    |  Center for Tropical Marine
    ||--%!%              Dynamics     |  Biology
    ||                  And           |  Fahrenheitstrasse 1
 _ /||\_/\_            Management     |
/  /    \  \ ~~~~~~~~~~~~~~~~~        |  28359 Bremen/Germany
  ~~~~~~~~~~~~~~~~~~~~

Re: [GENERAL] aggregate question

From
Sferacarta Software
Date:
Hello Ulf,

mercoledì, 21 ottobre 98, you wrote:

UM> I posted the questions below a few days ago to the SQL list, but
UM> apparently there is nobody there who can help -- maybe I'm luckier in
UM> this list?  :-)

UM> ----------------------------------------------------------------------
UM> I'm a novice to postgresql, so please excuse me if I'm addressing to
UM> the wrong mailing list!

UM> I need to specify some column functions for statistical analysis,
UM> e.g. standard deviation or variance. After reading the user's guide I
UM> think that's to realize with aggregates, but with those docs only I
UM> cannot figure out how to do it -- does anybody know whether there is a
UM> ready-for-use "statistics" package, or how to write something that's
UM> usable in a SQL statement like this:

UM>           select date, avg(temperature), stddev(temperature)
UM>           from temperatures              ^^^^^^
UM>           where date between '01.01.1999' and '15.02.1999'
UM>           group by date                                      ?
This query is OK, except for STDDEV function, you have to create this
function, I think we don't have it on PostgreSQL.

UM> Other question: I'm used to use DB2/2, and DB2/2 has functions like
UM> "month(some_date)" and "hour(some_timestamp)". Are there equivalents
UM> in postgresql?

There's the SQL92 EXTRACT function...
look, this is the actual date and time:

prova=> select current_timestamp as today;
today
----------------------
1998-10-21 17:03:16+02
(1 row)

How to display the month:
prova=> select extract(month from current_date) as month;
month
-----
   10
(1 row)

Unfortunately, for now, the EXTRACT function works only with DATE, DATETIME
and TIMESPAN data types. You have to CAST a TIMESTAMP or a TIME as DATETIME to
extract a field from it as in:

prova=> select extract(hour from cast(current_timestamp as datetime))
        as hour;
hour
----
  16
(1 row)

Jose'



Re: [GENERAL] aggregate question

From
Ulf Mehlig
Date:
Hello out there,

many thanks for everybody helping with extracting month information
and so on from date columns.

Realizing that apparently a package with simple statistical
functions/aggregates like standard deviation doesn't exist, I have to
write one on my own. But, as stated earlier, I have problems
understanding the docs (i.e., the manual section "Extending SQL:
Aggregates"). This chapter is a little bit laconic (and others are,
too), and at least for the newcomers it is sometimes difficult to
guess the overall definition principle out of the three examples using
three unexplained for two explained keywords or functions.

I have the suspection that I will spend some hours in front of the
screen before I get a useful do-it-yourself-solution -- if anybody
knows something about more fool-proof documentation texts, or could
send me her or his own aggregate-defining-statements, I would be very
happy ;-)

Thanks in advance,
Ulf

--
======================================================================
 %%%%%            Ulf Mehlig              <ulf.mehlig@uni-bremen.de>
   %%%%!%%%       Projekt "MADAM"         <umehlig@uni-bremen.de>
%%%% %!% %%%%     ----------------------------------------------------
 ---| %%%         MADAM:  MAngrove    |  Center for Tropical Marine
    ||--%!%              Dynamics     |  Biology
    ||                  And           |  Fahrenheitstrasse 1
 _ /||\_/\_            Management     |
/  /    \  \ ~~~~~~~~~~~~~~~~~        |  28359 Bremen/Germany
  ~~~~~~~~~~~~~~~~~~~~

Re: [GENERAL] aggregate question

From
"William D. McCoy"
Date:
Ulf,  As a partial answer to your question about statistical functions
and std. deviation in particular, I have pulled out an example query
that I have used in the past as an example:


select lab_no, prep_no, avg(hai), max(hai), min(hai),
|/((sum(hai^2)-(count(hai)::float8)*(avg(hai)^2))/count(hai)::float8)
    as stdhai, count(hai), peak_values
from hyd_ratios
group by peak_values, lab_no, prep_no;


The expression on the second line of the query returns the standard
deviation of my data "hai" (which is float8) in this case.  Note that
count returns an integer that must be cast as float8.  One could
retrieve the variance in a similar way.  These are awkward to type,
but I usually keep queries in text files that I send to psql on a
command line.  Of course, a nice set of statistical functions would be
welcome.

--
William D. McCoy
Geosciences
University of Massachusetts
Amherst, MA  01003