Thread: Aggregate query for multiple records

Aggregate query for multiple records

From
Scott Gerhardt
Date:
Hello, I am new to the list, my apology if this question is beyond the
scope or charter of this list.

My questions is:
What is the best method to perform an aggregate query to calculate
sum() values for each distinct wid as in the example below, but except
for all wid's (not just WHERE wid='01/1-6-1-30w1/0').

Also, performance wise, would it be better to build a function for this
query.  The table has 9 million records and these aggregate queries
take hours.


SELECT
   SUM(oil) as sumoil, SUM(hours) as sumhours,
FROM
   (SELECT oil, hours prd_data WHERE wid='01/1-6-1-30w1/0'
    ORDER BY date LIMIT 6) subtable
;


Table description:
               Table "prd_data"
  Column |         Type          | Modifiers
--------+-----------------------+-----------
  date   | integer               |
  hours  | real                  |
  oil    | real                  |
  gas    | real                  |
  water  | real                  |
  pwid   | integer               |
  wid    | character varying(20) |
  year   | smallint              |
Indexes: wid_index6


Actual table (prd_data), 9 million records:

   date  | hours |  oil  | gas  | water | pwid |       wid       | year
--------+-------+-------+------+-------+------+-----------------+------
  196507 |   360 | 159.4 | 11.3 |  40.9 |  413 | 01/1-1-1-31w1/0 | 1965
  196508 |   744 |   280 |   20 |  27.2 |  413 | 01/1-1-1-31w1/0 | 1965
  196509 |   360 | 171.1 | 11.4 |  50.4 |  413 | 01/1-1-1-31w1/0 | 1965
  196510 |   744 | 202.1 |   25 |  89.8 |  413 | 01/1-1-1-31w1/0 | 1965
  196512 |   744 | 201.3 | 23.8 |  71.9 |  413 | 01/1-1-1-31w1/0 | 1965
  196511 |   720 |   184 | 17.6 |  78.9 |  413 | 01/1-1-1-31w1/0 | 1965
  196610 |   744 |  99.8 | 15.4 |  53.7 |  413 | 01/1-1-1-31w1/0 | 1966
  196612 |   744 |    86 | 12.8 |  36.1 |  413 | 01/1-1-1-31w1/0 | 1966
  196611 |   720 |    86 | 12.6 |  41.7 |  413 | 01/1-1-1-31w1/0 | 1966
  196601 |   744 | 191.6 | 22.6 |  50.7 |  413 | 01/1-1-1-31w1/0 | 1966
  200301 |   461 |  68.8 |   0 | 186.3 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200310 |   740 | 446.3 |   0 | 563.1 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200306 |   667 |  92.1 |   0 | 968.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200304 |     0 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200308 |   457 | 100.7 |   0 |  82.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200307 |   574 |    78 |   0 |   752 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200312 |   582 | 360.9 |   0 |   569 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200311 |   681 | 260.8 |   0 | 563.9 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200305 |   452 |     0 |   0 |     0 | 47899 | 9G/6-1-50-24w3/0 | 2003
  200309 |   637 | 244.6 |   0 | 193.8 | 47899 | 9G/6-1-50-24w3/0 | 2003
(20 rows)



Thanks,

--
Scott A. Gerhardt, P.Geo.
Gerhardt Information Technologies


Re: Aggregate query for multiple records

From
Greg Stark
Date:
Scott Gerhardt <scott@g-it.ca> writes:

> Hello, I am new to the list, my apology if this question is beyond the scope or
> charter of this list.

Not only is this on-charter, but this specific question comes up fairly often.

> My questions is:
> What is the best method to perform an aggregate query to calculate sum() values
> for each distinct wid as in the example below, but except for all wid's (not
> just WHERE wid='01/1-6-1-30w1/0').

This type of "top 6" or in this case "first 6" query is pretty tricky to do in
SQL. In fact the best solution anyone's proposed here uses non-standard
postgres extensions to define an aggregate that keeps an accumulation in an
array.

Something like (but I suppose you need reals, not integers):

test=> create or replace function first_6_accum (integer[], integer) returns integer[]
         language sql immutable as
        'select case when array_upper($1,1)>=6 then $1 else $1||$2 end';

test=> create function sum_6(integer[]) returns integer
         immutable language sql as
        'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]';

test=> create aggregate sum_first_6
         (basetype=integer, sfunc=first_6_accum, stype=integer[],initcond='{}',finalfunc=sum_6);

test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union
select5 union select 6 union select 7 union select 8) as x order by i asc) as x; 
 sum_first_6
-------------
          21
(1 row)

You'll need to select from a subquery that guarantees the correct ordering.
And then you'll need to do a GROUP BY wid. And then you should be aware that
some versions of postgres didn't always use a sorting method for the group by
that guaranteed the ordering of the subquery was preserved. I think you're
safe in 7.4 but you would have to test it.

--
greg