Re: Aggregate query for multiple records - Mailing list pgsql-sql
From | Greg Stark |
---|---|
Subject | Re: Aggregate query for multiple records |
Date | |
Msg-id | 87eklrlkwh.fsf@stark.xeocode.com Whole thread Raw |
In response to | Re: Aggregate query for multiple records (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Aggregate query for multiple records
|
List | pgsql-sql |
Scott Gerhardt <scott@g-it.ca> writes: > Hello, I am new to the list, my apology if this question is beyond the scope or > charter of this list. > > My questions is: > What is the best method to perform an aggregate query to calculate sum() values > for each distinct wid as in the example below, but except for all wid's (not > just WHERE wid='01/1-6-1-30w1/0'). > > Also, performance wise, would it be better to build a function for this query. > The table has 9 million records and these aggregate queries take hours. The "top n" type query (or in this case "first n" or "last n" but it's the same thing) is actually very tricky to do in standard SQL. The best solution seen here for postgres is to use arrays and custom aggregate functions. The following is based on a previous answer from Tom Lane to a similar question. (I thought I already posted this for you on pgsql-general but the list archives are down and you don't seem to have seen it, so I'm resending it) It allows you to do the whole query with a single sort for the grouping and the ordering by date together. You would have to use it with something like: SELECT sum_first_6(oil) FROM (SELECT oil from prd_data ORDER BY wid, "date") GROUP BY wid If you pump up sort_mem enough -- you can do it within the session for the single connection using "SET sort_mem" it should be pretty fast. I think it's the best you're going to get. If you're absolutely sure the data is physically stored in chronological order -- which I would only feel comfortable with if you've never done any updates or deletes, only inserts and perhaps occasional truncates, then you might be able to get by without ordering and convince it to do a hash aggregate. That would be the optimal result, no sorts at all. But it would be hard to make sure it would always work. test=> create or replace function first_6_accum (integer[], integer) returns integer[] language sql immutable as 'selectcase when array_upper($1,1)>=6 then $1 else $1||$2 end'; CREATE FUNCTION test=> create function sum_6(integer[]) returns integer immutable language sql as 'select $1[1]+$1[2]+$1[3]+$1[4]+$1[5]+$1[6]'; CREATE FUNCTION test=> create aggregate sum_first_6 (basetype=integer, sfunc=first_6_accum, stype=integer[],initcond='{}',finalfunc=sum_6); CREATE AGGREGATE test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union select5 union select 6 union select 7 union select 8) as x order by i desc) as x;sum_first_6 ------------- 33 (1 row) test=> select sum_first_6(i) from (select i from (select 1 as i union select 2 union select 3 union select 4 union select5 union select 6 union select 7 union select 8) as x order by i asc) as x;sum_first_6 ------------- 21 (1 row) This can easily be switched around to make it "last_6" and you can write functions to handle 6 records or 9 records. And all of these could be combined in a single query, so you only have to do the sort once. Unfortunately you cannot make aggregate functions that take multiple parameters, nor can you pass extra parameters to the state function. So you'll have to create a separate set of functions for each variant. Also, you'll have to change it to use reals. -- greg