Re: Optimizing sum() operations - Mailing list pgsql-novice

From Harold A. Giménez Ch.
Subject Re: Optimizing sum() operations
Date
Msg-id c807ef1a0810031810y2f802b71s7e19e2f8cdc67463@mail.gmail.com
In response to Re: Optimizing sum() operations  ("Dobes Vandermeer" <dobesv@gmail.com>)
List pgsql-novice
Your best bet is probably to do some bookkeeping in a separate table, and keep it up to date with the aggregations you will require (i.e., cache them). Maintain this table via triggers on inserts to the main table, or from whichever mechanism inserts the data in the first place, such as a web app, a script, etc.

This would definitely scale, although the indexing strategy proposed earlier seems exotic ;)
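A minimal sketch of that summary-table idea, assuming a hypothetical records(who, date, amount) table; the cache table, function, and trigger names here are illustrative, not from the thread:

```sql
-- Hypothetical cache of per-(who, date) totals for records(who, date, amount).
CREATE TABLE record_totals (
    who   integer NOT NULL,
    date  date    NOT NULL,
    total numeric NOT NULL DEFAULT 0,
    PRIMARY KEY (who, date)
);

CREATE OR REPLACE FUNCTION maintain_record_totals() RETURNS trigger AS $$
BEGIN
    -- Try to bump an existing row; insert one if it isn't there yet.
    -- Note: under concurrent inserts this update/insert pair can race,
    -- so serialize or lock as appropriate for your workload.
    UPDATE record_totals
       SET total = total + NEW.amount
     WHERE who = NEW.who AND date = NEW.date;
    IF NOT FOUND THEN
        INSERT INTO record_totals (who, date, total)
        VALUES (NEW.who, NEW.date, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER records_totals_trg
AFTER INSERT ON records
FOR EACH ROW EXECUTE PROCEDURE maintain_record_totals();
```

A historical total then becomes a sum over the much smaller cache table: SELECT sum(total) FROM record_totals WHERE who = X AND date BETWEEN '...' AND '...';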

On Fri, Oct 3, 2008 at 6:38 PM, Dobes Vandermeer <dobesv@gmail.com> wrote:
On Fri, Oct 3, 2008 at 12:23 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
> On Fri, Oct 3, 2008 at 3:13 PM, Dobes Vandermeer <dobesv@gmail.com> wrote:
>> On Fri, Oct 3, 2008 at 4:51 AM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>>> On Fri, Oct 3, 2008 at 4:51 AM, Dobes Vandermeer <dobesv@gmail.com> wrote:
>>>> I'm currently using sum() to compute historical values in reports;
>>>> basically select sum(amount) on records where date <= '...' and
>>>> date >= '...' and who = X.
>>>>
>>>> Second, if this is a concern, is there a best practice for optimizing
>>>> these kinds of queries?
>>>
>>> You'll need to test to see what performance you get.  That said,
>>> indexing is a good place to start.  You can always run explain and
>>> explain analyze on the queries to double-check the planner.
>>
>> Could I create an index that includes a sum() function - like:
>>
>> create index sumidx on records (who, date, sum(amount)) ?
>>
>> I'm sure that theoretically this is possible, but does postgres support it?
>
> I'm not sure what you want to do.  Trying to make an index on a sum()
> doesn't make any sense because the sum() depends on the rows used in a
> query; i.e., sum() is an aggregate and cannot be used in an index.

Well, if you think about the structure of a B-Tree, each page of the
index contains a list of references to other subtrees, and the key
value ranges for those subtrees.  In the case of sum() you could cache
the sum total of that column for an entire subtree and, if the range
of the query includes the whole subtree, you could skip descending
into the subtree and take the sum straight from that page in the
index.

This is just a general theory that occurred to me; it's probably a
pretty specialized kind of indexing that isn't supported by any RDBMS,
but it's possible there's a postgres plugin that does this.  Has
anyone heard of something like that?

> What is wrong with an index on who and date and then doing the sum?

I think that if there are a lot of rows that match the query, it'll
take a long time, so I thought I'd start inquiring about whether
anyone has a good algorithm for accelerating these kinds of queries.
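To see whether a plain (who, date) index actually helps here, the explain analyze check suggested earlier in the thread would look something like this (the literal values are placeholders):

```sql
-- Compare the plan and timings with and without an index on (who, date).
EXPLAIN ANALYZE
SELECT sum(amount)
  FROM records
 WHERE who = 42
   AND date BETWEEN '2008-01-01' AND '2008-09-30';
```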

--

Dobes Vandermeer
Director, Habitsoft Inc.
dobesv@habitsoft.com
778-891-2922

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice
