Re: Optimizing sum() operations - Mailing list pgsql-novice

From Dobes Vandermeer
Subject Re: Optimizing sum() operations
Date
Msg-id 7324d9a20810031538q406daf89g94f7d57375c90c69@mail.gmail.com
Whole thread Raw
In response to Re: Optimizing sum() operations  ("Sean Davis" <sdavis2@mail.nih.gov>)
Responses Re: Optimizing sum() operations
Re: Optimizing sum() operations
List pgsql-novice
On Fri, Oct 3, 2008 at 12:23 PM, Sean Davis <sdavis2@mail.nih.gov> wrote:
> On Fri, Oct 3, 2008 at 3:13 PM, Dobes Vandermeer <dobesv@gmail.com> wrote:
>> On Fri, Oct 3, 2008 at 4:51 AM, Sean Davis <sdavis2@mail.nih.gov> wrote:
>>> On Fri, Oct 3, 2008 at 4:51 AM, Dobes Vandermeer <dobesv@gmail.com> wrote:
>>>> I'm currently using sum() to compute historical values in reports;
>>>> basically select sum(amount) on records where date <= '...' and date
>>>>>= '...' who = X.
>>>>
>>>> Second, if this is a concern, is there a best practice for optimizing
>>>> these kinds of queries?
>>>
>>> You'll need to test to see what performance you get.  That said,
>>> indexing is a good place to start.  You can always run explain and
>>> explain analyze on the queries to double-check the planner.
>>
>> Could I create an index that includes a sum() function - like:
>>
>> create index sumidx on records (who, date, sum(amount)) ?
>>
>> I'm sure that theoretically this is possible, but does postgres support it?
>
> I'm not sure what you want to do.  Trying to make an index on a sum()
> doesn't make any sense because the sum() depends on the rows used in a
> query; i.e., sum() is an aggregate and cannot be used in an index.

Well, if you think about the structure of a B-Tree, each page of the
index contains a list of references to other subtrees, and the key
value ranges for those subtrees.  In the case of sum() you could cache
the sum total of that column for an entire subtree and, if the range
of the query includes the whole subtree, you could skip descending
into the subtree and take the sum straight from that page in the
index.

This is a just a general theory that occurred to me, it's probably a
pretty specialized kind of indexing that isn't supported by any RDBMS,
but it's possible there could be a postgres plugin which did this,
though.  Has anyone heard of something like that?

> What is wrong with an index on who and date and then doing the sum?

I think that if there are a lot of rows that match the query, it'll
take a long time, so I thought I'd start inquiring about whether
anyone has a good algorithm for accelerating these kinds of queries.

--

Dobes Vandermeer
Director, Habitsoft Inc.
dobesv@habitsoft.com
778-891-2922

pgsql-novice by date:

Previous
From: "Sean Davis"
Date:
Subject: Re: Optimizing sum() operations
Next
From: Mark Roberts
Date:
Subject: Re: Optimizing sum() operations