Re: need to speed up query - Mailing list pgsql-performance

From Justin
Subject Re: need to speed up query
Date
Msg-id 48209863.9070709@emproshunts.com
Whole thread Raw
In response to Re: need to speed up query  (PFC <lists@peufeu.com>)
List pgsql-performance

PFC wrote:
>
>>>     What is a "period" ? Is it a month, or something more "custom" ?
>>> Can periods overlap ?
>
>> No periods can never overlap.  If the periods did you would be in
>> violation of many tax laws around the world. Plus it you would not
>> know how much money you are making or losing.
>
>     I was wondering if you'd be using the same query to compute how
> much was gained every month and every week, which would have
> complicated things.
>     But now it's clear.
>
>> To make this really funky you can have a Fiscal  Calendar year start
>> June 15 2008 and end on June 14 2009
>
>     Don't you just love those guys ? Always trying new tricks to make
> your life more interesting.

Thats been around been around a long time.  You can go back a few
hundreds years


>>>     Note that here you are scanning the entire table multiple times,
>>> the complexity of this is basically (rows in gltrans)^2 which is
>>> something you'd like to avoid.
>>>
>> For accounting purposes you need to know the Beginning Balances,
>> Debits,  Credits,  Difference between Debits to Credits and the
>> Ending Balance  for each account.  We have 133 accounts with
>> presently 12 periods defined so we end up 1596 rows returned for this
>> query.
>
>     Alright, I propose a solution which only works when periods don't
> overlap.
>     It will scan the entire table, but only once, not many times as
> your current query does.
>
>> So period 1 should have for the most part have Zero for Beginning
>> Balances for most types of Accounts.  Period 2 is Beginning Balance
>> is Period 1 Ending Balance, Period 3 is Period 2 ending balance so
>> and so on forever.
>
>     Precisely. So, it is not necessary to recompute everything for
> each period.
>     Use the previous period's ending balance as the current period's
> starting balance...
>
>     There are several ways to do this.
>     First, you could use your current query, but only compute the sum
> of what happened during a period, for each period, and store that in a
> temporary table.
>     Then, you use a plpgsql function, or you do that in your client,
> you take the rows in chronological order, you sum them as they come,
> and you get your balances. Use a NUMERIC type, not a FLOAT, to avoid
> rounding errors.
>
>     The other solution does the same thing but optimizes the first
> step like this :
>     INSERT INTO temp_table SELECT period, sum(...) GROUP BY period
>
>     To do this you must be able to compute the period from the date
> and not the other way around. You could store a period_id in your
> table, or use a function.
>
>     Another much more efficient solution would be to have a summary
> table which keeps the summary data for each period, with beginning
> balance and end balance. This table will only need to be updated when
> someone finds an old receipt in their pocket or something.
>

As i posted earlier the software did do this but it has so many bugs
else where in the code it allows it get out of balance to what really is
happening.   I spent a several weeks trying to get this working and find
all the places it  went wrong.  I gave up and did this query which took
a day write and balance to a point that i turned it over to the
accountant.   I redid the front end and i'm off to the races and Fixing
other critical problems.

All i need to do is take Shanun Thomas code and replace the View this
select statement creates


>> This falls under the stupid question and i'm just curious what other
>> people think what makes a query complex?
>
>     I have some rather complex queries which postgres burns in a few
> milliseconds.
>     You could define complexity as the amount of brain sweat that went
> into writing that query.
>     You could also define complexity as O(n) or O(n^2) etc, for
> instance your query (as written) is O(n^2) which is something you
> don't want, I've seen stuff that was O(2^n) or worse, O(n!) in
> software written by drunk students, in this case getting rid of it is
> an emergency...
>

Thanks for your help and ideas i really appreciate it.

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Possible Redundancy/Performance Solution
Next
From: Tom Lane
Date:
Subject: Re: multiple joins + Order by + LIMIT query performance issue