Thread: need to speed up query

need to speed up query

From
Justin
Date:
I've had to write queries to get trial balance values out of the GL
transaction table, and I'm not happy with their performance

The table has 76K rows, growing about 1,000 rows per working day, so the
performance is not that great: it takes about 20 to 30 seconds to get all
the records for the table, and when we limit it to a single accounting
period it drops down to 2 seconds

Here is the query and EXPLAIN ANALYZE.  PostgreSQL is 8.3.1 on a new
server with RAID 10 Serial SCSI.

SELECT period.period_id,
    period.period_start,
    period.period_end,
    accnt.accnt_id,
    accnt.accnt_number,
    accnt.accnt_descrip,
    period.period_yearperiod_id,
    accnt.accnt_type,
    COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
        FROM gltrans
        WHERE gltrans.gltrans_date < period.period_start
            AND gltrans.gltrans_accnt_id = accnt.accnt_id
            AND gltrans.gltrans_posted = true), 0.00)::text::money AS beginbalance,
    COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
        FROM gltrans
        WHERE gltrans.gltrans_date <= period.period_end
            AND gltrans.gltrans_date >= period.period_start
            AND gltrans.gltrans_amount <= 0::numeric
            AND gltrans.gltrans_accnt_id = accnt.accnt_id
            AND gltrans.gltrans_posted = true), 0.00)::text::money AS negative,
    COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
        FROM gltrans
        WHERE gltrans.gltrans_date <= period.period_end
            AND gltrans.gltrans_date >= period.period_start
            AND gltrans.gltrans_amount >= 0::numeric
            AND gltrans.gltrans_accnt_id = accnt.accnt_id
            AND gltrans.gltrans_posted = true), 0.00)::text::money AS positive,
    COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
        FROM gltrans
        WHERE gltrans.gltrans_date <= period.period_end
            AND gltrans.gltrans_date >= period.period_start
            AND gltrans.gltrans_accnt_id = accnt.accnt_id
            AND gltrans.gltrans_posted = true), 0.00)::text::money AS difference,
    COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
        FROM gltrans
        WHERE gltrans.gltrans_date <= period.period_end
            AND gltrans.gltrans_accnt_id = accnt.accnt_id
            AND gltrans.gltrans_posted = true), 0.00)::text::money AS endbalance
FROM period, accnt
ORDER BY period.period_id, accnt.accnt_number;

"Sort  (cost=4083970.56..4083974.89 rows=1729 width=57) (actual
time=24680.402..24681.386 rows=1729 loops=1)"
"  Sort Key: period.period_id, accnt.accnt_number"
"  Sort Method:  quicksort  Memory: 292kB"
"  ->  Nested Loop  (cost=1.14..4083877.58 rows=1729 width=57) (actual
time=4.043..24674.258 rows=1729 loops=1)"
"        ->  Seq Scan on accnt  (cost=0.00..4.33 rows=133 width=41)
(actual time=0.011..0.158 rows=133 loops=1)"
"        ->  Materialize  (cost=1.14..1.27 rows=13 width=16) (actual
time=0.001..0.010 rows=13 loops=133)"
"              ->  Seq Scan on period  (cost=0.00..1.13 rows=13
width=16) (actual time=0.005..0.023 rows=13 loops=1)"
"        SubPlan"
"          ->  Aggregate  (cost=1093.64..1093.65 rows=1 width=8) (actual
time=6.039..6.039 rows=1 loops=1729)"
"                ->  Bitmap Heap Scan on gltrans  (cost=398.21..1092.18
rows=585 width=8) (actual time=5.171..5.623 rows=428 loops=1729)"
"                      Recheck Cond: ((gltrans_accnt_id = $1) AND
(gltrans_date <= $3))"
"                      Filter: gltrans_posted"
"                      ->  BitmapAnd  (cost=398.21..398.21 rows=636
width=0) (actual time=5.158..5.158 rows=0 loops=1729)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
(actual time=0.078..0.078 rows=574 loops=1729)"
"                                  Index Cond: (gltrans_accnt_id = $1)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_date_idx  (cost=0.00..367.10 rows=25446 width=0) (actual
time=7.407..7.407 rows=63686 loops=1183)"
"                                  Index Cond: (gltrans_date <= $3)"
"          ->  Aggregate  (cost=58.19..58.20 rows=1 width=8) (actual
time=0.920..0.921 rows=1 loops=1729)"
"                ->  Bitmap Heap Scan on gltrans  (cost=38.90..58.16
rows=9 width=8) (actual time=0.843..0.878 rows=40 loops=1729)"
"                      Recheck Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0) AND (gltrans_accnt_id = $1))"
"                      Filter: gltrans_posted"
"                      ->  BitmapAnd  (cost=38.90..38.90 rows=10
width=0) (actual time=0.839..0.839 rows=0 loops=1729)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_date_idx  (cost=0.00..8.08 rows=382 width=0) (actual
time=0.782..0.782 rows=5872 loops=1729)"
"                                  Index Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0))"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
(actual time=0.076..0.076 rows=574 loops=798)"
"                                  Index Cond: (gltrans_accnt_id = $1)"
"          ->  Aggregate  (cost=58.20..58.21 rows=1 width=8) (actual
time=0.897..0.898 rows=1 loops=1729)"
"                ->  Bitmap Heap Scan on gltrans  (cost=38.89..58.19
rows=4 width=8) (actual time=0.845..0.874 rows=20 loops=1729)"
"                      Recheck Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0) AND (gltrans_accnt_id = $1))"
"                      Filter: (gltrans_posted AND (gltrans_amount >=
0::numeric))"
"                      ->  BitmapAnd  (cost=38.89..38.89 rows=10
width=0) (actual time=0.840..0.840 rows=0 loops=1729)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_date_idx  (cost=0.00..8.08 rows=382 width=0) (actual
time=0.783..0.783 rows=5872 loops=1729)"
"                                  Index Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0))"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
(actual time=0.077..0.077 rows=574 loops=798)"
"                                  Index Cond: (gltrans_accnt_id = $1)"
"          ->  Aggregate  (cost=58.20..58.21 rows=1 width=8) (actual
time=0.908..0.909 rows=1 loops=1729)"
"                ->  Bitmap Heap Scan on gltrans  (cost=38.89..58.19
rows=4 width=8) (actual time=0.854..0.885 rows=20 loops=1729)"
"                      Recheck Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0) AND (gltrans_accnt_id = $1))"
"                      Filter: (gltrans_posted AND (gltrans_amount <=
0::numeric))"
"                      ->  BitmapAnd  (cost=38.89..38.89 rows=10
width=0) (actual time=0.843..0.843 rows=0 loops=1729)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_date_idx  (cost=0.00..8.08 rows=382 width=0) (actual
time=0.785..0.785 rows=5872 loops=1729)"
"                                  Index Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0))"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
(actual time=0.078..0.078 rows=574 loops=798)"
"                                  Index Cond: (gltrans_accnt_id = $1)"
"          ->  Aggregate  (cost=1093.64..1093.65 rows=1 width=8) (actual
time=5.485..5.485 rows=1 loops=1729)"
"                ->  Bitmap Heap Scan on gltrans  (cost=398.21..1092.18
rows=585 width=8) (actual time=4.699..5.110 rows=388 loops=1729)"
"                      Recheck Cond: ((gltrans_accnt_id = $1) AND
(gltrans_date < $0))"
"                      Filter: gltrans_posted"
"                      ->  BitmapAnd  (cost=398.21..398.21 rows=636
width=0) (actual time=4.687..4.687 rows=0 loops=1729)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
(actual time=0.079..0.079 rows=574 loops=1729)"
"                                  Index Cond: (gltrans_accnt_id = $1)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_date_idx  (cost=0.00..367.10 rows=25446 width=0) (actual
time=6.717..6.717 rows=57814 loops=1183)"
"                                  Index Cond: (gltrans_date < $0)"
"Total runtime: 24682.580 ms"



Re: need to speed up query

From
"Scott Marlowe"
Date:
You're joining these two tables: period and accnt, but I'm not seeing an
ON () clause or a WHERE clause joining them.  Is the cross product
intentional?

But what I'm seeing that seems like the lowest-hanging fruit would be
two-column indexes on the columns that are showing up in those bitmap
scans.  Like this part:

"                      Recheck Cond: ((gltrans_date <= $3) AND
(gltrans_date >= $0) AND (gltrans_accnt_id = $1))"
"                      Filter: gltrans_posted"
"                      ->  BitmapAnd  (cost=38.90..38.90 rows=10
width=0) (actual time=0.839..0.839 rows=0 loops=1729)"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_date_idx  (cost=0.00..8.08 rows=382 width=0) (actual
time=0.782..0.782 rows=5872 loops=1729)"
"                                  Index Cond: ((gltrans_date <= $3)
AND (gltrans_date >= $0))"
"                            ->  Bitmap Index Scan on
gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
(actual time=0.076..0.076 rows=574 loops=798)"
"                                  Index Cond: (gltrans_accnt_id = $1)"

You are looking through 574 rows in one column and 5872 in another.
But when they're anded together, you get 0 rows.  A two column index
there should really help.
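
For example, a composite index along those lines (an untested sketch;
the names follow the plan above, so adjust them to your schema) might be:

```sql
-- Equality column first (account id), range column second (date).
CREATE INDEX gltrans_accnt_date_idx
    ON gltrans (gltrans_accnt_id, gltrans_date);

-- Optionally make it partial, since every subquery filters on posted:
-- CREATE INDEX gltrans_accnt_date_posted_idx
--     ON gltrans (gltrans_accnt_id, gltrans_date)
--     WHERE gltrans_posted;
```

With that in place the planner can satisfy both conditions from a single
index scan instead of BitmapAnd-ing two separate indexes.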

Re: need to speed up query

From
"Gregory Williamson"
Date:

Justin --

You wrote:

> I've had to write queries to get trial balance values out of the GL
> transaction table, and I'm not happy with their performance
>
>
> The table has 76K rows, growing about 1,000 rows per working day, so the
> performance is not that great: it takes about 20 to 30 seconds to get all
> the records for the table, and when we limit it to a single accounting
> period it drops down to 2 seconds

So 30 seconds for roughly 76 days' worth of numbers? Not terrible, but not great.

> Here is the query and explain .  PostgreSql  is 8.3.1 on new server with
> raid 10 Serial SCSI.
<... snipped 'cause I have a lame reader ...>

> "  Sort Method:  quicksort  Memory: 292kB"
<...snip...>
> "Total runtime: 24682.580 ms"


I don't have any immediate thoughts but maybe you could post the table schemas and indexes. It looks to my untutored eye as if most of the estimates are fair so I am guessing that you have run analyze recently.

What is your sort memory set to? If work_mem is too low, then you'll go to disk (if you see tmp files under the postgres $PGDATA/base directory, you might be seeing the result of this) ...

HTH

Greg Williamson
Senior DBA
DigitalGlobe

Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)

Re: need to speed up query

From
Justin
Date:
Yes, the cross join is intentional.

Thanks. Creating the two-column index dropped processing time to 15 to 17
seconds, and put the per-period query down to 1 second.



Scott Marlowe wrote:
> You're joining these two tables: period, accnt, but I'm not seeing an
> on () clause or a where clause joining them.  Is the cross product
> intentional?
>
> But what I'm seeing that seems like the lowest hanging fruit would be
> two column indexes on the bits that are showing up in those bit map
> scans.  Like this part:
>
> "                      Recheck Cond: ((gltrans_date <= $3) AND
> (gltrans_date >= $0) AND (gltrans_accnt_id = $1))"
> "                      Filter: gltrans_posted"
> "                      ->  BitmapAnd  (cost=38.90..38.90 rows=10
> width=0) (actual time=0.839..0.839 rows=0 loops=1729)"
> "                            ->  Bitmap Index Scan on
> gltrans_gltrans_date_idx  (cost=0.00..8.08 rows=382 width=0) (actual
> time=0.782..0.782 rows=5872 loops=1729)"
> "                                  Index Cond: ((gltrans_date <= $3)
> AND (gltrans_date >= $0))"
> "                            ->  Bitmap Index Scan on
> gltrans_gltrans_accnt_id_idx  (cost=0.00..30.57 rows=1908 width=0)
> (actual time=0.076..0.076 rows=574 loops=798)"
> "                                  Index Cond: (gltrans_accnt_id = $1)"
>
> You are looking through 574 rows in one column and 5872 in another.
> But when they're anded together, you get 0 rows.  A two column index
> there should really help.
>
>

Re: need to speed up query

From
Justin
Date:
Gregory Williamson wrote:

Justin --

You wrote:

> I've had to write queries to get trial balance values out of the GL
> transaction table, and I'm not happy with their performance
>
>
> The table has 76K rows, growing about 1,000 rows per working day, so the
> performance is not that great: it takes about 20 to 30 seconds to get all
> the records for the table, and when we limit it to a single accounting
> period it drops down to 2 seconds

So 30 seconds for roughly 76 days' worth of numbers? Not terrible, but not great.

> Here is the query and explain .  PostgreSql  is 8.3.1 on new server with
> raid 10 Serial SCSI.
<... snipped 'cause I have a lame reader ...>
Not according to the benchmarks I have done, which were posted a couple of months ago.


> "  Sort Method:  quicksort  Memory: 292kB"
<...snip...>
> "Total runtime: 24682.580 ms"


I don't have any immediate thoughts but maybe you could post the table schemas and indexes. It looks to my untutored eye as if most of the estimates are fair so I am guessing that you have run analyze recently.

What is your sort memory set to? If work_mem is too low, then you'll go to disk (if you see tmp files under the postgres $PGDATA/base directory, you might be seeing the result of this) ...

I need to look into work_mem. It's set at 25 MB, which is fine for most work, unless we get into the accounting queries, which have to be more complicated than they need to be because of how some of the tables are laid out (which I did not lay out).
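
For what it's worth, work_mem can also be raised per-session just for the
heavy accounting reports rather than globally (standard PostgreSQL
settings; the value here is only illustrative):

```sql
SHOW work_mem;           -- check the current setting
SET work_mem = '64MB';   -- per-session bump before running the report
```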


HTH

Greg Williamson
Senior DBA
DigitalGlobe


Re: need to speed up query

From
PFC
Date:
> I've had to write queries to get trial balance values out of the GL
> transaction table, and I'm not happy with their performance. The table has
> 76K rows, growing about 1,000 rows per working day, so the performance is
> not that great: it takes about 20 to 30 seconds to get all the records
> for the table, and when we limit it to a single accounting period it
> drops down to 2 seconds

    What is a "period"? Is it a month, or something more "custom"? Can
periods overlap?

>     COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
>         FROM gltrans
>         WHERE gltrans.gltrans_date < period.period_start
>             AND gltrans.gltrans_accnt_id = accnt.accnt_id
>             AND gltrans.gltrans_posted = true), 0.00)::text::money AS
> beginbalance,

    Note that here you are scanning the entire table multiple times; the
complexity of this is basically (rows in gltrans)^2, which is something
you'd like to avoid.

Re: need to speed up query

From
Justin
Date:

PFC wrote:
>
>> I've had to write queries to get trial balance values out of the GL
>> transaction table, and I'm not happy with their performance. The table
>> has 76K rows, growing about 1,000 rows per working day, so the
>> performance is not that great: it takes about 20 to 30 seconds to get
>> all the records for the table, and when we limit it to a single
>> accounting period it drops down to 2 seconds
>
>     What is a "period" ? Is it a month, or something more "custom" ?
> Can periods overlap ?
No, periods can never overlap.  If they did, you would be in violation of
many tax laws around the world. Plus, you would not know how much money
you are making or losing.
Generally, yes, an accounting period is a normal calendar month, but you
can have 13 periods in a normal calendar year: 52 weeks in a year / 4
weeks in a month = 13 periods, or 13 months, in a fiscal calendar year.
This means that if someone is using a 13-period fiscal accounting year,
the start and end dates are offset from a normal calendar.
To make this really funky, you can have a fiscal calendar year start
June 15, 2008 and end on June 14, 2009.

http://en.wikipedia.org/wiki/Fiscal_year
>
>>     COALESCE(( SELECT sum(gltrans.gltrans_amount) AS sum
>>         FROM gltrans
>>         WHERE gltrans.gltrans_date < period.period_start
>>             AND gltrans.gltrans_accnt_id = accnt.accnt_id
>>             AND gltrans.gltrans_posted = true), 0.00)::text::money AS
>> beginbalance,
>
>     Note that here you are scanning the entire table multiple times,
> the complexity of this is basically (rows in gltrans)^2 which is
> something you'd like to avoid.
>
For accounting purposes you need to know the beginning balance, debits,
credits, the difference between debits and credits, and the ending
balance for each account.  We have 133 accounts with presently 12
periods defined, so we end up with 1,596 rows returned for this query.

So period 1 should, for the most part, have zero for beginning balances
for most types of accounts.  Period 2's beginning balance is period 1's
ending balance, period 3's is period 2's ending balance, and so on
forever.






Re: need to speed up query

From
PFC
Date:
>>     What is a "period" ? Is it a month, or something more "custom" ?
>> Can periods overlap ?

> No periods can never overlap.  If the periods did you would be in
> violation of many tax laws around the world. Plus it you would not know
> how much money you are making or losing.

    I was wondering if you'd be using the same query to compute how much was
gained every month and every week, which would have complicated things.
    But now it's clear.

> To make this really funky you can have a Fiscal  Calendar year start
> June 15 2008 and end on June 14 2009

    Don't you just love those guys? Always trying new tricks to make your
life more interesting.

>>     Note that here you are scanning the entire table multiple times,
>> the complexity of this is basically (rows in gltrans)^2 which is
>> something you'd like to avoid.
>>
> For accounting purposes you need to know the Beginning Balances,
> Debits,  Credits,  Difference between Debits to Credits and the Ending
> Balance  for each account.  We have 133 accounts with presently 12
> periods defined so we end up 1596 rows returned for this query.

    Alright, I propose a solution which only works when periods don't overlap.
    It will scan the entire table, but only once, not many times as your
current query does.

> So period 1 should have for the most part have Zero for Beginning
> Balances for most types of Accounts.  Period 2 is Beginning Balance is
> Period 1 Ending Balance, Period 3 is Period 2 ending balance so and so
> on forever.

    Precisely. So, it is not necessary to recompute everything for each
period.
    Use the previous period's ending balance as the current period's starting
balance...

    There are several ways to do this.
    First, you could use your current query, but only compute the sum of what
happened during a period, for each period, and store that in a temporary
table.
    Then, you use a plpgsql function, or you do that in your client, you take
the rows in chronological order, you sum them as they come, and you get
your balances. Use a NUMERIC type, not a FLOAT, to avoid rounding errors.

    The other solution does the same thing but optimizes the first step like
this :
    INSERT INTO temp_table SELECT period, sum(...) GROUP BY period

    To do this you must be able to compute the period from the date and not
the other way around. You could store a period_id in your table, or use a
function.
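
    As a rough sketch of those two steps (untested; it assumes periods
don't overlap, that period_id increases chronologically, and it reuses
your existing column names):

```sql
-- Step 1: one scan of gltrans, summed per (period, account).
CREATE TEMP TABLE period_sums AS
SELECT p.period_id,
       g.gltrans_accnt_id AS accnt_id,
       sum(g.gltrans_amount) AS period_total   -- NUMERIC, no rounding
  FROM gltrans g
  JOIN period p
    ON g.gltrans_date BETWEEN p.period_start AND p.period_end
 WHERE g.gltrans_posted
 GROUP BY p.period_id, g.gltrans_accnt_id;

-- Step 2: running balances over the tiny summary, not over gltrans.
SELECT s.period_id, s.accnt_id,
       COALESCE((SELECT sum(s2.period_total)
                   FROM period_sums s2
                  WHERE s2.accnt_id  = s.accnt_id
                    AND s2.period_id < s.period_id), 0) AS beginbalance,
       s.period_total AS difference
  FROM period_sums s
 ORDER BY s.period_id, s.accnt_id;
```

    The correlated subquery in step 2 is still quadratic, but only over
133 accounts x 13 periods, which is nothing.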

    Another much more efficient solution would be to have a summary table
which keeps the summary data for each period, with beginning balance and
end balance. This table will only need to be updated when someone finds an
old receipt in their pocket or something.
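
    A minimal sketch of such a trigger-maintained summary (untested and
illustrative only: the table, function, and trigger names are invented,
it handles only inserts, and the beginning balance is simply the previous
period's ending balance):

```sql
CREATE TABLE period_balance (
    accnt_id    integer NOT NULL,
    period_id   integer NOT NULL,
    end_balance numeric NOT NULL DEFAULT 0,
    PRIMARY KEY (accnt_id, period_id)
);

CREATE OR REPLACE FUNCTION gltrans_roll_balance() RETURNS trigger AS $$
BEGIN
    IF NEW.gltrans_posted THEN
        -- Ending balances are cumulative: the amount affects every
        -- period that ends on or after the transaction date.
        UPDATE period_balance pb
           SET end_balance = pb.end_balance + NEW.gltrans_amount
          FROM period p
         WHERE p.period_id   = pb.period_id
           AND pb.accnt_id   = NEW.gltrans_accnt_id
           AND p.period_end >= NEW.gltrans_date;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER gltrans_roll_balance_trg
    AFTER INSERT ON gltrans
    FOR EACH ROW EXECUTE PROCEDURE gltrans_roll_balance();
```

    UPDATEs and DELETEs of posted transactions would need the same
treatment, which is exactly where such summaries tend to drift out of
sync if the application has other bugs.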

> This falls under the stupid question category, and I'm just curious: what
> do other people think makes a query complex?

    I have some rather complex queries which postgres burns in a few
milliseconds.
    You could define complexity as the amount of brain sweat that went into
writing that query.
    You could also define complexity as O(n) or O(n^2) etc, for instance your
query (as written) is O(n^2) which is something you don't want, I've seen
stuff that was O(2^n) or worse, O(n!) in software written by drunk
students, in this case getting rid of it is an emergency...

Re: need to speed up query

From
Shaun Thomas
Date:
On Tue, 2008-05-06 at 03:01 +0100, Justin wrote:

> i've had to write queries to get trail balance values out of the GL
> transaction table and i'm not happy with its performance

Go ahead and give this a try:

SELECT p.period_id, p.period_start, p.period_end, a.accnt_id,
       a.accnt_number, a.accnt_descrip, p.period_yearperiod_id,
       a.accnt_type,
       SUM(CASE WHEN g.gltrans_date < p.period_start
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS beginbalance,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                 AND g.gltrans_date >= p.period_start
                 AND g.gltrans_amount <= 0::numeric
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS negative,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                 AND g.gltrans_date >= p.period_start
                 AND g.gltrans_amount >= 0::numeric
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS positive,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                 AND g.gltrans_date >= p.period_start
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS difference,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS endbalance
  FROM period p
 CROSS JOIN accnt a
  LEFT JOIN gltrans g ON (g.gltrans_accnt_id = a.accnt_id
                          AND g.gltrans_posted = true)
 GROUP BY p.period_id, p.period_start, p.period_end, a.accnt_id,
          a.accnt_number, a.accnt_descrip, p.period_yearperiod_id,
          a.accnt_type
 ORDER BY p.period_id, a.accnt_number;

Depending on how the planner saw your old query, it may have forced
several different sequence or index scans to get the information from
gltrans.  One thing all of your subqueries had in common was a join on
the account id and listing only posted transactions.  It's still a big
gulp, but it's only one gulp.

The other thing I did was that I guessed you added the COALESCE clause
because the subqueries individually could return null rowsets for
various groupings, and you wouldn't want that.  This left-join solution
only lets a row add to your various sums if it matches all the
conditions; otherwise it falls through the list of cases until nothing
matches.  If some of your transactions can have null amounts, you might
consider turning g.gltrans_amount into COALESCE(g.gltrans_amount, 0.0)
instead.

Otherwise, this *might* work; without knowing more about your schema,
it's only a guess.  I'm a little skeptical about the conditionless
cross-join, but whatever.

Either way, by looking at this query, it looks like some year-end
summary piece, or an at-a-glance idea of your account standings.  The
problem you're going to have with this is that there's no way to truly
optimize this.  One way or another, you're going to incur some
combination of three sequence scans or three index scans; if those
tables get huge, you're in trouble.  You might want to consider a
denormalized summary table that contains this information (and maybe
more) maintained by a trigger or regularly invoked stored-procedure and
then you can select from *that* with much less agony.

Then there are fact tables, but that's beyond the scope of this email. ;)

Good luck!

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com



Re: need to speed up query

From
Justin
Date:
It worked. It had a couple of missing parts, but once those were fixed it ran in 3.3 seconds.  Thanks for this.
I need to review the result and balance it against my numbers, as the accountant already went through and balanced some accounts by hand to verify my results.

<<begin quote>>
 You might want to consider a
denormalized summary table that contains this information (and maybe
more) maintained by a trigger or regularly invoked stored-procedure and
then you can select from *that* with much less agony.
<<end quote>>

I just dumped the summary table because it kept getting out of balance all the time, and it was missing accounts that did not have transactions in them for a given period.  Again, I did not lay out the table nor the old code, which was terrible and did not work correctly.  I tried several times to fix the summary table, but too many things allowed it to get out of sync.  Keeping the ending and beginning balances correct was too much trouble, and I needed to get numbers we can trust to the accountant.

The developers of the code got credits and debits backwards, so instead of fixing the code they just added code to flip the values on the front end.  It's really annoying.  At this point, if I could go back 7 months, I would not have purchased this software, knowing what I know now.

I've had to make all kinds of changes I never intended to make in order to get the stuff to balance and agree.  I've spent the last 3 months in code review, fixing things that allow accounts to get out of balance and stopping stupid things from happening, like posting GL transactions into non-existing accounting periods.  The list of things I have to fix is getting damn long.


Re: need to speed up query

From
Justin
Date:

PFC wrote:
>
>>>     What is a "period" ? Is it a month, or something more "custom" ?
>>> Can periods overlap ?
>
>> No periods can never overlap.  If the periods did you would be in
>> violation of many tax laws around the world. Plus it you would not
>> know how much money you are making or losing.
>
>     I was wondering if you'd be using the same query to compute how
> much was gained every month and every week, which would have
> complicated things.
>     But now it's clear.
>
>> To make this really funky you can have a Fiscal  Calendar year start
>> June 15 2008 and end on June 14 2009
>
>     Don't you just love those guys ? Always trying new tricks to make
> your life more interesting.

That's been around a long time.  You can go back a few hundred years.


>>>     Note that here you are scanning the entire table multiple times,
>>> the complexity of this is basically (rows in gltrans)^2 which is
>>> something you'd like to avoid.
>>>
>> For accounting purposes you need to know the Beginning Balances,
>> Debits,  Credits,  Difference between Debits to Credits and the
>> Ending Balance  for each account.  We have 133 accounts with
>> presently 12 periods defined so we end up 1596 rows returned for this
>> query.
>
>     Alright, I propose a solution which only works when periods don't
> overlap.
>     It will scan the entire table, but only once, not many times as
> your current query does.
>
>> So period 1 should have for the most part have Zero for Beginning
>> Balances for most types of Accounts.  Period 2 is Beginning Balance
>> is Period 1 Ending Balance, Period 3 is Period 2 ending balance so
>> and so on forever.
>
>     Precisely. So, it is not necessary to recompute everything for
> each period.
>     Use the previous period's ending balance as the current period's
> starting balance...
>
>     There are several ways to do this.
>     First, you could use your current query, but only compute the sum
> of what happened during a period, for each period, and store that in a
> temporary table.
>     Then, you use a plpgsql function, or you do that in your client,
> you take the rows in chronological order, you sum them as they come,
> and you get your balances. Use a NUMERIC type, not a FLOAT, to avoid
> rounding errors.
>
>     The other solution does the same thing but optimizes the first
> step like this :
>     INSERT INTO temp_table SELECT period, sum(...) GROUP BY period
>
>     To do this you must be able to compute the period from the date
> and not the other way around. You could store a period_id in your
> table, or use a function.
>
>     Another much more efficient solution would be to have a summary
> table which keeps the summary data for each period, with beginning
> balance and end balance. This table will only need to be updated when
> someone finds an old receipt in their pocket or something.
>

As I posted earlier, the software did do this, but it has so many bugs
elsewhere in the code that the summary is allowed to get out of balance
with what is really happening.  I spent several weeks trying to get this
working and find all the places it went wrong.  I gave up and wrote this
query, which took a day to write and balance to the point that I turned
it over to the accountant.  I redid the front end, and I'm off to the
races fixing other critical problems.

All I need to do is take Shaun Thomas's code and replace the view this
select statement creates.


>> This falls under the stupid question category, and I'm just curious: what
>> do other people think makes a query complex?
>
>     I have some rather complex queries which postgres burns in a few
> milliseconds.
>     You could define complexity as the amount of brain sweat that went
> into writing that query.
>     You could also define complexity as O(n) or O(n^2) etc, for
> instance your query (as written) is O(n^2) which is something you
> don't want, I've seen stuff that was O(2^n) or worse, O(n!) in
> software written by drunk students, in this case getting rid of it is
> an emergency...
>

Thanks for your help and ideas. I really appreciate it.