Re: need to speed up query - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: need to speed up query
Date
Msg-id 1210092209.14833.32.camel@berners-lee
Whole thread Raw
In response to need to speed up query  (Justin <justin@emproshunts.com>)
Responses Re: need to speed up query
List pgsql-performance
On Tue, 2008-05-06 at 03:01 +0100, Justin wrote:

> i've had to write queries to get trail balance values out of the GL
> transaction table and i'm not happy with its performance

Go ahead and give this a try:

SELECT p.period_id, p.period_start, p.period_end, a.accnt_id,
       a.accnt_number, a.accnt_descrip, p.period_yearperiod_id,
       a.accnt_type,
       SUM(CASE WHEN g.gltrans_date < p.period_start
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS beginbalance,
       SUM(CASE WHEN g.gltrans_date < p.period_end
                 AND g.gltrans_date >= p.period_start
                 AND g.gltrans_amount <= 0::numeric
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS negative,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                 AND g.gltrans_date >= p.period_start
                 AND g.gltrans_amount >= 0::numeric
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS positive,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                 AND g.gltrans_date >= p.period_start
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS difference,
       SUM(CASE WHEN g.gltrans_date <= p.period_end
                THEN g.gltrans_amount ELSE 0.0
           END)::text::money AS endbalance,
  FROM period p
 CROSS JOIN accnt a
  LEFT JOIN gltrans g ON (g.gltrans_accnt_id = a.accnt_id
                          AND g.gltrans_posted = true)
 ORDER BY period.period_id, accnt.accnt_number;

Depending on how the planner saw your old query, it may have forced
several different sequence or index scans to get the information from
gltrans.  One thing all of your subqueries had in common was a join on
the account id and listing only posted transactions.  It's still a big
gulp, but it's only one gulp.

The other thing I did was that I guessed you added the coalesce clause
because the subqueries individually could return null rowsets for
various groupings, and you wouldn't want that.  This left-join solution
only lets it add to your various sums if it matches all the conditions,
otherwise it falls through the list of cases until nothing matches.  If
some of your transactions can have null amounts, you might consider
turning g.gltrans into COALESCE(g.gltrans, 0.0) instead.

Otherwise, this *might* work; without knowing more about your schema,
it's only a guess.  I'm a little skeptical about the conditionless
cross-join, but whatever.

Either way, by looking at this query, it looks like some year-end
summary piece, or an at-a-glance idea of your account standings.  The
problem you're going to have with this is that there's no way to truly
optimize this.  One way or another, you're going to incur some
combination of three sequence scans or three index scans; if those
tables get huge, you're in trouble.  You might want to consider a
denormalized summary table that contains this information (and maybe
more) maintained by a trigger or regularly invoked stored-procedure and
then you can select from *that* with much less agony.

Then there's fact-tables, but that's beyond the scope of this email. ;)

Good luck!

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com



pgsql-performance by date:

Previous
From: Shaun Thomas
Date:
Subject: Re: multiple joins + Order by + LIMIT query performance issue
Next
From: Steve Atkins
Date:
Subject: Re: What constitutes a complex query