Re: Query optimizing - Mailing list pgsql-sql

From Sebastian Ritter
Subject Re: Query optimizing
Date
Msg-id 99b656cb0811100358v71788c28s51d08d5512200b21@mail.gmail.com
Whole thread Raw
In response to Re: Query optimizing  ("Helio Campos Mello de Andrade" <helio.campos@gmail.com>)
List pgsql-sql
Hi Helio,

Sorry about the parenthesis - Bad copy/pasting skills! To further discuss your suggestion: Wouldn't adding n_issue=i.id as a where clause filter cause the sub-query to become correlated and thus much less efficient ? I may be wrong, or may have miss-understood your suggestion.

Thanks for you help,
Sebastian

 
On Mon, Nov 10, 2008 at 11:48 AM, Helio Campos Mello de Andrade <helio.campos@gmail.com> wrote:
Hi Sebastian,

 - First of all i think there is an open-parenthesis missing in the query V2.
Maybe in the V2 version you cold restrict the results in the INNER query a bit more if you use a restriction clause like "WHERE n_issue = i.id" in that. It will certainly lower the number of rows returned by it to only 1 result.

Regards

--
Helio Campos Mello de Andrade

 

On Mon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <ritter.sebastian@gmail.com> wrote:
Hi all,

I was hoping to receive some advise on a slow running query in our business' Issue Tracking System. To shed some light on the below mentioned queries, here is a brief summary of how users interact with the system. The two main components in the system are a Issues and Followups. An Issue is created by our technical support staff when some form of action needs to be taken in order to resolve a problem. FollowUps are entries entered against an issue depicting the sequence of events taken to resolve the issue. There are about 15,000 Issues in the system at present and about 95,000 FollowUps in the system. As we need the system to be very responsive, each query should ideally run in under 1 second.

A lot of the reports our technical officers submit to us include a listing of all actioned issues for a given day along with the last modified followup of each said issue. With the number of rows in our database increasing at a high rate, these queries are starting to run too slowly.

Here is a condensed version of the two tables:

Issues:
=================================
id  - integer                
dt_created - timestamp
dt_modified - timestamp
t_title - varchar
t_description - varchar

FollowUps:
=================================
id  - integer                
dt_created - timestamp
dt_modified - timestamp
t_description - varchar
n_issue - foregin key to issues

We have identified that the slowness in our queries is trying to return the lastest followup for each actioned issue that day. Without further ado here are two variations I have tried within the system (neither of which are making the cut):

V1 (correlated subquery - Very bad performance)

 (SELECT
         fu.*
  FROM
        manage_followup fu,
        manage_issue i
  WHERE
         i.id = fu.n_issue
         AND
         fu.id = (SELECT
                        id
                   FROM
                        manage_followup
                    WHERE
                         n_issue = i.id
                    ORDER BY
                         dt_modified DESC
                    LIMIT 1)) AS latestfu,

V2 (Using Group By, "max" aggregate function  and distinct- better performance, but still bad because of distinct)


SELECT DISTINCT ON (fu.n_issue)
                                fu.id,
                                fu.dt_created,
                                fu.dt_modified,
                                fu.t_description,
                                fu.n_issue as issue_id
                            FROM
                                manage_followup fu,
                                (SELECT
                                    n_issue,
                                    max(dt_modified) as dt_modified
                                 FROM
                                    manage_followup
                                 GROUP BY
                                    n_issue) as max_modified
                            WHERE
                                max_modified.n_issue = fu.n_issue
                                AND
                                fu.dt_modified = max_modified.dt_modified) AS latestfu ON (latestfu.issue_id = i.id),

We must use distinct here as we sometimes use batch scripts to enter followups, which will give them all similar, if not equal, modification dates. We also can't use followup ids as an indicator of the latest followup because users of the system can retrospectively go back and change older followups.

I was hoping some one could provide a solution that does not require a corrolated subquery or make use of the distinct keyword. Any help would be much appreciated.

Kind regards,
Sebastian






pgsql-sql by date:

Previous
From: "Helio Campos Mello de Andrade"
Date:
Subject: Re: Query optimizing
Next
From: Richard Huxton
Date:
Subject: Re: Query optimizing