Re: Query optimizing - Mailing list pgsql-sql
From | Sebastian Ritter |
---|---|
Subject | Re: Query optimizing |
Date | |
Msg-id | 99b656cb0811100358v71788c28s51d08d5512200b21@mail.gmail.com Whole thread Raw |
In response to | Re: Query optimizing ("Helio Campos Mello de Andrade" <helio.campos@gmail.com>) |
List | pgsql-sql |
Hi Helio,
Sorry about the parenthesis - Bad copy/pasting skills! To further discuss your suggestion: Wouldn't adding n_issue=i.id as a where clause filter cause the sub-query to become correlated and thus much less efficient ? I may be wrong, or may have miss-understood your suggestion.
Thanks for you help,
Sebastian
Sorry about the parenthesis - Bad copy/pasting skills! To further discuss your suggestion: Wouldn't adding n_issue=i.id as a where clause filter cause the sub-query to become correlated and thus much less efficient ? I may be wrong, or may have miss-understood your suggestion.
Thanks for you help,
Sebastian
On Mon, Nov 10, 2008 at 11:48 AM, Helio Campos Mello de Andrade <helio.campos@gmail.com> wrote:
Hi Sebastian,
- First of all i think there is an open-parenthesis missing in the query V2.
Maybe in the V2 version you cold restrict the results in the INNER query a bit more if you use a restriction clause like "WHERE n_issue = i.id" in that. It will certainly lower the number of rows returned by it to only 1 result.
Regards
--
Helio Campos Mello de Andrade
On Mon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <ritter.sebastian@gmail.com> wrote:Hi all,
I was hoping to receive some advise on a slow running query in our business' Issue Tracking System. To shed some light on the below mentioned queries, here is a brief summary of how users interact with the system. The two main components in the system are a Issues and Followups. An Issue is created by our technical support staff when some form of action needs to be taken in order to resolve a problem. FollowUps are entries entered against an issue depicting the sequence of events taken to resolve the issue. There are about 15,000 Issues in the system at present and about 95,000 FollowUps in the system. As we need the system to be very responsive, each query should ideally run in under 1 second.
A lot of the reports our technical officers submit to us include a listing of all actioned issues for a given day along with the last modified followup of each said issue. With the number of rows in our database increasing at a high rate, these queries are starting to run too slowly.
Here is a condensed version of the two tables:
Issues:
=================================
id - integer
dt_created - timestamp
dt_modified - timestamp
t_title - varchar
t_description - varchar
FollowUps:
=================================
id - integer
dt_created - timestamp
dt_modified - timestamp
t_description - varchar
n_issue - foregin key to issues
We have identified that the slowness in our queries is trying to return the lastest followup for each actioned issue that day. Without further ado here are two variations I have tried within the system (neither of which are making the cut):
V1 (correlated subquery - Very bad performance)
(SELECT
fu.*
FROM
manage_followup fu,
manage_issue i
WHERE
i.id = fu.n_issue
AND
fu.id = (SELECT
id
FROM
manage_followup
WHERE
n_issue = i.id
ORDER BY
dt_modified DESC
LIMIT 1)) AS latestfu,
V2 (Using Group By, "max" aggregate function and distinct- better performance, but still bad because of distinct)
SELECT DISTINCT ON (fu.n_issue)
fu.id,
fu.dt_created,
fu.dt_modified,
fu.t_description,
fu.n_issue as issue_id
FROM
manage_followup fu,
(SELECT
n_issue,
max(dt_modified) as dt_modified
FROM
manage_followup
GROUP BY
n_issue) as max_modified
WHERE
max_modified.n_issue = fu.n_issue
AND
fu.dt_modified = max_modified.dt_modified) AS latestfu ON (latestfu.issue_id = i.id),
We must use distinct here as we sometimes use batch scripts to enter followups, which will give them all similar, if not equal, modification dates. We also can't use followup ids as an indicator of the latest followup because users of the system can retrospectively go back and change older followups.
I was hoping some one could provide a solution that does not require a corrolated subquery or make use of the distinct keyword. Any help would be much appreciated.
Kind regards,
Sebastian