Re: Query optimizing - Mailing list pgsql-sql

From Helio Campos Mello de Andrade
Subject Re: Query optimizing
Date
Msg-id 29e3942f0811100348q542b0ceck5a6af51296a4f45a@mail.gmail.com
Whole thread Raw
In response to Query optimizing  ("Sebastian Ritter" <ritter.sebastian@gmail.com>)
Responses Re: Query optimizing
List pgsql-sql
Hi Sebastian,<br /><br /> - First of all i think there is an open-parenthesis missing in the query V2.<br />Maybe in
theV2 version you cold restrict the results in the INNER query a bit more if you use a restriction clause like "WHERE
n_issue= <a href="http://i.id/" target="_blank">i.id</a>" in that. It will certainly lower the number of rows returned
byit to only 1 result. <br /><br />Regards<br /><br />--<br />Helio Campos Mello de Andrade<br /> <br /><br /><div
class="gmail_quote">OnMon, Nov 10, 2008 at 8:43 AM, Sebastian Ritter <span dir="ltr"><<a
href="mailto:ritter.sebastian@gmail.com">ritter.sebastian@gmail.com</a>></span>wrote:<br /><blockquote
class="gmail_quote"style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi
all,<br/><br />I was hoping to receive some advise on a slow running query in our business' Issue Tracking System. To
shedsome light on the below mentioned queries, here is a brief summary of how users interact with the system. The two
maincomponents in the system are a Issues and Followups. An Issue is created by our technical support staff when some
formof action needs to be taken in order to resolve a problem. FollowUps are entries entered against an issue depicting
thesequence of events taken to resolve the issue. There are about 15,000 Issues in the system at present and about
95,000FollowUps in the system. As we need the system to be very responsive, each query should ideally run in under 1
second.<br/><br />A lot of the reports our technical officers submit to us include a listing of all actioned issues for
agiven day along with the last modified followup of each said issue. With the number of rows in our database increasing
ata high rate, these queries are starting to run too slowly. <br /><br />Here is a condensed version of the two
tables:<br/><br />Issues:<br />=================================<br />id  - integer                 <br />dt_created -
timestamp<br/>dt_modified - timestamp<br />t_title - varchar<br />t_description - varchar<br /><br />FollowUps:<br />
=================================<br/> id  - integer                 <br /> dt_created - timestamp<br /> dt_modified -
timestamp<br/> t_description - varchar<br />n_issue - foregin key to issues<br /><br />We have identified that the
slownessin our queries is trying to return the lastest followup for each actioned issue that day. Without further ado
hereare two variations I have tried within the system (neither of which are making the cut):<br /><br />V1 (correlated
subquery- Very bad performance)<br /><br /> (SELECT<br />         fu.*<br />  FROM<br />        manage_followup fu,<br
/>       manage_issue i<br />  WHERE<br />         <a href="http://i.id" target="_blank">i.id</a> = fu.n_issue<br />
        AND<br />         <a href="http://fu.id" target="_blank">fu.id</a> = (SELECT<br />                        id<br
/>                  FROM<br />                        manage_followup<br />                    WHERE<br
/>                        n_issue = <a href="http://i.id" target="_blank">i.id</a><br />                     ORDER
BY<br/>                         dt_modified DESC<br />                    LIMIT 1)) AS latestfu,<br /><br />V2 (Using
GroupBy, "max" aggregate function  and distinct- better performance, but still bad because of distinct) <br /><br /><br
/>SELECTDISTINCT ON (fu.n_issue)<br />                                <a href="http://fu.id"
target="_blank">fu.id</a>,<br/>                                fu.dt_created,<br />                               
fu.dt_modified,<br/>                                 fu.t_description,<br />                                 fu.n_issue
asissue_id<br />                            FROM<br />                                manage_followup fu,<br
/>                               (SELECT<br />                                    n_issue,<br />
                                   max(dt_modified) as dt_modified<br />                                 FROM<br
/>                                   manage_followup<br />                                 GROUP BY<br
/>                                   n_issue) as max_modified<br />                             WHERE<br
/>                               max_modified.n_issue = fu.n_issue<br />                                AND<br
/>                               fu.dt_modified = max_modified.dt_modified) AS latestfu ON (latestfu.issue_id = <a
href="http://i.id"target="_blank">i.id</a>),<br /><br />We must use distinct here as we sometimes use batch scripts to
enterfollowups, which will give them all similar, if not equal, modification dates. We also can't use followup ids as
anindicator of the latest followup because users of the system can retrospectively go back and change older
followups.<br/><br />I was hoping some one could provide a solution that does not require a corrolated subquery or make
useof the distinct keyword. Any help would be much appreciated. <br /><br />Kind regards,<br /><font
color="#888888">Sebastian<br /><br /><br /><br /><br /><br /></font></blockquote></div> 

pgsql-sql by date:

Previous
From: prakash
Date:
Subject: [PERFORM] Can we activate WAL runtime?
Next
From: "Sebastian Ritter"
Date:
Subject: Re: Query optimizing