Re: help with query - Mailing list pgsql-performance

From Jean-Luc Lachance
Subject Re: help with query
Date
Msg-id 4124B7B5.6050409@sympatico.ca
Whole thread Raw
In response to Re: help with query  (Dave Cramer <pg@fastcrypt.com>)
List pgsql-performance
how about:

SELECT distinct main.oid,main.* FROM Tickets main
WHERE main.EffectiveId = main.id
AND main.Status != 'deleted'
AND ( main.Type = 'ticket' OR main.Type = 'subticket' )
AND ( main.Queue = '9' )
AND ( main.id = '17417'
   OR main.id IN (
     SELECT DISTINCT LocalTarget from Links
     where Type = 'MemberOf' and LocalTarget = '17417')
   OR main.id IN (
     SELECT DISTINCT LocalBase from Links
     where Type = 'MemberOf' and LocalTarget = '17417'))




Dave Cramer wrote:

> Brad,
>
> Thanks, that runs on the same order of magnitude as the subqueries.
>
> DAve
> On Thu, 2004-08-19 at 09:38, Brad Bulger wrote:
>
>>You're doing a join except not, is the trouble, looks like. The query is really
>>"FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
>>to the Links table. So you end up getting every row in Links for each row in
>>Tickets with id = 17417.
>>
>>I'd think this wants to be two queries or a union:
>>
>>SELECT distinct main.oid,main.* FROM Tickets main
>>WHERE (main.EffectiveId = main.id)
>>AND (main.Status != 'deleted')
>>AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
>>AND ( (main.Queue = '9') )
>>AND ( (main.id = '17417'))
>>union
>>SELECT distinct main.oid,main.* FROM Tickets main, Links
>>WHERE (main.EffectiveId = main.id)
>>AND (main.Status != 'deleted')
>>AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
>>AND ( (main.Queue = '9') )
>>AND ( (Links.Type = 'MemberOf') )
>>AND ( (Links.LocalTarget = '17417') )
>>AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
>>;
>>
>>or else, yah, a subquery:
>>
>>[...]
>>AND (
>>  main.id = '17417'
>>  or
>>  exists(
>>    select true from Links
>>    where Type = 'MemberOf' and LocalTarget = '17417'
>>     and (LocalBase = main.id or LocalTarget = main.id)
>>  )
>>)
>>
>>Those are the only things I can think of to make it work, anyways.
>>
>>Dave Cramer wrote:
>>
>>
>>>RT uses a query like:
>>>
>>>SELECT distinct main.oid,main.* FROM Tickets main
>>>WHERE
>>>(main.EffectiveId = main.id)
>>>AND
>>>(main.Status != 'deleted')
>>>AND
>>>    ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
>>>AND
>>>    ( (main.Queue = '9') )
>>>AND ((
>>>        ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) )
>>>      OR
>>>        ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) )
>>>      or
>>>        (main.id = '17417')
>>>     )
>>>  );
>>>
>>>
>>>which produces a query plan:
>>>
>>>Nested Loop  (cost=0.00..813.88 rows=1 width=169)
>>>   Join Filter: (((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase =
17417)OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id 
>>>= 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id =
17417))AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner" 
>>>.localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR
("outer".id= 17417)) AND (("outer".id = "inner".localtarget) OR ( 
>>>"inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR
("outer".id= "inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca 
>>>lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget)
OR("outer".id = "inner".localbase) OR ("outer".id = 17417))) 
>>>   ->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 width=169)
>>>         Index Cond: (queue = 9)
>>>         Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text = 'ticket'::text)
OR(("type")::text = 'subticket'::text))) 
>>>   ->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)
>>>
>>>If I rewrite the query as:
>>>
>>>SELECT main.* FROM Tickets main
>>>WHERE
>>>(main.EffectiveId = main.id)
>>>AND
>>>(main.Status != 'deleted')
>>>AND
>>>    ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
>>>AND
>>>    ( (main.Queue = '9') )
>>>AND (
>>>        17417 in (select links.localtarget from links where links.type='MemberOf' and main.id=links.localbase)
>>>        or
>>>        17417 in ( select links.localbase from links where links.type='MemberOf' and main.id=links.localtarget)
>>>      or
>>>        main.id = '17417'
>>>    )
>>>  ;
>>>
>>>The time for the query goes from 1500ms to 15ms. The two OR clauses
>>>
>>>        ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) )
>>>      OR
>>>        ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) )
>>>
>>>don't contribute to the result set in this particular dataset, which is why the speed increases so dramatically.
>>>
>>>Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical,
>>>and subqueries are not easily embraced.
>>>
>>>Dave
>>

pgsql-performance by date:

Previous
From: Dave Cramer
Date:
Subject: Re: help with query
Next
From: Josh Berkus
Date:
Subject: Re: Help specifying new machine