Re: help with query - Mailing list pgsql-performance

From Brad Bulger
Subject Re: help with query
Date
Msg-id 4124AD68.8000208@madfish.com
Whole thread Raw
In response to help with query  (Dave Cramer <pg@fastcrypt.com>)
Responses Re: help with query
Re: help with query
List pgsql-performance
You're doing a join except not, is the trouble, looks like. The query is really
"FROM Tickets main, Links", but when Tickets.id is 17417, you've got no join
to the Links table. So you end up getting every row in Links for each row in
Tickets with id = 17417.

I'd think this wants to be two queries or a union:

SELECT distinct main.oid,main.* FROM Tickets main
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (main.id = '17417'))
union
SELECT distinct main.oid,main.* FROM Tickets main, Links
WHERE (main.EffectiveId = main.id)
AND (main.Status != 'deleted')
AND ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
AND ( (main.Queue = '9') )
AND ( (Links.Type = 'MemberOf') )
AND ( (Links.LocalTarget = '17417') )
AND ( (main.id = Links.LocalBase) ) OR (main.id = Links.LocalTarget) )
;

or else, yah, a subquery:

[...]
AND (
  main.id = '17417'
  or
  exists(
    select true from Links
    where Type = 'MemberOf' and LocalTarget = '17417'
     and (LocalBase = main.id or LocalTarget = main.id)
  )
)

Those are the only things I can think of to make it work, anyways.

Dave Cramer wrote:

> RT uses a query like:
>
> SELECT distinct main.oid,main.* FROM Tickets main
> WHERE
> (main.EffectiveId = main.id)
> AND
> (main.Status != 'deleted')
> AND
>     ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND
>     ( (main.Queue = '9') )
> AND ((
>         ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) )
>       OR
>         ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) )
>       or
>         (main.id = '17417')
>      )
>   );
>
>
> which produces a query plan:
>
> Nested Loop  (cost=0.00..813.88 rows=1 width=169)
>    Join Filter: (((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id = 17417)) AND (("inner".localbase =
17417)OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id 
> = 17417)) AND (("outer".id = "inner".localtarget) OR (("inner"."type")::text = 'MemberOf'::text) OR ("outer".id =
17417))AND ((("inner"."type")::text = 'MemberOf'::text) OR ("inner" 
> .localtarget = 17417) OR ("outer".id = 17417)) AND (("inner".localbase = 17417) OR ("inner".localtarget = 17417) OR
("outer".id= 17417)) AND (("outer".id = "inner".localtarget) OR ( 
> "inner".localtarget = 17417) OR ("outer".id = 17417)) AND ((("inner"."type")::text = 'MemberOf'::text) OR ("outer".id
="inner".localbase) OR ("outer".id = 17417)) AND (("inner".loca 
> lbase = 17417) OR ("outer".id = "inner".localbase) OR ("outer".id = 17417)) AND (("outer".id = "inner".localtarget)
OR("outer".id = "inner".localbase) OR ("outer".id = 17417))) 
>    ->  Index Scan using tickets1 on tickets main  (cost=0.00..657.61 rows=1 width=169)
>          Index Cond: (queue = 9)
>          Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND ((("type")::text = 'ticket'::text)
OR(("type")::text = 'subticket'::text))) 
>    ->  Seq Scan on links  (cost=0.00..46.62 rows=1462 width=20)
>
> If I rewrite the query as:
>
> SELECT main.* FROM Tickets main
> WHERE
> (main.EffectiveId = main.id)
> AND
> (main.Status != 'deleted')
> AND
>     ( (main.Type = 'ticket') OR (main.Type = 'subticket') )
> AND
>     ( (main.Queue = '9') )
> AND (
>         17417 in (select links.localtarget from links where links.type='MemberOf' and main.id=links.localbase)
>         or
>         17417 in ( select links.localbase from links where links.type='MemberOf' and main.id=links.localtarget)
>       or
>         main.id = '17417'
>     )
>   ;
>
> The time for the query goes from 1500ms to 15ms. The two OR clauses
>
>         ( (Links.Type = 'MemberOf') AND (Links.LocalTarget = '17417') AND (main.id = Links.LocalBase) )
>       OR
>         ( (Links.Type = 'MemberOf') AND (Links.LocalBase = '17417') AND (main.id = Links.LocalTarget) )
>
> don't contribute to the result set in this particular dataset, which is why the speed increases so dramatically.
>
> Is there a way to rewrite the top query to get the same results? I have already talked to Best Practical,
> and subqueries are not easily embraced.
>
> Dave

pgsql-performance by date:

Previous
From: Dave Cramer
Date:
Subject: help with query
Next
From: Dave Cramer
Date:
Subject: Re: help with query