Thread: OR clause causing strange index performance

OR clause causing strange index performance

From
Doug Y
Date:
Hello,  For the following query:

SELECT *  FROM permissions p       INNER JOIN users u               ON u.id = p.id       LEFT JOIN user_list ul1
     ON ul1.id = u.id                 AND ul1.type = '1'       LEFT JOIN user_list ul2              ON ul2.id = u.id
            AND ul2.type = '2'       INNER JOIN lists l               ON ( l.list_id1 = ul1.list_id1 AND l.list_id2 =
ul1.list_id2)                    OR                  ( l.list_id1 = ul2.list_id1 AND l.list_id2 = ul2.list_id2 ) WHERE
    p.code = '123456' AND p.type = 'User'
 

(lists table has ~ 500k records, users ~ 100k, permissions ~ 60k, user_list 
~ 530k)

lists can be associated with 2 users via the user_list table, and are 
designated by the 1 or 2, can have a user with a 1, a user with a 2 or one 
of each.

I'm getting really poor performance... about 60 seconds. Explain (see 
below) is showing its trying to use the pkey (list_id1,list_id2) on the 
list table, but not showing an index condition.

If I get rid of the OR, and only at one of the conditions it returns very 
quickly and properly set the index condition. I can't use a union because I 
would end up with duplicate rows for those that have both ul type 1 and 2

I actually started off trying the query by looking at lists first, but 
performance was awful since I can't narrow down the records like I can with 
permissions.

I know the tables aren't really set up ideally, and I actually have to join 
a few more tables to the lists table after the fact, but want to get the 
base running as efficient as possible first.

Is there any way to get this query to use the correct index condition so 
that it runs in a reasonable amount of time?

Thanks!

EXPLAIN with the OR                                                         QUERY 
PLAN

----------------------------------------------------------------------------------------------------------------------------
NestedLoop  (cost=0.00..13051262.13 rows=1 width=1794)   Join Filter: ((("inner".list_id1 = "outer".list_id1) OR 
 
("inner".list_id1 = "outer".list_id1)) AND (("inner".list_id2 = 
"outer".list_id2) OR ("inner".list_id1 = "outer".list_id1)) AND 
(("inner".list_id1 = "outer".list_id1) OR ("inner".list_id2 = 
"outer".list_id2)) AND (("inner".list_id2 = "outer".list_id2) OR 
("inner".list_id2 = "outer".list_id2)))   ->  Nested Loop  (cost=0.00..2654.08 rows=12 width=1087)         Join Filter:
("inner".type= '2'::character varying)         ->  Nested Loop  (cost=0.00..427.39 rows=12 width=1032)
JoinFilter: ("inner".type = '1'::character varying)               ->  Nested Loop  (cost=0.00..23.82 rows=2 width=977)
                  ->  Index Scan using permissions_pkey on permissions 
 
p  (cost=0.00..12.14 rows=2 width=476)                           Index Cond: ((code = '123456'::character 
varying) AND (type = 'User'::character varying))                     ->  Index Scan using users_pkey on users 
u  (cost=0.00..4.92 rows=1 width=501)                           Index Cond: (u.id = "outer".id)               ->  Index
Scanusing user_list_id on user_list 
 
ul1  (cost=0.00..159.86 rows=37 width=55)                     Index Cond: (ul1.id = "outer".id)         ->  Index Scan
usinguser_list_id on user_list 
 
ul2  (cost=0.00..159.86 rows=37 width=55)               Index Cond: (ul2.id = "outer".id)   ->  Seq Scan on lists 1
(cost=0.00..26103.61rows=508361 width=707)
 
(16 rows)

Note: this example shows it trying a seq scan.. I've tried it with 
enable_seqscan off, too. When I referred above to it trying to use index 
scan, it was from an explain with an additional join to the lists table after:
->  Index Scan using lists_pkey on lists l  (cost=0.00..1872375.82 
rows=508361 width=144)


EXPLAIN without the OR                                                         QUERY 
PLAN

----------------------------------------------------------------------------------------------------------------------------
NestedLoop  (cost=0.00..2740.09 rows=17 width=1794)   ->  Nested Loop  (cost=0.00..2654.08 rows=12 width=1087)
JoinFilter: ("inner".type = '2'::character varying)         ->  Nested Loop  (cost=0.00..427.39 rows=12 width=1032)
         Join Filter: ("inner".type = '1'::character varying)               ->  Nested Loop  (cost=0.00..23.82 rows=2
width=977)                    ->  Index Scan using permissions_pkey on permissions 
 
p  (cost=0.00..12.14 rows=2 width=476)                           Index Cond: ((code = '123456'::character 
varying) AND (type = 'User'::character varying))                     ->  Index Scan using users_pkey on users 
u  (cost=0.00..4.92 rows=1 width=501)                           Index Cond: (u.id = "outer".id)               ->  Index
Scanusing user_list_id on user_list 
 
ul1  (cost=0.00..159.86 rows=37 width=55)                     Index Cond: (ul1.id = "outer".id)         ->  Index Scan
usinguser_list_id on user_list 
 
ul2  (cost=0.00..159.86 rows=37 width=55)               Index Cond: (ul2.id = "outer".id)   ->  Index Scan using
list_pkeyon lists 1  (cost=0.00..6.40 rows=1 
 
width=707)         Index Cond: ((l.list_id1 = "outer".list_id1) AND (l.list_id2 = 
"outer".list_id2))
(16 rows)




Re: OR clause causing strange index performance

From
Doug Y
Date:
Sorry,  I just realized that my logic for the query is flawed anyway. It won't 
return the proper data set I'm after. I'll have to go back to looking at 
the lists table first.
  I still guess knowing why the query below isn't as quick as expected 
could be useful though.

At 01:32 PM 5/20/2004, Doug Y wrote:
>Hello,
>   For the following query:
>
>SELECT *
>   FROM permissions p
>        INNER JOIN users u
>                ON u.id = p.id
>        LEFT JOIN user_list ul1
>               ON ul1.id = u.id
>                  AND ul1.type = '1'
>        LEFT JOIN user_list ul2
>               ON ul2.id = u.id
>                  AND ul2.type = '2'
>        INNER JOIN lists l
>                ON ( l.list_id1 = ul1.list_id1 AND l.list_id2 = ul1.list_id2 )
>                     OR
>                   ( l.list_id1 = ul2.list_id1 AND l.list_id2 = ul2.list_id2 )
>  WHERE
>        p.code = '123456' AND p.type = 'User'
>
>(lists table has ~ 500k records, users ~ 100k, permissions ~ 60k, 
>user_list ~ 530k)
>
>lists can be associated with 2 users via the user_list table, and are 
>designated by the 1 or 2, can have a user with a 1, a user with a 2 or one 
>of each.
>
>I'm getting really poor performance... about 60 seconds. Explain (see 
>below) is showing its trying to use the pkey (list_id1,list_id2) on the 
>list table, but not showing an index condition.
>
>If I get rid of the OR, and only at one of the conditions it returns very 
>quickly and properly set the index condition. I can't use a union because 
>I would end up with duplicate rows for those that have both ul type 1 and 2
>
>I actually started off trying the query by looking at lists first, but 
>performance was awful since I can't narrow down the records like I can 
>with permissions.
>
>I know the tables aren't really set up ideally, and I actually have to 
>join a few more tables to the lists table after the fact, but want to get 
>the base running as efficient as possible first.
>
>Is there any way to get this query to use the correct index condition so 
>that it runs in a reasonable amount of time?
>
>Thanks!

- cut explains off - 



Re: OR clause causing strange index performance

From
Tom Lane
Date:
Doug Y <dylists@ptd.net> writes:
> Is there any way to get this query to use the correct index condition so 
> that it runs in a reasonable amount of time?

I think CVS tip (7.5-to-be) would handle this better, but it's hard to
be sure since you didn't provide a self-contained test case.
        regards, tom lane


Re: OR clause causing strange index performance

From
"Atesz"
Date:
Hi!

I read your JOIN - Index Scaning - OR problem. I don't understand why
you decomposed JOINs two brach (ul1 and ul2).
If I understand your problem well I can suggest the next idea for your
QUERY (you don't need two branch):

SELECT *  FROM  permissions p INNER JOIN users u       ON u.id  = p.id INNER JOIN user_list ul  ON ul.id = u.id INNER
JOINlists l          ON ( l.list_id1 = ul.list_id1 AND
 
l.list_id2 = ul.list_id2 )
WHERE  (ul.type = '1' OR ul.type= '2') and p.code = '123456' AND p.type =
'User';

If ul.type field is integer you can optimze the OR (which can cause
index scan problem and low performance) with BETWEEN:

SELECT *  FROM  permissions p INNER JOIN users u       ON u.id  = p.id INNER JOIN user_list ul  ON ul.id = u.id INNER
JOINlists l          ON ( l.list_id1 = ul.list_id1 AND
 
l.list_id2 = ul.list_id2 )
WHERE  ul.type BETWEEN 1 AND 2   and p.code = '123456' AND p.type = 'User';

After that you need some good index on ul.type, p.code and p.type. You
have to think about creating indices. Analyse the results of explain!!!
In my opinion this solution may be very fast.

Regards,
Antal Attila


-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Doug Y
Sent: Thursday, May 20, 2004 7:32 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] OR clause causing strange index performance


SELECT *  FROM permissions p       INNER JOIN users u               ON u.id = p.id       LEFT JOIN user_list ul1
     ON ul1.id = u.id                 AND ul1.type = '1'       LEFT JOIN user_list ul2              ON ul2.id = u.id
            AND ul2.type = '2'       INNER JOIN lists l               ON ( l.list_id1 = ul1.list_id1 AND l.list_id2 =
 
ul1.list_id2 )                    OR                  ( l.list_id1 = ul2.list_id1 AND l.list_id2 =
ul2.list_id2 ) WHERE       p.code = '123456' AND p.type = 'User'