Thread: Inner join on two OR conditions dont use index
Hi all i dont know if this is normal, but if yes i would like to know why and how I could do it another way other than using unions. (I tried on postgresql 7.4 and 8.0.3, made my vacuum analyse just before) Here is my simple query: select * from rt_node n, rt_edge e where node_id = 2 and e.start_node_id = n.node_id; which give me the following query plan: Nested Loop (cost=0.00..79.46 rows=24 width=60) -> Index Scan using rt_node_pkey on rt_node n (cost=0.00..5.94 rows=1 width=36) Index Cond: (node_id = 2) -> Index Scan using rt_edge_start_node on rt_edge e (cost=0.00..73.28 rows=24 width=24) Index Cond: (start_node_id = 2) But if I plug another condition with a OR like this: select * from rt_node n, rt_edge e where node_id = 2 and (e.start_node_id = n.node_id or e.end_node_id = n.node_id); I get this plan, it stop using the index!: Nested Loop (cost=0.00..158.94 rows=4 width=60) Join Filter: (("inner".start_node_id = "outer".node_id) OR ("inner".end_node_id = "outer".node_id)) -> Index Scan using rt_node_pkey on rt_node n (cost=0.00..5.94 rows=1 width=36) Index Cond: (node_id = 2) -> Seq Scan on rt_edge e (cost=0.00..81.60 rows=4760 width=24) I tried SET enable_seqscan = OFF and it give me this (woah) : Nested Loop (cost=100000000.00..100000158.94 rows=4 width=60) Join Filter: (("inner".start_node_id = "outer".node_id) OR ("inner".end_node_id = "outer".node_id)) -> Index Scan using rt_node_pkey on rt_node n (cost=0.00..5.94 rows=1 width=36) Index Cond: (node_id = 2) -> Seq Scan on rt_edge e (cost=100000000.00..100000081.60 rows=4760 width=24) These are my tables definitions: CREATE TABLE rt_node ( node_id INTEGER PRIMARY KEY ); CREATE TABLE rt_edge ( edge_id INTEGER PRIMARY KEY, start_node_id INTEGER NOT NULL, end_node_id INTEGER NOT NULL, CONSTRAINT start_node_ref FOREIGN KEY (start_node_id) REFERENCES rt_node(node_id), CONSTRAINT end_node_ref FOREIGN KEY (end_node_id) REFERENCES rt_node(node_id) ); CREATE INDEX rt_edge_start_node ON rt_edge(start_node_id); CREATE INDEX rt_edge_end_node ON rt_edge(end_node_id); I cant figure why it cant use my index I know I can use a UNION instead on two query like the first one only different on "start_node_id"/"end_node_id", and it works, but this is a part of a bigger query which is already ugly and I hate using 5 lines for something I could in 5 words. thank you!
Jocelyn Turcotte wrote: >Hi all >i dont know if this is normal, but if yes i would like to know why and >how I could do it another way other than using unions. > > The only thing that *might* work is if you used an index on both keys. So if you did: CREATE INDEX rt_edge_start_end_node ON rt_edge(start_node_id,end_node_id); The reason is that in an "OR" construct, you have to check both for being true. So in the general case where you don't knowthe correlation between the columns, you have to check all of the entries, because even if you know the status of oneside of the OR, you don't know the other. Another possibility would be to try this index: CREATE INDEX rt_edge_stare_or_end ON rt_edge(start_node_id OR end_node_id); I'm not sure how smart the planner can be, though. John =:->
Attachment
Thanks John it dont seems to work, but in my context I only needed data from the rt_node table so I tried this: select * from rt_node n where node_id = 2 and exists (select edge_id from rt_edge where start_node_id = n.node_id or end_node_id = n.node_id) and it gave me this plan (even if I remove the stupid node_id = 2 condition): Index Scan using rt_node_pkey on rt_node n (cost=0.00..6.15 rows=1 width=25) Index Cond: (node_id = 2) Filter: (subplan) SubPlan -> Index Scan using rt_edge_start_node, rt_edge_end_node on rt_edge (cost=0.00..12.56 rows=4 width=4) Index Cond: ((start_node_id = $0) OR (end_node_id = $0)) this time it use my two indexes, maybe because he know that the same value is compared in the two condition... I should ask my mother if she got an idea, mothers know a lot of stuff! On 5/25/05, John A Meinel <john@arbash-meinel.com> wrote: > Jocelyn Turcotte wrote: > > >Hi all > >i dont know if this is normal, but if yes i would like to know why and > >how I could do it another way other than using unions. > > > > > > The only thing that *might* work is if you used an index on both keys. > So if you did: > > CREATE INDEX rt_edge_start_end_node ON rt_edge(start_node_id,end_node_id); > > The reason is that in an "OR" construct, you have to check both for being true. So in the general case where you don'tknow the correlation between the columns, you have to check all of the entries, because even if you know the statusof one side of the OR, you don't know the other. > > Another possibility would be to try this index: > > CREATE INDEX rt_edge_stare_or_end ON rt_edge(start_node_id OR end_node_id); > > I'm not sure how smart the planner can be, though. > > John > =:-> > > > > >
Jocelyn Turcotte <turcotte.j@gmail.com> writes: > But if I plug another condition with a OR like this: > select * > from rt_node n, rt_edge e > where node_id = 2 > and (e.start_node_id = n.node_id or e.end_node_id = n.node_id); > I get this plan, it stop using the index!: I'm afraid you're stuck with faking it with a UNION for now; the current planner is incapable of recognizing that a join OR condition can be handled with OR indexscans. FWIW, this is fixed for 8.1 --- in CVS tip, I get this from your example: QUERY PLAN -------------------------------------------------------------------------------------------------- Nested Loop (cost=2.06..17.57 rows=2 width=16) -> Index Scan using rt_node_pkey on rt_node n (cost=0.00..4.82 rows=1 width=4) Index Cond: (node_id = 2) -> Bitmap Heap Scan on rt_edge e (cost=2.06..12.47 rows=18 width=12) Recheck Cond: ((e.start_node_id = "outer".node_id) OR (e.end_node_id = "outer".node_id)) -> BitmapOr (cost=2.06..2.06 rows=18 width=0) -> Bitmap Index Scan on rt_edge_start_node (cost=0.00..1.03 rows=9 width=0) Index Cond: (e.start_node_id = "outer".node_id) -> Bitmap Index Scan on rt_edge_end_node (cost=0.00..1.03 rows=9 width=0) Index Cond: (e.end_node_id = "outer".node_id) (10 rows) (This is with no data in the tables, so the cost estimates are small, but it does show that the planner knows how to generate this kind of query plan now.) regards, tom lane