Thread: full outer join performance
Are full outer joins expected to perform much worse than inner joins? I'm seeing 2 orders of magnitude difference for an almost identical query. (Well, as "identical" as you can get, comparing a query with an outer join to one without.) This is on 8.0.3, recently analyzed. Here are the explain plans: music=# explain select music-# extractbasenamefrompath(files.path),trms.trm music-# from files,nodes music-# full outer join trms on (trms.id = nodes.trm) music-# where music-# nodes.fileid = files.id and music-# extractbasenamefrompath(files.path) = 'Joy Division/Substance/'; QUERY PLAN --------------------------------------------------------------------------------------------- Hash Join (cost=3932.55..11891.05 rows=18 width=117) Hash Cond: ("outer".fileid = "inner".id) -> Hash Left Join (cost=3867.51..11391.65 rows=86827 width=44) Hash Cond: ("outer".trm = "inner".id) -> Seq Scan on nodes (cost=0.00..1557.27 rows=86827 width=8) -> Hash (cost=2867.21..2867.21 rows=88521 width=44) -> Seq Scan on trms (cost=0.00..2867.21 rows=88521 width=44) -> Hash (cost=64.99..64.99 rows=18 width=81) -> Index Scan using basename_idx on files (cost=0.00..64.99 rows=18 width=81) Index Cond: (extractbasenamefrompath(path) = 'Joy Division/Substance/'::text) (10 rows) music=# explain select music-# extractbasenamefrompath(files.path),trms.trm music-# from music-# nodes,files,trms music-# where music-# nodes.fileid = files.id and music-# nodes.trm = trms.id and music-# extractbasenamefrompath(files.path) = 'Joy Division/Substance/'; QUERY PLAN --------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..202.04 rows=18 width=117) -> Nested Loop (cost=0.00..119.46 rows=18 width=81) -> Index Scan using basename_idx on files (cost=0.00..64.99 rows=18 width=81) Index Cond: (extractbasenamefrompath(path) = 'Joy Division/Substance/'::text) -> Index Scan using nodes_fileid_idx on nodes (cost=0.00..3.01 rows=1 width=8) Index Cond: (nodes.fileid = "outer".id) -> Index Scan using trms_pkey on trms (cost=0.00..4.57 rows=1 width=44) Index Cond: ("outer".trm = trms.id) (8 rows)
Ben <bench@silentmedia.com> writes: > Are full outer joins expected to perform much worse than inner joins? You're getting burnt by the outer join forcing a bad join order. There's some related discussion here: http://www.postgresql.org/docs/8.0/static/explicit-joins.html regards, tom lane
Hrm, as I understand that page, there's not much that can be done about the join order for outer joins. At least, not when there's on 3 tables and 1 outer join involved. Am I missing something? Tom Lane wrote: >Ben <bench@silentmedia.com> writes: > > >>Are full outer joins expected to perform much worse than inner joins? >> >> > >You're getting burnt by the outer join forcing a bad join order. >There's some related discussion here: >http://www.postgresql.org/docs/8.0/static/explicit-joins.html > > regards, tom lane > >
Ben <bench@silentmedia.com> writes: > Hrm, as I understand that page, there's not much that can be done about > the join order for outer joins. At least, not when there's on 3 tables > and 1 outer join involved. Am I missing something? Without knowing what you want the query to do, it's difficult to say if it's OK to rearrange the join order. Frequently it's OK to rearrange the join order even without risking any change in the query results, but PG doesn't currently have any smarts about that, so you have to do it manually by changing the query. regards, tom lane
Scott Marlowe <smarlowe@g2switchworks.com> writes: > Tom, would that help the planner make better choices for this kind of > query? I think the problem was that he had select ... from a, b full join c on ... where ... where table b is big and you only need a few rows from it, so it really needs to be joined last, but the above forced doing it first. It wasn't clear to me why he wanted the full join at all (in fact, you could see from the plan that the planner had been able to reduce it to a left join because there were WHERE clauses that'd discard one set of null-extended rows anyway). Without knowing that, it's hard to say whether there's another way to get what he wants. regards, tom lane
On Tue, 2005-09-13 at 13:28, Ben wrote: > Hrm, as I understand that page, there's not much that can be done about > the join order for outer joins. At least, not when there's on 3 tables > and 1 outer join involved. Am I missing something? You might be able to do some kind of thing like: select * from ( select * from table1 <whereclausehere>) as a ) left join ( select * from table2 <whereclausehere) as b on (a.somefield=b.somefield2) where <anotherwhereclause) or something. Tom, would that help the planner make better choices for this kind of query?
I see. I think I'm going to restructure my logic so I won't have to use the outer join after all, but thanks for the pointers anyway. Tom Lane wrote: >Ben <bench@silentmedia.com> writes: > > >>Hrm, as I understand that page, there's not much that can be done about >>the join order for outer joins. At least, not when there's on 3 tables >>and 1 outer join involved. Am I missing something? >> >> > >Without knowing what you want the query to do, it's difficult to say if >it's OK to rearrange the join order. > >Frequently it's OK to rearrange the join order even without risking any >change in the query results, but PG doesn't currently have any smarts >about that, so you have to do it manually by changing the query. > > regards, tom lane > >
Tom Lane wrote: >I think the problem was that he had > > select ... from a, b full join c on ... where ... > >where table b is big and you only need a few rows from it, so it really >needs to be joined last, but the above forced doing it first. It wasn't >clear to me why he wanted the full join at all (in fact, you could see >from the plan that the planner had been able to reduce it to a left join >because there were WHERE clauses that'd discard one set of null-extended >rows anyway). Without knowing that, it's hard to say whether there's >another way to get what he wants. > > > Without the outer join, the output was missing the joined rows from a and b when there was no matching row in c. (Tables a and c both referenced b.) I suppose I could have used a left join instead of a full join, but either way the results were much slower than an inner join, and I was able to redo some other logic to let an inner join work just fine.