Thread: inner join removal
So I sat down to work on some code to do inner join removal, and quickly got stuck. As a first step, I wanted to write some code that would identify inner joins that could be treated as LEFT JOINs if we so chose. Consider: SELECT 1 FROM foo, bar WHERE foo.x = bar.x; If it so happens that there is a left join constraint from either foo (x) or bar (x) referencing the other one, then the referenced column has a unique index on it; if, further, the referencing column is not null, then every row in the referencing table will have exactly one partner in the referenced table. Thus, the join can be implemented as a LEFT JOIN without changing the results. In this case, that also means the join can be removed, but it can be a win anyway. Consider: SELECT * FROM foo LEFT JOIN (bar JOIN baz ON bar.y = baz.y) ON foo.x = bar.x; If foo is itty bitty and bar and baz are enormous, it would be nice to start by joining foo to bar and then joining the result to baz, but that's not legal. However, if bar (y) references baz (y) and bar.y is not null, then the inner join is equivalent to a left join and it's OK to commute them. Anyway, I didn't get near as far as actually trying to implement this fancy footwork because I got stuck on a much simpler problem. I thought that it would make sense to iterate through all the relations in the query and look at the join clauses for each one. So in the above query, for example, I was expecting that when I looked at the RelOptInfo for baz, its joinlist would contain bar.y = baz.y. It doesn't, because bar.y = baz.y gets eaten by the equivalence class machinery. Essentially, what I want to do is ask, for each relation, whether there is exactly one other relation that gets joined to it, and then if that proves to be the case, get a list of all the join clauses and examine them. But I can't figure out how to do that. Help? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > Consider: > SELECT * FROM foo LEFT JOIN (bar JOIN baz ON bar.y = baz.y) ON foo.x = bar.x; > If foo is itty bitty and bar and baz are enormous, it would be nice to > start by joining foo to bar and then joining the result to baz, but > that's not legal. However, if bar (y) references baz (y) and bar.y is > not null, then the inner join is equivalent to a left join and it's OK > to commute them. I think you're going at this in the wrong place. It'd likely work better to identify this situation while building the SpecialJoinInfo structs describing the join order constraints, and mark the constraints appropriately. In fact, I'm not convinced that "convert the inner join to a left join" is even the right way to think about the problem, because if you fail to get a win from it then you have likely made things worse not better, by adding a join order constraint that wasn't there before. I think it might work out better if you ask "what additional conditions are needed in order to prove that this inner join can commute with this left join", and then work at being able to prove that. (It's entirely likely that the planner isn't currently gathering the right information for solving that problem.) regards, tom lane
On Thu, Jul 8, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Consider: > >> SELECT * FROM foo LEFT JOIN (bar JOIN baz ON bar.y = baz.y) ON foo.x = bar.x; > >> If foo is itty bitty and bar and baz are enormous, it would be nice to >> start by joining foo to bar and then joining the result to baz, but >> that's not legal. However, if bar (y) references baz (y) and bar.y is >> not null, then the inner join is equivalent to a left join and it's OK >> to commute them. > > I think you're going at this in the wrong place. It'd likely work > better to identify this situation while building the SpecialJoinInfo > structs describing the join order constraints, and mark the constraints > appropriately. I'll take a look at that. > In fact, I'm not convinced that "convert the inner join > to a left join" is even the right way to think about the problem, > because if you fail to get a win from it then you have likely made > things worse not better, by adding a join order constraint that wasn't > there before. Yeah, I'm aware of that problem, although I haven't figured out exactly what to do about it. I do realize we can't afford lossage in that situation. There are actually possible wins from transforming an inner join into a left join OR a left join into an inner join, so it's obviously not right to transform blindly. > I think it might work out better if you ask "what > additional conditions are needed in order to prove that this inner join > can commute with this left join", and then work at being able to prove > that. (It's entirely likely that the planner isn't currently gathering > the right information for solving that problem.) We have to avoid putting much of anything into the critical path where we're trying out different join orders - we want to figure it out earlier and, if possible, by examining each relation just once. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jul 8, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think it might work out better if you ask "what >> additional conditions are needed in order to prove that this inner join >> can commute with this left join", and then work at being able to prove >> that. �(It's entirely likely that the planner isn't currently gathering >> the right information for solving that problem.) > We have to avoid putting much of anything into the critical path where > we're trying out different join orders - we want to figure it out > earlier and, if possible, by examining each relation just once. Right, but look for example at the logic involved with the current outer join transformation identity #3, which can't be applied unless the join predicate is strict for the left-hand side. We avoid doing the dogwork to test that more than once. I'm hoping that something similar will work for this problem. But it's way premature to say much about that without a mathematical characterization of the extra conditions needed to make the joins commutable. regards, tom lane
I think it might work out better if you ask "what > additional conditions are needed in order to prove that this inner join > can commute with this left join", and then work at being able to prove > that. (It's entirely likely that the planner isn't currently gathering > the right information for solving that problem.) I also see this falling into a whole class of issues where the planner could utilize the existance of FKs to plan queries better. That might be a better first step. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, Jul 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jul 8, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I think it might work out better if you ask "what >>> additional conditions are needed in order to prove that this inner join >>> can commute with this left join", and then work at being able to prove >>> that. (It's entirely likely that the planner isn't currently gathering >>> the right information for solving that problem.) > >> We have to avoid putting much of anything into the critical path where >> we're trying out different join orders - we want to figure it out >> earlier and, if possible, by examining each relation just once. > > Right, but look for example at the logic involved with the current outer > join transformation identity #3, which can't be applied unless the join > predicate is strict for the left-hand side. We avoid doing the dogwork > to test that more than once. <admits ignorance> Uh... where exactly is that logic? I've been looking for it for 2 years and haven't found it yet. > I'm hoping that something similar will > work for this problem. But it's way premature to say much about that > without a mathematical characterization of the extra conditions needed > to make the joins commutable. I wrote a previous email on this topic which might be a start in that direction. http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jul 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right, but look for example at the logic involved with the current outer >> join transformation identity #3, which can't be applied unless the join >> predicate is strict for the left-hand side. �We avoid doing the dogwork >> to test that more than once. > Uh... where exactly is that logic? I've been looking for it for 2 > years and haven't found it yet. Look at the code that deals with SpecialJoinInfo.lhs_strict. There isn't very much of it (which is precisely my point ...) regards, tom lane