Re: Improving our clauseless-join heuristics - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Improving our clauseless-join heuristics |
Date | |
Msg-id | CA+TgmoYDinnsLBkkr+YbpsO1hurVGk5+C256o-XGX8VqH27mow@mail.gmail.com Whole thread Raw |
In response to | Improving our clauseless-join heuristics (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Improving our clauseless-join heuristics
Re: Improving our clauseless-join heuristics |
List | pgsql-hackers |
On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked into the behavior complained of here: > http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php > > The problem query can be abstracted to > > select * from a, b, c, d > where a.x = b.y and (a.z = c.c or a.z = d.d) > > Table a is much larger than the others (in fact, in the given example > c and d are known to be single rows), and there are indexes on the > mentioned columns of a. In this situation, the best plan is to > cross-join c and d, then use a BitmapOr indexscan to pick out the rows > of a that satisfy the OR condition, and finally join that small number > of rows to b. The planner will use a cross-join-first plan if we omit > b and the first WHERE clause from the query; but in the query as given, > it fails to discover that plan and falls back on a vastly inferior plan > that involves forming the a/b join first. > > The reason for this behavior is the anti-clauseless-join heuristics in > join_search_one_level(). Without b, there are no join clauses available > at join level 2, so the planner is forced to form all three 2-way cross > joins; and then at level 3 it finds out that joining a to c/d works > well. With b, we find the a/b join has a usable join clause so we form > that join, and then we decide not to make any 2-way clauseless joins. > So the c/d join is never constructed and there is no way to exploit the > desirable indexscan at higher levels. > > After some reflection I think that the blame should be pinned on > have_relevant_joinclause(), which is essentially defined as "is there > any join clause that can be evaluated at the join of these two > relations?". I think it would work better to define it as "is there any > join clause that both these relations participate in?". In the majority > of real-world queries, join clauses relate exactly two relations, so > that these two definitions are equivalent. However, when we do have > join clauses involving 3 or more relations, such as the OR clause in > this example, it's evidently useful to consider cross-product joins of > the smaller relations so that the join clause can be applied during the > scan of the largest table. > > It would probably not be a good idea to back-patch such a change, since > it might have consequences I can't foresee at the moment. But I'm > strongly tempted to squeeze it into 9.2. Thoughts? I think it's getting a little late in the day to be whacking the planner around too much, but I have to admit that seems like a pretty good and safe change to me, so maybe we should go ahead and do it. I'm a bit worried, though, that with all the planner changes this release we are going to spend a lot of time tracking down regressions either in planning time or in plan quality. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: