Proposal for supporting outer joins in 7.1 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Proposal for supporting outer joins in 7.1 |
Date | |
Msg-id | 4429.967234022@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Proposal for supporting outer joins in 7.1
Re: Proposal for supporting outer joins in 7.1 |
List | pgsql-hackers |
I would like to see if we can't get some amount of OUTER JOIN functionality into the 7.1 release. It seems that the executor implementation should be pretty simple, at least for the left-join case (which is what people appear to need the most). What we are mostly missing is a suitable parsetree representation plus appropriate handling in the planner. Right now, the rangetable associated with a query serves two purposes. First, it provides a list of all relations used in the query for Var nodes to refer to (a Var's "varno" field is just an index into the rangetable list). Second, it drives the planner's construction of the join tree for the query (all entries that are marked inJoinSet will be added to the join tree in some order). I think what we want to do is separate these two functions into two data structures. The rangetable list is just fine for looking up Var references, but it's not sufficient information for representing multiple types of joins. A ready-to-plan Query node should have an additional subsidiary data structure that describes the required join tree of relations. The additional data structure could really be pretty close to the raw parse structure that the grammar emits for FROM clauses. I envision it as a tree whose leaf nodes are references to rangetable entries and whose upper-level nodes correspond to JOIN clauses (each JOIN node has pointers to its child nodes plus information about the specified join type and any supplied constraint conditions). If the FROM clause has more than one item, the top level of the join tree is a CROSS JOIN node with all the FROM-clause items as children. (Note that at least for this case, a join node needs to be able to have more than two children.) If anything, this structure should be easier for the parser/analyzer to emit than what it's doing now. Notice in particular that ON/USING qual clauses associated with joins are to be left in the join tree, *not* merged into the main WHERE clause. (The planner would just have to separate them out again and discover which join they should go with, so what's the point of merging them into WHERE?) I envision the planner as operating by walking this join tree and considering plans that perform each specified join. At nodes with more than two children (such as the top-level CROSS JOIN from a many-element FROM clause) the planner will consider all possible orders for joining the children together, the same as it does now. Note that this implementation implies that the user can guide/constrain the planner by writing an appropriate FROM clause. For example: SELECT ... FROM a,b,cPlanner will consider all three possible join orders:(a cross join b) cross join c(a cross join c) crossjoin b(b cross join c) cross join a SELECT ... FROM (a cross join b) cross join cPlanner will only consider joining a with b and then adding c. This might be construed as either a feature or a bug depending on your views about how much responsibility to put on the planner vs. the user. In any case it seems like a reasonable first-cut implementation; we can think about improvements later. Thoughts, objections, better ideas? One thing I'm not very clear on is how much has been done in the parser about the ISO rules for visibility and aliasing of table columns in explicit JOINs. Thomas, can you sketch out where we stand in that area? regards, tom lane
pgsql-hackers by date: