Thread: Joins on inherited tables
Hi, In some situations, it looks like the optimizer does not chose efficient paths for joining inherited tables. For I created a rather trivial formulation to serve as an example. I created the table 'numbers' comprising of the columns id (int) and value (text). I also created the table 'evens' and 'odds' that inherit numbers, with no additional columns. Into 'evens' I placed 50000 entries, each one with an even (unique) id and random 'value'. Likewise, for 'odds', I created 50000 odd (and unique) id fields id fields and random 'value', and created index on all ID fields of every table that has any rows (and analyzed). so.. my tables look like this: Table "public.numbers" Column | Type | Modifiers --------+---------+----------- id | integer | value | text | Table "public.evens" Column | Type | Modifiers --------+---------+----------- id | integer | value | text | Indexes: "ei" btree (id) Inherits: numbers Table "public.odds" Column | Type | Modifiers --------+---------+----------- id | integer | value | text | Indexes: "oi" btree (id) Inherits: numbers As per the above construction, 'evens' and 'odds' both have 50000 rows. 'numbers' contains none. Now, I created a trivial query that would use 'numbers' as an inheritor table in a join (a very stupid one, but a join nevertheless) as follows, which produces a terrible, but correct, plan: select value from (SELECT 1::integer as id) as ids JOIN numbers on (numbers.id = ids.id); QUERY PLAN --------------------------------------------------------------------------- Hash Join (cost=0.02..2195.79 rows=501 width=19) Hash Cond: ("outer".id = "inner".id) -> Append (cost=0.00..1690.50 rows=100051 width=23) -> Seq Scan on numbers (cost=0.00..0.00 rows=1 width=23) -> Seq Scan on evens numbers (cost=0.00..845.25 rows=50025 width=23) -> Seq Scan on odds numbers (cost=0.00..845.25 rows=50025 width=23) -> Hash (cost=0.02..0.02 rows=1 width=4) -> Subquery Scan ids (cost=0.00..0.02 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=0) Now, I subsitute 'evens' for 'numbers', so I am now joining with a normal, non-inherited table. The plan is much better: select value from (SELECT 1::integer as id) as ids JOIN evens on (evens.id = ids.id); QUERY PLAN ----------------------------------------------------------------------- Nested Loop (cost=0.00..3.05 rows=2 width=19) -> Subquery Scan ids (cost=0.00..0.02 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=0) -> Index Scan using ei on evens (cost=0.00..3.01 rows=1 width=23) Index Cond: (evens.id = "outer".id) I would think that the ideal plan for the first query should be a nested loop like the second that considers indexes where possible, and it would look as follows: HYPOTHETICAL PLAN ------------------------------------------------------------------------ Nested Loop -> Subquery Scan ids -> Result -> Append -> Seq Scan on numbers -> Index Scan using ei on evens Index Cond: (evens.id = "outer.id") -> Index Scan using oi on odds Index Cond: (odds.id = "outer.id") So.. Why wouldn't postgres use such a plan? I could think of three reasons: - The planner wasn't considering this plan due to some fault of its own - That plan makes no sense and would not be able to be run in the executor, and therefore it would be wasteful to consider it. - It truly is more expensive than a hash join I've pretty much ruled out the third and I suspect the second is also untrue (though have not proven that to myself), leaving the first. If it is indeed the second, that the plan makes no sense, someone please let me know! OK, so I took a look into the optimizer and over time got a better understanding of what's going on, though I still don't understand it completely. Here's my theory as to what's happening: For this query, most of the path consideration takes place in match_unsorted_outer() in path/joinpath.c. For the 'good' query against the non-inherited 'evens' table, the good plan is generated in the line: bestinnerjoin = best_inner_indexscan(...); Since an inherited table doesn't have one single index over all its inherited tables, this call produces a null bestinnerjoin. Later on in match_unsorted_inner(), various access paths are considered for a nested loop. One is bestinnerjoin (when it exists), and that is how the 'good' query gets its nested loop with an index scan. Other paths considered for inclusion in the nested loop are 'inner_cheapest_total' and 'inner_cheapest_startup'; These plans, presumably, contain sequential scans, which are expensive enough in the nested loop that the nested loop plan is suboptimal compared to a hash join, which is what happens in the 'bad' query that joins against the inheritor table. Now, it seems to me as if the solution would be one of the following: - create a new bestinnerjoin plan when best_inner_indexscan returned null, yet the current relation is an inheritor. This bestinnerjoin plan will use the optimal Append plan that comprises of the optimal plans for the inherited tables that are relevant for the joins (as in the case of the hypothetical query above where the append consists of a number of index and sequential scans) - Consider the optimal Append path relevant to the join later on when considering paths for use in a nested loop. i.e. inner_cheapest_total or inner_cheapest_startup will have to be this good append plan. For 2, the problem seems to be that in creating the inner_cheapest_total, joininfo nodes for each child relation do not exist. I experimented by just copying the parents joininfo nodes to each child, and it looked like it was considering index scans somewhere along the line, but the final plan chosen did not use index scans. I didn't see where it failed. I'm kinda skeptical of 3 anyway, because the inner_cheapest_total in the 'good' query with out inheritance does not appear to involve index scans at all. So.. does anybody have any advice? Am I totally off base with my assertions that a better plan can exist when joining inherited tables? Does my analysis of the problem and possible solutions in the optimizer make any sense? I basically pored over the source code, read the documentation, and did little tests here and there to arrive at my conclusions, but cannot claim to have a good global view of how everything works. Thanks very much -Aaron
apb18@cornell.edu writes: > So.. does anybody have any advice? Look at set_inherited_rel_pathlist() in allpaths.c --- it forms the best plan for fully scanning the inheritance-tree table. Currently that's the *only* plan considered, and it does not make any use of join clauses. It's possible that something could be done with providing a similar routine for best inner indexscans taken across the whole inheritance tree. You'd have to figure out how to locate the applicable joinclauses though. I think they'd be attached to the inheritance-tree parent relation and not to the individual inheritance tree member relations. Also, if you are wondering why best_inner_indexscan() is so tense about caching its results, that's because it gets called *a lot* in large join problems. If you don't achieve a similar level of efficiency then you'll be seeing some nasty performance problems in larger queries. I think you'd find there is executor work to do as well; I'm not sure how the outer-relation values would get propagated down into the indexscans when there's an Append node between. Maybe the existing code would Just Work, but I bet not. <digression> Not sure if this will help you, but: Once upon a time the planner did the APPEND for an inheritance tree at the top of the plan not the bottom. (It still does when the tree is the target of an update/delete query.) In 7.0 for example I get a plan like this: create table pt (f1 int primary key); create table ct1 (f2 int) inherits (pt); create table ct2 (f2 int) inherits (pt); create index ct1i on ct1(f1); create table bar(f1 int); explain select * from pt*, bar where pt.f1 = bar.f1; NOTICE: QUERY PLAN: Append (cost=69.83..474.33 rows=30000 width=8) -> Merge Join (cost=69.83..154.83 rows=10000 width=8) -> Index Scan using pt_pkey on pt (cost=0.00..60.00 rows=1000 width=4) -> Sort (cost=69.83..69.83 rows=1000 width=4) -> Seq Scan on bar (cost=0.00..20.00 rows=1000 width=4) -> Merge Join (cost=69.83..154.83 rows=10000 width=8) -> Index Scan using ct1i on ct1 pt (cost=0.00..60.00 rows=1000 width=4) -> Sort (cost=69.83..69.83 rows=1000 width=4) -> Seq Scan on bar (cost=0.00..20.00 rows=1000 width=4) -> Merge Join (cost=139.66..164.66 rows=10000 width=8) -> Sort (cost=69.83..69.83 rows=1000 width=4) -> Seq Scan on bar (cost=0.00..20.00 rows=1000 width=4) -> Sort (cost=69.83..69.83 rows=1000 width=4) -> Seq Scan on ct2 pt (cost=0.00..20.00 rows=1000 width=4) whereas the same test in CVS tip produces QUERY PLAN ---------------------------------------------------------------------------- Merge Join (cost=303.09..353.09 rows=3000 width=8) Merge Cond: ("outer".f1 = "inner".f1) -> Sort (cost=69.83..72.33 rows=1000 width=4) Sort Key: bar.f1 -> Seq Scan on bar (cost=0.00..20.00 rows=1000 width=4) -> Sort (cost=233.26..240.76 rows=3000 width=4) Sort Key: public.pt.f1 -> Append (cost=0.00..60.00 rows=3000 width=4) -> Seq Scan on pt (cost=0.00..20.00 rows=1000 width=4) -> Seq Scan on ct1 pt (cost=0.00..20.00 rows=1000 width=4) -> Seq Scan on ct2 pt (cost=0.00..20.00 rows=1000 width=4) The fact that 7.0 could actually adapt to different index sets for different child tables was kinda cool, but the append-at-the-top strategy failed completely for outer joins, so we had to abandon it. In practice I think the generated plan was usually worse anyway (note that bar gets scanned three times in 7.0's plan), but for the specific case where the inheritance tree is on the inside of a nestloop that could be indexed, the new approach is not as good. If you can come up with a fix that doesn't break things in other respects, it'd be great. [digs in CVS logs] The patch that altered the APPEND-at-the-top behavior was this one: 2000-11-11 19:36 tgl * src/: backend/commands/command.c, backend/commands/copy.c, backend/commands/explain.c, backend/executor/execMain.c, backend/executor/execQual.c, backend/executor/execTuples.c, backend/executor/execUtils.c, backend/executor/functions.c, backend/executor/nodeAppend.c, backend/executor/nodeSeqscan.c, backend/nodes/copyfuncs.c, backend/nodes/equalfuncs.c, backend/nodes/outfuncs.c, backend/nodes/readfuncs.c, backend/optimizer/path/allpaths.c, backend/optimizer/path/pathkeys.c, backend/optimizer/plan/createplan.c, backend/optimizer/plan/planmain.c, backend/optimizer/plan/planner.c, backend/optimizer/prep/prepunion.c, backend/optimizer/util/pathnode.c, backend/optimizer/util/relnode.c, backend/parser/parse_clause.c, backend/tcop/pquery.c, include/catalog/catversion.h, include/executor/executor.h, include/executor/tuptable.h, include/nodes/execnodes.h, include/nodes/nodes.h, include/nodes/parsenodes.h, include/nodes/plannodes.h, include/nodes/relation.h, include/optimizer/pathnode.h, include/optimizer/planmain.h, include/optimizer/planner.h, include/optimizer/prep.h, backend/optimizer/README: Restructure handling of inheritance queries so that they work with outer joins, and clean things up a good deal at the same time. Append plan node no longer hacks on rangetable at runtime --- instead, all child tables are given their own RT entries during planning. Concept of multiple target tables pushed up into execMain, replacing bug-prone implementation within nodeAppend. Planner now supports generating Append plans for inheritance sets either at the top of the plan (the old way) or at the bottom. Expanding at the bottom is appropriate for tables used as sources, since they may appear inside an outer join; but we must still expand at the top when the target of an UPDATE or DELETE is an inheritance set, because we actually need a different targetlist and junkfilter for each target table in that case. Fortunately a target table can't be inside an outer join... Bizarre mutual recursion between union_planner and prepunion.c is gone --- in fact, union_planner doesn't really have much to do with union queries anymore, so I renamed it grouping_planner. Not sure if studying that diff would teach you anything useful, but there it is... </digression> regards, tom lane
OK, so I've had a bit of time to look things over, and appear to be making headway. Here's how things stand right now: I added a function called best_inner_scan used the same way as best_inner_indexscan, but it's a bit more generalized in the sense that it can make append plans comprising of the best scans for each constituent table (it makes calls to best_inner_indexscan for each child), or just return the best simple index scan (or null) for plain relations. In order to make that work, I gave the child tables modified join clauses from the parent... modified in the sense that I had to make the operands match the inherited child table (they would match the parent otherwise if I were to simply copy the Joininfo nodes, and thus fail in finding an appropriate index in the child table). I'm not entirely comfortable with that solution yet, as I'm not absolutely certain those additonal modified join clauses wont' affect something else in the code that I'm not aware of, but it appears to be having the desired effect. Basically, with optimizer debug enabled, I'm getting plans that look like this (with the same queries as before) RELOPTINFO (1 2): rows=501 width=19 cheapest total path: NestLoop(1 2) rows=501 cost=0.00..1253.67 clauses: numbers.id = ids.id SeqScan(1) rows=1 cost=0.00..0.02 Append(2) rows=100051 cost=0.00..3.01 As opposed to this: RELOPTINFO (1 2): rows=501 width=19 cheapest total path: HashJoin(1 2) rows=501 cost=0.00..2195.79 clauses: numbers.id = ids.id SeqScan(1) rows=1 cost=0.00..0.02 Append(2) rows=100051 cost=0.00..1690.50 The total cost seems a high for the nestloop.. its constituents are certainly cheap. I need to look to see if I missed keeping track of costs somewhere. When I EXPLAIN, though, I get an error from the executor: "ERROR: both left and right operands are rel-vars". I haven't looked into that yet, but the results so far are encouraging enough to press on and get this completed. There was one hairy part, though, which will have to be addressed at some later point: Right now there is a boolean 'inh' in the RangeTblEntry struct which indicates "inheritance requested". When the inheritance root is first expanded by expand_inherited_rtentry(), the rte->inh is nulled in order to prevent expansion of an UPDATE/DELETE target. This presented problems for me when I wanted to detect which relation was an inheritor one in order to expand it into the append path. For my testing purposes, I just commented out the line, but for a real solution, that's not an acceptable solution and some struct might have to be changed slightly in order to convey the inheritance knowledge.. So, I guess the next step is to see what the executor is complaining about and see if it's something that would need attention in the executor of if it's something I did wrong.. If everything appears to work after that point, then I'll check for efficiency and use of cache in generating the inner scan plans. Thanks for the advice and historical perspective so far, Tom. It has been quite helpful. -Aaron