Thread: UNION ALL has higher cost than inheritance
I found an explicit UNION ALL has higher cost than an automatic expansion by inheritance (49 vs. 83 in the example below). Where does the difference come from? Since they have almost same plan trees, should it be the same cost? =# CREATE TABLE parent (i integer); =# CREATE TABLE child () INHERITS (parent); =# INSERT INTO child SELECT generate_series(1, 1000); =# CREATE INDEX ON child (i); =# ANALYZE; =# EXPLAIN SELECT * FROM parent; QUERY PLAN ----------------------------------------------------------------------------Result (cost=0.00..49.00 rows=3400 width=4) -> Append (cost=0.00..49.00 rows=3400 width=4) -> Seq Scan on parent (cost=0.00..34.00 rows=2400 width=4) -> Seq Scan on child parent (cost=0.00..15.00 rows=1000 width=4) (4 rows) =# EXPLAIN SELECT * FROM ONLY parent UNION ALL SELECT * FROM child; QUERY PLAN ----------------------------------------------------------------Append (cost=0.00..83.00 rows=3400 width=4) -> Seq Scanon parent (cost=0.00..34.00 rows=2400 width=4) -> Seq Scan on child (cost=0.00..15.00 rows=1000 width=4) (3 rows) -- Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > I found an explicit UNION ALL has higher cost than an automatic expansion > by inheritance (49 vs. 83 in the example below). Where does the difference > come from? The plan for UNION initially involves a couple of SubqueryScan nodes, which impose an extra cost of cpu_tuple_cost per tuple. Those later get optimized away, but we don't try to readjust the cost estimates for that. regards, tom lane
On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The plan for UNION initially involves a couple of SubqueryScan nodes, > which impose an extra cost of cpu_tuple_cost per tuple. Those later > get optimized away, but we don't try to readjust the cost estimates > for that. Thanks. It also explains my another question why Merge Append cannot be used for UNION ALL plans. Inheritance is better than UNION ALL in much more cases thanks to Merge Append. =# EXPLAIN SELECT * FROM parent ORDER BY i LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------Limit (cost=1.02..1.58rows=10 width=4) -> Result (cost=1.02..56.79 rows=1001 width=4) -> Merge Append (cost=1.02..56.79rows=1001 width=4) Sort Key: public.parent.i -> Sort (cost=1.01..1.01 rows=1width=4) Sort Key: public.parent.i -> Seq Scan on parent (cost=0.00..1.00 rows=1width=4) -> Index Scan using child_i_idx on child parent (cost=0.00..43.25 rows=1000 width=4) (8 rows) =# EXPLAIN (SELECT * FROM ONLY parent ORDER BY i) UNION ALL (SELECT * FROM child ORDER BY i) ORDER BY i LIMIT 10; QUERY PLAN -----------------------------------------------------------------------------------------------Limit (cost=75.91..75.93rows=10 width=4) -> Sort (cost=75.91..78.41 rows=1001 width=4) Sort Key: parent.i -> Append (cost=1.01..54.28 rows=1001 width=4) -> Sort (cost=1.01..1.01 rows=1 width=4) SortKey: parent.i -> Seq Scan on parent (cost=0.00..1.00 rows=1 width=4) -> Index Scanusing child_i_idx on child (cost=0.00..43.25 rows=1000 width=4) (8 rows) -- Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The plan for UNION initially involves a couple of SubqueryScan nodes, >> which impose an extra cost of cpu_tuple_cost per tuple. Those later >> get optimized away, but we don't try to readjust the cost estimates >> for that. > Thanks. It also explains my another question why Merge Append cannot > be used for UNION ALL plans. Hmm, seems like the example you show ought to work. I wonder if there was an oversight in that patch... regards, tom lane
On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thanks. It also explains my another question why Merge Append cannot >> be used for UNION ALL plans. > > Hmm, seems like the example you show ought to work. I wonder if there > was an oversight in that patch... > Huh, that definitely worked in the earlier versions of the patch (as much as it "worked" at all) -- greg
Greg Stark <gsstark@mit.edu> writes: > On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Thanks. It also explains my another question why Merge Append cannot >>> be used for UNION ALL plans. >> Hmm, seems like the example you show ought to work. �I wonder if there >> was an oversight in that patch... > Huh, that definitely worked in the earlier versions of the patch (as > much as it "worked" at all) Actually, it works as long as the UNION is in a subquery: regression=# EXPLAIN select * from ( (SELECT * FROM ONLY parent ORDER BY i) UNION ALL (SELECT * FROM child ORDER BY i)) ss ORDER BY i LIMIT 10; QUERY PLAN -----------------------------------------------------------------------------------------------Limit (cost=168.76..169.13rows=10 width=4) -> Result (cost=168.76..294.51 rows=3400 width=4) -> Merge Append (cost=168.76..294.51rows=3400 width=4) Sort Key: parent.i -> Sort (cost=168.75..174.75 rows=2400width=4) Sort Key: parent.i -> Seq Scan on parent (cost=0.00..34.00 rows=2400width=4) -> Index Scan using child_i_idx on child (cost=0.00..43.25 rows=1000 width=4) (8 rows) The oversight here is that we don't use appendrel planning for a top-level UNION ALL construct. That didn't use to matter, because you always got the same stupid Append plan either way. Now it seems like we ought to have some more intelligence for the top-level SetOp case. I smell some code refactoring coming up. regards, tom lane
On Oct 21, 2010, at 2:17 PM, Tom Lane wrote: > The oversight here is that we don't use appendrel planning for > a top-level UNION ALL construct. That didn't use to matter, > because you always got the same stupid Append plan either way. > Now it seems like we ought to have some more intelligence for the > top-level SetOp case. I smell some code refactoring coming up. Does it smell like chicken? Best, David
I wrote: > The oversight here is that we don't use appendrel planning for > a top-level UNION ALL construct. That didn't use to matter, > because you always got the same stupid Append plan either way. > Now it seems like we ought to have some more intelligence for the > top-level SetOp case. I smell some code refactoring coming up. I did some hacking on this and came up with the attached patch, which could use a bit more work on the comments but passes regression tests. However, this just solves the issue of being smart about top-level UNION ALL cases. It might be worth looking into using MergeAppend for the sorting required for other types of set operations. That would involve quite a different patch, and I'm not sure if it'd remove the need for this one or not. regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 07301c77fbf4f2550367ea8ff58c685462835223..a1055978869c5072db6e3a172169d934062ae2aa 100644 *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *************** subquery_planner(PlannerGlobal *glob, Qu *** 341,353 **** inline_set_returning_functions(root); /* ! * Check to see if any subqueries in the rangetable can be merged into * this query. */ parse->jointree = (FromExpr *) pull_up_subqueries(root, (Node *) parse->jointree, NULL, NULL); /* * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can * avoid the expense of doing flatten_join_alias_vars(). Also check for * outer joins --- if none, we can skip reduce_outer_joins(). This must be --- 341,362 ---- inline_set_returning_functions(root); /* ! * Check to see if any subqueries in the jointree can be merged into * this query. */ parse->jointree = (FromExpr *) pull_up_subqueries(root, (Node *) parse->jointree, NULL, NULL); /* + * If this is a simple UNION ALL query, flatten it into an appendrel. + * We do this now because it requires applying pull_up_subqueries to the + * leaf queries of the UNION ALL, which weren't touched above because they + * aren't referenced by the jointree until we do this. + */ + if (parse->setOperations) + flatten_union_all(root); + + /* * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can * avoid the expense of doing flatten_join_alias_vars(). Also check for * outer joins --- if none, we can skip reduce_outer_joins(). This must be diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index e337751328bf0b05e3025a2ecc302f754b766855..847e67959a8b81a5cf8ea55ae48d57ec98f32285 100644 *** a/src/backend/optimizer/prep/prepjointree.c --- b/src/backend/optimizer/prep/prepjointree.c *************** *** 7,12 **** --- 7,13 ---- * pull_up_sublinks * inline_set_returning_functions * pull_up_subqueries + * flatten_union_all * do expression preprocessing (including flattening JOIN alias vars) * reduce_outer_joins * *************** pull_up_simple_union_all(PlannerInfo *ro *** 869,879 **** List *rtable; /* - * Append the subquery rtable entries to upper query. - */ - rtoffset = list_length(root->parse->rtable); - - /* * Append child RTEs to parent rtable. * * Upper-level vars in subquery are now one level closer to their parent --- 870,875 ---- *************** pull_up_simple_union_all(PlannerInfo *ro *** 881,886 **** --- 877,883 ---- * because any such vars must refer to stuff above the level of the query * we are pulling into. */ + rtoffset = list_length(root->parse->rtable); rtable = copyObject(subquery->rtable); IncrementVarSublevelsUp_rtable(rtable, -1, 1); root->parse->rtable = list_concat(root->parse->rtable, rtable); *************** pullup_replace_vars_callback(Var *var, *** 1418,1423 **** --- 1415,1518 ---- return newnode; } + + /* + * flatten_union_all + * Try to optimize a top-level UNION ALL structure + * + * If a query's setOperations tree consists entirely of simple UNION ALL + * operations, flatten it into an append relation, which we can process more + * intelligently than the general set-ops case. + * + * In most cases, this can succeed only for a top-level query, because for a + * subquery in FROM, the parent query's invocation of pull_up_subqueries would + * already have flattened the UNION via pull_up_simple_union_all. But there + * are a few cases we can support here but not in that code path, for example + * when the subquery also contains ORDER BY. + */ + void + flatten_union_all(PlannerInfo *root) + { + Query *parse = root->parse; + SetOperationStmt *topop; + Node *leftmostjtnode; + int leftmostRTI; + RangeTblEntry *leftmostRTE; + int childRTI; + RangeTblEntry *childRTE; + RangeTblRef *rtr; + + /* Shouldn't be called unless query has setops */ + topop = (SetOperationStmt *) parse->setOperations; + Assert(topop && IsA(topop, SetOperationStmt)); + + /* Can't optimize away a recursive UNION */ + if (root->hasRecursion) + return; + + /* + * Recursively check the tree of set operations. If not all UNION ALL + * with identical column types, punt. + */ + if (!is_simple_union_all_recurse((Node *) topop, parse, + topop->colTypes)) + return; + + /* + * Locate the leftmost leaf query in the setops tree. The upper query's + * Vars all refer to this RTE (see transformSetOperationStmt). + */ + leftmostjtnode = topop->larg; + while (leftmostjtnode && IsA(leftmostjtnode, SetOperationStmt)) + leftmostjtnode = ((SetOperationStmt *) leftmostjtnode)->larg; + Assert(leftmostjtnode && IsA(leftmostjtnode, RangeTblRef)); + leftmostRTI = ((RangeTblRef *) leftmostjtnode)->rtindex; + leftmostRTE = rt_fetch(leftmostRTI, parse->rtable); + Assert(leftmostRTE->rtekind == RTE_SUBQUERY); + + /* + * Make a copy of the leftmost RTE and add it to the rtable. This copy + * will represent the leftmost leaf query in its capacity as a member + * of the appendrel. The original will represent the appendrel as a + * whole. (We must do things this way because the upper query's Vars + * have to be seen as referring to the whole appendrel.) + */ + childRTE = copyObject(leftmostRTE); + parse->rtable = lappend(parse->rtable, childRTE); + childRTI = list_length(parse->rtable); + + /* Modify the setops tree to reference the child copy */ + ((RangeTblRef *) leftmostjtnode)->rtindex = childRTI; + + /* + * Modify the formerly-leftmost RTE to mark it as an appendrel parent + * (compare pull_up_simple_union_all). + */ + leftmostRTE->inh = true; + + /* + * Form a RangeTblRef for the appendrel, and insert it into FROM. The top + * Query of a setops tree should have had an empty FromClause initially. + */ + rtr = makeNode(RangeTblRef); + rtr->rtindex = leftmostRTI; + Assert(parse->jointree->fromlist == NIL); + parse->jointree->fromlist = list_make1(rtr); + + /* + * Now pretend the query has no setops. We must do this before trying + * to do subquery pullup, because of Assert in pull_up_simple_subquery. + */ + parse->setOperations = NULL; + + /* + * Build AppendRelInfo information and try to pull up the leaf queries. + * XXX need to adjust comments for pull_up_union_leaf_queries + */ + pull_up_union_leaf_queries((Node *) topop, root, leftmostRTI, parse, 0); + } + + /* * reduce_outer_joins * Attempt to reduce outer joins to plain inner joins. diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index f8dd5428ee47a74637d37f5a293d2acff117eae4..e4654d9eeab0ef3908f679652866adb2589bd1ce 100644 *** a/src/include/optimizer/prep.h --- b/src/include/optimizer/prep.h *************** extern void inline_set_returning_functio *** 26,31 **** --- 26,32 ---- extern Node *pull_up_subqueries(PlannerInfo *root, Node *jtnode, JoinExpr *lowest_outer_join, AppendRelInfo *containing_appendrel); + extern void flatten_union_all(PlannerInfo *root); extern void reduce_outer_joins(PlannerInfo *root); extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins); extern Relids get_relids_for_join(PlannerInfo *root, int joinrelid);
I wrote: > I did some hacking on this and came up with the attached patch, which > could use a bit more work on the comments but passes regression tests. > However, this just solves the issue of being smart about top-level > UNION ALL cases. It might be worth looking into using MergeAppend > for the sorting required for other types of set operations. That would > involve quite a different patch, and I'm not sure if it'd remove the > need for this one or not. I spent some more time thinking about this. It seems difficult to make any real further progress without essentially throwing out the current approach to planning set-operation trees and starting over. What we'd want is to generate Paths representing the different ways a set-op could be implemented and then pick the best one. I can see the general shape of how to do that: a set-op should be thought of as generating a constrained join relation in a fashion similar to an OUTER JOIN construct, and then the different Paths are possible implementations for the join rel. We'd need new Path types for hashed or sorted UNION, INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo struct (or extend that struct to cover these cases). It strikes me that maybe top-level DISTINCT could be married into this too, since it's basically the same as a one-input UNION operator. But not sure what to do with DISTINCT ON. Another thought here is that the current definition of the SetOp execution node type could be improved. Instead of taking a single pre-merged input tuple stream, it'd be better if it had two inputs like a JOIN node. Then we could eliminate the "flag" column, which would simplify matters for planning and reduce the amount of data that has to be passed through sorting. SetOp itself would need to borrow some of the capability of MergeAppend so that it could read two sorted streams in parallel and keep them in sync. But this all looks like a pretty substantial amount of work, and given the low level of user demand for improving the performance of set operations, it seems to belong fairly far down the to-do list. So I'm not going to tackle it now. Barring objection, I'll clean up yesterday's patch a bit more and commit it. regards, tom lane
On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> I did some hacking on this and came up with the attached patch, which >> could use a bit more work on the comments but passes regression tests. >> However, this just solves the issue of being smart about top-level >> UNION ALL cases. It might be worth looking into using MergeAppend >> for the sorting required for other types of set operations. That would >> involve quite a different patch, and I'm not sure if it'd remove the >> need for this one or not. > > I spent some more time thinking about this. It seems difficult to make > any real further progress without essentially throwing out the current > approach to planning set-operation trees and starting over. What we'd > want is to generate Paths representing the different ways a set-op could > be implemented and then pick the best one. I can see the general shape > of how to do that: a set-op should be thought of as generating a > constrained join relation in a fashion similar to an OUTER JOIN > construct, and then the different Paths are possible implementations for > the join rel. We'd need new Path types for hashed or sorted UNION, > INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo > struct (or extend that struct to cover these cases). It strikes me that INTERSECT ALL and EXCEPT ALL are pretty much JUST a semi- or anti-join (on the entire set of output columns). But without ALL it's completely different. > It strikes me that maybe top-level DISTINCT could be married into this > too, since it's basically the same as a one-input UNION operator. But > not sure what to do with DISTINCT ON. > > Another thought here is that the current definition of the SetOp > execution node type could be improved. Instead of taking a single > pre-merged input tuple stream, it'd be better if it had two inputs > like a JOIN node. Then we could eliminate the "flag" column, which > would simplify matters for planning and reduce the amount of data > that has to be passed through sorting. SetOp itself would need to > borrow some of the capability of MergeAppend so that it could read > two sorted streams in parallel and keep them in sync. > > But this all looks like a pretty substantial amount of work, and > given the low level of user demand for improving the performance of > set operations, it seems to belong fairly far down the to-do list. > So I'm not going to tackle it now. Barring objection, I'll clean up > yesterday's patch a bit more and commit it. I agree. If we had infinite resources it would be nice to tackle this, but I think we have bigger fish to fry. In particular, I wonder if you've thought any more about the generalized inner-indexscan machinery, or taken a look at any of the issues around KNNGIST. I'd like to see our limited supply of planner-fu invested in those areas, or perhaps in making inner join removal work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But this all looks like a pretty substantial amount of work, and >> given the low level of user demand for improving the performance of >> set operations, it seems to belong fairly far down the to-do list. >> So I'm not going to tackle it now. �Barring objection, I'll clean up >> yesterday's patch a bit more and commit it. > I agree. If we had infinite resources it would be nice to tackle > this, but I think we have bigger fish to fry. In particular, I wonder > if you've thought any more about the generalized inner-indexscan > machinery, or taken a look at any of the issues around KNNGIST. I'd > like to see our limited supply of planner-fu invested in those areas, > or perhaps in making inner join removal work. The two top things on my to-do list for 9.1 are the generalized inner-indexscan stuff and automatic replans for parameterized queries. I had been hoping to finish one or the other before the next commitfest, though time is draining away rapidly. I'll try to look at KNNGIST during the fest. regards, tom lane
On Mon, Nov 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> But this all looks like a pretty substantial amount of work, and >>> given the low level of user demand for improving the performance of >>> set operations, it seems to belong fairly far down the to-do list. >>> So I'm not going to tackle it now. Barring objection, I'll clean up >>> yesterday's patch a bit more and commit it. > >> I agree. If we had infinite resources it would be nice to tackle >> this, but I think we have bigger fish to fry. In particular, I wonder >> if you've thought any more about the generalized inner-indexscan >> machinery, or taken a look at any of the issues around KNNGIST. I'd >> like to see our limited supply of planner-fu invested in those areas, >> or perhaps in making inner join removal work. > > The two top things on my to-do list for 9.1 are the generalized > inner-indexscan stuff and automatic replans for parameterized queries. > I had been hoping to finish one or the other before the next commitfest, > though time is draining away rapidly. Cool beans. > I'll try to look at KNNGIST during the fest. Thanks. I have posted on it a few times; you may or may not find it helpful to review those before diving in yourself... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane <tgl@sss.pgh.pa.us> writes: > But this all looks like a pretty substantial amount of work, and > given the low level of user demand for improving the performance of > set operations, it seems to belong fairly far down the to-do list. Whatever you say, that's your own todo list after all. I just wanted to chime in and say that it could well be a chicken-and-eggs problem. Other than that, the only general way I know of to optimise a WHERE clause containing OR branches is producing the UNION ALL equivalent. Now maybe I'm all wet, but your explaining sounded like a very nice first step towards such an automatic optimisation. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
BTW, one other point for the archives, before I forget it: it'd probably also be wise to change the parser's output representation of set operations. I think it should create a separate RTE entry corresponding to each SetOperation node, similar to what we do for explicit JOIN nodes. Then there is a clean representation for Vars referencing the setop outputs: they can use the RT index of the SetOp RTE entry. The current implementation is that all Vars in the upper query use the RTI of the leftmost leaf query of the setop structure, which is really bogus. First, that query doesn't necessarily output the same datatypes as the upper setops do, so the datatypes shown by the Vars don't always match the RTE they claim to refer to. (This is one reason why all of the setop optimizations we do have tend to punt as soon as any datatype changes occur.) Second, this design provides no good way to refer to outputs of intermediate setops. I think we will *have* to fix that if we want to make the planner's handling of these cases much smarter than it is now. regards, tom lane