Thread: Using indices for UNION.
Hello, This patch might be too complicated (and seems somewhat ad hoc) for the gain, but more index usage for this kind of operation should be worth doing. Currently, PostgreSQL ignores from the very first the availablity of indexes for UNION. Sorting and SeqScan is choosed as follows, * EX.1 > uniontest=# create table c11 (a int, b int, c int, d text); > uniontest=# create unique index cu11_pkey_idx on c11 (a, b) > uniontest=# create table c12 (like cu11 including all); > uniontest=# insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' > from generate_series(000000, 099999) a); > uniontest=# insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' > from generate_series(100000, 199999) a); > uniontest=# explain (analyze on, costs off) > select a, b from cu11 union select a, b from cu12 order by a, b limit 100; > QUERY PLAN > --------------------------------------------------------------------- > Limit (actual time=287.611..287.673 rows=100 loops=1) > -> Unique (actual time=287.610..287.652 rows=100 loops=1) > -> Sort (actual time=287.609..287.621 rows=100 loops=1) > Sort Key: cu11.a, cu11.b > Sort Method: external sort Disk: 3520kB > -> Append (actual time=0.020..73.859 rows=200000 loops=1) > -> Seq Scan on cu11 (actual time=0.019..28.780 rows=100000 loops=1) > -> Seq Scan on cu12 (actual time=0.006..24.550 rows=100000 loops=1) > Total runtime: 288.596 ms Actually, index paths are sometimes worth considering on UNION planning especially with LIMIT. For example, * EX.2 > uniontest=# explain (analyze on, costs off) > select a, b from cu11 union select a, b from cu12 > order by a desc, b desc limit 100; > QUERY PLAN > ------------------------------------------------------------------ > Limit (actual time=0.041..0.390 rows=100 loops=1) > -> Unique (actual time=0.040..0.355 rows=100 loops=1) > -> Merge Append (actual time=0.040..0.290 rows=100 loops=1) > Sort Key: cu11.a, cu11.b > -> Limit (actual time=0.025..0.025 rows=1 loops=1) > -> Unique (actual time=0.025..0.025 rows=1 loops=1) > -> Index Only Scan Backward on cu11 > (actual time=0.022..0.022 rows=1 loops=1) > Heap Fetches: 1 > -> Limit (actual time=0.012..0.209 rows=100 loops=1) > -> Unique (actual time=0.011..0.188 rows=100 loops=1) > -> Index Only Scan Backward .. on cu12 > (actual time=0.010..0.115 rows=100 loops=1) > Heap Fetches: 100 > Total runtime: 1.191 ms This is archieved by this patch. Rough explanaiton of what this patch does is following, - Consider whether ORDER BY or grouping of setops can be pushed down onto leaf subqueries. (recurse_set_operations) - Merging two children of a setop counting the sort orders of them. Use MergeAppend if they are in the same ordering. (generate_union_plan,recurse_union_children) - Grouping is determined consulting sorting order. This will allow removing redundant sorting. (generate_setop_grouplist) The effects of these items are shown in the Ex.2. What do you think about this? ======= Nevertheless, only this patch does not effective so far on the query like following, >uniontest=# explain (analyze on, costs off) > select * from cu11 union select * from cu12 > order by a, b limit 100; > QUERY PLAN > ----------------------------------------------------------------------- > Limit (actual time=256.263..256.346 rows=100 loops=1) > -> Unique (actual time=256.262..256.332 rows=100 loops=1) > -> Sort (actual time=256.261..256.283 rows=100 loops=1) > Sort Key: cu11.a, cu11.b, cu11.c, cu11.d > Sort Method: external sort Disk: 5280kB > -> Append (actual time=0.028..61.161 rows=200000 loops=1) > -> Seq Scan on cu11 (actual time=0.027..21.067 rows=100000 loops=1) > -> Seq Scan on cu12 (actual time=0.012..18.428 rows=100000 loops=1) > Total runtime: 257.778 ms The indexes made here is UNIQUE one so they should be used for ordering in this query. This will be overcome by another patch (will separately proposed). > uniontest=# explain (analyze on, costs off) > select * from cu11 union select * from cu12 > order by a, b limit 100; > QUERY PLAN > --------------------------------------------------------------------------- > Limit (actual time=0.102..0.351 rows=100 loops=1) > -> Unique (actual time=0.100..0.323 rows=100 loops=1) > -> Merge Append (actual time=0.097..0.222 rows=100 loops=1) > Sort Key: cu11.a, cu11.b, cu11.c, cu11.d > -> Limit (actual time=0.051..0.135 rows=100 loops=1) > -> Index Scan .. on cu11 (actual time=0.051..0.109 rows=100 loops=1) > -> Limit (actual time=0.044..0.044 rows=1 loops=1) > -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1) > Total runtime: 1.244 ms regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index d8aa35d..86abdf6 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1063,15 +1063,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) List *set_sortclauses; /* - * If there's a top-level ORDER BY, assume we have to fetch all the - * tuples. This might be too simplistic given all the hackery below - * to possibly avoid the sort; but the odds of accurate estimates here - * are pretty low anyway. - */ - if (parse->sortClause) - tuple_fraction = 0.0; - - /* * Construct the plan for set operations. The result will not need * any work except perhapsa top-level sort and/or LIMIT. Note that * any special work for recursive unions is the responsibility of diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b78d727..efb50dc 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -778,9 +778,26 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) Assert(splan->plan.qual ==NIL); foreach(l, splan->appendplans) { - lfirst(l) = set_plan_refs(root, - (Plan *) lfirst(l), - rtoffset); + Append *newp = + set_plan_refs(root, + (Plan *) lfirst(l), + rtoffset); + /* + * UNION on inherited tables may create directly nested + * Appends in plan tree. This structure can be flatten by + * taking grandchildren into parent. + */ + if (IsA(newp, Append) && + list_length(newp->appendplans) > 0) + { + ListCell *plc = list_head(newp->appendplans); + lfirst(l) = lfirst(plc); + for_each_cell(plc, lnext(plc)) + l = lappend_cell(splan->appendplans, + l, lfirst(plc)); + } + else + lfirst(l) = newp; } } break; diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index e249628..b814a72 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -29,6 +29,7 @@#include "postgres.h"#include <limits.h> +#include <math.h>#include "access/heapam.h"#include "access/htup_details.h" @@ -60,6 +61,7 @@ typedef structstatic Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups); @@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,static List *recurse_union_children(Node*setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist); + List *refnames_tlist, + List **child_sortclause);static Plan *make_union_unique(SetOperationStmt *op, Plan *plan, PlannerInfo *root, double tuple_fraction, List **sortClauses); @@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, bool flag, List *input_plans, List *refnames_tlist); -static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); +static List *generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses);static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry*rte, Index rti);static void make_inh_translation_list(Relation oldrelation, @@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, Assert(parse->distinctClause == NIL); /* + * If there's a top-level ORDER BY, assume we have to fetch all the tuples + * except for UNION. This might be too simplistic given all the hackery + * below to possibly avoid the sort; but the odds of accurate estimates + * here are pretty low anyway. + */ + if (parse->sortClause && topop->op != SETOP_UNION) + tuple_fraction = 0.0; + + /* * We'll need to build RelOptInfos for each of the leaf subqueries, which * are RTE_SUBQUERY rangetable entriesin this Query. Prepare the index * arrays for that. @@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, */ return recurse_set_operations((Node*) topop, root, tuple_fraction, topop->colTypes, topop->colCollations, + topop->groupClauses, true, -1, leftmostQuery->targetList, sortClauses, NULL);}/* + * save_plannerglobal + * + * save planner global to allow multiple calls of subquery_planner at the same + * global status. This is done apartly from copyObject so as to do medium + * shallow copying. + */ +static PlannerGlobal * +save_plannerglobal(const PlannerGlobal *in) +{ + PlannerGlobal *save = makeNode(PlannerGlobal); + + save->boundParams = in->boundParams; + save->subplans = list_copy(in->subplans); + save->subroots = list_copy(in->subroots); + save->rewindPlanIDs = bms_copy(in->rewindPlanIDs); + save->finalrtable = list_copy(in->finalrtable); + save->finalrowmarks = list_copy(in->finalrowmarks); + save->resultRelations = list_copy(in->resultRelations); + save->relationOids = list_copy(in->relationOids); + save->invalItems = list_copy(in->invalItems); + save->nParamExec = in->nParamExec; + save->lastPHId = in->lastPHId; + save->lastRowMarkId = in->lastRowMarkId; + save->transientPlan = in->transientPlan; + + return save; +} + +/* * recurse_set_operations * Recursively handle one step in a tree of set operations * * tuple_fraction: fractionof tuples we expect to retrieve from node * colTypes: OID list of set-op's result column datatypes * colCollations:OID list of set-op's result column collations + * groupClauses: parent setop's grouping clause. * junkOK: if true, child resjunk columns may be left in the result * flag:if >= 0, add a resjunk output column indicating value of flag * refnames_tlist: targetlist to take column names from @@ -215,6 +259,7 @@ static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups) @@ -228,9 +273,12 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, PlannerInfo *subroot; Plan *subplan, *plan; + PlannerGlobal *glob_org; Assert(subquery != NULL); + *sortClauses = NIL; + /* * We need to build a RelOptInfo for each leaf subquery. This isn't * used for anything here,but it carries the subroot data structures @@ -243,12 +291,103 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, /* * Generate plan for primitivesubquery + * + * subquery_planner modifies planner globals. Save current planner + * global in order to allow next calling subquery_planner. */ + glob_org = save_plannerglobal(root->glob); subplan = subquery_planner(root->glob, subquery, root, false, tuple_fraction, &subroot); + /* + * Consider whether ORDER BY or groupings of the parent set operation + * can be pushed down onto this subquery. + */ + if(tuple_fraction >= 1 && + !subquery->limitOffset && !subquery->limitCount && + !subquery->distinctClause && + (!subquery->sortClause || + equal(subquery->sortClause, root->parse->sortClause))) + { + ListCell *s, *d; + bool try_pushdown = true; + + /* + * Index scan is expected to be applied in the subquery by this + * pushdown. So it is useless when target list contains non-Var + * member or types of exprs do not match because index scan won't + * be available. The reason why this check is here is to avoid + * unnecessary copyObject for the whole subquery which would be + * rather heavier. + */ + forboth(s, root->parse->targetList, d, subquery->targetList) + { + TargetEntry *ste = (TargetEntry*) lfirst(s); + TargetEntry *dte = (TargetEntry*) lfirst(d); + Node *sexpr = (Node*)ste->expr; + Node *dexpr = (Node*)dte->expr; + if (!IsA(sexpr, Var) || !IsA(dexpr, Var) || + exprType(sexpr) != exprType(dexpr)) + { + try_pushdown = false; + break; + } + } + + if (try_pushdown) + { + Query *pdquery; /* 'pd' comes from 'pushed down'. */ + PlannerInfo *pdsubroot; + Plan *pdplan; + + pdquery = copyObject(subquery); + + if (!pdquery->sortClause) + pdquery->sortClause = root->parse->sortClause; + + Assert(!pdquery->distinctClause); + pdquery->distinctClause = + generate_setop_grouplist(groupClauses, + pdquery->targetList, + pdquery->sortClause); + + if (tuple_fraction >= 1) + pdquery->limitCount = + (Node*)makeConst(INT8OID, -1, InvalidOid, sizeof(int64), + Int64GetDatum(tuple_fraction), + false, FLOAT8PASSBYVAL); + + pdplan = subquery_planner(glob_org, pdquery, + root, + false, tuple_fraction, + &pdsubroot); + + if (pdplan->total_cost < subplan->total_cost) + { + subplan = pdplan; + subroot = pdsubroot; + /* + * Glob cannot be moved because this is referred to from + * many places by its address. So set the address of the + * original glob to subroot, then copy new values there. + */ + subroot->glob = root->glob; + *root->glob = *glob_org; + + /* + * This plan is sorted on sortClause if any, else sorted + * on distinctClause. + */ + if (pdquery->sortClause) + *sortClauses = pdquery->sortClause; + else + *sortClauses = pdquery->distinctClause; + } + } + } + /* Save subroot and subplan in RelOptInfo for setrefs.c */ rel->subplan = subplan; rel->subroot =subroot; @@ -290,12 +429,6 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, rtr->rtindex, subplan); - /* - * We don't bother to determine the subquery's output ordering since - * it won't be reflected in the set-op result anyhow. - */ - *sortClauses = NIL; - return plan; } else if (IsA(setOp, SetOperationStmt)) @@ -379,12 +512,14 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root, */ lplan = recurse_set_operations(setOp->larg,root, tuple_fraction, setOp->colTypes, setOp->colCollations, + setOp->groupClauses, false, -1, refnames_tlist, sortClauses, NULL); /* The right plan will want to look at the left one ... */ root->non_recursive_plan= lplan; rplan = recurse_set_operations(setOp->rarg, root, tuple_fraction, setOp->colTypes, setOp->colCollations, + setOp->groupClauses, false, -1, refnames_tlist, sortClauses, NULL); root->non_recursive_plan = NULL; @@ -409,7 +544,8 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root, double dNumGroups; /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(setOp, tlist); + groupList = + generate_setop_grouplist(setOp->groupClauses, tlist, NULL); /* We only support hashing here */ if (!grouping_is_hashable(groupList)) @@ -452,20 +588,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root, List *planlist; List *tlist; Plan *plan; - - /* - * If plain UNION, tell children to fetch all tuples. - * - * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to - * each arm of the UNION ALL. One could make a case for reducing the - * tuple fraction for later arms (discounting by the expected size of the - * earlier arms' results) but it seems not worth the trouble. The normal - * case where tuple_fraction isn't already zero is a LIMIT at top level, - * and passing it down as-is is usually enough to get the desired result - * of preferring fast-start plans. - */ - if (!op->all) - tuple_fraction = 0.0; + List *lsort, *rsort; /* * If any of my children are identical UNION nodes (same op, all-flag, and @@ -475,34 +598,99 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root, */ planlist = list_concat(recurse_union_children(op->larg,root, tuple_fraction, - op, refnames_tlist), + op, refnames_tlist, + &lsort), recurse_union_children(op->rarg, root, tuple_fraction, - op, refnames_tlist)); + op, refnames_tlist, + &rsort)); /* - * Generate tlist for Append plan node. + * Generate tlist for Append/MergeAppend plan node. * - * The tlist for an Append plan isn't important as far as the Append is - * concerned, but we must make it look real anyway for the benefit of the - * next plan level up. + * The tlist for an Append/MergeAppend plan isn't important as far as the + * they are concerned, but we must make it look real anyway for the + * benefit of the next plan level up. */ tlist = generate_append_tlist(op->colTypes, op->colCollations, false, planlist, refnames_tlist); - /* - * Append the child results together. - */ - plan = (Plan *) make_append(planlist, tlist); - - /* - * For UNION ALL, we just need the Append plan. For UNION, need to add - * node(s) to remove duplicates. - */ - if (op->all) - *sortClauses = NIL; /* result of UNION ALL is always unsorted */ + if (lsort != NIL && equal(lsort, rsort)) + { + /* + * Generate MergeAppend plan if both children are sorted on the same + * sort clause or groupClauses. + */ + ListCell *lc, *slc; + int i = 0; + double total_size; + Plan *p; + List *distinctClause; + + MergeAppend *node = makeNode(MergeAppend); + node->plan.targetlist = tlist; + node->plan.qual = NIL; + node->plan.lefttree = NULL; + node->plan.righttree = NULL; + node->mergeplans = planlist; + node->numCols = list_length(root->parse->targetList); + node->sortColIdx = (AttrNumber*)palloc(sizeof(AttrNumber) * node->numCols); + node->sortOperators = (Oid*)palloc(sizeof(Oid) * node->numCols); + node->collations = (Oid*)palloc(sizeof(Oid) * node->numCols); + node->nullsFirst = (bool*)palloc(sizeof(bool) * node->numCols); + + distinctClause = + generate_setop_grouplist(op->groupClauses, + node->plan.targetlist, + root->parse->sortClause); + forboth (slc, distinctClause, lc, node->plan.targetlist) + { + TargetEntry *te = (TargetEntry*) lfirst(lc); + SortGroupClause *sgc = (SortGroupClause *) lfirst(slc); + + Assert(sgc->tleSortGroupRef == te->ressortgroupref); + node->sortColIdx[i] = i + 1; + node->sortOperators[i] = sgc->sortop; + node->collations[i] = exprCollation((Node*)te->expr); + node->nullsFirst[i] = sgc->nulls_first; + i++; + } + lc = list_head(node->mergeplans); + p = (Plan*) lfirst(lc); + node->plan.startup_cost = p->startup_cost; + node->plan.total_cost = p->total_cost; + node->plan.plan_rows = p->plan_rows; + total_size = p->plan_rows * p->plan_width; + for_each_cell(lc, lnext(lc)) + { + p = (Plan*) lfirst(lc); + node->plan.total_cost += p->total_cost; + node->plan.plan_rows += p->plan_rows; + total_size += p->plan_rows * p->plan_width; + } + node->plan.plan_width = rint(total_size / node->plan.plan_rows); + *sortClauses = root->parse->sortClause; + plan = (Plan*)node; + if (!op->all) + plan = make_union_unique(op, plan, root, tuple_fraction, + &distinctClause); + } else - plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses); + { + /* + * Append the child results together. + */ + plan = (Plan *) make_append(planlist, tlist); + + /* + * For UNION ALL, we just need the Append plan. For UNION, need to + * add node(s) to remove duplicates. + */ + *sortClauses = NIL; + + if (!op->all) + plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses); + } /* * Estimate number of groups if caller wants it. For now we just assume @@ -544,12 +732,14 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, lplan = recurse_set_operations(op->larg,root, 0.0 /* all tuples needed */ , op->colTypes, op->colCollations, + op->groupClauses, false, 0, refnames_tlist, &child_sortclauses, &dLeftGroups); rplan = recurse_set_operations(op->rarg,root, 0.0 /* all tuples needed */ , op->colTypes, op->colCollations, + op->groupClauses, false, 1, refnames_tlist, &child_sortclauses, &dRightGroups); @@ -589,7 +779,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, plan = (Plan *) make_append(planlist,tlist); /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(op, tlist); + groupList = generate_setop_grouplist(op->groupClauses, tlist, NULL); /* punt if nothing to group on (can this happen?)*/ if (groupList == NIL) @@ -678,9 +868,13 @@ static List *recurse_union_children(Node *setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist) + List *refnames_tlist, + List **child_sortclauses){ - List *child_sortclauses; + List *lplan, *rplan; + List *lsort, *rsort; + + *child_sortclauses = NIL; if (IsA(setOp, SetOperationStmt)) { @@ -691,14 +885,23 @@ recurse_union_children(Node *setOp, PlannerInfo *root, equal(op->colTypes, top_union->colTypes)) { /* Same UNION, so fold children into parent's subplan list */ - return list_concat(recurse_union_children(op->larg, root, - tuple_fraction, - top_union, - refnames_tlist), - recurse_union_children(op->rarg, root, - tuple_fraction, - top_union, - refnames_tlist)); + lplan = recurse_union_children(op->larg, root, + tuple_fraction, + top_union, + refnames_tlist, + &lsort); + rplan = recurse_union_children(op->rarg, root, + tuple_fraction, + top_union, + refnames_tlist, + &rsort); + /* + * Propagate whether all the descendents are sorted with same + * sortClause. + */ + if (lsort != NIL && equal(lsort, rsort)) + *child_sortclauses = lsort; + return list_concat(lplan, rplan); } } @@ -715,13 +918,16 @@ recurse_union_children(Node *setOp, PlannerInfo *root, tuple_fraction, top_union->colTypes, top_union->colCollations, + top_union->groupClauses, false,-1, refnames_tlist, - &child_sortclauses, NULL)); + child_sortclauses, NULL));}/* * Add nodes to the given plan tree to unique-ifythe result of a UNION. + * + * If the sortClause is given, we assume the plan is already sorted on it. */static Plan *make_union_unique(SetOperationStmt*op, Plan *plan, @@ -731,9 +937,11 @@ make_union_unique(SetOperationStmt *op, Plan *plan, List *groupList; double dNumGroups; long numGroups; + bool sort_needed = true; /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(op, plan->targetlist); + groupList = generate_setop_grouplist(op->groupClauses, + plan->targetlist, *sortClauses); /* punt if nothing to group on (can this happen?)*/ if (groupList == NIL) @@ -742,6 +950,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan, return plan; } + if (*sortClauses && equal(*sortClauses, groupList)) + sort_needed = false; + /* * XXX for the moment, take the number of distinct groups as equal to the * total input size, ie, theworst case. This is too conservative, but we @@ -756,7 +967,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan, numGroups = (long) Min(dNumGroups, (double) LONG_MAX); /* Decide whether to hash or sort */ - if (choose_hashed_setop(root, groupList, plan, + if (sort_needed && + choose_hashed_setop(root, groupList, plan, dNumGroups, dNumGroups, tuple_fraction, "UNION")) { @@ -778,7 +990,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan, else { /* Sort and Unique */ - plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan); + if (sort_needed) + plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan); + plan = (Plan *) make_unique(plan, groupList); plan->plan_rows = dNumGroups; /* We know thesort order of the result */ @@ -1145,23 +1359,34 @@ generate_append_tlist(List *colTypes, List *colCollations, * the parser output representation doesn'tinclude a tlist for each * setop. So what we need to do here is copy that list and install * proper sortgrouprefsinto it and into the targetlist. + * + * sortClause is consulted if provided to avoid re-sorting with different + * orderings on the same keys later. */static List * -generate_setop_grouplist(SetOperationStmt *op, List *targetlist) +generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses){ - List *grouplist = (List *) copyObject(op->groupClauses); + List *grouplist = (List *) copyObject(groupClauses); ListCell *lg; + ListCell *ls = NULL; ListCell *lt; Index refno = 1; lg = list_head(grouplist); + + if (sortClauses) + ls = list_head(sortClauses); + foreach(lt, targetlist) { TargetEntry *tle = (TargetEntry *) lfirst(lt); - SortGroupClause *sgc; + SortGroupClause *sgc, *sgc_sort = NULL; - /* tlist shouldn't have any sortgrouprefs yet */ - Assert(tle->ressortgroupref == 0); + /* + * tlist shouldn't have any sortgrouprefs yet, or should be unchanged + */ + Assert(tle->ressortgroupref == 0 || tle->ressortgroupref == refno); if (tle->resjunk) continue; /* ignore resjunk columns */ @@ -1174,6 +1399,45 @@ generate_setop_grouplist(SetOperationStmt *op, List *targetlist) /* we could use assignSortGroupRefhere, but seems a bit silly */ sgc->tleSortGroupRef = tle->ressortgroupref = refno++; + + if (ls) + { + /* + * If sortClauses is provided, try to adjust the sorting order to + * get the chance for omitting sorting for grouping later. + */ + sgc_sort = (SortGroupClause *) lfirst(ls); + ls = lnext(ls); + if (sgc->sortop != sgc_sort->sortop) + { + Oid reverse = InvalidOid; + Oid opfamily, opcintype; + int16 strategy; + + if (get_ordering_op_properties(sgc->sortop, + &opfamily, &opcintype, &strategy)) + { + switch (strategy) + { + case BTLessStrategyNumber: + strategy = BTGreaterStrategyNumber; break; + case BTGreaterStrategyNumber: + strategy = BTLessStrategyNumber; break; + } + + reverse = get_opfamily_member(opfamily, + opcintype, + opcintype, + strategy); + if (sgc_sort->sortop == reverse) + { + sgc->sortop = reverse; + sgc->nulls_first = sgc_sort->nulls_first; + } + } + } + } + } Assert(lg == NULL); return grouplist;
On 10/31/13, 6:42 AM, Kyotaro HORIGUCHI wrote: > Hello, This patch might be too complicated (and seems somewhat ad > hoc) for the gain, but more index usage for this kind of > operation should be worth doing. There is a new compiler warning: setrefs.c: In function ‘set_plan_refs’: setrefs.c:782:7: warning: initialization from incompatible pointer type [enabled by default]
Thank you for pointing out. I missed the warning. > There is a new compiler warning: > > setrefs.c: In function ‘set_plan_refs’: > setrefs.c:782:7: warning: initialization from incompatible pointer type > [enabled by default] Added explicit cast there and rebased to current master. Checked no new warning by this patch. make check succeeded at both $(src) and $(src)/src/test. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index d8aa35d..86abdf6 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1063,15 +1063,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) List *set_sortclauses; /* - * If there's a top-level ORDER BY, assume we have to fetch all the - * tuples. This might be too simplistic given all the hackery below - * to possibly avoid the sort; but the odds of accurate estimates here - * are pretty low anyway. - */ - if (parse->sortClause) - tuple_fraction = 0.0; - - /* * Construct the plan for set operations. The result will not need * any work except perhapsa top-level sort and/or LIMIT. Note that * any special work for recursive unions is the responsibility of diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b78d727..c6abe18 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -778,9 +778,26 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) Assert(splan->plan.qual ==NIL); foreach(l, splan->appendplans) { - lfirst(l) = set_plan_refs(root, - (Plan *) lfirst(l), - rtoffset); + Append *newp = + (Append *)set_plan_refs(root, + (Plan *) lfirst(l), + rtoffset); + /* + * UNION on inherited tables may create directly nested + * Appends in plan tree. This structure can be flatten by + * taking grandchildren into parent. + */ + if (IsA(newp, Append) && + list_length(newp->appendplans) > 0) + { + ListCell *plc = list_head(newp->appendplans); + lfirst(l) = lfirst(plc); + for_each_cell(plc, lnext(plc)) + l = lappend_cell(splan->appendplans, + l, lfirst(plc)); + } + else + lfirst(l) = newp; } } break; diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index e249628..e8a78a7 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -29,6 +29,7 @@#include "postgres.h"#include <limits.h> +#include <math.h>#include "access/heapam.h"#include "access/htup_details.h" @@ -60,6 +61,7 @@ typedef structstatic Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups); @@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,static List *recurse_union_children(Node*setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist); + List *refnames_tlist, + List **child_sortclause);static Plan *make_union_unique(SetOperationStmt *op, Plan *plan, PlannerInfo *root, double tuple_fraction, List **sortClauses); @@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, bool flag, List *input_plans, List *refnames_tlist); -static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); +static List *generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses);static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry*rte, Index rti);static void make_inh_translation_list(Relation oldrelation, @@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, Assert(parse->distinctClause == NIL); /* + * If there's a top-level ORDER BY, assume we have to fetch all the tuples + * except for UNION. This might be too simplistic given all the hackery + * below to possibly avoid the sort; but the odds of accurate estimates + * here are pretty low anyway. + */ + if (parse->sortClause && topop->op != SETOP_UNION) + tuple_fraction = 0.0; + + /* * We'll need to build RelOptInfos for each of the leaf subqueries, which * are RTE_SUBQUERY rangetable entriesin this Query. Prepare the index * arrays for that. @@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, */ return recurse_set_operations((Node*) topop, root, tuple_fraction, topop->colTypes, topop->colCollations, + topop->groupClauses, true, -1, leftmostQuery->targetList, sortClauses, NULL);}/* + * save_plannerglobal + * + * save planner global to allow multiple calls of subquery_planner at the same + * global status. This is done apartly from copyObject so as to do medium + * shallow copying. + */ +static PlannerGlobal * +save_plannerglobal(const PlannerGlobal *in) +{ + PlannerGlobal *save = makeNode(PlannerGlobal); + + save->boundParams = in->boundParams; + save->subplans = list_copy(in->subplans); + save->subroots = list_copy(in->subroots); + save->rewindPlanIDs = bms_copy(in->rewindPlanIDs); + save->finalrtable = list_copy(in->finalrtable); + save->finalrowmarks = list_copy(in->finalrowmarks); + save->resultRelations = list_copy(in->resultRelations); + save->relationOids = list_copy(in->relationOids); + save->invalItems = list_copy(in->invalItems); + save->nParamExec = in->nParamExec; + save->lastPHId = in->lastPHId; + save->lastRowMarkId = in->lastRowMarkId; + save->transientPlan = in->transientPlan; + + return save; +} + +/* * recurse_set_operations * Recursively handle one step in a tree of set operations * * tuple_fraction: fractionof tuples we expect to retrieve from node * colTypes: OID list of set-op's result column datatypes * colCollations:OID list of set-op's result column collations + * groupClauses: parent setop's grouping clause. * junkOK: if true, child resjunk columns may be left in the result * flag:if >= 0, add a resjunk output column indicating value of flag * refnames_tlist: targetlist to take column names from @@ -215,6 +259,7 @@ static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups) @@ -223,14 +268,21 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, { RangeTblRef *rtr = (RangeTblRef*) setOp; RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex]; - Query *subquery = rte->subquery; + Query *subquery = rte->subquery, + *pdquery; RelOptInfo *rel; - PlannerInfo *subroot; + PlannerInfo *subroot, + *pdsubroot; /* 'pd' comes from 'Pushed Down' */ + PlannerGlobal *pdglob; Plan *subplan, - *plan; + *plan, + *pdplan; + bool try_pushdown = false; Assert(subquery != NULL); + *sortClauses = NIL; + /* * We need to build a RelOptInfo for each leaf subquery. This isn't * used for anything here,but it carries the subroot data structures @@ -243,12 +295,125 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, /* * Generate plan for primitivesubquery + * + * Consider whether ORDER BY or groupings of the parent set operation + * can be pushed down onto this subquery. */ + + /* + * Consider whether ORDER BY or groupings of the parent set operation + * can be pushed down onto this subquery. + */ + if(root->parse->sortClause && !subquery->sortClause && + tuple_fraction >= 1) + { + ListCell *lcs, *lcd; + int refno = 1; + + /* + * Check if the sort cluase of the root can be applied on this + * subquery. All non-junk tlist items shoud be simple Var and + * their types match and ressortgroupref is ordered in their + * appearance order. + */ + try_pushdown = true; + forboth(lcs, root->parse->targetList, lcd, subquery->targetList) + { + TargetEntry *ste = (TargetEntry*) lfirst(lcs); + TargetEntry *dte = (TargetEntry*) lfirst(lcd); + Node *sexpr = (Node*)ste->expr; + Node *dexpr = (Node*)dte->expr; + + if (ste->resjunk && dte->resjunk) continue; + + if (ste->resjunk != dte->resjunk || + !IsA(sexpr, Var) || !IsA(dexpr, Var) || + exprType(sexpr) != exprType(dexpr) || + (root->parse->sortClause && + ste->ressortgroupref != refno++)) + { + try_pushdown = false; + break; + } + } + } + + if (try_pushdown) + { + pdquery = copyObject(subquery); + + if (!pdquery->sortClause) + { + ListCell *lcs, *lcd; + + pdquery->sortClause = root->parse->sortClause; + + forboth(lcs, root->parse->targetList, + lcd, pdquery->targetList) + { + TargetEntry *ste = (TargetEntry*) lfirst(lcs); + TargetEntry *dte = (TargetEntry*) lfirst(lcd); + dte->ressortgroupref = ste->ressortgroupref; + } + } + + /* + * Pushing down groupings. Set operations doesn't accept + * distinct clauses. + */ + if (!pdquery->setOperations && + !pdquery->distinctClause && groupClauses) + + pdquery->distinctClause = + generate_setop_grouplist(groupClauses, + pdquery->targetList, + pdquery->sortClause); + + if (tuple_fraction >= 1 && + !pdquery->limitCount && !pdquery->limitOffset) + pdquery->limitCount = + (Node*)makeConst(INT8OID, -1, InvalidOid, sizeof(int64), + Int64GetDatum(tuple_fraction), + false, FLOAT8PASSBYVAL); + + pdglob = save_plannerglobal(root->glob); + } + subplan = subquery_planner(root->glob, subquery, root, false, tuple_fraction, &subroot); + if (try_pushdown) + { + pdplan = subquery_planner(pdglob, pdquery, + root, + false, tuple_fraction, + &pdsubroot); + + if (pdplan->total_cost < subplan->total_cost) + { + subplan = pdplan; + subroot = pdsubroot; + /* + * Glob cannot be moved because this is referred to from + * many places by its address. So set the address of the + * original glob to subroot, then copy new values there. + */ + subroot->glob = root->glob; + *root->glob = *pdglob; + } + } + + /* + * This plan is sorted on sortClause if any, else sorted + * on distinctClause. + */ + if (subquery->sortClause) + *sortClauses = subquery->sortClause; + else + *sortClauses = subquery->distinctClause; + /* Save subroot and subplan in RelOptInfo for setrefs.c */ rel->subplan = subplan; rel->subroot =subroot; @@ -290,12 +455,6 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, rtr->rtindex, subplan); - /* - * We don't bother to determine the subquery's output ordering since - * it won't be reflected in the set-op result anyhow. - */ - *sortClauses = NIL; - return plan; } else if (IsA(setOp, SetOperationStmt)) @@ -379,12 +538,14 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root, */ lplan = recurse_set_operations(setOp->larg,root, tuple_fraction, setOp->colTypes, setOp->colCollations, + setOp->groupClauses, false, -1, refnames_tlist, sortClauses, NULL); /* The right plan will want to look at the left one ... */ root->non_recursive_plan= lplan; rplan = recurse_set_operations(setOp->rarg, root, tuple_fraction, setOp->colTypes, setOp->colCollations, + setOp->groupClauses, false, -1, refnames_tlist, sortClauses, NULL); root->non_recursive_plan = NULL; @@ -409,7 +570,8 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root, double dNumGroups; /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(setOp, tlist); + groupList = + generate_setop_grouplist(setOp->groupClauses, tlist, NULL); /* We only support hashing here */ if (!grouping_is_hashable(groupList)) @@ -452,20 +614,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root, List *planlist; List *tlist; Plan *plan; - - /* - * If plain UNION, tell children to fetch all tuples. - * - * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to - * each arm of the UNION ALL. One could make a case for reducing the - * tuple fraction for later arms (discounting by the expected size of the - * earlier arms' results) but it seems not worth the trouble. The normal - * case where tuple_fraction isn't already zero is a LIMIT at top level, - * and passing it down as-is is usually enough to get the desired result - * of preferring fast-start plans. - */ - if (!op->all) - tuple_fraction = 0.0; + List *lsort, *rsort; /* * If any of my children are identical UNION nodes (same op, all-flag, and @@ -475,34 +624,99 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root, */ planlist = list_concat(recurse_union_children(op->larg,root, tuple_fraction, - op, refnames_tlist), + op, refnames_tlist, + &lsort), recurse_union_children(op->rarg, root, tuple_fraction, - op, refnames_tlist)); + op, refnames_tlist, + &rsort)); /* - * Generate tlist for Append plan node. + * Generate tlist for Append/MergeAppend plan node. * - * The tlist for an Append plan isn't important as far as the Append is - * concerned, but we must make it look real anyway for the benefit of the - * next plan level up. + * The tlist for an Append/MergeAppend plan isn't important as far as the + * they are concerned, but we must make it look real anyway for the + * benefit of the next plan level up. */ tlist = generate_append_tlist(op->colTypes, op->colCollations, false, planlist, refnames_tlist); - /* - * Append the child results together. - */ - plan = (Plan *) make_append(planlist, tlist); - - /* - * For UNION ALL, we just need the Append plan. For UNION, need to add - * node(s) to remove duplicates. - */ - if (op->all) - *sortClauses = NIL; /* result of UNION ALL is always unsorted */ + if (lsort != NIL && equal(lsort, rsort)) + { + /* + * Generate MergeAppend plan if both children are sorted on the same + * sort clause or groupClauses. + */ + ListCell *lc, *slc; + int i = 0; + double total_size; + Plan *p; + List *distinctClause; + + MergeAppend *node = makeNode(MergeAppend); + node->plan.targetlist = tlist; + node->plan.qual = NIL; + node->plan.lefttree = NULL; + node->plan.righttree = NULL; + node->mergeplans = planlist; + node->numCols = list_length(root->parse->targetList); + node->sortColIdx = (AttrNumber*)palloc(sizeof(AttrNumber) * node->numCols); + node->sortOperators = (Oid*)palloc(sizeof(Oid) * node->numCols); + node->collations = (Oid*)palloc(sizeof(Oid) * node->numCols); + node->nullsFirst = (bool*)palloc(sizeof(bool) * node->numCols); + + distinctClause = + generate_setop_grouplist(op->groupClauses, + node->plan.targetlist, + root->parse->sortClause); + forboth (slc, distinctClause, lc, node->plan.targetlist) + { + TargetEntry *te = (TargetEntry*) lfirst(lc); + SortGroupClause *sgc = (SortGroupClause *) lfirst(slc); + + Assert(sgc->tleSortGroupRef == te->ressortgroupref); + node->sortColIdx[i] = i + 1; + node->sortOperators[i] = sgc->sortop; + node->collations[i] = exprCollation((Node*)te->expr); + node->nullsFirst[i] = sgc->nulls_first; + i++; + } + lc = list_head(node->mergeplans); + p = (Plan*) lfirst(lc); + node->plan.startup_cost = p->startup_cost; + node->plan.total_cost = p->total_cost; + node->plan.plan_rows = p->plan_rows; + total_size = p->plan_rows * p->plan_width; + for_each_cell(lc, lnext(lc)) + { + p = (Plan*) lfirst(lc); + node->plan.total_cost += p->total_cost; + node->plan.plan_rows += p->plan_rows; + total_size += p->plan_rows * p->plan_width; + } + node->plan.plan_width = rint(total_size / node->plan.plan_rows); + *sortClauses = root->parse->sortClause; + plan = (Plan*)node; + if (!op->all) + plan = make_union_unique(op, plan, root, tuple_fraction, + &distinctClause); + } else - plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses); + { + /* + * Append the child results together. + */ + plan = (Plan *) make_append(planlist, tlist); + + /* + * For UNION ALL, we just need the Append plan. For UNION, need to + * add node(s) to remove duplicates. + */ + *sortClauses = NIL; + + if (!op->all) + plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses); + } /* * Estimate number of groups if caller wants it. For now we just assume @@ -544,12 +758,14 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, lplan = recurse_set_operations(op->larg,root, 0.0 /* all tuples needed */ , op->colTypes, op->colCollations, + op->groupClauses, false, 0, refnames_tlist, &child_sortclauses, &dLeftGroups); rplan = recurse_set_operations(op->rarg,root, 0.0 /* all tuples needed */ , op->colTypes, op->colCollations, + op->groupClauses, false, 1, refnames_tlist, &child_sortclauses, &dRightGroups); @@ -589,7 +805,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, plan = (Plan *) make_append(planlist,tlist); /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(op, tlist); + groupList = generate_setop_grouplist(op->groupClauses, tlist, NULL); /* punt if nothing to group on (can this happen?)*/ if (groupList == NIL) @@ -678,9 +894,13 @@ static List *recurse_union_children(Node *setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist) + List *refnames_tlist, + List **child_sortclauses){ - List *child_sortclauses; + List *lplan, *rplan; + List *lsort, *rsort; + + *child_sortclauses = NIL; if (IsA(setOp, SetOperationStmt)) { @@ -691,14 +911,23 @@ recurse_union_children(Node *setOp, PlannerInfo *root, equal(op->colTypes, top_union->colTypes)) { /* Same UNION, so fold children into parent's subplan list */ - return list_concat(recurse_union_children(op->larg, root, - tuple_fraction, - top_union, - refnames_tlist), - recurse_union_children(op->rarg, root, - tuple_fraction, - top_union, - refnames_tlist)); + lplan = recurse_union_children(op->larg, root, + tuple_fraction, + top_union, + refnames_tlist, + &lsort); + rplan = recurse_union_children(op->rarg, root, + tuple_fraction, + top_union, + refnames_tlist, + &rsort); + /* + * Propagate whether all the descendents are sorted with same + * sortClause. + */ + if (lsort != NIL && equal(lsort, rsort)) + *child_sortclauses = lsort; + return list_concat(lplan, rplan); } } @@ -715,13 +944,16 @@ recurse_union_children(Node *setOp, PlannerInfo *root, tuple_fraction, top_union->colTypes, top_union->colCollations, + top_union->groupClauses, false,-1, refnames_tlist, - &child_sortclauses, NULL)); + child_sortclauses, NULL));}/* * Add nodes to the given plan tree to unique-ifythe result of a UNION. + * + * If the sortClause is given, we assume the plan is already sorted on it. */static Plan *make_union_unique(SetOperationStmt*op, Plan *plan, @@ -731,9 +963,11 @@ make_union_unique(SetOperationStmt *op, Plan *plan, List *groupList; double dNumGroups; long numGroups; + bool sort_needed = true; /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(op, plan->targetlist); + groupList = generate_setop_grouplist(op->groupClauses, + plan->targetlist, *sortClauses); /* punt if nothing to group on (can this happen?)*/ if (groupList == NIL) @@ -742,6 +976,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan, return plan; } + if (*sortClauses && equal(*sortClauses, groupList)) + sort_needed = false; + /* * XXX for the moment, take the number of distinct groups as equal to the * total input size, ie, theworst case. This is too conservative, but we @@ -756,7 +993,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan, numGroups = (long) Min(dNumGroups, (double) LONG_MAX); /* Decide whether to hash or sort */ - if (choose_hashed_setop(root, groupList, plan, + if (sort_needed && + choose_hashed_setop(root, groupList, plan, dNumGroups, dNumGroups, tuple_fraction, "UNION")) { @@ -778,7 +1016,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan, else { /* Sort and Unique */ - plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan); + if (sort_needed) + plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan); + plan = (Plan *) make_unique(plan, groupList); plan->plan_rows = dNumGroups; /* We know thesort order of the result */ @@ -1145,23 +1385,34 @@ generate_append_tlist(List *colTypes, List *colCollations, * the parser output representation doesn'tinclude a tlist for each * setop. So what we need to do here is copy that list and install * proper sortgrouprefsinto it and into the targetlist. + * + * sortClause is consulted if provided to avoid re-sorting with different + * orderings on the same keys later. */static List * -generate_setop_grouplist(SetOperationStmt *op, List *targetlist) +generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses){ - List *grouplist = (List *) copyObject(op->groupClauses); + List *grouplist = (List *) copyObject(groupClauses); ListCell *lg; + ListCell *ls = NULL; ListCell *lt; Index refno = 1; lg = list_head(grouplist); + + if (sortClauses) + ls = list_head(sortClauses); + foreach(lt, targetlist) { TargetEntry *tle = (TargetEntry *) lfirst(lt); - SortGroupClause *sgc; + SortGroupClause *sgc, *sgc_sort = NULL; - /* tlist shouldn't have any sortgrouprefs yet */ - Assert(tle->ressortgroupref == 0); + /* + * tlist shouldn't have any sortgrouprefs yet, or should be unchanged + */ + Assert(tle->ressortgroupref == 0 || tle->ressortgroupref == refno); if (tle->resjunk) continue; /* ignore resjunk columns */ @@ -1174,6 +1425,45 @@ generate_setop_grouplist(SetOperationStmt *op, List *targetlist) /* we could use assignSortGroupRefhere, but seems a bit silly */ sgc->tleSortGroupRef = tle->ressortgroupref = refno++; + + if (ls) + { + /* + * If sortClauses is provided, try to adjust the sorting order to + * get the chance for omitting sorting for grouping later. + */ + sgc_sort = (SortGroupClause *) lfirst(ls); + ls = lnext(ls); + if (sgc->sortop != sgc_sort->sortop) + { + Oid reverse = InvalidOid; + Oid opfamily, opcintype; + int16 strategy; + + if (get_ordering_op_properties(sgc->sortop, + &opfamily, &opcintype, &strategy)) + { + switch (strategy) + { + case BTLessStrategyNumber: + strategy = BTGreaterStrategyNumber; break; + case BTGreaterStrategyNumber: + strategy = BTLessStrategyNumber; break; + } + + reverse = get_opfamily_member(opfamily, + opcintype, + opcintype, + strategy); + if (sgc_sort->sortop == reverse) + { + sgc->sortop = reverse; + sgc->nulls_first = sgc_sort->nulls_first; + } + } + } + } + } Assert(lg == NULL); return grouplist;
On Wed, 2013-11-13 at 17:25 +0900, Kyotaro HORIGUCHI wrote: > Added explicit cast there and rebased to current master. > Checked no new warning by this patch. > make check succeeded at both $(src) and $(src)/src/test. This patch also has whitespace errors detected by git diff --check.
Hello, As I mentioned on another thread, I'll reorganize patches including this and repost them. Please wait for a while. -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > [ union_uses_idx_v2_20131113.patch ] I'm aware that you said you were going to refactor this, but I took a quick look through it anyway. I don't think mere refactoring is going to make me happy with it :-(. The basic problem here, as well as with some of the hackery that's been proposed recently in planner.c, is that we can't really get much further with improving this layer of the planner until we bite the bullet and convert this layer to work with Paths not finished Plans. Look at what you're proposing here: do a complete planning cycle on the subquery (underneath which both ordered and unordered Paths will be considered, and one or the other will get thrown away). Then do *another* complete planning cycle with ordered output forced. Then compare costs of the results. This is hugely inefficient, and it produces far-from-ideal results too, because you're forced to make a decision right there on which subquery plan to use; you can't make the globally optimal choice after considering costs of all the subqueries. What we need to do is fix things so we can get multiple Paths out of the subquery planning step, some ordered and some not. Then we could construct Paths for both the brute force and merge-append styles of computing the UNION result, and finally choose the cheapest Path at the top level. The same goes for hacking around in grouping_planner. That function is well past the point of collapsing of its own weight; we need to invest some effort in refactoring before we can afford to add much more complexity there. And I think the logical way to refactor is to rewrite the code in a Path-generation-and-comparison style. (Actually, grouping_planner would need to be fixed first, since that's what would have to produce the Paths we're hoping to compare in prepunion.) I had hoped to make some progress on that rewrite this past summer, but ended up with no time to work on it :-(. There's going to be a lot of boring infrastructure work before we see much in the way of user-visible improvement, I'm afraid, so it's kind of hard to make time for it. regards, tom lane
Thank you for looking in detail for this patch, and giving thoughtful advices. > I'm aware that you said you were going to refactor this, but I took a > quick look through it anyway. I don't think mere refactoring is going > to make me happy with it :-(. Ouch.. Anyway I've refactored this patch altogether with the another 'unique index' patch and re-splitted into three parts. One is the 'unique-index stuff' and the second is 'Adding pathkeys and unique info into struct Plan' patch and the third is 'maily-in-prepunion stuff'. They will be sent in other messages although for the scarce chance to be picked up :-) > The basic problem here, as well as with some of the hackery that's > been proposed recently in planner.c, is that we can't really get much > further with improving this layer of the planner until we bite the > bullet and convert this layer to work with Paths not finished Plans. It is right but hard to go. I would be able to put my little power on the issue but the planner hardly seems to be get refactored gradually. > Look at what you're proposing here: do a complete planning cycle on > the subquery (underneath which both ordered and unordered Paths will > be considered, and one or the other will get thrown away). Then do > *another* complete planning cycle with ordered output forced. Then > compare costs of the results. Exactly. > This is hugely inefficient, and it produces far-from-ideal > results too, because you're forced to make a decision right > there on which subquery plan to use; you can't make the > globally optimal choice after considering costs of all the > subqueries. Umm. Yes, I know I've done it in somewhat brute way and it costs a little too much if the subqueries of UNION is rather large, but I suppose it comparably small to the whole execution time for most cases. Putting it aside, from the viewpoint of global-optimizations, current planner also doesn't do such optimizations. Moreover it doesn't consider sorted plans possibly available and effective. Ignoring the additional planning cost (:-), for the UNION queries with LIMITS on large partitioned tables with apropriate indexes (mmm. I suppose it is not so narrow gate..), the query runtime should be extremely shortend. > What we need to do is fix things so we can get multiple Paths > out of the subquery planning step, some ordered and some not. > Then we could construct Paths for both the brute force and > merge-append styles of computing the UNION result, and finally > choose the cheapest Path at the top level. Agreed with it at a glance. It sounds quite reasonable. I've been a bit worried about that the paths and plans seem to be fused in some extent in grouping_planner, and prepunion (plan_set_operations) is jumping over path-generation phase to make plans directly. (And about that I pushed it more on the way..) > The same goes for hacking around in grouping_planner. That function > is well past the point of collapsing of its own weight; we need > to invest some effort in refactoring before we can afford to add > much more complexity there. And I think the logical way to refactor > is to rewrite the code in a Path-generation-and-comparison style. > (Actually, grouping_planner would need to be fixed first, since > that's what would have to produce the Paths we're hoping to compare > in prepunion.) It seems necessary but also seems far far away and hard to go. > I had hoped to make some progress on that rewrite this past summer, > but ended up with no time to work on it :-(. There's going to be > a lot of boring infrastructure work before we see much in the way > of user-visible improvement, I'm afraid, so it's kind of hard to > make time for it. Anyway, because I've happened to pass too close by the issue, I will consider on that for some time (although I would hardly come up with spiffy solution in a short time.) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, I've totally refactored the series of pathes and cut out the appropriate portion as 'UNION stuff'. This patch rquires another 'pathkeys expansion using fully-orderd index' patch to work. http://www.postgresql.org/message-id/20131119.203516.251520490.horiguchi.kyotaro@lab.ntt.co.jp 1. plan_with_pathkeys_v1_20131119.patch This is a patch adding pathkeys (and uniqueness) information onstruct Plan to do what the latter part of grouping_plannerdoeson current_pathkeys in seemingly autonomous way. As in theprevious discussion, this is in a sense a baddirection togo. But this patch make the path manipulations occured in thelatter of grouping_planner simple and would reduceextra sortingand uniq'ing. 2. union_uses_idx_v3_20131119.patch This is made to apply after the first patch above. The core of"Using indices for UNION". This is - as in the previousdiscussion- also not in so smart way to do that. Most of all,this patch runs subquery_planner additionally for UNIONwithsome conditions. Nevertheless, the expected gain should not besmall. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 5947e5b..d8a17a0 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -98,11 +98,13 @@ static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);static IndexScan *make_indexscan(List*qptlist, List *qpqual, Index scanrelid, Oid indexid, List *indexqual, List *indexqualorig, List *indexorderby, List *indexorderbyorig, + List *pathkeys, bool uniquely_ordered, ScanDirection indexscandir);static IndexOnlyScan *make_indexonlyscan(List*qptlist, List *qpqual, Index scanrelid, Oid indexid, List *indexqual,List *indexorderby, List *indextlist, + List *pathkeys, bool uniquely_ordered, ScanDirection indexscandir);static BitmapIndexScan*make_bitmap_indexscan(Index scanrelid, Oid indexid, List *indexqual, @@ -150,8 +152,8 @@ static MergeJoin *make_mergejoin(List *tlist, bool *mergenullsfirst, Plan*lefttree, Plan *righttree, JoinType jointype); -static Sort *make_sort(PlannerInfo *root, Plan *lefttree, int numCols, - AttrNumber *sortColIdx, Oid *sortOperators, +static Sort *make_sort(PlannerInfo *root, Plan *lefttree, List *pathkeys, + int numCols, AttrNumber *sortColIdx, Oid *sortOperators, Oid *collations, bool *nullsFirst, double limit_tuples);static Plan *prepare_sort_from_pathkeys(PlannerInfo *root, @@ -750,6 +752,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) plan->qual = NIL; plan->lefttree= NULL; plan->righttree = NULL; + plan->pathkeys = pathkeys; + plan->is_unique = false; /* Compute sort column info, and adjust MergeAppend's tlist as needed */ (void) prepare_sort_from_pathkeys(root,plan, pathkeys, @@ -810,7 +814,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) /* Now, insert a Sortnode if subplan isn't sufficiently ordered */ if (!pathkeys_contained_in(pathkeys, subpath->pathkeys)) - subplan = (Plan *) make_sort(root, subplan, numsortkeys, + subplan = (Plan *) make_sort(root, subplan, pathkeys, numsortkeys, sortColIdx,sortOperators, collations, nullsFirst, best_path->limit_tuples); @@ -1263,6 +1267,8 @@ create_indexscan_plan(PlannerInfo *root, fixed_indexquals, fixed_indexorderbys, best_path->indexinfo->indextlist, + best_path->path.pathkeys, + false, best_path->indexscandir); else scan_plan = (Scan *) make_indexscan(tlist, @@ -1273,6 +1279,8 @@ create_indexscan_plan(PlannerInfo *root, stripped_indexquals, fixed_indexorderbys, indexorderbys, + best_path->path.pathkeys, + false, best_path->indexscandir); copy_path_costsize(&scan_plan->plan, &best_path->path); @@ -3245,6 +3253,8 @@ make_indexscan(List *qptlist, List *indexqualorig, List *indexorderby, List *indexorderbyorig, + List *pathkeys, + bool uniquely_ordered, ScanDirection indexscandir){ IndexScan *node = makeNode(IndexScan); @@ -3255,6 +3265,8 @@ make_indexscan(List *qptlist, plan->qual = qpqual; plan->lefttree = NULL; plan->righttree= NULL; + plan->pathkeys = pathkeys; + plan->is_unique = uniquely_ordered; node->scan.scanrelid = scanrelid; node->indexid = indexid; node->indexqual= indexqual; @@ -3274,6 +3286,8 @@ make_indexonlyscan(List *qptlist, List *indexqual, List *indexorderby, List *indextlist, + List *pathkeys, + bool uniquely_ordered, ScanDirection indexscandir){ IndexOnlyScan *node = makeNode(IndexOnlyScan); @@ -3284,6 +3298,8 @@ make_indexonlyscan(List *qptlist, plan->qual = qpqual; plan->lefttree = NULL; plan->righttree= NULL; + plan->pathkeys = pathkeys; + plan->is_unique = uniquely_ordered; node->scan.scanrelid = scanrelid; node->indexid = indexid; node->indexqual= indexqual; @@ -3753,8 +3769,8 @@ make_mergejoin(List *tlist, * limit_tuples is as for cost_sort (in particular, pass -1 if no limit)*/static Sort * -make_sort(PlannerInfo *root, Plan *lefttree, int numCols, - AttrNumber *sortColIdx, Oid *sortOperators, +make_sort(PlannerInfo *root, Plan *lefttree, List *pathkeys, + int numCols, AttrNumber *sortColIdx, Oid *sortOperators, Oid *collations, bool *nullsFirst, double limit_tuples){ @@ -3776,6 +3792,8 @@ make_sort(PlannerInfo *root, Plan *lefttree, int numCols, plan->qual = NIL; plan->lefttree =lefttree; plan->righttree = NULL; + plan->pathkeys = pathkeys; + plan->is_unique = false; node->numCols = numCols; node->sortColIdx = sortColIdx; node->sortOperators = sortOperators; @@ -4125,7 +4143,7 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys, &nullsFirst); /* Now build the Sort node */ - return make_sort(root, lefttree, numsortkeys, + return make_sort(root, lefttree, pathkeys, numsortkeys, sortColIdx, sortOperators, collations, nullsFirst, limit_tuples);} @@ -4147,6 +4165,7 @@ make_sort_from_sortclauses(PlannerInfo *root, List *sortcls, Plan *lefttree) Oid *sortOperators; Oid *collations; bool *nullsFirst; + List *pathkeys; /* Convert list-ish representation to arrays wanted by executor */ numsortkeys = list_length(sortcls); @@ -4168,7 +4187,9 @@ make_sort_from_sortclauses(PlannerInfo *root, List *sortcls, Plan *lefttree) numsortkeys++; } - return make_sort(root, lefttree, numsortkeys, + pathkeys = make_pathkeys_for_sortclauses(root, sortcls, sub_tlist); + + return make_sort(root, lefttree, pathkeys, numsortkeys, sortColIdx, sortOperators, collations, nullsFirst, -1.0);} @@ -4199,6 +4220,7 @@ make_sort_from_groupcols(PlannerInfo *root, Oid *sortOperators; Oid *collations; bool *nullsFirst; + List *pathkeys; /* Convert list-ish representation to arrays wanted by executor */ numsortkeys = list_length(groupcls); @@ -4220,10 +4242,17 @@ make_sort_from_groupcols(PlannerInfo *root, sortOperators[numsortkeys] = grpcl->sortop; collations[numsortkeys] = exprCollation((Node *) tle->expr); nullsFirst[numsortkeys] = grpcl->nulls_first; + + /* This tlist should not be bound with any other orderig clause */ + Assert(tle->ressortgroupref == 0); + tle->ressortgroupref = grpcl->tleSortGroupRef; + numsortkeys++; } - return make_sort(root, lefttree, numsortkeys, + pathkeys = make_pathkeys_for_sortclauses(root, groupcls, sub_tlist); + + return make_sort(root, lefttree, pathkeys, numsortkeys, sortColIdx, sortOperators, collations, nullsFirst, -1.0);} @@ -4339,6 +4368,16 @@ make_agg(PlannerInfo *root, List *tlist, List *qual, plan->lefttree = lefttree; plan->righttree= NULL; + /* + * We can safely assume that the lefttree and therefore the result is + * sorted on group pathkeys and unique if given aggstrategy is AGG_SORTED. + */ + if (node->aggstrategy == AGG_SORTED) + plan->pathkeys = root->group_pathkeys; + else + plan->pathkeys = NIL; + plan->is_unique = true; + return node;} @@ -4448,6 +4487,13 @@ make_group(PlannerInfo *root, plan->lefttree = lefttree; plan->righttree = NULL; + /* + * We assume that lefttree is ordered on the same pathkeys with that this + * group node wants. + */ + plan->pathkeys = lefttree->pathkeys; + plan->is_unique = true; + return node;} @@ -4486,6 +4532,8 @@ make_unique(Plan *lefttree, List *distinctList) plan->qual = NIL; plan->lefttree = lefttree; plan->righttree = NULL; + plan->pathkeys = lefttree->pathkeys; + plan->is_unique = true; /* * convert SortGroupClause list into arrays of attr indexes and equality @@ -4669,6 +4717,8 @@ make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount, plan->qual = NIL; plan->lefttree= lefttree; plan->righttree = NULL; + plan->pathkeys = lefttree->pathkeys; + plan->is_unique = lefttree->is_unique; node->limitOffset = limitOffset; node->limitCount = limitCount; @@ -4717,6 +4767,10 @@ make_result(PlannerInfo *root, plan->qual = NIL; plan->lefttree = subplan; plan->righttree= NULL; + + /* this plan emits at most one row when subplan is NULL */ + if (subplan == NULL) plan->is_unique = true; + node->resconstantqual = resconstantqual; return node; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index d8aa35d..a09d38f 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1011,6 +1011,19 @@ inheritance_planner(PlannerInfo *root) SS_assign_special_param(root));} +static bool +plan_is_ordered(Plan *plan, List *pathkeys) +{ + if (pathkeys_contained_in(pathkeys, plan->pathkeys)) + return true; + + if (plan->pathkeys && plan->is_unique && + pathkeys_contained_in(plan->pathkeys, pathkeys)) + return true; + + return false; +} +/*-------------------- * grouping_planner * Perform planning steps related to grouping, aggregation, etc. @@ -1039,7 +1052,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) int64 count_est = 0; double limit_tuples = -1.0; Plan *result_plan; - List *current_pathkeys; double dNumGroups = 0; bool use_hashed_distinct = false; bool tested_hashed_distinct = false; @@ -1081,15 +1093,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) &set_sortclauses); /* - * Calculate pathkeys representing the sort order (if any) of the set - * operation's result. We have to do this before overwriting the sort - * key information... - */ - current_pathkeys = make_pathkeys_for_sortclauses(root, - set_sortclauses, - result_plan->targetlist); - - /* * We should not need to call preprocess_targetlist, since we must be * in a SELECT query node. Instead, use the targetlist returned by * plan_set_operations (since this tells whether it returned any @@ -1442,15 +1445,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) tlist, &agg_costs, best_path); - if (result_plan != NULL) - { - /* - * optimize_minmax_aggregates generated the full plan, with the - * right tlist, and it has no sort order. - */ - current_pathkeys = NIL; - } - else + if (result_plan == NULL) { /* * Normal case --- create a plan according to query_planner's @@ -1459,11 +1454,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) bool need_sort_for_grouping= false; result_plan = create_plan(root, best_path); - current_pathkeys = best_path->pathkeys; /* Detect if we'll need an explicit sort for grouping */ if (parse->groupClause && !use_hashed_grouping && - !pathkeys_contained_in(root->group_pathkeys, current_pathkeys)) + !plan_is_ordered(result_plan, root->group_pathkeys)) { need_sort_for_grouping= true; @@ -1541,37 +1535,20 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) extract_grouping_ops(parse->groupClause), numGroups, result_plan); - /* Hashed aggregation produces randomly-ordered results */ - current_pathkeys = NIL; } else if (parse->hasAggs) { /*Plain aggregate plan --- sort if needed */ AggStrategy aggstrategy; - if (parse->groupClause) + aggstrategy = parse->groupClause ? AGG_SORTED : AGG_PLAIN; + if (aggstrategy == AGG_SORTED && need_sort_for_grouping) { - if (need_sort_for_grouping) - { - result_plan = (Plan *) - make_sort_from_groupcols(root, - parse->groupClause, - groupColIdx, - result_plan); - current_pathkeys = root->group_pathkeys; - } - aggstrategy = AGG_SORTED; - - /* - * The AGG node will not change the sort ordering of its - * groups, so current_pathkeys describes the result too. - */ - } - else - { - aggstrategy = AGG_PLAIN; - /* Result will be only one row anyway; no sort order */ - current_pathkeys = NIL; + result_plan = (Plan *) + make_sort_from_groupcols(root, + parse->groupClause, + groupColIdx, + result_plan); } result_plan = (Plan *) make_agg(root, @@ -1601,7 +1578,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) parse->groupClause, groupColIdx, result_plan); - current_pathkeys = root->group_pathkeys; } result_plan = (Plan *) make_group(root, @@ -1612,7 +1588,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) extract_grouping_ops(parse->groupClause), dNumGroups, result_plan); - /* The Group node won't change sort ordering */ } else if (root->hasHavingQual) { @@ -1722,13 +1697,14 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) result_plan, window_pathkeys, -1.0); - if (!pathkeys_contained_in(window_pathkeys, - current_pathkeys)) - { - /* we do indeed need to sort */ + + /* + * we do indeed need to sort if result_plan is not ordered + * on window_pathkeys + */ + if (!plan_is_ordered(result_plan, window_pathkeys)) result_plan = (Plan *) sort_plan; - current_pathkeys = window_pathkeys; - } + /* In either case, extract the per-column information */ get_column_info_for_window(root,wc, tlist, sort_plan->numCols, @@ -1792,12 +1768,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) long numDistinctRows; /* - * If there was grouping or aggregation, use the current number of - * rows as the estimated number of DISTINCT rows (ie, assume the - * result was already mostly unique). If not, use the number of + * If result_plan emits already distinct'ed rows, use the current + * number of rows as the estimated number of DISTINCT rows (ie, assume + * the result was already unique). If not, use the number of * distinct-groups calculated previously. */ - if (parse->groupClause || root->hasHavingQual || parse->hasAggs) + if (result_plan->is_unique) dNumDistinctRows = result_plan->plan_rows; else dNumDistinctRows= dNumGroups; @@ -1822,7 +1798,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) result_plan->total_cost, result_plan->startup_cost, result_plan->total_cost, - current_pathkeys, + result_plan->pathkeys, dNumDistinctRows); } @@ -1840,8 +1816,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) extract_grouping_ops(parse->distinctClause), numDistinctRows, result_plan); - /* Hashed aggregation produces randomly-ordered results */ - current_pathkeys = NIL; } else { @@ -1865,29 +1839,23 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) else needed_pathkeys= root->distinct_pathkeys; - if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys)) + if (!plan_is_ordered(result_plan, needed_pathkeys)) { - if (list_length(root->distinct_pathkeys) >= - list_length(root->sort_pathkeys)) - current_pathkeys = root->distinct_pathkeys; - else - { - current_pathkeys = root->sort_pathkeys; - /* Assert checks that parser didn't mess up... */ - Assert(pathkeys_contained_in(root->distinct_pathkeys, - current_pathkeys)); - } - + /* Assert checks that parser didn't mess up... */ + Assert(pathkeys_contained_in(root->distinct_pathkeys, + needed_pathkeys)); + Assert(pathkeys_contained_in(root->sort_pathkeys, + needed_pathkeys)); result_plan = (Plan *) make_sort_from_pathkeys(root, result_plan, - current_pathkeys, + needed_pathkeys, -1.0); } - result_plan = (Plan *) make_unique(result_plan, - parse->distinctClause); + if (!result_plan->is_unique) + result_plan = (Plan *) make_unique(result_plan, + parse->distinctClause); result_plan->plan_rows = dNumDistinctRows; - /* The Unique node won't change sort ordering */ } } @@ -1897,13 +1865,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) */ if (parse->sortClause) { - if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys)) + if (!plan_is_ordered(result_plan, root->sort_pathkeys)) { result_plan = (Plan *) make_sort_from_pathkeys(root, result_plan, root->sort_pathkeys, limit_tuples); - current_pathkeys = root->sort_pathkeys; } } @@ -1914,18 +1881,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) * ModifyTable node instead.) */ if (parse->rowMarks) - { result_plan = (Plan *) make_lockrows(result_plan, root->rowMarks, SS_assign_special_param(root)); - /* - * The result can no longer be assumed sorted, since locking might - * cause the sort key columns to be replaced with new values. - */ - current_pathkeys = NIL; - } - /* * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node. */ @@ -1942,7 +1901,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) * Return the actual output orderingin query_pathkeys for possible use by * an outer query level. */ - root->query_pathkeys = current_pathkeys; + root->query_pathkeys = result_plan->pathkeys; return result_plan;} diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 44ea0b7..ced07d9 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -101,6 +101,8 @@ typedef struct Plan */ double plan_rows; /* number of rows plan is expected to emit*/ int plan_width; /* average row width in bytes */ + List *pathkeys; /* ordered on this pathkeys if any */ + bool is_unique; /* emits unique result */ /* * Common structural data for all Plan types. diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index d8a17a0..8c60b2c 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -813,7 +813,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) numsortkeys* sizeof(bool)) == 0); /* Now, insert a Sort node if subplan isn't sufficiently ordered */ - if (!pathkeys_contained_in(pathkeys, subpath->pathkeys)) + if (!path_is_ordered(subpath, pathkeys)) subplan = (Plan *) make_sort(root, subplan, pathkeys, numsortkeys, sortColIdx, sortOperators, collations,nullsFirst, @@ -1151,6 +1151,7 @@ create_indexscan_plan(PlannerInfo *root, List *fixed_indexquals; List *fixed_indexorderbys; ListCell *l; + bool uniquely_ordered = false; /* it should be a base rel... */ Assert(baserelid > 0); @@ -1258,6 +1259,20 @@ create_indexscan_plan(PlannerInfo *root, replace_nestloop_params(root, (Node *) indexorderbys); } + /* + * XXX: This is rather tricky. IndexPath's pathkeys may be both superset + * (including the same) or subset of key columns of the index. This path + * will emit distnct'edly ordered rows when the pathkeys contains the key + * columns and the index is fully ordered on the key columns. + * + * See the point calling truncate_useless_pathkeys in build_index_paths() + * for detail. + */ + if (list_length(best_path->path.pathkeys) >= + best_path->indexinfo->ncolumns && + best_path->indexinfo->full_ordered) + uniquely_ordered = true; + /* Finally ready to build the plan node */ if (indexonly) scan_plan = (Scan *) make_indexonlyscan(tlist, @@ -1268,7 +1283,7 @@ create_indexscan_plan(PlannerInfo *root, fixed_indexorderbys, best_path->indexinfo->indextlist, best_path->path.pathkeys, - false, + uniquely_ordered, best_path->indexscandir); else scan_plan = (Scan *) make_indexscan(tlist, @@ -1280,7 +1295,7 @@ create_indexscan_plan(PlannerInfo *root, fixed_indexorderbys, indexorderbys, best_path->path.pathkeys, - false, + uniquely_ordered, best_path->indexscandir); copy_path_costsize(&scan_plan->plan, &best_path->path); diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index a09d38f..a337c84 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1075,15 +1075,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) List *set_sortclauses; /* - * If there's a top-level ORDER BY, assume we have to fetch all the - * tuples. This might be too simplistic given all the hackery below - * to possibly avoid the sort; but the odds of accurate estimates here - * are pretty low anyway. - */ - if (parse->sortClause) - tuple_fraction = 0.0; - - /* * Construct the plan for set operations. The result will not need * any work except perhapsa top-level sort and/or LIMIT. Note that * any special work for recursive unions is the responsibility of diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index e249628..9fe0b66 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -29,6 +29,7 @@#include "postgres.h"#include <limits.h> +#include <math.h>#include "access/heapam.h"#include "access/htup_details.h" @@ -44,6 +45,7 @@#include "optimizer/planner.h"#include "optimizer/prep.h"#include "optimizer/tlist.h" +#include "optimizer/paths.h"#include "parser/parse_coerce.h"#include "parser/parsetree.h"#include "utils/lsyscache.h" @@ -60,7 +62,7 @@ typedef structstatic Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, - bool junkOK, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups);static Plan *generate_recursion_plan(SetOperationStmt *setOp, @@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,static List *recurse_union_children(Node*setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist); + List *refnames_tlist, + List **child_sortclause);static Plan *make_union_unique(SetOperationStmt *op, Plan *plan, PlannerInfo *root, double tuple_fraction, List **sortClauses); @@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, bool flag, List *input_plans, List *refnames_tlist); -static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); +static List *generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses);static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry*rte, Index rti);static void make_inh_translation_list(Relation oldrelation, @@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, Assert(parse->distinctClause == NIL); /* + * If there's a top-level ORDER BY, assume we have to fetch all the tuples + * except for UNION. This might be too simplistic given all the hackery + * below to possibly avoid the sort; but the odds of accurate estimates + * here are pretty low anyway. + */ + if (parse->sortClause && topop->op != SETOP_UNION) + tuple_fraction = 0.0; + + /* * We'll need to build RelOptInfos for each of the leaf subqueries, which * are RTE_SUBQUERY rangetable entriesin this Query. Prepare the index * arrays for that. @@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, */ return recurse_set_operations((Node*) topop, root, tuple_fraction, topop->colTypes, topop->colCollations, + topop->groupClauses, true, -1, leftmostQuery->targetList, sortClauses, NULL);}/* + * save_plannerglobal + * + * save planner global to allow multiple calls of subquery_planner at the same + * global status. This is done apartly from copyObject so as to do medium + * shallow copying. + */ +static PlannerGlobal * +save_plannerglobal(const PlannerGlobal *in) +{ + PlannerGlobal *save = makeNode(PlannerGlobal); + + save->boundParams = in->boundParams; + save->subplans = list_copy(in->subplans); + save->subroots = list_copy(in->subroots); + save->rewindPlanIDs = bms_copy(in->rewindPlanIDs); + save->finalrtable = list_copy(in->finalrtable); + save->finalrowmarks = list_copy(in->finalrowmarks); + save->resultRelations = list_copy(in->resultRelations); + save->relationOids = list_copy(in->relationOids); + save->invalItems = list_copy(in->invalItems); + save->nParamExec = in->nParamExec; + save->lastPHId = in->lastPHId; + save->lastRowMarkId = in->lastRowMarkId; + save->transientPlan = in->transientPlan; + + return save; +} + +/* * recurse_set_operations * Recursively handle one step in a tree of set operations * * tuple_fraction: fractionof tuples we expect to retrieve from node * colTypes: OID list of set-op's result column datatypes * colCollations:OID list of set-op's result column collations + * groupClauses: parent setop's grouping clause. * junkOK: if true, child resjunk columns may be left in the result * flag:if >= 0, add a resjunk output column indicating value of flag * refnames_tlist: targetlist to take column names from @@ -215,7 +259,7 @@ static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, - bool junkOK, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups){ @@ -224,13 +268,20 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, RangeTblRef *rtr = (RangeTblRef *) setOp; RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex]; Query *subquery = rte->subquery; + Query *pdquery = NULL; /* 'pd' comes from 'pushed down' */ RelOptInfo *rel; - PlannerInfo *subroot; + PlannerInfo *subroot, + *pdsubroot; + PlannerGlobal *pdglob; Plan *subplan, - *plan; + *plan, + *pdplan; + bool try_pushdown = false; Assert(subquery != NULL); + *sortClauses = NIL; + /* * We need to build a RelOptInfo for each leaf subquery. This isn't * used for anything here,but it carries the subroot data structures @@ -243,12 +294,146 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, /* * Generate plan for primitivesubquery + * + * Consider pusing down ORDER BY or groupings of the parent set + * operation onto this subquery. */ + if(root->parse->sortClause && !subquery->sortClause && + tuple_fraction >= 1) + { + ListCell *lcs, *lcd; + int refno = 1; + + /* + * Check if the sort cluase of the root can be applied on this + * subquery. All non-junk tlist items shoud be simple Var and + * their match in types and ressortgroupref should be in their + * appearance order. + */ + try_pushdown = true; + forboth(lcs, root->parse->targetList, lcd, subquery->targetList) + { + TargetEntry *ste = (TargetEntry*) lfirst(lcs); + TargetEntry *dte = (TargetEntry*) lfirst(lcd); + Node *sexpr = (Node*)ste->expr; + Node *dexpr = (Node*)dte->expr; + + if (ste->resjunk && dte->resjunk) continue; + + if (ste->resjunk != dte->resjunk || + !IsA(sexpr, Var) || !IsA(dexpr, Var) || + exprType(sexpr) != exprType(dexpr) || + (root->parse->sortClause && + ste->ressortgroupref != 0 && + ste->ressortgroupref != refno++)) + { + try_pushdown = false; + break; + } + } + } + + if (try_pushdown) + { + pdquery = copyObject(subquery); + + if (!pdquery->sortClause) + { + ListCell *lcs, *lcd; + + pdquery->sortClause = root->parse->sortClause; + + forboth(lcs, root->parse->targetList, + lcd, pdquery->targetList) + { + TargetEntry *ste = (TargetEntry *) lfirst(lcs); + TargetEntry *dte = (TargetEntry *) lfirst(lcd); + dte->ressortgroupref = ste->ressortgroupref; + } + } + + /* + * Pushing down groupings. Set operations doesn't accept + * distinct clauses. + */ + if (!pdquery->setOperations && + !pdquery->distinctClause && groupClauses) + { + if (pdquery->sortClause) + { + ListCell *lct; + int i = 1; + + /* Check if groupClause is coappliable with sortClauses */ + foreach (lct, pdquery->targetList) + { + TargetEntry *te = (TargetEntry *) lfirst(lct); + + if (te->resjunk) continue; + if (te->ressortgroupref != 0 && + te->ressortgroupref != i) + break; + i++; + } + + if (!lct) + pdquery->distinctClause = + generate_setop_grouplist(groupClauses, + pdquery->targetList, + pdquery->sortClause); + } + } + if (tuple_fraction >= 1 && + !pdquery->limitCount && !pdquery->limitOffset) + pdquery->limitCount = + (Node*)makeConst(INT8OID, -1, InvalidOid, sizeof(int64), + Int64GetDatum(tuple_fraction), + false, FLOAT8PASSBYVAL); + + /* + * Save current PlnannerGlobal. subquery_planner could modify + * PlannerGlobal so make a shallow backup in order to make the + * alternative plans selectable. + */ + pdglob = save_plannerglobal(root->glob); + } + subplan = subquery_planner(root->glob, subquery, root, false, tuple_fraction, &subroot); + if (try_pushdown) + { + pdplan = subquery_planner(pdglob, pdquery, + root, + false, tuple_fraction, + &pdsubroot); + + if (pdplan->total_cost < subplan->total_cost) + { + subquery = pdquery; + subplan = pdplan; + subroot = pdsubroot; + /* + * Glob cannot be moved because this is referred to from many + * places by its address. So set the address of the original + * glob to subroot, then copy new values there. + */ + subroot->glob = root->glob; + *root->glob = *pdglob; + } + } + + /* + * This plan is sorted on sortClause if any, else sorted + * on distinctClause. + */ + if (subquery->sortClause) + *sortClauses = subquery->sortClause; + else + *sortClauses = subquery->distinctClause; + /* Save subroot and subplan in RelOptInfo for setrefs.c */ rel->subplan = subplan; rel->subroot =subroot; @@ -291,10 +476,28 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, subplan); /* - * We don't bother to determine the subquery's output ordering since - * it won't be reflected in the set-op result anyhow. + * This plan's pathkeys and is_unique are apparently same to them of + * subplan unless type casting or coercing collations. */ - *sortClauses = NIL; + if (subplan->pathkeys && + tlist_same_datatypes(plan->targetlist, colTypes, junkOK) && + tlist_same_collations(plan->targetlist, colCollations, junkOK)) + { + ListCell *lcp, *lcr; + + forboth (lcp, plan->targetlist, lcr, root->parse->targetList) + { + TargetEntry *tep = (TargetEntry*) lfirst(lcp); + TargetEntry *ter = (TargetEntry*) lfirst(lcr); + + tep->ressortgroupref = ter->ressortgroupref; + } + plan->pathkeys = + make_pathkeys_for_sortclauses(root, + *sortClauses, + plan->targetlist); + plan->is_unique = subplan->is_unique; + } return plan; } @@ -379,12 +582,14 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root, */ lplan = recurse_set_operations(setOp->larg,root, tuple_fraction, setOp->colTypes, setOp->colCollations, + setOp->groupClauses, false, -1, refnames_tlist, sortClauses, NULL); /* The right plan will want to look at the left one ... */ root->non_recursive_plan= lplan; rplan = recurse_set_operations(setOp->rarg, root, tuple_fraction, setOp->colTypes, setOp->colCollations, + setOp->groupClauses, false, -1, refnames_tlist, sortClauses, NULL); root->non_recursive_plan = NULL; @@ -409,7 +614,8 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root, double dNumGroups; /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(setOp, tlist); + groupList = + generate_setop_grouplist(setOp->groupClauses, tlist, NULL); /* We only support hashing here */ if (!grouping_is_hashable(groupList)) @@ -452,20 +658,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root, List *planlist; List *tlist; Plan *plan; - - /* - * If plain UNION, tell children to fetch all tuples. - * - * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to - * each arm of the UNION ALL. One could make a case for reducing the - * tuple fraction for later arms (discounting by the expected size of the - * earlier arms' results) but it seems not worth the trouble. The normal - * case where tuple_fraction isn't already zero is a LIMIT at top level, - * and passing it down as-is is usually enough to get the desired result - * of preferring fast-start plans. - */ - if (!op->all) - tuple_fraction = 0.0; + List *lsort, *rsort; /* * If any of my children are identical UNION nodes (same op, all-flag, and @@ -475,34 +668,106 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root, */ planlist = list_concat(recurse_union_children(op->larg,root, tuple_fraction, - op, refnames_tlist), + op, refnames_tlist, + &lsort), recurse_union_children(op->rarg, root, tuple_fraction, - op, refnames_tlist)); + op, refnames_tlist, + &rsort)); /* - * Generate tlist for Append plan node. + * Generate tlist for Append/MergeAppend plan node. * - * The tlist for an Append plan isn't important as far as the Append is - * concerned, but we must make it look real anyway for the benefit of the - * next plan level up. + * The tlist for an Append/MergeAppend plan isn't important as far as the + * they are concerned, but we must make it look real anyway for the + * benefit of the next plan level up. */ tlist = generate_append_tlist(op->colTypes, op->colCollations, false, planlist, refnames_tlist); - /* - * Append the child results together. - */ - plan = (Plan *) make_append(planlist, tlist); + if (lsort != NIL && equal(lsort, rsort)) + { + /* + * Generate MergeAppend plan if both children are sorted on the same + * sort clause or groupClauses. + */ + ListCell *lc, *slc; + int i = 0; + double total_size; + Plan *p; + List *distinctClause; + + MergeAppend *node = makeNode(MergeAppend); + node->plan.targetlist = tlist; + node->plan.qual = NIL; + node->plan.lefttree = NULL; + node->plan.righttree = NULL; + node->mergeplans = planlist; + node->numCols = list_length(root->parse->targetList); + node->sortColIdx = + (AttrNumber*)palloc(sizeof(AttrNumber) * node->numCols); + node->sortOperators = (Oid*)palloc(sizeof(Oid) * node->numCols); + node->collations = (Oid*)palloc(sizeof(Oid) * node->numCols); + node->nullsFirst = (bool*)palloc(sizeof(bool) * node->numCols); + + distinctClause = + generate_setop_grouplist(op->groupClauses, + node->plan.targetlist, + root->parse->sortClause); + forboth (slc, distinctClause, lc, node->plan.targetlist) + { + TargetEntry *te = (TargetEntry*) lfirst(lc); + SortGroupClause *sgc = (SortGroupClause *) lfirst(slc); + + Assert(sgc->tleSortGroupRef == te->ressortgroupref); + node->sortColIdx[i] = i + 1; + node->sortOperators[i] = sgc->sortop; + node->collations[i] = exprCollation((Node*)te->expr); + node->nullsFirst[i] = sgc->nulls_first; + te->ressortgroupref = sgc->tleSortGroupRef; + i++; + } + node->plan.pathkeys = + make_pathkeys_for_sortclauses(root, lsort, tlist); + + lc = list_head(node->mergeplans); + p = (Plan*) lfirst(lc); + node->plan.startup_cost = p->startup_cost; + node->plan.total_cost = p->total_cost; + node->plan.plan_rows = p->plan_rows; + total_size = p->plan_rows * p->plan_width; + for_each_cell(lc, lnext(lc)) + { + p = (Plan*) lfirst(lc); + node->plan.total_cost += p->total_cost; + node->plan.plan_rows += p->plan_rows; + total_size += p->plan_rows * p->plan_width; + } + node->plan.plan_width = rint(total_size / node->plan.plan_rows); + *sortClauses = root->parse->sortClause; - /* - * For UNION ALL, we just need the Append plan. For UNION, need to add - * node(s) to remove duplicates. - */ - if (op->all) - *sortClauses = NIL; /* result of UNION ALL is always unsorted */ + plan = (Plan*)node; + if (!op->all) + plan = make_union_unique(op, plan, root, tuple_fraction, + &distinctClause); + } else - plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses); + { + /* + * Append the child results together. + */ + plan = (Plan *) make_append(planlist, tlist); + + /* + * For UNION ALL, we just need the Append plan. For UNION, need to + * add node(s) to remove duplicates. + */ + *sortClauses = NIL; + + if (!op->all) + plan = make_union_unique(op, plan, root, tuple_fraction, + sortClauses); + } /* * Estimate number of groups if caller wants it. For now we just assume @@ -544,12 +809,14 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, lplan = recurse_set_operations(op->larg,root, 0.0 /* all tuples needed */ , op->colTypes, op->colCollations, + op->groupClauses, false, 0, refnames_tlist, &child_sortclauses, &dLeftGroups); rplan = recurse_set_operations(op->rarg,root, 0.0 /* all tuples needed */ , op->colTypes, op->colCollations, + op->groupClauses, false, 1, refnames_tlist, &child_sortclauses, &dRightGroups); @@ -589,8 +856,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, plan = (Plan *) make_append(planlist,tlist); /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(op, tlist); - + groupList = generate_setop_grouplist(op->groupClauses, tlist, NULL); /* punt if nothing to group on (can this happen?)*/ if (groupList == NIL) { @@ -678,9 +944,11 @@ static List *recurse_union_children(Node *setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist) + List *refnames_tlist, + List **child_sortclauses){ - List *child_sortclauses; + List *lplan, *rplan; + List *lsort, *rsort; if (IsA(setOp, SetOperationStmt)) { @@ -690,15 +958,23 @@ recurse_union_children(Node *setOp, PlannerInfo *root, (op->all == top_union->all || op->all)&& equal(op->colTypes, top_union->colTypes)) { - /* Same UNION, so fold children into parent's subplan list */ - return list_concat(recurse_union_children(op->larg, root, - tuple_fraction, - top_union, - refnames_tlist), - recurse_union_children(op->rarg, root, - tuple_fraction, - top_union, - refnames_tlist)); + lplan = recurse_union_children(op->larg, root, + tuple_fraction, + top_union, + refnames_tlist, + &lsort); + rplan = recurse_union_children(op->rarg, root, + tuple_fraction, + top_union, + refnames_tlist, + &rsort); + /* + * Propagate whether all the descendents are sorted with same + * sortClause. + */ + if (lsort != NIL && equal(lsort, rsort)) + *child_sortclauses = lsort; + return list_concat(lplan, rplan); } } @@ -715,13 +991,16 @@ recurse_union_children(Node *setOp, PlannerInfo *root, tuple_fraction, top_union->colTypes, top_union->colCollations, + top_union->groupClauses, false,-1, refnames_tlist, - &child_sortclauses, NULL)); + child_sortclauses, NULL));}/* * Add nodes to the given plan tree to unique-ifythe result of a UNION. + * + * Given a sortClause, we assume the plan is already sorted on it. */static Plan *make_union_unique(SetOperationStmt *op,Plan *plan, @@ -731,9 +1010,11 @@ make_union_unique(SetOperationStmt *op, Plan *plan, List *groupList; double dNumGroups; long numGroups; + bool sort_needed = true; /* Identify the grouping semantics */ - groupList = generate_setop_grouplist(op, plan->targetlist); + groupList = generate_setop_grouplist(op->groupClauses, + plan->targetlist, *sortClauses); /* punt if nothing to group on (can this happen?)*/ if (groupList == NIL) @@ -742,6 +1023,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan, return plan; } + if (*sortClauses && equal(*sortClauses, groupList)) + sort_needed = false; + /* * XXX for the moment, take the number of distinct groups as equal to the * total input size, ie, the worstcase. This is too conservative, but we @@ -756,7 +1040,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan, numGroups = (long) Min(dNumGroups, (double) LONG_MAX); /* Decide whether to hash or sort */ - if (choose_hashed_setop(root, groupList, plan, + if (sort_needed && + choose_hashed_setop(root, groupList, plan, dNumGroups, dNumGroups, tuple_fraction, "UNION")) { @@ -778,7 +1063,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan, else { /* Sort and Unique */ - plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan); + if (sort_needed) + plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan); + plan = (Plan *) make_unique(plan, groupList); plan->plan_rows = dNumGroups; /* We know the sort orderof the result */ @@ -1145,23 +1432,34 @@ generate_append_tlist(List *colTypes, List *colCollations, * the parser output representation doesn'tinclude a tlist for each * setop. So what we need to do here is copy that list and install * proper sortgrouprefsinto it and into the targetlist. + * + * sortClause is consulted if provided to avoid re-sorting in different + * directions on the same keys later. */static List * -generate_setop_grouplist(SetOperationStmt *op, List *targetlist) +generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses){ - List *grouplist = (List *) copyObject(op->groupClauses); + List *grouplist = (List *) copyObject(groupClauses); ListCell *lg; + ListCell *ls = NULL; ListCell *lt; Index refno = 1; lg = list_head(grouplist); + + if (sortClauses) + ls = list_head(sortClauses); + foreach(lt, targetlist) { TargetEntry *tle = (TargetEntry *) lfirst(lt); - SortGroupClause *sgc; + SortGroupClause *sgc, *sgc_sort = NULL; - /* tlist shouldn't have any sortgrouprefs yet */ - Assert(tle->ressortgroupref == 0); + /* + * tlist shouldn't have any sortgrouprefs yet, or should be unchanged + */ + Assert(tle->ressortgroupref == 0 || tle->ressortgroupref == refno); if (tle->resjunk) continue; /* ignore resjunk columns */ @@ -1174,6 +1472,45 @@ generate_setop_grouplist(SetOperationStmt *op, List *targetlist) /* we could use assignSortGroupRefhere, but seems a bit silly */ sgc->tleSortGroupRef = tle->ressortgroupref = refno++; + + if (ls) + { + /* + * If sortClauses is provided, try to adjust the sorting order to + * get the chance for omitting sorting for grouping later. + */ + sgc_sort = (SortGroupClause *) lfirst(ls); + ls = lnext(ls); + if (sgc->sortop != sgc_sort->sortop) + { + Oid reverse = InvalidOid; + Oid opfamily, opcintype; + int16 strategy; + + if (get_ordering_op_properties(sgc->sortop, + &opfamily, &opcintype, &strategy)) + { + switch (strategy) + { + case BTLessStrategyNumber: + strategy = BTGreaterStrategyNumber; break; + case BTGreaterStrategyNumber: + strategy = BTLessStrategyNumber; break; + } + + reverse = get_opfamily_member(opfamily, + opcintype, + opcintype, + strategy); + if (sgc_sort->sortop == reverse) + { + sgc->sortop = reverse; + sgc->nulls_first = sgc_sort->nulls_first; + } + } + } + } + } Assert(lg == NULL); return grouplist;
Hello, As I was reconsidering after your advise, I came up with an idea of hacking on query tree tranaformation phase in subquery_planner, not on that of plan generation phase as before. (And the older patch is totally scrapped:-) Currently flatten_simple_union_all() flattens 'simple' UNION 'ALL' at the top level of 'current' query. I noticed that a little modification there also could flatten simple UNION (w/o ALL), and it has finally almost same effect regarding more usage of indexes for UNION. Additionally, applying still more the 'UNION ALL and inheritance table' patch, even coveres UNION's on inheritance tables. This patch is far small and neat (though I say so myself :-) than the previous ones. The following plans we got from unpatched PostgreSQL. original=# explain (analyze on, costs off) | select a, b from cu11 union select a, b from cu12 order by a, b limit 10; | QUERY PLAN | ----------------------------------------------------------------------------- | Limit (actual time=272.252..272.259 rows=10 loops=1) | -> Unique (actual time=272.251..272.258 rows=10 loops=1) | -> Sort (actual time=272.249..272.252 rows=10 loops=1) | Sort Key: cu11.a, cu11.b | Sort Method: external sort Disk: 3520kB | -> Append (actual time=0.020..70.935 rows=200000 loops=1) | -> Seq Scan on cu11 (actual time=0.019..26.741 rows=100000 loops=1) | -> Seq Scan on cu12 (actual time=0.006..25.183 rows=100000 loops=1) | Total runtime: 273.162 ms And of course for * (= a, b, c, d) this, original=# explain (analyze on, costs off) | select * from cu11 union select * from cu12 order by a, b limit 10; | QUERY PLAN | ---------------------------------------------------------------------------- | Limit (actual time=234.880..234.891 rows=10 loops=1) | -> Unique (actual time=234.879..234.889 rows=10 loops=1) | -> Sort (actual time=234.878..234.881 rows=10 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | Sort Method: external sort Disk: 5280kB | -> Append (actual time=0.017..49.502 rows=200000 loops=1) | -> Seq Scan on cu11 (actual time=0.017..15.649 rows=100000 loops=1) | -> Seq Scan on cu12 (actual time=0.007..14.939 rows=100000 loops=1) | Total runtime: 236.029 ms We'll get the following plan changed with this patch plus the latest "pathkey_and_uniqueindx_v4_20131122.patch" patcch. (It got a small fix on pathkeys fixation for index paths) https://commitfest.postgresql.org/action/patch_view?id=1280 patched=# explain (analyze on, costs off) | select * from cu11 union select * from cu12 order by a, b limit 10; | QUERY PLAN | ------------------------------------------------------------------------------ | Limit (actual time=0.059..0.081 rows=10 loops=1) | -> Unique (actual time=0.058..0.079 rows=10 loops=1) | -> Merge Append (actual time=0.056..0.065 rows=10 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | -> Index Scan ... on cu11 (actual time=0.032..0.037 rows=10 loops=1) | -> Index Scan ... on cu12 (actual time=0.021..0.021 rows=1 loops=1) | Total runtime: 0.162 ms Moreover, with the 'UNION ALL' patches , say, union_inh_idx_typ1_v1_20131024.patch and union_inh_idx_typ1_add_v1_20131024.patch, we'll get the following plan for UNION on inhertance tables, https://commitfest.postgresql.org/action/patch_view?id=1270 patched2=# explain (analyze on, costs off) | select * from pu1 union select * from pu2 order by a, b limit 10; | QUERY PLAN | --------------------------------------------------------------------------- | Limit (actual time=0.209..0.234 rows=10 loops=1) | -> Unique (actual time=0.206..0.230 rows=10 loops=1) | -> Merge Append (actual time=0.204..0.216 rows=10 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | -> Index Scan..on pu1 (actual time=0.004..0.004 rows=0 loops=1) | -> Index Scan..on cu11 (actual time=0.050..0.058 rows=10 loops=1) | -> Index Scan..on cu12 (actual time=0.052..0.052 rows=1 loops=1) | -> Index Scan..on pu2 (actual time=0.002..0.002 rows=0 loops=1) | -> Index Scan..on cu21 (actual time=0.046..0.046 rows=1 loops=1) | -> Index Scan..on cu22 (actual time=0.046..0.046 rows=1 loops=1) | Total runtime: 0.627 ms The attached union_uses_idx_v4_20131122.patch is changed as follows. - Rebased to current master. - Scrapped the whold old stuff. - flatten_simple_union_all() is renamed to flatten_simple_union() and modified to do so. UNION is flattend only if sortClauseexists and distinctClause is NIL. ====== Test tables can be created following the command below, | create table pu1 (a int not null, b int not null, c int, d text); | create unique index i_pu1_ab on pu1 (a, b); | create table cu11 (like pu1 including all) inherits (pu1); | create table cu12 (like pu1 including all) inherits (pu1); | create table pu2 (like pu1 including all); | create table cu21 (like pu2 including all) inherits (pu2); | create table cu22 (like pu2 including all) inherits (pu2); | insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a); | insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a); | insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a); | insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a); ===== regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 6670794..7262c1b 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -353,13 +353,14 @@ subquery_planner(PlannerGlobal *glob, Query *parse, pull_up_subqueries(root, (Node *) parse->jointree); /* - * If this is a simple UNION ALL query, flatten it into an appendrel. We - * do this now because it requires applying pull_up_subqueries to the leaf - * queries of the UNION ALL, which weren't touched above because they - * weren't referenced by the jointree (they will be after we do this). + * If this is a simple UNION/UNION ALL query, flatten it into an + * appendrel. We do this now because it requires applying + * pull_up_subqueries to the leaf queries of the UNION/UNION ALL, which + * weren't touched above because they weren't referenced by the jointree + * (they will be after we do this). */ if (parse->setOperations) - flatten_simple_union_all(root); + flatten_simple_union(root); /* * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 485ac31..adf341b 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -32,6 +32,7 @@#include "optimizer/subselect.h"#include "optimizer/tlist.h"#include "parser/parse_relation.h" +#include "parser/parse_clause.h"#include "parser/parsetree.h"#include "rewrite/rewriteManip.h" @@ -81,8 +82,8 @@ static void make_setop_translation_list(Query *query, Index newvarno,static bool is_simple_subquery(Query*subquery, RangeTblEntry *rte, JoinExpr *lowest_outer_join);static bool is_simple_union_all(Query*subquery); -static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, - List *colTypes); +static bool is_simple_union_recurse(SetOperationStmt *topop, Node *setOp, + Query *setOpQuery, List *colTypes);static bool is_safe_append_member(Query *subquery);staticbool jointree_contains_lateral_outer_refs(Node *jtnode, bool restricted, Relids safe_upper_varnos); @@ -1428,12 +1429,14 @@ is_simple_union_all(Query *subquery) return false; /* Recursively check the tree of setoperations */ - return is_simple_union_all_recurse((Node *) topop, subquery, - topop->colTypes); + if (topop->op != SETOP_UNION || !topop->all) return false; + return is_simple_union_recurse(topop, (Node *) topop, subquery, + topop->colTypes);}static bool -is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes) +is_simple_union_recurse(SetOperationStmt *topop, Node *setOp, + Query *setOpQuery, List *colTypes){ if (IsA(setOp, RangeTblRef)) { @@ -1451,13 +1454,16 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes) { SetOperationStmt*op = (SetOperationStmt *) setOp; - /* Must be UNION ALL */ - if (op->op != SETOP_UNION || !op->all) + /* All SetOps under topop are UNION and ->all is same to topop */ + if (op->op != SETOP_UNION || op->all != topop->all) return false; /* Recurse to check inputs */ - return is_simple_union_all_recurse(op->larg, setOpQuery, colTypes) && - is_simple_union_all_recurse(op->rarg, setOpQuery, colTypes); + return + is_simple_union_recurse(topop, op->larg, + setOpQuery, colTypes) && + is_simple_union_recurse(topop, op->rarg, + setOpQuery, colTypes); } else { @@ -1896,23 +1902,22 @@ pullup_replace_vars_subquery(Query *query, NULL);} -/* * flatten_simple_union_all - * Try to optimize top-level UNION ALL structure into an appendrel + * Try to optimize top-level UNION/UNION ALL structure into an appendrel * - * If a query's setOperations tree consists entirely of simple UNION ALL - * operations, flatten it into an append relation, which we can process more - * intelligently than the general setops case. Otherwise, do nothing. + * If a query's setOperations tree consists entirely of simple UNION and UNION + * ALL operations, flatten it into an append relation, which we can process + * more intelligently than the general setops case. Otherwise, do nothing. * * In most cases, this can succeed only fora top-level query, because for a * subquery in FROM, the parent query's invocation of pull_up_subqueries would - * already have flattened the UNION via pull_up_simple_union_all. But there + * already have flattened the UNION ALL via pull_up_simple_union_all. But there * are a few cases we can support here butnot in that code path, for example * when the subquery also contains ORDER BY. */void -flatten_simple_union_all(PlannerInfo *root) +flatten_simple_union(PlannerInfo *root){ Query *parse = root->parse; SetOperationStmt *topop; @@ -1932,10 +1937,18 @@ flatten_simple_union_all(PlannerInfo *root) return; /* - * Recursively check the tree of set operations. If not all UNION ALL + * If topop is not UNION (not likey), punt. Also for UNION(not ALL) + * without sortClause or already having distinctClause. + */ + if (topop->op != SETOP_UNION || + (!topop->all && (!parse->sortClause || parse->distinctClause ))) + return; + + /* + * Recursively check the tree of set operations. If not all UNION(ALL) * with identical column types, punt. */ - if (!is_simple_union_all_recurse((Node *) topop, parse, topop->colTypes)) + if (!is_simple_union_recurse(topop, (Node *) topop, parse, topop->colTypes)) return; /* @@ -1983,6 +1996,16 @@ flatten_simple_union_all(PlannerInfo *root) parse->setOperations = NULL; /* + * We can create distinctClause using transformDistinctClause() with + * pstate == NULL. + */ + if (!topop->all) + parse->distinctClause = + transformDistinctClause(NULL, + &parse->targetList, parse->sortClause, + false); + + /* * Build AppendRelInfo information, and apply pull_up_subqueries to the * leaf queries of the UNION ALL. (We must do that now because they * weren't previously referenced by the jointree, and so were missed by diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index 0934e63..32b3bf4 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -24,7 +24,7 @@extern void pull_up_sublinks(PlannerInfo *root);extern void inline_set_returning_functions(PlannerInfo *root);externNode *pull_up_subqueries(PlannerInfo *root, Node *jtnode); -extern void flatten_simple_union_all(PlannerInfo *root); +extern void flatten_simple_union(PlannerInfo *root);extern void reduce_outer_joins(PlannerInfo *root);extern Relids get_relids_in_jointree(Node*jtnode, bool include_joins);extern Relids get_relids_for_join(PlannerInfo *root, int joinrelid);
This is cont'd from CF3. http://www.postgresql.org/message-id/20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp The issue in brief is that UNION is never flattened differently to UNION ALL so UNION cannot make use of index scan even if usable. This patch flattens UNION likewise currently did for UNION ALL. This patch needs another 'UNION ALL' patch and further gain with even another 'Widening application of indices' patch. ('Ready for Commit' now in CF3..) http://www.postgresql.org/message-id/20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp http://www.postgresql.org/message-id/20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp You can see the detailed outlines in the message at here, http://www.postgresql.org/message-id/20131031.194251.12577697.horiguchi.kyotaro@lab.ntt.co.jp The attached patche is rebased to current 9.4dev HEAD and make check at the topmost directory and src/test/isolation are passed without error. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Sorry, I missed to attach file. > This is cont'd from CF3. > > http://www.postgresql.org/message-id/20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp > > The issue in brief is that UNION is never flattened differently > to UNION ALL so UNION cannot make use of index scan even if > usable. > > This patch flattens UNION likewise currently did for UNION ALL. > > This patch needs another 'UNION ALL' patch and further gain with > even another 'Widening application of indices' patch. ('Ready for > Commit' now in CF3..) > > http://www.postgresql.org/message-id/20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp > http://www.postgresql.org/message-id/20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp > > > You can see the detailed outlines in the message at here, > > http://www.postgresql.org/message-id/20131031.194251.12577697.horiguchi.kyotaro@lab.ntt.co.jp > > The attached patche is rebased to current 9.4dev HEAD and make > check at the topmost directory and src/test/isolation are passed > without error. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 1da4b2f..e89f8b3 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -353,13 +353,14 @@ subquery_planner(PlannerGlobal *glob, Query *parse, pull_up_subqueries(root, (Node *) parse->jointree); /* - * If this is a simple UNION ALL query, flatten it into an appendrel. We - * do this now because it requires applying pull_up_subqueries to the leaf - * queries of the UNION ALL, which weren't touched above because they - * weren't referenced by the jointree (they will be after we do this). + * If this is a simple UNION/UNION ALL query, flatten it into an + * appendrel. We do this now because it requires applying + * pull_up_subqueries to the leaf queries of the UNION/UNION ALL, which + * weren't touched above because they weren't referenced by the jointree + * (they will be after we do this). */ if (parse->setOperations) - flatten_simple_union_all(root); + flatten_simple_union(root); /* * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 1c6083b..04bba48 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -32,6 +32,7 @@#include "optimizer/subselect.h"#include "optimizer/tlist.h"#include "parser/parse_relation.h" +#include "parser/parse_clause.h"#include "parser/parsetree.h"#include "rewrite/rewriteManip.h" @@ -81,8 +82,8 @@ static void make_setop_translation_list(Query *query, Index newvarno,static bool is_simple_subquery(Query*subquery, RangeTblEntry *rte, JoinExpr *lowest_outer_join);static bool is_simple_union_all(Query*subquery); -static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, - List *colTypes); +static bool is_simple_union_recurse(SetOperationStmt *topop, Node *setOp, + Query *setOpQuery, List *colTypes);static bool is_safe_append_member(Query *subquery);staticbool jointree_contains_lateral_outer_refs(Node *jtnode, bool restricted, Relids safe_upper_varnos); @@ -1440,12 +1441,14 @@ is_simple_union_all(Query *subquery) return false; /* Recursively check the tree of setoperations */ - return is_simple_union_all_recurse((Node *) topop, subquery, - topop->colTypes); + if (topop->op != SETOP_UNION || !topop->all) return false; + return is_simple_union_recurse(topop, (Node *) topop, subquery, + topop->colTypes);}static bool -is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes) +is_simple_union_recurse(SetOperationStmt *topop, Node *setOp, + Query *setOpQuery, List *colTypes){ if (IsA(setOp, RangeTblRef)) { @@ -1463,13 +1466,16 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes) { SetOperationStmt*op = (SetOperationStmt *) setOp; - /* Must be UNION ALL */ - if (op->op != SETOP_UNION || !op->all) + /* All SetOps under topop are UNION and ->all is same to topop */ + if (op->op != SETOP_UNION || op->all != topop->all) return false; /* Recurse to check inputs */ - return is_simple_union_all_recurse(op->larg, setOpQuery, colTypes) && - is_simple_union_all_recurse(op->rarg, setOpQuery, colTypes); + return + is_simple_union_recurse(topop, op->larg, + setOpQuery, colTypes) && + is_simple_union_recurse(topop, op->rarg, + setOpQuery, colTypes); } else { @@ -1908,23 +1914,22 @@ pullup_replace_vars_subquery(Query *query, NULL);} -/* * flatten_simple_union_all - * Try to optimize top-level UNION ALL structure into an appendrel + * Try to optimize top-level UNION/UNION ALL structure into an appendrel * - * If a query's setOperations tree consists entirely of simple UNION ALL - * operations, flatten it into an append relation, which we can process more - * intelligently than the general setops case. Otherwise, do nothing. + * If a query's setOperations tree consists entirely of simple UNION and UNION + * ALL operations, flatten it into an append relation, which we can process + * more intelligently than the general setops case. Otherwise, do nothing. * * In most cases, this can succeed only fora top-level query, because for a * subquery in FROM, the parent query's invocation of pull_up_subqueries would - * already have flattened the UNION via pull_up_simple_union_all. But there + * already have flattened the UNION ALL via pull_up_simple_union_all. But there * are a few cases we can support here butnot in that code path, for example * when the subquery also contains ORDER BY. */void -flatten_simple_union_all(PlannerInfo *root) +flatten_simple_union(PlannerInfo *root){ Query *parse = root->parse; SetOperationStmt *topop; @@ -1944,10 +1949,18 @@ flatten_simple_union_all(PlannerInfo *root) return; /* - * Recursively check the tree of set operations. If not all UNION ALL + * If topop is not UNION (not likey), punt. Also for UNION(not ALL) + * without sortClause or already having distinctClause. + */ + if (topop->op != SETOP_UNION || + (!topop->all && (!parse->sortClause || parse->distinctClause ))) + return; + + /* + * Recursively check the tree of set operations. If not all UNION(ALL) * with identical column types, punt. */ - if (!is_simple_union_all_recurse((Node *) topop, parse, topop->colTypes)) + if (!is_simple_union_recurse(topop, (Node *) topop, parse, topop->colTypes)) return; /* @@ -1995,6 +2008,16 @@ flatten_simple_union_all(PlannerInfo *root) parse->setOperations = NULL; /* + * We can create distinctClause using transformDistinctClause() with + * pstate == NULL. + */ + if (!topop->all) + parse->distinctClause = + transformDistinctClause(NULL, + &parse->targetList, parse->sortClause, + false); + + /* * Build AppendRelInfo information, and apply pull_up_subqueries to the * leaf queries of the UNION ALL. (We must do that now because they * weren't previously referenced by the jointree, and so were missed by diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index 0934e63..32b3bf4 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -24,7 +24,7 @@extern void pull_up_sublinks(PlannerInfo *root);extern void inline_set_returning_functions(PlannerInfo *root);externNode *pull_up_subqueries(PlannerInfo *root, Node *jtnode); -extern void flatten_simple_union_all(PlannerInfo *root); +extern void flatten_simple_union(PlannerInfo *root);extern void reduce_outer_joins(PlannerInfo *root);extern Relids get_relids_in_jointree(Node*jtnode, bool include_joins);extern Relids get_relids_for_join(PlannerInfo *root, int joinrelid);
Hi, On 2014-01-14 18:10:40 +0900, Kyotaro HORIGUCHI wrote: > This is cont'd from CF3. > > http://www.postgresql.org/message-id/20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp > > The issue in brief is that UNION is never flattened differently > to UNION ALL so UNION cannot make use of index scan even if > usable. > > This patch flattens UNION likewise currently did for UNION ALL. > > This patch needs another 'UNION ALL' patch and further gain with > even another 'Widening application of indices' patch. ('Ready for > Commit' now in CF3..) > > http://www.postgresql.org/message-id/20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp > http://www.postgresql.org/message-id/20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp > > > You can see the detailed outlines in the message at here, > > http://www.postgresql.org/message-id/20131031.194251.12577697.horiguchi.kyotaro@lab.ntt.co.jp > > The attached patche is rebased to current 9.4dev HEAD and make > check at the topmost directory and src/test/isolation are passed > without error. I haven't really followed this topic, so please excuse my ignorance. This is still marked as "needs review" in https://commitfest.postgresql.org/action/patch_view?id=1374 . But I am not sure the patch as is is relevant after a87c729153e372f3731689a7be007bc2b53f1410? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-14 18:10:40 +0900, Kyotaro HORIGUCHI wrote: >> This patch flattens UNION likewise currently did for UNION ALL. > I haven't really followed this topic, so please excuse my ignorance. > This is still marked as "needs review" in > https://commitfest.postgresql.org/action/patch_view?id=1374 . But I am > not sure the patch as is is relevant after > a87c729153e372f3731689a7be007bc2b53f1410? I think it's an independent issue. After a quick read of the patch (which is really badly explained :-() I think the idea is that a nest of UNIONs with no datatype conversions can be flattened into a UNION ALL appendrel, with the required duplicate elimination handled by sticking a DISTINCT onto the top level. However, it's not clear to me that this is worth the trouble. The DISTINCT would act as an optimization fence preventing the subquery from being flattened any further, so it doesn't seem like there would be any global benefit just because it now contains a simple appendrel rather than a setop tree. And a nest of conversion-free UNIONs already results in a plan that's a flat Append followed by sort/uniq, which seems like the same thing you'd get from the DISTINCT. So what's the point? regards, tom lane
I wrote: > However, it's not clear to me that this is worth the trouble. The > DISTINCT would act as an optimization fence preventing the subquery from > being flattened any further, so it doesn't seem like there would be any > global benefit just because it now contains a simple appendrel rather than > a setop tree. And a nest of conversion-free UNIONs already results in a > plan that's a flat Append followed by sort/uniq, which seems like the same > thing you'd get from the DISTINCT. So what's the point? Oh, after re-reading the earlier part of the thread I get the point --- after making this change, the planner will consider some plan types for the DISTINCT that it wouldn't have found in the setop-tree planning path. Specifically it can make use of a mergeappend of sorted paths for the individual union leaf queries. That can't happen in the generic setop planning code because we have no ability to consider more than one plan for any leaf query. I still think this stuff mostly needs to be thrown away and rewritten in Path-creation style, but that's a long-term project. In the meantime this seems like a relatively small increment of complexity, so maybe it's worth applying. I'm concerned about the method for building a new DISTINCT clause though; the best that can be said for that is that it's a crude hack, and I'm less than convinced that there are no cases where it'll dump core. regards, tom lane
I wrote: > I still think this stuff mostly needs to be thrown away and rewritten > in Path-creation style, but that's a long-term project. In the meantime > this seems like a relatively small increment of complexity, so maybe it's > worth applying. I'm concerned about the method for building a new > DISTINCT clause though; the best that can be said for that is that it's > a crude hack, and I'm less than convinced that there are no cases where > it'll dump core. OK, after still more time thinking about it, here's the issues with that way of generating a DISTINCT clause corresponding to the UNION: 1. Calling transformDistinctClause with a NULL pstate seems pretty unsafe. It might accidentally fail to fail, today, but it's surely fragile. It's violating the API contract not only of transformDistinctClause itself (which is not documented as accepting a NULL pstate) but of addTargetToGroupList (q.v.). A minimum requirement before I'd accept a patch that did that is that it extended the header comments of those functions to specify under what circumstances a NULL pstate can be passed. However, that's not the direction to go, because of #2. 2. This approach potentially changes the semantics of the UNION. (This is only important for a datatype that has more than one btree equality operator, but let's posit that.) transformDistinctClause, as per its header comment, allows the outer ORDER BY to influence which equality operator it picks for DISTINCT. However, in the case of "(SELECT ... UNION SELECT ...) ORDER BY ...", the UNION semantics were chosen without reference to the ORDER BY, so it's possible that the equality operators cited in the SetOperationStmt's groupClauses list are not what you'd get from applying transformDistinctClause as the patch does. It is not okay for the planner to override the parser's choice of semantics like that. Now I'm fairly sure that planner.c is expecting that the query's distinctClause be a superset of the sortClause if any (cf comments for SortGroupClause in parsenodes.h), so it wouldn't be okay to just blindly build a distinctClause from topop->groupClauses. I think what you need to do is check that topop->groupClauses is compatible with the sortClause if any (skipping the flattening transformation if not) and then build a distinctClause by extending the sort clause list with any missing items from topop->groupClauses. So this is sort of like what transformDistinctClause does, but different in detail, and the failure case is to not do the transformation, rather than ereport'ing. (See also generate_setop_grouplist, which you could almost use except that it's not expecting to have to merge with a sortClause list.) So this is all doable enough, but you're probably going to end up with a new distinctClause-generating function that's at least twice the size of the patch's former net code addition. So the question is whether it's worth the trouble, considering that all this code has (I hope) a life expectancy of no more than one or two more release cycles. I'm going to set the patch back to Waiting on Author, but unless you are prepared to rewrite the distinctClause creation code in the next couple of days, you should change it to Returned with Feedback. regards, tom lane
Hello. Thank you for the attentive comments. > I wrote: > > I still think this stuff mostly needs to be thrown away and rewritten > > in Path-creation style, but that's a long-term project. In the meantime > > this seems like a relatively small increment of complexity, so maybe it's > > worth applying. I'm concerned about the method for building a new > > DISTINCT clause though; the best that can be said for that is that it's > > a crude hack, and I'm less than convinced that there are no cases where > > it'll dump core. > > OK, after still more time thinking about it, here's the issues with that > way of generating a DISTINCT clause corresponding to the UNION: > > 1. Calling transformDistinctClause with a NULL pstate seems pretty unsafe. > It might accidentally fail to fail, today, but it's surely fragile. > It's violating the API contract not only of transformDistinctClause > itself (which is not documented as accepting a NULL pstate) but of > addTargetToGroupList (q.v.). Hmm I'm ashamed to have missed addTargetToGroupList. You are right about that. I saw only parser_errposition to field the NULL.. It should be fixed anyway. > A minimum requirement before I'd accept a > patch that did that is that it extended the header comments of those > functions to specify under what circumstances a NULL pstate can be passed. > However, that's not the direction to go, because of #2. > > 2. This approach potentially changes the semantics of the UNION. (This is > only important for a datatype that has more than one btree equality > operator, but let's posit that.) transformDistinctClause, as per its > header comment, allows the outer ORDER BY to influence which equality > operator it picks for DISTINCT. However, in the case of > "(SELECT ... UNION SELECT ...) ORDER BY ...", the UNION semantics were > chosen without reference to the ORDER BY, If my understanding is correct, the query is equivalent to it without the parens. The ORDER BY belongs to the topmost UNION on both cases. Actually planner compailns about "(SELECT...UNION SELECT ... ORDER BY) ORDER BY" that "multiple ORDER BY clauses not allowed" with/without this patch. > so it's possible that the > equality operators cited in the SetOperationStmt's groupClauses list are > not what you'd get from applying transformDistinctClause as the patch does. This patch flattenes and the SetOperationStmt was put out along with its groupClause. But the groupClauses is originally generated from targetlist in transformSetOperationTree. And the new DistinctClause is generated also from the same targetlist consulting to sortClause. They are not in the same order but it doesn't matter. Is it wrong? If it would make some trouble, I could add an check it out or count it on making the DistinctClauses.. Finally, explain (costs off) select a, b from c11 union select a, b from c12 order by b limit 10000; | QUERY PLAN | ----------------------------------------- | Limit | -> Sort | Sort Key: c11.b | -> HashAggregate | Group Key: c11.b, c11.a | -> Append | -> Seq Scan on c11 | -> Seq Scan on c12 This HashAggregate does DISTINCT which was performed by using Sort-Unique without this patch, and yields the same result, I believe. > It is not okay for the planner to override the parser's choice of > semantics like that. As described above, as far as I saw, itself seems to be a problem. > Now I'm fairly sure that planner.c is expecting that the query's > distinctClause be a superset of the sortClause if any (cf comments for > SortGroupClause in parsenodes.h), so it wouldn't be okay to just blindly > build a distinctClause from topop->groupClauses. Yes, it could be but counting the sortClause into distinctClause is crucial factor for getting rid of unnecessary sorting. > I think what you need > to do is check that topop->groupClauses is compatible with the sortClause > if any (skipping the flattening transformation if not) and then build a > distinctClause by extending the sort clause list with any missing items > from topop->groupClauses. Ah, yes I agree as I described above. > So this is sort of like what > transformDistinctClause does, but different in detail, and the failure > case is to not do the transformation, rather than ereport'ing. (See also > generate_setop_grouplist, which you could almost use except that it's not > expecting to have to merge with a sortClause list.) Thank you. I'll follow this advise. > So this is all doable enough, but you're probably going to end up with > a new distinctClause-generating function that's at least twice the size > of the patch's former net code addition. So the question is whether it's > worth the trouble, considering that all this code has (I hope) a life > expectancy of no more than one or two more release cycles. Hmm. Since this is a kind of local optimization, some future drastic reconstructing might make this useless. But it doesn't seem the reason not to do this. (Ignoring the size..) > I'm going to set the patch back to Waiting on Author, but unless you > are prepared to rewrite the distinctClause creation code in the next > couple of days, you should change it to Returned with Feedback. Ok, I agree to the deal. This needs more refinement. regards, -- Kyotaro Horiguchi NTT Open Source Software Center