Thread: Using indices for UNION.

Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

31 October 2013, 10:43:04

Hello, This patch might be too complicated (and seems somewhat ad
hoc) for the gain, but more index usage for this kind of
operation should be worth doing.

Currently, PostgreSQL ignores from the very first the availablity
of indexes for UNION. Sorting and SeqScan is choosed as follows,

* EX.1
> uniontest=# create table c11 (a int, b int, c int, d text);
> uniontest=# create unique index cu11_pkey_idx on c11 (a, b)
> uniontest=# create table c12 (like cu11 including all);
> uniontest=# insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11'
>                               from generate_series(000000, 099999) a);
> uniontest=# insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12'
>                               from generate_series(100000, 199999) a);
> uniontest=# explain (analyze on, costs off)
>   select a, b from cu11 union select a, b from cu12 order by a, b limit 100;
>                            QUERY PLAN
> ---------------------------------------------------------------------
> Limit (actual time=287.611..287.673 rows=100 loops=1)
>  ->  Unique (actual time=287.610..287.652 rows=100 loops=1)
>   ->  Sort (actual time=287.609..287.621 rows=100 loops=1)
>    Sort Key: cu11.a, cu11.b
>    Sort Method: external sort  Disk: 3520kB
>    ->  Append (actual time=0.020..73.859 rows=200000 loops=1)
>     ->  Seq Scan on cu11 (actual time=0.019..28.780 rows=100000 loops=1)
>     ->  Seq Scan on cu12 (actual time=0.006..24.550 rows=100000 loops=1)
> Total runtime: 288.596 ms

Actually, index paths are sometimes worth considering on UNION
planning especially with LIMIT. For example,

* EX.2
> uniontest=# explain (analyze on, costs off)
>             select a, b from cu11 union select a, b from cu12
>             order by a desc, b desc limit 100;
>                               QUERY PLAN
> ------------------------------------------------------------------
> Limit (actual time=0.041..0.390 rows=100 loops=1)
>  ->  Unique (actual time=0.040..0.355 rows=100 loops=1)
>   ->  Merge Append (actual time=0.040..0.290 rows=100 loops=1)
>       Sort Key: cu11.a, cu11.b
>    ->  Limit (actual time=0.025..0.025 rows=1 loops=1)
>     ->  Unique (actual time=0.025..0.025 rows=1 loops=1)
>      ->  Index Only Scan Backward on cu11
>                              (actual time=0.022..0.022 rows=1 loops=1)
>            Heap Fetches: 1
>    ->  Limit (actual time=0.012..0.209 rows=100 loops=1)
>     ->  Unique (actual time=0.011..0.188 rows=100 loops=1)
>       ->  Index Only Scan Backward .. on cu12
>                              (actual time=0.010..0.115 rows=100 loops=1)
>                                Heap Fetches: 100
> Total runtime: 1.191 ms

This is archieved by this patch.

Rough explanaiton of what this patch does is following,
- Consider whether ORDER BY or grouping of setops can be pushed  down onto leaf subqueries. (recurse_set_operations)
- Merging two children of a setop counting the sort orders of  them. Use MergeAppend if they are in the same ordering.
(generate_union_plan,recurse_union_children)
 
- Grouping is determined consulting sorting order. This will  allow removing redundant sorting.
(generate_setop_grouplist)

The effects of these items are shown in the Ex.2.

What do you think about this?


=======

Nevertheless, only this patch does not effective so far on the
query like following,

>uniontest=# explain (analyze on, costs off)
>            select * from cu11 union select * from cu12
>            order by a, b limit 100;
>                              QUERY PLAN
> -----------------------------------------------------------------------
> Limit (actual time=256.263..256.346 rows=100 loops=1)
>  ->  Unique (actual time=256.262..256.332 rows=100 loops=1)
>   ->  Sort (actual time=256.261..256.283 rows=100 loops=1)
>       Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
>       Sort Method: external sort  Disk: 5280kB
>    ->  Append (actual time=0.028..61.161 rows=200000 loops=1)
>     ->  Seq Scan on cu11 (actual time=0.027..21.067 rows=100000 loops=1)
>     ->  Seq Scan on cu12 (actual time=0.012..18.428 rows=100000 loops=1)
> Total runtime: 257.778 ms

The indexes made here is UNIQUE one so they should be used for
ordering in this query. This will be overcome by another patch
(will separately proposed).

> uniontest=# explain (analyze on, costs off)
>             select * from cu11 union select * from cu12
>             order by a, b limit 100;
>                              QUERY PLAN
> ---------------------------------------------------------------------------
> Limit (actual time=0.102..0.351 rows=100 loops=1)
>  ->  Unique (actual time=0.100..0.323 rows=100 loops=1)
>   ->  Merge Append (actual time=0.097..0.222 rows=100 loops=1)
>       Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
>    ->  Limit (actual time=0.051..0.135 rows=100 loops=1)
>     ->  Index Scan .. on cu11 (actual time=0.051..0.109 rows=100 loops=1)
>    ->  Limit (actual time=0.044..0.044 rows=1 loops=1)
>     ->  Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1)
> Total runtime: 1.244 ms

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8aa35d..86abdf6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1063,15 +1063,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)        List       *set_sortclauses;
    /*
 
-         * If there's a top-level ORDER BY, assume we have to fetch all the
-         * tuples.    This might be too simplistic given all the hackery below
-         * to possibly avoid the sort; but the odds of accurate estimates here
-         * are pretty low anyway.
-         */
-        if (parse->sortClause)
-            tuple_fraction = 0.0;
-
-        /*         * Construct the plan for set operations.  The result will not need         * any work except
perhapsa top-level sort and/or LIMIT.  Note that         * any special work for recursive unions is the responsibility
of
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b78d727..efb50dc 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -778,9 +778,26 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)                Assert(splan->plan.qual
==NIL);                foreach(l, splan->appendplans)                {
 
-                    lfirst(l) = set_plan_refs(root,
-                                              (Plan *) lfirst(l),
-                                              rtoffset);
+                    Append *newp =
+                        set_plan_refs(root,
+                                      (Plan *) lfirst(l),
+                                      rtoffset);
+                    /*
+                     * UNION on inherited tables may create directly nested
+                     * Appends in plan tree. This structure can be flatten by
+                     * taking grandchildren into parent.
+                     */
+                    if (IsA(newp, Append) && 
+                        list_length(newp->appendplans) > 0)
+                    {
+                        ListCell *plc = list_head(newp->appendplans);
+                        lfirst(l) = lfirst(plc);
+                        for_each_cell(plc, lnext(plc))
+                            l = lappend_cell(splan->appendplans,
+                                             l, lfirst(plc));
+                    }
+                    else
+                        lfirst(l) = newp;                }            }            break;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..b814a72 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -29,6 +29,7 @@#include "postgres.h"#include <limits.h>
+#include <math.h>#include "access/heapam.h"#include "access/htup_details.h"
@@ -60,6 +61,7 @@ typedef structstatic Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,
 double tuple_fraction,                       List *colTypes, List *colCollations,
 
+                       List *groupClauses,                       bool junkOK,                       int flag, List
*refnames_tlist,                      List **sortClauses, double *pNumGroups);
 
@@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,static List
*recurse_union_children(Node*setOp, PlannerInfo *root,                       double tuple_fraction,
 SetOperationStmt *top_union,
 
-                       List *refnames_tlist);
+                       List *refnames_tlist,
+                       List **child_sortclause);static Plan *make_union_unique(SetOperationStmt *op, Plan *plan,
          PlannerInfo *root, double tuple_fraction,                  List **sortClauses);
 
@@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      bool
flag,                     List *input_plans,                      List *refnames_tlist);
 
-static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
+static List *generate_setop_grouplist(List *groupClauses, List *targetlist,
+                                      List *sortClauses);static void expand_inherited_rtentry(PlannerInfo *root,
RangeTblEntry*rte,                         Index rti);static void make_inh_translation_list(Relation oldrelation,
 
@@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction,    Assert(parse->distinctClause ==
NIL);   /*
 
+     * If there's a top-level ORDER BY, assume we have to fetch all the tuples
+     * except for UNION. This might be too simplistic given all the hackery
+     * below to possibly avoid the sort; but the odds of accurate estimates
+     * here are pretty low anyway.
+     */
+    if (parse->sortClause && topop->op != SETOP_UNION)
+        tuple_fraction = 0.0;
+
+    /*     * We'll need to build RelOptInfos for each of the leaf subqueries, which     * are RTE_SUBQUERY rangetable
entriesin this Query.  Prepare the index     * arrays for that.
 
@@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction,     */    return
recurse_set_operations((Node*) topop, root, tuple_fraction,                                  topop->colTypes,
topop->colCollations,
+                                  topop->groupClauses,                                  true, -1,
           leftmostQuery->targetList,                                  sortClauses, NULL);}/*
 
+ * save_plannerglobal
+ *
+ * save planner global to allow multiple calls of subquery_planner at the same
+ * global status. This is done apartly from copyObject so as to do medium
+ * shallow copying.
+ */
+static PlannerGlobal *
+save_plannerglobal(const PlannerGlobal *in)
+{
+    PlannerGlobal *save = makeNode(PlannerGlobal);
+
+    save->boundParams        = in->boundParams;
+    save->subplans            = list_copy(in->subplans);
+    save->subroots            = list_copy(in->subroots);
+    save->rewindPlanIDs        = bms_copy(in->rewindPlanIDs);
+    save->finalrtable        = list_copy(in->finalrtable);
+    save->finalrowmarks        = list_copy(in->finalrowmarks); 
+    save->resultRelations    = list_copy(in->resultRelations);
+    save->relationOids        = list_copy(in->relationOids);
+    save->invalItems        = list_copy(in->invalItems);
+    save->nParamExec        = in->nParamExec;
+    save->lastPHId            = in->lastPHId;
+    save->lastRowMarkId        = in->lastRowMarkId;
+    save->transientPlan        = in->transientPlan;
+
+    return save;
+}
+
+/* * recurse_set_operations *      Recursively handle one step in a tree of set operations * * tuple_fraction:
fractionof tuples we expect to retrieve from node * colTypes: OID list of set-op's result column datatypes *
colCollations:OID list of set-op's result column collations
 
+ * groupClauses: parent setop's grouping clause. * junkOK: if true, child resjunk columns may be left in the result *
flag:if >= 0, add a resjunk output column indicating value of flag * refnames_tlist: targetlist to take column names
from
@@ -215,6 +259,7 @@ static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double
tuple_fraction,                      List *colTypes, List *colCollations,
 
+                       List *groupClauses,                       bool junkOK,                       int flag, List
*refnames_tlist,                      List **sortClauses, double *pNumGroups)
 
@@ -228,9 +273,12 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,        PlannerInfo *subroot;        Plan
  *subplan,                   *plan;
 
+        PlannerGlobal *glob_org;        Assert(subquery != NULL);
+        *sortClauses = NIL;
+        /*         * We need to build a RelOptInfo for each leaf subquery.  This isn't         * used for anything
here,but it carries the subroot data structures
 
@@ -243,12 +291,103 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,        /*         * Generate plan for
primitivesubquery
 
+         *
+         * subquery_planner modifies planner globals. Save current planner
+         * global in order to allow next calling subquery_planner.         */
+        glob_org = save_plannerglobal(root->glob);        subplan = subquery_planner(root->glob, subquery,
                     root,                                   false, tuple_fraction,
&subroot);
+        /*
+         * Consider whether ORDER BY or groupings of the parent set operation
+         * can be pushed down onto this subquery.
+         */
+        if(tuple_fraction >= 1 &&
+           !subquery->limitOffset && !subquery->limitCount &&
+           !subquery->distinctClause &&
+           (!subquery->sortClause ||
+            equal(subquery->sortClause, root->parse->sortClause)))
+        {
+            ListCell *s, *d;
+            bool    try_pushdown = true;
+
+            /*
+             * Index scan is expected to be applied in the subquery by this
+             * pushdown. So it is useless when target list contains non-Var
+             * member or types of exprs do not match because index scan won't
+             * be available. The reason why this check is here is to avoid
+             * unnecessary copyObject for the whole subquery which would be
+             * rather heavier.
+             */
+            forboth(s, root->parse->targetList, d, subquery->targetList)
+            {
+                TargetEntry *ste = (TargetEntry*) lfirst(s);
+                TargetEntry *dte = (TargetEntry*) lfirst(d);
+                Node *sexpr = (Node*)ste->expr;
+                Node *dexpr = (Node*)dte->expr;
+                if (!IsA(sexpr, Var) || !IsA(dexpr, Var) ||
+                    exprType(sexpr) != exprType(dexpr))
+                {
+                    try_pushdown = false;
+                    break;
+                }
+            }
+
+            if (try_pushdown)
+            {
+                Query  *pdquery;           /* 'pd' comes from 'pushed down'.  */
+                PlannerInfo *pdsubroot;
+                Plan   *pdplan;
+
+                pdquery = copyObject(subquery);
+
+                if (!pdquery->sortClause)
+                    pdquery->sortClause = root->parse->sortClause;
+
+                Assert(!pdquery->distinctClause);
+                pdquery->distinctClause =
+                    generate_setop_grouplist(groupClauses,
+                                             pdquery->targetList,
+                                             pdquery->sortClause);
+
+                if (tuple_fraction >= 1)
+                    pdquery->limitCount =
+                        (Node*)makeConst(INT8OID, -1, InvalidOid, sizeof(int64),
+                                         Int64GetDatum(tuple_fraction),
+                                         false, FLOAT8PASSBYVAL);
+
+                pdplan = subquery_planner(glob_org, pdquery,
+                                          root,
+                                          false, tuple_fraction,
+                                          &pdsubroot);
+
+                if (pdplan->total_cost < subplan->total_cost)
+                {
+                    subplan = pdplan;
+                    subroot = pdsubroot;
+                    /*
+                     * Glob cannot be moved because this is referred to from
+                     * many places by its address. So set the address of the
+                     * original glob to subroot, then copy new values there.
+                     */
+                    subroot->glob = root->glob;
+                    *root->glob = *glob_org;
+
+                    /*
+                     * This plan is sorted on sortClause if any, else sorted
+                     * on distinctClause.
+                     */
+                    if (pdquery->sortClause)
+                        *sortClauses = pdquery->sortClause;
+                    else
+                        *sortClauses = pdquery->distinctClause;
+                }
+            }
+        }
+        /* Save subroot and subplan in RelOptInfo for setrefs.c */        rel->subplan = subplan;        rel->subroot
=subroot;
 
@@ -290,12 +429,6 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,                              rtr->rtindex,
                           subplan);
 
-        /*
-         * We don't bother to determine the subquery's output ordering since
-         * it won't be reflected in the set-op result anyhow.
-         */
-        *sortClauses = NIL;
-        return plan;    }    else if (IsA(setOp, SetOperationStmt))
@@ -379,12 +512,14 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root,     */    lplan =
recurse_set_operations(setOp->larg,root, tuple_fraction,                                   setOp->colTypes,
setOp->colCollations,
+                                   setOp->groupClauses,                                   false, -1,
               refnames_tlist, sortClauses, NULL);    /* The right plan will want to look at the left one ... */
root->non_recursive_plan= lplan;    rplan = recurse_set_operations(setOp->rarg, root, tuple_fraction,
               setOp->colTypes, setOp->colCollations,
 
+                                   setOp->groupClauses,                                   false, -1,
               refnames_tlist, sortClauses, NULL);    root->non_recursive_plan = NULL;
 
@@ -409,7 +544,8 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root,        double
dNumGroups;       /* Identify the grouping semantics */
 
-        groupList = generate_setop_grouplist(setOp, tlist);
+        groupList =
+            generate_setop_grouplist(setOp->groupClauses, tlist, NULL);        /* We only support hashing here */
 if (!grouping_is_hashable(groupList))
 
@@ -452,20 +588,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,    List       *planlist;    List
*tlist;    Plan       *plan;
 
-
-    /*
-     * If plain UNION, tell children to fetch all tuples.
-     *
-     * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to
-     * each arm of the UNION ALL.  One could make a case for reducing the
-     * tuple fraction for later arms (discounting by the expected size of the
-     * earlier arms' results) but it seems not worth the trouble. The normal
-     * case where tuple_fraction isn't already zero is a LIMIT at top level,
-     * and passing it down as-is is usually enough to get the desired result
-     * of preferring fast-start plans.
-     */
-    if (!op->all)
-        tuple_fraction = 0.0;
+    List       *lsort, *rsort;    /*     * If any of my children are identical UNION nodes (same op, all-flag, and
@@ -475,34 +598,99 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,     */    planlist =
list_concat(recurse_union_children(op->larg,root,                                                  tuple_fraction,
 
-                                                  op, refnames_tlist),
+                                                  op, refnames_tlist,
+                                                  &lsort),                           recurse_union_children(op->rarg,
root,                                                 tuple_fraction,
 
-                                                  op, refnames_tlist));
+                                                  op, refnames_tlist,
+                                                  &rsort));    /*
-     * Generate tlist for Append plan node.
+     * Generate tlist for Append/MergeAppend plan node.     *
-     * The tlist for an Append plan isn't important as far as the Append is
-     * concerned, but we must make it look real anyway for the benefit of the
-     * next plan level up.
+     * The tlist for an Append/MergeAppend plan isn't important as far as the
+     * they are concerned, but we must make it look real anyway for the
+     * benefit of the next plan level up.     */    tlist = generate_append_tlist(op->colTypes, op->colCollations,
false,                                 planlist, refnames_tlist);
 
-    /*
-     * Append the child results together.
-     */
-    plan = (Plan *) make_append(planlist, tlist);
-
-    /*
-     * For UNION ALL, we just need the Append plan.  For UNION, need to add
-     * node(s) to remove duplicates.
-     */
-    if (op->all)
-        *sortClauses = NIL;        /* result of UNION ALL is always unsorted */
+    if (lsort != NIL && equal(lsort, rsort))
+    {
+        /*
+         * Generate MergeAppend plan if both children are sorted on the same
+         * sort clause or groupClauses.
+         */
+        ListCell *lc, *slc;
+        int i = 0;
+        double total_size;
+        Plan *p;
+        List *distinctClause;
+
+        MergeAppend *node = makeNode(MergeAppend);
+        node->plan.targetlist = tlist;
+        node->plan.qual = NIL;
+        node->plan.lefttree = NULL;
+        node->plan.righttree = NULL;
+        node->mergeplans = planlist;
+        node->numCols = list_length(root->parse->targetList);
+        node->sortColIdx    = (AttrNumber*)palloc(sizeof(AttrNumber) * node->numCols);
+        node->sortOperators = (Oid*)palloc(sizeof(Oid) * node->numCols);
+        node->collations    = (Oid*)palloc(sizeof(Oid) * node->numCols);
+        node->nullsFirst    = (bool*)palloc(sizeof(bool) * node->numCols);
+
+        distinctClause = 
+            generate_setop_grouplist(op->groupClauses,
+                                     node->plan.targetlist,
+                                     root->parse->sortClause);
+        forboth (slc, distinctClause, lc, node->plan.targetlist)
+        {
+            TargetEntry *te = (TargetEntry*) lfirst(lc);
+            SortGroupClause *sgc = (SortGroupClause *) lfirst(slc);
+
+            Assert(sgc->tleSortGroupRef == te->ressortgroupref);
+            node->sortColIdx[i] = i + 1;
+            node->sortOperators[i] = sgc->sortop;
+            node->collations[i] = exprCollation((Node*)te->expr);
+            node->nullsFirst[i] = sgc->nulls_first;
+            i++;
+        }
+        lc = list_head(node->mergeplans);
+        p = (Plan*) lfirst(lc);
+        node->plan.startup_cost = p->startup_cost;
+        node->plan.total_cost   = p->total_cost;
+        node->plan.plan_rows    = p->plan_rows;
+        total_size             = p->plan_rows * p->plan_width;
+        for_each_cell(lc, lnext(lc))
+        {
+            p = (Plan*) lfirst(lc);
+            node->plan.total_cost += p->total_cost;
+            node->plan.plan_rows  += p->plan_rows;
+            total_size            += p->plan_rows * p->plan_width;
+        }
+        node->plan.plan_width = rint(total_size / node->plan.plan_rows);
+        *sortClauses = root->parse->sortClause;
+        plan = (Plan*)node;
+        if (!op->all)
+            plan = make_union_unique(op, plan, root, tuple_fraction,
+                                     &distinctClause);
+    }    else
-        plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses);
+    {
+        /*
+         * Append the child results together.
+         */
+        plan = (Plan *) make_append(planlist, tlist);
+
+        /*
+         * For UNION ALL, we just need the Append plan.  For UNION, need to
+         * add node(s) to remove duplicates.
+         */
+        *sortClauses = NIL;
+
+        if (!op->all)
+            plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses);
+    }    /*     * Estimate number of groups if caller wants it.  For now we just assume
@@ -544,12 +732,14 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,    lplan =
recurse_set_operations(op->larg,root,                                   0.0 /* all tuples needed */ ,
               op->colTypes, op->colCollations,
 
+                                   op->groupClauses,                                   false, 0,
           refnames_tlist,                                   &child_sortclauses, &dLeftGroups);    rplan =
recurse_set_operations(op->rarg,root,                                   0.0 /* all tuples needed */ ,
               op->colTypes, op->colCollations,
 
+                                   op->groupClauses,                                   false, 1,
           refnames_tlist,                                   &child_sortclauses, &dRightGroups);
 
@@ -589,7 +779,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,    plan = (Plan *)
make_append(planlist,tlist);    /* Identify the grouping semantics */
 
-    groupList = generate_setop_grouplist(op, tlist);
+    groupList = generate_setop_grouplist(op->groupClauses, tlist, NULL);    /* punt if nothing to group on (can this
happen?)*/    if (groupList == NIL)
 
@@ -678,9 +868,13 @@ static List *recurse_union_children(Node *setOp, PlannerInfo *root,                       double
tuple_fraction,                      SetOperationStmt *top_union,
 
-                       List *refnames_tlist)
+                       List *refnames_tlist,
+                       List **child_sortclauses){
-    List       *child_sortclauses;
+    List       *lplan, *rplan;
+    List       *lsort, *rsort;
+
+    *child_sortclauses = NIL;    if (IsA(setOp, SetOperationStmt))    {
@@ -691,14 +885,23 @@ recurse_union_children(Node *setOp, PlannerInfo *root,            equal(op->colTypes,
top_union->colTypes))       {            /* Same UNION, so fold children into parent's subplan list */
 
-            return list_concat(recurse_union_children(op->larg, root,
-                                                      tuple_fraction,
-                                                      top_union,
-                                                      refnames_tlist),
-                               recurse_union_children(op->rarg, root,
-                                                      tuple_fraction,
-                                                      top_union,
-                                                      refnames_tlist));
+            lplan = recurse_union_children(op->larg, root,
+                                           tuple_fraction,
+                                           top_union,
+                                           refnames_tlist,
+                                           &lsort);
+            rplan = recurse_union_children(op->rarg, root,
+                                           tuple_fraction,
+                                           top_union,
+                                           refnames_tlist,
+                                           &rsort);
+            /*
+             * Propagate whether all the descendents are sorted with same
+             *  sortClause.
+             */
+            if (lsort != NIL && equal(lsort, rsort))
+                *child_sortclauses = lsort;
+            return list_concat(lplan, rplan);        }    }
@@ -715,13 +918,16 @@ recurse_union_children(Node *setOp, PlannerInfo *root,
tuple_fraction,                                             top_union->colTypes,
    top_union->colCollations,
 
+                                             top_union->groupClauses,
false,-1,                                             refnames_tlist,
 
-                                             &child_sortclauses, NULL));
+                                             child_sortclauses, NULL));}/* * Add nodes to the given plan tree to
unique-ifythe result of a UNION.
 
+ *
+ * If the sortClause is given, we assume the plan is already sorted on it. */static Plan
*make_union_unique(SetOperationStmt*op, Plan *plan,
 
@@ -731,9 +937,11 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    List       *groupList;    double
dNumGroups;   long        numGroups;
 
+    bool        sort_needed = true;    /* Identify the grouping semantics */
-    groupList = generate_setop_grouplist(op, plan->targetlist);
+    groupList = generate_setop_grouplist(op->groupClauses,
+                                         plan->targetlist, *sortClauses);    /* punt if nothing to group on (can this
happen?)*/    if (groupList == NIL)
 
@@ -742,6 +950,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan,        return plan;    }
+    if (*sortClauses && equal(*sortClauses, groupList))
+        sort_needed = false;
+        /*     * XXX for the moment, take the number of distinct groups as equal to the     * total input size, ie,
theworst case.  This is too conservative, but we
 
@@ -756,7 +967,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    numGroups = (long) Min(dNumGroups, (double)
LONG_MAX);   /* Decide whether to hash or sort */
 
-    if (choose_hashed_setop(root, groupList, plan,
+    if (sort_needed &&
+        choose_hashed_setop(root, groupList, plan,                            dNumGroups, dNumGroups, tuple_fraction,
                         "UNION"))    {
 
@@ -778,7 +990,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    else    {        /* Sort and Unique */
-        plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan);
+        if (sort_needed)
+            plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan);
+                plan = (Plan *) make_unique(plan, groupList);        plan->plan_rows = dNumGroups;        /* We know
thesort order of the result */
 
@@ -1145,23 +1359,34 @@ generate_append_tlist(List *colTypes, List *colCollations, * the parser output representation
doesn'tinclude a tlist for each * setop.  So what we need to do here is copy that list and install * proper
sortgrouprefsinto it and into the targetlist.
 
+ *
+ * sortClause is consulted if provided to avoid re-sorting with different
+ * orderings on the same keys later. */static List *
-generate_setop_grouplist(SetOperationStmt *op, List *targetlist)
+generate_setop_grouplist(List  *groupClauses, List *targetlist, 
+                         List *sortClauses){
-    List       *grouplist = (List *) copyObject(op->groupClauses);
+    List       *grouplist = (List *) copyObject(groupClauses);    ListCell   *lg;
+    ListCell   *ls = NULL;    ListCell   *lt;    Index        refno = 1;    lg = list_head(grouplist);
+
+    if (sortClauses)
+        ls = list_head(sortClauses);
+    foreach(lt, targetlist)    {        TargetEntry *tle = (TargetEntry *) lfirst(lt);
-        SortGroupClause *sgc;
+        SortGroupClause *sgc, *sgc_sort = NULL;
-        /* tlist shouldn't have any sortgrouprefs yet */
-        Assert(tle->ressortgroupref == 0);
+        /*
+         * tlist shouldn't have any sortgrouprefs yet, or should be unchanged
+         */
+        Assert(tle->ressortgroupref == 0 || tle->ressortgroupref == refno);        if (tle->resjunk)
continue;           /* ignore resjunk columns */
 
@@ -1174,6 +1399,45 @@ generate_setop_grouplist(SetOperationStmt *op, List *targetlist)        /* we could use
assignSortGroupRefhere, but seems a bit silly */        sgc->tleSortGroupRef = tle->ressortgroupref = refno++;
 
+        
+        if (ls)
+        {
+            /*
+             * If sortClauses is provided, try to adjust the sorting order to
+             * get the chance for omitting sorting for grouping later.
+             */
+            sgc_sort = (SortGroupClause *) lfirst(ls);
+            ls = lnext(ls);
+            if (sgc->sortop != sgc_sort->sortop)
+            {
+                Oid reverse = InvalidOid;
+                Oid opfamily, opcintype;
+                int16 strategy;
+                
+                if (get_ordering_op_properties(sgc->sortop,
+                                &opfamily, &opcintype, &strategy))
+                {
+                    switch (strategy)
+                    {
+                        case BTLessStrategyNumber:
+                            strategy = BTGreaterStrategyNumber; break;
+                        case BTGreaterStrategyNumber:
+                            strategy = BTLessStrategyNumber; break;
+                    }
+
+                    reverse = get_opfamily_member(opfamily,
+                                                  opcintype,
+                                                  opcintype,
+                                                  strategy);
+                    if (sgc_sort->sortop == reverse)
+                    {
+                        sgc->sortop = reverse;
+                        sgc->nulls_first = sgc_sort->nulls_first;
+                    }
+                }
+            }
+        }
+                }    Assert(lg == NULL);    return grouplist;

Re: Using indices for UNION.

From

Peter Eisentraut

Date:

07 November 2013, 22:13:11

On 10/31/13, 6:42 AM, Kyotaro HORIGUCHI wrote:
> Hello, This patch might be too complicated (and seems somewhat ad
> hoc) for the gain, but more index usage for this kind of
> operation should be worth doing.

There is a new compiler warning:

setrefs.c: In function ‘set_plan_refs’:
setrefs.c:782:7: warning: initialization from incompatible pointer type
[enabled by default]

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

13 November 2013, 08:25:24

Thank you for pointing out. I missed the warning.

> There is a new compiler warning:
> 
> setrefs.c: In function ‘set_plan_refs’:
> setrefs.c:782:7: warning: initialization from incompatible pointer type
> [enabled by default]

Added explicit cast there and rebased to current master.
Checked no new warning by this patch.
make check succeeded at both $(src) and $(src)/src/test.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8aa35d..86abdf6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1063,15 +1063,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)        List       *set_sortclauses;
    /*
 
-         * If there's a top-level ORDER BY, assume we have to fetch all the
-         * tuples.    This might be too simplistic given all the hackery below
-         * to possibly avoid the sort; but the odds of accurate estimates here
-         * are pretty low anyway.
-         */
-        if (parse->sortClause)
-            tuple_fraction = 0.0;
-
-        /*         * Construct the plan for set operations.  The result will not need         * any work except
perhapsa top-level sort and/or LIMIT.  Note that         * any special work for recursive unions is the responsibility
of
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b78d727..c6abe18 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -778,9 +778,26 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)                Assert(splan->plan.qual
==NIL);                foreach(l, splan->appendplans)                {
 
-                    lfirst(l) = set_plan_refs(root,
-                                              (Plan *) lfirst(l),
-                                              rtoffset);
+                    Append *newp =
+                        (Append *)set_plan_refs(root,
+                                                (Plan *) lfirst(l),
+                                                rtoffset);
+                    /*
+                     * UNION on inherited tables may create directly nested
+                     * Appends in plan tree. This structure can be flatten by
+                     * taking grandchildren into parent.
+                     */
+                    if (IsA(newp, Append) && 
+                        list_length(newp->appendplans) > 0)
+                    {
+                        ListCell *plc = list_head(newp->appendplans);
+                        lfirst(l) = lfirst(plc);
+                        for_each_cell(plc, lnext(plc))
+                            l = lappend_cell(splan->appendplans,
+                                             l, lfirst(plc));
+                    }
+                    else
+                        lfirst(l) = newp;                }            }            break;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..e8a78a7 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -29,6 +29,7 @@#include "postgres.h"#include <limits.h>
+#include <math.h>#include "access/heapam.h"#include "access/htup_details.h"
@@ -60,6 +61,7 @@ typedef structstatic Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,
 double tuple_fraction,                       List *colTypes, List *colCollations,
 
+                       List *groupClauses,                       bool junkOK,                       int flag, List
*refnames_tlist,                      List **sortClauses, double *pNumGroups);
 
@@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,static List
*recurse_union_children(Node*setOp, PlannerInfo *root,                       double tuple_fraction,
 SetOperationStmt *top_union,
 
-                       List *refnames_tlist);
+                       List *refnames_tlist,
+                       List **child_sortclause);static Plan *make_union_unique(SetOperationStmt *op, Plan *plan,
          PlannerInfo *root, double tuple_fraction,                  List **sortClauses);
 
@@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      bool
flag,                     List *input_plans,                      List *refnames_tlist);
 
-static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
+static List *generate_setop_grouplist(List *groupClauses, List *targetlist,
+                                      List *sortClauses);static void expand_inherited_rtentry(PlannerInfo *root,
RangeTblEntry*rte,                         Index rti);static void make_inh_translation_list(Relation oldrelation,
 
@@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction,    Assert(parse->distinctClause ==
NIL);   /*
 
+     * If there's a top-level ORDER BY, assume we have to fetch all the tuples
+     * except for UNION. This might be too simplistic given all the hackery
+     * below to possibly avoid the sort; but the odds of accurate estimates
+     * here are pretty low anyway.
+     */
+    if (parse->sortClause && topop->op != SETOP_UNION)
+        tuple_fraction = 0.0;
+
+    /*     * We'll need to build RelOptInfos for each of the leaf subqueries, which     * are RTE_SUBQUERY rangetable
entriesin this Query.  Prepare the index     * arrays for that.
 
@@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction,     */    return
recurse_set_operations((Node*) topop, root, tuple_fraction,                                  topop->colTypes,
topop->colCollations,
+                                  topop->groupClauses,                                  true, -1,
           leftmostQuery->targetList,                                  sortClauses, NULL);}/*
 
+ * save_plannerglobal
+ *
+ * save planner global to allow multiple calls of subquery_planner at the same
+ * global status. This is done apartly from copyObject so as to do medium
+ * shallow copying.
+ */
+static PlannerGlobal *
+save_plannerglobal(const PlannerGlobal *in)
+{
+    PlannerGlobal *save = makeNode(PlannerGlobal);
+
+    save->boundParams        = in->boundParams;
+    save->subplans            = list_copy(in->subplans);
+    save->subroots            = list_copy(in->subroots);
+    save->rewindPlanIDs        = bms_copy(in->rewindPlanIDs);
+    save->finalrtable        = list_copy(in->finalrtable);
+    save->finalrowmarks        = list_copy(in->finalrowmarks); 
+    save->resultRelations    = list_copy(in->resultRelations);
+    save->relationOids        = list_copy(in->relationOids);
+    save->invalItems        = list_copy(in->invalItems);
+    save->nParamExec        = in->nParamExec;
+    save->lastPHId            = in->lastPHId;
+    save->lastRowMarkId        = in->lastRowMarkId;
+    save->transientPlan        = in->transientPlan;
+
+    return save;
+}
+
+/* * recurse_set_operations *      Recursively handle one step in a tree of set operations * * tuple_fraction:
fractionof tuples we expect to retrieve from node * colTypes: OID list of set-op's result column datatypes *
colCollations:OID list of set-op's result column collations
 
+ * groupClauses: parent setop's grouping clause. * junkOK: if true, child resjunk columns may be left in the result *
flag:if >= 0, add a resjunk output column indicating value of flag * refnames_tlist: targetlist to take column names
from
@@ -215,6 +259,7 @@ static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double
tuple_fraction,                      List *colTypes, List *colCollations,
 
+                       List *groupClauses,                       bool junkOK,                       int flag, List
*refnames_tlist,                      List **sortClauses, double *pNumGroups)
 
@@ -223,14 +268,21 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,    {        RangeTblRef *rtr =
(RangeTblRef*) setOp;        RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex];
 
-        Query       *subquery = rte->subquery;
+        Query       *subquery = rte->subquery,
+                   *pdquery;        RelOptInfo *rel;
-        PlannerInfo *subroot;
+        PlannerInfo *subroot,
+                    *pdsubroot; /* 'pd' comes from 'Pushed Down' */
+        PlannerGlobal *pdglob;        Plan       *subplan,
-                   *plan;
+                   *plan,
+                   *pdplan;
+        bool        try_pushdown = false;        Assert(subquery != NULL);
+        *sortClauses = NIL;
+        /*         * We need to build a RelOptInfo for each leaf subquery.  This isn't         * used for anything
here,but it carries the subroot data structures
 
@@ -243,12 +295,125 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,        /*         * Generate plan for
primitivesubquery
 
+         *
+         * Consider whether ORDER BY or groupings of the parent set operation
+         * can be pushed down onto this subquery.         */
+
+        /*
+         * Consider whether ORDER BY or groupings of the parent set operation
+         * can be pushed down onto this subquery.
+         */
+        if(root->parse->sortClause && !subquery->sortClause &&
+           tuple_fraction >= 1)
+        {
+            ListCell *lcs, *lcd;
+            int refno = 1;
+
+            /*
+             * Check if the sort cluase of the root can be applied on this
+             * subquery. All non-junk tlist items shoud be simple Var and
+             * their types match and ressortgroupref is ordered in their
+             * appearance order.
+             */
+            try_pushdown = true;
+            forboth(lcs, root->parse->targetList, lcd, subquery->targetList)
+            {
+                TargetEntry *ste = (TargetEntry*) lfirst(lcs);
+                TargetEntry *dte = (TargetEntry*) lfirst(lcd);
+                Node *sexpr = (Node*)ste->expr;
+                Node *dexpr = (Node*)dte->expr;
+
+                if (ste->resjunk && dte->resjunk) continue;
+
+                if (ste->resjunk != dte->resjunk ||
+                    !IsA(sexpr, Var) || !IsA(dexpr, Var) ||
+                    exprType(sexpr) != exprType(dexpr) ||
+                    (root->parse->sortClause &&
+                     ste->ressortgroupref != refno++))
+                {
+                    try_pushdown = false;
+                    break;
+                }
+            }
+        }
+
+        if (try_pushdown)
+        {
+            pdquery = copyObject(subquery);
+
+            if (!pdquery->sortClause)
+            {
+                ListCell *lcs, *lcd;
+
+                pdquery->sortClause = root->parse->sortClause;
+
+                forboth(lcs, root->parse->targetList,
+                        lcd, pdquery->targetList)
+                {
+                    TargetEntry *ste = (TargetEntry*) lfirst(lcs);
+                    TargetEntry *dte = (TargetEntry*) lfirst(lcd);
+                    dte->ressortgroupref = ste->ressortgroupref;
+                }
+            }
+            
+            /*
+             * Pushing down groupings. Set operations doesn't accept
+             * distinct clauses.
+             */
+            if (!pdquery->setOperations &&
+                !pdquery->distinctClause && groupClauses)
+
+                pdquery->distinctClause =
+                    generate_setop_grouplist(groupClauses,
+                                             pdquery->targetList,
+                                             pdquery->sortClause);
+
+            if (tuple_fraction >= 1 &&
+                !pdquery->limitCount && !pdquery->limitOffset)
+                pdquery->limitCount =
+                    (Node*)makeConst(INT8OID, -1, InvalidOid, sizeof(int64),
+                                     Int64GetDatum(tuple_fraction),
+                                     false, FLOAT8PASSBYVAL);
+
+            pdglob = save_plannerglobal(root->glob);
+        }
+                subplan = subquery_planner(root->glob, subquery,                                   root,
                   false, tuple_fraction,                                   &subroot);
 
+        if (try_pushdown)
+        {
+            pdplan = subquery_planner(pdglob, pdquery,
+                                      root,
+                                      false, tuple_fraction,
+                                      &pdsubroot);
+
+            if (pdplan->total_cost < subplan->total_cost)
+            {
+                subplan = pdplan;
+                subroot = pdsubroot;
+                /*
+                 * Glob cannot be moved because this is referred to from
+                 * many places by its address. So set the address of the
+                 * original glob to subroot, then copy new values there.
+                 */
+                subroot->glob = root->glob;
+                *root->glob = *pdglob;
+            }
+        }
+
+        /*
+         * This plan is sorted on sortClause if any, else sorted
+         * on distinctClause.
+         */
+        if (subquery->sortClause)
+            *sortClauses = subquery->sortClause;
+        else
+            *sortClauses = subquery->distinctClause;
+        /* Save subroot and subplan in RelOptInfo for setrefs.c */        rel->subplan = subplan;        rel->subroot
=subroot;
 
@@ -290,12 +455,6 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,                              rtr->rtindex,
                           subplan);
 
-        /*
-         * We don't bother to determine the subquery's output ordering since
-         * it won't be reflected in the set-op result anyhow.
-         */
-        *sortClauses = NIL;
-        return plan;    }    else if (IsA(setOp, SetOperationStmt))
@@ -379,12 +538,14 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root,     */    lplan =
recurse_set_operations(setOp->larg,root, tuple_fraction,                                   setOp->colTypes,
setOp->colCollations,
+                                   setOp->groupClauses,                                   false, -1,
               refnames_tlist, sortClauses, NULL);    /* The right plan will want to look at the left one ... */
root->non_recursive_plan= lplan;    rplan = recurse_set_operations(setOp->rarg, root, tuple_fraction,
               setOp->colTypes, setOp->colCollations,
 
+                                   setOp->groupClauses,                                   false, -1,
               refnames_tlist, sortClauses, NULL);    root->non_recursive_plan = NULL;
 
@@ -409,7 +570,8 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root,        double
dNumGroups;       /* Identify the grouping semantics */
 
-        groupList = generate_setop_grouplist(setOp, tlist);
+        groupList =
+            generate_setop_grouplist(setOp->groupClauses, tlist, NULL);        /* We only support hashing here */
 if (!grouping_is_hashable(groupList))
 
@@ -452,20 +614,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,    List       *planlist;    List
*tlist;    Plan       *plan;
 
-
-    /*
-     * If plain UNION, tell children to fetch all tuples.
-     *
-     * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to
-     * each arm of the UNION ALL.  One could make a case for reducing the
-     * tuple fraction for later arms (discounting by the expected size of the
-     * earlier arms' results) but it seems not worth the trouble. The normal
-     * case where tuple_fraction isn't already zero is a LIMIT at top level,
-     * and passing it down as-is is usually enough to get the desired result
-     * of preferring fast-start plans.
-     */
-    if (!op->all)
-        tuple_fraction = 0.0;
+    List       *lsort, *rsort;    /*     * If any of my children are identical UNION nodes (same op, all-flag, and
@@ -475,34 +624,99 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,     */    planlist =
list_concat(recurse_union_children(op->larg,root,                                                  tuple_fraction,
 
-                                                  op, refnames_tlist),
+                                                  op, refnames_tlist,
+                                                  &lsort),                           recurse_union_children(op->rarg,
root,                                                 tuple_fraction,
 
-                                                  op, refnames_tlist));
+                                                  op, refnames_tlist,
+                                                  &rsort));    /*
-     * Generate tlist for Append plan node.
+     * Generate tlist for Append/MergeAppend plan node.     *
-     * The tlist for an Append plan isn't important as far as the Append is
-     * concerned, but we must make it look real anyway for the benefit of the
-     * next plan level up.
+     * The tlist for an Append/MergeAppend plan isn't important as far as the
+     * they are concerned, but we must make it look real anyway for the
+     * benefit of the next plan level up.     */    tlist = generate_append_tlist(op->colTypes, op->colCollations,
false,                                 planlist, refnames_tlist);
 
-    /*
-     * Append the child results together.
-     */
-    plan = (Plan *) make_append(planlist, tlist);
-
-    /*
-     * For UNION ALL, we just need the Append plan.  For UNION, need to add
-     * node(s) to remove duplicates.
-     */
-    if (op->all)
-        *sortClauses = NIL;        /* result of UNION ALL is always unsorted */
+    if (lsort != NIL && equal(lsort, rsort))
+    {
+        /*
+         * Generate MergeAppend plan if both children are sorted on the same
+         * sort clause or groupClauses.
+         */
+        ListCell *lc, *slc;
+        int i = 0;
+        double total_size;
+        Plan *p;
+        List *distinctClause;
+
+        MergeAppend *node = makeNode(MergeAppend);
+        node->plan.targetlist = tlist;
+        node->plan.qual = NIL;
+        node->plan.lefttree = NULL;
+        node->plan.righttree = NULL;
+        node->mergeplans = planlist;
+        node->numCols = list_length(root->parse->targetList);
+        node->sortColIdx    = (AttrNumber*)palloc(sizeof(AttrNumber) * node->numCols);
+        node->sortOperators = (Oid*)palloc(sizeof(Oid) * node->numCols);
+        node->collations    = (Oid*)palloc(sizeof(Oid) * node->numCols);
+        node->nullsFirst    = (bool*)palloc(sizeof(bool) * node->numCols);
+
+        distinctClause = 
+            generate_setop_grouplist(op->groupClauses,
+                                     node->plan.targetlist,
+                                     root->parse->sortClause);
+        forboth (slc, distinctClause, lc, node->plan.targetlist)
+        {
+            TargetEntry *te = (TargetEntry*) lfirst(lc);
+            SortGroupClause *sgc = (SortGroupClause *) lfirst(slc);
+
+            Assert(sgc->tleSortGroupRef == te->ressortgroupref);
+            node->sortColIdx[i] = i + 1;
+            node->sortOperators[i] = sgc->sortop;
+            node->collations[i] = exprCollation((Node*)te->expr);
+            node->nullsFirst[i] = sgc->nulls_first;
+            i++;
+        }
+        lc = list_head(node->mergeplans);
+        p = (Plan*) lfirst(lc);
+        node->plan.startup_cost = p->startup_cost;
+        node->plan.total_cost   = p->total_cost;
+        node->plan.plan_rows    = p->plan_rows;
+        total_size             = p->plan_rows * p->plan_width;
+        for_each_cell(lc, lnext(lc))
+        {
+            p = (Plan*) lfirst(lc);
+            node->plan.total_cost += p->total_cost;
+            node->plan.plan_rows  += p->plan_rows;
+            total_size            += p->plan_rows * p->plan_width;
+        }
+        node->plan.plan_width = rint(total_size / node->plan.plan_rows);
+        *sortClauses = root->parse->sortClause;
+        plan = (Plan*)node;
+        if (!op->all)
+            plan = make_union_unique(op, plan, root, tuple_fraction,
+                                     &distinctClause);
+    }    else
-        plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses);
+    {
+        /*
+         * Append the child results together.
+         */
+        plan = (Plan *) make_append(planlist, tlist);
+
+        /*
+         * For UNION ALL, we just need the Append plan.  For UNION, need to
+         * add node(s) to remove duplicates.
+         */
+        *sortClauses = NIL;
+
+        if (!op->all)
+            plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses);
+    }    /*     * Estimate number of groups if caller wants it.  For now we just assume
@@ -544,12 +758,14 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,    lplan =
recurse_set_operations(op->larg,root,                                   0.0 /* all tuples needed */ ,
               op->colTypes, op->colCollations,
 
+                                   op->groupClauses,                                   false, 0,
           refnames_tlist,                                   &child_sortclauses, &dLeftGroups);    rplan =
recurse_set_operations(op->rarg,root,                                   0.0 /* all tuples needed */ ,
               op->colTypes, op->colCollations,
 
+                                   op->groupClauses,                                   false, 1,
           refnames_tlist,                                   &child_sortclauses, &dRightGroups);
 
@@ -589,7 +805,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,    plan = (Plan *)
make_append(planlist,tlist);    /* Identify the grouping semantics */
 
-    groupList = generate_setop_grouplist(op, tlist);
+    groupList = generate_setop_grouplist(op->groupClauses, tlist, NULL);    /* punt if nothing to group on (can this
happen?)*/    if (groupList == NIL)
 
@@ -678,9 +894,13 @@ static List *recurse_union_children(Node *setOp, PlannerInfo *root,                       double
tuple_fraction,                      SetOperationStmt *top_union,
 
-                       List *refnames_tlist)
+                       List *refnames_tlist,
+                       List **child_sortclauses){
-    List       *child_sortclauses;
+    List       *lplan, *rplan;
+    List       *lsort, *rsort;
+
+    *child_sortclauses = NIL;    if (IsA(setOp, SetOperationStmt))    {
@@ -691,14 +911,23 @@ recurse_union_children(Node *setOp, PlannerInfo *root,            equal(op->colTypes,
top_union->colTypes))       {            /* Same UNION, so fold children into parent's subplan list */
 
-            return list_concat(recurse_union_children(op->larg, root,
-                                                      tuple_fraction,
-                                                      top_union,
-                                                      refnames_tlist),
-                               recurse_union_children(op->rarg, root,
-                                                      tuple_fraction,
-                                                      top_union,
-                                                      refnames_tlist));
+            lplan = recurse_union_children(op->larg, root,
+                                           tuple_fraction,
+                                           top_union,
+                                           refnames_tlist,
+                                           &lsort);
+            rplan = recurse_union_children(op->rarg, root,
+                                           tuple_fraction,
+                                           top_union,
+                                           refnames_tlist,
+                                           &rsort);
+            /*
+             * Propagate whether all the descendents are sorted with same
+             *  sortClause.
+             */
+            if (lsort != NIL && equal(lsort, rsort))
+                *child_sortclauses = lsort;
+            return list_concat(lplan, rplan);        }    }
@@ -715,13 +944,16 @@ recurse_union_children(Node *setOp, PlannerInfo *root,
tuple_fraction,                                             top_union->colTypes,
    top_union->colCollations,
 
+                                             top_union->groupClauses,
false,-1,                                             refnames_tlist,
 
-                                             &child_sortclauses, NULL));
+                                             child_sortclauses, NULL));}/* * Add nodes to the given plan tree to
unique-ifythe result of a UNION.
 
+ *
+ * If the sortClause is given, we assume the plan is already sorted on it. */static Plan
*make_union_unique(SetOperationStmt*op, Plan *plan,
 
@@ -731,9 +963,11 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    List       *groupList;    double
dNumGroups;   long        numGroups;
 
+    bool        sort_needed = true;    /* Identify the grouping semantics */
-    groupList = generate_setop_grouplist(op, plan->targetlist);
+    groupList = generate_setop_grouplist(op->groupClauses,
+                                         plan->targetlist, *sortClauses);    /* punt if nothing to group on (can this
happen?)*/    if (groupList == NIL)
 
@@ -742,6 +976,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan,        return plan;    }
+    if (*sortClauses && equal(*sortClauses, groupList))
+        sort_needed = false;
+        /*     * XXX for the moment, take the number of distinct groups as equal to the     * total input size, ie,
theworst case.  This is too conservative, but we
 
@@ -756,7 +993,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    numGroups = (long) Min(dNumGroups, (double)
LONG_MAX);   /* Decide whether to hash or sort */
 
-    if (choose_hashed_setop(root, groupList, plan,
+    if (sort_needed &&
+        choose_hashed_setop(root, groupList, plan,                            dNumGroups, dNumGroups, tuple_fraction,
                         "UNION"))    {
 
@@ -778,7 +1016,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    else    {        /* Sort and Unique */
-        plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan);
+        if (sort_needed)
+            plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan);
+                plan = (Plan *) make_unique(plan, groupList);        plan->plan_rows = dNumGroups;        /* We know
thesort order of the result */
 
@@ -1145,23 +1385,34 @@ generate_append_tlist(List *colTypes, List *colCollations, * the parser output representation
doesn'tinclude a tlist for each * setop.  So what we need to do here is copy that list and install * proper
sortgrouprefsinto it and into the targetlist.
 
+ *
+ * sortClause is consulted if provided to avoid re-sorting with different
+ * orderings on the same keys later. */static List *
-generate_setop_grouplist(SetOperationStmt *op, List *targetlist)
+generate_setop_grouplist(List  *groupClauses, List *targetlist, 
+                         List *sortClauses){
-    List       *grouplist = (List *) copyObject(op->groupClauses);
+    List       *grouplist = (List *) copyObject(groupClauses);    ListCell   *lg;
+    ListCell   *ls = NULL;    ListCell   *lt;    Index        refno = 1;    lg = list_head(grouplist);
+
+    if (sortClauses)
+        ls = list_head(sortClauses);
+    foreach(lt, targetlist)    {        TargetEntry *tle = (TargetEntry *) lfirst(lt);
-        SortGroupClause *sgc;
+        SortGroupClause *sgc, *sgc_sort = NULL;
-        /* tlist shouldn't have any sortgrouprefs yet */
-        Assert(tle->ressortgroupref == 0);
+        /*
+         * tlist shouldn't have any sortgrouprefs yet, or should be unchanged
+         */
+        Assert(tle->ressortgroupref == 0 || tle->ressortgroupref == refno);        if (tle->resjunk)
continue;           /* ignore resjunk columns */
 
@@ -1174,6 +1425,45 @@ generate_setop_grouplist(SetOperationStmt *op, List *targetlist)        /* we could use
assignSortGroupRefhere, but seems a bit silly */        sgc->tleSortGroupRef = tle->ressortgroupref = refno++;
 
+        
+        if (ls)
+        {
+            /*
+             * If sortClauses is provided, try to adjust the sorting order to
+             * get the chance for omitting sorting for grouping later.
+             */
+            sgc_sort = (SortGroupClause *) lfirst(ls);
+            ls = lnext(ls);
+            if (sgc->sortop != sgc_sort->sortop)
+            {
+                Oid reverse = InvalidOid;
+                Oid opfamily, opcintype;
+                int16 strategy;
+                
+                if (get_ordering_op_properties(sgc->sortop,
+                                &opfamily, &opcintype, &strategy))
+                {
+                    switch (strategy)
+                    {
+                        case BTLessStrategyNumber:
+                            strategy = BTGreaterStrategyNumber; break;
+                        case BTGreaterStrategyNumber:
+                            strategy = BTLessStrategyNumber; break;
+                    }
+
+                    reverse = get_opfamily_member(opfamily,
+                                                  opcintype,
+                                                  opcintype,
+                                                  strategy);
+                    if (sgc_sort->sortop == reverse)
+                    {
+                        sgc->sortop = reverse;
+                        sgc->nulls_first = sgc_sort->nulls_first;
+                    }
+                }
+            }
+        }
+                }    Assert(lg == NULL);    return grouplist;

Re: Using indices for UNION.

From

Peter Eisentraut

Date:

14 November 2013, 04:20:57

On Wed, 2013-11-13 at 17:25 +0900, Kyotaro HORIGUCHI wrote:

> Added explicit cast there and rebased to current master.
> Checked no new warning by this patch.
> make check succeeded at both $(src) and $(src)/src/test.

This patch also has whitespace errors detected by git diff --check.

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

15 November 2013, 06:59:00

Hello, As I mentioned on another thread, I'll reorganize patches
including this and repost them.

Please wait for a while.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Using indices for UNION.

From

Tom Lane

Date:

17 November 2013, 18:01:22

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> [ union_uses_idx_v2_20131113.patch ]

I'm aware that you said you were going to refactor this, but I took a
quick look through it anyway.  I don't think mere refactoring is going
to make me happy with it :-(.

The basic problem here, as well as with some of the hackery that's
been proposed recently in planner.c, is that we can't really get much
further with improving this layer of the planner until we bite the
bullet and convert this layer to work with Paths not finished Plans.
Look at what you're proposing here: do a complete planning cycle on
the subquery (underneath which both ordered and unordered Paths will
be considered, and one or the other will get thrown away).  Then do
*another* complete planning cycle with ordered output forced.  Then
compare costs of the results.  This is hugely inefficient, and it
produces far-from-ideal results too, because you're forced to make a
decision right there on which subquery plan to use; you can't make the
globally optimal choice after considering costs of all the subqueries.
What we need to do is fix things so we can get multiple Paths out of
the subquery planning step, some ordered and some not.  Then we could
construct Paths for both the brute force and merge-append styles of
computing the UNION result, and finally choose the cheapest Path at the
top level.

The same goes for hacking around in grouping_planner.  That function
is well past the point of collapsing of its own weight; we need
to invest some effort in refactoring before we can afford to add
much more complexity there.  And I think the logical way to refactor
is to rewrite the code in a Path-generation-and-comparison style.
(Actually, grouping_planner would need to be fixed first, since
that's what would have to produce the Paths we're hoping to compare
in prepunion.)

I had hoped to make some progress on that rewrite this past summer,
but ended up with no time to work on it :-(.  There's going to be
a lot of boring infrastructure work before we see much in the way
of user-visible improvement, I'm afraid, so it's kind of hard to
make time for it.
        regards, tom lane

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

19 November 2013, 10:24:31

Thank you for looking in detail for this patch, and giving
thoughtful advices.

> I'm aware that you said you were going to refactor this, but I took a
> quick look through it anyway.  I don't think mere refactoring is going
> to make me happy with it :-(.

Ouch.. Anyway I've refactored this patch altogether with the
another 'unique index' patch and re-splitted into three
parts. One is the 'unique-index stuff' and the second is 'Adding
pathkeys and unique info into struct Plan' patch and the third is
'maily-in-prepunion stuff'. They will be sent in other messages
although for the scarce chance to be picked up :-)

> The basic problem here, as well as with some of the hackery that's
> been proposed recently in planner.c, is that we can't really get much
> further with improving this layer of the planner until we bite the
> bullet and convert this layer to work with Paths not finished Plans.

It is right but hard to go. I would be able to put my little
power on the issue but the planner hardly seems to be get
refactored gradually.

> Look at what you're proposing here: do a complete planning cycle on
> the subquery (underneath which both ordered and unordered Paths will
> be considered, and one or the other will get thrown away).  Then do
> *another* complete planning cycle with ordered output forced.  Then
> compare costs of the results.

Exactly.

> This is hugely inefficient, and it produces far-from-ideal
> results too, because you're forced to make a decision right
> there on which subquery plan to use; you can't make the
> globally optimal choice after considering costs of all the
> subqueries.

Umm. Yes, I know I've done it in somewhat brute way and it costs
a little too much if the subqueries of UNION is rather large, but
I suppose it comparably small to the whole execution time for
most cases.

Putting it aside, from the viewpoint of global-optimizations,
current planner also doesn't do such optimizations.  Moreover it
doesn't consider sorted plans possibly available and
effective. Ignoring the additional planning cost (:-), for the
UNION queries with LIMITS on large partitioned tables with
apropriate indexes (mmm. I suppose it is not so narrow gate..),
the query runtime should be extremely shortend.

>  What we need to do is fix things so we can get multiple Paths
> out of the subquery planning step, some ordered and some not.
> Then we could construct Paths for both the brute force and
> merge-append styles of computing the UNION result, and finally
> choose the cheapest Path at the top level.

Agreed with it at a glance. It sounds quite reasonable. I've been
a bit worried about that the paths and plans seem to be fused in
some extent in grouping_planner, and prepunion
(plan_set_operations) is jumping over path-generation phase to
make plans directly. (And about that I pushed it more on the
way..)

> The same goes for hacking around in grouping_planner.  That function
> is well past the point of collapsing of its own weight; we need
> to invest some effort in refactoring before we can afford to add
> much more complexity there.  And I think the logical way to refactor
> is to rewrite the code in a Path-generation-and-comparison style.
> (Actually, grouping_planner would need to be fixed first, since
> that's what would have to produce the Paths we're hoping to compare
> in prepunion.)

It seems necessary but also seems far far away and hard to
go.

> I had hoped to make some progress on that rewrite this past summer,
> but ended up with no time to work on it :-(.  There's going to be
> a lot of boring infrastructure work before we see much in the way
> of user-visible improvement, I'm afraid, so it's kind of hard to
> make time for it.

Anyway, because I've happened to pass too close by the issue, I
will consider on that for some time (although I would hardly come
up with spiffy solution in a short time.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

19 November 2013, 11:42:19

Hello, I've totally refactored the series of pathes and cut out
the appropriate portion as 'UNION stuff'.

This patch rquires another 'pathkeys expansion using fully-orderd
index' patch to work.

http://www.postgresql.org/message-id/20131119.203516.251520490.horiguchi.kyotaro@lab.ntt.co.jp

1. plan_with_pathkeys_v1_20131119.patch
This is a patch adding pathkeys (and uniqueness) information onstruct Plan to do what the latter part of
grouping_plannerdoeson current_pathkeys in seemingly autonomous way. As in theprevious discussion, this is in a sense a
baddirection togo. But this patch make the path manipulations occured in thelatter of grouping_planner simple and would
reduceextra sortingand uniq'ing.
 

2. union_uses_idx_v3_20131119.patch
This is made to apply after the first patch above. The core of"Using indices for UNION". This is - as in the
previousdiscussion- also not in so smart way to do that. Most of all,this patch runs subquery_planner additionally for
UNIONwithsome conditions. Nevertheless, the expected gain should not besmall.
 


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5947e5b..d8a17a0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -98,11 +98,13 @@ static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);static IndexScan
*make_indexscan(List*qptlist, List *qpqual, Index scanrelid,               Oid indexid, List *indexqual, List
*indexqualorig,              List *indexorderby, List *indexorderbyorig,
 
+               List *pathkeys, bool uniquely_ordered,               ScanDirection indexscandir);static IndexOnlyScan
*make_indexonlyscan(List*qptlist, List *qpqual,                   Index scanrelid, Oid indexid,                   List
*indexqual,List *indexorderby,                   List *indextlist,
 
+                   List *pathkeys, bool uniquely_ordered,                   ScanDirection indexscandir);static
BitmapIndexScan*make_bitmap_indexscan(Index scanrelid, Oid indexid,                      List *indexqual,
 
@@ -150,8 +152,8 @@ static MergeJoin *make_mergejoin(List *tlist,               bool *mergenullsfirst,
Plan*lefttree, Plan *righttree,               JoinType jointype);
 
-static Sort *make_sort(PlannerInfo *root, Plan *lefttree, int numCols,
-          AttrNumber *sortColIdx, Oid *sortOperators,
+static Sort *make_sort(PlannerInfo *root, Plan *lefttree, List *pathkeys,
+          int numCols,  AttrNumber *sortColIdx, Oid *sortOperators,          Oid *collations, bool *nullsFirst,
 double limit_tuples);static Plan *prepare_sort_from_pathkeys(PlannerInfo *root,
 
@@ -750,6 +752,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)    plan->qual = NIL;
plan->lefttree= NULL;    plan->righttree = NULL;
 
+    plan->pathkeys = pathkeys;
+    plan->is_unique = false;    /* Compute sort column info, and adjust MergeAppend's tlist as needed */    (void)
prepare_sort_from_pathkeys(root,plan, pathkeys,
 
@@ -810,7 +814,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)        /* Now, insert a
Sortnode if subplan isn't sufficiently ordered */        if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
 
-            subplan = (Plan *) make_sort(root, subplan, numsortkeys,
+            subplan = (Plan *) make_sort(root, subplan, pathkeys, numsortkeys,
sortColIdx,sortOperators,                                         collations, nullsFirst,
         best_path->limit_tuples);
 
@@ -1263,6 +1267,8 @@ create_indexscan_plan(PlannerInfo *root,
fixed_indexquals,                                               fixed_indexorderbys,
       best_path->indexinfo->indextlist,
 
+                                                best_path->path.pathkeys,
+                                                false,
best_path->indexscandir);   else        scan_plan = (Scan *) make_indexscan(tlist,
 
@@ -1273,6 +1279,8 @@ create_indexscan_plan(PlannerInfo *root,
stripped_indexquals,                                           fixed_indexorderbys,
      indexorderbys,
 
+                                            best_path->path.pathkeys,
+                                            false,
best_path->indexscandir);   copy_path_costsize(&scan_plan->plan, &best_path->path);
 
@@ -3245,6 +3253,8 @@ make_indexscan(List *qptlist,               List *indexqualorig,               List
*indexorderby,              List *indexorderbyorig,
 
+               List *pathkeys,
+               bool  uniquely_ordered,               ScanDirection indexscandir){    IndexScan  *node =
makeNode(IndexScan);
@@ -3255,6 +3265,8 @@ make_indexscan(List *qptlist,    plan->qual = qpqual;    plan->lefttree = NULL;
plan->righttree= NULL;
 
+    plan->pathkeys = pathkeys;
+    plan->is_unique = uniquely_ordered;    node->scan.scanrelid = scanrelid;    node->indexid = indexid;
node->indexqual= indexqual;
 
@@ -3274,6 +3286,8 @@ make_indexonlyscan(List *qptlist,                   List *indexqual,                   List
*indexorderby,                  List *indextlist,
 
+                   List *pathkeys,
+                   bool  uniquely_ordered,                   ScanDirection indexscandir){    IndexOnlyScan *node =
makeNode(IndexOnlyScan);
@@ -3284,6 +3298,8 @@ make_indexonlyscan(List *qptlist,    plan->qual = qpqual;    plan->lefttree = NULL;
plan->righttree= NULL;
 
+    plan->pathkeys = pathkeys;
+    plan->is_unique = uniquely_ordered;    node->scan.scanrelid = scanrelid;    node->indexid = indexid;
node->indexqual= indexqual;
 
@@ -3753,8 +3769,8 @@ make_mergejoin(List *tlist, * limit_tuples is as for cost_sort (in particular, pass -1 if no
limit)*/static Sort *
 
-make_sort(PlannerInfo *root, Plan *lefttree, int numCols,
-          AttrNumber *sortColIdx, Oid *sortOperators,
+make_sort(PlannerInfo *root, Plan *lefttree, List *pathkeys,
+          int numCols,  AttrNumber *sortColIdx, Oid *sortOperators,          Oid *collations, bool *nullsFirst,
 double limit_tuples){
 
@@ -3776,6 +3792,8 @@ make_sort(PlannerInfo *root, Plan *lefttree, int numCols,    plan->qual = NIL;    plan->lefttree
=lefttree;    plan->righttree = NULL;
 
+    plan->pathkeys = pathkeys;
+    plan->is_unique = false;    node->numCols = numCols;    node->sortColIdx = sortColIdx;    node->sortOperators =
sortOperators;
@@ -4125,7 +4143,7 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys,
                 &nullsFirst);    /* Now build the Sort node */
 
-    return make_sort(root, lefttree, numsortkeys,
+    return make_sort(root, lefttree, pathkeys, numsortkeys,                     sortColIdx, sortOperators, collations,
                   nullsFirst, limit_tuples);}
 
@@ -4147,6 +4165,7 @@ make_sort_from_sortclauses(PlannerInfo *root, List *sortcls, Plan *lefttree)    Oid
*sortOperators;   Oid           *collations;    bool       *nullsFirst;
 
+    List        *pathkeys;    /* Convert list-ish representation to arrays wanted by executor */    numsortkeys =
list_length(sortcls);
@@ -4168,7 +4187,9 @@ make_sort_from_sortclauses(PlannerInfo *root, List *sortcls, Plan *lefttree)
numsortkeys++;   }
 
-    return make_sort(root, lefttree, numsortkeys,
+    pathkeys = make_pathkeys_for_sortclauses(root, sortcls, sub_tlist);
+
+    return make_sort(root, lefttree, pathkeys, numsortkeys,                     sortColIdx, sortOperators, collations,
                   nullsFirst, -1.0);}
 
@@ -4199,6 +4220,7 @@ make_sort_from_groupcols(PlannerInfo *root,    Oid           *sortOperators;    Oid
*collations;   bool       *nullsFirst;
 
+    List       *pathkeys;    /* Convert list-ish representation to arrays wanted by executor */    numsortkeys =
list_length(groupcls);
@@ -4220,10 +4242,17 @@ make_sort_from_groupcols(PlannerInfo *root,        sortOperators[numsortkeys] = grpcl->sortop;
     collations[numsortkeys] = exprCollation((Node *) tle->expr);        nullsFirst[numsortkeys] = grpcl->nulls_first;
 
+
+        /* This tlist should not be bound with any other orderig clause */
+        Assert(tle->ressortgroupref == 0);
+        tle->ressortgroupref = grpcl->tleSortGroupRef;
+        numsortkeys++;    }
-    return make_sort(root, lefttree, numsortkeys,
+    pathkeys = make_pathkeys_for_sortclauses(root, groupcls, sub_tlist);
+
+    return make_sort(root, lefttree, pathkeys, numsortkeys,                     sortColIdx, sortOperators, collations,
                   nullsFirst, -1.0);}
 
@@ -4339,6 +4368,16 @@ make_agg(PlannerInfo *root, List *tlist, List *qual,    plan->lefttree = lefttree;
plan->righttree= NULL;
 
+    /*
+     * We can safely assume that the lefttree and therefore the result is
+     * sorted on group pathkeys and unique if given aggstrategy is AGG_SORTED.
+     */
+    if (node->aggstrategy == AGG_SORTED)
+        plan->pathkeys =  root->group_pathkeys;
+    else
+        plan->pathkeys =  NIL;
+    plan->is_unique = true;
+    return node;}
@@ -4448,6 +4487,13 @@ make_group(PlannerInfo *root,    plan->lefttree = lefttree;    plan->righttree = NULL;
+    /*
+     * We assume that lefttree is ordered on the same pathkeys with that this
+     * group node wants.
+     */
+    plan->pathkeys = lefttree->pathkeys;
+    plan->is_unique = true;
+    return node;}
@@ -4486,6 +4532,8 @@ make_unique(Plan *lefttree, List *distinctList)    plan->qual = NIL;    plan->lefttree =
lefttree;   plan->righttree = NULL;
 
+    plan->pathkeys = lefttree->pathkeys;
+    plan->is_unique = true;    /*     * convert SortGroupClause list into arrays of attr indexes and equality
@@ -4669,6 +4717,8 @@ make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount,    plan->qual = NIL;
plan->lefttree= lefttree;    plan->righttree = NULL;
 
+    plan->pathkeys = lefttree->pathkeys;
+    plan->is_unique = lefttree->is_unique;    node->limitOffset = limitOffset;    node->limitCount = limitCount;
@@ -4717,6 +4767,10 @@ make_result(PlannerInfo *root,    plan->qual = NIL;    plan->lefttree = subplan;
plan->righttree= NULL;
 
+
+    /* this plan emits at most one row when subplan is NULL */
+    if (subplan == NULL) plan->is_unique = true;
+    node->resconstantqual = resconstantqual;    return node;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8aa35d..a09d38f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1011,6 +1011,19 @@ inheritance_planner(PlannerInfo *root)
SS_assign_special_param(root));}
+static bool
+plan_is_ordered(Plan *plan, List *pathkeys)
+{
+    if (pathkeys_contained_in(pathkeys, plan->pathkeys))
+        return true;
+
+    if (plan->pathkeys && plan->is_unique &&
+        pathkeys_contained_in(plan->pathkeys, pathkeys))
+        return true;
+
+    return false;
+}
+/*-------------------- * grouping_planner *      Perform planning steps related to grouping, aggregation, etc.
@@ -1039,7 +1052,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)    int64        count_est = 0;
double       limit_tuples = -1.0;    Plan       *result_plan;
 
-    List       *current_pathkeys;    double        dNumGroups = 0;    bool        use_hashed_distinct = false;    bool
      tested_hashed_distinct = false;
 
@@ -1081,15 +1093,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
  &set_sortclauses);        /*
 
-         * Calculate pathkeys representing the sort order (if any) of the set
-         * operation's result.  We have to do this before overwriting the sort
-         * key information...
-         */
-        current_pathkeys = make_pathkeys_for_sortclauses(root,
-                                                         set_sortclauses,
-                                                    result_plan->targetlist);
-
-        /*         * We should not need to call preprocess_targetlist, since we must be         * in a SELECT query
node.   Instead, use the targetlist returned by         * plan_set_operations (since this tells whether it returned
any
@@ -1442,15 +1445,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
         tlist,                                                 &agg_costs,
   best_path);
 
-        if (result_plan != NULL)
-        {
-            /*
-             * optimize_minmax_aggregates generated the full plan, with the
-             * right tlist, and it has no sort order.
-             */
-            current_pathkeys = NIL;
-        }
-        else
+        if (result_plan == NULL)        {            /*             * Normal case --- create a plan according to
query_planner's
@@ -1459,11 +1454,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)            bool
need_sort_for_grouping= false;            result_plan = create_plan(root, best_path);
 
-            current_pathkeys = best_path->pathkeys;            /* Detect if we'll need an explicit sort for grouping
*/           if (parse->groupClause && !use_hashed_grouping &&
 
-              !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
+                !plan_is_ordered(result_plan, root->group_pathkeys))            {
need_sort_for_grouping= true;
 
@@ -1541,37 +1535,20 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->groupClause),                                               numGroups,
                         result_plan);
 
-                /* Hashed aggregation produces randomly-ordered results */
-                current_pathkeys = NIL;            }            else if (parse->hasAggs)            {
/*Plain aggregate plan --- sort if needed */                AggStrategy aggstrategy;
 
-                if (parse->groupClause)
+                aggstrategy = parse->groupClause ? AGG_SORTED : AGG_PLAIN;
+                if (aggstrategy == AGG_SORTED && need_sort_for_grouping)                {
-                    if (need_sort_for_grouping)
-                    {
-                        result_plan = (Plan *)
-                            make_sort_from_groupcols(root,
-                                                     parse->groupClause,
-                                                     groupColIdx,
-                                                     result_plan);
-                        current_pathkeys = root->group_pathkeys;
-                    }
-                    aggstrategy = AGG_SORTED;
-
-                    /*
-                     * The AGG node will not change the sort ordering of its
-                     * groups, so current_pathkeys describes the result too.
-                     */
-                }
-                else
-                {
-                    aggstrategy = AGG_PLAIN;
-                    /* Result will be only one row anyway; no sort order */
-                    current_pathkeys = NIL;
+                    result_plan = (Plan *)
+                        make_sort_from_groupcols(root,
+                                                 parse->groupClause,
+                                                 groupColIdx,
+                                                 result_plan);                }                result_plan = (Plan *)
make_agg(root,
@@ -1601,7 +1578,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
        parse->groupClause,                                                 groupColIdx,
                result_plan);
 
-                    current_pathkeys = root->group_pathkeys;                }                result_plan = (Plan *)
make_group(root,
@@ -1612,7 +1588,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->groupClause),                                                 dNumGroups,
                              result_plan);
 
-                /* The Group node won't change sort ordering */            }            else if (root->hasHavingQual)
         {
 
@@ -1722,13 +1697,14 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
                 result_plan,                                                        window_pathkeys,
                                    -1.0);
 
-                    if (!pathkeys_contained_in(window_pathkeys,
-                                               current_pathkeys))
-                    {
-                        /* we do indeed need to sort */
+
+                    /*
+                     * we do indeed need to sort if result_plan is not ordered
+                     * on window_pathkeys
+                     */
+                    if (!plan_is_ordered(result_plan, window_pathkeys))                        result_plan = (Plan *)
sort_plan;
-                        current_pathkeys = window_pathkeys;
-                    }
+                    /* In either case, extract the per-column information */
get_column_info_for_window(root,wc, tlist,                                               sort_plan->numCols,
 
@@ -1792,12 +1768,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)        long        numDistinctRows;
     /*
 
-         * If there was grouping or aggregation, use the current number of
-         * rows as the estimated number of DISTINCT rows (ie, assume the
-         * result was already mostly unique).  If not, use the number of
+         * If result_plan emits already distinct'ed rows, use the current
+         * number of rows as the estimated number of DISTINCT rows (ie, assume
+         * the result was already unique).  If not, use the number of         * distinct-groups calculated previously.
       */
 
-        if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
+        if (result_plan->is_unique)            dNumDistinctRows = result_plan->plan_rows;        else
dNumDistinctRows= dNumGroups;
 
@@ -1822,7 +1798,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
result_plan->total_cost,                                      result_plan->startup_cost,
      result_plan->total_cost,
 
-                                       current_pathkeys,
+                                       result_plan->pathkeys,                                       dNumDistinctRows);
      }
 
@@ -1840,8 +1816,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
extract_grouping_ops(parse->distinctClause),                                           numDistinctRows,
                          result_plan);
 
-            /* Hashed aggregation produces randomly-ordered results */
-            current_pathkeys = NIL;        }        else        {
@@ -1865,29 +1839,23 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)            else
needed_pathkeys= root->distinct_pathkeys;
 
-            if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
+            if (!plan_is_ordered(result_plan, needed_pathkeys))            {
-                if (list_length(root->distinct_pathkeys) >=
-                    list_length(root->sort_pathkeys))
-                    current_pathkeys = root->distinct_pathkeys;
-                else
-                {
-                    current_pathkeys = root->sort_pathkeys;
-                    /* Assert checks that parser didn't mess up... */
-                    Assert(pathkeys_contained_in(root->distinct_pathkeys,
-                                                 current_pathkeys));
-                }
-
+                /* Assert checks that parser didn't mess up... */
+                Assert(pathkeys_contained_in(root->distinct_pathkeys,
+                                             needed_pathkeys));
+                Assert(pathkeys_contained_in(root->sort_pathkeys,
+                                             needed_pathkeys));                result_plan = (Plan *)
make_sort_from_pathkeys(root,                                                              result_plan,
 
-                                                            current_pathkeys,
+                                                            needed_pathkeys,
                   -1.0);            }
 
-            result_plan = (Plan *) make_unique(result_plan,
-                                               parse->distinctClause);
+            if (!result_plan->is_unique)
+                result_plan = (Plan *) make_unique(result_plan,
+                                                   parse->distinctClause);            result_plan->plan_rows =
dNumDistinctRows;
-            /* The Unique node won't change sort ordering */        }    }
@@ -1897,13 +1865,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)     */    if (parse->sortClause)
{
-        if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+        if (!plan_is_ordered(result_plan, root->sort_pathkeys))        {            result_plan = (Plan *)
make_sort_from_pathkeys(root,                                                          result_plan,
                                   root->sort_pathkeys,
limit_tuples);
-            current_pathkeys = root->sort_pathkeys;        }    }
@@ -1914,18 +1881,10 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)     * ModifyTable node instead.)
*/   if (parse->rowMarks)
 
-    {        result_plan = (Plan *) make_lockrows(result_plan,
root->rowMarks,                                            SS_assign_special_param(root));
 
-        /*
-         * The result can no longer be assumed sorted, since locking might
-         * cause the sort key columns to be replaced with new values.
-         */
-        current_pathkeys = NIL;
-    }
-    /*     * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.     */
@@ -1942,7 +1901,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)     * Return the actual output
orderingin query_pathkeys for possible use by     * an outer query level.     */
 
-    root->query_pathkeys = current_pathkeys;
+    root->query_pathkeys = result_plan->pathkeys;    return result_plan;}
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 44ea0b7..ced07d9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -101,6 +101,8 @@ typedef struct Plan     */    double        plan_rows;        /* number of rows plan is expected to
emit*/    int            plan_width;        /* average row width in bytes */
 
+    List       *pathkeys;        /* ordered on this pathkeys if any */
+    bool        is_unique;        /* emits unique result */    /*     * Common structural data for all Plan types.
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index d8a17a0..8c60b2c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -813,7 +813,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
numsortkeys* sizeof(bool)) == 0);        /* Now, insert a Sort node if subplan isn't sufficiently ordered */
 
-        if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
+        if (!path_is_ordered(subpath, pathkeys))            subplan = (Plan *) make_sort(root, subplan, pathkeys,
numsortkeys,                                        sortColIdx, sortOperators,
collations,nullsFirst,
 
@@ -1151,6 +1151,7 @@ create_indexscan_plan(PlannerInfo *root,    List       *fixed_indexquals;    List
*fixed_indexorderbys;   ListCell   *l;
 
+    bool        uniquely_ordered = false;    /* it should be a base rel... */    Assert(baserelid > 0);
@@ -1258,6 +1259,20 @@ create_indexscan_plan(PlannerInfo *root,            replace_nestloop_params(root, (Node *)
indexorderbys);   }
 
+    /*
+     * XXX: This is rather tricky. IndexPath's pathkeys may be both superset
+     * (including the same) or subset of key columns of the index. This path
+     * will emit distnct'edly ordered rows when the pathkeys contains the key
+     * columns and the index is fully ordered on the key columns.
+     *
+     * See the point calling truncate_useless_pathkeys in build_index_paths()
+     * for detail.
+     */
+    if (list_length(best_path->path.pathkeys) >=
+        best_path->indexinfo->ncolumns &&
+        best_path->indexinfo->full_ordered)
+        uniquely_ordered = true;
+    /* Finally ready to build the plan node */    if (indexonly)        scan_plan = (Scan *)
make_indexonlyscan(tlist,
@@ -1268,7 +1283,7 @@ create_indexscan_plan(PlannerInfo *root,
fixed_indexorderbys,                                           best_path->indexinfo->indextlist,
                       best_path->path.pathkeys,
 
-                                                false,
+                                                uniquely_ordered,
best_path->indexscandir);   else        scan_plan = (Scan *) make_indexscan(tlist,
 
@@ -1280,7 +1295,7 @@ create_indexscan_plan(PlannerInfo *root,
fixed_indexorderbys,                                           indexorderbys,
best_path->path.pathkeys,
 
-                                            false,
+                                            uniquely_ordered,
best_path->indexscandir);   copy_path_costsize(&scan_plan->plan, &best_path->path);
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a09d38f..a337c84 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1075,15 +1075,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)        List       *set_sortclauses;
    /*
 
-         * If there's a top-level ORDER BY, assume we have to fetch all the
-         * tuples.    This might be too simplistic given all the hackery below
-         * to possibly avoid the sort; but the odds of accurate estimates here
-         * are pretty low anyway.
-         */
-        if (parse->sortClause)
-            tuple_fraction = 0.0;
-
-        /*         * Construct the plan for set operations.  The result will not need         * any work except
perhapsa top-level sort and/or LIMIT.  Note that         * any special work for recursive unions is the responsibility
of
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..9fe0b66 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -29,6 +29,7 @@#include "postgres.h"#include <limits.h>
+#include <math.h>#include "access/heapam.h"#include "access/htup_details.h"
@@ -44,6 +45,7 @@#include "optimizer/planner.h"#include "optimizer/prep.h"#include "optimizer/tlist.h"
+#include "optimizer/paths.h"#include "parser/parse_coerce.h"#include "parser/parsetree.h"#include "utils/lsyscache.h"
@@ -60,7 +62,7 @@ typedef structstatic Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,
 double tuple_fraction,                       List *colTypes, List *colCollations,
 
-                       bool junkOK,
+                       List *groupClauses, bool junkOK,                       int flag, List *refnames_tlist,
            List **sortClauses, double *pNumGroups);static Plan *generate_recursion_plan(SetOperationStmt *setOp,
 
@@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,static List
*recurse_union_children(Node*setOp, PlannerInfo *root,                       double tuple_fraction,
 SetOperationStmt *top_union,
 
-                       List *refnames_tlist);
+                       List *refnames_tlist,
+                       List **child_sortclause);static Plan *make_union_unique(SetOperationStmt *op, Plan *plan,
          PlannerInfo *root, double tuple_fraction,                  List **sortClauses);
 
@@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      bool
flag,                     List *input_plans,                      List *refnames_tlist);
 
-static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
+static List *generate_setop_grouplist(List *groupClauses, List *targetlist,
+                                      List *sortClauses);static void expand_inherited_rtentry(PlannerInfo *root,
RangeTblEntry*rte,                         Index rti);static void make_inh_translation_list(Relation oldrelation,
 
@@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction,    Assert(parse->distinctClause ==
NIL);   /*
 
+     * If there's a top-level ORDER BY, assume we have to fetch all the tuples
+     * except for UNION. This might be too simplistic given all the hackery
+     * below to possibly avoid the sort; but the odds of accurate estimates
+     * here are pretty low anyway.
+     */
+    if (parse->sortClause && topop->op != SETOP_UNION)
+        tuple_fraction = 0.0;
+
+    /*     * We'll need to build RelOptInfos for each of the leaf subqueries, which     * are RTE_SUBQUERY rangetable
entriesin this Query.  Prepare the index     * arrays for that.
 
@@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction,     */    return
recurse_set_operations((Node*) topop, root, tuple_fraction,                                  topop->colTypes,
topop->colCollations,
+                                  topop->groupClauses,                                  true, -1,
           leftmostQuery->targetList,                                  sortClauses, NULL);}/*
 
+ * save_plannerglobal
+ *
+ * save planner global to allow multiple calls of subquery_planner at the same
+ * global status. This is done apartly from copyObject so as to do medium
+ * shallow copying.
+ */
+static PlannerGlobal *
+save_plannerglobal(const PlannerGlobal *in)
+{
+    PlannerGlobal *save = makeNode(PlannerGlobal);
+
+    save->boundParams        = in->boundParams;
+    save->subplans            = list_copy(in->subplans);
+    save->subroots            = list_copy(in->subroots);
+    save->rewindPlanIDs        = bms_copy(in->rewindPlanIDs);
+    save->finalrtable        = list_copy(in->finalrtable);
+    save->finalrowmarks        = list_copy(in->finalrowmarks); 
+    save->resultRelations    = list_copy(in->resultRelations);
+    save->relationOids        = list_copy(in->relationOids);
+    save->invalItems        = list_copy(in->invalItems);
+    save->nParamExec        = in->nParamExec;
+    save->lastPHId            = in->lastPHId;
+    save->lastRowMarkId        = in->lastRowMarkId;
+    save->transientPlan        = in->transientPlan;
+
+    return save;
+}
+
+/* * recurse_set_operations *      Recursively handle one step in a tree of set operations * * tuple_fraction:
fractionof tuples we expect to retrieve from node * colTypes: OID list of set-op's result column datatypes *
colCollations:OID list of set-op's result column collations
 
+ * groupClauses: parent setop's grouping clause. * junkOK: if true, child resjunk columns may be left in the result *
flag:if >= 0, add a resjunk output column indicating value of flag * refnames_tlist: targetlist to take column names
from
@@ -215,7 +259,7 @@ static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double
tuple_fraction,                      List *colTypes, List *colCollations,
 
-                       bool junkOK,
+                       List *groupClauses, bool junkOK,                       int flag, List *refnames_tlist,
            List **sortClauses, double *pNumGroups){
 
@@ -224,13 +268,20 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,        RangeTblRef *rtr = (RangeTblRef *)
setOp;       RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex];        Query       *subquery = rte->subquery;
 
+        Query       *pdquery = NULL; /* 'pd' comes from 'pushed down' */        RelOptInfo *rel;
-        PlannerInfo *subroot;
+        PlannerInfo *subroot,
+                    *pdsubroot;
+        PlannerGlobal *pdglob;        Plan       *subplan,
-                   *plan;
+                   *plan,
+                   *pdplan;
+        bool        try_pushdown = false;        Assert(subquery != NULL);
+        *sortClauses = NIL;
+        /*         * We need to build a RelOptInfo for each leaf subquery.  This isn't         * used for anything
here,but it carries the subroot data structures
 
@@ -243,12 +294,146 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,        /*         * Generate plan for
primitivesubquery
 
+         *
+         * Consider pusing down ORDER BY or groupings of the parent set
+         * operation onto this subquery.         */
+        if(root->parse->sortClause && !subquery->sortClause &&
+           tuple_fraction >= 1)
+        {
+            ListCell *lcs, *lcd;
+            int refno = 1;
+
+            /*
+             * Check if the sort cluase of the root can be applied on this
+             * subquery. All non-junk tlist items shoud be simple Var and
+             * their match in types and ressortgroupref should be in their
+             * appearance order.
+             */
+            try_pushdown = true;
+            forboth(lcs, root->parse->targetList, lcd, subquery->targetList)
+            {
+                TargetEntry *ste = (TargetEntry*) lfirst(lcs);
+                TargetEntry *dte = (TargetEntry*) lfirst(lcd);
+                Node *sexpr = (Node*)ste->expr;
+                Node *dexpr = (Node*)dte->expr;
+
+                if (ste->resjunk && dte->resjunk) continue;
+
+                if (ste->resjunk != dte->resjunk ||
+                    !IsA(sexpr, Var) || !IsA(dexpr, Var) ||
+                    exprType(sexpr) != exprType(dexpr) ||
+                    (root->parse->sortClause &&
+                     ste->ressortgroupref != 0 &&
+                     ste->ressortgroupref != refno++))
+                {
+                    try_pushdown = false;
+                    break;
+                }
+            }
+        }
+
+        if (try_pushdown)
+        {
+            pdquery = copyObject(subquery);
+
+            if (!pdquery->sortClause)
+            {
+                ListCell *lcs, *lcd;
+
+                pdquery->sortClause = root->parse->sortClause;
+
+                forboth(lcs, root->parse->targetList,
+                        lcd, pdquery->targetList)
+                {
+                    TargetEntry *ste = (TargetEntry *) lfirst(lcs);
+                    TargetEntry *dte = (TargetEntry *) lfirst(lcd);
+                    dte->ressortgroupref = ste->ressortgroupref;
+                }
+            }
+            
+            /*
+             * Pushing down groupings. Set operations doesn't accept
+             * distinct clauses.
+             */
+            if (!pdquery->setOperations &&
+                !pdquery->distinctClause && groupClauses)
+            {
+                if (pdquery->sortClause)
+                {
+                    ListCell *lct;
+                    int i = 1;
+
+                    /* Check if groupClause is coappliable with sortClauses */
+                    foreach (lct, pdquery->targetList)
+                    {
+                        TargetEntry *te = (TargetEntry *) lfirst(lct);
+
+                        if (te->resjunk) continue;
+                        if (te->ressortgroupref != 0 &&
+                            te->ressortgroupref != i)
+                            break;
+                        i++;
+                    }
+
+                    if (!lct)
+                        pdquery->distinctClause =
+                            generate_setop_grouplist(groupClauses,
+                                                     pdquery->targetList,
+                                                     pdquery->sortClause);
+                }
+            }
+            if (tuple_fraction >= 1 &&
+                !pdquery->limitCount && !pdquery->limitOffset)
+                pdquery->limitCount =
+                    (Node*)makeConst(INT8OID, -1, InvalidOid, sizeof(int64),
+                                     Int64GetDatum(tuple_fraction),
+                                     false, FLOAT8PASSBYVAL);
+
+            /*
+             * Save current PlnannerGlobal. subquery_planner could modify
+             * PlannerGlobal so make a shallow backup in order to make the
+             * alternative plans selectable.
+             */
+            pdglob = save_plannerglobal(root->glob);
+        }
+                subplan = subquery_planner(root->glob, subquery,                                   root,
                   false, tuple_fraction,                                   &subroot);
 
+        if (try_pushdown)
+        {
+            pdplan = subquery_planner(pdglob, pdquery,
+                                      root,
+                                      false, tuple_fraction,
+                                      &pdsubroot);
+
+            if (pdplan->total_cost < subplan->total_cost)
+            {
+                subquery = pdquery;
+                subplan = pdplan;
+                subroot = pdsubroot;
+                /*
+                 * Glob cannot be moved because this is referred to from many
+                 * places by its address. So set the address of the original
+                 * glob to subroot, then copy new values there.
+                 */
+                subroot->glob = root->glob;
+                *root->glob = *pdglob;
+            }
+        }
+
+        /*
+         * This plan is sorted on sortClause if any, else sorted
+         * on distinctClause.
+         */
+        if (subquery->sortClause)
+            *sortClauses = subquery->sortClause;
+        else
+            *sortClauses = subquery->distinctClause;
+        /* Save subroot and subplan in RelOptInfo for setrefs.c */        rel->subplan = subplan;        rel->subroot
=subroot;
 
@@ -291,10 +476,28 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,                              subplan);
  /*
 
-         * We don't bother to determine the subquery's output ordering since
-         * it won't be reflected in the set-op result anyhow.
+         * This plan's pathkeys and is_unique are apparently same to them of
+         * subplan unless type casting or coercing collations.         */
-        *sortClauses = NIL;
+        if (subplan->pathkeys &&
+            tlist_same_datatypes(plan->targetlist, colTypes, junkOK) &&
+            tlist_same_collations(plan->targetlist, colCollations, junkOK))
+        {
+            ListCell *lcp, *lcr;
+
+            forboth (lcp, plan->targetlist, lcr, root->parse->targetList)
+            {
+                TargetEntry *tep = (TargetEntry*) lfirst(lcp);
+                TargetEntry *ter = (TargetEntry*) lfirst(lcr);
+
+                tep->ressortgroupref = ter->ressortgroupref;
+            }
+            plan->pathkeys =
+                make_pathkeys_for_sortclauses(root,
+                                              *sortClauses,
+                                              plan->targetlist);
+            plan->is_unique = subplan->is_unique;
+        }        return plan;    }
@@ -379,12 +582,14 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root,     */    lplan =
recurse_set_operations(setOp->larg,root, tuple_fraction,                                   setOp->colTypes,
setOp->colCollations,
+                                   setOp->groupClauses,                                   false, -1,
               refnames_tlist, sortClauses, NULL);    /* The right plan will want to look at the left one ... */
root->non_recursive_plan= lplan;    rplan = recurse_set_operations(setOp->rarg, root, tuple_fraction,
               setOp->colTypes, setOp->colCollations,
 
+                                   setOp->groupClauses,                                   false, -1,
               refnames_tlist, sortClauses, NULL);    root->non_recursive_plan = NULL;
 
@@ -409,7 +614,8 @@ generate_recursion_plan(SetOperationStmt *setOp, PlannerInfo *root,        double
dNumGroups;       /* Identify the grouping semantics */
 
-        groupList = generate_setop_grouplist(setOp, tlist);
+        groupList =
+            generate_setop_grouplist(setOp->groupClauses, tlist, NULL);        /* We only support hashing here */
 if (!grouping_is_hashable(groupList))
 
@@ -452,20 +658,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,    List       *planlist;    List
*tlist;    Plan       *plan;
 
-
-    /*
-     * If plain UNION, tell children to fetch all tuples.
-     *
-     * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to
-     * each arm of the UNION ALL.  One could make a case for reducing the
-     * tuple fraction for later arms (discounting by the expected size of the
-     * earlier arms' results) but it seems not worth the trouble. The normal
-     * case where tuple_fraction isn't already zero is a LIMIT at top level,
-     * and passing it down as-is is usually enough to get the desired result
-     * of preferring fast-start plans.
-     */
-    if (!op->all)
-        tuple_fraction = 0.0;
+    List       *lsort, *rsort;    /*     * If any of my children are identical UNION nodes (same op, all-flag, and
@@ -475,34 +668,106 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,     */    planlist =
list_concat(recurse_union_children(op->larg,root,                                                  tuple_fraction,
 
-                                                  op, refnames_tlist),
+                                                  op, refnames_tlist,
+                                                  &lsort),                           recurse_union_children(op->rarg,
root,                                                 tuple_fraction, 
-                                                  op, refnames_tlist));
+                                                  op, refnames_tlist,
+                                                  &rsort));    /*
-     * Generate tlist for Append plan node.
+     * Generate tlist for Append/MergeAppend plan node.     *
-     * The tlist for an Append plan isn't important as far as the Append is
-     * concerned, but we must make it look real anyway for the benefit of the
-     * next plan level up.
+     * The tlist for an Append/MergeAppend plan isn't important as far as the
+     * they are concerned, but we must make it look real anyway for the
+     * benefit of the next plan level up.     */    tlist = generate_append_tlist(op->colTypes, op->colCollations,
false,                                 planlist, refnames_tlist);
 
-    /*
-     * Append the child results together.
-     */
-    plan = (Plan *) make_append(planlist, tlist);
+    if (lsort != NIL && equal(lsort, rsort))
+    {
+        /*
+         * Generate MergeAppend plan if both children are sorted on the same
+         * sort clause or groupClauses.
+         */
+        ListCell *lc, *slc;
+        int i = 0;
+        double total_size;
+        Plan *p;
+        List *distinctClause;
+
+        MergeAppend *node = makeNode(MergeAppend);
+        node->plan.targetlist = tlist;
+        node->plan.qual = NIL;
+        node->plan.lefttree = NULL;
+        node->plan.righttree = NULL;
+        node->mergeplans = planlist;
+        node->numCols = list_length(root->parse->targetList);
+        node->sortColIdx    =
+            (AttrNumber*)palloc(sizeof(AttrNumber) * node->numCols);
+        node->sortOperators = (Oid*)palloc(sizeof(Oid) * node->numCols);
+        node->collations    = (Oid*)palloc(sizeof(Oid) * node->numCols);
+        node->nullsFirst    = (bool*)palloc(sizeof(bool) * node->numCols);
+
+        distinctClause = 
+            generate_setop_grouplist(op->groupClauses,
+                                     node->plan.targetlist,
+                                     root->parse->sortClause);
+        forboth (slc, distinctClause, lc, node->plan.targetlist)
+        {
+            TargetEntry *te = (TargetEntry*) lfirst(lc);
+            SortGroupClause *sgc = (SortGroupClause *) lfirst(slc);
+
+            Assert(sgc->tleSortGroupRef == te->ressortgroupref);
+            node->sortColIdx[i] = i + 1;
+            node->sortOperators[i] = sgc->sortop;
+            node->collations[i] = exprCollation((Node*)te->expr);
+            node->nullsFirst[i] = sgc->nulls_first;
+            te->ressortgroupref = sgc->tleSortGroupRef;
+            i++;
+        }
+        node->plan.pathkeys =
+            make_pathkeys_for_sortclauses(root, lsort, tlist);
+
+        lc = list_head(node->mergeplans);
+        p = (Plan*) lfirst(lc);
+        node->plan.startup_cost = p->startup_cost;
+        node->plan.total_cost   = p->total_cost;
+        node->plan.plan_rows    = p->plan_rows;
+        total_size             = p->plan_rows * p->plan_width;
+        for_each_cell(lc, lnext(lc))
+        {
+            p = (Plan*) lfirst(lc);
+            node->plan.total_cost += p->total_cost;
+            node->plan.plan_rows  += p->plan_rows;
+            total_size            += p->plan_rows * p->plan_width;
+        }
+        node->plan.plan_width = rint(total_size / node->plan.plan_rows);
+        *sortClauses = root->parse->sortClause;
-    /*
-     * For UNION ALL, we just need the Append plan.  For UNION, need to add
-     * node(s) to remove duplicates.
-     */
-    if (op->all)
-        *sortClauses = NIL;        /* result of UNION ALL is always unsorted */
+        plan = (Plan*)node;
+        if (!op->all)
+            plan = make_union_unique(op, plan, root, tuple_fraction,
+                                     &distinctClause);
+    }    else
-        plan = make_union_unique(op, plan, root, tuple_fraction, sortClauses);
+    {
+        /*
+         * Append the child results together.
+         */
+        plan = (Plan *) make_append(planlist, tlist);
+
+        /*
+         * For UNION ALL, we just need the Append plan.  For UNION, need to
+         * add node(s) to remove duplicates.
+         */
+        *sortClauses = NIL;
+
+        if (!op->all)
+            plan = make_union_unique(op, plan, root, tuple_fraction,
+                                     sortClauses);
+    }    /*     * Estimate number of groups if caller wants it.  For now we just assume
@@ -544,12 +809,14 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,    lplan =
recurse_set_operations(op->larg,root,                                   0.0 /* all tuples needed */ ,
               op->colTypes, op->colCollations,
 
+                                   op->groupClauses,                                   false, 0,
           refnames_tlist,                                   &child_sortclauses, &dLeftGroups);    rplan =
recurse_set_operations(op->rarg,root,                                   0.0 /* all tuples needed */ ,
               op->colTypes, op->colCollations,
 
+                                   op->groupClauses,                                   false, 1,
           refnames_tlist,                                   &child_sortclauses, &dRightGroups);
 
@@ -589,8 +856,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,    plan = (Plan *)
make_append(planlist,tlist);    /* Identify the grouping semantics */
 
-    groupList = generate_setop_grouplist(op, tlist);
-
+    groupList = generate_setop_grouplist(op->groupClauses, tlist, NULL);    /* punt if nothing to group on (can this
happen?)*/    if (groupList == NIL)    {
 
@@ -678,9 +944,11 @@ static List *recurse_union_children(Node *setOp, PlannerInfo *root,                       double
tuple_fraction,                      SetOperationStmt *top_union,
 
-                       List *refnames_tlist)
+                       List *refnames_tlist,
+                       List **child_sortclauses){
-    List       *child_sortclauses;
+    List       *lplan, *rplan;
+    List       *lsort, *rsort;    if (IsA(setOp, SetOperationStmt))    {
@@ -690,15 +958,23 @@ recurse_union_children(Node *setOp, PlannerInfo *root,            (op->all == top_union->all ||
op->all)&&            equal(op->colTypes, top_union->colTypes))        {
 
-            /* Same UNION, so fold children into parent's subplan list */
-            return list_concat(recurse_union_children(op->larg, root,
-                                                      tuple_fraction,
-                                                      top_union,
-                                                      refnames_tlist),
-                               recurse_union_children(op->rarg, root,
-                                                      tuple_fraction,
-                                                      top_union,
-                                                      refnames_tlist));
+            lplan = recurse_union_children(op->larg, root,
+                                           tuple_fraction,
+                                           top_union,
+                                           refnames_tlist,
+                                           &lsort);
+            rplan = recurse_union_children(op->rarg, root,
+                                           tuple_fraction,
+                                           top_union,
+                                           refnames_tlist,
+                                           &rsort);
+            /*
+             * Propagate whether all the descendents are sorted with same
+             *  sortClause.
+             */
+            if (lsort != NIL && equal(lsort, rsort))
+                *child_sortclauses = lsort;
+            return list_concat(lplan, rplan);        }    }
@@ -715,13 +991,16 @@ recurse_union_children(Node *setOp, PlannerInfo *root,
tuple_fraction,                                             top_union->colTypes,
    top_union->colCollations,
 
+                                             top_union->groupClauses,
false,-1,                                             refnames_tlist,
 
-                                             &child_sortclauses, NULL));
+                                             child_sortclauses, NULL));}/* * Add nodes to the given plan tree to
unique-ifythe result of a UNION.
 
+ *
+ * Given a sortClause, we assume the plan is already sorted on it. */static Plan *make_union_unique(SetOperationStmt
*op,Plan *plan,
 
@@ -731,9 +1010,11 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    List       *groupList;    double
dNumGroups;   long        numGroups;
 
+    bool        sort_needed = true;    /* Identify the grouping semantics */
-    groupList = generate_setop_grouplist(op, plan->targetlist);
+    groupList = generate_setop_grouplist(op->groupClauses,
+                                         plan->targetlist, *sortClauses);    /* punt if nothing to group on (can this
happen?)*/    if (groupList == NIL)
 
@@ -742,6 +1023,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan,        return plan;    }
+    if (*sortClauses && equal(*sortClauses, groupList))
+        sort_needed = false;
+    /*     * XXX for the moment, take the number of distinct groups as equal to the     * total input size, ie, the
worstcase.  This is too conservative, but we
 
@@ -756,7 +1040,8 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    numGroups = (long) Min(dNumGroups, (double)
LONG_MAX);   /* Decide whether to hash or sort */
 
-    if (choose_hashed_setop(root, groupList, plan,
+    if (sort_needed &&
+        choose_hashed_setop(root, groupList, plan,                            dNumGroups, dNumGroups, tuple_fraction,
                         "UNION"))    {
 
@@ -778,7 +1063,9 @@ make_union_unique(SetOperationStmt *op, Plan *plan,    else    {        /* Sort and Unique */
-        plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan);
+        if (sort_needed)
+            plan = (Plan *) make_sort_from_sortclauses(root, groupList, plan);
+        plan = (Plan *) make_unique(plan, groupList);        plan->plan_rows = dNumGroups;        /* We know the sort
orderof the result */
 
@@ -1145,23 +1432,34 @@ generate_append_tlist(List *colTypes, List *colCollations, * the parser output representation
doesn'tinclude a tlist for each * setop.  So what we need to do here is copy that list and install * proper
sortgrouprefsinto it and into the targetlist.
 
+ *
+ * sortClause is consulted if provided to avoid re-sorting in different
+ * directions on the same keys later. */static List *
-generate_setop_grouplist(SetOperationStmt *op, List *targetlist)
+generate_setop_grouplist(List  *groupClauses, List *targetlist, 
+                         List *sortClauses){
-    List       *grouplist = (List *) copyObject(op->groupClauses);
+    List       *grouplist = (List *) copyObject(groupClauses);    ListCell   *lg;
+    ListCell   *ls = NULL;    ListCell   *lt;    Index        refno = 1;    lg = list_head(grouplist);
+
+    if (sortClauses)
+        ls = list_head(sortClauses);
+    foreach(lt, targetlist)    {        TargetEntry *tle = (TargetEntry *) lfirst(lt);
-        SortGroupClause *sgc;
+        SortGroupClause *sgc, *sgc_sort = NULL;
-        /* tlist shouldn't have any sortgrouprefs yet */
-        Assert(tle->ressortgroupref == 0);
+        /*
+         * tlist shouldn't have any sortgrouprefs yet, or should be unchanged
+         */
+        Assert(tle->ressortgroupref == 0 || tle->ressortgroupref == refno);        if (tle->resjunk)
continue;           /* ignore resjunk columns */
 
@@ -1174,6 +1472,45 @@ generate_setop_grouplist(SetOperationStmt *op, List *targetlist)        /* we could use
assignSortGroupRefhere, but seems a bit silly */        sgc->tleSortGroupRef = tle->ressortgroupref = refno++;
 
+        
+        if (ls)
+        {
+            /*
+             * If sortClauses is provided, try to adjust the sorting order to
+             * get the chance for omitting sorting for grouping later.
+             */
+            sgc_sort = (SortGroupClause *) lfirst(ls);
+            ls = lnext(ls);
+            if (sgc->sortop != sgc_sort->sortop)
+            {
+                Oid reverse = InvalidOid;
+                Oid opfamily, opcintype;
+                int16 strategy;
+                
+                if (get_ordering_op_properties(sgc->sortop,
+                                &opfamily, &opcintype, &strategy))
+                {
+                    switch (strategy)
+                    {
+                        case BTLessStrategyNumber:
+                            strategy = BTGreaterStrategyNumber; break;
+                        case BTGreaterStrategyNumber:
+                            strategy = BTLessStrategyNumber; break;
+                    }
+
+                    reverse = get_opfamily_member(opfamily,
+                                                  opcintype,
+                                                  opcintype,
+                                                  strategy);
+                    if (sgc_sort->sortop == reverse)
+                    {
+                        sgc->sortop = reverse;
+                        sgc->nulls_first = sgc_sort->nulls_first;
+                    }
+                }
+            }
+        }
+                }    Assert(lg == NULL);    return grouplist;

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

22 November 2013, 07:59:43

Hello, As I was reconsidering after your advise, I came up with
an idea of hacking on query tree tranaformation phase in
subquery_planner, not on that of plan generation phase as
before. (And the older patch is totally scrapped:-)


Currently flatten_simple_union_all() flattens 'simple' UNION
'ALL' at the top level of 'current' query. I noticed that a
little modification there also could flatten simple UNION (w/o
ALL), and it has finally almost same effect regarding more usage
of indexes for UNION. Additionally, applying still more the
'UNION ALL and inheritance table' patch, even coveres UNION's on
inheritance tables.

This patch is far small and neat (though I say so myself :-) than
the previous ones. The following plans we got from unpatched
PostgreSQL.

original=# explain (analyze on, costs off)
|     select a, b from cu11 union select a, b from cu12 order by a, b limit 10;
|                               QUERY PLAN
| -----------------------------------------------------------------------------
|  Limit (actual time=272.252..272.259 rows=10 loops=1)
|  ->  Unique (actual time=272.251..272.258 rows=10 loops=1)
|   ->  Sort (actual time=272.249..272.252 rows=10 loops=1)
|         Sort Key: cu11.a, cu11.b
|         Sort Method: external sort  Disk: 3520kB
|         ->  Append (actual time=0.020..70.935 rows=200000 loops=1)
|          ->  Seq Scan on cu11 (actual time=0.019..26.741 rows=100000 loops=1)
|          ->  Seq Scan on cu12 (actual time=0.006..25.183 rows=100000 loops=1)
|  Total runtime: 273.162 ms

And of course for * (= a, b, c, d) this,

original=# explain (analyze on, costs off)
|            select * from cu11 union select * from cu12 order by a, b limit 10;
|                                QUERY PLAN
| ----------------------------------------------------------------------------
|  Limit (actual time=234.880..234.891 rows=10 loops=1)
|  ->  Unique (actual time=234.879..234.889 rows=10 loops=1)
|   ->  Sort (actual time=234.878..234.881 rows=10 loops=1)
|        Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|        Sort Method: external sort  Disk: 5280kB
|        ->  Append (actual time=0.017..49.502 rows=200000 loops=1)
|           ->  Seq Scan on cu11 (actual time=0.017..15.649 rows=100000 loops=1)
|           ->  Seq Scan on cu12 (actual time=0.007..14.939 rows=100000 loops=1)
|  Total runtime: 236.029 ms

We'll get the following plan changed with this patch plus the
latest "pathkey_and_uniqueindx_v4_20131122.patch" patcch. (It got
a small fix on pathkeys fixation for index paths)

https://commitfest.postgresql.org/action/patch_view?id=1280

patched=# explain (analyze on, costs off)
|           select * from cu11 union select * from cu12 order by a, b limit 10;
|                                  QUERY PLAN
| ------------------------------------------------------------------------------
|  Limit (actual time=0.059..0.081 rows=10 loops=1)
|  ->  Unique (actual time=0.058..0.079 rows=10 loops=1)
|   ->  Merge Append (actual time=0.056..0.065 rows=10 loops=1)
|         Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|         ->  Index Scan ... on cu11 (actual time=0.032..0.037 rows=10 loops=1)
|         ->  Index Scan ... on cu12 (actual time=0.021..0.021 rows=1 loops=1)
|  Total runtime: 0.162 ms

Moreover, with the 'UNION ALL' patches , say,
union_inh_idx_typ1_v1_20131024.patch and
union_inh_idx_typ1_add_v1_20131024.patch, we'll get the following
plan for UNION on inhertance tables,

https://commitfest.postgresql.org/action/patch_view?id=1270


patched2=# explain (analyze on, costs off)
|          select * from pu1 union select * from pu2 order by a, b limit 10;
|                             QUERY PLAN
| ---------------------------------------------------------------------------
|  Limit (actual time=0.209..0.234 rows=10 loops=1)
|  ->  Unique (actual time=0.206..0.230 rows=10 loops=1)
|   ->  Merge Append (actual time=0.204..0.216 rows=10 loops=1)
|        Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
|        ->  Index Scan..on pu1 (actual time=0.004..0.004 rows=0 loops=1)
|        ->  Index Scan..on cu11 (actual time=0.050..0.058 rows=10 loops=1)
|        ->  Index Scan..on cu12 (actual time=0.052..0.052 rows=1 loops=1)
|        ->  Index Scan..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
|        ->  Index Scan..on cu21 (actual time=0.046..0.046 rows=1 loops=1)
|        ->  Index Scan..on cu22 (actual time=0.046..0.046 rows=1 loops=1)
|  Total runtime: 0.627 ms

The attached union_uses_idx_v4_20131122.patch is changed as
follows.
- Rebased to current master.
- Scrapped the whold old stuff.
- flatten_simple_union_all() is renamed to  flatten_simple_union() and modified to do so.  UNION is  flattend only if
sortClauseexists and distinctClause is NIL.
 

======
Test tables can be created following the command below,

| create table pu1 (a int not null, b int not null, c int, d text);
| create unique index i_pu1_ab on pu1 (a, b);
| create table cu11 (like pu1 including all) inherits (pu1);
| create table cu12 (like pu1 including all) inherits (pu1);
| create table pu2 (like pu1 including all);
| create table cu21 (like pu2 including all) inherits (pu2);
| create table cu22 (like pu2 including all) inherits (pu2);
| insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
| insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
| insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
| insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
=====

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6670794..7262c1b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -353,13 +353,14 @@ subquery_planner(PlannerGlobal *glob, Query *parse,        pull_up_subqueries(root, (Node *)
parse->jointree);   /*
 
-     * If this is a simple UNION ALL query, flatten it into an appendrel. We
-     * do this now because it requires applying pull_up_subqueries to the leaf
-     * queries of the UNION ALL, which weren't touched above because they
-     * weren't referenced by the jointree (they will be after we do this).
+     * If this is a simple UNION/UNION ALL query, flatten it into an
+     * appendrel. We do this now because it requires applying
+     * pull_up_subqueries to the leaf queries of the UNION/UNION ALL, which
+     * weren't touched above because they weren't referenced by the jointree
+     * (they will be after we do this).     */    if (parse->setOperations)
-        flatten_simple_union_all(root);
+        flatten_simple_union(root);    /*     * Detect whether any rangetable entries are RTE_JOIN kind; if not, we
can
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 485ac31..adf341b 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -32,6 +32,7 @@#include "optimizer/subselect.h"#include "optimizer/tlist.h"#include "parser/parse_relation.h"
+#include "parser/parse_clause.h"#include "parser/parsetree.h"#include "rewrite/rewriteManip.h"
@@ -81,8 +82,8 @@ static void make_setop_translation_list(Query *query, Index newvarno,static bool
is_simple_subquery(Query*subquery, RangeTblEntry *rte,                   JoinExpr *lowest_outer_join);static bool
is_simple_union_all(Query*subquery);
 
-static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery,
-                            List *colTypes);
+static bool is_simple_union_recurse(SetOperationStmt *topop, Node *setOp,
+                                    Query *setOpQuery, List *colTypes);static bool is_safe_append_member(Query
*subquery);staticbool jointree_contains_lateral_outer_refs(Node *jtnode, bool restricted,
     Relids safe_upper_varnos);
 
@@ -1428,12 +1429,14 @@ is_simple_union_all(Query *subquery)        return false;    /* Recursively check the tree of
setoperations */
 
-    return is_simple_union_all_recurse((Node *) topop, subquery,
-                                       topop->colTypes);
+    if (topop->op != SETOP_UNION || !topop->all) return false;
+    return is_simple_union_recurse(topop, (Node *) topop, subquery,
+                                   topop->colTypes);}static bool
-is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)
+is_simple_union_recurse(SetOperationStmt *topop, Node *setOp,
+                        Query *setOpQuery, List *colTypes){    if (IsA(setOp, RangeTblRef))    {
@@ -1451,13 +1454,16 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)    {
SetOperationStmt*op = (SetOperationStmt *) setOp;
 
-        /* Must be UNION ALL */
-        if (op->op != SETOP_UNION || !op->all)
+        /* All SetOps under topop are UNION and ->all is same to topop */
+        if (op->op != SETOP_UNION || op->all != topop->all)            return false;        /* Recurse to check inputs
*/
-        return is_simple_union_all_recurse(op->larg, setOpQuery, colTypes) &&
-            is_simple_union_all_recurse(op->rarg, setOpQuery, colTypes);
+        return
+            is_simple_union_recurse(topop, op->larg,
+                                    setOpQuery, colTypes) &&
+            is_simple_union_recurse(topop, op->rarg,
+                                    setOpQuery, colTypes);    }    else    {
@@ -1896,23 +1902,22 @@ pullup_replace_vars_subquery(Query *query,                                           NULL);}
-/* * flatten_simple_union_all
- *        Try to optimize top-level UNION ALL structure into an appendrel
+ *        Try to optimize top-level UNION/UNION ALL structure into an appendrel *
- * If a query's setOperations tree consists entirely of simple UNION ALL
- * operations, flatten it into an append relation, which we can process more
- * intelligently than the general setops case.    Otherwise, do nothing.
+ * If a query's setOperations tree consists entirely of simple UNION and UNION
+ * ALL operations, flatten it into an append relation, which we can process
+ * more intelligently than the general setops case. Otherwise, do nothing. * * In most cases, this can succeed only
fora top-level query, because for a * subquery in FROM, the parent query's invocation of pull_up_subqueries would
 
- * already have flattened the UNION via pull_up_simple_union_all.  But there
+ * already have flattened the UNION ALL via pull_up_simple_union_all.  But there * are a few cases we can support here
butnot in that code path, for example * when the subquery also contains ORDER BY. */void
 
-flatten_simple_union_all(PlannerInfo *root)
+flatten_simple_union(PlannerInfo *root){    Query       *parse = root->parse;    SetOperationStmt *topop;
@@ -1932,10 +1937,18 @@ flatten_simple_union_all(PlannerInfo *root)        return;    /*
-     * Recursively check the tree of set operations.  If not all UNION ALL
+     * If topop is not UNION (not likey), punt. Also for UNION(not ALL)
+     * without sortClause or already having distinctClause.
+     */
+    if (topop->op != SETOP_UNION ||
+        (!topop->all && (!parse->sortClause || parse->distinctClause )))
+        return;
+
+    /*
+     * Recursively check the tree of set operations.  If not all UNION(ALL)     * with identical column types, punt.
 */
 
-    if (!is_simple_union_all_recurse((Node *) topop, parse, topop->colTypes))
+    if (!is_simple_union_recurse(topop, (Node *) topop, parse, topop->colTypes))        return;    /*
@@ -1983,6 +1996,16 @@ flatten_simple_union_all(PlannerInfo *root)    parse->setOperations = NULL;    /*
+     * We can create distinctClause using transformDistinctClause() with
+     * pstate == NULL.
+     */
+    if (!topop->all)
+        parse->distinctClause =
+            transformDistinctClause(NULL,
+                                    &parse->targetList, parse->sortClause,
+                                    false);
+
+    /*     * Build AppendRelInfo information, and apply pull_up_subqueries to the     * leaf queries of the UNION ALL.
(We must do that now because they     * weren't previously referenced by the jointree, and so were missed by
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..32b3bf4 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -24,7 +24,7 @@extern void pull_up_sublinks(PlannerInfo *root);extern void inline_set_returning_functions(PlannerInfo
*root);externNode *pull_up_subqueries(PlannerInfo *root, Node *jtnode);
 
-extern void flatten_simple_union_all(PlannerInfo *root);
+extern void flatten_simple_union(PlannerInfo *root);extern void reduce_outer_joins(PlannerInfo *root);extern Relids
get_relids_in_jointree(Node*jtnode, bool include_joins);extern Relids get_relids_for_join(PlannerInfo *root, int
joinrelid);

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

14 January 2014, 09:10:47

This is cont'd from CF3.

http://www.postgresql.org/message-id/20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp

The issue in brief is that UNION is never flattened differently
to UNION ALL so UNION cannot make use of index scan even if
usable.

This patch flattens UNION likewise currently did for UNION ALL.

This patch needs another 'UNION ALL' patch and further gain with
even another 'Widening application of indices' patch. ('Ready for
Commit' now in CF3..)

http://www.postgresql.org/message-id/20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp
http://www.postgresql.org/message-id/20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp


You can see the detailed outlines in the message at here,

http://www.postgresql.org/message-id/20131031.194251.12577697.horiguchi.kyotaro@lab.ntt.co.jp

The attached patche is rebased to current 9.4dev HEAD and make
check at the topmost directory and src/test/isolation are passed
without error.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

14 January 2014, 09:16:28

Sorry, I missed to attach file.

> This is cont'd from CF3.
> 
> http://www.postgresql.org/message-id/20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp
> 
> The issue in brief is that UNION is never flattened differently
> to UNION ALL so UNION cannot make use of index scan even if
> usable.
> 
> This patch flattens UNION likewise currently did for UNION ALL.
> 
> This patch needs another 'UNION ALL' patch and further gain with
> even another 'Widening application of indices' patch. ('Ready for
> Commit' now in CF3..)
> 
> http://www.postgresql.org/message-id/20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp
> http://www.postgresql.org/message-id/20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp
> 
> 
> You can see the detailed outlines in the message at here,
> 
> http://www.postgresql.org/message-id/20131031.194251.12577697.horiguchi.kyotaro@lab.ntt.co.jp
> 
> The attached patche is rebased to current 9.4dev HEAD and make
> check at the topmost directory and src/test/isolation are passed
> without error.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1da4b2f..e89f8b3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -353,13 +353,14 @@ subquery_planner(PlannerGlobal *glob, Query *parse,        pull_up_subqueries(root, (Node *)
parse->jointree);   /*
 
-     * If this is a simple UNION ALL query, flatten it into an appendrel. We
-     * do this now because it requires applying pull_up_subqueries to the leaf
-     * queries of the UNION ALL, which weren't touched above because they
-     * weren't referenced by the jointree (they will be after we do this).
+     * If this is a simple UNION/UNION ALL query, flatten it into an
+     * appendrel. We do this now because it requires applying
+     * pull_up_subqueries to the leaf queries of the UNION/UNION ALL, which
+     * weren't touched above because they weren't referenced by the jointree
+     * (they will be after we do this).     */    if (parse->setOperations)
-        flatten_simple_union_all(root);
+        flatten_simple_union(root);    /*     * Detect whether any rangetable entries are RTE_JOIN kind; if not, we
can
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 1c6083b..04bba48 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -32,6 +32,7 @@#include "optimizer/subselect.h"#include "optimizer/tlist.h"#include "parser/parse_relation.h"
+#include "parser/parse_clause.h"#include "parser/parsetree.h"#include "rewrite/rewriteManip.h"
@@ -81,8 +82,8 @@ static void make_setop_translation_list(Query *query, Index newvarno,static bool
is_simple_subquery(Query*subquery, RangeTblEntry *rte,                   JoinExpr *lowest_outer_join);static bool
is_simple_union_all(Query*subquery);
 
-static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery,
-                            List *colTypes);
+static bool is_simple_union_recurse(SetOperationStmt *topop, Node *setOp,
+                                    Query *setOpQuery, List *colTypes);static bool is_safe_append_member(Query
*subquery);staticbool jointree_contains_lateral_outer_refs(Node *jtnode, bool restricted,
     Relids safe_upper_varnos);
 
@@ -1440,12 +1441,14 @@ is_simple_union_all(Query *subquery)        return false;    /* Recursively check the tree of
setoperations */
 
-    return is_simple_union_all_recurse((Node *) topop, subquery,
-                                       topop->colTypes);
+    if (topop->op != SETOP_UNION || !topop->all) return false;
+    return is_simple_union_recurse(topop, (Node *) topop, subquery,
+                                   topop->colTypes);}static bool
-is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)
+is_simple_union_recurse(SetOperationStmt *topop, Node *setOp,
+                        Query *setOpQuery, List *colTypes){    if (IsA(setOp, RangeTblRef))    {
@@ -1463,13 +1466,16 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)    {
SetOperationStmt*op = (SetOperationStmt *) setOp;
 
-        /* Must be UNION ALL */
-        if (op->op != SETOP_UNION || !op->all)
+        /* All SetOps under topop are UNION and ->all is same to topop */
+        if (op->op != SETOP_UNION || op->all != topop->all)            return false;        /* Recurse to check inputs
*/
-        return is_simple_union_all_recurse(op->larg, setOpQuery, colTypes) &&
-            is_simple_union_all_recurse(op->rarg, setOpQuery, colTypes);
+        return
+            is_simple_union_recurse(topop, op->larg,
+                                    setOpQuery, colTypes) &&
+            is_simple_union_recurse(topop, op->rarg,
+                                    setOpQuery, colTypes);    }    else    {
@@ -1908,23 +1914,22 @@ pullup_replace_vars_subquery(Query *query,                                           NULL);}
-/* * flatten_simple_union_all
- *        Try to optimize top-level UNION ALL structure into an appendrel
+ *        Try to optimize top-level UNION/UNION ALL structure into an appendrel *
- * If a query's setOperations tree consists entirely of simple UNION ALL
- * operations, flatten it into an append relation, which we can process more
- * intelligently than the general setops case.    Otherwise, do nothing.
+ * If a query's setOperations tree consists entirely of simple UNION and UNION
+ * ALL operations, flatten it into an append relation, which we can process
+ * more intelligently than the general setops case. Otherwise, do nothing. * * In most cases, this can succeed only
fora top-level query, because for a * subquery in FROM, the parent query's invocation of pull_up_subqueries would
 
- * already have flattened the UNION via pull_up_simple_union_all.  But there
+ * already have flattened the UNION ALL via pull_up_simple_union_all.  But there * are a few cases we can support here
butnot in that code path, for example * when the subquery also contains ORDER BY. */void
 
-flatten_simple_union_all(PlannerInfo *root)
+flatten_simple_union(PlannerInfo *root){    Query       *parse = root->parse;    SetOperationStmt *topop;
@@ -1944,10 +1949,18 @@ flatten_simple_union_all(PlannerInfo *root)        return;    /*
-     * Recursively check the tree of set operations.  If not all UNION ALL
+     * If topop is not UNION (not likey), punt. Also for UNION(not ALL)
+     * without sortClause or already having distinctClause.
+     */
+    if (topop->op != SETOP_UNION ||
+        (!topop->all && (!parse->sortClause || parse->distinctClause )))
+        return;
+
+    /*
+     * Recursively check the tree of set operations.  If not all UNION(ALL)     * with identical column types, punt.
 */
 
-    if (!is_simple_union_all_recurse((Node *) topop, parse, topop->colTypes))
+    if (!is_simple_union_recurse(topop, (Node *) topop, parse, topop->colTypes))        return;    /*
@@ -1995,6 +2008,16 @@ flatten_simple_union_all(PlannerInfo *root)    parse->setOperations = NULL;    /*
+     * We can create distinctClause using transformDistinctClause() with
+     * pstate == NULL.
+     */
+    if (!topop->all)
+        parse->distinctClause =
+            transformDistinctClause(NULL,
+                                    &parse->targetList, parse->sortClause,
+                                    false);
+
+    /*     * Build AppendRelInfo information, and apply pull_up_subqueries to the     * leaf queries of the UNION ALL.
(We must do that now because they     * weren't previously referenced by the jointree, and so were missed by
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..32b3bf4 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -24,7 +24,7 @@extern void pull_up_sublinks(PlannerInfo *root);extern void inline_set_returning_functions(PlannerInfo
*root);externNode *pull_up_subqueries(PlannerInfo *root, Node *jtnode);
 
-extern void flatten_simple_union_all(PlannerInfo *root);
+extern void flatten_simple_union(PlannerInfo *root);extern void reduce_outer_joins(PlannerInfo *root);extern Relids
get_relids_in_jointree(Node*jtnode, bool include_joins);extern Relids get_relids_for_join(PlannerInfo *root, int
joinrelid);

Re: Using indices for UNION.

From

Andres Freund

Date:

04 April 2014, 16:27:55

Hi,

On 2014-01-14 18:10:40 +0900, Kyotaro HORIGUCHI wrote:
> This is cont'd from CF3.
> 
> http://www.postgresql.org/message-id/20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp
> 
> The issue in brief is that UNION is never flattened differently
> to UNION ALL so UNION cannot make use of index scan even if
> usable.
> 
> This patch flattens UNION likewise currently did for UNION ALL.
> 
> This patch needs another 'UNION ALL' patch and further gain with
> even another 'Widening application of indices' patch. ('Ready for
> Commit' now in CF3..)
> 
> http://www.postgresql.org/message-id/20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp
> http://www.postgresql.org/message-id/20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp
> 
> 
> You can see the detailed outlines in the message at here,
> 
> http://www.postgresql.org/message-id/20131031.194251.12577697.horiguchi.kyotaro@lab.ntt.co.jp
> 
> The attached patche is rebased to current 9.4dev HEAD and make
> check at the topmost directory and src/test/isolation are passed
> without error.

I haven't really followed this topic, so please excuse my ignorance.

This is still marked as "needs review" in
https://commitfest.postgresql.org/action/patch_view?id=1374 . But I am
not sure the patch as is is relevant after
a87c729153e372f3731689a7be007bc2b53f1410?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Using indices for UNION.

From

Tom Lane

Date:

04 April 2014, 17:14:52

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-01-14 18:10:40 +0900, Kyotaro HORIGUCHI wrote:
>> This patch flattens UNION likewise currently did for UNION ALL.

> I haven't really followed this topic, so please excuse my ignorance.

> This is still marked as "needs review" in
> https://commitfest.postgresql.org/action/patch_view?id=1374 . But I am
> not sure the patch as is is relevant after
> a87c729153e372f3731689a7be007bc2b53f1410?

I think it's an independent issue.

After a quick read of the patch (which is really badly explained :-()
I think the idea is that a nest of UNIONs with no datatype conversions
can be flattened into a UNION ALL appendrel, with the required duplicate
elimination handled by sticking a DISTINCT onto the top level.

However, it's not clear to me that this is worth the trouble.  The
DISTINCT would act as an optimization fence preventing the subquery from
being flattened any further, so it doesn't seem like there would be any
global benefit just because it now contains a simple appendrel rather than
a setop tree.  And a nest of conversion-free UNIONs already results in a
plan that's a flat Append followed by sort/uniq, which seems like the same
thing you'd get from the DISTINCT.  So what's the point?
        regards, tom lane

Re: Using indices for UNION.

From

Tom Lane

Date:

04 April 2014, 19:36:36

I wrote:
> However, it's not clear to me that this is worth the trouble.  The
> DISTINCT would act as an optimization fence preventing the subquery from
> being flattened any further, so it doesn't seem like there would be any
> global benefit just because it now contains a simple appendrel rather than
> a setop tree.  And a nest of conversion-free UNIONs already results in a
> plan that's a flat Append followed by sort/uniq, which seems like the same
> thing you'd get from the DISTINCT.  So what's the point?

Oh, after re-reading the earlier part of the thread I get the point ---
after making this change, the planner will consider some plan types for
the DISTINCT that it wouldn't have found in the setop-tree planning path.
Specifically it can make use of a mergeappend of sorted paths for the
individual union leaf queries.  That can't happen in the generic setop
planning code because we have no ability to consider more than one plan
for any leaf query.

I still think this stuff mostly needs to be thrown away and rewritten
in Path-creation style, but that's a long-term project.  In the meantime
this seems like a relatively small increment of complexity, so maybe it's
worth applying.  I'm concerned about the method for building a new
DISTINCT clause though; the best that can be said for that is that it's
a crude hack, and I'm less than convinced that there are no cases where
it'll dump core.
        regards, tom lane

Re: Using indices for UNION.

From

Tom Lane

Date:

04 April 2014, 20:58:04

I wrote:
> I still think this stuff mostly needs to be thrown away and rewritten
> in Path-creation style, but that's a long-term project.  In the meantime
> this seems like a relatively small increment of complexity, so maybe it's
> worth applying.  I'm concerned about the method for building a new
> DISTINCT clause though; the best that can be said for that is that it's
> a crude hack, and I'm less than convinced that there are no cases where
> it'll dump core.

OK, after still more time thinking about it, here's the issues with that
way of generating a DISTINCT clause corresponding to the UNION:

1. Calling transformDistinctClause with a NULL pstate seems pretty unsafe.
It might accidentally fail to fail, today, but it's surely fragile.
It's violating the API contract not only of transformDistinctClause
itself (which is not documented as accepting a NULL pstate) but of
addTargetToGroupList (q.v.).  A minimum requirement before I'd accept a
patch that did that is that it extended the header comments of those
functions to specify under what circumstances a NULL pstate can be passed.
However, that's not the direction to go, because of #2.

2. This approach potentially changes the semantics of the UNION.  (This is
only important for a datatype that has more than one btree equality
operator, but let's posit that.)  transformDistinctClause, as per its
header comment, allows the outer ORDER BY to influence which equality
operator it picks for DISTINCT.  However, in the case of
"(SELECT ... UNION SELECT ...) ORDER BY ...", the UNION semantics were
chosen without reference to the ORDER BY, so it's possible that the
equality operators cited in the SetOperationStmt's groupClauses list are
not what you'd get from applying transformDistinctClause as the patch does.
It is not okay for the planner to override the parser's choice of
semantics like that.

Now I'm fairly sure that planner.c is expecting that the query's
distinctClause be a superset of the sortClause if any (cf comments for
SortGroupClause in parsenodes.h), so it wouldn't be okay to just blindly
build a distinctClause from topop->groupClauses.  I think what you need
to do is check that topop->groupClauses is compatible with the sortClause
if any (skipping the flattening transformation if not) and then build a
distinctClause by extending the sort clause list with any missing items
from topop->groupClauses.  So this is sort of like what
transformDistinctClause does, but different in detail, and the failure
case is to not do the transformation, rather than ereport'ing.  (See also
generate_setop_grouplist, which you could almost use except that it's not
expecting to have to merge with a sortClause list.)

So this is all doable enough, but you're probably going to end up with
a new distinctClause-generating function that's at least twice the size
of the patch's former net code addition.  So the question is whether it's
worth the trouble, considering that all this code has (I hope) a life
expectancy of no more than one or two more release cycles.

I'm going to set the patch back to Waiting on Author, but unless you
are prepared to rewrite the distinctClause creation code in the next
couple of days, you should change it to Returned with Feedback.
        regards, tom lane

Re: Using indices for UNION.

From

Kyotaro HORIGUCHI

Date:

11 April 2014, 08:44:06

Hello. Thank you for the attentive comments.

> I wrote:
> > I still think this stuff mostly needs to be thrown away and rewritten
> > in Path-creation style, but that's a long-term project.  In the meantime
> > this seems like a relatively small increment of complexity, so maybe it's
> > worth applying.  I'm concerned about the method for building a new
> > DISTINCT clause though; the best that can be said for that is that it's
> > a crude hack, and I'm less than convinced that there are no cases where
> > it'll dump core.
> 
> OK, after still more time thinking about it, here's the issues with that
> way of generating a DISTINCT clause corresponding to the UNION:
> 
> 1. Calling transformDistinctClause with a NULL pstate seems pretty unsafe.
> It might accidentally fail to fail, today, but it's surely fragile.
> It's violating the API contract not only of transformDistinctClause
> itself (which is not documented as accepting a NULL pstate) but of
> addTargetToGroupList (q.v.). 

Hmm I'm ashamed to have missed addTargetToGroupList. You are
right about that. I saw only parser_errposition to field the
NULL.. It should be fixed anyway.

>  A minimum requirement before I'd accept a
> patch that did that is that it extended the header comments of those
> functions to specify under what circumstances a NULL pstate can be passed.
> However, that's not the direction to go, because of #2.
> 
> 2. This approach potentially changes the semantics of the UNION.  (This is
> only important for a datatype that has more than one btree equality
> operator, but let's posit that.)  transformDistinctClause, as per its
> header comment, allows the outer ORDER BY to influence which equality
> operator it picks for DISTINCT.  However, in the case of
> "(SELECT ... UNION SELECT ...) ORDER BY ...", the UNION semantics were
> chosen without reference to the ORDER BY,

If my understanding is correct, the query is equivalent to it
without the parens. The ORDER BY belongs to the topmost UNION on
both cases. Actually planner compailns about "(SELECT...UNION
SELECT ... ORDER BY) ORDER BY" that "multiple ORDER BY clauses
not allowed" with/without this patch.

> so it's possible that the
> equality operators cited in the SetOperationStmt's groupClauses list are
> not what you'd get from applying transformDistinctClause as the patch does.

This patch flattenes and the SetOperationStmt was put out along
with its groupClause. But the groupClauses is originally
generated from targetlist in transformSetOperationTree. And the
new DistinctClause is generated also from the same targetlist
consulting to sortClause. They are not in the same order but it
doesn't matter. Is it wrong? If it would make some trouble, I
could add an check it out or count it on making the
DistinctClauses..

Finally, 
explain (costs off) select a, b from c11 union select a, b from c12 order by  b limit 10000;
|                QUERY PLAN                
| -----------------------------------------
|  Limit
|    ->  Sort
|          Sort Key: c11.b
|          ->  HashAggregate
|                Group Key: c11.b, c11.a
|                ->  Append
|                      ->  Seq Scan on c11
|                      ->  Seq Scan on c12

This HashAggregate does DISTINCT which was performed by using
Sort-Unique without this patch, and yields the same result, I
believe.

> It is not okay for the planner to override the parser's choice of
> semantics like that.

As described above, as far as I saw, itself seems to be a problem.

> Now I'm fairly sure that planner.c is expecting that the query's
> distinctClause be a superset of the sortClause if any (cf comments for
> SortGroupClause in parsenodes.h), so it wouldn't be okay to just blindly
> build a distinctClause from topop->groupClauses.

Yes, it could be but counting the sortClause into distinctClause
is crucial factor for getting rid of unnecessary sorting.

>  I think what you need
> to do is check that topop->groupClauses is compatible with the sortClause
> if any (skipping the flattening transformation if not) and then build a
> distinctClause by extending the sort clause list with any missing items
> from topop->groupClauses.

Ah, yes I agree as I described above.

> So this is sort of like what
> transformDistinctClause does, but different in detail, and the failure
> case is to not do the transformation, rather than ereport'ing.  (See also
> generate_setop_grouplist, which you could almost use except that it's not
> expecting to have to merge with a sortClause list.)

Thank you. I'll follow this advise.

> So this is all doable enough, but you're probably going to end up with
> a new distinctClause-generating function that's at least twice the size
> of the patch's former net code addition.  So the question is whether it's
> worth the trouble, considering that all this code has (I hope) a life
> expectancy of no more than one or two more release cycles.

Hmm. Since this is a kind of local optimization, some future
drastic reconstructing might make this useless. But it doesn't
seem the reason not to do this. (Ignoring the size..)

> I'm going to set the patch back to Waiting on Author, but unless you
> are prepared to rewrite the distinctClause creation code in the next
> couple of days, you should change it to Returned with Feedback.

Ok, I agree to the deal. This needs more refinement.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center