Thread: UNION ALL on partitioned tables won't use indices.

UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, Sorry that it's been a while..

1. Observed symptom

As you know, UNION ALL accompanied with ORDER BY uses indexes if
available.

> uniontest=# EXPLAIN SELECT * FROM c11 UNION ALL SELECT * FROM c12 ORDER BY a;
>                                QUERY PLAN                                
> ---------------------------------------------------------------------------
>  Merge Append  (cost=0.59..10214.10 rows=200000 width=16)
>    Sort Key: c11.a
>    ->  Index Scan using ... on c11  (cost=0.29..3857.04 rows=100000 width=16)
>    ->  Index Scan using ... on c12  (cost=0.29..3857.04 rows=100000 width=16)

So do simple queries on partitioned (inheritance) tables.

> uniontest=# EXPLAIN SELECT * FROM p1 ORDER BY a;
>                                 QUERY PLAN                                
> ------------------------------------------------------------------------------
>  Merge Append  (cost=0.73..11392.19 rows=200001 width=16)
>    Sort Key: p1.a
>    ->  Index Scan using ... on p1  (cost=0.12..8.14 rows=1 width=44)
>    ->  Index Scan using ... on c11  (cost=0.29..3857.04 rows=100000 width=16)
>    ->  Index Scan using ... on c12  (cost=0.29..3857.04 rows=100000 width=16)

Nevertheless, UNION ALL on partitioned tables doesn't. This is
quite unfavourable behavior especially having LIMIT.

>uniontest=# EXPLAIN ANALYZE SELECT * FROM p1
>            UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;
>                              QUERY PLAN                       
>------------------------------------------------------------------------
> Limit   (actual time=182.732..182.735 rows=10 loops=1)
>   ->  Sort  (actual time=182.729..182.730 rows=10 loops=1)
>     Sort Key: p1.a
>     Sort Method: top-N heapsort  Memory: 25kB
>     ->  Append  (actual time=0.029..108.109 rows=400000 loops=1)
>       ->  Seq Scan on p1  (actual time=0.001..0.001 rows=0 loops=1)
>       ->  Seq Scan on c11  (actual time=0.027..19.074 rows=100000 loops=1)
>       ->  Seq Scan on c12  (actual time=0.014..16.653 rows=100000 loops=1)
>       ->  Seq Scan on p2  (actual time=0.000..0.000 rows=0 loops=1)
>       ->  Seq Scan on c21  (actual time=0.012..16.677 rows=100000 loops=1)
>       ->  Seq Scan on c22  (actual time=0.012..16.794 rows=100000 loops=1)
> Total runtime: 182.857 ms


2. The cause

In grouping_planner, flattern_simple_union_all creates
appendrelinfos for each subqueries then expand_inherited_tables
furthur expands the parent tables in each subqueries. Finally,
this sequence leaves following structure. Where rte[2] and [3]
are subqueries abandoned on the way pulling up rte[4] and [5].

rte[1] Subquery "SELECT*1", inh = 1  +- appinfo[0] - rte[4] Relation "p1", inh = 1  |               +- appinfo[2] -
rte[6] Relation "p1", inh = 0  |               +- appinfo[3] - rte[7]  Relation "c11", inh = 0  |               +-
appinfo[4]- rte[8]  Relation "c12", inh = 0  +- appinfo[1] - rte[5] Relation "p2", inh = 1                  +-
appinfo[5]- rte[9]  Relation "p1", inh = 0                  +- appinfo[6] - rte[10] Relation "c11", inh = 0
    +- appinfo[7] - rte[11] Relation "c12", inh = 0
 

On the other hand, ec member finally has exprs only for varno =
1, 4 and 5, in other words, it lacks the members for the most
descendant RTEs.  This is because add_child_rel_equivalences()
always inhibits add_eq_member from registering the child_rel's
relid on EC. Conequently these grandchild relations does not find
index pathkeys for themselves.


3. Measures

I could thought up three approaches for the problem.

One is to simplly modifying here to give child flag in the
parameters of add_eq_member accordig to whether the child_rel is
inh or not. This seems to work fine although leaves two levels of
MergeAppends (PATCH-1). So the additional patch is attached to
collapse these MergeAppends (PATCH-2). This gives the same plan
as PATCH-3.

> uniontest=# explain analyze select * from p1 union all
>                             select * from p2 order by a limit 10;
>                                QUERY PLAN                       
> ------------------------------------------------------------------------
>  Limit  (actual time=0.208..0.224 rows=10 loops=1)
>    ->  Merge Append  (actual time=0.205..0.218 rows=10 loops=1)
>      Sort Key: p1.a
>      ->  Merge Append  (actual time=0.110..0.120 rows=10 loops=1)
>        Sort Key: p1.a
>        ->  Index Scan .. on p1  (actual time=0.006..0.006 rows=0 loops=1)
>        ->  Index Scan .. on c11  (actual time=0.054..0.060 rows=10 loops=1)
>        ->  Index Scan .. on c12  (actual time=0.046..0.046 rows=1 loops=1)
>      ->  Merge Append  (actual time=0.093..0.093 rows=1 loops=1)
>        Sort Key: p2.a
>        ->  Index Scan .. on p2  (actual time=0.002..0.002 rows=0 loops=1)
>        ->  Index Scan .. on c21  (actual time=0.047..0.047 rows=1 loops=1)
>        ->  Index Scan .. on c22  (actual time=0.043..0.043 rows=1 loops=1)
>  Total runtime: 0.448 ms


The second is to collapse the appendrel structure shown above to
have only single level inheritance. This also seems to work
fine. This makes the plan with single level
MergeAppend. (PATCH-3)

> uniontest=# EXPLAIN ANALYZE SELECT * FROM p1 UNION ALL
>             SELECT * FROM p2 ORDER BY a LIMIT 10;
>                                   QUERY PLAN                 
> -------------------------------------------------------------------
>  Limit  (actual time=0.095..0.107 rows=10 loops=1)
>    ->  Merge Append  (actual time=0.092..0.102 rows=10 loops=1)
>      Sort Key: p1.a
>      ->  Index Scan ... on p1 (actual time=0.005..0.005 rows=0 loops=1)
>      ->  Index Scan ... on c11  (actual time=0.023..0.030 rows=10 loops=1)
>      ->  Index Scan ... on c12  (actual time=0.020..0.020 rows=1 lops=1)
>      ->  Index Scan ... on p2  (actual time=0.001..0.001 rows=0 loops=1)
>      ->  Index Scan ... on c21  (actual time=0.019..0.019 rows=1 loops=1)
>      ->  Index Scan ... on c22  (actual time=0.019..0.019 rows=1 loops=1)
>  Total runtime: 0.241 ms



The last one is most strait-forward in some sense, and conversely
is most ad hoc measure. Modifying expand_inherited_rtentry() to
pull up adding appendrels if needed, using the similar manner to
PATCH-3. This is added as PATCH-4.


what do you think about this?


Four patches following are attached to this message.

1. union_inh_idx_typ1_v1_20131024.patch     .. PATCH-1 described above.
2. union_inh_idx_typ1_add_v1_20131024.patch .. PATCH-2 described above.
3. union_inh_idx_typ2_v1_20131024.patch     .. PATCH-3 described above.
4. union_inh_idx_typ3_v1_20131024.patch     .. PATCH-4 described above.

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 711b161..734ed47 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1940,6 +1940,7 @@ add_child_rel_equivalences(PlannerInfo *root,                Expr       *child_expr;
 Relids        new_relids;                Relids        new_nullable_relids;
 
+                bool        has_children = false;                child_expr = (Expr *)
adjust_appendrel_attrs(root,
@@ -1969,9 +1970,15 @@ add_child_rel_equivalences(PlannerInfo *root,
     child_rel->relids);                }
 
+                /*
+                 * Does this child_rel have children? If and only if so, tell
+                 * add_eq_member to register new_relids to cur_ec.
+                 */
+                has_children =
+                    root->simple_rte_array[child_rel->relid]->inh;                (void) add_eq_member(cur_ec,
child_expr,                                    new_relids, new_nullable_relids,
 
-                                     true, cur_em->em_datatype);
+                                     !has_children, cur_em->em_datatype);            }        }    }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b1705..0e3cf4b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -944,7 +944,8 @@ create_merge_append_path(PlannerInfo *root,    MergeAppendPath *pathnode =
makeNode(MergeAppendPath);   Cost        input_startup_cost;    Cost        input_total_cost;
 
-    ListCell   *l;
+    ListCell   *l, *lm;
+    bool        collapse_subpaths = false;    pathnode->path.pathtype = T_MergeAppend;    pathnode->path.parent =
rel;
@@ -953,6 +954,47 @@ create_merge_append_path(PlannerInfo *root,    pathnode->path.pathkeys = pathkeys;
pathnode->subpaths= subpaths;
 
+
+    /*
+     * If subpaths containes MergeAppendPaths already ordered on the pathkeys
+     * of the creating node, they can be expanded onto this node.
+     */
+    foreach (lm, subpaths)
+    {
+        Path *spath = (Path *) lfirst(lm);
+
+        if (IsA(spath, MergeAppendPath) &&
+            pathkeys_contained_in(pathkeys, spath->pathkeys))
+        {
+            collapse_subpaths = true;
+            break;
+        }
+    }
+
+    if (collapse_subpaths)
+    {
+        subpaths = NIL;
+
+        foreach (lm, pathnode->subpaths)
+        {
+            MergeAppendPath *mpath = (MergeAppendPath *) lfirst(lm);
+            ListCell *lcm;
+            
+            if (IsA(mpath, MergeAppendPath) &&
+                pathkeys_contained_in(pathkeys, mpath->path.pathkeys))
+            {
+                foreach (lcm, mpath->subpaths)
+                {
+                    Path *smpath = (Path*) lfirst (lcm);
+                    subpaths = lappend(subpaths, smpath);
+                }
+            }
+            else
+                subpaths = lappend(subpaths, subpaths);
+        }
+        pathnode->subpaths = subpaths;
+    }
+    /*     * Apply query-wide LIMIT if known and path is for sole base relation.     * (Handling this at this low
levelis a bit klugy.) 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8aa35d..8167583 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse,    expand_inherited_tables(root);    /*
+     * Collapse multilevel inheritances into fewer levels.
+     *
+     * UNION ALL containing subselects on inherited tables leaves multilevel
+     * inheritance after calling expand_inherited_tables().  Fold them in
+     * order to make MergeAppend plan available for such queries.
+     */
+    collapse_appendrels(root);
+
+    /*     * Set hasHavingQual to remember if HAVING clause is present.  Needed     * because preprocess_expression
willreduce a constant-true condition to     * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..c7933b5 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index              target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1211,6 +1218,131 @@ expand_inherited_tables(PlannerInfo *root)}/*
+ * collapse_appendrels
+ *      Pulling up the descendents by recursively combining successive
+ *      translations.
+ *
+ *      Although the same thing could be done in adjust_appendrel_attrs(),
+ *      doing it here all through at once is more efficient than individually
+ *      (and maybe repetitively) done there.
+ */
+void
+collapse_appendrels(PlannerInfo *root)
+{
+    Index               last_parent_relid = 1;
+    AppendRelInfo    *last_parent = NULL;
+    ListCell         *lcchild;
+    ListCell         *lcparent;
+    bool             full_search = false;
+
+    foreach(lcchild, root->append_rel_list)
+    {
+        AppendRelInfo    *ch_appinfo         = (AppendRelInfo *)lfirst(lcchild);
+        Index             ch_parent_relid = ch_appinfo->parent_relid;
+        
+        if (last_parent_relid != ch_parent_relid)
+        {
+            /*
+             * Find the parent for the current child appinfo if parent relid
+             * is different from that of preveous child.
+             */
+            do
+            {
+                /*
+                 * For most cases, the relations are referred to as the parent
+                 * in their apperarence order in rtable from
+                 * append_rel_list. So start searching for the parent appinfos
+                 * from just after the last parent. If the incremental search
+                 * was failed, retry searching the entire list and exit on
+                 * failure.
+                 */
+                if (!last_parent)
+                {
+                    /*
+                     * If this is the first time or the preveous search was
+                     * failed, set up for full search.
+                     */
+                    lcparent = list_head(root->append_rel_list);
+                    last_parent_relid = 1;
+                    full_search = true;
+                }
+
+                last_parent = NULL;
+                for_each_cell(lcparent, lcparent)
+                {
+                    /*
+                     * Children's and parents' apprelinfos are bonded via
+                     * rtable relations. So two apprelinfos are in
+                     * parent-child relationship when the child_relid of the
+                     * parent is equal to the parent_relid of the child.
+                     */
+                    AppendRelInfo *papp = (AppendRelInfo*)lfirst(lcparent);
+                    if (papp->child_relid == ch_parent_relid)
+                    {
+                        last_parent = papp;
+                        last_parent_relid = ch_parent_relid;
+                        
+                        /* Search from the next cell next time. */
+                        lcparent = lnext(lcparent);
+                        full_search = false;
+                        break;        /* Parent found */
+                    }
+                }
+                /* Retry only when incremental search was failed. */
+            } while (!full_search && !last_parent);
+        }
+
+        /*
+         * Replace child translated_vars so as to be a direct children of the
+         * topmost relation.
+         */
+        if (last_parent)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = ch_appinfo;
+            ctx.target_rti  = last_parent->child_relid;
+
+            ch_appinfo->translated_vars =
+                (List*)expression_tree_mutator(
+                    (Node*)last_parent->translated_vars,
+                    transvars_merge_mutator, &ctx);
+            ch_appinfo->parent_relid = last_parent->parent_relid;
+            ch_appinfo->parent_reltype = last_parent->parent_reltype;
+        }
+    }
+}
+
+/*
+ * Replace Var nodes with corresponding child nodes.
+ */
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            return (Node*)copyObject(list_nth(ctx->child_appinfo->translated_vars,
+                                              oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set. *        If so,
addentries for all the child tables to the query's
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..7a83938 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,7 @@ extern Plan *plan_set_operations(PlannerInfo *root, double tuple_fraction,                    List
**sortClauses);externvoid expand_inherited_tables(PlannerInfo *root);
 
+extern void collapse_appendrels(PlannerInfo *root);extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                    AppendRelInfo *appinfo); 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..3cba2a1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                    list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1272,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1337,21 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * furthur parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1378,6 +1429,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            appinfo->parent_relid = parent_appinfo->parent_relid;
+            appinfo->parent_reltype = parent_appinfo->parent_reltype;
+            appinfo->parent_reloid = parent_appinfo->parent_reloid;
+            appinfo->translated_vars = (List*)expression_tree_mutator(
+                parent_appinfo->translated_vars,
+                transvars_merge_mutator, &ctx);
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)

Re: UNION ALL on partitioned tables won't use indices.

From
Robert Haas
Date:
On Thu, Oct 24, 2013 at 6:39 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, Sorry that it's been a while..
>
> 1. Observed symptom
>
> As you know, UNION ALL accompanied with ORDER BY uses indexes if
> available.
>
>> uniontest=# EXPLAIN SELECT * FROM c11 UNION ALL SELECT * FROM c12 ORDER BY a;
>>                                QUERY PLAN
>> ---------------------------------------------------------------------------
>>  Merge Append  (cost=0.59..10214.10 rows=200000 width=16)
>>    Sort Key: c11.a
>>    ->  Index Scan using ... on c11  (cost=0.29..3857.04 rows=100000 width=16)
>>    ->  Index Scan using ... on c12  (cost=0.29..3857.04 rows=100000 width=16)
>
> So do simple queries on partitioned (inheritance) tables.
>
>> uniontest=# EXPLAIN SELECT * FROM p1 ORDER BY a;
>>                                 QUERY PLAN
>> ------------------------------------------------------------------------------
>>  Merge Append  (cost=0.73..11392.19 rows=200001 width=16)
>>    Sort Key: p1.a
>>    ->  Index Scan using ... on p1  (cost=0.12..8.14 rows=1 width=44)
>>    ->  Index Scan using ... on c11  (cost=0.29..3857.04 rows=100000 width=16)
>>    ->  Index Scan using ... on c12  (cost=0.29..3857.04 rows=100000 width=16)
>
> Nevertheless, UNION ALL on partitioned tables doesn't. This is
> quite unfavourable behavior especially having LIMIT.
>
>>uniontest=# EXPLAIN ANALYZE SELECT * FROM p1
>>            UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;
>>                              QUERY PLAN
>>------------------------------------------------------------------------
>> Limit   (actual time=182.732..182.735 rows=10 loops=1)
>>   ->  Sort  (actual time=182.729..182.730 rows=10 loops=1)
>>     Sort Key: p1.a
>>     Sort Method: top-N heapsort  Memory: 25kB
>>     ->  Append  (actual time=0.029..108.109 rows=400000 loops=1)
>>       ->  Seq Scan on p1  (actual time=0.001..0.001 rows=0 loops=1)
>>       ->  Seq Scan on c11  (actual time=0.027..19.074 rows=100000 loops=1)
>>       ->  Seq Scan on c12  (actual time=0.014..16.653 rows=100000 loops=1)
>>       ->  Seq Scan on p2  (actual time=0.000..0.000 rows=0 loops=1)
>>       ->  Seq Scan on c21  (actual time=0.012..16.677 rows=100000 loops=1)
>>       ->  Seq Scan on c22  (actual time=0.012..16.794 rows=100000 loops=1)
>> Total runtime: 182.857 ms
>
>
> 2. The cause
>
> In grouping_planner, flattern_simple_union_all creates
> appendrelinfos for each subqueries then expand_inherited_tables
> furthur expands the parent tables in each subqueries. Finally,
> this sequence leaves following structure. Where rte[2] and [3]
> are subqueries abandoned on the way pulling up rte[4] and [5].
>
> rte[1] Subquery "SELECT*1", inh = 1
>    +- appinfo[0] - rte[4] Relation "p1", inh = 1
>    |               +- appinfo[2] - rte[6]  Relation "p1", inh = 0
>    |               +- appinfo[3] - rte[7]  Relation "c11", inh = 0
>    |               +- appinfo[4] - rte[8]  Relation "c12", inh = 0
>    +- appinfo[1] - rte[5] Relation "p2", inh = 1
>                    +- appinfo[5] - rte[9]  Relation "p1", inh = 0
>                    +- appinfo[6] - rte[10] Relation "c11", inh = 0
>                    +- appinfo[7] - rte[11] Relation "c12", inh = 0
>
> On the other hand, ec member finally has exprs only for varno =
> 1, 4 and 5, in other words, it lacks the members for the most
> descendant RTEs.  This is because add_child_rel_equivalences()
> always inhibits add_eq_member from registering the child_rel's
> relid on EC. Conequently these grandchild relations does not find
> index pathkeys for themselves.
>
>
> 3. Measures
>
> I could thought up three approaches for the problem.
>
> One is to simplly modifying here to give child flag in the
> parameters of add_eq_member accordig to whether the child_rel is
> inh or not. This seems to work fine although leaves two levels of
> MergeAppends (PATCH-1). So the additional patch is attached to
> collapse these MergeAppends (PATCH-2). This gives the same plan
> as PATCH-3.
>
>> uniontest=# explain analyze select * from p1 union all
>>                             select * from p2 order by a limit 10;
>>                                QUERY PLAN
>> ------------------------------------------------------------------------
>>  Limit  (actual time=0.208..0.224 rows=10 loops=1)
>>    ->  Merge Append  (actual time=0.205..0.218 rows=10 loops=1)
>>      Sort Key: p1.a
>>      ->  Merge Append  (actual time=0.110..0.120 rows=10 loops=1)
>>        Sort Key: p1.a
>>        ->  Index Scan .. on p1  (actual time=0.006..0.006 rows=0 loops=1)
>>        ->  Index Scan .. on c11  (actual time=0.054..0.060 rows=10 loops=1)
>>        ->  Index Scan .. on c12  (actual time=0.046..0.046 rows=1 loops=1)
>>      ->  Merge Append  (actual time=0.093..0.093 rows=1 loops=1)
>>        Sort Key: p2.a
>>        ->  Index Scan .. on p2  (actual time=0.002..0.002 rows=0 loops=1)
>>        ->  Index Scan .. on c21  (actual time=0.047..0.047 rows=1 loops=1)
>>        ->  Index Scan .. on c22  (actual time=0.043..0.043 rows=1 loops=1)
>>  Total runtime: 0.448 ms
>
>
> The second is to collapse the appendrel structure shown above to
> have only single level inheritance. This also seems to work
> fine. This makes the plan with single level
> MergeAppend. (PATCH-3)
>
>> uniontest=# EXPLAIN ANALYZE SELECT * FROM p1 UNION ALL
>>             SELECT * FROM p2 ORDER BY a LIMIT 10;
>>                                   QUERY PLAN
>> -------------------------------------------------------------------
>>  Limit  (actual time=0.095..0.107 rows=10 loops=1)
>>    ->  Merge Append  (actual time=0.092..0.102 rows=10 loops=1)
>>      Sort Key: p1.a
>>      ->  Index Scan ... on p1 (actual time=0.005..0.005 rows=0 loops=1)
>>      ->  Index Scan ... on c11  (actual time=0.023..0.030 rows=10 loops=1)
>>      ->  Index Scan ... on c12  (actual time=0.020..0.020 rows=1 lops=1)
>>      ->  Index Scan ... on p2  (actual time=0.001..0.001 rows=0 loops=1)
>>      ->  Index Scan ... on c21  (actual time=0.019..0.019 rows=1 loops=1)
>>      ->  Index Scan ... on c22  (actual time=0.019..0.019 rows=1 loops=1)
>>  Total runtime: 0.241 ms
>
>
>
> The last one is most strait-forward in some sense, and conversely
> is most ad hoc measure. Modifying expand_inherited_rtentry() to
> pull up adding appendrels if needed, using the similar manner to
> PATCH-3. This is added as PATCH-4.
>
>
> what do you think about this?
>
>
> Four patches following are attached to this message.
>
> 1. union_inh_idx_typ1_v1_20131024.patch     .. PATCH-1 described above.
> 2. union_inh_idx_typ1_add_v1_20131024.patch .. PATCH-2 described above.
> 3. union_inh_idx_typ2_v1_20131024.patch     .. PATCH-3 described above.
> 4. union_inh_idx_typ3_v1_20131024.patch     .. PATCH-4 described above.

Please add your patches to the currently-open CommitFest so that we
don't lose track of them.

https://commitfest.postgresql.org/action/commitfest_view/open

I'm not sure which approach to this problem is best, but I agree that
it is worth solving.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello,

> Please add your patches to the currently-open CommitFest so that we
> don't lose track of them.
> 
> https://commitfest.postgresql.org/action/commitfest_view/open
> 
> I'm not sure which approach to this problem is best, but I agree that
> it is worth solving.

Thank you, I've regsitered this on CF3.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Peter Eisentraut
Date:
On 10/29/13, 11:05 PM, Kyotaro HORIGUCHI wrote:
> Hello,
> 
>> Please add your patches to the currently-open CommitFest so that we
>> don't lose track of them.
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>> I'm not sure which approach to this problem is best, but I agree that
>> it is worth solving.
> 
> Thank you, I've regsitered this on CF3.

Your patch doesn't apply anymore.  Please rebase it.




Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Umm. I might be working on a bit unstable place..

> Your patch doesn't apply anymore.  Please rebase it.

Thank you. I rebased all patches to current master.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 711b161..734ed47 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1940,6 +1940,7 @@ add_child_rel_equivalences(PlannerInfo *root,                Expr       *child_expr;
 Relids        new_relids;                Relids        new_nullable_relids;
 
+                bool        has_children = false;                child_expr = (Expr *)
adjust_appendrel_attrs(root,
@@ -1969,9 +1970,15 @@ add_child_rel_equivalences(PlannerInfo *root,
     child_rel->relids);                }
 
+                /*
+                 * Does this child_rel have children? If and only if so, tell
+                 * add_eq_member to register new_relids to cur_ec.
+                 */
+                has_children =
+                    root->simple_rte_array[child_rel->relid]->inh;                (void) add_eq_member(cur_ec,
child_expr,                                    new_relids, new_nullable_relids,
 
-                                     true, cur_em->em_datatype);
+                                     !has_children, cur_em->em_datatype);            }        }    }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b1705..0e3cf4b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -944,7 +944,8 @@ create_merge_append_path(PlannerInfo *root,    MergeAppendPath *pathnode =
makeNode(MergeAppendPath);   Cost        input_startup_cost;    Cost        input_total_cost;
 
-    ListCell   *l;
+    ListCell   *l, *lm;
+    bool        collapse_subpaths = false;    pathnode->path.pathtype = T_MergeAppend;    pathnode->path.parent =
rel;
@@ -953,6 +954,47 @@ create_merge_append_path(PlannerInfo *root,    pathnode->path.pathkeys = pathkeys;
pathnode->subpaths= subpaths;
 
+
+    /*
+     * If subpaths containes MergeAppendPaths already ordered on the pathkeys
+     * of the creating node, they can be expanded onto this node.
+     */
+    foreach (lm, subpaths)
+    {
+        Path *spath = (Path *) lfirst(lm);
+
+        if (IsA(spath, MergeAppendPath) &&
+            pathkeys_contained_in(pathkeys, spath->pathkeys))
+        {
+            collapse_subpaths = true;
+            break;
+        }
+    }
+
+    if (collapse_subpaths)
+    {
+        subpaths = NIL;
+
+        foreach (lm, pathnode->subpaths)
+        {
+            MergeAppendPath *mpath = (MergeAppendPath *) lfirst(lm);
+            ListCell *lcm;
+            
+            if (IsA(mpath, MergeAppendPath) &&
+                pathkeys_contained_in(pathkeys, mpath->path.pathkeys))
+            {
+                foreach (lcm, mpath->subpaths)
+                {
+                    Path *smpath = (Path*) lfirst (lcm);
+                    subpaths = lappend(subpaths, smpath);
+                }
+            }
+            else
+                subpaths = lappend(subpaths, subpaths);
+        }
+        pathnode->subpaths = subpaths;
+    }
+    /*     * Apply query-wide LIMIT if known and path is for sole base relation.     * (Handling this at this low
levelis a bit klugy.) 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 64b1705..0e3cf4b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -944,7 +944,8 @@ create_merge_append_path(PlannerInfo *root,    MergeAppendPath *pathnode =
makeNode(MergeAppendPath);   Cost        input_startup_cost;    Cost        input_total_cost;
 
-    ListCell   *l;
+    ListCell   *l, *lm;
+    bool        collapse_subpaths = false;    pathnode->path.pathtype = T_MergeAppend;    pathnode->path.parent =
rel;
@@ -953,6 +954,47 @@ create_merge_append_path(PlannerInfo *root,    pathnode->path.pathkeys = pathkeys;
pathnode->subpaths= subpaths;
 
+
+    /*
+     * If subpaths containes MergeAppendPaths already ordered on the pathkeys
+     * of the creating node, they can be expanded onto this node.
+     */
+    foreach (lm, subpaths)
+    {
+        Path *spath = (Path *) lfirst(lm);
+
+        if (IsA(spath, MergeAppendPath) &&
+            pathkeys_contained_in(pathkeys, spath->pathkeys))
+        {
+            collapse_subpaths = true;
+            break;
+        }
+    }
+
+    if (collapse_subpaths)
+    {
+        subpaths = NIL;
+
+        foreach (lm, pathnode->subpaths)
+        {
+            MergeAppendPath *mpath = (MergeAppendPath *) lfirst(lm);
+            ListCell *lcm;
+            
+            if (IsA(mpath, MergeAppendPath) &&
+                pathkeys_contained_in(pathkeys, mpath->path.pathkeys))
+            {
+                foreach (lcm, mpath->subpaths)
+                {
+                    Path *smpath = (Path*) lfirst (lcm);
+                    subpaths = lappend(subpaths, smpath);
+                }
+            }
+            else
+                subpaths = lappend(subpaths, subpaths);
+        }
+        pathnode->subpaths = subpaths;
+    }
+    /*     * Apply query-wide LIMIT if known and path is for sole base relation.     * (Handling this at this low
levelis a bit klugy.) 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d8aa35d..8167583 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse,    expand_inherited_tables(root);    /*
+     * Collapse multilevel inheritances into fewer levels.
+     *
+     * UNION ALL containing subselects on inherited tables leaves multilevel
+     * inheritance after calling expand_inherited_tables().  Fold them in
+     * order to make MergeAppend plan available for such queries.
+     */
+    collapse_appendrels(root);
+
+    /*     * Set hasHavingQual to remember if HAVING clause is present.  Needed     * because preprocess_expression
willreduce a constant-true condition to     * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..c7933b5 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index              target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1211,6 +1218,131 @@ expand_inherited_tables(PlannerInfo *root)}/*
+ * collapse_appendrels
+ *      Pulling up the descendents by recursively combining successive
+ *      translations.
+ *
+ *      Although the same thing could be done in adjust_appendrel_attrs(),
+ *      doing it here all through at once is more efficient than individually
+ *      (and maybe repetitively) done there.
+ */
+void
+collapse_appendrels(PlannerInfo *root)
+{
+    Index               last_parent_relid = 1;
+    AppendRelInfo    *last_parent = NULL;
+    ListCell         *lcchild;
+    ListCell         *lcparent;
+    bool             full_search = false;
+
+    foreach(lcchild, root->append_rel_list)
+    {
+        AppendRelInfo    *ch_appinfo         = (AppendRelInfo *)lfirst(lcchild);
+        Index             ch_parent_relid = ch_appinfo->parent_relid;
+        
+        if (last_parent_relid != ch_parent_relid)
+        {
+            /*
+             * Find the parent for the current child appinfo if parent relid
+             * is different from that of preveous child.
+             */
+            do
+            {
+                /*
+                 * For most cases, the relations are referred to as the parent
+                 * in their apperarence order in rtable from
+                 * append_rel_list. So start searching for the parent appinfos
+                 * from just after the last parent. If the incremental search
+                 * was failed, retry searching the entire list and exit on
+                 * failure.
+                 */
+                if (!last_parent)
+                {
+                    /*
+                     * If this is the first time or the preveous search was
+                     * failed, set up for full search.
+                     */
+                    lcparent = list_head(root->append_rel_list);
+                    last_parent_relid = 1;
+                    full_search = true;
+                }
+
+                last_parent = NULL;
+                for_each_cell(lcparent, lcparent)
+                {
+                    /*
+                     * Children's and parents' apprelinfos are bonded via
+                     * rtable relations. So two apprelinfos are in
+                     * parent-child relationship when the child_relid of the
+                     * parent is equal to the parent_relid of the child.
+                     */
+                    AppendRelInfo *papp = (AppendRelInfo*)lfirst(lcparent);
+                    if (papp->child_relid == ch_parent_relid)
+                    {
+                        last_parent = papp;
+                        last_parent_relid = ch_parent_relid;
+                        
+                        /* Search from the next cell next time. */
+                        lcparent = lnext(lcparent);
+                        full_search = false;
+                        break;        /* Parent found */
+                    }
+                }
+                /* Retry only when incremental search was failed. */
+            } while (!full_search && !last_parent);
+        }
+
+        /*
+         * Replace child translated_vars so as to be a direct children of the
+         * topmost relation.
+         */
+        if (last_parent)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = ch_appinfo;
+            ctx.target_rti  = last_parent->child_relid;
+
+            ch_appinfo->translated_vars =
+                (List*)expression_tree_mutator(
+                    (Node*)last_parent->translated_vars,
+                    transvars_merge_mutator, &ctx);
+            ch_appinfo->parent_relid = last_parent->parent_relid;
+            ch_appinfo->parent_reltype = last_parent->parent_reltype;
+        }
+    }
+}
+
+/*
+ * Replace Var nodes with corresponding child nodes.
+ */
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            return (Node*)copyObject(list_nth(ctx->child_appinfo->translated_vars,
+                                              oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set. *        If so,
addentries for all the child tables to the query's
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..7a83938 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,7 @@ extern Plan *plan_set_operations(PlannerInfo *root, double tuple_fraction,                    List
**sortClauses);externvoid expand_inherited_tables(PlannerInfo *root);
 
+extern void collapse_appendrels(PlannerInfo *root);extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                    AppendRelInfo *appinfo); 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..3cba2a1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                    list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1272,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1337,21 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * furthur parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1378,6 +1429,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            appinfo->parent_relid = parent_appinfo->parent_relid;
+            appinfo->parent_reltype = parent_appinfo->parent_reltype;
+            appinfo->parent_reloid = parent_appinfo->parent_reloid;
+            appinfo->translated_vars = (List*)expression_tree_mutator(
+                parent_appinfo->translated_vars,
+                transvars_merge_mutator, &ctx);
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)

Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Thu, Oct 24, 2013 at 07:39:53PM +0900, Kyotaro HORIGUCHI wrote:
> 1. Observed symptom
> 
> As you know, UNION ALL accompanied with ORDER BY uses indexes if
> available.

> So do simple queries on partitioned (inheritance) tables.

> Nevertheless, UNION ALL on partitioned tables doesn't. This is
> quite unfavourable behavior especially having LIMIT.
> 
> >uniontest=# EXPLAIN ANALYZE SELECT * FROM p1
> >            UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;
> >                              QUERY PLAN                       
> >------------------------------------------------------------------------
> > Limit   (actual time=182.732..182.735 rows=10 loops=1)
> >   ->  Sort  (actual time=182.729..182.730 rows=10 loops=1)
> >     Sort Key: p1.a
> >     Sort Method: top-N heapsort  Memory: 25kB
> >     ->  Append  (actual time=0.029..108.109 rows=400000 loops=1)
> >       ->  Seq Scan on p1  (actual time=0.001..0.001 rows=0 loops=1)
> >       ->  Seq Scan on c11  (actual time=0.027..19.074 rows=100000 loops=1)
> >       ->  Seq Scan on c12  (actual time=0.014..16.653 rows=100000 loops=1)
> >       ->  Seq Scan on p2  (actual time=0.000..0.000 rows=0 loops=1)
> >       ->  Seq Scan on c21  (actual time=0.012..16.677 rows=100000 loops=1)
> >       ->  Seq Scan on c22  (actual time=0.012..16.794 rows=100000 loops=1)
> > Total runtime: 182.857 ms

That is indeed surprising.

> 2. The cause
> 
> In grouping_planner, flattern_simple_union_all creates
> appendrelinfos for each subqueries then expand_inherited_tables
> furthur expands the parent tables in each subqueries. Finally,
> this sequence leaves following structure. Where rte[2] and [3]
> are subqueries abandoned on the way pulling up rte[4] and [5].
> 
> rte[1] Subquery "SELECT*1", inh = 1
>    +- appinfo[0] - rte[4] Relation "p1", inh = 1
>    |               +- appinfo[2] - rte[6]  Relation "p1", inh = 0
>    |               +- appinfo[3] - rte[7]  Relation "c11", inh = 0
>    |               +- appinfo[4] - rte[8]  Relation "c12", inh = 0
>    +- appinfo[1] - rte[5] Relation "p2", inh = 1
>                    +- appinfo[5] - rte[9]  Relation "p1", inh = 0
>                    +- appinfo[6] - rte[10] Relation "c11", inh = 0
>                    +- appinfo[7] - rte[11] Relation "c12", inh = 0
> 
> On the other hand, ec member finally has exprs only for varno =
> 1, 4 and 5, in other words, it lacks the members for the most
> descendant RTEs.  This is because add_child_rel_equivalences()
> always inhibits add_eq_member from registering the child_rel's
> relid on EC. Conequently these grandchild relations does not find
> index pathkeys for themselves.

I'm unclear on the key ideas behind distinguishing em_is_child members from
ordinary EC members.  src/backend/optimizer/README says "These members are
*not* full-fledged members of the EquivalenceClass and do not affect the
class's overall properties at all."  Is that an optimization to avoid futile
planner work, or is it necessary for correctness?  If the latter, what sort of
query might give wrong answers if an EquivalenceMember were incorrectly added
as em_is_child = false?

> 3. Measures
> 
> I could thought up three approaches for the problem.

Thanks for writing up multiple options.  My tentative preference is for (3),
then (2), then (1):

> One is to simplly modifying here to give child flag in the
> parameters of add_eq_member accordig to whether the child_rel is
> inh or not. This seems to work fine although leaves two levels of
> MergeAppends (PATCH-1). So the additional patch is attached to
> collapse these MergeAppends (PATCH-2). This gives the same plan
> as PATCH-3.

Revising the append graph this late in planning (create_merge_append_path())
seems like the wrong location.  Changing add_child_rel_equivalences() in this
way adds further subtlety to the meaning of em_is_child, and I haven't
comprehended the full implications.  Neither of those objections are fatal,
but let's focus on the other two approaches, which avoid those objections.

> The second is to collapse the appendrel structure shown above to
> have only single level inheritance. This also seems to work
> fine. This makes the plan with single level
> MergeAppend. (PATCH-3)

> The last one is most strait-forward in some sense, and conversely
> is most ad hoc measure. Modifying expand_inherited_rtentry() to
> pull up adding appendrels if needed, using the similar manner to
> PATCH-3. This is added as PATCH-4.

The choice between (2) and (3) should depend on the extent of ways in which we
can end up with a nested AppendRelInfo graph.  If expand_inherited_rtentry()
is the only place that can introduce nesting, modifying it to stop doing that
makes sense.  Otherwise, adding a generic collapse_appendrels() step might
help more situations.  expand_inherited_rtentry() is the only culprit I can
identify, so I lean toward approach-3/PATCH-4.  For a data point corroborating
that hypothesis, "make check" issues no queries that trigger this warning:

--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1924,6 +1924,9 @@ add_child_rel_equivalences(PlannerInfo *root,{    ListCell   *lc1;
+    if (root->simple_rte_array[child_rel->relid]->inh)
+        elog(WARNING, "unexpected nested append");
+    foreach(lc1, root->eq_classes)    {        EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);


On Wed, Nov 13, 2013 at 05:38:11PM +0900, Kyotaro HORIGUCHI wrote:

[Approach (3): union_inh_idx_typ3_v2_20131113.patch]

> @@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)
>      }
>  }
>  
> +static Node *
> +transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
> +{
> +    if (node == NULL)
> +        return NULL;
> +
> +    if (IsA(node, Var))
> +    {
> +        Var *oldv = (Var*)node;
> +
> +        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
> +        {
> +            if (oldv->varattno >
> +                    list_length(ctx->child_appinfo->translated_vars))
> +                elog(ERROR,
> +                     "attribute %d of relation \"%s\" does not exist",
> +                     oldv->varattno,
> +                     get_rel_name(ctx->child_appinfo->parent_reloid));
> +            
> +            return (Node*)copyObject(
> +                list_nth(ctx->child_appinfo->translated_vars,
> +                         oldv->varattno - 1));
> +        }
> +    }
> +    return expression_tree_mutator(node,
> +                                   transvars_merge_mutator, (void*)ctx);
> +}

This has roughly the same purpose as adjust_appendrel_attrs_mutator(), but it
propagates the change to far fewer node types.  Why can it be so much simpler?
The translated_vars of parent_appinfo may bear mostly-arbitrary expressions.

> +            appinfo->translated_vars = (List*)expression_tree_mutator(
> +                parent_appinfo->translated_vars,
> +                transvars_merge_mutator, &ctx);

I get this warning:

prepunion.c: In function `expand_inherited_rtentry':
prepunion.c:1450: warning: passing argument 1 of `expression_tree_mutator' from incompatible pointer type


I filled out your test case as follows to reproduce your results:

create table p1 (a int, b int);
create table c11 () inherits (p1);
create table c12 () inherits (p1);
create table p2 (a int, b int);
create table c21 () inherits (p2);
create table c22 () inherits (p2);
alter table p1 add primary key (a);
alter table c11 add primary key (a);
alter table c12 add primary key (a);
alter table p2 add primary key (a);
alter table c21 add primary key (a);
alter table c22 add primary key (a);
EXPLAIN ANALYZE SELECT * FROM p1 UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;

However, adding a qual to one of the inheritance queries once again defeated
MergeAppend with the patches for approaches (2) and (3).  The approach-1 patch
gives a segfault.  Test queries:

-- uses MergeAppend for p1
EXPLAIN ANALYZE SELECT * FROM p1 WHERE b <> 0 ORDER BY a;
-- no MergeAppend for p1
EXPLAIN ANALYZE SELECT * FROM p1 WHERE b <> 0 UNION ALL SELECT * FROM p2 ORDER BY a;

I did not study this further to identify the cause.  However, I am suspicious
that we're missing much of the work in set_append_rel_size(), notably the
transfer of baserestrictinfo to the child.  set_append_rel_size() identifies
the need to do that work based on AppendRelId parent_relid connectivity, but
we can now flatten that graph in an earlier step.


Approaches (2) and (3) leave the inheritance parent with rte->inh == true
despite no AppendRelInfo pointing to it as their parent.  Until now,
expand_inherited_rtentry() has been careful to clear rte->inh in such cases.
My gut feeling is that we should retain that property; what do you think?

Finally, the patch should add test cases.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> I'm unclear on the key ideas behind distinguishing em_is_child members from
> ordinary EC members.  src/backend/optimizer/README says "These members are
> *not* full-fledged members of the EquivalenceClass and do not affect the
> class's overall properties at all."  Is that an optimization to avoid futile
> planner work, or is it necessary for correctness?  If the latter, what sort of
> query might give wrong answers if an EquivalenceMember were incorrectly added
> as em_is_child = false?

See commit dd4134ea, which added that text.  IIRC, the key point is that
among "real" EC members, there will never be duplicates: a Var "x.y"
for instance cannot appear in two different ECs, because we'd have merged
the two ECs into one instead.  However, this property does *not* hold for
child EC members.  The problem is that two completely unrelated UNION ALL
child subselects might have, say, constant "1" in their tlists.  This
should not lead to concluding that we can merge ECs that mention their
UNION ALL parent variables.

I've not looked at these patches yet, but if they're touching equivclass.c
at all, I'd guess that it's wrong.  I think what we want is for appendrel
formation to take care of flattening nested appendrels.  The core of the
problem is that pull_up_simple_union_all handles that for nested
UNION ALLs, and expand_inherited_rtentry sees to it by letting
find_all_inheritors do the recursion, but the two cases don't know about
each other.

My gut feeling after two minutes' thought is that the best fix is for
expand_inherited_rtentry to notice when the parent table is already an
appendrel member, and enlarge that appendrel instead of making a new one.
(This depends on the assumption that pull_up_simple_union_all always
happens first, but that seems OK.)  I'm not sure if that concept exactly
corresponds to either PATCH-3 or PATCH-4, but that's the way I'd think
about it.

> However, adding a qual to one of the inheritance queries once again defeated
> MergeAppend with the patches for approaches (2) and (3).

That's an independent limitation, see is_safe_append_member:
    * Also, the child can't have any WHERE quals because there's no place to    * put them in an appendrel.  (This is a
bitannoying...)
 

It'd be nice to fix that, but it's not going to be easy, and it should
be a separate patch IMO.  It's pretty much unrelated to the question at
hand here.

> Approaches (2) and (3) leave the inheritance parent with rte->inh == true
> despite no AppendRelInfo pointing to it as their parent.  Until now,
> expand_inherited_rtentry() has been careful to clear rte->inh in such cases.
> My gut feeling is that we should retain that property; what do you think?

Yes.  IIRC, failing to clear rte->inh doesn't actually break anything,
but it will lead to wasted work later in the planner.

> Finally, the patch should add test cases.

Agreed, at least one example showing that flattening does happen.
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Sat, Nov 23, 2013 at 01:35:32PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > I'm unclear on the key ideas behind distinguishing em_is_child members from
> > ordinary EC members.  src/backend/optimizer/README says "These members are
> > *not* full-fledged members of the EquivalenceClass and do not affect the
> > class's overall properties at all."  Is that an optimization to avoid futile
> > planner work, or is it necessary for correctness?  If the latter, what sort of
> > query might give wrong answers if an EquivalenceMember were incorrectly added
> > as em_is_child = false?
> 
> See commit dd4134ea, which added that text.  IIRC, the key point is that
> among "real" EC members, there will never be duplicates: a Var "x.y"
> for instance cannot appear in two different ECs, because we'd have merged
> the two ECs into one instead.  However, this property does *not* hold for
> child EC members.  The problem is that two completely unrelated UNION ALL
> child subselects might have, say, constant "1" in their tlists.  This
> should not lead to concluding that we can merge ECs that mention their
> UNION ALL parent variables.

Thanks for the pointer; I found things clearer after reviewing the ECs from
the test case queries from commits dd4134ea and 57664ed2.

> I've not looked at these patches yet, but if they're touching equivclass.c
> at all, I'd guess that it's wrong.

Only PATCH-1 touched equivclass.c.

> My gut feeling after two minutes' thought is that the best fix is for
> expand_inherited_rtentry to notice when the parent table is already an
> appendrel member, and enlarge that appendrel instead of making a new one.
> (This depends on the assumption that pull_up_simple_union_all always
> happens first, but that seems OK.)  I'm not sure if that concept exactly
> corresponds to either PATCH-3 or PATCH-4, but that's the way I'd think
> about it.

PATCH-4 does approximately that.  I agree that's the right general direction.

> > However, adding a qual to one of the inheritance queries once again defeated
> > MergeAppend with the patches for approaches (2) and (3).
> 
> That's an independent limitation, see is_safe_append_member:
> 
>      * Also, the child can't have any WHERE quals because there's no place to
>      * put them in an appendrel.  (This is a bit annoying...)
> 
> It'd be nice to fix that, but it's not going to be easy, and it should
> be a separate patch IMO.  It's pretty much unrelated to the question at
> hand here.

After further study, I agree.  It would still be good to understand why that
test case crashed PATCH-1, then ensure that the other patches don't have a
similar lurking bug.

An alternative to extending our ability to pull up UNION ALL subqueries,
having perhaps broader applicability, would be to push down the
possibly-useful sort order to subqueries we can't pull up.  We'd sometimes
have two levels of MergeAppend, but that could still handily beat the
explicit-sort plan.  In any case, it is indeed a separate topic.  For the sake
of the archives, you can get such plans today by manually adding the ORDER BY
to the relevant UNION ALL branch:

EXPLAIN (ANALYZE) (SELECT oid, * FROM pg_proc WHERE protransform = 0 ORDER BY oid) UNION ALL SELECT oid, * FROM pg_proc
ORDERBY 1 LIMIT 5;                                                                      QUERY PLAN
                                                
 

--------------------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=0.57..1.31 rows=5 width=544) (actual time=0.102..0.155 rows=5 loops=1)  ->  Merge Append  (cost=0.57..748.25
rows=5084width=544) (actual time=0.095..0.130 rows=5 loops=1)        Sort Key: pg_proc_1.oid        ->  Index Scan
usingpg_proc_oid_index on pg_proc pg_proc_1  (cost=0.28..332.83 rows=2538 width=544) (actual time=0.058..0.069 rows=3
loops=1)             Filter: ((protransform)::oid = 0::oid)        ->  Index Scan using pg_proc_oid_index on pg_proc
(cost=0.28..326.47rows=2546 width=544) (actual time=0.029..0.036 rows=3 loops=1)
 

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Thank you.

I'll soon come up with regress for the patch 3, 4.

=====
> > See commit dd4134ea, which added that text.  IIRC, the key point is that
> > among "real" EC members, there will never be duplicates: a Var "x.y"
> > for instance cannot appear in two different ECs, because we'd have merged
> > the two ECs into one instead.  However, this property does *not* hold for
> > child EC members.  The problem is that two completely unrelated UNION ALL
> > child subselects might have, say, constant "1" in their tlists.  This
> > should not lead to concluding that we can merge ECs that mention their
> > UNION ALL parent variables.
> Thanks for the pointer; I found things clearer after reviewing the ECs from
> the test case queries from commits dd4134ea and 57664ed2.

Is it correct that ec-members are parted into two groups one of
which is 'full-fledged' and never duplicate among different ECs
so as to be useful for forming pathkeys (parent) and another is
immatured and might be duplicate which could not form
pathkeys. Thus the new-in-patch-1 third group which is not
duplicate (because they should be always be Var "x.y" (for this
case)) but immatured (child) currently can not be adequately
classified?
My PATCH-1 which make them in the third group forcibly
classified as 'parent' instead of 'child' as before therefore
makes 'broken' ec member?

> > I've not looked at these patches yet, but if they're touching equivclass.c
> > at all, I'd guess that it's wrong.
> 
> Only PATCH-1 touched equivclass.c.

Surely.

> > My gut feeling after two minutes' thought is that the best fix is for
> > expand_inherited_rtentry to notice when the parent table is already an
> > appendrel member, and enlarge that appendrel instead of making a new one.
> > (This depends on the assumption that pull_up_simple_union_all always
> > happens first, but that seems OK.)  I'm not sure if that concept exactly
> > corresponds to either PATCH-3 or PATCH-4, but that's the way I'd think
> > about it.
> 
> PATCH-4 does approximately that.  I agree that's the right
> general direction.

Ok, I'll provide regress for the patches 3 and 4.

> An alternative to extending our ability to pull up UNION ALL subqueries,
> having perhaps broader applicability, would be to push down the
> possibly-useful sort order to subqueries we can't pull up.  We'd sometimes
> have two levels of MergeAppend, but that could still handily beat the
> explicit-sort plan.  In any case, it is indeed a separate topic.  For the sake
> of the archives, you can get such plans today by manually adding the ORDER BY
> to the relevant UNION ALL branch:

Yes, but this seems could be done in path-generation phase or
earlier (that's mere a smattering).

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, 

> I'll soon come up with regress for the patch 3, 4.

New version patches added with regression tests are attached.

I'll show you pulling-up-in-expand_inherited_rtentry version
later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6670794..5d1cf83 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse,    expand_inherited_tables(root);    /*
+     * Collapse multilevel inheritances into fewer levels.
+     *
+     * UNION ALL containing subselects on inherited tables leaves multilevel
+     * inheritance after calling expand_inherited_tables().  Fold them in
+     * order to make MergeAppend plan available for such queries.
+     */
+    collapse_appendrels(root);
+
+    /*     * Set hasHavingQual to remember if HAVING clause is present.  Needed     * because preprocess_expression
willreduce a constant-true condition to     * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..c7933b5 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index              target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1211,6 +1218,131 @@ expand_inherited_tables(PlannerInfo *root)}/*
+ * collapse_appendrels
+ *      Pulling up the descendents by recursively combining successive
+ *      translations.
+ *
+ *      Although the same thing could be done in adjust_appendrel_attrs(),
+ *      doing it here all through at once is more efficient than individually
+ *      (and maybe repetitively) done there.
+ */
+void
+collapse_appendrels(PlannerInfo *root)
+{
+    Index               last_parent_relid = 1;
+    AppendRelInfo    *last_parent = NULL;
+    ListCell         *lcchild;
+    ListCell         *lcparent;
+    bool             full_search = false;
+
+    foreach(lcchild, root->append_rel_list)
+    {
+        AppendRelInfo    *ch_appinfo         = (AppendRelInfo *)lfirst(lcchild);
+        Index             ch_parent_relid = ch_appinfo->parent_relid;
+        
+        if (last_parent_relid != ch_parent_relid)
+        {
+            /*
+             * Find the parent for the current child appinfo if parent relid
+             * is different from that of preveous child.
+             */
+            do
+            {
+                /*
+                 * For most cases, the relations are referred to as the parent
+                 * in their apperarence order in rtable from
+                 * append_rel_list. So start searching for the parent appinfos
+                 * from just after the last parent. If the incremental search
+                 * was failed, retry searching the entire list and exit on
+                 * failure.
+                 */
+                if (!last_parent)
+                {
+                    /*
+                     * If this is the first time or the preveous search was
+                     * failed, set up for full search.
+                     */
+                    lcparent = list_head(root->append_rel_list);
+                    last_parent_relid = 1;
+                    full_search = true;
+                }
+
+                last_parent = NULL;
+                for_each_cell(lcparent, lcparent)
+                {
+                    /*
+                     * Children's and parents' apprelinfos are bonded via
+                     * rtable relations. So two apprelinfos are in
+                     * parent-child relationship when the child_relid of the
+                     * parent is equal to the parent_relid of the child.
+                     */
+                    AppendRelInfo *papp = (AppendRelInfo*)lfirst(lcparent);
+                    if (papp->child_relid == ch_parent_relid)
+                    {
+                        last_parent = papp;
+                        last_parent_relid = ch_parent_relid;
+                        
+                        /* Search from the next cell next time. */
+                        lcparent = lnext(lcparent);
+                        full_search = false;
+                        break;        /* Parent found */
+                    }
+                }
+                /* Retry only when incremental search was failed. */
+            } while (!full_search && !last_parent);
+        }
+
+        /*
+         * Replace child translated_vars so as to be a direct children of the
+         * topmost relation.
+         */
+        if (last_parent)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = ch_appinfo;
+            ctx.target_rti  = last_parent->child_relid;
+
+            ch_appinfo->translated_vars =
+                (List*)expression_tree_mutator(
+                    (Node*)last_parent->translated_vars,
+                    transvars_merge_mutator, &ctx);
+            ch_appinfo->parent_relid = last_parent->parent_relid;
+            ch_appinfo->parent_reltype = last_parent->parent_reltype;
+        }
+    }
+}
+
+/*
+ * Replace Var nodes with corresponding child nodes.
+ */
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            return (Node*)copyObject(list_nth(ctx->child_appinfo->translated_vars,
+                                              oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set. *        If so,
addentries for all the child tables to the query's
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..7a83938 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,7 @@ extern Plan *plan_set_operations(PlannerInfo *root, double tuple_fraction,                    List
**sortClauses);externvoid expand_inherited_tables(PlannerInfo *root);
 
+extern void collapse_appendrels(PlannerInfo *root);extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                    AppendRelInfo *appinfo);
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index ae690cf..8b9f9e2 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -501,9 +501,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(6 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..3cba2a1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                    list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1272,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1337,21 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * furthur parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1378,6 +1429,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            appinfo->parent_relid = parent_appinfo->parent_relid;
+            appinfo->parent_reltype = parent_appinfo->parent_reltype;
+            appinfo->parent_reloid = parent_appinfo->parent_reloid;
+            appinfo->translated_vars = (List*)expression_tree_mutator(
+                parent_appinfo->translated_vars,
+                transvars_merge_mutator, &ctx);
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index ae690cf..8b9f9e2 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -501,9 +501,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(6 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)

Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
>  My PATCH-1 which make them in the third group forcibly
> classified as 'parent' instead of 'child' as before therefore
> makes 'broken' ec member?

If you marked child entries as non-child, then yes, it's broken.

The definition of a regular EC member is that it's always equal to any
other regular member of that EC, in any row that's had all available WHERE
filter constraints applied to it.  This cannot be true for child members
because those are only valid in a subset of rows.  Consider

SELECT * FROM t1, (SELECT y FROM t2 UNION ALL SELECT 0 AS y FROM t3) ss
WHERE t1.x = ss.y

We will put t1.x and ss.y into an EC, where ss.y is a variable of the
appendrel representing the union.  Then we'll put t2.y and constant 0
into the EC as child members.  It would be incorrect to deduce "t1.x = 0"
from this --- while that *is* a valid constraint for join rows arising
from rows out of t3, it's incorrect for rows arising from rows out of t2,
so it's not universally valid.

Basically the point here is that deductions involving child EC members
are one-way.  You can use them to generate constraints to apply to a
scan of the associated appendrel member, but not to generate constraints
that would apply to other relations in the query.

In a situation where you're considering "grandchild" elements, both those
and the original children have to be marked as children, because all of
them represent only subsets of rows of the appendrel.
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
This is cont'd from previous CF3.

You'll see the overview and the discussion since in the thread
begins from there. The issue ramains as of current 9.4dev head.

http://www.postgresql.org/message-id/20131024.193953.233464126.horiguchi.kyotaro@lab.ntt.co.jp

The issue in brief is that UNION ALL on inheritance tables does
not make use of indices in contrast to that on bare tables. This
is because of appendrels with the depth more than 2 levels
brought by inheritance tables.

Some of UNION ALL operation - especially using ORDERBY and LIMIT
- will be accelerated by this patch. Details in the message
above.

I proposed 3 types of solution but the one of them - tweaking
Equivalence Class (Type 1)- was dismissed because of
inappropriate manipulation on Equivalence Class. The rest do
essentially the same thing - flattening appendrels - at different
timings.

In type 2, the process is implemented as a generic appendrel
flattening function and applied after expand_inherited_tables()
in subquery_planner.

In type 3, the equivelant process is combined in
expand_inherited_rtentry().

Type 2 loops once for whole appendrel list (in most cases) so it
might be more effective than type 3 which searches the parent
appendrel for each appended ones. Type 3 is more approprieate
design assuming that appendrel tree deeper than 2 levels will be
generated only by expand_inherited_tables().

The attached two patches are rebased to current 9.4dev HEAD and
make check at the topmost directory and src/test/isolation are
passed without error. One bug was found and fixed on the way. It
was an assertion failure caused by probably unexpected type
conversion introduced by collapse_appendrels() which leads
implicit whole-row cast from some valid reltype to invalid
reltype (0) into adjust_appendrel_attrs_mutator().

Attached files are the two versions of patches mentioned above.
- unionall_inh_idx_typ2_v4_20140114.patch- unionall_inh_idx_typ3_v4_20140114.patch

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..582e8c3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                    list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1272,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1337,21 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * furthur parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1378,6 +1429,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            appinfo->parent_relid = parent_appinfo->parent_relid;
+            appinfo->parent_reltype = parent_appinfo->parent_reltype;
+            appinfo->parent_reloid = parent_appinfo->parent_reloid;
+            appinfo->translated_vars = (List*)expression_tree_mutator(
+                parent_appinfo->translated_vars,
+                transvars_merge_mutator, &ctx);
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)
@@ -1662,7 +1735,8 @@ adjust_appendrel_attrs_mutator(Node *node,                 * step to convert the tuple layout to
theparent's rowtype.                 * Otherwise we have to generate a RowExpr.                 */
 
-                if (OidIsValid(appinfo->child_reltype))
+                if (OidIsValid(appinfo->child_reltype) &&
+                    OidIsValid(appinfo->parent_reltype))                {                    Assert(var->vartype ==
appinfo->parent_reltype);                   if (appinfo->parent_reltype != appinfo->child_reltype)
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..d254ff3 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1da4b2f..f7873a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse,    expand_inherited_tables(root);    /*
+     * Collapse multilevel inheritances into fewer levels.
+     *
+     * UNION ALL containing subselects on inherited tables leaves multilevel
+     * inheritance after calling expand_inherited_tables().  Fold them in
+     * order to make MergeAppend plan available for such queries.
+     */
+    collapse_appendrels(root);
+
+    /*     * Set hasHavingQual to remember if HAVING clause is present.  Needed     * because preprocess_expression
willreduce a constant-true condition to     * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..165c010 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index              target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1211,6 +1218,131 @@ expand_inherited_tables(PlannerInfo *root)}/*
+ * collapse_appendrels
+ *      Pulling up the descendents by recursively combining successive
+ *      translations.
+ *
+ *      Although the same thing could be done in adjust_appendrel_attrs(),
+ *      doing it here all through at once is more efficient than individually
+ *      (and maybe repetitively) done there.
+ */
+void
+collapse_appendrels(PlannerInfo *root)
+{
+    Index               last_parent_relid = 1;
+    AppendRelInfo    *last_parent = NULL;
+    ListCell         *lcchild;
+    ListCell         *lcparent;
+    bool             full_search = false;
+
+    foreach(lcchild, root->append_rel_list)
+    {
+        AppendRelInfo    *ch_appinfo         = (AppendRelInfo *)lfirst(lcchild);
+        Index             ch_parent_relid = ch_appinfo->parent_relid;
+        
+        if (last_parent_relid != ch_parent_relid)
+        {
+            /*
+             * Find the parent for the current child appinfo if parent relid
+             * is different from that of preveous child.
+             */
+            do
+            {
+                /*
+                 * For most cases, the relations are referred to as the parent
+                 * in their apperarence order in rtable from
+                 * append_rel_list. So start searching for the parent appinfos
+                 * from just after the last parent. If the incremental search
+                 * was failed, retry searching the entire list and exit on
+                 * failure.
+                 */
+                if (!last_parent)
+                {
+                    /*
+                     * If this is the first time or the preveous search was
+                     * failed, set up for full search.
+                     */
+                    lcparent = list_head(root->append_rel_list);
+                    last_parent_relid = 1;
+                    full_search = true;
+                }
+
+                last_parent = NULL;
+                for_each_cell(lcparent, lcparent)
+                {
+                    /*
+                     * Children's and parents' apprelinfos are bonded via
+                     * rtable relations. So two apprelinfos are in
+                     * parent-child relationship when the child_relid of the
+                     * parent is equal to the parent_relid of the child.
+                     */
+                    AppendRelInfo *papp = (AppendRelInfo*)lfirst(lcparent);
+                    if (papp->child_relid == ch_parent_relid)
+                    {
+                        last_parent = papp;
+                        last_parent_relid = ch_parent_relid;
+                        
+                        /* Search from the next cell next time. */
+                        lcparent = lnext(lcparent);
+                        full_search = false;
+                        break;        /* Parent found */
+                    }
+                }
+                /* Retry only when incremental search was failed. */
+            } while (!full_search && !last_parent);
+        }
+
+        /*
+         * Replace child translated_vars so as to be a direct children of the
+         * topmost relation.
+         */
+        if (last_parent)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = ch_appinfo;
+            ctx.target_rti  = last_parent->child_relid;
+
+            ch_appinfo->translated_vars =
+                (List*)expression_tree_mutator(
+                    (Node*)last_parent->translated_vars,
+                    transvars_merge_mutator, &ctx);
+            ch_appinfo->parent_relid = last_parent->parent_relid;
+            ch_appinfo->parent_reltype = last_parent->parent_reltype;
+        }
+    }
+}
+
+/*
+ * Replace Var nodes with corresponding child nodes.
+ */
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            return (Node*)copyObject(list_nth(ctx->child_appinfo->translated_vars,
+                                              oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set. *        If so,
addentries for all the child tables to the query's
 
@@ -1662,7 +1794,8 @@ adjust_appendrel_attrs_mutator(Node *node,                 * step to convert the tuple layout to
theparent's rowtype.                 * Otherwise we have to generate a RowExpr.                 */
 
-                if (OidIsValid(appinfo->child_reltype))
+                if (OidIsValid(appinfo->child_reltype) &&
+                    OidIsValid(appinfo->parent_reltype))                {                    Assert(var->vartype ==
appinfo->parent_reltype);                   if (appinfo->parent_reltype != appinfo->child_reltype)
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..7a83938 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,7 @@ extern Plan *plan_set_operations(PlannerInfo *root, double tuple_fraction,                    List
**sortClauses);externvoid expand_inherited_tables(PlannerInfo *root);
 
+extern void collapse_appendrels(PlannerInfo *root);extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                    AppendRelInfo *appinfo);
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..d254ff3 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)

Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Tue, Jan 14, 2014 at 06:04:47PM +0900, Kyotaro HORIGUCHI wrote:
> I proposed 3 types of solution but the one of them - tweaking
> Equivalence Class (Type 1)- was dismissed because of
> inappropriate manipulation on Equivalence Class. The rest do
> essentially the same thing - flattening appendrels - at different
> timings.
> 
> In type 2, the process is implemented as a generic appendrel
> flattening function and applied after expand_inherited_tables()
> in subquery_planner.
> 
> In type 3, the equivelant process is combined in
> expand_inherited_rtentry().
> 
> Type 2 loops once for whole appendrel list (in most cases) so it
> might be more effective than type 3 which searches the parent
> appendrel for each appended ones. Type 3 is more approprieate
> design assuming that appendrel tree deeper than 2 levels will be
> generated only by expand_inherited_tables().

Let's focus on type 3; Tom and I both found it most promising.

> The attached two patches are rebased to current 9.4dev HEAD and
> make check at the topmost directory and src/test/isolation are
> passed without error. One bug was found and fixed on the way. It
> was an assertion failure caused by probably unexpected type
> conversion introduced by collapse_appendrels() which leads
> implicit whole-row cast from some valid reltype to invalid
> reltype (0) into adjust_appendrel_attrs_mutator().

What query demonstrated this bug in the previous type 2/3 patches?

>  - unionall_inh_idx_typ3_v4_20140114.patch

You have not addressed my review comments from November:
http://www.postgresql.org/message-id/20131123073913.GA1008138@tornado.leadboat.com

Specifically, these:

[transvar_merge_mutator()] has roughly the same purpose as
adjust_appendrel_attrs_mutator(), but it propagates the change to far fewer
node types.  Why this case so much simpler?  The parent translated_vars of
parent_appinfo may bear mostly-arbitrary expressions.

...

I get this warning:

prepunion.c: In function `expand_inherited_rtentry':
prepunion.c:1450: warning: passing argument 1 of `expression_tree_mutator' from incompatible pointer type

...

Approaches (2) and (3) leave the inheritance parent with rte->inh == true
despite no AppendRelInfo pointing to it as their parent.  Until now,
expand_inherited_rtentry() has been careful to clear rte->inh in such cases.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, Thank you, and I' sorry to have kept you waiting.

> Let's focus on type 3; Tom and I both found it most promising.

Agreed.

> > The attached two patches are rebased to current 9.4dev HEAD and
> > make check at the topmost directory and src/test/isolation are
> > passed without error. One bug was found and fixed on the way. It
> > was an assertion failure caused by probably unexpected type
> > conversion introduced by collapse_appendrels() which leads
> > implicit whole-row cast from some valid reltype to invalid
> > reltype (0) into adjust_appendrel_attrs_mutator().
> 
> What query demonstrated this bug in the previous type 2/3 patches?
> 
> >  - unionall_inh_idx_typ3_v4_20140114.patch
> 
> You have not addressed my review comments from November:
> http://www.postgresql.org/message-id/20131123073913.GA1008138@tornado.leadboat.com

mmm. Sorry, I've overlooked most of it..

em_is_child is no more a issue, and the rest seem to cited below, thanks.

> Specifically, these:
> 
> [transvar_merge_mutator()] has roughly the same purpose as
> adjust_appendrel_attrs_mutator(), but it propagates the change to far fewer
> node types.  Why this case so much simpler?  The parent translated_vars of
> parent_appinfo may bear mostly-arbitrary expressions.

There are only two places where AppendRelInfo node is generated,
expand_inherited_rtentry and
pull_up_union_leaf_queries.(_copyAppendrelInfo is irrelevant to
this discussion.) The core part generating translated_vars for
them are make_inh_translation_list and
make_setop_translation_list respectively. That's all. And they
are essentially works on the same way, making a Var for every
referred target entry of children like following.

| makeVar(varno,
|         tle->resno,
|         exprType((Node *) tle->expr),
|         exprTypmod((Node *) tle->expr),
|         exprCollation((Node *) tle->expr),
|         0);

So all we should do to collapse nested appendrels is simplly
connecting each RTEs directly to the root (most ancient?) RTE in
the relationship, resolving Vars like above, as seen in patched
expand_inherited_rtentry.

# If translated_vars are generated always in the way shown above,
# using tree walker might be no use..

This can be done apart from all other stuffs compensating
translation skew done in adjust_appendrel_attrs. I believe they
are in orthogonal relationship.


> Approaches (2) and (3) leave the inheritance parent with rte->inh == true
> despite no AppendRelInfo pointing to it as their parent.  Until now,
> expand_inherited_rtentry() has been careful to clear rte->inh in such cases.

Thank you. I missed that point. rte->inh at first works as a
trigger to try expand inheritance and create append_rel_list
entries, and later works to say "dig me through appinfos". From
that point of view, inh of the RTEs whose children took away
should be 0. The two meanings of inh are now become different
from each other so I suppose it'd better rename it, but I don't
come up with good alternatives..

Anyway this is corrected in the patch attached and works as
follows,

BEFORE:rte[1] Subquery "SELECT*1", inh = 1   +- appinfo[0] - rte[4] Relation "p1", inh = 1   |               +-
appinfo[2]- rte[6]  Relation "p1", inh = 0   |               +- appinfo[3] - rte[7]  Relation "c11", inh = 0   |
      +- appinfo[4] - rte[8]  Relation "c12", inh = 0   +- appinfo[1] - rte[5] Relation "p2", inh = 1
+-appinfo[5] - rte[9]  Relation "p1", inh = 0                   +- appinfo[6] - rte[10] Relation "c11", inh = 0
         +- appinfo[7] - rte[11] Relation "c12", inh = 0
 

COLLAPSED:rte[1] Subquery "SELECT*1", inh = 1   +- appinfo[0] - rte[4]  Relation "p1",  inh = 1 => 0   +- appinfo[2] -
rte[6] Relation "p1",  inh = 0   +- appinfo[3] - rte[7]  Relation "c11", inh = 0   +- appinfo[4] - rte[8]  Relation
"c12",inh = 0   +- appinfo[1] - rte[5]  Relation "p2",  inh = 1 => 0   +- appinfo[5] - rte[9]  Relation "p1",  inh = 0
+- appinfo[6] - rte[10] Relation "c11", inh = 0   +- appinfo[7] - rte[11] Relation "c12", inh = 0
 


> I get this warning:
> 
> prepunion.c: In function `expand_inherited_rtentry':
> prepunion.c:1450: warning: passing argument 1 of `expression_tree_mutator' from incompatible pointer type

Sorry, I forgot to put a casting to generic type. It is fixed in
the attached version.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 52dcc72..6ef82d7 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                    list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1272,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1337,21 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * further parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1308,6 +1359,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    /* Scan the
inheritanceset and expand it */    appinfos = NIL;
 
+
+    /*
+     * This rte will lose all children being pulled up later if it has further
+     * parent
+     */
+    if (parent_appinfo)
+        rte->inh = false;
+    foreach(l, inhOIDs)    {        Oid            childOID = lfirst_oid(l);
@@ -1378,6 +1437,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            appinfo->parent_relid = parent_appinfo->parent_relid;
+            appinfo->parent_reltype = parent_appinfo->parent_reltype;
+            appinfo->parent_reloid = parent_appinfo->parent_reloid;
+            appinfo->translated_vars = (List*)expression_tree_mutator(
+                (Node*)parent_appinfo->translated_vars,
+                transvars_merge_mutator, &ctx);
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)
@@ -1662,7 +1743,8 @@ adjust_appendrel_attrs_mutator(Node *node,                 * step to convert the tuple layout to
theparent's rowtype.                 * Otherwise we have to generate a RowExpr.                 */
 
-                if (OidIsValid(appinfo->child_reltype))
+                if (OidIsValid(appinfo->child_reltype) &&
+                    OidIsValid(appinfo->parent_reltype))                {                    Assert(var->vartype ==
appinfo->parent_reltype);                   if (appinfo->parent_reltype != appinfo->child_reltype)
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..f69e92b 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,42 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t1_ab_idx on t1 t1_1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2 t2_1
+         ->  Index Scan using t2c_pkey on t2c
+(9 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)

Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Tue, Jan 28, 2014 at 06:53:32PM +0900, Kyotaro HORIGUCHI wrote:
> Hello, Thank you, and I' sorry to have kept you waiting.

No hurry.

> > > The attached two patches are rebased to current 9.4dev HEAD and
> > > make check at the topmost directory and src/test/isolation are
> > > passed without error. One bug was found and fixed on the way. It
> > > was an assertion failure caused by probably unexpected type
> > > conversion introduced by collapse_appendrels() which leads
> > > implicit whole-row cast from some valid reltype to invalid
> > > reltype (0) into adjust_appendrel_attrs_mutator().
> > 
> > What query demonstrated this bug in the previous type 2/3 patches?

I would still like to know the answer to the above question.

> > [transvar_merge_mutator()] has roughly the same purpose as
> > adjust_appendrel_attrs_mutator(), but it propagates the change to far fewer
> > node types.  Why this case so much simpler?  The parent translated_vars of
> > parent_appinfo may bear mostly-arbitrary expressions.
> 
> There are only two places where AppendRelInfo node is generated,
> expand_inherited_rtentry and
> pull_up_union_leaf_queries.(_copyAppendrelInfo is irrelevant to
> this discussion.) The core part generating translated_vars for
> them are make_inh_translation_list and
> make_setop_translation_list respectively. That's all. And they
> are essentially works on the same way, making a Var for every
> referred target entry of children like following.
> 
> | makeVar(varno,
> |         tle->resno,
> |         exprType((Node *) tle->expr),
> |         exprTypmod((Node *) tle->expr),
> |         exprCollation((Node *) tle->expr),
> |         0);

It's true that each AppendRelInfo is initially created that way, but they need
not remain so simple.  You can assume ctx->child_appinfo->translated_vars
contains only these Var nodes, but parent_appinfo->translated_vars may contain
arbitrary expressions.  (I think the pullup_replace_vars() call at
prepjointree.c:954 is where an AppendRelInfo can get non-Var expressions.)

> So all we should do to collapse nested appendrels is simplly
> connecting each RTEs directly to the root (most ancient?) RTE in
> the relationship, resolving Vars like above, as seen in patched
> expand_inherited_rtentry.
> 
> # If translated_vars are generated always in the way shown above,
> # using tree walker might be no use..
> 
> This can be done apart from all other stuffs compensating
> translation skew done in adjust_appendrel_attrs. I believe they
> are in orthogonal relationship.

Here is a test case that provokes an assertion failure under the patch; I
believe the cause is oversimplification in transvars_merge_mutator():

create table parent (a int, b int);
create table child () inherits (parent);
select parent from parent union all select parent from parent;


While poking at that test case, I noticed that the following test emits three
rows on HEAD and wrongly emits four rows with the patch:

create table parent (a int, b int);
create table child () inherits (parent);
insert into parent values (1,10);
insert into child values (2,20);
select a, b from parent union all select a, b from child;

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, 

> No hurry.

Thanks.

> > > > The attached two patches are rebased to current 9.4dev HEAD and
> > > > make check at the topmost directory and src/test/isolation are
> > > > passed without error. One bug was found and fixed on the way. It
> > > > was an assertion failure caused by probably unexpected type
> > > > conversion introduced by collapse_appendrels() which leads
> > > > implicit whole-row cast from some valid reltype to invalid
> > > > reltype (0) into adjust_appendrel_attrs_mutator().
> > > 
> > > What query demonstrated this bug in the previous type 2/3 patches?
> 
> I would still like to know the answer to the above question.

Ouch! Sorry but I couldn't replay the bug just now, please wait
for a while.

> It's true that each AppendRelInfo is initially created that way, but they need
> not remain so simple.  You can assume ctx->child_appinfo->translated_vars
> contains only these Var nodes, but parent_appinfo->translated_vars may contain
> arbitrary expressions.  (I think the pullup_replace_vars() call at
> prepjointree.c:954 is where an AppendRelInfo can get non-Var expressions.)

Itself is not a problem.

My patch doesn't replace parent's sole Var at top-level with the
corresponding node in child, it repaces any Var node in parent's
expressions in translated_vars with the corresponding node in
child. So expressions in FROM clauses of union's-operand queries
can adequately modifies expressions made in prepjointree. The
query like follows returns correct result with this patch. As I
mentioned before, I think the other stuffs than Var pullup would
be done in adjust_appendrel_attrs separately.

|  SELECT (a + 1) as x, (b * 3) as y FROM p1
|  UNION ALL
|  SELECT (a * 2), (b / 5) FROM p2 ORDER BY x LIMIT 10;


> > So all we should do to collapse nested appendrels is simplly
> > connecting each RTEs directly to the root (most ancient?) RTE in
> > the relationship, resolving Vars like above, as seen in patched
> > expand_inherited_rtentry.
> > 
> > # If translated_vars are generated always in the way shown above,
> > # using tree walker might be no use..
> > 
> > This can be done apart from all other stuffs compensating
> > translation skew done in adjust_appendrel_attrs. I believe they
> > are in orthogonal relationship.
> 
> Here is a test case that provokes an assertion failure under the patch; I
> believe the cause is oversimplification in transvars_merge_mutator():
> 
> create table parent (a int, b int);
> create table child () inherits (parent);
> select parent from parent union all select parent from parent;

Wow. Ok, I'm pretty convinced that you're right. I'll dig it on.

> While poking at that test case, I noticed that the following test emits three
> rows on HEAD and wrongly emits four rows with the patch:
> 
> create table parent (a int, b int);
> create table child () inherits (parent);
> insert into parent values (1,10);
> insert into child values (2,20);
> select a, b from parent union all select a, b from child;

Mmm. I had the same result. Please let me have a bit more time.


regards,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, The attached file
"unionall_inh_idx_typ3_v6_20140203.patch" is the new version of
this patch fixed the 'whole-row-var' bug.


First of all, on second thought about this,

> > create table parent (a int, b int);
> > create table child () inherits (parent);
> > insert into parent values (1,10);
> > insert into child values (2,20);
> > select a, b from parent union all select a, b from child;
> 
> Mmm. I had the same result. Please let me have a bit more time.

This turned out to be a correct result. The two tables have
following records after the two INSERTs.

| =# select * from only parent;
|  1 | 10
| (1 row)
| 
| =# select * from child;
|  2 | 20
| (1 row)

Then it is natural that the parent-side in the UNION ALL returns
following results.

| =# select * from parent;
|  a | b  
| ---+----
|  1 | 10
|  2 | 20
| (2 rows)

Finally, the result we got has proved not to be a problem.

====
Second, about the crash in this sql,

> select parent from parent union all select parent from parent;

It is ignored whole-row reference (=0) which makes the index of
child translated-vars list invalid (-1).  I completely ignored it
in spite that myself referred to before.

Unfortunately ConvertRowtypeExpr prevents appendrels from being
removed currently, and such a case don't seem to take place so
often, so I decided to exclude the case. In addition, I found
that only setting "rte->inh = false" causes duplicate scanning on
the same relations through abandoned appendrels so I set
parent/child_relid to InvalidOid so as no more to referred to
from the ex-parent (and ex-children).

The following queries seems to work correctly (but with no
optimization) after these fixes.

create table parent (a int, b int);
create table child () inherits (parent);
insert into parent values (1,10);
insert into child values (2,20);
select parent from parent union all select parent from parent;parent 
--------(1,10)(2,20)(1,10)(2,20)
(4 rows)
select a, parent from parent union all select a,parent from parent;a | parent 
---+--------1 | (1,10)2 | (2,20)1 | (1,10)2 | (2,20)
(4 rows)
select a, b from parent union all select a, b from parent;a | b  
---+----1 | 102 | 201 | 102 | 20
(4 rows)
select a, b from parent union all select a, b from child;a | b  
---+----2 | 201 | 102 | 20
(3 rows)

> > > > > The attached two patches are rebased to current 9.4dev HEAD and
> > > > > make check at the topmost directory and src/test/isolation are
> > > > > passed without error. One bug was found and fixed on the way. It
> > > > > was an assertion failure caused by probably unexpected type
> > > > > conversion introduced by collapse_appendrels() which leads
> > > > > implicit whole-row cast from some valid reltype to invalid
> > > > > reltype (0) into adjust_appendrel_attrs_mutator().
> > > > 
> > > > What query demonstrated this bug in the previous type 2/3 patches?
> > 
> > I would still like to know the answer to the above question.

I rememberd after some struggles. It failed during 'make check',
on the following query in inherit.sql.

| update bar set f2 = f2 + 100
| from
|   ( select f1 from foo union all select f1+3 from foo ) ss
| where bar.f1 = ss.f1;

The following SQLs causes the same type of crash.

create temp table foo(f1 int, f2 int);
create temp table foo2(f3 int) inherits (foo);
create temp table bar(f1 int, f2 int);
update bar set f2 = 1
from ( select f1 from foo union all select f1+3 from foo ) ss
where bar.f1 = ss.f1;

The tipped stone was "wholerow1" for the subquery created in
preprocess_targetlist.

|   /* Not a table, so we need the whole row as a junk var */
|   var = makeWholeRowVar(rt_fetch(rc->rti, range_table),
..
|   snprintf(resname, sizeof(resname), "wholerow%u", rc->rowmarkId);

Then the finishing blow was a appendRelInfo corresponding to the
var above, whose parent_reltype = 0 and child_reltype != 0, and
of course introduced by my patch. Since such a situation takes
place even for this patch, the modification is left alone.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 52dcc72..ece9318 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,12 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+    bool             failed;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +104,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1218,48 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    /* fast path after failure */
+    if (ctx->failed)
+        return node;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno == 0)
+            {
+                /*
+                 * Appendrels which does whole-row-var conversion cannot be
+                 * removed. ConvertRowtypeExpr can convert only RELs which can
+                 * be referred to using relid.
+                 */
+                ctx->failed = true;
+                return node;
+            }
+            if (oldv->varattno >
+                list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1287,8 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;
+    bool        detach_parent = false;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1353,22 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * further parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            detach_parent = true;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1308,6 +1376,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    /* Scan the
inheritanceset and expand it */    appinfos = NIL;
 
+    foreach(l, inhOIDs)    {        Oid            childOID = lfirst_oid(l);
@@ -1378,6 +1447,46 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            if (parent_appinfo->parent_reltype == 0 &&
+                parent_appinfo->child_reltype == 0)
+            {
+                List *new_transvars;
+
+                /*
+                 * Connect this appinfo up to the parent RTE of
+                 * parent_appinfo.
+                 */
+                ctx.failed = false;
+                new_transvars = (List*)expression_tree_mutator(
+                    (Node*)parent_appinfo->translated_vars,
+                    transvars_merge_mutator, &ctx);
+                
+                if (ctx.failed)
+                    /* Some children remain so this parent cannot detach */
+                    detach_parent = false;
+                else
+                {
+                    appinfo->parent_relid = parent_appinfo->parent_relid;
+                    appinfo->parent_reltype = parent_appinfo->parent_reltype;
+                    appinfo->translated_vars = new_transvars;
+                }
+            }
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)
@@ -1399,6 +1508,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
heap_close(newrelation,NoLock);    }
 
+    /* Remove childless appinfo from inheritance tree */
+    if (detach_parent)
+    {
+        rte->inh = false;
+        parent_appinfo->parent_relid = InvalidOid;
+        parent_appinfo->child_relid = InvalidOid;
+    }
+    heap_close(oldrelation, NoLock);    /*
@@ -1662,7 +1779,8 @@ adjust_appendrel_attrs_mutator(Node *node,                 * step to convert the tuple layout to
theparent's rowtype.                 * Otherwise we have to generate a RowExpr.                 */
 
-                if (OidIsValid(appinfo->child_reltype))
+                if (OidIsValid(appinfo->child_reltype) &&
+                    OidIsValid(appinfo->parent_reltype))                {                    Assert(var->vartype ==
appinfo->parent_reltype);                   if (appinfo->parent_reltype != appinfo->child_reltype)
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..d254ff3 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)

Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Mon, Feb 03, 2014 at 07:36:22PM +0900, Kyotaro HORIGUCHI wrote:
> > > create table parent (a int, b int);
> > > create table child () inherits (parent);
> > > insert into parent values (1,10);
> > > insert into child values (2,20);
> > > select a, b from parent union all select a, b from child;
> >
> > Mmm. I had the same result. Please let me have a bit more time.
>
> This turned out to be a correct result. The two tables have
> following records after the two INSERTs.
>
> | =# select * from only parent;
> |  1 | 10
> | (1 row)
> |
> | =# select * from child;
> |  2 | 20
> | (1 row)
>
> Then it is natural that the parent-side in the UNION ALL returns
> following results.
>
> | =# select * from parent;
> |  a | b
> | ---+----
> |  1 | 10
> |  2 | 20
> | (2 rows)
>
> Finally, the result we got has proved not to be a problem.

The first union branch should return two rows, and the second union branch
should return one row, for a total of three.  In any case, I see later in your
mail that you fixed this.  The larger point is that this patch has no business
changing the output rows of *any* query.  Its goal is to pick a more-efficient
plan for arriving at the same answer.  If there's a bug in our current output
for some query, that's a separate discussion from the topic of this thread.

> Second, about the crash in this sql,
>
> > select parent from parent union all select parent from parent;
>
> It is ignored whole-row reference (=0) which makes the index of
> child translated-vars list invalid (-1).  I completely ignored it
> in spite that myself referred to before.
>
> Unfortunately ConvertRowtypeExpr prevents appendrels from being
> removed currently, and such a case don't seem to take place so
> often, so I decided to exclude the case.

> +                /*
> +                 * Appendrels which does whole-row-var conversion cannot be
> +                 * removed. ConvertRowtypeExpr can convert only RELs which can
> +                 * be referred to using relid.

We have parent and child relids, so it is not clear to me how imposing that
restriction helps us.  I replaced transvars_merge_mutator() with a call to
adjust_appendrel_attrs().  This reduces code duplication, and it handles
whole-row references.  (I don't think the other nodes adjust_appendrel_attrs()
can handle matter to this caller.  translated_vars will never contain join
tree nodes, and I doubt it could contain a PlaceHolderVar with phrels
requiring translation.)

The central design question for this patch seems to be how to represent, in
the range table, the fact that we expanded an inheritance parent despite its
children ending up as appendrel children of a freestanding UNION ALL.  The v6
patch detaches the original RTE from the join tree and clears its "inh" flag.
This breaks sepgsql_dml_privileges(), which looks for RTE_RELATION with inh =
true and consults selinux concerning every child table.  We could certainly
change the way sepgsql discovers inheritance hierarchies, but nothing clearly
better comes to mind.  I selected the approach of preserving the RTE's "inh"
flag, removing the AppendRelInfo connecting that RTE to its enclosing UNION
ALL, and creating no AppendRelInfo children for that RTE.  An alternative was
to introduce a new RTE flag, say "append".  An inheritance parent under a
UNION ALL would have append = false, inh = true; other inheritance parent RTEs
would have append = true, inh = true; an RTE for UNION ALL itself would have
append = true, inh = false.

> > > > > > The attached two patches are rebased to current 9.4dev HEAD and
> > > > > > make check at the topmost directory and src/test/isolation are
> > > > > > passed without error. One bug was found and fixed on the way. It
> > > > > > was an assertion failure caused by probably unexpected type
> > > > > > conversion introduced by collapse_appendrels() which leads
> > > > > > implicit whole-row cast from some valid reltype to invalid
> > > > > > reltype (0) into adjust_appendrel_attrs_mutator().
> > > > >
> > > > > What query demonstrated this bug in the previous type 2/3 patches?
> > >
> > > I would still like to know the answer to the above question.
>
> I rememberd after some struggles. It failed during 'make check',
> on the following query in inherit.sql.

> [details]

Interesting.  Today, the parent_reltype and child_reltype fields of
AppendRelInfo are either both valid or both invalid.  Your v6 patch allowed us
to have a valid child_reltype with an invalid parent_reltype.  At the moment,
we can't benefit from just one valid reltype.  I restored the old invariant.

If the attached patch version looks reasonable, I will commit it.


Incidentally, I tried adding an assertion that append_rel_list does not show
one appendrel as a direct child of another.  The following query, off-topic
for the patch at hand, triggered that assertion:

SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
UNION ALL
SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Attachment

Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Thank you for the labor for polishing this patch.

I have no obvious objection at a glance on this new patch.

I agree to commit this if you this is pertinent to commit except
for the issue newly revealed by this patch. Though could you let
me have some more time to examine this by myself and fully
understand the changes what you made?


At Wed, 26 Feb 2014 23:29:35 -0500, Noah Misch wrote
> > Finally, the result we got has proved not to be a problem.
> 
> The first union branch should return two rows, and the second union branch
> should return one row, for a total of three.  In any case, I see later in your
> mail that you fixed this.

I got it. I confirmed that *after* fixing duplicate rows, then
misunderstood your point.

> The larger point is that this patch has no business
> changing the output rows of *any* query.  Its goal is to pick a more-efficient
> plan for arriving at the same answer.  If there's a bug in our current output
> for some query, that's a separate discussion from the topic of this thread.

Totally agreed.

> > Second, about the crash in this sql,
> > 
> > > select parent from parent union all select parent from parent;
> > 
> > It is ignored whole-row reference (=0) which makes the index of
> > child translated-vars list invalid (-1).  I completely ignored it
> > in spite that myself referred to before.
> > 
> > Unfortunately ConvertRowtypeExpr prevents appendrels from being
> > removed currently, and such a case don't seem to take place so
> > often, so I decided to exclude the case.
> 
> > +                /*
> > +                 * Appendrels which does whole-row-var conversion cannot be
> > +                 * removed. ConvertRowtypeExpr can convert only RELs which can
> > +                 * be referred to using relid.
> 
> We have parent and child relids, so it is not clear to me how imposing that
> restriction helps us.  I replaced transvars_merge_mutator() with a call to
> adjust_appendrel_attrs().  This reduces code duplication, and it handles
> whole-row references.

Thank you, I didn't grasp them as a whole..

> (I don't think the other nodes adjust_appendrel_attrs()
> can handle matter to this caller.  translated_vars will never contain join
> tree nodes, and I doubt it could contain a PlaceHolderVar with phrels
> requiring translation.)
> 
> The central design question for this patch seems to be how to represent, in
> the range table, the fact that we expanded an inheritance parent despite its
> children ending up as appendrel children of a freestanding UNION ALL.  The v6
> patch detaches the original RTE from the join tree and clears its "inh" flag.
> This breaks sepgsql_dml_privileges(), which looks for RTE_RELATION with inh =
> true and consults selinux concerning every child table.

Mmm. It was not in my sight. A bit more time is needed to
understand this.

> We could certainly
> change the way sepgsql discovers inheritance hierarchies, but nothing clearly
> better comes to mind.  I selected the approach of preserving the RTE's "inh"
> flag, removing the AppendRelInfo connecting that RTE to its enclosing UNION
> ALL, and creating no AppendRelInfo children for that RTE.

Ok, I did that because I was not certain whether removing
"detached" AppendRelInfos are safe or not. It is far smarter than
mine.

> An alternative was
> to introduce a new RTE flag, say "append".  An inheritance parent under a
> UNION ALL would have append = false, inh = true; other inheritance parent RTEs
> would have append = true, inh = true; an RTE for UNION ALL itself would have
> append = true, inh = false.

I think that kind of complexity is not necessary for this issue.

> > > > > > > The attached two patches are rebased to current 9.4dev HEAD and
> > > > > > > make check at the topmost directory and src/test/isolation are
> > > > > > > passed without error. One bug was found and fixed on the way. It
> > > > > > > was an assertion failure caused by probably unexpected type
> > > > > > > conversion introduced by collapse_appendrels() which leads
> > > > > > > implicit whole-row cast from some valid reltype to invalid
> > > > > > > reltype (0) into adjust_appendrel_attrs_mutator().
> > > > > > 
> > > > > > What query demonstrated this bug in the previous type 2/3 patches?
> > > > 
> > > > I would still like to know the answer to the above question.
> > 
> > I rememberd after some struggles. It failed during 'make check',
> > on the following query in inherit.sql.
> 
> > [details]
> 
> Interesting.  Today, the parent_reltype and child_reltype fields of
> AppendRelInfo are either both valid or both invalid.  Your v6 patch allowed us
> to have a valid child_reltype with an invalid parent_reltype.  At the moment,
> we can't benefit from just one valid reltype.  I restored the old invariant.

Ok, I understood it.

> If the attached patch version looks reasonable, I will commit it.
> 
> 
> Incidentally, I tried adding an assertion that append_rel_list does not show
> one appendrel as a direct child of another.  The following query, off-topic
> for the patch at hand, triggered that assertion:
> 
> SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
> UNION ALL
> SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;

This seems not to crash unless this patch is applied, but itself
doesn't seem to be a bug. I think it should be cured along with
this patch even if it is not the issue of this patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Thu, Feb 27, 2014 at 07:30:57PM +0900, Kyotaro HORIGUCHI wrote:
> Thank you for the labor for polishing this patch.
> 
> I have no obvious objection at a glance on this new patch.
> 
> I agree to commit this if you this is pertinent to commit except
> for the issue newly revealed by this patch. Though could you let
> me have some more time to examine this by myself and fully
> understand the changes what you made?

Yes; waiting several days is no problem.

> At Wed, 26 Feb 2014 23:29:35 -0500, Noah Misch wrote
> > An alternative was
> > to introduce a new RTE flag, say "append".  An inheritance parent under a
> > UNION ALL would have append = false, inh = true; other inheritance parent RTEs
> > would have append = true, inh = true; an RTE for UNION ALL itself would have
> > append = true, inh = false.
> 
> I think that kind of complexity is not necessary for this issue.

Agreed.

> > Incidentally, I tried adding an assertion that append_rel_list does not show
> > one appendrel as a direct child of another.  The following query, off-topic
> > for the patch at hand, triggered that assertion:
> > 
> > SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
> > UNION ALL
> > SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;
> 
> This seems not to crash unless this patch is applied, but itself
> doesn't seem to be a bug.

To my knowledge, the query does not crash the server under any patch provided
on this thread.

> I think it should be cured along with
> this patch even if it is not the issue of this patch.

That would require changing rather different code, probably something in the
vicinity of pull_up_subqueries().  I'll leave it for another patch.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> If the attached patch version looks reasonable, I will commit it.

The test case is completely bogus, as the query explained is significantly
different from the query executed.  I'm not sure whether you can just
remove the extra ORDER BY column without getting machine-dependent
results, though.

More generally, I'm afraid the whole approach is probably wrong, or at
least not solving all problems in this area, because of this:

> Incidentally, I tried adding an assertion that append_rel_list does not show
> one appendrel as a direct child of another.  The following query, off-topic
> for the patch at hand, triggered that assertion:
> SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
> UNION ALL
> SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;

That's not "off topic" at all; it shows that there's not been any effort
to date to flatten appendrel membership, and therefore this partial
implementation is going to miss some cases.  It doesn't help to merge an
inheritance-based appendrel into its parent if the query ORDER BY is still
a level or two above that due to UNION ALLs.

I wonder whether we should consider adding a pass to flatten any nested
appendrels after we're done creating them all.  Or alternatively, perhaps
rather than changing the representation invariant, we need to take a
harder look at why ordering info isn't getting pushed down through
appendrels.
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
BTW, isn't the proposed change to the comments for adjust_appendrel_attrs
just wrong?  If it's correct, why doesn't the Assert(!IsA(node, SubLink))
therein fire?  (AFAICS, the existing comment is correct: we don't use
this function until after expression preprocessing is complete.)
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Thu, Feb 27, 2014 at 05:47:16PM -0500, Tom Lane wrote:
> BTW, isn't the proposed change to the comments for adjust_appendrel_attrs
> just wrong?  If it's correct, why doesn't the Assert(!IsA(node, SubLink))
> therein fire?  (AFAICS, the existing comment is correct: we don't use
> this function until after expression preprocessing is complete.)

The comment change is accurate; expand_inherited_tables(), which precedes
subplan conversion, now calls adjust_appendrel_attrs().  I neglected to remove
the SubLink assert or test a scenario actually entailing a SubLink in
translated_vars.  Thanks for spotting the problem.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Thu, Feb 27, 2014 at 05:33:47PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > If the attached patch version looks reasonable, I will commit it.
> 
> The test case is completely bogus, as the query explained is significantly
> different from the query executed.  I'm not sure whether you can just
> remove the extra ORDER BY column without getting machine-dependent
> results, though.

Each query tests something slightly different.  The EXPLAIN verifies that we
can MergeAppend given this query structure, and the plain SELECT verifies that
any join tree contortions we made to achieve that do not change the answer.

> More generally, I'm afraid the whole approach is probably wrong, or at
> least not solving all problems in this area, because of this:
> 
> > Incidentally, I tried adding an assertion that append_rel_list does not show
> > one appendrel as a direct child of another.  The following query, off-topic
> > for the patch at hand, triggered that assertion:
> > SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
> > UNION ALL
> > SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;
> 
> That's not "off topic" at all; it shows that there's not been any effort
> to date to flatten appendrel membership, and therefore this partial
> implementation is going to miss some cases.  It doesn't help to merge an
> inheritance-based appendrel into its parent if the query ORDER BY is still
> a level or two above that due to UNION ALLs.

Nonetheless, I find it reasonable to accept a patch that makes
expand_inherited_tables() avoid introducing nested appendrels.  Making
pull_up_subqueries() do the same can be a separate effort.  There will still
be a pile of ways to stifle MergeAppend, including everything that makes
is_simple_union_all() return false.  Having said that, both you and Kyotaro
sound keen to achieve an appendrel flatness invariant in a single patch.  If
that's the consensus, I'm fine with bouncing this back so it can happen.

> I wonder whether we should consider adding a pass to flatten any nested
> appendrels after we're done creating them all.

We did consider that upthread.  It's a valid option, but I remain more
inclined to teach pull_up_subqueries() to preserve flatness just like
expand_inherited_tables() will.

> Or alternatively, perhaps
> rather than changing the representation invariant, we need to take a
> harder look at why ordering info isn't getting pushed down through
> appendrels.

That could facilitate MergeAppend in considerably more places, such as UNION
ALL containing non-simple branches.  I don't have much of a sense concerning
what such a patch would entail.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello,

At Thu, 27 Feb 2014 21:53:52 -0500, Noah Misch wrote
> On Thu, Feb 27, 2014 at 05:33:47PM -0500, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > If the attached patch version looks reasonable, I will commit it.
> > 
> > The test case is completely bogus, as the query explained is significantly
> > different from the query executed.  I'm not sure whether you can just
> > remove the extra ORDER BY column without getting machine-dependent
> > results, though.
> 
> Each query tests something slightly different.  The EXPLAIN verifies that we
> can MergeAppend given this query structure, and the plain SELECT verifies that
> any join tree contortions we made to achieve that do not change the answer.

I think Tom said that the second query can yield the same result
even if the 2nd column in orderby is removed so the pertinence of
it seems doubtful. Actually the altered query gave me the same
result on my workset (CentOS6.5/x86-64).

> > More generally, I'm afraid the whole approach is probably wrong, or at
> > least not solving all problems in this area, because of this:
> > 
> > > Incidentally, I tried adding an assertion that append_rel_list does not show
> > > one appendrel as a direct child of another.  The following query, off-topic
> > > for the patch at hand, triggered that assertion:
> > > SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0
> > > UNION ALL
> > > SELECT 0 FROM (SELECT 0 UNION ALL SELECT 0) t0;
> > 
> > That's not "off topic" at all; it shows that there's not been any effort
> > to date to flatten appendrel membership, and therefore this partial
> > implementation is going to miss some cases.  It doesn't help to merge an
> > inheritance-based appendrel into its parent if the query ORDER BY is still
> > a level or two above that due to UNION ALLs.
> 
> Nonetheless, I find it reasonable to accept a patch that makes
> expand_inherited_tables() avoid introducing nested appendrels.  Making
> pull_up_subqueries() do the same can be a separate effort.  There will still
> be a pile of ways to stifle MergeAppend, including everything that makes
> is_simple_union_all() return false.  Having said that, both you and Kyotaro
> sound keen to achieve an appendrel flatness invariant in a single patch.  If
> that's the consensus, I'm fine with bouncing this back so it can happen.

I understood what that query causes and still don't have definite
opinion on which is better between fixing them individually and
in one go. Nontheless, I'd feel more at ease flattening them
altogether regardless of their origin.

> > I wonder whether we should consider adding a pass to flatten any nested
> > appendrels after we're done creating them all.
> 
> We did consider that upthread.  It's a valid option, but I remain more
> inclined to teach pull_up_subqueries() to preserve flatness just like
> expand_inherited_tables() will.

Yes, the old dumped version of typ2 patch did so. It flattened
appendrel tree for the query prpoerly. Let me hear the reson you
prefer to do so.

> > Or alternatively, perhaps
> > rather than changing the representation invariant, we need to take a
> > harder look at why ordering info isn't getting pushed down through
> > appendrels.
> 
> That could facilitate MergeAppend in considerably more places, such as UNION
> ALL containing non-simple branches.  I don't have much of a sense concerning
> what such a patch would entail.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Sorry, I did wrong test.

> > > Noah Misch <noah@leadboat.com> writes:
> > > > If the attached patch version looks reasonable, I will commit it.
> > > 
> > > The test case is completely bogus, as the query explained is significantly
> > > different from the query executed.  I'm not sure whether you can just
> > > remove the extra ORDER BY column without getting machine-dependent
> > > results, though.
> > 
> > Each query tests something slightly different.  The EXPLAIN verifies that we
> > can MergeAppend given this query structure, and the plain SELECT verifies that
> > any join tree contortions we made to achieve that do not change the answer.
> 
> I think Tom said that the second query can yield the same result
> even if the 2nd column in orderby is removed so the pertinence of
> it seems doubtful. Actually the altered query gave me the same
> result on my workset (CentOS6.5/x86-64).

No, no! It was actually returned a *different* result. I
accidentially(?) added 'desc' to the '2'. Please forget it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Fri, Feb 28, 2014 at 02:35:50PM +0900, Kyotaro HORIGUCHI wrote:
> At Thu, 27 Feb 2014 21:53:52 -0500, Noah Misch wrote
> > On Thu, Feb 27, 2014 at 05:33:47PM -0500, Tom Lane wrote:
> > > I wonder whether we should consider adding a pass to flatten any nested
> > > appendrels after we're done creating them all.
> > 
> > We did consider that upthread.  It's a valid option, but I remain more
> > inclined to teach pull_up_subqueries() to preserve flatness just like
> > expand_inherited_tables() will.
> 
> Yes, the old dumped version of typ2 patch did so. It flattened
> appendrel tree for the query prpoerly. Let me hear the reson you
> prefer to do so.

Having reviewed my upthread reasoning for preferring one of those two
approaches over the other, it's a weak preference.  They have similar runtime
costs.  Putting the logic with the code that creates appendrels reduces the
number of code sites one must examine to reason about possible plan
structures.  We might not flatten RTE_RELATION appendrel parents exactly the
same way we flatten RTE_SUBQUERY appendrel parents.  I would tend to leave
inh=true for the former, for reasons explained in my notes on v7, but set
inh=false for the latter to save needless work.  On the other hand, a
flattening pass is less code overall and brings an attractive
uniformity-by-default to the area.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello,

> > Yes, the old dumped version of typ2 patch did so. It flattened
> > appendrel tree for the query prpoerly. Let me hear the reson you
> > prefer to do so.
> 
> Having reviewed my upthread reasoning for preferring one of those two
> approaches over the other, it's a weak preference.  They have similar runtime
> costs.  Putting the logic with the code that creates appendrels reduces the
> number of code sites one must examine to reason about possible plan
> structures.  We might not flatten RTE_RELATION appendrel parents exactly the
> same way we flatten RTE_SUBQUERY appendrel parents.  I would tend to leave
> inh=true for the former, for reasons explained in my notes on v7, but set
> inh=false for the latter to save needless work.

Unfortunately, RTE_SUBQUERY-es with their inh flag cleard might
cause something inconvenient in preprocess_minmax_aggregates()
and/or add_rtes_to_flat_rtable()..

# I haven't found that related to sepgsql, though :-(

I understood that your concern is to deal parent RTEs defferently
according to their relkinds. But I think the inh flags could not
be modified at all for the reason mentioned above.

Finally the seemingly most safe way to exclude removed
intermediate RTEs in subsequent processing is simplly remove
apprelinfos directly linked to them (as did in v7), for both of
the parents, RTE_RELATION and RTE_SUBQUERY. The difference has
disappeared in this story.


At least as of now, inheritance tables define the bottom bound of
a appendrel tree and on the other hand complex(?)  union_alls
define the upper bound, and both multilevel (simple)union_alls
and inheritances are flattened individually so all possible
inheritance tree to be collapsed by this patch is only, I think,
 Subquery(inh=1)[Relation-inhparent [Relation-child, child, child]]

> On the other hand, a flattening pass is less code overall and
> brings an attractive uniformity-by-default to the area.

Focusing only on the above structure, what we should do to
collapse this tree is only connect the childs to the Subquery
directly, then remove all appendrelinfos connecting to the
Relation-inhparent. inh flag need not to be modified.

# Umm. I strongly suspect that I overlooked something..

Then even widening to general case, the case doesn't seem to
change. All we need to do is, link a child to its grandparent and
isolate the parent removing apprelinfos.

Thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Noah Misch
Date:
On Mon, Mar 03, 2014 at 07:01:09PM +0900, Kyotaro HORIGUCHI wrote:
> > > Yes, the old dumped version of typ2 patch did so. It flattened
> > > appendrel tree for the query prpoerly. Let me hear the reson you
> > > prefer to do so.
> > 
> > Having reviewed my upthread reasoning for preferring one of those two
> > approaches over the other, it's a weak preference.  They have similar runtime
> > costs.  Putting the logic with the code that creates appendrels reduces the
> > number of code sites one must examine to reason about possible plan
> > structures.  We might not flatten RTE_RELATION appendrel parents exactly the
> > same way we flatten RTE_SUBQUERY appendrel parents.  I would tend to leave
> > inh=true for the former, for reasons explained in my notes on v7, but set
> > inh=false for the latter to save needless work.
> 
> Unfortunately, RTE_SUBQUERY-es with their inh flag cleard might
> cause something inconvenient in preprocess_minmax_aggregates()
> and/or add_rtes_to_flat_rtable()..

preprocess_minmax_aggregates() only cares about RTEs reachable from the join
tree, and the appendrel parents made obsolete by flattening would not be
reachable from the join tree.  add_rtes_to_flat_rtable() might be unhappy.

> # I haven't found that related to sepgsql, though :-(
> 
> I understood that your concern is to deal parent RTEs defferently
> according to their relkinds. But I think the inh flags could not
> be modified at all for the reason mentioned above.

That's fine, then.  It was a minor point.

If you are convinced that a separate flattening pass is best, that suffices
for me at this stage.  Please submit the patch you have in mind, incorporating
any improvements from the v7 patch that are relevant to both approaches.

> At least as of now, inheritance tables define the bottom bound of
> a appendrel tree and on the other hand complex(?)  union_alls
> define the upper bound, and both multilevel (simple)union_alls
> and inheritances are flattened individually so all possible
> inheritance tree to be collapsed by this patch is only, I think,
> 
>   Subquery(inh=1)[Relation-inhparent [Relation-child, child, child]]
> 
> > On the other hand, a flattening pass is less code overall and
> > brings an attractive uniformity-by-default to the area.
> 
> Focusing only on the above structure, what we should do to
> collapse this tree is only connect the childs to the Subquery
> directly, then remove all appendrelinfos connecting to the
> Relation-inhparent. inh flag need not to be modified.
> 
> # Umm. I strongly suspect that I overlooked something..
> 
> Then even widening to general case, the case doesn't seem to
> change. All we need to do is, link a child to its grandparent and
> isolate the parent removing apprelinfos.

I barely follow what you're saying here.  Do you speak of
union_inh_idx_typ2_v2_20131113.patch, unionall_inh_idx_typ3_v7.patch, or some
future design?  If we use a separate flattening pass, there's no small limit
on how many layers of appendrel we may need to flatten.  pull_up_subqueries()
can create many nested RTE_SUBQUERY appendrel layers; there may be more than
just child, parent and grandparent.  There's never more than one layer of
RTE_RELATION appendrel, though.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> If you are convinced that a separate flattening pass is best, that suffices
> for me at this stage.  Please submit the patch you have in mind, incorporating
> any improvements from the v7 patch that are relevant to both approaches.

I went back and re-read the original message, and this time it struck me
that really the issue here is that add_child_rel_equivalences() doesn't
think it has to deal with the case of a "parent" rel being itself a child.
That's not inherent though; it's just that it didn't occur to me at the
time that such a situation could arise.  It takes only very small changes
to allow that to happen.

If you do that, as in the attached changes to equivclass.c, you get
"could not find pathkey item to sort" errors from createplan.c; but
that's just because create_merge_append_plan() is likewise not expecting
the mergeappend's parent rel to be itself a child.  Fix for that is
a one-liner, ie, pass down relids.

That gets you to a point where the code generates a valid plan, but it's
got nested MergeAppends, which are unnecessary expense.  Kyotaro-san's
original fix for that was overcomplicated.  It's sufficient to teach
accumulate_append_subpath() to flatten nested MergeAppendPaths.

In short, the attached patch seems to fix it, for a lot less added
complication than anything else that's been discussed on this thread.
I've only lightly tested it (it could use a new regression test case),
but unless someone can find a flaw I think we should use this approach.

            regards, tom lane

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..5777cb2 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*************** get_cheapest_parameterized_child_path(Pl
*** 1021,1030 ****
   * accumulate_append_subpath
   *        Add a subpath to the list being built for an Append or MergeAppend
   *
!  * It's possible that the child is itself an Append path, in which case
!  * we can "cut out the middleman" and just add its child paths to our
!  * own list.  (We don't try to do this earlier because we need to
!  * apply both levels of transformation to the quals.)
   */
  static List *
  accumulate_append_subpath(List *subpaths, Path *path)
--- 1021,1035 ----
   * accumulate_append_subpath
   *        Add a subpath to the list being built for an Append or MergeAppend
   *
!  * It's possible that the child is itself an Append or MergeAppend path, in
!  * which case we can "cut out the middleman" and just add its child paths to
!  * our own list.  (We don't try to do this earlier because we need to apply
!  * both levels of transformation to the quals.)
!  *
!  * Note that if we omit a child MergeAppend in this way, we are effectively
!  * omitting a sort step, which seems fine: if the parent is to be an Append,
!  * its result would be unsorted anyway, while if the parent is to be a
!  * MergeAppend, there's no point in a separate sort on a child.
   */
  static List *
  accumulate_append_subpath(List *subpaths, Path *path)
*************** accumulate_append_subpath(List *subpaths
*** 1036,1041 ****
--- 1041,1053 ----
          /* list_copy is important here to avoid sharing list substructure */
          return list_concat(subpaths, list_copy(apath->subpaths));
      }
+     else if (IsA(path, MergeAppendPath))
+     {
+         MergeAppendPath *mpath = (MergeAppendPath *) path;
+
+         /* list_copy is important here to avoid sharing list substructure */
+         return list_concat(subpaths, list_copy(mpath->subpaths));
+     }
      else
          return lappend(subpaths, path);
  }
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 35d2a83..ac12f84 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*************** add_child_rel_equivalences(PlannerInfo *
*** 1937,1952 ****
          if (cur_ec->ec_has_volatile)
              continue;

!         /* No point in searching if parent rel not mentioned in eclass */
!         if (!bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
              continue;

          foreach(lc2, cur_ec->ec_members)
          {
              EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);

!             if (cur_em->em_is_const || cur_em->em_is_child)
!                 continue;        /* ignore consts and children here */

              /* Does it reference parent_rel? */
              if (bms_overlap(cur_em->em_relids, parent_rel->relids))
--- 1937,1956 ----
          if (cur_ec->ec_has_volatile)
              continue;

!         /*
!          * No point in searching if parent rel not mentioned in eclass; but
!          * we can't tell that for sure if parent rel is itself a child.
!          */
!         if (parent_rel->reloptkind == RELOPT_BASEREL &&
!             !bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
              continue;

          foreach(lc2, cur_ec->ec_members)
          {
              EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);

!             if (cur_em->em_is_const)
!                 continue;        /* ignore consts here */

              /* Does it reference parent_rel? */
              if (bms_overlap(cur_em->em_relids, parent_rel->relids))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..784805f 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** create_merge_append_plan(PlannerInfo *ro
*** 751,757 ****

      /* Compute sort column info, and adjust MergeAppend's tlist as needed */
      (void) prepare_sort_from_pathkeys(root, plan, pathkeys,
!                                       NULL,
                                        NULL,
                                        true,
                                        &node->numCols,
--- 751,757 ----

      /* Compute sort column info, and adjust MergeAppend's tlist as needed */
      (void) prepare_sort_from_pathkeys(root, plan, pathkeys,
!                                       best_path->path.parent->relids,
                                        NULL,
                                        true,
                                        &node->numCols,

Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello,

> > Unfortunately, RTE_SUBQUERY-es with their inh flag cleard might
> > cause something inconvenient in preprocess_minmax_aggregates()
> > and/or add_rtes_to_flat_rtable()..
> 
> preprocess_minmax_aggregates() only cares about RTEs reachable from the join
> tree, and the appendrel parents made obsolete by flattening would not be
> reachable from the join tree.  add_rtes_to_flat_rtable() might be unhappy.
> 
> > # I haven't found that related to sepgsql, though :-(
> > 
> > I understood that your concern is to deal parent RTEs defferently
> > according to their relkinds. But I think the inh flags could not
> > be modified at all for the reason mentioned above.
> 
> That's fine, then.  It was a minor point.

Ya, anyway, inh's alterability during inheritance tree
transformation is not a minor issue, if we choose flattening
rather than equivalance adjustment (what typ1 patch did).

> If you are convinced that a separate flattening pass is best, that suffices
> for me at this stage.  Please submit the patch you have in mind, incorporating
> any improvements from the v7 patch that are relevant to both approaches.

All right.

> > At least as of now, inheritance tables define the bottom bound of
> > a appendrel tree and on the other hand complex(?)  union_alls
> > define the upper bound, and both multilevel (simple)union_alls
> > and inheritances are flattened individually so all possible
> > inheritance tree to be collapsed by this patch is only, I think,
> > 
> >   Subquery(inh=1)[Relation-inhparent [Relation-child, child, child]]
> > 
> > > On the other hand, a flattening pass is less code overall and
> > > brings an attractive uniformity-by-default to the area.
> > 
> > Focusing only on the above structure, what we should do to
> > collapse this tree is only connect the childs to the Subquery
> > directly, then remove all appendrelinfos connecting to the
> > Relation-inhparent. inh flag need not to be modified.
> > 
> > # Umm. I strongly suspect that I overlooked something..
> > 
> > Then even widening to general case, the case doesn't seem to
> > change. All we need to do is, link a child to its grandparent and
> > isolate the parent removing apprelinfos.
> 
> I barely follow what you're saying here.  Do you speak of
> union_inh_idx_typ2_v2_20131113.patch, unionall_inh_idx_typ3_v7.patch, or some
> future design?

Sorry, it's about some future design, but the point is inevitable
if we select the 'flattening pass' way.

>  If we use a separate flattening pass, there's no small limit
> on how many layers of appendrel we may need to flatten.  pull_up_subqueries()
> can create many nested RTE_SUBQUERY appendrel layers; there may be more than
> just child, parent and grandparent.  There's never more than one layer of
> RTE_RELATION appendrel, though.

Yes, that is near to my 'general case' I had in my mind something
similar in the structure but different in the source to
inheritance tables, but talking about such possibilities seems
senseless. Now the 'general case' is exactly the same to what you
mentioned above.

On the other hand, Tom gave us a proposal to deal this in
add_child_rel_equivalences(), it is similar at the entry to my
old failed typ1 patch but more profound. I'll consider on this
for now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, I examined the your patch and it seemed reasonable, but I
have one question about this patch.

You made ec_relids differ to the union of all ec members'
em_relids. Is it right?

====
At Mon, 03 Mar 2014 14:05:10 -0500, Tom Lane wrote
> Noah Misch <noah@leadboat.com> writes:
> > If you are convinced that a separate flattening pass is best, that suffices
> > for me at this stage.  Please submit the patch you have in mind, incorporating
> > any improvements from the v7 patch that are relevant to both approaches.
> 
> I went back and re-read the original message, and this time it struck me
> that really the issue here is that add_child_rel_equivalences() doesn't
> think it has to deal with the case of a "parent" rel being itself a child.

Yes, that is what I tried to solve in one of the first patch but
the brute way led to failure:(

> That's not inherent though; it's just that it didn't occur to me at the
> time that such a situation could arise.  It takes only very small changes
> to allow that to happen.  

I saw what you did. I marked not-full-fledged eq members as 'not
child'(its real meaning there was full-fledged in contradiction)
to reach the child rels. In contrast, you loosened the ec_relids
shortcut filter and it seems to work but.. Is it right to make
ec_relids different to union of all members' em_relids? This
seems to affect joins afterwards.

> If you do that, as in the attached changes to equivclass.c, you get
> "could not find pathkey item to sort" errors from createplan.c; but
> that's just because create_merge_append_plan() is likewise not expecting
> the mergeappend's parent rel to be itself a child.  Fix for that is
> a one-liner, ie, pass down relids.

This seems to just correcting the over simplification by the
assumption that the best_path there cannot have a parent.

> That gets you to a point where the code generates a valid plan, but it's
> got nested MergeAppends, which are unnecessary expense.  Kyotaro-san's
> original fix for that was overcomplicated.  It's sufficient to teach
> accumulate_append_subpath() to flatten nested MergeAppendPaths.

Ah, it was in typ1-collapse patch. As you might noticed seeing
there, I couldn't put the assumption that directly nested
MergeAppendPaths share the same pathkeys (really?). It's no
problem if the assumption goes on.

> In short, the attached patch seems to fix it, for a lot less added
> complication than anything else that's been discussed on this thread.
> I've only lightly tested it (it could use a new regression test case),
> but unless someone can find a flaw I think we should use this approach.

Mmm. That's motifying but you seems to be right :) Equipping this
with some regression tests become my work from now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> Hello, I examined the your patch and it seemed reasonable, but I
> have one question about this patch.

> You made ec_relids differ to the union of all ec members'
> em_relids. Is it right?

ec_relids has never included child relids.
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello, I haven't look closer on their relationship.

> > Hello, I examined the your patch and it seemed reasonable, but I
> > have one question about this patch.
> 
> > You made ec_relids differ to the union of all ec members'
> > em_relids. Is it right?
> 
> ec_relids has never included child relids.

relation.h says that,

|    Relids  ec_relids;   /* all relids appearing in ec_members */    
...
|    Relids  em_relids;   /* all relids appearing in em_expr */   

But add_eq_member already did this, so the point was whether it's
ok to omit intermediate relations...

|   else if (!is_child)  /* child members don't add to ec_relids */
|   {
|       ec->ec_relids = bms_add_members(ec->ec_relids, relids);
|   }
|   ec->ec_members = lappend(ec->ec_members, em);

I understood that it is ok because an inheritance tree must have
one parent and it can be the representative for whole its
descendents so it's no problem. Thank you, I'll go on.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
>> ec_relids has never included child relids.

> relation.h says that,

> |    Relids  ec_relids;   /* all relids appearing in ec_members */    
> ...
> |    Relids  em_relids;   /* all relids appearing in em_expr */   

Ah.  Those comments could use improvement, I guess.
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Hello. Attached is the 2nd version of 'pushdown in UNION ALL on
partitioned tables' patch type 1 - fix in equiv-member version.

As far as I can see, I have found no problem on the original
Tom's patch. I have no annoyance of modifying inh flag and so
with this.  

At Tue, 04 Mar 2014 18:57:56 +0900, Kyotaro HORIGUCHI wrote
me> Mmm. That's motifying but you seems to be right :) Equipping this
me> with some regression tests become my work from now.

And I took an advantage of using Noah's regression test after
some modifications. After all, this patch consists of work of you
all. Thanks for all you did to me.

I simplified the query for regression tests so as to clarify the
objective and getting rid of confisions of readers. Using only
the first column seems to be enough to also make sure of pushing
down and ordering.

Any comments?


At Wed, 05 Mar 2014 13:59:45 -0500, Tom Lane wrote
tgl> >> ec_relids has never included child relids.
> > |    Relids  ec_relids;   /* all relids appearing in ec_members */    
> > ...
> > |    Relids  em_relids;   /* all relids appearing in em_expr */   
> 
> Ah.  Those comments could use improvement, I guess.

The revised comment makes it clear. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..5777cb2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1021,10 +1021,15 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel, *
accumulate_append_subpath*        Add a subpath to the list being built for an Append or MergeAppend *
 
- * It's possible that the child is itself an Append path, in which case
- * we can "cut out the middleman" and just add its child paths to our
- * own list.  (We don't try to do this earlier because we need to
- * apply both levels of transformation to the quals.)
+ * It's possible that the child is itself an Append or MergeAppend path, in
+ * which case we can "cut out the middleman" and just add its child paths to
+ * our own list.  (We don't try to do this earlier because we need to apply
+ * both levels of transformation to the quals.)
+ *
+ * Note that if we omit a child MergeAppend in this way, we are effectively
+ * omitting a sort step, which seems fine: if the parent is to be an Append,
+ * its result would be unsorted anyway, while if the parent is to be a
+ * MergeAppend, there's no point in a separate sort on a child. */static List *accumulate_append_subpath(List
*subpaths,Path *path)
 
@@ -1036,6 +1041,13 @@ accumulate_append_subpath(List *subpaths, Path *path)        /* list_copy is important here to
avoidsharing list substructure */        return list_concat(subpaths, list_copy(apath->subpaths));    }
 
+    else if (IsA(path, MergeAppendPath))
+    {
+        MergeAppendPath *mpath = (MergeAppendPath *) path;
+
+        /* list_copy is important here to avoid sharing list substructure */
+        return list_concat(subpaths, list_copy(mpath->subpaths));
+    }    else        return lappend(subpaths, path);}
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 35d2a83..ac12f84 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1937,16 +1937,20 @@ add_child_rel_equivalences(PlannerInfo *root,        if (cur_ec->ec_has_volatile)
continue;
-        /* No point in searching if parent rel not mentioned in eclass */
-        if (!bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
+        /*
+         * No point in searching if parent rel not mentioned in eclass; but
+         * we can't tell that for sure if parent rel is itself a child.
+         */
+        if (parent_rel->reloptkind == RELOPT_BASEREL &&
+            !bms_is_subset(parent_rel->relids, cur_ec->ec_relids))            continue;        foreach(lc2,
cur_ec->ec_members)       {            EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
 
-            if (cur_em->em_is_const || cur_em->em_is_child)
-                continue;        /* ignore consts and children here */
+            if (cur_em->em_is_const)
+                continue;        /* ignore consts here */            /* Does it reference parent_rel? */            if
(bms_overlap(cur_em->em_relids,parent_rel->relids))
 
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 184d37a..784805f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -751,7 +751,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)    /* Compute sort column
info,and adjust MergeAppend's tlist as needed */    (void) prepare_sort_from_pathkeys(root, plan, pathkeys,
 
-                                      NULL,
+                                      best_path->path.parent->relids,                                      NULL,
                              true,                                      &node->numCols,
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..68643d8 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,56 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (b text, a text);
+ALTER TABLE t1c INHERIT t1;
+CREATE TEMP TABLE t2c (primary key (ab)) INHERITS (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd'), ('m', 'n'), ('e', 'f');
+INSERT INTO t2c VALUES ('vw'), ('cd'), ('mn'), ('ef');
+CREATE INDEX t1c_ab_idx on t1c ((a || b));
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT ab FROM t2) t
+  ORDER BY 1 LIMIT 8;
+                   QUERY PLAN                   
+------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_ab_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+
+  SELECT * FROM
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT ab FROM t2) t
+  ORDER BY 1  LIMIT 8;
+ ab 
+----
+ ab
+ ab
+ cd
+ dc
+ ef
+ fe
+ mn
+ nm
+(8 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..aa3ef7e 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,37 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (b text, a text);
+ALTER TABLE t1c INHERIT t1;
+CREATE TEMP TABLE t2c (primary key (ab)) INHERITS (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd'), ('m', 'n'), ('e', 'f');
+INSERT INTO t2c VALUES ('vw'), ('cd'), ('mn'), ('ef');
+CREATE INDEX t1c_ab_idx on t1c ((a || b));
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT ab FROM t2) t
+  ORDER BY 1 LIMIT 8;
+
+  SELECT * FROM
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT ab FROM t2) t
+  ORDER BY 1  LIMIT 8;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)

Re: UNION ALL on partitioned tables won't use indices.

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> Hello. Attached is the 2nd version of 'pushdown in UNION ALL on
> partitioned tables' patch type 1 - fix in equiv-member version.

Committed, thanks.
        regards, tom lane



Re: UNION ALL on partitioned tables won't use indices.

From
Kyotaro HORIGUCHI
Date:
Thank you for committing.

At Fri, 28 Mar 2014 11:50:56 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <21426.1396021856@sss.pgh.pa.us>
tgl> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
tgl> > Hello. Attached is the 2nd version of 'pushdown in UNION ALL on
tgl> > partitioned tables' patch type 1 - fix in equiv-member version.
tgl> 
tgl> Committed, thanks.
tgl> 
tgl>             regards, tom lane

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center