Re: Using indices for UNION. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Using indices for UNION.
Date
Msg-id 20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Using indices for UNION.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Hello, As I was reconsidering after your advise, I came up with
an idea of hacking on query tree tranaformation phase in
subquery_planner, not on that of plan generation phase as
before. (And the older patch is totally scrapped:-)


Currently flatten_simple_union_all() flattens 'simple' UNION
'ALL' at the top level of 'current' query. I noticed that a
little modification there also could flatten simple UNION (w/o
ALL), and it has finally almost same effect regarding more usage
of indexes for UNION. Additionally, applying still more the
'UNION ALL and inheritance table' patch, even coveres UNION's on
inheritance tables.

This patch is far small and neat (though I say so myself :-) than
the previous ones. The following plans we got from unpatched
PostgreSQL.

original=# explain (analyze on, costs off)
|     select a, b from cu11 union select a, b from cu12 order by a, b limit 10;
|                               QUERY PLAN
| -----------------------------------------------------------------------------
|  Limit (actual time=272.252..272.259 rows=10 loops=1)
|  ->  Unique (actual time=272.251..272.258 rows=10 loops=1)
|   ->  Sort (actual time=272.249..272.252 rows=10 loops=1)
|         Sort Key: cu11.a, cu11.b
|         Sort Method: external sort  Disk: 3520kB
|         ->  Append (actual time=0.020..70.935 rows=200000 loops=1)
|          ->  Seq Scan on cu11 (actual time=0.019..26.741 rows=100000 loops=1)
|          ->  Seq Scan on cu12 (actual time=0.006..25.183 rows=100000 loops=1)
|  Total runtime: 273.162 ms

And of course for * (= a, b, c, d) this,

original=# explain (analyze on, costs off)
|            select * from cu11 union select * from cu12 order by a, b limit 10;
|                                QUERY PLAN
| ----------------------------------------------------------------------------
|  Limit (actual time=234.880..234.891 rows=10 loops=1)
|  ->  Unique (actual time=234.879..234.889 rows=10 loops=1)
|   ->  Sort (actual time=234.878..234.881 rows=10 loops=1)
|        Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|        Sort Method: external sort  Disk: 5280kB
|        ->  Append (actual time=0.017..49.502 rows=200000 loops=1)
|           ->  Seq Scan on cu11 (actual time=0.017..15.649 rows=100000 loops=1)
|           ->  Seq Scan on cu12 (actual time=0.007..14.939 rows=100000 loops=1)
|  Total runtime: 236.029 ms

We'll get the following plan changed with this patch plus the
latest "pathkey_and_uniqueindx_v4_20131122.patch" patcch. (It got
a small fix on pathkeys fixation for index paths)

https://commitfest.postgresql.org/action/patch_view?id=1280

patched=# explain (analyze on, costs off)
|           select * from cu11 union select * from cu12 order by a, b limit 10;
|                                  QUERY PLAN
| ------------------------------------------------------------------------------
|  Limit (actual time=0.059..0.081 rows=10 loops=1)
|  ->  Unique (actual time=0.058..0.079 rows=10 loops=1)
|   ->  Merge Append (actual time=0.056..0.065 rows=10 loops=1)
|         Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
|         ->  Index Scan ... on cu11 (actual time=0.032..0.037 rows=10 loops=1)
|         ->  Index Scan ... on cu12 (actual time=0.021..0.021 rows=1 loops=1)
|  Total runtime: 0.162 ms

Moreover, with the 'UNION ALL' patches , say,
union_inh_idx_typ1_v1_20131024.patch and
union_inh_idx_typ1_add_v1_20131024.patch, we'll get the following
plan for UNION on inhertance tables,

https://commitfest.postgresql.org/action/patch_view?id=1270


patched2=# explain (analyze on, costs off)
|          select * from pu1 union select * from pu2 order by a, b limit 10;
|                             QUERY PLAN
| ---------------------------------------------------------------------------
|  Limit (actual time=0.209..0.234 rows=10 loops=1)
|  ->  Unique (actual time=0.206..0.230 rows=10 loops=1)
|   ->  Merge Append (actual time=0.204..0.216 rows=10 loops=1)
|        Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
|        ->  Index Scan..on pu1 (actual time=0.004..0.004 rows=0 loops=1)
|        ->  Index Scan..on cu11 (actual time=0.050..0.058 rows=10 loops=1)
|        ->  Index Scan..on cu12 (actual time=0.052..0.052 rows=1 loops=1)
|        ->  Index Scan..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
|        ->  Index Scan..on cu21 (actual time=0.046..0.046 rows=1 loops=1)
|        ->  Index Scan..on cu22 (actual time=0.046..0.046 rows=1 loops=1)
|  Total runtime: 0.627 ms

The attached union_uses_idx_v4_20131122.patch is changed as
follows.
- Rebased to current master.
- Scrapped the whold old stuff.
- flatten_simple_union_all() is renamed to  flatten_simple_union() and modified to do so.  UNION is  flattend only if
sortClauseexists and distinctClause is NIL.
 

======
Test tables can be created following the command below,

| create table pu1 (a int not null, b int not null, c int, d text);
| create unique index i_pu1_ab on pu1 (a, b);
| create table cu11 (like pu1 including all) inherits (pu1);
| create table cu12 (like pu1 including all) inherits (pu1);
| create table pu2 (like pu1 including all);
| create table cu21 (like pu2 including all) inherits (pu2);
| create table cu22 (like pu2 including all) inherits (pu2);
| insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
| insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
| insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
| insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
=====

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6670794..7262c1b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -353,13 +353,14 @@ subquery_planner(PlannerGlobal *glob, Query *parse,        pull_up_subqueries(root, (Node *)
parse->jointree);   /*
 
-     * If this is a simple UNION ALL query, flatten it into an appendrel. We
-     * do this now because it requires applying pull_up_subqueries to the leaf
-     * queries of the UNION ALL, which weren't touched above because they
-     * weren't referenced by the jointree (they will be after we do this).
+     * If this is a simple UNION/UNION ALL query, flatten it into an
+     * appendrel. We do this now because it requires applying
+     * pull_up_subqueries to the leaf queries of the UNION/UNION ALL, which
+     * weren't touched above because they weren't referenced by the jointree
+     * (they will be after we do this).     */    if (parse->setOperations)
-        flatten_simple_union_all(root);
+        flatten_simple_union(root);    /*     * Detect whether any rangetable entries are RTE_JOIN kind; if not, we
can
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 485ac31..adf341b 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -32,6 +32,7 @@#include "optimizer/subselect.h"#include "optimizer/tlist.h"#include "parser/parse_relation.h"
+#include "parser/parse_clause.h"#include "parser/parsetree.h"#include "rewrite/rewriteManip.h"
@@ -81,8 +82,8 @@ static void make_setop_translation_list(Query *query, Index newvarno,static bool
is_simple_subquery(Query*subquery, RangeTblEntry *rte,                   JoinExpr *lowest_outer_join);static bool
is_simple_union_all(Query*subquery);
 
-static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery,
-                            List *colTypes);
+static bool is_simple_union_recurse(SetOperationStmt *topop, Node *setOp,
+                                    Query *setOpQuery, List *colTypes);static bool is_safe_append_member(Query
*subquery);staticbool jointree_contains_lateral_outer_refs(Node *jtnode, bool restricted,
     Relids safe_upper_varnos);
 
@@ -1428,12 +1429,14 @@ is_simple_union_all(Query *subquery)        return false;    /* Recursively check the tree of
setoperations */
 
-    return is_simple_union_all_recurse((Node *) topop, subquery,
-                                       topop->colTypes);
+    if (topop->op != SETOP_UNION || !topop->all) return false;
+    return is_simple_union_recurse(topop, (Node *) topop, subquery,
+                                   topop->colTypes);}static bool
-is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)
+is_simple_union_recurse(SetOperationStmt *topop, Node *setOp,
+                        Query *setOpQuery, List *colTypes){    if (IsA(setOp, RangeTblRef))    {
@@ -1451,13 +1454,16 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)    {
SetOperationStmt*op = (SetOperationStmt *) setOp;
 
-        /* Must be UNION ALL */
-        if (op->op != SETOP_UNION || !op->all)
+        /* All SetOps under topop are UNION and ->all is same to topop */
+        if (op->op != SETOP_UNION || op->all != topop->all)            return false;        /* Recurse to check inputs
*/
-        return is_simple_union_all_recurse(op->larg, setOpQuery, colTypes) &&
-            is_simple_union_all_recurse(op->rarg, setOpQuery, colTypes);
+        return
+            is_simple_union_recurse(topop, op->larg,
+                                    setOpQuery, colTypes) &&
+            is_simple_union_recurse(topop, op->rarg,
+                                    setOpQuery, colTypes);    }    else    {
@@ -1896,23 +1902,22 @@ pullup_replace_vars_subquery(Query *query,                                           NULL);}
-/* * flatten_simple_union_all
- *        Try to optimize top-level UNION ALL structure into an appendrel
+ *        Try to optimize top-level UNION/UNION ALL structure into an appendrel *
- * If a query's setOperations tree consists entirely of simple UNION ALL
- * operations, flatten it into an append relation, which we can process more
- * intelligently than the general setops case.    Otherwise, do nothing.
+ * If a query's setOperations tree consists entirely of simple UNION and UNION
+ * ALL operations, flatten it into an append relation, which we can process
+ * more intelligently than the general setops case. Otherwise, do nothing. * * In most cases, this can succeed only
fora top-level query, because for a * subquery in FROM, the parent query's invocation of pull_up_subqueries would
 
- * already have flattened the UNION via pull_up_simple_union_all.  But there
+ * already have flattened the UNION ALL via pull_up_simple_union_all.  But there * are a few cases we can support here
butnot in that code path, for example * when the subquery also contains ORDER BY. */void
 
-flatten_simple_union_all(PlannerInfo *root)
+flatten_simple_union(PlannerInfo *root){    Query       *parse = root->parse;    SetOperationStmt *topop;
@@ -1932,10 +1937,18 @@ flatten_simple_union_all(PlannerInfo *root)        return;    /*
-     * Recursively check the tree of set operations.  If not all UNION ALL
+     * If topop is not UNION (not likey), punt. Also for UNION(not ALL)
+     * without sortClause or already having distinctClause.
+     */
+    if (topop->op != SETOP_UNION ||
+        (!topop->all && (!parse->sortClause || parse->distinctClause )))
+        return;
+
+    /*
+     * Recursively check the tree of set operations.  If not all UNION(ALL)     * with identical column types, punt.
 */
 
-    if (!is_simple_union_all_recurse((Node *) topop, parse, topop->colTypes))
+    if (!is_simple_union_recurse(topop, (Node *) topop, parse, topop->colTypes))        return;    /*
@@ -1983,6 +1996,16 @@ flatten_simple_union_all(PlannerInfo *root)    parse->setOperations = NULL;    /*
+     * We can create distinctClause using transformDistinctClause() with
+     * pstate == NULL.
+     */
+    if (!topop->all)
+        parse->distinctClause =
+            transformDistinctClause(NULL,
+                                    &parse->targetList, parse->sortClause,
+                                    false);
+
+    /*     * Build AppendRelInfo information, and apply pull_up_subqueries to the     * leaf queries of the UNION ALL.
(We must do that now because they     * weren't previously referenced by the jointree, and so were missed by
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..32b3bf4 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -24,7 +24,7 @@extern void pull_up_sublinks(PlannerInfo *root);extern void inline_set_returning_functions(PlannerInfo
*root);externNode *pull_up_subqueries(PlannerInfo *root, Node *jtnode);
 
-extern void flatten_simple_union_all(PlannerInfo *root);
+extern void flatten_simple_union(PlannerInfo *root);extern void reduce_outer_joins(PlannerInfo *root);extern Relids
get_relids_in_jointree(Node*jtnode, bool include_joins);extern Relids get_relids_for_join(PlannerInfo *root, int
joinrelid);

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Get more from indices.
Next
From: Pavel Stehule
Date:
Subject: Re: new unicode table border styles for psql