Re: Using indices for UNION. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Using indices for UNION. |
Date | |
Msg-id | 20131122.165927.27412386.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Using indices for UNION. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
List | pgsql-hackers |
Hello, As I was reconsidering after your advise, I came up with an idea of hacking on query tree tranaformation phase in subquery_planner, not on that of plan generation phase as before. (And the older patch is totally scrapped:-) Currently flatten_simple_union_all() flattens 'simple' UNION 'ALL' at the top level of 'current' query. I noticed that a little modification there also could flatten simple UNION (w/o ALL), and it has finally almost same effect regarding more usage of indexes for UNION. Additionally, applying still more the 'UNION ALL and inheritance table' patch, even coveres UNION's on inheritance tables. This patch is far small and neat (though I say so myself :-) than the previous ones. The following plans we got from unpatched PostgreSQL. original=# explain (analyze on, costs off) | select a, b from cu11 union select a, b from cu12 order by a, b limit 10; | QUERY PLAN | ----------------------------------------------------------------------------- | Limit (actual time=272.252..272.259 rows=10 loops=1) | -> Unique (actual time=272.251..272.258 rows=10 loops=1) | -> Sort (actual time=272.249..272.252 rows=10 loops=1) | Sort Key: cu11.a, cu11.b | Sort Method: external sort Disk: 3520kB | -> Append (actual time=0.020..70.935 rows=200000 loops=1) | -> Seq Scan on cu11 (actual time=0.019..26.741 rows=100000 loops=1) | -> Seq Scan on cu12 (actual time=0.006..25.183 rows=100000 loops=1) | Total runtime: 273.162 ms And of course for * (= a, b, c, d) this, original=# explain (analyze on, costs off) | select * from cu11 union select * from cu12 order by a, b limit 10; | QUERY PLAN | ---------------------------------------------------------------------------- | Limit (actual time=234.880..234.891 rows=10 loops=1) | -> Unique (actual time=234.879..234.889 rows=10 loops=1) | -> Sort (actual time=234.878..234.881 rows=10 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | Sort Method: external sort Disk: 5280kB | -> Append (actual time=0.017..49.502 rows=200000 loops=1) | -> Seq Scan on cu11 (actual time=0.017..15.649 rows=100000 loops=1) | -> Seq Scan on cu12 (actual time=0.007..14.939 rows=100000 loops=1) | Total runtime: 236.029 ms We'll get the following plan changed with this patch plus the latest "pathkey_and_uniqueindx_v4_20131122.patch" patcch. (It got a small fix on pathkeys fixation for index paths) https://commitfest.postgresql.org/action/patch_view?id=1280 patched=# explain (analyze on, costs off) | select * from cu11 union select * from cu12 order by a, b limit 10; | QUERY PLAN | ------------------------------------------------------------------------------ | Limit (actual time=0.059..0.081 rows=10 loops=1) | -> Unique (actual time=0.058..0.079 rows=10 loops=1) | -> Merge Append (actual time=0.056..0.065 rows=10 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | -> Index Scan ... on cu11 (actual time=0.032..0.037 rows=10 loops=1) | -> Index Scan ... on cu12 (actual time=0.021..0.021 rows=1 loops=1) | Total runtime: 0.162 ms Moreover, with the 'UNION ALL' patches , say, union_inh_idx_typ1_v1_20131024.patch and union_inh_idx_typ1_add_v1_20131024.patch, we'll get the following plan for UNION on inhertance tables, https://commitfest.postgresql.org/action/patch_view?id=1270 patched2=# explain (analyze on, costs off) | select * from pu1 union select * from pu2 order by a, b limit 10; | QUERY PLAN | --------------------------------------------------------------------------- | Limit (actual time=0.209..0.234 rows=10 loops=1) | -> Unique (actual time=0.206..0.230 rows=10 loops=1) | -> Merge Append (actual time=0.204..0.216 rows=10 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | -> Index Scan..on pu1 (actual time=0.004..0.004 rows=0 loops=1) | -> Index Scan..on cu11 (actual time=0.050..0.058 rows=10 loops=1) | -> Index Scan..on cu12 (actual time=0.052..0.052 rows=1 loops=1) | -> Index Scan..on pu2 (actual time=0.002..0.002 rows=0 loops=1) | -> Index Scan..on cu21 (actual time=0.046..0.046 rows=1 loops=1) | -> Index Scan..on cu22 (actual time=0.046..0.046 rows=1 loops=1) | Total runtime: 0.627 ms The attached union_uses_idx_v4_20131122.patch is changed as follows. - Rebased to current master. - Scrapped the whold old stuff. - flatten_simple_union_all() is renamed to flatten_simple_union() and modified to do so. UNION is flattend only if sortClauseexists and distinctClause is NIL. ====== Test tables can be created following the command below, | create table pu1 (a int not null, b int not null, c int, d text); | create unique index i_pu1_ab on pu1 (a, b); | create table cu11 (like pu1 including all) inherits (pu1); | create table cu12 (like pu1 including all) inherits (pu1); | create table pu2 (like pu1 including all); | create table cu21 (like pu2 including all) inherits (pu2); | create table cu22 (like pu2 including all) inherits (pu2); | insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a); | insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a); | insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a); | insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a); ===== regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 6670794..7262c1b 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -353,13 +353,14 @@ subquery_planner(PlannerGlobal *glob, Query *parse, pull_up_subqueries(root, (Node *) parse->jointree); /* - * If this is a simple UNION ALL query, flatten it into an appendrel. We - * do this now because it requires applying pull_up_subqueries to the leaf - * queries of the UNION ALL, which weren't touched above because they - * weren't referenced by the jointree (they will be after we do this). + * If this is a simple UNION/UNION ALL query, flatten it into an + * appendrel. We do this now because it requires applying + * pull_up_subqueries to the leaf queries of the UNION/UNION ALL, which + * weren't touched above because they weren't referenced by the jointree + * (they will be after we do this). */ if (parse->setOperations) - flatten_simple_union_all(root); + flatten_simple_union(root); /* * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 485ac31..adf341b 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -32,6 +32,7 @@#include "optimizer/subselect.h"#include "optimizer/tlist.h"#include "parser/parse_relation.h" +#include "parser/parse_clause.h"#include "parser/parsetree.h"#include "rewrite/rewriteManip.h" @@ -81,8 +82,8 @@ static void make_setop_translation_list(Query *query, Index newvarno,static bool is_simple_subquery(Query*subquery, RangeTblEntry *rte, JoinExpr *lowest_outer_join);static bool is_simple_union_all(Query*subquery); -static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, - List *colTypes); +static bool is_simple_union_recurse(SetOperationStmt *topop, Node *setOp, + Query *setOpQuery, List *colTypes);static bool is_safe_append_member(Query *subquery);staticbool jointree_contains_lateral_outer_refs(Node *jtnode, bool restricted, Relids safe_upper_varnos); @@ -1428,12 +1429,14 @@ is_simple_union_all(Query *subquery) return false; /* Recursively check the tree of setoperations */ - return is_simple_union_all_recurse((Node *) topop, subquery, - topop->colTypes); + if (topop->op != SETOP_UNION || !topop->all) return false; + return is_simple_union_recurse(topop, (Node *) topop, subquery, + topop->colTypes);}static bool -is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes) +is_simple_union_recurse(SetOperationStmt *topop, Node *setOp, + Query *setOpQuery, List *colTypes){ if (IsA(setOp, RangeTblRef)) { @@ -1451,13 +1454,16 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes) { SetOperationStmt*op = (SetOperationStmt *) setOp; - /* Must be UNION ALL */ - if (op->op != SETOP_UNION || !op->all) + /* All SetOps under topop are UNION and ->all is same to topop */ + if (op->op != SETOP_UNION || op->all != topop->all) return false; /* Recurse to check inputs */ - return is_simple_union_all_recurse(op->larg, setOpQuery, colTypes) && - is_simple_union_all_recurse(op->rarg, setOpQuery, colTypes); + return + is_simple_union_recurse(topop, op->larg, + setOpQuery, colTypes) && + is_simple_union_recurse(topop, op->rarg, + setOpQuery, colTypes); } else { @@ -1896,23 +1902,22 @@ pullup_replace_vars_subquery(Query *query, NULL);} -/* * flatten_simple_union_all - * Try to optimize top-level UNION ALL structure into an appendrel + * Try to optimize top-level UNION/UNION ALL structure into an appendrel * - * If a query's setOperations tree consists entirely of simple UNION ALL - * operations, flatten it into an append relation, which we can process more - * intelligently than the general setops case. Otherwise, do nothing. + * If a query's setOperations tree consists entirely of simple UNION and UNION + * ALL operations, flatten it into an append relation, which we can process + * more intelligently than the general setops case. Otherwise, do nothing. * * In most cases, this can succeed only fora top-level query, because for a * subquery in FROM, the parent query's invocation of pull_up_subqueries would - * already have flattened the UNION via pull_up_simple_union_all. But there + * already have flattened the UNION ALL via pull_up_simple_union_all. But there * are a few cases we can support here butnot in that code path, for example * when the subquery also contains ORDER BY. */void -flatten_simple_union_all(PlannerInfo *root) +flatten_simple_union(PlannerInfo *root){ Query *parse = root->parse; SetOperationStmt *topop; @@ -1932,10 +1937,18 @@ flatten_simple_union_all(PlannerInfo *root) return; /* - * Recursively check the tree of set operations. If not all UNION ALL + * If topop is not UNION (not likey), punt. Also for UNION(not ALL) + * without sortClause or already having distinctClause. + */ + if (topop->op != SETOP_UNION || + (!topop->all && (!parse->sortClause || parse->distinctClause ))) + return; + + /* + * Recursively check the tree of set operations. If not all UNION(ALL) * with identical column types, punt. */ - if (!is_simple_union_all_recurse((Node *) topop, parse, topop->colTypes)) + if (!is_simple_union_recurse(topop, (Node *) topop, parse, topop->colTypes)) return; /* @@ -1983,6 +1996,16 @@ flatten_simple_union_all(PlannerInfo *root) parse->setOperations = NULL; /* + * We can create distinctClause using transformDistinctClause() with + * pstate == NULL. + */ + if (!topop->all) + parse->distinctClause = + transformDistinctClause(NULL, + &parse->targetList, parse->sortClause, + false); + + /* * Build AppendRelInfo information, and apply pull_up_subqueries to the * leaf queries of the UNION ALL. (We must do that now because they * weren't previously referenced by the jointree, and so were missed by diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index 0934e63..32b3bf4 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -24,7 +24,7 @@extern void pull_up_sublinks(PlannerInfo *root);extern void inline_set_returning_functions(PlannerInfo *root);externNode *pull_up_subqueries(PlannerInfo *root, Node *jtnode); -extern void flatten_simple_union_all(PlannerInfo *root); +extern void flatten_simple_union(PlannerInfo *root);extern void reduce_outer_joins(PlannerInfo *root);extern Relids get_relids_in_jointree(Node*jtnode, bool include_joins);extern Relids get_relids_for_join(PlannerInfo *root, int joinrelid);
pgsql-hackers by date: