Use unique index for longer pathkeys. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Use unique index for longer pathkeys. |
Date | |
Msg-id | 20140613.164133.160845727.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
Responses |
Re: Use unique index for longer pathkeys.
Re: Use unique index for longer pathkeys. |
List | pgsql-hackers |
Hello, This is the continuation from the last CF. This patch intends to make PG to use index for longer pathkeys than index columns when, - The index is a unique index.- All index columns are NOT NULL.- The index column list is a subset of query_pathkeys. The use cases for this patch are, - (Useless?) DISTINCT on the table with PK constraints. This alone is useless but this situation could take place when processing UNION (without ALL), which silently adds DISTINCT* to element queries. Especially effective with ORDER BY and LIMIT clauses. But this needs another patch forUNION optimization. This is the main motive for this patch. - Unfortunately, other use cases are somewhat boresome, mainly effective for essentially unnecessary ORDER BY or DISTINCT.To streatch a point, naively or mechanically composed queires could be saved by this patch.. ==== Duing last CF, I proposed to match long pathkeys against index columns during creating index paths. This worked fine but also it is difficult to make sure that all side-effects are eliminated. Finally Tom Lane suggested to truncate pathkeys while generation of the pathkeys itself. So this patch comes. This patch consists of mainly three parts. 1. Mark index as 'Uniquely orders the table' if the conditions are satisfied. This is the same from the previous patch. 2. Collect the "common primary pathkeys", which is pathkeys that can perfectly sort all involved RTE's. (collect_common_primary_pathkeys()) 3. Trim the query pathkeys using the common primary pathkeys. (trim_query_pathkeys()) These steps take place between set_base_rel_sizes() and set_base_rel_pathlists() in make_one_rel(). The reason for the position is that the step 2 above needs all inheritence tables to be extended in PlannerInfo and set_base_rel_sizes (currently) does that. Then the trimmed pathkeys are used in set_base_rel_pathlists so trim_query_pathkeys should be before it. (This needs to be written as a comment?) Finally, the new patch has grown up to twice in size.. Although it seems a bit large, I think that side-effects are clearly avoided. This message is attatched by two patches. 1. pathkey_and_uniqueindx_typ2_v1.patch : This new patch. 2. pathkey_and_uniqueindx_v10_20130411.patch : The last version of the previous approach. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index fdaa964..47b8056 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -50,6 +50,7 @@ int geqo_threshold;join_search_hook_type join_search_hook = NULL; +static List *collect_common_primary_pathkeys(PlannerInfo *root);static void set_base_rel_sizes(PlannerInfo *root);staticvoid set_base_rel_pathlists(PlannerInfo *root);static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, @@ -58,6 +59,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry*rte);static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); +static void trim_query_pathkeys(PlannerInfo * root);static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte);static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel, @@ -102,6 +104,214 @@ static void recurse_push_qual(Node *setOp, Query *topquery, RangeTblEntry *rte, Indexrti, Node *qual);static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel); +/* + * collect_common_primary_pathkeys: Collects common unique and non-null index + * pathkeys from all base relations in current root. + */ + +static List * +collect_common_primary_pathkeys(PlannerInfo *root) +{ + List *result = NIL; + Index rti; + int nindex = -1; + List **pathkeys_array0 = NULL; + List **pathkeys_array = NULL; + List **pathkeys_array2 = NULL; + bool multipass = (root->simple_rel_array_size > 2); + + for (rti = 1 ; rti < root->simple_rel_array_size && nindex ; rti++) + { + RelOptInfo *rel = root->simple_rel_array[rti]; + List **pktmp; + + if (rel == NULL) + continue; + + Assert (rel->relid == rti); + + /* + * Scan all base relations except parent relations. + */ + if (root->simple_rte_array[rti]->inh || + (rel->reloptkind != RELOPT_BASEREL && + rel->reloptkind != RELOPT_OTHER_MEMBER_REL)) + continue; + + /* pathkeys_array is NULL for the first iteration */ + if (pathkeys_array == NULL) + { + ListCell *lc; + int i = 0; + + /* Skip rigging for "AND" operation if not needed. */ + if (multipass) + { + nindex = 0; + + foreach (lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + + if (index->full_ordered && index->indpred == NIL) + nindex++; + } + + if (nindex == 0) + return NULL; + + pathkeys_array0 = palloc0(sizeof(List*) * nindex * 2); + pathkeys_array = pathkeys_array0; + pathkeys_array2 = pathkeys_array0 + nindex; + } + + foreach (lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + + if (index->full_ordered && index->indpred == NIL) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + List *idx_pathkeys; + + idx_pathkeys = build_index_pathkeys(root, index, + ForwardScanDirection); + if (idx_pathkeys) + { + if (multipass) + pathkeys_array[i++] = idx_pathkeys; + else + result = lappend(result, idx_pathkeys); + } + } + } + nindex = i; + } + else + { + int i; + int nmatch = 0; + ListCell *lc; + + Assert(multipass); + + for (i = 0 ; i < nindex ; i++) pathkeys_array2[i] = NULL; + + foreach (lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + + if (index->full_ordered && index->indpred == NIL) + { + List *idx_pathkeys; + + idx_pathkeys = build_index_pathkeys(root, index, + ForwardScanDirection); + if (!idx_pathkeys) + continue; + + /* + * AND operation: drop pathkeys which don't see identical + * one in this relation. + */ + for (i = 0 ; i < nindex ; i++) + { + if (!pathkeys_array2[i] && + compare_pathkeys(pathkeys_array[i], idx_pathkeys) + == PATHKEYS_EQUAL) + { + pathkeys_array2[i] = pathkeys_array[i]; + nmatch++; + + /* Also drop possible duplicate pathkeys. */ + break; + } + } + } + } + if (nmatch == 0) + return NULL; + + /* swap two pathkeys arrays */ + pktmp = pathkeys_array2; + pathkeys_array2 = pathkeys_array; + pathkeys_array = pktmp; + } + } + + if (multipass) + { + int i; + + for (i = 0 ; i < nindex ; i++) + { + if (pathkeys_array[i]) + result = lappend(result, pathkeys_array[i]); + } + pfree(pathkeys_array0); + } + + return result; +} + +/* + * trim_query_pathkeys: trim pathkeys following common primary pathkeys. + * + * Primary pathkeys are pathkeys which perfectly sorts tuples in its owner + * table so the query pathkeys can be replaced with the primary pathkeys when + * the latter prefixes the former. Common primary pathkeys are the primary + * pathkeys shared among all base relations accessed in current root. In + * general cases, all pathkeys other than sort_pathkeys are allowed to flip + * ordering direction in part of its members, but grouping (orderless) + * pathkeys might have the same ordering directions with sort_pathkeys in most + * cases where sort_pathkeys exists. So just disable partial flipping for all + * pathkeys if sort_pathkeys exists. + */ +static void +trim_query_pathkeys(PlannerInfo * root) +{ + ListCell *lc; + + foreach(lc, collect_common_primary_pathkeys(root)) + { + List *pk_guide = (List*) lfirst(lc); + + if (root->sort_pathkeys) + { + root->sort_pathkeys = + shorten_pathkeys_following(root, root->sort_pathkeys, pk_guide, + false); + + /* + * Replace guide pathkeys with new sort_pathkeys if they differs + * from each other only by sorting directions. + */ + if (compare_pathkeys(root->sort_pathkeys, + pk_guide) != PATHKEYS_EQUAL && + compare_pathkeys_ignoring_strategy(root->sort_pathkeys, + pk_guide) == PATHKEYS_EQUAL) + pk_guide = root->sort_pathkeys; + } + + root->group_pathkeys = + shorten_pathkeys_following(root, root->group_pathkeys, pk_guide, + true); + + /* + * Currently window_pathkeys for multiple partitioning columns is + * composed of a pathkey of RowExpr, which cannot be compared (easily) + * with other pathkeys. Skip it. + */ + + root->distinct_pathkeys = + shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide, + true); + + root->query_pathkeys = + shorten_pathkeys_following(root,root->query_pathkeys,pk_guide, + true); + } +}/* * make_one_rel @@ -139,6 +349,7 @@ make_one_rel(PlannerInfo *root, List *joinlist) * Generate access paths for the base rels. */ set_base_rel_sizes(root); + trim_query_pathkeys(root); set_base_rel_pathlists(root); /* diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 5d953df..0fdb15a 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -311,6 +311,53 @@ compare_pathkeys(List *keys1, List *keys2) return PATHKEYS_EQUAL;} +PathKeysComparison +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2) +{ + ListCell *key1, + *key2; + + /* + * Fall out quickly if we are passed two identical lists. This mostly + * catches the case where both are NIL, but that's common enough to + * warrant the test. + */ + if (keys1 == keys2) + return PATHKEYS_EQUAL; + + forboth(key1, keys1, key2, keys2) + { + PathKey *pathkey1 = (PathKey *) lfirst(key1); + PathKey *pathkey2 = (PathKey *) lfirst(key2); + + if (pathkey1 != pathkey2) + { + Assert(pathkey1->pk_strategy == BTGreaterStrategyNumber || + pathkey1->pk_strategy == BTLessStrategyNumber); + Assert(pathkey2->pk_strategy == BTGreaterStrategyNumber || + pathkey2->pk_strategy == BTLessStrategyNumber); + + if (pathkey1->pk_eclass != pathkey2->pk_eclass || + pathkey1->pk_opfamily != pathkey2->pk_opfamily || + !((pathkey1->pk_strategy == pathkey2->pk_strategy && + pathkey1->pk_nulls_first == pathkey2->pk_nulls_first) || + (pathkey1->pk_strategy != pathkey2->pk_strategy && + pathkey1->pk_nulls_first != pathkey2->pk_nulls_first))) + return PATHKEYS_DIFFERENT; /* no need to keep looking */ + } + } + + /* + * If we reached the end of only one list, the other is longer and + * therefore not a subset. + */ + if (key1 != NULL) + return PATHKEYS_BETTER1; /* key1 is longer */ + if (key2 != NULL) + return PATHKEYS_BETTER2; /* key2 is longer */ + return PATHKEYS_EQUAL; +} +/* * pathkeys_contained_in * Common special case of compare_pathkeys: we just want to know @@ -1525,3 +1572,87 @@ has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel) return true; /* might beable to use them for ordering */ return false; /* definitely useless */} + +/* + * shorten_pathkeys_following: returns pk_guide if it prefixes pk_target + * regardless of the ordering directions of the pathkeys. + * allow_partial_flip allows ordering directions to be partially different + * between pk_target and pk_guide. + */ +List * +shorten_pathkeys_following(PlannerInfo *root, List *pk_target, List *pk_guide, + bool allow_partial_flip) +{ + ListCell *tlc, *glc; + int same = 0, reverse = 0; + List *result; + + if (pk_target == NIL) + return NIL; + + if (list_length(pk_target) < list_length(pk_guide)) + return pk_target; + + forboth(tlc, pk_target, glc, pk_guide) + { + PathKey *tpk = (PathKey *) lfirst(tlc); + PathKey *gpk = (PathKey *) lfirst(glc); + + if (tpk == gpk) + { + same++; + continue; + } + + Assert(tpk->pk_strategy == BTGreaterStrategyNumber || + tpk->pk_strategy == BTLessStrategyNumber); + Assert(gpk->pk_strategy == BTGreaterStrategyNumber || + gpk->pk_strategy == BTLessStrategyNumber); + + if (tpk->pk_eclass == gpk->pk_eclass && + tpk->pk_opfamily == gpk->pk_opfamily && + tpk->pk_strategy != gpk->pk_strategy && + tpk->pk_nulls_first != gpk->pk_nulls_first) + { + reverse++; + continue; + } + + return pk_target; + } + + /* trailing members in pk_target don't matter */ + + /* reverse == 0 means that pk_guide exactly prefixes pk_target */ + if (allow_partial_flip || reverse == 0) + return pk_guide; + + /* + * Don't replace pathkeys if partial flip is not allowed but partially + * flipped. + */ + Assert(reverse > 0); + if (same > 0) + return pk_target; + + /* make then reutrn the reverse pathkeys of pk_guide */ + result = NIL; + foreach (glc, pk_guide) + { + PathKey *gpk = (PathKey *) lfirst(glc); + PathKey *pk; + EquivalenceClass *ec = gpk->pk_eclass; + int revstr; + + if (gpk->pk_strategy == BTGreaterStrategyNumber) + revstr = BTLessStrategyNumber; + else + revstr = BTGreaterStrategyNumber; + + pk = make_canonical_pathkey(root, ec, gpk->pk_opfamily, revstr, + !gpk->pk_nulls_first); + + result = lappend(result, pk); + } + return result; +} diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index b2becfa..5520e76 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -231,6 +231,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, */ if (info->relam == BTREE_AM_OID) { + bool has_nullable_keycol = false; + /* * If it's a btree index, we can use its opfamily OIDs * directly as thesort ordering opfamily OIDs. @@ -244,10 +246,26 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, for (i =0; i < ncolumns; i++) { int16 opt = indexRelation->rd_indoption[i]; + int16 keyattno = index->indkey.values[i]; info->reverse_sort[i] = (opt & INDOPTION_DESC)!= 0; info->nulls_first[i] = (opt & INDOPTION_NULLS_FIRST) != 0; + + if (keyattno > 0) + { + has_nullable_keycol |= + !relation->rd_att->attrs[keyattno - 1]->attnotnull; + } + /* OID should not be NULL if exists */ + else if (keyattno != ObjectIdAttributeNumber) + has_nullable_keycol = true; } + + /* Check to see if it is a full-ordered index */ + if (IndexIsValid(index) && + index->indisunique && index->indimmediate && + !has_nullable_keycol) + info->full_ordered = true; } else if (indexRelation->rd_am->amcanorder) { diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 300136e..a6b371e 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -525,6 +525,7 @@ typedef struct IndexOptInfo bool predOK; /* true if predicate matches query */ bool unique; /* true if a unique index */ bool immediate; /* is uniqueness enforcedimmediately? */ + bool full_ordered; /* don't key columns duplicate? */ bool hypothetical; /* true if index doesn'treally exist */ bool canreturn; /* can index return IndexTuples? */ bool amcanorderbyop;/* does AM support order by operator result? */ diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 9b22fda..506673f 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -147,6 +147,8 @@ typedef enum} PathKeysComparison;extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2); +extern PathKeysComparison compare_pathkeys_ignoring_strategy(List *keys1, + List *keys2);extern bool pathkeys_contained_in(List *keys1,List *keys2);extern Path *get_cheapest_path_for_pathkeys(List *paths, List *pathkeys, Relids required_outer, @@ -187,5 +189,8 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root, RelOptInfo *rel, List *pathkeys);extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel); +extern List *shorten_pathkeys_following(PlannerInfo *root, + List *pk_target, List *pk_guide, + bool allow_partial_flip);#endif /* PATHS_H */ diff --git a/src/test/isolation/expected/drop-index-concurrently-1.out b/src/test/isolation/expected/drop-index-concurrently-1.out index 75dff56..ab96fa0 100644 --- a/src/test/isolation/expected/drop-index-concurrently-1.out +++ b/src/test/isolation/expected/drop-index-concurrently-1.out @@ -19,10 +19,8 @@ Sortstep explains: EXPLAIN (COSTS OFF) EXECUTE getrow_seq;QUERY PLAN -Sort - Sort Key: id, data - -> Seq Scan on test_dc - Filter: ((data)::text = '34'::text) +Index Scan using test_dc_pkey on test_dc + Filter: ((data)::text = '34'::text)step select2: SELECT * FROM test_dc WHERE data=34 ORDER BY id,data;id data diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index bfb4b9f..ff5c88c 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1790,6 +1790,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node) WRITE_BOOL_FIELD(predOK); WRITE_BOOL_FIELD(unique); WRITE_BOOL_FIELD(immediate); + WRITE_BOOL_FIELD(allnotnull); WRITE_BOOL_FIELD(hypothetical); /* we don't bother with fields copied from the pg_amentry */} diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index a912174..4376e95 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -951,8 +951,11 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel, { index_pathkeys = build_index_pathkeys(root,index, ForwardScanDirection); - useful_pathkeys = truncate_useless_pathkeys(root, rel, - index_pathkeys); + if (index_pathkeys_are_extensible(root, index, index_pathkeys)) + useful_pathkeys = root->query_pathkeys; + else + useful_pathkeys = truncate_useless_pathkeys(root, rel, + index_pathkeys); orderbyclauses = NIL; orderbyclausecols= NIL; } diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 9179c61..5edca4f 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -502,6 +502,76 @@ build_index_pathkeys(PlannerInfo *root,}/* + * index_pathkeys_are_extensible + * Check whether the pathkeys are extensible to query_pathkeys. + */ +bool +index_pathkeys_are_extensible(PlannerInfo *root, + IndexOptInfo *index, + List *pathkeys) +{ + bool result; + ListCell *lc1, *remain; + + if (root->query_pathkeys == NIL || pathkeys == NIL) + return false; + + /* This index is not suitable for pathkey extension */ + if (!index->unique || !index->immediate || !index->allnotnull) + return false; + + /* pathkeys is a prefixing proper subset of index tlist */ + if (list_length(pathkeys) < list_length(index->indextlist)) + return false; + + if (!pathkeys_contained_in(pathkeys, root->query_pathkeys)) + return false; + + if (list_length(pathkeys) == list_length(root->query_pathkeys)) + return true; + + Assert(list_length(root->query_pathkeys) > list_length(pathkeys)); + + /* + * Check if all unchecked elements of query_patheys are simple Vars for + * the same relation. + */ + + /* list_nth_cell is not exposed publicly.. */ + if (list_length(pathkeys) == list_length(root->query_pathkeys) - 1) + remain = list_tail(root->query_pathkeys); + else + { + int n = list_length(pathkeys); + + remain = root->query_pathkeys->head; + while(n-- > 0) remain = remain->next; + } + + result = true; + for_each_cell(lc1, remain) + { + PathKey *pathkey = (PathKey *) lfirst(lc1); + ListCell *lc2; + + foreach(lc2, pathkey->pk_eclass->ec_members) + { + EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2); + + if (!bms_equal(member->em_relids, index->rel->relids) || + !IsA(member->em_expr, Var)) + { + result = false; + break; + } + } + + if (!result) break; + } + return result; +} + +/* * build_expression_pathkey * Build a pathkeys list that describes an ordering by a single expression * usingthe given sort operator. diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 73ba2f6..c61cddb 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -333,6 +333,26 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, info->immediate= index->indimmediate; info->hypothetical = false; + info->allnotnull = true; + for (i = 0; i < ncolumns; i++) + { + int attrno = info->indexkeys[i]; + + if (attrno == 0) + { + info->allnotnull = false; + break; + } + else if (attrno > 0) + { + if (!relation->rd_att->attrs[attrno - 1]->attnotnull) + { + info->allnotnull = false; + break; + } + } + } + /* * Estimate the index size. If it's not a partial index, we lock * the number-of-tuplesestimate to equal the parent table; if it diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index c607b36..119bb31 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -525,6 +525,7 @@ typedef struct IndexOptInfo bool predOK; /* true if predicate matches query */ bool unique; /* true if a unique index */ bool immediate; /* is uniqueness enforcedimmediately? */ + bool allnotnull; /* true if index's keys are all not null */ bool hypothetical; /* true ifindex doesn't really exist */ bool canreturn; /* can index return IndexTuples? */ bool amcanorderbyop;/* does AM support order by operator result? */ diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 9b22fda..8abdb99 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -187,5 +187,8 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root, RelOptInfo *rel, List *pathkeys);extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel); +extern bool index_pathkeys_are_extensible(PlannerInfo *root, + IndexOptInfo *index, + List *pathkeys);#endif /* PATHS_H */ diff --git a/src/test/isolation/expected/drop-index-concurrently-1.out b/src/test/isolation/expected/drop-index-concurrently-1.out index 75dff56..ab96fa0 100644 --- a/src/test/isolation/expected/drop-index-concurrently-1.out +++ b/src/test/isolation/expected/drop-index-concurrently-1.out @@ -19,10 +19,8 @@ Sortstep explains: EXPLAIN (COSTS OFF) EXECUTE getrow_seq;QUERY PLAN -Sort - Sort Key: id, data - -> Seq Scan on test_dc - Filter: ((data)::text = '34'::text) +Index Scan using test_dc_pkey on test_dc + Filter: ((data)::text = '34'::text)step select2: SELECT * FROM test_dc WHERE data=34 ORDER BY id,data;id data
pgsql-hackers by date: