Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: [HACKERS] path toward faster partition pruning |
Date | |
Msg-id | CAKJS1f-4a7+BMGrmv+jzNVFOZ2HhOJmSNx-jOeKX-_bK82hi9g@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] path toward faster partition pruning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] path toward faster partition pruning
|
List | pgsql-hackers |
On 19 February 2018 at 22:19, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Attached updated patches. Thanks again! Thanks for making those changes. I've made another pass of v28 and have a few more comments. The patch is starting to look good, but there are some new changes in recent versions which still don't look quite right. 1. This does not fully make sense: /* * Remove the indexes of partitions excluded due to each of * those partitions' *all* of allowed datums appearing in * keys->ne_datums, that is compared to the partition key * using <> operator. */ "each of those partitions' *all* of allowed" is not correct. Maybe better to write: /* * Remove the indexes of any partitions which cannot possibly * contain rows matching the clauses due to key->ne_datums containing * all datum values which are allowed in the given partition. This * is only possible to do in LIST partitioning as it's the only * partitioning strategy which allows the specification of exact values. */ 2. Mine does not, but some compilers may complain about get_partitions_for_keys() result variable being uninitalised in get_partitions_for_keys. Probably the easiest fix would be to just assign to NULL in the default case. 3. Did you mean to put this Assert() inside the loop? memset(keyisnull, false, sizeof(keyisnull)); i = -1; while ((i = bms_next_member(keys->keyisnull, i)) >= 0) { keys->n_eqkeys++; keyisnull[i] = true; } Assert(i < partnatts); i will always be -2 at the end of the loop. Seems like a useless Assert in its current location. 4. Can you add a comment here to say: "Note: LIST partitioning only supports a single partition key, therefore this function requires no looping over the partition keys" /* * get_partitions_for_keys_list * Return partitions of a list partitioned table for requested keys * * This interprets the keys and looks up partitions in the partition bound * descriptor using the list partitioning semantics. */ 5. The following comment contains a bit of duplication to the comment which comes after it. Maybe the following: /* * If the query is looking for null keys, there can only be one such * partition. Return the same if one exists. */ can be changed to: /* Handle clauses requesting a NULL valued partition key */ 6. The following comment does not quite make sense: /* Exactly matching datum exists. */ Probably better to write: /* An exact matching datum exists. */ 7. "If it's not equal (<)" I think you mean (>), not (<), in: * The bound at minoff is <= minkeys, given the way * partition_list_bsearch() works. If it's not equal (<), then * increment minoff to make it point to the datum on the right * that necessarily satisfies minkeys. Also do the same if it is * equal but minkeys is exclusive. However, the comment is a bit clumsy. Maybe the following is better? /* * partition_list_bsearch returning a positive number means that * minkeys[0] must be greater than or equal to the smallest datum. * If we didn't find an exact matching datum (!is_equal) or if the * operator used was non-inclusive (>), then in both of these * cases we're not interested in the datum pointed to by minoff, * but we may start getting matches in the partition which the * next datum belongs to, so point to that one instead. (This may * be beyond the last datum in the array, but we'll detect that * later.) */ 8. The following comment could be improved: * minkeys is greater than the datums of all non-default partitions, * meaning there isn't one to return. Return the default partition if * one exists. how about: * The value of minkeys[0] is greater than all of the datums we have * partitions for. The only possible partition that could contain a * match is the default partition. Return that, if it exists. 9. The following could also be improved: * The bound at maxoff is <= maxkeys, given the way * partition_list_bsearch works. If the bound at maxoff exactly * matches maxkey (is_equal), but the maxkey is exclusive, then * decrement maxoff to point to the bound on the left. how about: * partition_list_bsearch returning a positive number means that * maxkeys[0] must be greater than or equal to the smallest datum. * If the match found is an equal match, but the operator used is * non-inclusive of that value (<), then the partition belonging * to maxoff cannot match, so we'll decrement maxoff to point to * the partition belonging to the previous datum. We might end up * decrementing maxoff down to -1, but we'll handle that later. 10. Can you append " This may not technically be true for some data types (e.g. integer types), however, we currently lack any sort of infrastructure to provide us with proofs that would allow us to do anything smarter here." to: * For range queries, always include the default list partition, * because list partitions divide the key space in a discontinuous * manner, not all values in the given range will have a partition * assigned. 11. get_partitions_for_keys_range seems to prefer to do "minoff -= 1", but get_partitions_for_keys_list likes to "minoff--", can this be made the same? Personally, I like -- over -= 1 as it's shorter. Although I do remember having an argument with my university professor about this. He claimed -= 1 was clearer... I'm still unsure what he found so confusing about -- ... 12. The following code could be optimised a little for the case when there's no default: /* * There may exist a range of values unassigned to any non-default * partition between the datums at minoff and maxoff. */ for (i = minoff; i <= maxoff; i++) { if (boundinfo->indexes[i] < 0) { include_def = true; break; } } /* * Since partition keys with nulls are mapped to the default range * partition, we must include the default partition if some keys * *could* be null. */ if (bms_num_members(keys->keyisnotnull) < partnatts) include_def = true; if (include_def && partition_bound_has_default(boundinfo)) result = bms_add_member(result, boundinfo->default_index); return result; Maybe something more like: if (!partition_bound_has_default(boundinfo)) return result; /* * There may exist a range of values unassigned to any non-default * partition between the datums at minoff and maxoff. */ for (i = minoff; i <= maxoff; i++) { if (boundinfo->indexes[i] < 0) return bms_add_member(result, boundinfo->default_index); } /* * Since partition keys with nulls are mapped to the default range * partition, we must include the default partition if some keys * *could* be null. */ if (bms_num_members(keys->keyisnotnull) < partnatts) return bms_add_member(result, boundinfo->default_index); Which is saves a bit of needless work when there's no default to add, and also saves a few lines, including the line where you declare the include_def variable. 13. Variable name: Bitmapset *partitioned_rels_bms = NULL; This should likely be called partitioned_relids, and be of type Relids instead of Bitmapset. 14. This line removal seems surplus. It should probably be fixed independently of this patch. parent_roots[appinfo->child_relid] = subroot; - continue; 15. I'm unsure how safe the following code is: while ((parent_rti = bms_first_member(partitioned_rels_bms)) >= 0) partitioned_rels = lappend_int(partitioned_rels, parent_rti); You're now putting this list in ascending order of relid, but some code in set_plan_refs assumes the root partition is the first element: root->glob->rootResultRelations = lappend_int(root->glob->rootResultRelations, linitial_int(splan->partitioned_rels)); By luck, the first element might today be the root due to the way we expand the inheritance hierarchy, but I think this code is wrong to rely on that. I'm not really a fan of having the root partition be the first element in the List. I would much rather see a Relids type and a special Index field for the root, but that might be more changes that you'd like to make here. I just don't think what you have now is correct. 16. This should probably return Relids rather than Bitmapset *. Bitmapset * prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel) Please update the mention of Bitmapset in the comment at the top of the function too. 17. hmm, my patch did palloc(), not palloc0(). My original patch was broken and missed this, but v2 got it. context->clauseinfo = partclauseinfo = palloc0(sizeof(PartitionClauseInfo)); There's no need to palloc0() here. You're setting all the fields to zero just below. As far as I understand it, it's only Node types that we have to go through the rigmarole of doing both. 18. I tentatively agree with you having changed the continue to break in the following: /* We can't use any volatile value to prune partitions. */ if (contain_volatile_functions((Node *) valueexpr)) break; I believe it's not wrong to break here, but keep in mind you're testing valueexpr rather than something with the OpExpr itself. The reason I'm not saying this is wrong is that if the valueexpr is volatile then it cannot possibly match another partition key anyway, so there's likely no point in continuing to look for another match... You should likely write a comment to explain this a bit. I think all of the other places you've changed to break look fine. The ScalarArrayOpExpr volatile function test is fine to break from since the operands cannot be reversed in that case, so rightop certainly can't match a partition key. 19. I might have caused this, but there's no such variable as 'cur' /* cur is more restrictive, so replace the existing. */ 20. Is there a difference between partition_bound_has_default(context->boundinfo) and context->has_default_part? Any reason for both? Your code uses both. I don't yet understand why has_default_part was added. That's all I have for now. It's getting close. Good work! I keep having to turn up my strictness level each time I review. I'd like to be able to turn it up all the way earlier, but I fear I may still be reviewing v1 if I'd done that :-) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: