Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning |
Date | |
Msg-id | 294e8400-6c40-87d5-078f-8713142032a0@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning
Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning |
List | pgsql-hackers |
Hi David. Thanks for the review. On 2018/01/12 12:30, David Rowley wrote: > I've got a few more things for you. I'm only partway through another > pass, but it makes sense to post what I have now if you're working on > a new version. > > 1. partitioing -> partitioning > > * Strategy of a partition clause operator per the partitioing operator class Fixed. > 2. get_partitions_from_clauses() modifies partclauses without > mentioning it in the header. I think you need to either: > > a) warn about this in the header comment; or > b) do a list_copy() before list_concat() > c) do list_truncate back to the original length after you're done with the list. Went with (b). > 3. get_partitions_from_clauses_recurse(), with: > > result = bms_add_range(result, 0, partdesc->nparts - 1); > > You could change that to bms_add_range(NULL, ...) and ditch the > assignment of result to NULL at the start of the function. Done. > 4. classify_partition_bounding_keys() now returns bool, but the return > statement is still: > > return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness; > > my compiler didn't warn about that, but I'd imagine some might. Oops, my bad. > > Instead, can you make it: > > if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 || > keys->n_maxkeys > 0 || n_keynullness > 0) > return true; > > return false; > > probably equal keys are the most likely case, so it'll be good to > short circuit instead of performing addition on a bunch of stuff we > don't care about anymore. Changed it to what Robert suggested downthread. > 5. In classify_partition_bounding_keys, why do we "continue" here? > > clause = rinfo->clause; > if (rinfo->pseudoconstant && > !DatumGetBool(((Const *) clause)->constvalue)) > { > *constfalse = true; > continue; > } > > Is there any point in searching further? > > Also, if you were consistent with the return value for > classify_partition_bounding_keys when you've set *constfalse = true; > you wouldn't need to handle the case twice like you are in > get_partitions_from_clauses_recurse(). OK, I made classify_partition_bounding_keys() return true whenever set *constfalse to true. > 6. I think it would be nicer if get_partitions_from_ne_clauses returns > a set of partitions that could be excluded. > > So instead of: > > * get_partitions_from_ne_clauses > * > * Return partitions of relation that satisfy all <> operator clauses in > * ne_clauses. Only ever called if relation is a list partitioned table. > > Have: > > * get_partitions_from_ne_clauses > * > * Returns a Bitmapset of partitions that can be safely excluded due to > * not-equal clauses existing for all possible partition values. It is only > * valid to call this for LIST partitioned tables. > > and instead of: > > result = bms_add_range(NULL, 0, partdesc->nparts - 1); > result = bms_del_members(result, excluded_parts); > bms_free(excluded_parts); > > return result; > > Just do: > > return excluded_parts; > > and in get_partitions_from_clauses_recurse(), do bms_del_members > instead of bms_int_members. > > there's less bit shuffling and it seems cleaner. Perhaps the function > name would need to be changed if we're inverting the meaning too. > > (I've attached a patch which makes this change along with an idea in #8 below) Thanks for the suggestions... (comment continues below) > 7. The following comment claims the function sets *datum, but there's > no param by that name: > > /* > * partkey_datum_from_expr > * Extract constant value from expr and set *datum to that value > */ > static bool > partkey_datum_from_expr(PartitionKey key, int partkeyidx, > Expr *expr, Datum *value) Fixed. > 8. The code in get_partitions_from_ne_clauses() does perform quite a > few nested loops. I think a more simple way to would be to track the > offsets you've seen in a Bitmapset. This would save you having to > check for duplicates, as an offset can only contain a single datum. > You'd just need to build a couple of arrays after that, one to sum up > the offsets found per partition, and one for the total datums allowed > in the partition. If the numbers match then you can remove the > partition. > > I've written this and attached it to this email. It saves about 50 > lines of code and should perform much better for complex cases, for > example, a large NOT IN list. This also implements #6. I liked your patch, so incorporated it, except, I feel slightly uncomfortable about the new name that you chose for the function because it sounds a bit generic. I mean the function solves a very specific problem and have very strict requirements for calling it. It's not like we could pass it just any partitioned relation and/or just any set of clauses. It has to be a list-partitioned table and the list of clauses must contain only the clauses containing compatible <> operators. Checks for those requirements are carried out in yet another place, that is, classify_partition_bounding_keys(). Perhaps we can live with that though, because it's not a publicly available function, but someone might get confused in the future. > 9. "the same" -> "it" > > /* > * In case of NOT IN (..), we get a '<>', which while not > * listed as part of any operator family, we are able to > * handle the same if its negator is indeed a part of the > * partitioning operator family. > */ Done. > 10. in classify_partition_bounding_keys: "0" -> "false" > > /* Return if no work to do below. */ > if (!will_compute_keys || *constfalse) > return 0; > > Likewise for: > > if (*constfalse) > return 0; Have fixed these per an earlier comment in this email. > 11. I don't see partition_bound_bsearch used anywhere below the > following comment: > > * Generate bounding tuple(s). > * > * We look up partitions in the partition bound descriptor using, say, > * partition_bound_bsearch(), which expects a Datum (or Datums if multi- > * column key). So, extract the same out of the constant argument of > * each clause. > > I also don't know what the comment is trying to say. The comment is no longer very intelligible to me too. I just wanted to say here that, *elsewhere*, we will use a function like partition_bound_bsearch() to look up partitions from the clauses we matched against individual partition key columns. That function expects the lookup key to be in a Datum array form, not a list-of-clauses form. So, we must construct the lookup key(s) by extracting constant values out the clauses. I tried to rewrite it that way. Hope that's a bit clearer. > 12. > > * operator and sets *incl if equality is implied > > should be: > > * operator and set *incl to true if the operator's strategy is inclusive. Done that way. > 13. What does "the same" mean in: > > * and add this one directly to the result. Caller would > * arbitrarily choose one of the many and perform > * partition-pruning with the same. It's possible that mutual It means "the one that caller would arbitrarily choose of the many that this function will return to it". Anyway, I changed "the same" to "it". > I think you quite often use "the same" to mean "it". Can you change that? I guess that's just one of my many odd habits when writing English, all of which I'm trying to get rid of, but apparently with limited success. Must try harder. :) > 14. Not sure what parameter you're talking about here. > > * Evaluate 'leftarg op rightarg' and set *result to its value. > * > * leftarg and rightarg referred to above actually refer to the constant > * operand (Datum) of the clause contained in the parameters leftarg and > * rightarg below, respectively. And op refers to the operator of the > * clause contained in the parameter op below. I rewrote the above comment block as: * Try to compare the constant arguments of 'leftarg' and 'rightarg', in that * order, using the operator of 'op' and set *result to the result of this * comparison. Is that any better? > 15. "the latter" is normally used when you're referring to the last > thing in a list which was just mentioned. In this case, leftarg_const > and rightarg_const is the list, so "the latter" should mean > rightarg_const, but I think you mean to compare them using the > operator. > > * If the leftarg_const and rightarg_const are both of the type expected > * by op's operator, then compare them using the latter. Rewrote it as: * We can compare leftarg_const and rightarg_const using op's operator * only if both are of the type expected by it. > 16. There are a few things to improve with the following comment: > > /* > * Hash partitioning stores partition keys containing nulls in regular > * partitions. That is, the code that determines the hash partition for > * a given row admits nulls in the partition key when computing the key's > * hash. So, here we treat any IS NULL clauses on partition key columns as > * equality keys, along with any other non-null values coming from equality > * operator clauses. > */ > > "admits" is not the correct word here, and "hash" should be "correct", > but there are more mistakes, so might be easier just to rewrite to: > > /* > * Since tuples with NULL values in the partition key columns are > stored in regular partitions, > * we'll treat any IS NULL clauses here as regular equality clauses. > /* Agreed that your version is better, so went with it. > 17. The following example will cause get_partitions_for_keys_hash to misbehave: > > create table hashp (a int, b int) partition by hash (a, b); > create table hashp1 partition of hashp for values with (modulus 4, remainder 0); > create table hashp2 partition of hashp for values with (modulus 4, remainder 1); > create table hashp3 partition of hashp for values with (modulus 4, remainder 3); > create table hashp4 partition of hashp for values with (modulus 4, remainder 2); > explain select * from hashp where a = 1 and a is null; > > The following code assumes that you'll never get a NULL test for a key > that has an equality test, and ends up trying to prune partitions > thinking we got compatible clauses for both partition keys. Yeah, I think this example helps spot a problem. I thought we'd never get to get_partitions_for_keys_hash() for the above query, because we should've been able to prove much earlier that the particular clause combination should be always false (a cannot be both non-null 1 and null). Now, because the planner itself doesn't substitute a constant-false for that, I taught classify_partition_bounding_keys() to do so. It would now return constfalse=true if it turns out that clauses in the input list lead to contradictory nullness condition for a given partition column. > memset(keyisnull, false, sizeof(keyisnull)); > for (i = 0; i < partkey->partnatts; i++) > { > if (bms_is_member(i, keys->keyisnull)) > { > keys->n_eqkeys++; > keyisnull[i] = true; > } > } > > /* > * Can only do pruning if we know all the keys and they're all equality > * keys including the nulls that we just counted above. > */ > if (keys->n_eqkeys == partkey->partnatts) > > The above code will need to be made smarter. It'll likely crash if you > change "b" to a pass-by-ref type. Hmm, not sure why. It seems to work: create table hp (a int, b text) partition by hash (a, b); create table hp1 partition of hp for values with (modulus 4, remainder 0); create table hp2 partition of hp for values with (modulus 4, remainder 1); create table hp3 partition of hp for values with (modulus 4, remainder 3); create table hp4 partition of hp for values with (modulus 4, remainder 2); insert into hp values (1, 'xxx'); INSERT 0 1 select tableoid::regclass, * from hp; tableoid | a | b ----------+---+----- hp1 | 1 | xxx (1 row) insert into hp (a) values (1); INSERT 0 1 insert into hp (b) values ('xxx'); INSERT 0 1 select tableoid::regclass, * from hp where a is null; tableoid | a | b ----------+---+----- hp2 | | xxx (1 row) select tableoid::regclass, * from hp where b is null; tableoid | a | b ----------+---+--- hp1 | 1 | (1 row) select tableoid::regclass, * from hp where a = 1 and b is null; tableoid | a | b ----------+---+--- hp1 | 1 | (1 row) select tableoid::regclass, * from hp where a is null and b = 'xxx'; tableoid | a | b ----------+---+----- hp2 | | xxx (1 row) > 18. The following code: > > int other_idx = -1; > > /* > * Only a designated partition accepts nulls, which if there > * exists one, return the same. > */ > if (partition_bound_accepts_nulls(boundinfo) || > partition_bound_has_default(boundinfo)) > other_idx = partition_bound_accepts_nulls(boundinfo) > ? boundinfo->null_index > : boundinfo->default_index; > if (other_idx >= 0) > return bms_make_singleton(other_idx); > else > return NULL; > > should be simplified to: > > /* > * NULLs may only exist in the NULL partition, or in the > * default, if there's no NULL partition. > */ > if (partition_bound_accepts_nulls(boundinfo)) > return bms_make_singleton(boundinfo->null_index); > else if (partition_bound_has_default(boundinfo)) > return bms_make_singleton(boundinfo->default_index); > return NULL; Agreed, done that way. > 19. "exists" -> "are" > > * If there are no datums to compare keys with, but there exist > * partitions, it must be the default partition. > > also, instead of writing "it must be the default partition." it should > be better to say "just return the default partition." OK, done. > 20. I don't think the return NULL should ever hit, is it worth putting > a comment to say /* shouldn't happen */ > > if (boundinfo->ndatums == 0) > { > if (partition_bound_has_default(boundinfo)) > return bms_make_singleton(boundinfo->default_index); > else > return NULL; > } I added a /* shouldn't happen */ comment next to return NULL. > 21. Can the following comment does not explain the situation well: > > /* > * boundinfo->ndatums - 1 is the last valid list partition datums > * index. > */ > > There's really no possible non-default partition for this case, so > perhaps we should just return the default, if one exists. We do go on > to check the n_maxkeys needlessly for this case. At the very least the > comment should be changed to: > > /* > * minkeys values are greater than any non-default partition. > * We'll check that for case below. > */ > > but I think it's worth just doing the default partition check there > and returning it, or NULL. It should help reduce confusion. Yep, done. Attached v20. Thanks again. Regards, Amit
Attachment
- v20-0001-Some-interface-changes-for-partition_bound_-cmp-.patch
- v20-0002-Introduce-a-get_partitions_from_clauses.patch
- v20-0003-Move-some-code-of-set_append_rel_size-to-separat.patch
- v20-0004-More-refactoring-around-partitioned-table-Append.patch
- v20-0005-Teach-planner-to-use-get_partitions_from_clauses.patch
pgsql-hackers by date: