Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning
Date
Msg-id 294e8400-6c40-87d5-078f-8713142032a0@lab.ntt.co.jp
Whole thread Raw
In response to Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning
Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS]path toward faster partition pruning
List pgsql-hackers
Hi David.

Thanks for the review.

On 2018/01/12 12:30, David Rowley wrote:
> I've got a few more things for you. I'm only partway through another
> pass, but it makes sense to post what I have now if you're working on
> a new version.
> 
> 1. partitioing -> partitioning
> 
>  * Strategy of a partition clause operator per the partitioing operator class

Fixed.

> 2. get_partitions_from_clauses() modifies partclauses without
> mentioning it in the header. I think you need to either:
> 
> a) warn about this in the header comment; or
> b) do a list_copy() before list_concat()
> c) do list_truncate back to the original length after you're done with the list.

Went with (b).

> 3. get_partitions_from_clauses_recurse(), with:
> 
>   result = bms_add_range(result, 0, partdesc->nparts - 1);
> 
> You could change that to bms_add_range(NULL, ...) and ditch the
> assignment of result to NULL at the start of the function.

Done.

> 4. classify_partition_bounding_keys() now returns bool, but the return
> statement is still:
> 
> return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
> 
> my compiler didn't warn about that, but I'd imagine some might.

Oops, my bad.

> 
> Instead, can you make it:
> 
> if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
>     keys->n_maxkeys > 0 || n_keynullness > 0)
>     return true;
> 
> return false;
>
> probably equal keys are the most likely case, so it'll be good to
> short circuit instead of performing addition on a bunch of stuff we
> don't care about anymore.

Changed it to what Robert suggested downthread.

> 5. In classify_partition_bounding_keys, why do we "continue" here?
> 
> clause = rinfo->clause;
> if (rinfo->pseudoconstant &&
> !DatumGetBool(((Const *) clause)->constvalue))
> {
> *constfalse = true;
> continue;
> }
> 
> Is there any point in searching further?
> 
> Also, if you were consistent with the return value for
> classify_partition_bounding_keys when you've set *constfalse = true;
> you wouldn't need to handle the case twice like you are in
> get_partitions_from_clauses_recurse().

OK, I made classify_partition_bounding_keys() return true whenever set
*constfalse to true.

> 6. I think it would be nicer if get_partitions_from_ne_clauses returns
> a set of partitions that could be excluded.
> 
> So instead of:
> 
>  * get_partitions_from_ne_clauses
>  *
>  * Return partitions of relation that satisfy all <> operator clauses in
>  * ne_clauses.  Only ever called if relation is a list partitioned table.
> 
> Have:
> 
>  * get_partitions_from_ne_clauses
>  *
>  * Returns a Bitmapset of partitions that can be safely excluded due to
>  * not-equal clauses existing for all possible partition values. It is only
>  * valid to call this for LIST partitioned tables.
> 
> and instead of:
> 
> result = bms_add_range(NULL, 0, partdesc->nparts - 1);
> result = bms_del_members(result, excluded_parts);
> bms_free(excluded_parts);
> 
> return result;
> 
> Just do:
> 
> return excluded_parts;
> 
> and in get_partitions_from_clauses_recurse(), do bms_del_members
> instead of bms_int_members.
> 
> there's less bit shuffling and it seems cleaner. Perhaps the function
> name would need to be changed if we're inverting the meaning too.
> 
> (I've attached a patch which makes this change along with an idea in #8 below)

Thanks for the suggestions... (comment continues below)

> 7. The following comment claims the function sets *datum, but there's
> no param by that name:
> 
> /*
>  * partkey_datum_from_expr
>  * Extract constant value from expr and set *datum to that value
>  */
> static bool
> partkey_datum_from_expr(PartitionKey key, int partkeyidx,
> Expr *expr, Datum *value)

Fixed.

> 8. The code in get_partitions_from_ne_clauses() does perform quite a
> few nested loops. I think a more simple way to would be to track the
> offsets you've seen in a Bitmapset. This would save you having to
> check for duplicates, as an offset can only contain a single datum.
> You'd just need to build a couple of arrays after that, one to sum up
> the offsets found per partition, and one for the total datums allowed
> in the partition. If the numbers match then you can remove the
> partition.
> 
> I've written this and attached it to this email. It saves about 50
> lines of code and should perform much better for complex cases, for
> example, a large NOT IN list. This also implements #6.

I liked your patch, so incorporated it, except, I feel slightly
uncomfortable about the new name that you chose for the function because
it sounds a bit generic.  I mean the function solves a very specific
problem and have very strict requirements for calling it.  It's not like
we could pass it just any partitioned relation and/or just any set of
clauses.  It has to be a list-partitioned table and the list of clauses
must contain only the clauses containing compatible <> operators.  Checks
for those requirements are carried out in yet another place, that is,
classify_partition_bounding_keys().

Perhaps we can live with that though, because it's not a publicly
available function, but someone might get confused in the future.

> 9. "the same" -> "it"
> 
> /*
> * In case of NOT IN (..), we get a '<>', which while not
> * listed as part of any operator family, we are able to
> * handle the same if its negator is indeed a part of the
> * partitioning operator family.
> */

Done.

> 10. in classify_partition_bounding_keys: "0" -> "false"
> 
> /* Return if no work to do below. */
> if (!will_compute_keys || *constfalse)
> return 0;
> 
> Likewise for:
> 
> if (*constfalse)
> return 0;

Have fixed these per an earlier comment in this email.

> 11. I don't see partition_bound_bsearch used anywhere below the
> following comment:
> 
> * Generate bounding tuple(s).
> *
> * We look up partitions in the partition bound descriptor using, say,
> * partition_bound_bsearch(), which expects a Datum (or Datums if multi-
> * column key).  So, extract the same out of the constant argument of
> * each clause.
> 
> I also don't know what the comment is trying to say.

The comment is no longer very intelligible to me too.  I just wanted to
say here that, *elsewhere*, we will use a function like
partition_bound_bsearch() to look up partitions from the clauses we
matched against individual partition key columns.  That function expects
the lookup key to be in a Datum array form, not a list-of-clauses form.
So, we must construct the lookup key(s) by extracting constant values out
the clauses.

I tried to rewrite it that way.  Hope that's a bit clearer.

> 12.
> 
>  * operator and sets *incl if equality is implied
> 
> should be:
> 
>  * operator and set *incl to true if the operator's strategy is inclusive.

Done that way.

> 13. What does "the same" mean in:
> 
> * and add this one directly to the result.  Caller would
> * arbitrarily choose one of the many and perform
> * partition-pruning with the same.  It's possible that mutual

It means "the one that caller would arbitrarily choose of the many that
this function will return to it".  Anyway, I changed "the same" to "it".

> I think you quite often use "the same" to mean "it". Can you change that?

I guess that's just one of my many odd habits when writing English, all of
which I'm trying to get rid of, but apparently with limited success.  Must
try harder. :)

> 14. Not sure what parameter you're talking about here.
> 
>  * Evaluate 'leftarg op rightarg' and set *result to its value.
>  *
>  * leftarg and rightarg referred to above actually refer to the constant
>  * operand (Datum) of the clause contained in the parameters leftarg and
>  * rightarg below, respectively.  And op refers to the operator of the
>  * clause contained in the parameter op below.

I rewrote the above comment block as:

 * Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
 * order, using the operator of 'op' and set *result to the result of this
 * comparison.

Is that any better?

> 15. "the latter" is normally used when you're referring to the last
> thing in a list which was just mentioned. In this case, leftarg_const
> and rightarg_const is the list, so "the latter" should mean
> rightarg_const, but I think you mean to compare them using the
> operator.
> 
> * If the leftarg_const and rightarg_const are both of the type expected
> * by op's operator, then compare them using the latter.

Rewrote it as:

     * We can compare leftarg_const and rightarg_const using op's operator
     * only if both are of the type expected by it.

> 16. There are a few things to improve with the following comment:
> 
> /*
> * Hash partitioning stores partition keys containing nulls in regular
> * partitions.  That is, the code that determines the hash partition for
> * a given row admits nulls in the partition key when computing the key's
> * hash.  So, here we treat any IS NULL clauses on partition key columns as
> * equality keys, along with any other non-null values coming from equality
> * operator clauses.
> */
> 
> "admits" is not the correct word here, and "hash" should be "correct",
> but there are more mistakes, so might be easier just to rewrite to:
> 
> /*
> * Since tuples with NULL values in the partition key columns are
> stored in regular partitions,
> * we'll treat any IS NULL clauses here as regular equality clauses.
> /*

Agreed that your version is better, so went with it.

> 17. The following example will cause get_partitions_for_keys_hash to misbehave:
> 
> create table hashp (a int, b int) partition by hash (a, b);
> create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
> create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
> create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
> create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
> explain select * from hashp where a = 1 and a is null;
> 
> The following code assumes that you'll never get a NULL test for a key
> that has an equality test, and ends up trying to prune partitions
> thinking we got compatible clauses for both partition keys.

Yeah, I think this example helps spot a problem.  I thought we'd never get
to get_partitions_for_keys_hash() for the above query, because we
should've been able to prove much earlier that the particular clause
combination should be always false (a cannot be both non-null 1 and null).
 Now, because the planner itself doesn't substitute a constant-false for
that, I taught classify_partition_bounding_keys() to do so.  It would now
return constfalse=true if it turns out that clauses in the input list lead
to contradictory nullness condition for a given partition column.

>   memset(keyisnull, false, sizeof(keyisnull));
>   for (i = 0; i < partkey->partnatts; i++)
>   {
>     if (bms_is_member(i, keys->keyisnull))
>     {
>       keys->n_eqkeys++;
>       keyisnull[i] = true;
>     }
>   }
> 
>   /*
>    * Can only do pruning if we know all the keys and they're all equality
>    * keys including the nulls that we just counted above.
>    */
>   if (keys->n_eqkeys == partkey->partnatts)
> 
> The above code will need to be made smarter. It'll likely crash if you
> change "b" to a pass-by-ref type.

Hmm, not sure why.  It seems to work:

create table hp (a int, b text) partition by hash (a, b);
create table hp1 partition of hp for values with (modulus 4, remainder 0);
create table hp2 partition of hp for values with (modulus 4, remainder 1);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp4 partition of hp for values with (modulus 4, remainder 2);

insert into hp values (1, 'xxx');
INSERT 0 1

select tableoid::regclass, * from hp;
 tableoid | a |  b
----------+---+-----
 hp1   | 1 | xxx
(1 row)

insert into hp (a) values (1);
INSERT 0 1

insert into hp (b) values ('xxx');
INSERT 0 1

select tableoid::regclass, * from hp where a is null;
 tableoid | a |  b
----------+---+-----
 hp2   |   | xxx
(1 row)

select tableoid::regclass, * from hp where b is null;
 tableoid | a | b
----------+---+---
 hp1   | 1 |
(1 row)

select tableoid::regclass, * from hp where a = 1 and b is null;
 tableoid | a | b
----------+---+---
 hp1   | 1 |
(1 row)

select tableoid::regclass, * from hp where a is null and b = 'xxx';
 tableoid | a |  b
----------+---+-----
 hp2   |   | xxx
(1 row)

> 18. The following code:
> 
> int other_idx = -1;
> 
> /*
> * Only a designated partition accepts nulls, which if there
> * exists one, return the same.
> */
> if (partition_bound_accepts_nulls(boundinfo) ||
> partition_bound_has_default(boundinfo))
> other_idx = partition_bound_accepts_nulls(boundinfo)
> ? boundinfo->null_index
> : boundinfo->default_index;
> if (other_idx >= 0)
> return bms_make_singleton(other_idx);
> else
> return NULL;
> 
>  should be simplified to:
> 
>  /*
>   * NULLs may only exist in the NULL partition, or in the
>   * default, if there's no NULL partition.
>   */
>  if (partition_bound_accepts_nulls(boundinfo))
> return bms_make_singleton(boundinfo->null_index);
>  else if (partition_bound_has_default(boundinfo))
> return bms_make_singleton(boundinfo->default_index);
>  return NULL;

Agreed, done that way.

> 19. "exists" -> "are"
> 
> * If there are no datums to compare keys with, but there exist
> * partitions, it must be the default partition.
> 
> also, instead of writing "it must be the default partition." it should
> be better to say "just return the default partition."

OK, done.

> 20. I don't think the return NULL should ever hit, is it worth putting
> a comment to say /* shouldn't happen */
> 
> if (boundinfo->ndatums == 0)
> {
> if (partition_bound_has_default(boundinfo))
> return bms_make_singleton(boundinfo->default_index);
> else
> return NULL;
> }

I added a /* shouldn't happen */ comment next to return NULL.

> 21. Can the following comment does not explain the situation well:
> 
> /*
> * boundinfo->ndatums - 1 is the last valid list partition datums
> * index.
> */
> 
> There's really no possible non-default partition for this case, so
> perhaps we should just return the default, if one exists. We do go on
> to check the n_maxkeys needlessly for this case. At the very least the
> comment should be changed to:
> 
>  /*
>   * minkeys values are greater than any non-default partition.
>   * We'll check that for case below.
>   */
> 
> but I think it's worth just doing the default partition check there
> and returning it, or NULL. It should help reduce confusion.

Yep, done.


Attached v20.  Thanks again.

Regards,
Amit

Attachment

pgsql-hackers by date:

Previous
From: Aleksandr Parfenov
Date:
Subject: Re: PostgreSQL crashes with SIGSEGV
Next
From: "REIX, Tony"
Date:
Subject: RE:[HACKERS] Deadlock in XLogInsert at AIX