Re: Multi-Column List Partitioning - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Multi-Column List Partitioning |
Date | |
Msg-id | CAMm1aWb3_4ZH7cqCp5LGRas3p18qg+U1Jd6V_=KiEAYp57aO5A@mail.gmail.com Whole thread Raw |
In response to | Re: Multi-Column List Partitioning (Zhihong Yu <zyu@yugabyte.com>) |
Responses |
Re: Multi-Column List Partitioning
|
List | pgsql-hackers |
> + * isnulls is an array of boolean-tuples with key->partnatts booleans values > + * each. Currently only used for list partitioning, it stores whether a > > I think 'booleans' should be 'boolean'. > The trailing word 'each' is unnecessary. > bq. Supported new syantx to allow mentioning multiple key information. > > syantx -> syntax > + isDuplicate = checkForDuplicates(result, values); > + if (isDuplicate) > + continue; > > It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates(). I agree that isDuplicate is not required. Thanks for sharing the comments. I will take care of these comments in the next patch. > + //TODO: Handle for multi-column cases > + for (j = 0; j < 1; j++) > > Is this part going to be updated in the next patch? Yes. The code changes related to partition-wise join are in progress. I will handle these in the next patch. Thanks & Regards, Nitin Jadhav On Thu, Aug 26, 2021 at 2:40 AM Zhihong Yu <zyu@yugabyte.com> wrote: > > > > On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: >> >> > The new list bound binary search and related comparison support >> > function look a bit too verbose to me. I was expecting >> > partition_list_bsearch() to look very much like >> > partition_range_datum_bsearch(), but that is not the case. The >> > special case code that you wrote in partition_list_bsearch() seems >> > unnecessary, at least in that function. I'm talking about the code >> > fragment starting with this comment: >> > >> > I will look at other parts of the patch next week hopefully. For >> > now, attached is a delta patch that applies on top of your v1, which >> > does: >> > >> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp() >> > * Make qsort_partition_list_value_cmp simply call >> > partition_lbound_datum_cmp() instead of having its own logic to >> > compare input bounds >> > * Move partition_lbound_datum_cmp() into partbounds.c as a static >> > function (export seems unnecessary) >> > * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index >> >> Yes. You are right. The extra code added in partition_list_bsearch() >> is not required and thanks for sharing the delta patch. It looks good >> to me and I have incorporated the changes in the attached patch. >> >> > I guess you're perhaps trying to address the case where the caller >> > does not specify the values for all of the partition key columns, >> > which can happen when the partition pruning code needs to handle a set >> > of clauses matching only some of the partition key columns. But >> > that's a concern of the partition pruning code and so the special case >> > should be handled there (if at all), not in the binary search function >> > that is shared with other callers. Regarding that, I'm wondering if >> > we should require clauses matching all of the partition key columns to >> > be found for the pruning code to call the binary search, so do >> > something like get_matching_hash_bounds() does: >> > >> > Do you think that trying to match list partitions even with fewer keys >> > is worth the complexity of the implementation? That is, is the use >> > case to search for only a subset of partition key columns common >> > enough with list partitioning? >> > >> > If we do decide to implement the special case, remember that to do >> > that efficiently, we'd need to require that the subset of matched key >> > columns constitutes a prefix, because of the way the datums are >> > sorted. That is, match all partitions when the query only contains a >> > clause for b when the partition key is (a, b, c), but engage the >> > special case of pruning if the query contains clauses for a, or for a >> > and b. >> >> Thanks for the suggestion. Below is the implementation details for the >> partition pruning for multi column list partitioning. >> >> In the existing code (For single column list partitioning) >> 1. In gen_partprune_steps_internal(), we try to match the where >> clauses provided by the user with the partition key data using >> match_clause_to_partition_key(). Based on the match, this function can >> return many values like PARTCLAUSE_MATCH_CLAUSE, >> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc. >> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using >> gen_prune_steps_from_opexps() (strategy-2) which generate and return a >> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest >> clauses that have been matched to the partition key and it also takes >> care handling prefix of the partition keys. >> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using >> gen_prune_step_op() (strategy-1) which generates single >> PartitionPruneStepOp since the earlier list partitioning supports >> single column and there can be only one NULL value. In >> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the >> partition index which accepts null and we used to return from here. >> >> In case of multi column list partitioning, we have columns more than >> one and hence there is a possibility of more than one NULL values in >> the where clauses. The above mentioned steps are modified like below. >> >> 1. Modified the match_clause_to_partition_key() to generate an object >> of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in >> case of clauses related to NULL. The information required to generate >> PartClauseInfo is populated here like the constant expression >> consisting of (Datum) 0, op_strategy, op_is_ne, etc. >> 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2 >> (gen_prune_steps_from_opexps) to generate partition pruning steps. >> This function takes care of generating a list of pruning steps if >> there are multiple clauses and also takes care of handling prefixes. >> 3. Modified perform_pruning_base_step() to generate the datum values >> and isnulls data of the where clauses. In case if any of the key >> contains NULL value then the corresponding datum value is 0. >> 4. Modified get_matching_list_bounds() to generate the minimum offset >> and/or maximum offset of the matched values based on the difference >> operation strategies. Now since the NULL containing bound values are >> part of 'boundinfo', changed the code accordingly to include the NULL >> containing partitions or not in different scenarios like >> InvalidStrategy, etc. >> >> I have done some cosmetic changes to >> v1_multi_column_list_partitioning.patch. So all the above code changes >> related to partition pruning are merged with the previous patch and >> also included the delta patch shared by you. Hence sharing a single >> patch. >> >> Kindly have a look and share your thoughts. >> >> > Hi, > > bq. Supported new syantx to allow mentioning multiple key information. > > syantx -> syntax > > + isDuplicate = checkForDuplicates(result, values); > + if (isDuplicate) > + continue; > > It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates(). > > + //TODO: Handle for multi-column cases > + for (j = 0; j < 1; j++) > > Is this part going to be updated in the next patch? > > Cheers
pgsql-hackers by date: