Re: Multi-Column List Partitioning - Mailing list pgsql-hackers

From Zhihong Yu
Subject Re: Multi-Column List Partitioning
Date
Msg-id CALNJ-vRP6n8tB5a=cxZ_oa7VrywgPcSeM-_zQct7LG+4nNUVAw@mail.gmail.com
Whole thread Raw
In response to Re: Multi-Column List Partitioning  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
Responses Re: Multi-Column List Partitioning  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
List pgsql-hackers


On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> The new list bound binary search and related comparison support
> function look a bit too verbose to me.  I was expecting
> partition_list_bsearch() to look very much like
> partition_range_datum_bsearch(), but that is not the case.  The
> special case code that you wrote in partition_list_bsearch() seems
> unnecessary, at least in that function.  I'm talking about the code
> fragment starting with this comment:
>
> I will look at other parts of the patch next week hopefully.   For
> now, attached is a delta patch that applies on top of your v1, which
> does:
>
> * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> * Make qsort_partition_list_value_cmp simply call
> partition_lbound_datum_cmp() instead of having its own logic to
> compare input bounds
> * Move partition_lbound_datum_cmp() into partbounds.c as a static
> function (export seems unnecessary)
> * Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

Yes. You are right. The extra code added in partition_list_bsearch()
is not required and thanks for sharing the delta patch. It looks good
to me and I have incorporated the changes in the attached patch.

> I guess you're perhaps trying to address the case where the caller
> does not specify the values for all of the partition key columns,
> which can happen when the partition pruning code needs to handle a set
> of clauses matching only some of the partition key columns.  But
> that's a concern of the partition pruning code and so the special case
> should be handled there (if at all), not in the binary search function
> that is shared with other callers.  Regarding that, I'm wondering if
> we should require clauses matching all of the partition key columns to
> be found for the pruning code to call the binary search, so do
> something like get_matching_hash_bounds() does:
>
> Do you think that trying to match list partitions even with fewer keys
> is worth the complexity of the implementation?  That is, is the use
> case to search for only a subset of partition key columns common
> enough with list partitioning?
>
> If we do decide to implement the special case, remember that to do
> that efficiently, we'd need to require that the subset of matched key
> columns constitutes a prefix, because of the way the datums are
> sorted.  That is, match all partitions when the query only contains a
> clause for b when the partition key is (a, b, c), but engage the
> special case of pruning if the query contains clauses for a, or for a
> and b.

Thanks for the suggestion. Below is the implementation details for the
partition pruning for multi column list partitioning.

In the existing code (For single column list partitioning)
1. In gen_partprune_steps_internal(), we try to match the where
clauses provided by the user with the partition key data using
match_clause_to_partition_key(). Based on the match, this function can
return many values like PARTCLAUSE_MATCH_CLAUSE,
PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
gen_prune_steps_from_opexps() (strategy-2) which generate and return a
list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
clauses that have been matched to the partition key and it also takes
care handling prefix of the partition keys.
3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
gen_prune_step_op() (strategy-1) which generates single
PartitionPruneStepOp since the earlier list partitioning supports
single column and there can be only one NULL value. In
get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
partition index which accepts null and we used to return from here.

In case of multi column list partitioning, we have columns more than
one and hence there is a possibility of more than one NULL values in
the where clauses. The above mentioned steps are modified like below.

1.  Modified the match_clause_to_partition_key() to generate an object
of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
case of clauses related to NULL. The information required to generate
PartClauseInfo is populated here like the constant expression
consisting of (Datum) 0, op_strategy, op_is_ne, etc.
2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
(gen_prune_steps_from_opexps) to generate partition pruning steps.
This function takes care of generating a list of pruning steps if
there are multiple clauses and also takes care of handling prefixes.
3. Modified perform_pruning_base_step() to generate the datum values
and isnulls data of the where clauses. In case if any of the key
contains NULL value then the corresponding datum value is 0.
4. Modified get_matching_list_bounds() to generate the minimum offset
and/or maximum offset of the matched values based on the difference
operation strategies. Now since the NULL containing bound values are
part of 'boundinfo', changed the code accordingly to include the NULL
containing partitions or not in different scenarios like
InvalidStrategy, etc.

I have done some cosmetic changes to
v1_multi_column_list_partitioning.patch. So all the above code changes
related to partition pruning are merged with the previous patch and
also included the delta patch shared by you. Hence sharing a single
patch.

Kindly have a look and share your thoughts.


Hi,

bq. Supported new syantx to allow mentioning multiple key information. 

syantx -> syntax

+       isDuplicate = checkForDuplicates(result, values);
+       if (isDuplicate)
+           continue;

It seems the variable isDuplicate is not needed. The if statement can directly check the return value from checkForDuplicates().

+       //TODO: Handle for multi-column cases
+       for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Cheers

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: log_autovacuum in Postgres 14 -- ordering issue
Next
From: Zhihong Yu
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER