Re: Multi-Column List Partitioning - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Multi-Column List Partitioning |
Date | |
Msg-id | CAMm1aWYryFE2WTN3xT_SMuGkJjU6HdnHSMM5YgrxV8U365+4_w@mail.gmail.com Whole thread Raw |
In response to | Re: Multi-Column List Partitioning (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Multi-Column List Partitioning
|
List | pgsql-hackers |
> Approach 1 sounds better. It sounds like approach 1 might help us > implement support for allowing NULLs in range partition bounds in the > future, if at all. For now, it might be better to not allocate the > isnull array except for list partitioning. Thanks for confirming. > I'll wait for you to post a new patch addressing at least the comments > in my earlier email. Also, please make sure to run `make check` > successfully before posting the patch. :) I have fixed all of the review comments given by you and Jeevan in the attached patch and also the attached patch contains more changes compared to the previous patch. Following are the implementation details. 1. Regarding syntax, the existing syntax will work fine for the single-column list partitioning. However I have used the new syntax for the multi-column list partitioning as we discussed earlier. I have used a combination of 'AND' and 'OR' logic for the partition constraints as given in the below example. postgres@17503=#create table t(a int, b text) partition by list(a,b); CREATE TABLE postgres@17503=#create table t1 partition of t for values in ((1,'a'), (NULL,'b')); CREATE TABLE postgres@17503=#\d+ t Partitioned table "public.t" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- a | integer | | | | plain | | | b | text | | | | extended | | | Partition key: LIST (a, b) Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b')) postgres@17503=#\d+ t1 Table "public.t1" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- a | integer | | | | plain | | | b | text | | | | extended | | | Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b')) Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL) AND (b = 'b'::text))) Access method: heap 2. In the existing code, NULL values were handled differently. It was not added to the 'datums' variable, rather used to store the partition index directly in the 'null_index' variable. Now there is a possibility of multiple NULL values, hence introducing a new member 'isnulls' in the 'PartitionBoundInfoData' struct which indicates whether the corresponding element in the 'datums' is NULL. Now 'null_index' cannot be used directly to store the partition index, so removed it and made the necessary changes in multiple places. 3. I have added test cases for 'create table' and 'insert' statements related to multi-column list partitioning and these are working fine with 'make check'. 4. Handled the partition pruning code to accommodate these changes for single-column list partitioning. However it is pending for multi-column list partitioning. 5. I have done necessary changes in partition wise join related code to accommodate for single-column list partitioning. However it is pending for multi-column list partitioning. Kindly review the patch and let me know if any changes are required. Pending items: 1. Support of partition pruning for multi-column list partitioning. 2. Support of partition wise join for multi-column list partitioning. I will continue to work on the above 2 items. Kindly let me know if I am missing something. Thanks & Regards, Nitin Jadhav On Wed, May 26, 2021 at 10:27 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > IMO, it is not such a bad syntax from a user's PoV. It's not hard to > > > understand from this syntax that the partition constraint is something > > > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs > > > row-wise comparison. > > > > Thanks for suggesting to use row-wise comparison. > > Actually, I was just describing how the *users* may want to visualize > the partition constraint... > > > I have few queries > > with respect to handling of NULL values. > > > > 1. What should be the partition constraint for the above case. AFAIK, > > row-wise comparison wont work with NULL values as shown in [1]. I mean > > two rows are considered equal if all their corresponding members are > > non-null and equal. The rows are unequal if any corresponding members > > are non-null and unequal. Otherwise the result of the row comparison > > is unknown (null). So we should generate different types of > > constraints for NULL values. > > > > Ex: > > CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b); > > CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL), > > (NULL, 1), (NULL, NULL); > > > > As per my knowledge, we should consider creating partition constraints > > for the above example as given below. > > > > (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b = > > 1)) OR ((a is NULL) AND (b is NULL)). > > Yeah, something like that should do the trick. > > Again, I was not actually suggesting that you write code to implement > the constraint using something like RowCompareExpr, only that the > users might want to view the constraint as doing row-wise comparison > of the partitioning columns and the specified value lists. > > > 2. In the current code we don't put the NULL value in the 'datums' > > field of 'PartitionBoundInfoData' structure [2]. Since there can be > > only one NULL value, we directly store the corresponding index value > > in the 'null_index' field. Now we have to handle multiple NULL values > > in case of Multi-Column List Partitioning. So the question is how to > > handle this scenario. Following are the 2 approaches to handle this. > > > > Approach-1: > > Add another field 'bool **isnull' in [2] and mark the corresponding > > element to TRUE if it has NULL value and the corresponding location in > > 'datums' contains empty/No value. For example, If a partition bound is > > (1, NULL), then > > > > datums[0][0] = 1 > > datums[0][1] = Not assigned any value > > isnull[0][0] = FALSE > > is null[0][1] = TRUE > > > > So now we have an entry in the 'datums' field for a bound containing > > NULL value, so we should handle this in all the scenarios where we are > > manipulating 'datums' in order to support NULL values and avoid crash. > > > > Approach-2: > > Don't add the bound information to 'datums' field of [2] if any of the > > value is NULL. Store this information separately in the structures > > mentioned in [3] and process accordingly. > > > > I feel approach-1 is the better solution as this requires less code > > changes and easy to implement than approach-2. Kindly share your > > thoughts about the approaches and please share if you have any better > > solution than the above 2. > > Approach 1 sounds better. It sounds like approach 1 might help us > implement support for allowing NULLs in range partition bounds in the > future, if at all. For now, it might be better to not allocate the > isnull array except for list partitioning. > > I'll wait for you to post a new patch addressing at least the comments > in my earlier email. Also, please make sure to run `make check` > successfully before posting the patch. :) > > Thanks. > > -- > Amit Langote > EDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: