Re: Multi-Column List Partitioning - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Multi-Column List Partitioning |
Date | |
Msg-id | CA+HiwqHKBFpCH+j0LET9mZjxtntZ87rrzw8KZDGw8ZHU=Rb8pg@mail.gmail.com Whole thread Raw |
In response to | Re: Multi-Column List Partitioning (Nitin Jadhav <nitinjadhavpostgres@gmail.com>) |
Responses |
Re: Multi-Column List Partitioning
|
List | pgsql-hackers |
On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > IMO, it is not such a bad syntax from a user's PoV. It's not hard to > > understand from this syntax that the partition constraint is something > > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs > > row-wise comparison. > > Thanks for suggesting to use row-wise comparison. Actually, I was just describing how the *users* may want to visualize the partition constraint... > I have few queries > with respect to handling of NULL values. > > 1. What should be the partition constraint for the above case. AFAIK, > row-wise comparison wont work with NULL values as shown in [1]. I mean > two rows are considered equal if all their corresponding members are > non-null and equal. The rows are unequal if any corresponding members > are non-null and unequal. Otherwise the result of the row comparison > is unknown (null). So we should generate different types of > constraints for NULL values. > > Ex: > CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b); > CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL), > (NULL, 1), (NULL, NULL); > > As per my knowledge, we should consider creating partition constraints > for the above example as given below. > > (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b = > 1)) OR ((a is NULL) AND (b is NULL)). Yeah, something like that should do the trick. Again, I was not actually suggesting that you write code to implement the constraint using something like RowCompareExpr, only that the users might want to view the constraint as doing row-wise comparison of the partitioning columns and the specified value lists. > 2. In the current code we don't put the NULL value in the 'datums' > field of 'PartitionBoundInfoData' structure [2]. Since there can be > only one NULL value, we directly store the corresponding index value > in the 'null_index' field. Now we have to handle multiple NULL values > in case of Multi-Column List Partitioning. So the question is how to > handle this scenario. Following are the 2 approaches to handle this. > > Approach-1: > Add another field 'bool **isnull' in [2] and mark the corresponding > element to TRUE if it has NULL value and the corresponding location in > 'datums' contains empty/No value. For example, If a partition bound is > (1, NULL), then > > datums[0][0] = 1 > datums[0][1] = Not assigned any value > isnull[0][0] = FALSE > is null[0][1] = TRUE > > So now we have an entry in the 'datums' field for a bound containing > NULL value, so we should handle this in all the scenarios where we are > manipulating 'datums' in order to support NULL values and avoid crash. > > Approach-2: > Don't add the bound information to 'datums' field of [2] if any of the > value is NULL. Store this information separately in the structures > mentioned in [3] and process accordingly. > > I feel approach-1 is the better solution as this requires less code > changes and easy to implement than approach-2. Kindly share your > thoughts about the approaches and please share if you have any better > solution than the above 2. Approach 1 sounds better. It sounds like approach 1 might help us implement support for allowing NULLs in range partition bounds in the future, if at all. For now, it might be better to not allocate the isnull array except for list partitioning. I'll wait for you to post a new patch addressing at least the comments in my earlier email. Also, please make sure to run `make check` successfully before posting the patch. :) Thanks. -- Amit Langote EDB: http://www.enterprisedb.com
pgsql-hackers by date: