Re: Multi-Column List Partitioning - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Multi-Column List Partitioning |
Date | |
Msg-id | CAMm1aWakf4DnFN=e8zRScKgpLJQ3T1T6AN1WyqqtC4ZnMb0CUQ@mail.gmail.com Whole thread Raw |
In response to | Re: Multi-Column List Partitioning (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: Multi-Column List Partitioning
|
List | pgsql-hackers |
> Yes, it would be nice to have this. Thanks for picking this up. Thanks for confirming. > Some quick observations: Thanks for providing the comments. I will handle these cases. > Hmm, why not have parentheses around these lists, that is: ( > (list_of_values) [, ...] ) > > So your example would look like this: > > CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2, > 2), (2, 10)); I am ok with this syntax. This would be more appropriate. > IMO, it is not such a bad syntax from a user's PoV. It's not hard to > understand from this syntax that the partition constraint is something > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs > row-wise comparison. Thanks for suggesting to use row-wise comparison. I have few queries with respect to handling of NULL values. 1. What should be the partition constraint for the above case. AFAIK, row-wise comparison wont work with NULL values as shown in [1]. I mean two rows are considered equal if all their corresponding members are non-null and equal. The rows are unequal if any corresponding members are non-null and unequal. Otherwise the result of the row comparison is unknown (null). So we should generate different types of constraints for NULL values. Ex: CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b); CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL), (NULL, 1), (NULL, NULL); As per my knowledge, we should consider creating partition constraints for the above example as given below. (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b = 1)) OR ((a is NULL) AND (b is NULL)). Kindly correct me if I am wrong. 2. In the current code we don't put the NULL value in the 'datums' field of 'PartitionBoundInfoData' structure [2]. Since there can be only one NULL value, we directly store the corresponding index value in the 'null_index' field. Now we have to handle multiple NULL values in case of Multi-Column List Partitioning. So the question is how to handle this scenario. Following are the 2 approaches to handle this. Approach-1: Add another field 'bool **isnull' in [2] and mark the corresponding element to TRUE if it has NULL value and the corresponding location in 'datums' contains empty/No value. For example, If a partition bound is (1, NULL), then datums[0][0] = 1 datums[0][1] = Not assigned any value isnull[0][0] = FALSE is null[0][1] = TRUE So now we have an entry in the 'datums' field for a bound containing NULL value, so we should handle this in all the scenarios where we are manipulating 'datums' in order to support NULL values and avoid crash. Approach-2: Don't add the bound information to 'datums' field of [2] if any of the value is NULL. Store this information separately in the structures mentioned in [3] and process accordingly. I feel approach-1 is the better solution as this requires less code changes and easy to implement than approach-2. Kindly share your thoughts about the approaches and please share if you have any better solution than the above 2. [1]: postgres@15890=#SELECT ROW(1, 2) = ROW(1, 2); ?column? ---------- t (1 row) postgres@15890=#SELECT ROW(1, 2) = ROW(1, 1); ?column? ---------- f (1 row) postgres@15890=#SELECT ROW(1, NULL) = ROW(1, NULL); ?column? ---------- (1 row) postgres@15890=#SELECT ROW(1, 2) = ROW(1, NULL); ?column? ---------- (1 row) [2] : typedef struct PartitionBoundInfoData { char strategy; /* hash, list or range? */ int ndatums; /* Length of the datums[] array */ Datum **datums; PartitionRangeDatumKind **kind; /* The kind of each range bound datum; * NULL for hash and list partitioned * tables */ int nindexes; /* Length of the indexes[] array */ int *indexes; /* Partition indexes */ int null_index; /* Index of the null-accepting partition; -1 * if there isn't one */ int default_index; /* Index of the default partition; -1 if there * isn't one */ } PartitionBoundInfoData; [3]: typedef struct NullBoundDatumInfo { Datum *datum; int col_index; int. bound_index; } NullBoundDatumInfo; typedef struct NullBoundIsNullInfo { int col_index; int. bound_index; } NullBoundIsNullInfo; Add 2 fields of type 'NullBoundDatumInfo' and 'NullBoundIsNullInfo' to the structure [2]. -- Thanks & Regards, Nitin Jadhav On Fri, May 21, 2021 at 5:47 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, May 21, 2021 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I will now take a look at the patch itself. > > Some quick observations: > > * I get a lot of instances of the following 2 warnings when compiling > the patched code: > > Warning #1: > > partprune.c: In function ‘get_matching_list_bounds’: > partprune.c:2731:11: warning: passing argument 5 of > ‘partition_list_bsearch’ makes pointer from integer without a cast > [enabled by default] > nvalues, value, &is_equal); > ^ > In file included from partprune.c:53:0: > ../../../src/include/partitioning/partbounds.h:117:12: note: expected > ‘Datum *’ but argument is of type ‘Datum’ > extern int partition_list_bsearch(FmgrInfo *partsupfunc, > > Warning #2: > > partprune.c:2781:12: warning: incompatible integer to pointer > conversion passing 'Datum' > (aka 'unsigned long') to parameter of type 'Datum *' (aka > 'unsigned long *'); take the > address with & [-Wint-conversion] > > value, &is_equal); > > ^~~~~ > > & > ../../../src/include/partitioning/partbounds.h:120:32: note: passing > argument to parameter 'value' > here > ...int nvalues, Datum *value, bool *is_equal); > > * I think this code: > > === > /* Get the only column's name in case we need to output an error */ > if (key->partattrs[0] != 0) > colname = get_attname(RelationGetRelid(parent), > key->partattrs[0], false); > else > colname = deparse_expression((Node *) linitial(partexprs), > > deparse_context_for(RelationGetRelationName(parent), > > RelationGetRelid(parent)), > false, false); > /* Need its type data too */ > coltype = get_partition_col_typid(key, 0); > coltypmod = get_partition_col_typmod(key, 0); > partcollation = get_partition_col_collation(key, 0); > === > > belongs in the new function transformPartitionListBounds that you > added, because without doing so, any errors having to do with > partitioning columns other than the first one will report the first > column's name in the error message: > > postgres=# create table foo (a bool, b bool) partition by list (a, b); > CREATE TABLE > > -- this is fine! > postgres=# create table foo_true_true partition of foo for values in (1, true); > ERROR: specified value cannot be cast to type boolean for column "a" > LINE 1: ...able foo_true_true partition of foo for values in (1, true); > > -- not this! > postgres=# create table foo_true_true partition of foo for values in (true, 1); > ERROR: specified value cannot be cast to type boolean for column "a" > LINE 1: ...able foo_true_true partition of foo for values in (true, 1); > > * The following prototype of transformPartitionListBounds() means that > all values in a given bound list are analyzed with the first > partitioning column's colname, type, typmod, etc., which is wrong: > > +static List * > +transformPartitionListBounds(ParseState *pstate, PartitionBoundSpec *spec, > + char *colname, Oid coltype, int32 coltypmod, > + Oid partcollation, int partnatts) > +{ > > An example of wrong behavior because of that: > > postgres=# create table foo (a bool, b text) partition by list (a, b); > CREATE TABLE > Time: 3.967 ms > postgres=# create table foo_true_true partition of foo for values in > (true, 'whatever'); > ERROR: invalid input syntax for type boolean: "whatever" > LINE 1: ...o_true_true partition of foo for values in (true, 'whatever'... > > "whatever" should've been accepted but because it's checked with a's > type, it is wrongly flagged. > > Please take a look at how transformPartitionRangeBound() handles this, > especially how it uses the correct partitioning column's info to > analyze the corresponding bound value expression. > > I will continue looking next week. > > -- > Amit Langote > EDB: http://www.enterprisedb.com
pgsql-hackers by date: