Thread: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()
In ATExecAttachPartition() there's following code 13715 partnatts = get_partition_natts(key); 13716 for (i = 0; i < partnatts; i++) 13717 { 13718 AttrNumber partattno; 13719 13720 partattno = get_partition_col_attnum(key, i); 13721 13722 /* If partition key is an expression, must not skip validation */ 13723 if (!partition_accepts_null && 13724 (partattno == 0 || 13725 !bms_is_member(partattno, not_null_attrs))) 13726 skip_validate = false; 13727 } partattno obtained from the partition key is the attribute number of the partitioned table but not_null_attrs contains the attribute numbers of attributes of the table being attached which have NOT NULL constraint on them. But the attribute numbers of partitioned table and the table being attached may not agree i.e. partition key attribute in partitioned table may have a different position in the table being attached. So, this code looks buggy. Probably we don't have a test which tests this code with different attribute order between partitioned table and the table being attached. I didn't get time to actually construct a testcase and test it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > In ATExecAttachPartition() there's following code > > 13715 partnatts = get_partition_natts(key); > 13716 for (i = 0; i < partnatts; i++) > 13717 { > 13718 AttrNumber partattno; > 13719 > 13720 partattno = get_partition_col_attnum(key, i); > 13721 > 13722 /* If partition key is an expression, must not skip > validation */ > 13723 if (!partition_accepts_null && > 13724 (partattno == 0 || > 13725 !bms_is_member(partattno, not_null_attrs))) > 13726 skip_validate = false; > 13727 } > > partattno obtained from the partition key is the attribute number of > the partitioned table but not_null_attrs contains the attribute > numbers of attributes of the table being attached which have NOT NULL > constraint on them. But the attribute numbers of partitioned table and > the table being attached may not agree i.e. partition key attribute in > partitioned table may have a different position in the table being > attached. So, this code looks buggy. Probably we don't have a test > which tests this code with different attribute order between > partitioned table and the table being attached. I didn't get time to > actually construct a testcase and test it. I think this code could be removed entirely in light of commit 3ec76ff1f2cf52e9b900349957b42d28128b7bc7. Amit? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017/06/08 1:44, Robert Haas wrote: > On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> In ATExecAttachPartition() there's following code >> >> 13715 partnatts = get_partition_natts(key); >> 13716 for (i = 0; i < partnatts; i++) >> 13717 { >> 13718 AttrNumber partattno; >> 13719 >> 13720 partattno = get_partition_col_attnum(key, i); >> 13721 >> 13722 /* If partition key is an expression, must not skip >> validation */ >> 13723 if (!partition_accepts_null && >> 13724 (partattno == 0 || >> 13725 !bms_is_member(partattno, not_null_attrs))) >> 13726 skip_validate = false; >> 13727 } >> >> partattno obtained from the partition key is the attribute number of >> the partitioned table but not_null_attrs contains the attribute >> numbers of attributes of the table being attached which have NOT NULL >> constraint on them. But the attribute numbers of partitioned table and >> the table being attached may not agree i.e. partition key attribute in >> partitioned table may have a different position in the table being >> attached. So, this code looks buggy. Probably we don't have a test >> which tests this code with different attribute order between >> partitioned table and the table being attached. I didn't get time to >> actually construct a testcase and test it. There seem to be couple of bugs here: 1. When partition's key attributes differ in ordering from the parent, predicate_implied_by() will give up due to structuralinequality of Vars in the expressions. By fixing this, we can get it to return 'true' when it's really so. 2. As you said, we store partition's attribute numbers in the not_null_attrs bitmap, but then check partattno (which isthe parent's attribute number which might differ) against the bitmap, which seems like it might produce incorrect result. If, for example, predicate_implied_by() set skip_validate to true, and the above code failed to set skip_validateto false where it should have, then we would wrongly end up skipping the scan. That is, rows in the partition will contain null values whereas the partition constraint does not allow it. It's hard to reproduce this currently,because with different ordering of attributes, predicate_refute_by() never returns true (as mentioned in 1 above),so skip_validate does not need to be set to false again. Consider this example: create table p (a int, b char) partition by list (a); create table p1 (b char not null, a int check (a in (1))); insert into p1 values ('b', null); Note that not_null_attrs for p1 will contain 1 corresponding to column b, which matches key attribute of the parent, that is 1, corresponding to column a. Hence we end up wrongly concluding that p1's partition key column does not allow nulls. > I think this code could be removed entirely in light of commit > 3ec76ff1f2cf52e9b900349957b42d28128b7bc7. I am assuming you think that because now we emit IS NOT NULL constraint internally for any partition keys that do not allow null values (including all the keys of range partitions as of commit 3ec76ff1f2cf52e9b900349957b42d28128b7bc7). But those IS NOT NULL constraint expressions are inconclusive as far as the application of predicate_implied_by() to determine if we can skip the scan is concerned. So even if predicate_implied_by() returned 'true', we cannot conclude, just based on that result, that there are not any null values in the partition keys. The code in question is there to check if there are explicit NOT NULL constraints on the partition keys. It cannot be true for expression keys, so we give up on skip_validate in that case anyway. But if 1) there are no expression keys, 2) all the partition key columns of the table being attached have NOT NULL constraint, and 3) predicate_implied_by() returned 'true', we can skip the scan. Thoughts? I am working on a patch to fix the above mentioned issues and will post the same no later than Friday. Thanks, Amit
On Thu, Jun 8, 2017 at 3:13 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/08 1:44, Robert Haas wrote: >> On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> In ATExecAttachPartition() there's following code >>> >>> 13715 partnatts = get_partition_natts(key); >>> 13716 for (i = 0; i < partnatts; i++) >>> 13717 { >>> 13718 AttrNumber partattno; >>> 13719 >>> 13720 partattno = get_partition_col_attnum(key, i); >>> 13721 >>> 13722 /* If partition key is an expression, must not skip >>> validation */ >>> 13723 if (!partition_accepts_null && >>> 13724 (partattno == 0 || >>> 13725 !bms_is_member(partattno, not_null_attrs))) >>> 13726 skip_validate = false; >>> 13727 } >>> >>> partattno obtained from the partition key is the attribute number of >>> the partitioned table but not_null_attrs contains the attribute >>> numbers of attributes of the table being attached which have NOT NULL >>> constraint on them. But the attribute numbers of partitioned table and >>> the table being attached may not agree i.e. partition key attribute in >>> partitioned table may have a different position in the table being >>> attached. So, this code looks buggy. Probably we don't have a test >>> which tests this code with different attribute order between >>> partitioned table and the table being attached. I didn't get time to >>> actually construct a testcase and test it. > > There seem to be couple of bugs here: > > 1. When partition's key attributes differ in ordering from the parent, > predicate_implied_by() will give up due to structural inequality of > Vars in the expressions. By fixing this, we can get it to return > 'true' when it's really so. > > 2. As you said, we store partition's attribute numbers in the > not_null_attrs bitmap, but then check partattno (which is the parent's > attribute number which might differ) against the bitmap, which seems > like it might produce incorrect result. If, for example, > predicate_implied_by() set skip_validate to true, and the above code > failed to set skip_validate to false where it should have, then we > would wrongly end up skipping the scan. That is, rows in the partition > will contain null values whereas the partition constraint does not > allow it. It's hard to reproduce this currently, because with > different ordering of attributes, predicate_refute_by() never returns > true (as mentioned in 1 above), so skip_validate does not need to be > set to false again. > > Consider this example: > > create table p (a int, b char) partition by list (a); > > create table p1 (b char not null, a int check (a in (1))); > insert into p1 values ('b', null); > > Note that not_null_attrs for p1 will contain 1 corresponding to column b, > which matches key attribute of the parent, that is 1, corresponding to > column a. Hence we end up wrongly concluding that p1's partition key > column does not allow nulls. > >> I think this code could be removed entirely in light of commit >> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7. > > I am assuming you think that because now we emit IS NOT NULL constraint > internally for any partition keys that do not allow null values (including > all the keys of range partitions as of commit > 3ec76ff1f2cf52e9b900349957b42d28128b7bc7). But those IS NOT NULL > constraint expressions are inconclusive as far as the application of > predicate_implied_by() to determine if we can skip the scan is concerned. > So even if predicate_implied_by() returned 'true', we cannot conclude, > just based on that result, that there are not any null values in the > partition keys. I am not able to understand this. Are you saying that predicate_implied_by() returns true even when it's not implied when NOT NULL constraints are involved? That sounds like a bug in predicate_implied_by(), which should be fixed instead of adding code to pepper over it? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 2017/06/08 19:25, Ashutosh Bapat wrote: > On Thu, Jun 8, 2017 at 3:13 PM, Amit Langote >>> I think this code could be removed entirely in light of commit >>> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7. >> >> I am assuming you think that because now we emit IS NOT NULL constraint >> internally for any partition keys that do not allow null values (including >> all the keys of range partitions as of commit >> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7). But those IS NOT NULL >> constraint expressions are inconclusive as far as the application of >> predicate_implied_by() to determine if we can skip the scan is concerned. >> So even if predicate_implied_by() returned 'true', we cannot conclude, >> just based on that result, that there are not any null values in the >> partition keys. > > I am not able to understand this. Are you saying that > predicate_implied_by() returns true even when it's not implied when > NOT NULL constraints are involved? That sounds like a bug in > predicate_implied_by(), which should be fixed instead of adding code > to pepper over it? No, it's not a bug of predicate_implied_by(). I meant to say predicate_implied_by() isn't exactly designed for ATExecAttachPartition's purpose, especially its treatment of IS NOT NULL constraints is not suitable for this application. To prove that the table cannot contain NULLs when it shouldn't because of the partition constraint, we must look for explicit NOT NULL constraints on the partition key columns, instead of relying on the predicate_implied_by()'s proof. See the following explanation for why that is so (or at least I think is so): There is this text in the header comment of predicate_implied_by_simple_clause(), which is where the individual pairs of expressions are compared and/or passed to operator_predicate_proof(), which mentions that the treatment of IS NOT NULL predicate is based on the assumption that 'restrictions' that are passed to predicate_implied_by() are a query's WHERE clause restrictions, *not* CHECK constraints that are checked when inserting data into a table. * When the predicate is of the form "foo IS NOT NULL", we can conclude that* the predicate is implied if the clause is astrict operator or function* that has "foo" as an input. In this case the clause must yield NULL when* "foo" is NULL, whichwe can take as equivalent to FALSE because we know* we are within an AND/OR subtree of a WHERE clause. (Again, "foo"is* already known immutable, so the clause will certainly always fail.)* Also, if the clause is just "foo" (meaningit's a boolean variable),* the predicate is implied since the clause can't be true if "foo" is NULL. As mentioned above, note the following part: which we can take as equivalent to FALSE because we know we are within an AND/OR subtree of a WHERE clause. Which is not what we are passing to predicate_implied_by() in ATExecAttachPartition(). We are passing it the table's CHECK constraint clauses which behave differently for the NULL result on NULL input - they *allow* the row to be inserted. Which means that there will be rows with NULLs in the partition key, even if predicate_refuted_by() said that there cannot be. We will end up *wrongly* skipping the validation scan if we relied on just the predicate_refuted_by()'s result. That's why there is code to check for explicit NOT NULL constraints on the partition key columns. If there are, it's OK then to assume that all the partition constraints are satisfied by existing constraints. One more thing: if any partition key happens to be an expression, which there cannot be NOT NULL constraints for, we just give up on skipping the scan, because we don't have any declared knowledge about whether those keys are also non-null, which we want for partitions that do not accept null values. Does that make sense? Thanks, Amit PS: Also interesting to note is the difference in behavior between ExecQual() and ExecCheck() on NULL result.
On 2017/06/08 18:43, Amit Langote wrote: > On 2017/06/08 1:44, Robert Haas wrote: >> On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >>> In ATExecAttachPartition() there's following code >>> >>> 13715 partnatts = get_partition_natts(key); >>> 13716 for (i = 0; i < partnatts; i++) >>> 13717 { >>> 13718 AttrNumber partattno; >>> 13719 >>> 13720 partattno = get_partition_col_attnum(key, i); >>> 13721 >>> 13722 /* If partition key is an expression, must not skip >>> validation */ >>> 13723 if (!partition_accepts_null && >>> 13724 (partattno == 0 || >>> 13725 !bms_is_member(partattno, not_null_attrs))) >>> 13726 skip_validate = false; >>> 13727 } >>> >>> partattno obtained from the partition key is the attribute number of >>> the partitioned table but not_null_attrs contains the attribute >>> numbers of attributes of the table being attached which have NOT NULL >>> constraint on them. But the attribute numbers of partitioned table and >>> the table being attached may not agree i.e. partition key attribute in >>> partitioned table may have a different position in the table being >>> attached. So, this code looks buggy. Probably we don't have a test >>> which tests this code with different attribute order between >>> partitioned table and the table being attached. I didn't get time to >>> actually construct a testcase and test it. > > There seem to be couple of bugs here: > > 1. When partition's key attributes differ in ordering from the parent, > predicate_implied_by() will give up due to structural inequality of > Vars in the expressions. By fixing this, we can get it to return > 'true' when it's really so. > > 2. As you said, we store partition's attribute numbers in the > not_null_attrs bitmap, but then check partattno (which is the parent's > attribute number which might differ) against the bitmap, which seems > like it might produce incorrect result. If, for example, > predicate_implied_by() set skip_validate to true, and the above code > failed to set skip_validate to false where it should have, then we > would wrongly end up skipping the scan. That is, rows in the partition > will contain null values whereas the partition constraint does not > allow it. It's hard to reproduce this currently, because with > different ordering of attributes, predicate_refute_by() never returns > true (as mentioned in 1 above), so skip_validate does not need to be > set to false again. > > Consider this example: > > create table p (a int, b char) partition by list (a); > > create table p1 (b char not null, a int check (a in (1))); > insert into p1 values ('b', null); > > Note that not_null_attrs for p1 will contain 1 corresponding to column b, > which matches key attribute of the parent, that is 1, corresponding to > column a. Hence we end up wrongly concluding that p1's partition key > column does not allow nulls. [ ... ] > I am working on a patch to fix the above mentioned issues and will post > the same no later than Friday. And here is the patch. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Fri, Jun 9, 2017 at 10:31 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/08 19:25, Ashutosh Bapat wrote: >> On Thu, Jun 8, 2017 at 3:13 PM, Amit Langote >>>> I think this code could be removed entirely in light of commit >>>> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7. >>> >>> I am assuming you think that because now we emit IS NOT NULL constraint >>> internally for any partition keys that do not allow null values (including >>> all the keys of range partitions as of commit >>> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7). But those IS NOT NULL >>> constraint expressions are inconclusive as far as the application of >>> predicate_implied_by() to determine if we can skip the scan is concerned. >>> So even if predicate_implied_by() returned 'true', we cannot conclude, >>> just based on that result, that there are not any null values in the >>> partition keys. >> >> I am not able to understand this. Are you saying that >> predicate_implied_by() returns true even when it's not implied when >> NOT NULL constraints are involved? That sounds like a bug in >> predicate_implied_by(), which should be fixed instead of adding code >> to pepper over it? > > No, it's not a bug of predicate_implied_by(). I meant to say > predicate_implied_by() isn't exactly designed for ATExecAttachPartition's > purpose, especially its treatment of IS NOT NULL constraints is not > suitable for this application. To prove that the table cannot contain > NULLs when it shouldn't because of the partition constraint, we must look > for explicit NOT NULL constraints on the partition key columns, instead of > relying on the predicate_implied_by()'s proof. See the following > explanation for why that is so (or at least I think is so): > > There is this text in the header comment of > predicate_implied_by_simple_clause(), which is where the individual pairs > of expressions are compared and/or passed to operator_predicate_proof(), > which mentions that the treatment of IS NOT NULL predicate is based on the > assumption that 'restrictions' that are passed to predicate_implied_by() > are a query's WHERE clause restrictions, *not* CHECK constraints that are > checked when inserting data into a table. > > * When the predicate is of the form "foo IS NOT NULL", we can conclude that > * the predicate is implied if the clause is a strict operator or function > * that has "foo" as an input. In this case the clause must yield NULL when > * "foo" is NULL, which we can take as equivalent to FALSE because we know > * we are within an AND/OR subtree of a WHERE clause. (Again, "foo" is > * already known immutable, so the clause will certainly always fail.) > * Also, if the clause is just "foo" (meaning it's a boolean variable), > * the predicate is implied since the clause can't be true if "foo" is NULL. > > As mentioned above, note the following part: which we can take as > equivalent to FALSE because we know we are within an AND/OR subtree of a > WHERE clause. > > Which is not what we are passing to predicate_implied_by() in > ATExecAttachPartition(). We are passing it the table's CHECK constraint > clauses which behave differently for the NULL result on NULL input - they > *allow* the row to be inserted. Which means that there will be rows with > NULLs in the partition key, even if predicate_refuted_by() said that there > cannot be. We will end up *wrongly* skipping the validation scan if we > relied on just the predicate_refuted_by()'s result. That's why there is > code to check for explicit NOT NULL constraints on the partition key > columns. If there are, it's OK then to assume that all the partition > constraints are satisfied by existing constraints. One more thing: if any > partition key happens to be an expression, which there cannot be NOT NULL > constraints for, we just give up on skipping the scan, because we don't > have any declared knowledge about whether those keys are also non-null, > which we want for partitions that do not accept null values. > > Does that make sense? Thanks for the long explanation. I guess, this should be written in comments somewhere in the code there. I see following comment in ATExecAttachPartition() -- * * There is a case in which we cannot rely on just the result of the * proof. */ -- I guess, this comment should be expanded to explain what you wrote above. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
May be we should pass a flag to predicate_implied_by() to handle NULL behaviour for CHECK constraints. Partitioning has shown that it needs to use predicate_implied_by() for comparing constraints and there may be other cases that can come up in future. Instead of handling it outside predicate_implied_by() we may want to change it under a flag. On Fri, Jun 9, 2017 at 11:43 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/08 18:43, Amit Langote wrote: >> On 2017/06/08 1:44, Robert Haas wrote: >>> On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat >>> <ashutosh.bapat@enterprisedb.com> wrote: >>>> In ATExecAttachPartition() there's following code >>>> >>>> 13715 partnatts = get_partition_natts(key); >>>> 13716 for (i = 0; i < partnatts; i++) >>>> 13717 { >>>> 13718 AttrNumber partattno; >>>> 13719 >>>> 13720 partattno = get_partition_col_attnum(key, i); >>>> 13721 >>>> 13722 /* If partition key is an expression, must not skip >>>> validation */ >>>> 13723 if (!partition_accepts_null && >>>> 13724 (partattno == 0 || >>>> 13725 !bms_is_member(partattno, not_null_attrs))) >>>> 13726 skip_validate = false; >>>> 13727 } >>>> >>>> partattno obtained from the partition key is the attribute number of >>>> the partitioned table but not_null_attrs contains the attribute >>>> numbers of attributes of the table being attached which have NOT NULL >>>> constraint on them. But the attribute numbers of partitioned table and >>>> the table being attached may not agree i.e. partition key attribute in >>>> partitioned table may have a different position in the table being >>>> attached. So, this code looks buggy. Probably we don't have a test >>>> which tests this code with different attribute order between >>>> partitioned table and the table being attached. I didn't get time to >>>> actually construct a testcase and test it. >> >> There seem to be couple of bugs here: >> >> 1. When partition's key attributes differ in ordering from the parent, >> predicate_implied_by() will give up due to structural inequality of >> Vars in the expressions. By fixing this, we can get it to return >> 'true' when it's really so. >> >> 2. As you said, we store partition's attribute numbers in the >> not_null_attrs bitmap, but then check partattno (which is the parent's >> attribute number which might differ) against the bitmap, which seems >> like it might produce incorrect result. If, for example, >> predicate_implied_by() set skip_validate to true, and the above code >> failed to set skip_validate to false where it should have, then we >> would wrongly end up skipping the scan. That is, rows in the partition >> will contain null values whereas the partition constraint does not >> allow it. It's hard to reproduce this currently, because with >> different ordering of attributes, predicate_refute_by() never returns >> true (as mentioned in 1 above), so skip_validate does not need to be >> set to false again. >> >> Consider this example: >> >> create table p (a int, b char) partition by list (a); >> >> create table p1 (b char not null, a int check (a in (1))); >> insert into p1 values ('b', null); >> >> Note that not_null_attrs for p1 will contain 1 corresponding to column b, >> which matches key attribute of the parent, that is 1, corresponding to >> column a. Hence we end up wrongly concluding that p1's partition key >> column does not allow nulls. > > [ ... ] > >> I am working on a patch to fix the above mentioned issues and will post >> the same no later than Friday. > > And here is the patch. > > Thanks, > Amit -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 2017/06/09 20:47, Ashutosh Bapat wrote: > Thanks for the long explanation. I guess, this should be written in > comments somewhere in the code there. I see following comment in > ATExecAttachPartition() > -- > * > * There is a case in which we cannot rely on just the result of the > * proof. > */ > > -- > > I guess, this comment should be expanded to explain what you wrote above. Tried in the attached updated patch. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 2017/06/09 20:49, Ashutosh Bapat wrote: > May be we should pass a flag to predicate_implied_by() to handle NULL > behaviour for CHECK constraints. Partitioning has shown that it needs > to use predicate_implied_by() for comparing constraints and there may > be other cases that can come up in future. Instead of handling it > outside predicate_implied_by() we may want to change it under a flag. IMHO, it may not be a good idea to modify predtest.c to suit the partitioning code's needs. The workaround of checking that NOT NULL constraints on partitioning columns exist seems to me to be simpler than hacking predtest.c to teach it about the new behavior. Thanks, Amit
On Mon, Jun 12, 2017 at 4:09 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/09 20:49, Ashutosh Bapat wrote: >> May be we should pass a flag to predicate_implied_by() to handle NULL >> behaviour for CHECK constraints. Partitioning has shown that it needs >> to use predicate_implied_by() for comparing constraints and there may >> be other cases that can come up in future. Instead of handling it >> outside predicate_implied_by() we may want to change it under a flag. > > IMHO, it may not be a good idea to modify predtest.c to suit the > partitioning code's needs. The workaround of checking that NOT NULL > constraints on partitioning columns exist seems to me to be simpler than > hacking predtest.c to teach it about the new behavior. On the plus side, it might also work correctly. I mean, the problem with what you've done here is that (a) you're completely giving up on expressions as partition keys and (b) even if no expressions are used for partitioning, you're still giving up unless there are NOT NULL constraints on the partitions. Now, maybe that doesn't sound so bad, but what it means is that if you copy-and-paste the partition constraint into a CHECK constraint on a new table, you can't skip the validation scan when attaching it: rhaas=# create table foo (a int, b text) partition by range (a); CREATE TABLE rhaas=# create table foo1 partition of foo for values from (0) to (10); CREATE TABLE rhaas=# \d+ foo1 Table "public.foo1"Column | Type | Collation | Nullable | Default |Storage | Stats target | Description --------+---------+-----------+----------+---------+----------+--------------+-------------a | integer | | | | plain | |b | text | | | | extended | | Partition of: foo FOR VALUES FROM (0) TO (10) Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 10)) rhaas=# drop table foo1; DROP TABLE rhaas=# create table foo1 (like foo, check ((a IS NOT NULL) AND (a >= 0) AND (a < 10))); CREATE TABLE rhaas=# alter table foo attach partition foo1 for values from (0) to (10); ALTER TABLE I think that's going to come as an unpleasant surprise to more than one user. I'm not sure exactly how we need to restructure things here so that this works properly, and maybe modifying predicate_implied_by() isn't the right thing at all; for instance, there's also predicate_refuted_by(), which maybe could be used in some way (inject NOT?). But I don't much like the idea that you copy and paste the partitioning constraint into a CHECK constraint and that doesn't work. That's not cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 13, 2017 at 10:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I think that's going to come as an unpleasant surprise to more than > one user. I'm not sure exactly how we need to restructure things here > so that this works properly, and maybe modifying > predicate_implied_by() isn't the right thing at all; for instance, > there's also predicate_refuted_by(), which maybe could be used in some > way (inject NOT?). But I don't much like the idea that you copy and > paste the partitioning constraint into a CHECK constraint and that > doesn't work. That's not cool. OK, I think I see the problem here. predicate_implied_by() and predicate_refuted_by() differ in what they assume about the predicate evaluating to NULL, but both of them assume that if the clause evaluates to NULL, that's equivalent to false. So there's actually no option to get the behavior we want here, which is to treat both operands using CHECK-semantics (null is true) rather than WHERE-semantics (null is false). Given that, Ashutosh's proposal of passing an additional flag to predicate_implied_by() seems like the best option. Here's a patch implementing that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > OK, I think I see the problem here. predicate_implied_by() and > predicate_refuted_by() differ in what they assume about the predicate > evaluating to NULL, but both of them assume that if the clause > evaluates to NULL, that's equivalent to false. So there's actually no > option to get the behavior we want here, which is to treat both > operands using CHECK-semantics (null is true) rather than > WHERE-semantics (null is false). > Given that, Ashutosh's proposal of passing an additional flag to > predicate_implied_by() seems like the best option. Here's a patch > implementing that. I've not reviewed the logic changes in predtest.c in detail, but I think this is a reasonable direction to go in. Two suggestions: 1. predicate_refuted_by() should grow the extra argument at the same time. There's no good reason to be asymmetric. 2. It might be clearer, and would certainly be shorter, to call the extra argument something like "null_is_true". regards, tom lane
On Tue, Jun 13, 2017 at 5:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> OK, I think I see the problem here. predicate_implied_by() and >> predicate_refuted_by() differ in what they assume about the predicate >> evaluating to NULL, but both of them assume that if the clause >> evaluates to NULL, that's equivalent to false. So there's actually no >> option to get the behavior we want here, which is to treat both >> operands using CHECK-semantics (null is true) rather than >> WHERE-semantics (null is false). > >> Given that, Ashutosh's proposal of passing an additional flag to >> predicate_implied_by() seems like the best option. Here's a patch >> implementing that. > > I've not reviewed the logic changes in predtest.c in detail, but > I think this is a reasonable direction to go in. Two suggestions: > > 1. predicate_refuted_by() should grow the extra argument at the > same time. There's no good reason to be asymmetric. OK. > 2. It might be clearer, and would certainly be shorter, to call the > extra argument something like "null_is_true". I think it's pretty darn important to make it clear that the argument only applies to the clauses supplied as axioms, and not to the predicate to be proven; if you want to control how the *predicate* is handled with respect to nulls, change your selection as among predicate_implied_by() and predicate_refuted_by(). For that reason, I disesteem null_is_true, but I'm open to other suggestions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017/06/13 23:24, Robert Haas wrote: > On Mon, Jun 12, 2017 at 4:09 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/06/09 20:49, Ashutosh Bapat wrote: >>> May be we should pass a flag to predicate_implied_by() to handle NULL >>> behaviour for CHECK constraints. Partitioning has shown that it needs >>> to use predicate_implied_by() for comparing constraints and there may >>> be other cases that can come up in future. Instead of handling it >>> outside predicate_implied_by() we may want to change it under a flag. >> >> IMHO, it may not be a good idea to modify predtest.c to suit the >> partitioning code's needs. The workaround of checking that NOT NULL >> constraints on partitioning columns exist seems to me to be simpler than >> hacking predtest.c to teach it about the new behavior. > > On the plus side, it might also work correctly. I mean, the problem > with what you've done here is that (a) you're completely giving up on > expressions as partition keys and (b) even if no expressions are used > for partitioning, you're still giving up unless there are NOT NULL > constraints on the partitions. Now, maybe that doesn't sound so bad, > but what it means is that if you copy-and-paste the partition > constraint into a CHECK constraint on a new table, you can't skip the > validation scan when attaching it: > > rhaas=# create table foo (a int, b text) partition by range (a); > CREATE TABLE > rhaas=# create table foo1 partition of foo for values from (0) to (10); > CREATE TABLE > rhaas=# \d+ foo1 > Table "public.foo1" > Column | Type | Collation | Nullable | Default | Storage | Stats > target | Description > --------+---------+-----------+----------+---------+----------+--------------+------------- > a | integer | | | | plain | | > b | text | | | | extended | | > Partition of: foo FOR VALUES FROM (0) TO (10) > Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 10)) > > rhaas=# drop table foo1; > DROP TABLE > rhaas=# create table foo1 (like foo, check ((a IS NOT NULL) AND (a >= > 0) AND (a < 10))); > CREATE TABLE > rhaas=# alter table foo attach partition foo1 for values from (0) to (10); > ALTER TABLE > > I think that's going to come as an unpleasant surprise to more than > one user. I'm not sure exactly how we need to restructure things here > so that this works properly, and maybe modifying > predicate_implied_by() isn't the right thing at all; for instance, > there's also predicate_refuted_by(), which maybe could be used in some > way (inject NOT?). But I don't much like the idea that you copy and > paste the partitioning constraint into a CHECK constraint and that > doesn't work. That's not cool. I agree with this argument. I just tried the patch you posted in the other email and I like how easy it makes the life for users in that they can just look at the partition constraint of an existing partition (thanks to 1848b73d45!) and frame the check constraint of the new partition to attach accordingly. IOW, +1 from me to the Ashutosh's idea. Thanks, Amit
On 2017/06/14 5:36, Robert Haas wrote: > On Tue, Jun 13, 2017 at 10:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I think that's going to come as an unpleasant surprise to more than >> one user. I'm not sure exactly how we need to restructure things here >> so that this works properly, and maybe modifying >> predicate_implied_by() isn't the right thing at all; for instance, >> there's also predicate_refuted_by(), which maybe could be used in some >> way (inject NOT?). But I don't much like the idea that you copy and >> paste the partitioning constraint into a CHECK constraint and that >> doesn't work. That's not cool. > > OK, I think I see the problem here. predicate_implied_by() and > predicate_refuted_by() differ in what they assume about the predicate > evaluating to NULL, but both of them assume that if the clause > evaluates to NULL, that's equivalent to false. So there's actually no > option to get the behavior we want here, which is to treat both > operands using CHECK-semantics (null is true) rather than > WHERE-semantics (null is false). > > Given that, Ashutosh's proposal of passing an additional flag to > predicate_implied_by() seems like the best option. Here's a patch > implementing that. I tried this patch and it seems to work correctly. Some comments on the patch itself: The following was perhaps included for debugging? +#include "nodes/print.h" I think the following sentence in a comment near the affected code in ATExecAttachPartition() needs to be removed. * * There is a case in which we cannot rely on just the result of the * proof. We no longer need the Bitmapset not_null_attrs. So, the line declaring it and the following line can be removed: not_null_attrs = bms_add_member(not_null_attrs, i); I thought I would make these changes myself and send the v2, but realized that you might be updating it yourself based on Tom's comments, so didn't. By the way, I mentioned an existing problem in one of the earlier emails on this thread about differing attribute numbers in the table being attached causing predicate_implied_by() to give up due to structural inequality of Vars. To clarify: table's check constraints will bear the table's attribute numbers whereas the partition constraint generated using get_qual_for_partbound() (the predicate) bears the parent's attribute numbers. That results in Var arguments of the expressions passed to predicate_implied_by() not matching and causing the latter to return failure prematurely. Attached find a patch to fix that that applies on top of your patch (added a test too). Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
PFA patch set addressing comments by Tom and Amit. 0001 is same as Robert's patch. On Wed, Jun 14, 2017 at 7:20 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 13, 2017 at 5:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> OK, I think I see the problem here. predicate_implied_by() and >>> predicate_refuted_by() differ in what they assume about the predicate >>> evaluating to NULL, but both of them assume that if the clause >>> evaluates to NULL, that's equivalent to false. So there's actually no >>> option to get the behavior we want here, which is to treat both >>> operands using CHECK-semantics (null is true) rather than >>> WHERE-semantics (null is false). >> >>> Given that, Ashutosh's proposal of passing an additional flag to >>> predicate_implied_by() seems like the best option. Here's a patch >>> implementing that. >> >> I've not reviewed the logic changes in predtest.c in detail, but >> I think this is a reasonable direction to go in. Two suggestions: >> >> 1. predicate_refuted_by() should grow the extra argument at the >> same time. There's no good reason to be asymmetric. > > OK. 0002 has these changes. > >> 2. It might be clearer, and would certainly be shorter, to call the >> extra argument something like "null_is_true". > > I think it's pretty darn important to make it clear that the argument > only applies to the clauses supplied as axioms, and not to the > predicate to be proven; if you want to control how the *predicate* is > handled with respect to nulls, change your selection as among > predicate_implied_by() and predicate_refuted_by(). For that reason, I > disesteem null_is_true, but I'm open to other suggestions. The extern functions viz. predicate_refuted_by() and predicate_implied_by() both accept restrictinfo_list and so the new argument gets name restrictinfo_is_check, which is fine. But the static minions have the corresponding argument named clause but the new argument isn't named clause_is_check. I think it would be better to be consistent everywhere and use either clause or restrictinfo. 0004 patch does that, it renames restrictinfo_list as clause_list and the boolean argument as clause_is_check. 0003 addresses comments by Amit Langote. In your original patch, if restrictinfo_is_check is true, we will call operator_predicate_proof(), which does not handle anything other than an operator expression. So, it's going to return false from that function. Looking at the way argisrow is handled in that function, it looks like we don't want to pass NullTest expression to operator_predicate_proof(). 0005 handles the boolean flag in the same way as argisrow is handled. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Jun 14, 2017 at 9:20 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > > By the way, I mentioned an existing problem in one of the earlier emails > on this thread about differing attribute numbers in the table being > attached causing predicate_implied_by() to give up due to structural > inequality of Vars. To clarify: table's check constraints will bear the > table's attribute numbers whereas the partition constraint generated using > get_qual_for_partbound() (the predicate) bears the parent's attribute > numbers. That results in Var arguments of the expressions passed to > predicate_implied_by() not matching and causing the latter to return > failure prematurely. Attached find a patch to fix that that applies on > top of your patch (added a test too). + * Adjust the generated constraint to match this partition's attribute + * numbers. Save the original to be used later if we decide to proceed + * with the validation scan after all. + */ + partConstraintOrig = copyObject(partConstraint); + partConstraint = map_partition_varattnos(partConstraint, 1, attachRel, + rel); + If the partition has different column order than the parent, its heap will also have different column order. I am not able to understand the purpose of using original constraints for validation using scan. Shouldn't we just use the mapped constraint expressions? BTW I liked the idea; this way we can keep part_6 in sync with list_parted2 even when the later changes and still manage to have different order of attributes. Although the CHECK still assumes that there is a column "a" but that's fine I guess. +CREATE TABLE part_6 ( + c int, + LIKE list_parted2, + CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 6) +); +ALTER TABLE part_6 DROP c; -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On Wed, Jun 14, 2017 at 6:15 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > PFA patch set addressing comments by Tom and Amit. LGTM. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Thanks for taking a look. On 2017/06/14 20:06, Ashutosh Bapat wrote: > On Wed, Jun 14, 2017 at 9:20 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >> By the way, I mentioned an existing problem in one of the earlier emails >> on this thread about differing attribute numbers in the table being >> attached causing predicate_implied_by() to give up due to structural >> inequality of Vars. To clarify: table's check constraints will bear the >> table's attribute numbers whereas the partition constraint generated using >> get_qual_for_partbound() (the predicate) bears the parent's attribute >> numbers. That results in Var arguments of the expressions passed to >> predicate_implied_by() not matching and causing the latter to return >> failure prematurely. Attached find a patch to fix that that applies on >> top of your patch (added a test too). > > + * Adjust the generated constraint to match this partition's attribute > + * numbers. Save the original to be used later if we decide to proceed > + * with the validation scan after all. > + */ > + partConstraintOrig = copyObject(partConstraint); > + partConstraint = map_partition_varattnos(partConstraint, 1, attachRel, > + rel); > + > If the partition has different column order than the parent, its heap will also > have different column order. I am not able to understand the purpose of using > original constraints for validation using scan. Shouldn't we just use the > mapped constraint expressions? Actually, I dropped the approach of using partConstraintOrig altogether from the latest updated patch. I will explain the problem I was trying to solve with that approach, which is now replaced in the new patch by, I think, a more correct solution. If we end up having to perform the validation scan and the table being attached is a partitioned table, we will scan its leaf partitions. Each of those leaf partitions may have different attribute numbers for the partitioning columns, so we will need to do the mapping, which actually we do even today. With this patch however, we apply mapping to the generated partition constraint so that it no longer bears the original parent's attribute numbers but those of the table being attached. Down below where we map to the leaf partition's attribute numbers, we still pass the root partitioned table as the parent. But it may so happen that the attnos appearing in the Vars can no longer be matched with any of the root table's attribute numbers, resulting in the following code in map_variable_attnos_mutator() to trigger an error: if (attno > context->map_length || context->attno_map[attno - 1] == 0) elog(ERROR, "unexpected varattno %d in expression to be mapped", attno); Consider this example: root: (a, b, c) partition by list (a) intermediate: (b, c, ..dropped.., a) partition by list (b) leaf: (b, c, a) partition of intermediate When attaching intermediate to root, we will generate the partition constraint and after mapping, its Vars will have attno = 4. When trying to map the same for leaf, we currently do map_partition_varattnos(expr, 1, leaf, root). So, the innards of map_variable_attnos will try to look for an attribute with attno = 4 in root which there isn't, so the above error will occur. We should really be passing intermediate as parent to the mapping routine. With the previous patch's approach, we would pass root as the parent along with partConstraintOrig which would bear the root parent's attnos. Please find attached the updated patch. In addition to the already described fixes, the patch also rearranges code so that a redundant AT work queue entry is avoided. (Currently, we end up creating one for attachRel even if it's partitioned, although it's harmless because ATRewriteTables() knows to skip partitioned tables.) Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Jun 15, 2017 at 10:46 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Thanks for taking a look. > > On 2017/06/14 20:06, Ashutosh Bapat wrote: >> On Wed, Jun 14, 2017 at 9:20 AM, Amit Langote >> <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> >>> By the way, I mentioned an existing problem in one of the earlier emails >>> on this thread about differing attribute numbers in the table being >>> attached causing predicate_implied_by() to give up due to structural >>> inequality of Vars. To clarify: table's check constraints will bear the >>> table's attribute numbers whereas the partition constraint generated using >>> get_qual_for_partbound() (the predicate) bears the parent's attribute >>> numbers. That results in Var arguments of the expressions passed to >>> predicate_implied_by() not matching and causing the latter to return >>> failure prematurely. Attached find a patch to fix that that applies on >>> top of your patch (added a test too). >> >> + * Adjust the generated constraint to match this partition's attribute >> + * numbers. Save the original to be used later if we decide to proceed >> + * with the validation scan after all. >> + */ >> + partConstraintOrig = copyObject(partConstraint); >> + partConstraint = map_partition_varattnos(partConstraint, 1, attachRel, >> + rel); >> + >> If the partition has different column order than the parent, its heap will also >> have different column order. I am not able to understand the purpose of using >> original constraints for validation using scan. Shouldn't we just use the >> mapped constraint expressions? > > Actually, I dropped the approach of using partConstraintOrig altogether > from the latest updated patch. I will explain the problem I was trying to > solve with that approach, which is now replaced in the new patch by, I > think, a more correct solution. > I agree. > If we end up having to perform the validation scan and the table being > attached is a partitioned table, we will scan its leaf partitions. Each > of those leaf partitions may have different attribute numbers for the > partitioning columns, so we will need to do the mapping, which actually we > do even today. With this patch however, we apply mapping to the generated > partition constraint so that it no longer bears the original parent's > attribute numbers but those of the table being attached. Down below where > we map to the leaf partition's attribute numbers, we still pass the root > partitioned table as the parent. But it may so happen that the attnos > appearing in the Vars can no longer be matched with any of the root > table's attribute numbers, resulting in the following code in > map_variable_attnos_mutator() to trigger an error: > > if (attno > context->map_length || context->attno_map[attno - 1] == 0) > elog(ERROR, "unexpected varattno %d in expression to be mapped", > attno); > > Consider this example: > > root: (a, b, c) partition by list (a) > intermediate: (b, c, ..dropped.., a) partition by list (b) > leaf: (b, c, a) partition of intermediate > > When attaching intermediate to root, we will generate the partition > constraint and after mapping, its Vars will have attno = 4. When trying > to map the same for leaf, we currently do map_partition_varattnos(expr, 1, > leaf, root). So, the innards of map_variable_attnos will try to look for > an attribute with attno = 4 in root which there isn't, so the above error > will occur. We should really be passing intermediate as parent to the > mapping routine. With the previous patch's approach, we would pass root > as the parent along with partConstraintOrig which would bear the root > parent's attnos. Thanks for the explanation. So, your earlier patch did map Vars correctly for the leaf partitions of the table being attached. > > Please find attached the updated patch. In addition to the already > described fixes, the patch also rearranges code so that a redundant AT > work queue entry is avoided. (Currently, we end up creating one for > attachRel even if it's partitioned, although it's harmless because > ATRewriteTables() knows to skip partitioned tables.) We are calling find_all_inheritors() on attachRel twice in this function, once to avoid circularity and second time for scheduling a scan. Why can't call it once and reuse the result? On the default partitioning thread [1] Robert commented that we should try to avoid queueing the subpartitions which have constraints that imply the new partitioning constraint. I think that comment applies to this code, which the refactoring patch has moved into a function. If you do this, instead of duplicating the code to gather existing constraints, please create a function for gathering constraints of a given relation and use it for the table being attached as well as its partitions. Also, we should avoid matching constraints for the table being attached twice, when it's not partitioned. Both of the above comments are not related to the bug that is being fixed, but they apply to the same code where the bug exists. So instead of fixing it twice, may be we should expand the scope of this work to cover other refactoring needed in this area. That might save us some rebasing and commits. + /* + * Adjust the constraint to match this partition. + * + * Since partConstraint contains attachRel's attnos due to the + * mapping we did just before attempting the proof above, we pass + * attachRel as the parent to map_partition_varattnos, not 'rel' + * which is the root parent. + */ May be reworded as "Adjust the partition constraints constructed for the table being attached for the leaf partition being validated." +CREATE TABLE part_7 ( + a int, + b char, + CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 7) +) PARTITION BY LIST (b); +CREATE TABLE part_7_a_null ( + c int, + d int, + b char, + a int, -- 'a' will have attnum = 4 + CONSTRAINT check_b CHECK (b IS NULL OR b = 'a'), + CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 7) +); Why not to use LIKE list_parted as you have done for part_6? [1] https://postgr.es/m/CA+TgmoZ+54-z+VJrxeuMmSV8aDWmuEQcsE9iFfhh=ZybomcZnw@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Thanks for the review. On 2017/06/15 16:08, Ashutosh Bapat wrote: > On Thu, Jun 15, 2017 at 10:46 AM, Amit Langote wrote: >> If we end up having to perform the validation scan and the table being >> attached is a partitioned table, we will scan its leaf partitions. Each >> of those leaf partitions may have different attribute numbers for the >> partitioning columns, so we will need to do the mapping, which actually we >> do even today. With this patch however, we apply mapping to the generated >> partition constraint so that it no longer bears the original parent's >> attribute numbers but those of the table being attached. Down below where >> we map to the leaf partition's attribute numbers, we still pass the root >> partitioned table as the parent. But it may so happen that the attnos >> appearing in the Vars can no longer be matched with any of the root >> table's attribute numbers, resulting in the following code in >> map_variable_attnos_mutator() to trigger an error: >> >> if (attno > context->map_length || context->attno_map[attno - 1] == 0) >> elog(ERROR, "unexpected varattno %d in expression to be mapped", >> attno); >> >> Consider this example: >> >> root: (a, b, c) partition by list (a) >> intermediate: (b, c, ..dropped.., a) partition by list (b) >> leaf: (b, c, a) partition of intermediate >> >> When attaching intermediate to root, we will generate the partition >> constraint and after mapping, its Vars will have attno = 4. When trying >> to map the same for leaf, we currently do map_partition_varattnos(expr, 1, >> leaf, root). So, the innards of map_variable_attnos will try to look for >> an attribute with attno = 4 in root which there isn't, so the above error >> will occur. We should really be passing intermediate as parent to the >> mapping routine. With the previous patch's approach, we would pass root >> as the parent along with partConstraintOrig which would bear the root >> parent's attnos. > > Thanks for the explanation. So, your earlier patch did map Vars > correctly for the leaf partitions of the table being attached. Yes I think. >> Please find attached the updated patch. In addition to the already >> described fixes, the patch also rearranges code so that a redundant AT >> work queue entry is avoided. (Currently, we end up creating one for >> attachRel even if it's partitioned, although it's harmless because >> ATRewriteTables() knows to skip partitioned tables.) > > We are calling find_all_inheritors() on attachRel twice in this function, once > to avoid circularity and second time for scheduling a scan. Why can't call it > once and reuse the result? Hmm, avoiding calling it twice would be a good idea. Also, I noticed that there might be a deadlock hazard here due to the lock-strength-upgrade thing happening here across the two calls. In the first call to find_all_inheritors(), we request AccessShareLock, whereas in the second, an AccessExclusiveLock. That ought to be fixed anyway I'd think. So, the first (and the only after this change) call will request an AccessExclusiveLock, even though we may not scan the leaf partitions if a suitable constraint exists on the table being attached. Note that previously, we would not have exclusive-locked the leaf partitions in such a case, although it was deadlock-prone/buggy anyway. The updated patch includes this fix. > On the default partitioning thread [1] Robert commented that we should try to > avoid queueing the subpartitions which have constraints that imply the new > partitioning constraint. I think that comment applies to this code, which the > refactoring patch has moved into a function. If you do this, instead of > duplicating the code to gather existing constraints, please create a function > for gathering constraints of a given relation and use it for the table being > attached as well as its partitions. Also, we should avoid matching > constraints for the table being attached twice, when it's not > partitioned. I guess you are talking about the case where the table being attached itself does not have a check constraint that would help avoid the scan, but its individual leaf partitions (if any) may. > Both of the above comments are not related to the bug that is being fixed, but > they apply to the same code where the bug exists. So instead of fixing it > twice, may be we should expand the scope of this work to cover other > refactoring needed in this area. That might save us some rebasing and commits. Are you saying that the patch posted on that thread should be brought over and discussed here? I tend to agree if that helps avoid muddying the default partition discussion with this refactoring work. > > + /* > + * Adjust the constraint to match this partition. > + * > + * Since partConstraint contains attachRel's attnos due to the > + * mapping we did just before attempting the proof above, we pass > + * attachRel as the parent to map_partition_varattnos, not 'rel' > + * which is the root parent. > + */ > May be reworded as "Adjust the partition constraints constructed for the table > being attached for the leaf partition being validated." Done, although worded slightly differently: Adjust the constraint that we constructed above for the table being attached so that it matches this partition's attribute numbers. > +CREATE TABLE part_7 ( > + a int, > + b char, > + CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 7) > +) PARTITION BY LIST (b); > +CREATE TABLE part_7_a_null ( > + c int, > + d int, > + b char, > + a int, -- 'a' will have attnum = 4 > + CONSTRAINT check_b CHECK (b IS NULL OR b = 'a'), > + CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 7) > +); > Why not to use LIKE list_parted as you have done for part_6? Sure, done. Please find the updated patch. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Jun 15, 2017 at 2:12 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Thanks for the review. > > On 2017/06/15 16:08, Ashutosh Bapat wrote: >> On Thu, Jun 15, 2017 at 10:46 AM, Amit Langote wrote: >>> If we end up having to perform the validation scan and the table being >>> attached is a partitioned table, we will scan its leaf partitions. Each >>> of those leaf partitions may have different attribute numbers for the >>> partitioning columns, so we will need to do the mapping, which actually we >>> do even today. With this patch however, we apply mapping to the generated >>> partition constraint so that it no longer bears the original parent's >>> attribute numbers but those of the table being attached. Down below where >>> we map to the leaf partition's attribute numbers, we still pass the root >>> partitioned table as the parent. But it may so happen that the attnos >>> appearing in the Vars can no longer be matched with any of the root >>> table's attribute numbers, resulting in the following code in >>> map_variable_attnos_mutator() to trigger an error: >>> >>> if (attno > context->map_length || context->attno_map[attno - 1] == 0) >>> elog(ERROR, "unexpected varattno %d in expression to be mapped", >>> attno); >>> >>> Consider this example: >>> >>> root: (a, b, c) partition by list (a) >>> intermediate: (b, c, ..dropped.., a) partition by list (b) >>> leaf: (b, c, a) partition of intermediate >>> >>> When attaching intermediate to root, we will generate the partition >>> constraint and after mapping, its Vars will have attno = 4. When trying >>> to map the same for leaf, we currently do map_partition_varattnos(expr, 1, >>> leaf, root). So, the innards of map_variable_attnos will try to look for >>> an attribute with attno = 4 in root which there isn't, so the above error >>> will occur. We should really be passing intermediate as parent to the >>> mapping routine. With the previous patch's approach, we would pass root >>> as the parent along with partConstraintOrig which would bear the root >>> parent's attnos. >> >> Thanks for the explanation. So, your earlier patch did map Vars >> correctly for the leaf partitions of the table being attached. > > Yes I think. > >>> Please find attached the updated patch. In addition to the already >>> described fixes, the patch also rearranges code so that a redundant AT >>> work queue entry is avoided. (Currently, we end up creating one for >>> attachRel even if it's partitioned, although it's harmless because >>> ATRewriteTables() knows to skip partitioned tables.) >> >> We are calling find_all_inheritors() on attachRel twice in this function, once >> to avoid circularity and second time for scheduling a scan. Why can't call it >> once and reuse the result? > > Hmm, avoiding calling it twice would be a good idea. > > Also, I noticed that there might be a deadlock hazard here due to the > lock-strength-upgrade thing happening here across the two calls. In the > first call to find_all_inheritors(), we request AccessShareLock, whereas > in the second, an AccessExclusiveLock. That ought to be fixed anyway I'd > think. > > So, the first (and the only after this change) call will request an > AccessExclusiveLock, even though we may not scan the leaf partitions if a > suitable constraint exists on the table being attached. Note that > previously, we would not have exclusive-locked the leaf partitions in such > a case, although it was deadlock-prone/buggy anyway. > > The updated patch includes this fix. > >> On the default partitioning thread [1] Robert commented that we should try to >> avoid queueing the subpartitions which have constraints that imply the new >> partitioning constraint. I think that comment applies to this code, which the >> refactoring patch has moved into a function. If you do this, instead of >> duplicating the code to gather existing constraints, please create a function >> for gathering constraints of a given relation and use it for the table being >> attached as well as its partitions. Also, we should avoid matching >> constraints for the table being attached twice, when it's not >> partitioned. > > I guess you are talking about the case where the table being attached > itself does not have a check constraint that would help avoid the scan, > but its individual leaf partitions (if any) may. Right. > >> Both of the above comments are not related to the bug that is being fixed, but >> they apply to the same code where the bug exists. So instead of fixing it >> twice, may be we should expand the scope of this work to cover other >> refactoring needed in this area. That might save us some rebasing and commits. > > Are you saying that the patch posted on that thread should be brought over > and discussed here? Not the whole patch, but that one particular comment, which applies to the existing code in ATExecAttachPartition(). If we fix the existing code in ATExecAttachPartition(), the refactoring patch there will inherit it when rebased. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 2017/06/15 17:53, Ashutosh Bapat wrote: > On Thu, Jun 15, 2017 at 2:12 PM, Amit Langote wrote: >>> Both of the above comments are not related to the bug that is being fixed, but >>> they apply to the same code where the bug exists. So instead of fixing it >>> twice, may be we should expand the scope of this work to cover other >>> refactoring needed in this area. That might save us some rebasing and commits. >> >> Are you saying that the patch posted on that thread should be brought over >> and discussed here? > > Not the whole patch, but that one particular comment, which applies to > the existing code in ATExecAttachPartition(). If we fix the existing > code in ATExecAttachPartition(), the refactoring patch there will > inherit it when rebased. Yes, I too meant only the refactoring patch, which I see as patch 0001 in the series of patches that Jeevan posted with the following message: https://www.postgresql.org/message-id/CAOgcT0NeR%3D%2BTMRTw6oq_5WrJF%2B_xG91k_nGUub29Lnv5-qmQHw%40mail.gmail.com Thanks, Amit
On Thu, Jun 15, 2017 at 2:30 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > On 2017/06/15 17:53, Ashutosh Bapat wrote: >> On Thu, Jun 15, 2017 at 2:12 PM, Amit Langote wrote: >>>> Both of the above comments are not related to the bug that is being fixed, but >>>> they apply to the same code where the bug exists. So instead of fixing it >>>> twice, may be we should expand the scope of this work to cover other >>>> refactoring needed in this area. That might save us some rebasing and commits. >>> >>> Are you saying that the patch posted on that thread should be brought over >>> and discussed here? >> >> Not the whole patch, but that one particular comment, which applies to >> the existing code in ATExecAttachPartition(). If we fix the existing >> code in ATExecAttachPartition(), the refactoring patch there will >> inherit it when rebased. > > Yes, I too meant only the refactoring patch, which I see as patch 0001 in > the series of patches that Jeevan posted with the following message: > > https://www.postgresql.org/message-id/CAOgcT0NeR%3D%2BTMRTw6oq_5WrJF%2B_xG91k_nGUub29Lnv5-qmQHw%40mail.gmail.com I think we don't need to move that patch over to here, unless you see that some of that refactoring is useful here. I think, we should continue this thread and patch independent of what happens there. If and when this patch gets committed, that patch will need to be refactored. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 2017/06/15 18:05, Ashutosh Bapat wrote: > On Thu, Jun 15, 2017 at 2:30 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> On 2017/06/15 17:53, Ashutosh Bapat wrote: >>> On Thu, Jun 15, 2017 at 2:12 PM, Amit Langote wrote: >>>>> Both of the above comments are not related to the bug that is being fixed, but >>>>> they apply to the same code where the bug exists. So instead of fixing it >>>>> twice, may be we should expand the scope of this work to cover other >>>>> refactoring needed in this area. That might save us some rebasing and commits. >>>> >>>> Are you saying that the patch posted on that thread should be brought over >>>> and discussed here? >>> >>> Not the whole patch, but that one particular comment, which applies to >>> the existing code in ATExecAttachPartition(). If we fix the existing >>> code in ATExecAttachPartition(), the refactoring patch there will >>> inherit it when rebased. >> >> Yes, I too meant only the refactoring patch, which I see as patch 0001 in >> the series of patches that Jeevan posted with the following message: >> >> https://www.postgresql.org/message-id/CAOgcT0NeR%3D%2BTMRTw6oq_5WrJF%2B_xG91k_nGUub29Lnv5-qmQHw%40mail.gmail.com > > I think we don't need to move that patch over to here, unless you see > that some of that refactoring is useful here. I think, we should > continue this thread and patch independent of what happens there. If > and when this patch gets committed, that patch will need to be > refactored. I do see it as useful refactoring and a way to implement a feature (which is perhaps something worth including into v10?) It's just that the patch I have posted here fixes bugs, which it would be nice to get committed first. Anyway, I tried to implement the refactoring in patch 0002, which is not all of the patch 0001 that Jeevan posted. Please take a look. I wondered if we should emit a NOTICE when an individual leaf partition validation can be skipped? No point in adding a new test if the answer to that is no, I'd think. Attaching here 0001 which fixes the bug (unchanged from the previous version) and 0002 which implements the refactoring (and the feature to look at the individual leaf partitions' constraints to see if validation can be skipped.) Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Jun 15, 2017 at 4:06 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > > Anyway, I tried to implement the refactoring in patch 0002, which is not > all of the patch 0001 that Jeevan posted. Please take a look. I wondered > if we should emit a NOTICE when an individual leaf partition validation > can be skipped? Yes. We emit an INFO for the table being attached. We want to do the same for the child. Or NOTICE In both the places. > No point in adding a new test if the answer to that is > no, I'd think. > > Attaching here 0001 which fixes the bug (unchanged from the previous > version) and 0002 which implements the refactoring (and the feature to > look at the individual leaf partitions' constraints to see if validation > can be skipped.) Comments on 0001 patch. + * + * We request an exclusive lock on all the partitions, because we may + * decide later in this function to scan them to validate the new + * partition constraint. Does that mean that we may not scan the partitions later, in which the stronger lock we took was not needed. Is that right? Don't we need an exclusive lock to make sure that the constraints are not changed while we are validating those? if (!skip_validate) May be we should turn this into "else" condition of the "if" just above. + /* + * We already collected the list of partitions, including the table + * named in the command itself, which should appear at the head of the + * list. + */ + Assert(list_length(attachRel_children) >= 1 && + linitial_oid(attachRel_children) == RelationGetRelid(attachRel)); Probably the Assert should be moved next to find_all_inheritors(). But I don't see much value in this comment and the Assert at this place. + /* Skip the original table if it's partitioned. */ + if (part_relid == RelationGetRelid(attachRel) && + attachRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + continue; + There isn't much point in checking this for every child in the loop. Instead, we should remove attachRel from the attachRel_children if there are more than one elements in the list (which means the attachRel is partitioned, may be verify that by Asserting). Comments on 0002 patch. Thanks for the refactoring. The code looks really good now. The name skipPartConstraintValidation() looks very specific to the case at hand. The function is really checking whether existing constraints on the table can imply the given constraints (which happen to be partition constraints). How about PartConstraintImpliedByRelConstraint()? The idea is to pick a general name so that the function can be used for purposes other than skipping validation scan in future. * This basically returns if the partrel's existing constraints, which returns "true". Add "otherwise returns false". if (constr != NULL) { ... } return false; May be you should return false when constr == NULL (I prefer !constr, but others may have different preferences.). That should save us one level of indentation. At the end just return whatever predicate_implied_by() returns. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Thanks for the review again. On 2017/06/22 19:55, Ashutosh Bapat wrote: > On Thu, Jun 15, 2017 at 4:06 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >> Anyway, I tried to implement the refactoring in patch 0002, which is not >> all of the patch 0001 that Jeevan posted. Please take a look. I wondered >> if we should emit a NOTICE when an individual leaf partition validation >> can be skipped? > > Yes. We emit an INFO for the table being attached. We want to do the > same for the child. Or NOTICE In both the places. Actually, I meant INFO. >> No point in adding a new test if the answer to that is >> no, I'd think. Updated the patch 0002 so that an INFO message is printed for each leaf partition and a test for the same. >> Attaching here 0001 which fixes the bug (unchanged from the previous >> version) and 0002 which implements the refactoring (and the feature to >> look at the individual leaf partitions' constraints to see if validation >> can be skipped.) > > Comments on 0001 patch. > + * > + * We request an exclusive lock on all the partitions, because we may > + * decide later in this function to scan them to validate the new > + * partition constraint. > Does that mean that we may not scan the partitions later, in which the stronger > lock we took was not needed. Is that right? Yes. I wrote that comment thinking only about the deadlock hazard which had then occurred to me, so the text somehow ended up being reflective of that thinking. Please see the new comment, which hopefully is more informative. > Don't we need an exclusive lock to > make sure that the constraints are not changed while we are validating those? If I understand your question correctly, you meant to ask if we don't need the strongest lock on individual partitions while looking at their constraints to prove that we don't need to scan them. We do and we do take the strongest lock on individual partitions even today in the second call to find_all_inheritors(). We're trying to eliminate the second call here. With the current code, we take AccessShareLock in the first call when checking the circularity of inheritance. Then if attachRel doesn't have the constraint to avoid the scan, we decide to look at individual partitions (their rows, not constraints, as of now) when we take AccessExclusiveLock. That might cause a deadlock (was able to reproduce one using the debugger). > if (!skip_validate) > May be we should turn this into "else" condition of the "if" just above. Yes, done. > + /* > + * We already collected the list of partitions, including the table > + * named in the command itself, which should appear at the head of the > + * list. > + */ > + Assert(list_length(attachRel_children) >= 1 && > + linitial_oid(attachRel_children) == RelationGetRelid(attachRel)); > Probably the Assert should be moved next to find_all_inheritors(). But I don't > see much value in this comment and the Assert at this place. I agree, removed both. > > + /* Skip the original table if it's partitioned. */ > + if (part_relid == RelationGetRelid(attachRel) && > + attachRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > + continue; > + > > There isn't much point in checking this for every child in the loop. Instead, > we should remove attachRel from the attachRel_children if there are more than > one elements in the list (which means the attachRel is partitioned, may be > verify that by Asserting). Rearranged code considering these comments. > Comments on 0002 patch. > Thanks for the refactoring. The code looks really good now. Thanks. > The name skipPartConstraintValidation() looks very specific to the case at > hand. The function is really checking whether existing constraints on the table > can imply the given constraints (which happen to be partition constraints). How > about PartConstraintImpliedByRelConstraint()? The idea is to pick a general > name so that the function can be used for purposes other than skipping > validation scan in future. I liked this idea, so done. > * This basically returns if the partrel's existing constraints, which > returns "true". Add "otherwise returns false". > > if (constr != NULL) > { > ... > } > return false; > May be you should return false when constr == NULL (I prefer !constr, but > others may have different preferences.). That should save us one level of > indentation. At the end just return whatever predicate_implied_by() returns. Good suggestion, done. Attached updated patches. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Thanks for working on the previous comments. The code really looks good now. On Fri, Jun 23, 2017 at 2:29 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > >> Don't we need an exclusive lock to >> make sure that the constraints are not changed while we are validating those? > > If I understand your question correctly, you meant to ask if we don't need > the strongest lock on individual partitions while looking at their > constraints to prove that we don't need to scan them. We do and we do > take the strongest lock on individual partitions even today in the second > call to find_all_inheritors(). We're trying to eliminate the second call > here. The comment seems to imply that we need strongest lock only when we "scan" the table/s. Some more comments on 0001 - * Prevent circularity by seeing if rel is a partition of attachRel. (In + * Prevent circularity by seeing if rel is a partition of attachRel, (In * particular, this disallows making a rela partition of itself.) The sentence outside () doesn't have a full-stop. I think the original construct was better. + * We want to avoid having to construct this list again, so we request the "this list" is confusing here since the earlier sentence doesn't mention any list at all. Instead we may reword it as "We will need the list of children later to check whether any of those have a row which would not fit the partition constraints. So, take the strongest lock ..." * XXX - Do we need to lock the partitions here if we already have the * strongest lock on attachRel? The informationwe need here to check * for circularity cannot change without taking a lock on attachRel. I wondered about this. Do we really need an exclusive lock to check whether partition constraint is valid? May be we can compare this condition with ALTER TABLE ... ADD CONSTRAINT since the children will all get a new constraint effectively. So, exclusive lock it is. Assert(linitial_oid(attachRel_children) == RelationGetRelid(attachRel)); if (attachRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) attachRel_children= list_delete_first(attachRel_children); Is it necessary for this code to have OID of the relation being attached as the first one? You could simply call list_delete_oid() instead of list_delete_first(). If for any reason find_all_inheritors() changes the output order, this assertion and code would need a change.\ > >> Comments on 0002 patch. >> Thanks for the refactoring. The code looks really good now. > > Thanks. > >> The name skipPartConstraintValidation() looks very specific to the case at >> hand. The function is really checking whether existing constraints on the table >> can imply the given constraints (which happen to be partition constraints). How >> about PartConstraintImpliedByRelConstraint()? The idea is to pick a general >> name so that the function can be used for purposes other than skipping >> validation scan in future. > > I liked this idea, so done. + * skipPartConstraintValidation +PartConstraintImpliedByRelConstraint(Relation partrel, List *partConstraint) Different function names in prologue and the definition. > >> * This basically returns if the partrel's existing constraints, which >> returns "true". Add "otherwise returns false". >> >> if (constr != NULL) >> { >> ... >> } >> return false; >> May be you should return false when constr == NULL (I prefer !constr, but >> others may have different preferences.). That should save us one level of >> indentation. At the end just return whatever predicate_implied_by() returns. > > Good suggestion, done. + if (predicate_implied_by(partConstraint, existConstraint, true)) + return true; + + /* Tough luck. */ + return false; why not to just return the return value of predicate_implied_by() as is? With this we can actually handle constr == NULL a bit differently. + if (constr == NULL) + return false; To be on safer side, return false when partConstraint is not NULL. If both the relation constraint and partConstraint are both NULL, the first does imply the other. Or may be leave that decision to predicate_implied_by(), which takes care of it right at the beginning of the function. + * For each leaf partition, check if it we can skip the validation An extra "it". + * Note that attachRel's OID is in this list. Since we already + * determined above that its validation scan cannot be skipped, we + * need not check that again in the loop below. If it's partitioned, I don't see code to skip checking whether scan can be skipped for relation being attached. The loop below this comments executes for every unpartitioned table in the list of OIDs returned. Thus for an unpartitioned relation being attached, it will try to compare the constraints again. Am I correct? + * comparing it to similarly-processed qual clauses, and may fail There are no "qual clauses" here only constraints :). The testcase looks good to me. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Thanks for the review. On 2017/07/03 20:13, Ashutosh Bapat wrote: > Thanks for working on the previous comments. The code really looks good now. > On Fri, Jun 23, 2017 at 2:29 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >>> Don't we need an exclusive lock to >>> make sure that the constraints are not changed while we are validating those? >> >> If I understand your question correctly, you meant to ask if we don't need >> the strongest lock on individual partitions while looking at their >> constraints to prove that we don't need to scan them. We do and we do >> take the strongest lock on individual partitions even today in the second >> call to find_all_inheritors(). We're trying to eliminate the second call >> here. > > The comment seems to imply that we need strongest lock only when we > "scan" the table/s. > > Some more comments on 0001 > - * Prevent circularity by seeing if rel is a partition of attachRel. (In > + * Prevent circularity by seeing if rel is a partition of attachRel, (In > * particular, this disallows making a rel a partition of itself.) > The sentence outside () doesn't have a full-stop. I think the original > construct was better. Yep, fixed. > + * We want to avoid having to construct this list again, so we request the > > "this list" is confusing here since the earlier sentence doesn't mention any > list at all. Instead we may reword it as "We will need the list of children > later to check whether any of those have a row which would not fit the > partition constraints. So, take the strongest lock ..." It was confusing for sure, so rewrote. > * XXX - Do we need to lock the partitions here if we already have the > * strongest lock on attachRel? The information we need here to check > * for circularity cannot change without taking a lock on attachRel. > > I wondered about this. Do we really need an exclusive lock to check whether > partition constraint is valid? May be we can compare this condition with ALTER > TABLE ... ADD CONSTRAINT since the children will all get a new constraint > effectively. So, exclusive lock it is. Actually, the XXX comment is about whether we need to lock the children at all when checking the circularity of inheritance, that is, not about whether we need lock to check the partition constraint is valid. > Assert(linitial_oid(attachRel_children) == > RelationGetRelid(attachRel)); > if (attachRel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > attachRel_children = list_delete_first(attachRel_children); > > Is it necessary for this code to have OID of the relation being attached as the > first one? You could simply call list_delete_oid() instead of > list_delete_first(). If for any reason find_all_inheritors() changes the output > order, this assertion and code would need a change.\ I concluded that removing attachRel's OID from attachRel_children is pointless, considering that we have to check for attachRel in the loop below anyway. The step of removing the OID wasn't helping much. >>> The name skipPartConstraintValidation() looks very specific to the case at >>> hand. The function is really checking whether existing constraints on the table >>> can imply the given constraints (which happen to be partition constraints). How >>> about PartConstraintImpliedByRelConstraint()? The idea is to pick a general >>> name so that the function can be used for purposes other than skipping >>> validation scan in future. >> >> I liked this idea, so done. > > + * skipPartConstraintValidation > +PartConstraintImpliedByRelConstraint(Relation partrel, List *partConstraint) > Different function names in prologue and the definition. Oops, fixed. > + if (predicate_implied_by(partConstraint, existConstraint, true)) > + return true; > + > + /* Tough luck. */ > + return false; > > why not to just return the return value of predicate_implied_by() as is? Sigh. Done. :) > With this we can actually handle constr == NULL a bit differently. > + if (constr == NULL) > + return false; > > To be on safer side, return false when partConstraint is not NULL. If both the > relation constraint and partConstraint are both NULL, the first does imply the > other. Or may be leave that decision to predicate_implied_by(), which takes > care of it right at the beginning of the function. Rearranged code considering this comment. Let's let predicate_implied_by() make ultimate decisions about logical implication. :) > > + * For each leaf partition, check if it we can skip the validation > > An extra "it". Fixed. > > + * Note that attachRel's OID is in this list. Since we already > + * determined above that its validation scan cannot be skipped, we > + * need not check that again in the loop below. If it's partitioned, > > I don't see code to skip checking whether scan can be skipped for relation > being attached. The loop below this comments executes for every unpartitioned > table in the list of OIDs returned. Thus for an unpartitioned relation being > attached, it will try to compare the constraints again. Am I correct? Good catch, fixed. > + * comparing it to similarly-processed qual clauses, and may fail > There are no "qual clauses" here only constraints :). Oh, yes. Text fixed. > The testcase looks good to me. Attached updated patches. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Jul 4, 2017 at 9:51 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > > Attached updated patches. There's an extra "we" in + * Note that attachRel's OID is in this list. If it's partitioned, we + * we don't need to schedule it to be scanned (would be a noop anyway And some portions of the comment before find_all_inheritors() in ATExecAttachPartition() look duplicated in portions of the code that check constraints on the table being attached and each of its leaf partition. Other than that the patches look good to me. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
On 2017/07/11 19:49, Ashutosh Bapat wrote: > On Tue, Jul 4, 2017 at 9:51 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: > >> >> Attached updated patches. > > There's an extra "we" in > + * Note that attachRel's OID is in this list. If it's partitioned, we > + * we don't need to schedule it to be scanned (would be a noop anyway > > And some portions of the comment before find_all_inheritors() in > ATExecAttachPartition() look duplicated in portions of the code that > check constraints on the table being attached and each of its leaf > partition. > > Other than that the patches look good to me. Thanks for the review. Patch updated taking care of the comments. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Jul 12, 2017 at 7:17 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > > Thanks for the review. Patch updated taking care of the comments. The patches still apply and compile. make check runs well. I do not have any further review comments. Given that they address a bug, should we consider those for v10? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Thanks for looking at this again. On 2017/07/26 23:31, Ashutosh Bapat wrote: > On Wed, Jul 12, 2017 at 7:17 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> >> Thanks for the review. Patch updated taking care of the comments. > > The patches still apply and compile. make check runs well. I do not > have any further review comments. Given that they address a bug, > should we consider those for v10? At least patch 0001 does address a bug. Not sure if we can say that 0002 addresses a bug; it implements a feature that might be a nice-to-have-in-PG-10. Attaching rebased patches. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Jul 26, 2017 at 9:50 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > At least patch 0001 does address a bug. Not sure if we can say that 0002 > addresses a bug; it implements a feature that might be a > nice-to-have-in-PG-10. I think 0001 is actually several fixes that should be separated: - Cosmetic fixes, like renaming childrels to attachRel_children, adding a period after "Grab a work queue entry", and replacing the if (skip_validate) / if (!skip_validate) blocks with if (skip_validate) { ... } else { ... }. - Taking AccessExclusiveLock initially to avoid a lock upgrade hazard. - Rejiggering things so that we apply map_partition_varattnos() to the partConstraint in all relevant places. Regarding the XXX, we currently require AccessExclusiveLock for the addition of a CHECK constraint, so I think it's best to just do the same thing here. We can optimize later, but it's probably not easy to come up with something that is safe and correct in all cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Thanks for taking a look at this. On 2017/08/01 6:26, Robert Haas wrote: > On Wed, Jul 26, 2017 at 9:50 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> At least patch 0001 does address a bug. Not sure if we can say that 0002 >> addresses a bug; it implements a feature that might be a >> nice-to-have-in-PG-10. > > I think 0001 is actually several fixes that should be separated: Agreed. > > - Cosmetic fixes, like renaming childrels to attachRel_children, > adding a period after "Grab a work queue entry", and replacing the if > (skip_validate) / if (!skip_validate) blocks with if (skip_validate) { > ... } else { ... }. OK, these cosmetic changes are now in attached patch 0001. > > - Taking AccessExclusiveLock initially to avoid a lock upgrade hazard. This in 0002. > > - Rejiggering things so that we apply map_partition_varattnos() to the > partConstraint in all relevant places. And this in 0003. > Regarding the XXX, we currently require AccessExclusiveLock for the > addition of a CHECK constraint, so I think it's best to just do the > same thing here. We can optimize later, but it's probably not easy to > come up with something that is safe and correct in all cases. Agreed. Dropped the XXX part in the comment. 0004 is what used to be 0002 before (checking partition constraints of individual leaf partitions to skip their scans). Attached here just in case. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Mon, Jul 31, 2017 at 11:10 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > OK, these cosmetic changes are now in attached patch 0001. Regarding 0001: - List *childrels; + List *attachRel_children; I sorta don't see why this is necessary, or better. /* It's safe to skip the validation scan after all */ if (skip_validate) + { + /* No need to scan the table after all. */ The existing comment should be removed along with adding the new one, I think. - if (part_rel != attachRel && - part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - heap_close(part_rel, NoLock); + if (part_rel != attachRel) + heap_close(part_rel, NoLock); This works out to a cosmetic change, I guess, but it makes it worse... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Thanks for reviewing. On 2017/08/02 2:54, Robert Haas wrote: > On Mon, Jul 31, 2017 at 11:10 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> OK, these cosmetic changes are now in attached patch 0001. > > Regarding 0001: > > - List *childrels; > + List *attachRel_children; > > I sorta don't see why this is necessary, or better. Since ATExecAttachPartition() deals with the possibility that the table being attached itself might be partitioned, someone reading the code might find it helpful to get some clue about whose partitions/children a particular piece of code is dealing with - AT's target table's (rel's) or those of the table being attached (attachRel's)? IMHO, attachRel_children makes it abundantly clear that it is in fact the partitions of the table being attached that are being manipulated. > /* It's safe to skip the validation scan after all */ > if (skip_validate) > + { > + /* No need to scan the table after all. */ > > The existing comment should be removed along with adding the new one, I think. Oops, fixed. > - if (part_rel != attachRel && > - part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > + if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > { > - heap_close(part_rel, NoLock); > + if (part_rel != attachRel) > + heap_close(part_rel, NoLock); > > This works out to a cosmetic change, I guess, but it makes it worse... Not sure what you mean by "makes it worse". The comment above says that we should skip partitioned tables from being scheduled for heap scan. The new code still does that. We should close part_rel before continuing to consider the next partition, but mustn't do that if part_rel is really attachRel. The new code does that too. Stylistically worse? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Aug 1, 2017 at 9:23 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Since ATExecAttachPartition() deals with the possibility that the table > being attached itself might be partitioned, someone reading the code might > find it helpful to get some clue about whose partitions/children a > particular piece of code is dealing with - AT's target table's (rel's) or > those of the table being attached (attachRel's)? IMHO, attachRel_children > makes it abundantly clear that it is in fact the partitions of the table > being attached that are being manipulated. True, but it's also long and oddly capitalized and punctuated. Seems like a judgement call which way is better, but I'm allergic to fooBar_baz style names. >> - if (part_rel != attachRel && >> - part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) >> + if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) >> { >> - heap_close(part_rel, NoLock); >> + if (part_rel != attachRel) >> + heap_close(part_rel, NoLock); >> >> This works out to a cosmetic change, I guess, but it makes it worse... > > Not sure what you mean by "makes it worse". The comment above says that > we should skip partitioned tables from being scheduled for heap scan. The > new code still does that. We should close part_rel before continuing to > consider the next partition, but mustn't do that if part_rel is really > attachRel. The new code does that too. Stylistically worse? Yeah. I mean, do you write: if (a) if (b) c(); rather than if (a && b) c(); ? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017/08/02 10:27, Robert Haas wrote: > On Tue, Aug 1, 2017 at 9:23 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Since ATExecAttachPartition() deals with the possibility that the table >> being attached itself might be partitioned, someone reading the code might >> find it helpful to get some clue about whose partitions/children a >> particular piece of code is dealing with - AT's target table's (rel's) or >> those of the table being attached (attachRel's)? IMHO, attachRel_children >> makes it abundantly clear that it is in fact the partitions of the table >> being attached that are being manipulated. > > True, but it's also long and oddly capitalized and punctuated. Seems > like a judgement call which way is better, but I'm allergic to > fooBar_baz style names. I too dislike the shape of attachRel. How about we rename attachRel to attachrel? So, attachrel_children, attachrel_constr, etc. It's still long though... :) >>> - if (part_rel != attachRel && >>> - part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) >>> + if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) >>> { >>> - heap_close(part_rel, NoLock); >>> + if (part_rel != attachRel) >>> + heap_close(part_rel, NoLock); >>> >>> This works out to a cosmetic change, I guess, but it makes it worse... >> >> Not sure what you mean by "makes it worse". The comment above says that >> we should skip partitioned tables from being scheduled for heap scan. The >> new code still does that. We should close part_rel before continuing to >> consider the next partition, but mustn't do that if part_rel is really >> attachRel. The new code does that too. Stylistically worse? > > Yeah. I mean, do you write: > > if (a) > if (b) > c(); > > rather than > > if (a && b) > c(); > > ? Hmm, The latter is better. I guess I just get confused with long &&, ||, () chains. If you're fine with the s/attachRel/attachrel/g suggestion above, I will update the patch along with reverting to if (a && b) c(). Thanks, Amit
On Tue, Aug 1, 2017 at 9:44 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > I too dislike the shape of attachRel. How about we rename attachRel to > attachrel? So, attachrel_children, attachrel_constr, etc. It's still > long though... :) OK, I can live with that, I guess. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017/08/02 20:35, Robert Haas wrote: > On Tue, Aug 1, 2017 at 9:44 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> I too dislike the shape of attachRel. How about we rename attachRel to >> attachrel? So, attachrel_children, attachrel_constr, etc. It's still >> long though... :) > > OK, I can live with that, I guess. Alright, attached updated 0001 does that. About the other hunk, it seems we will have to go with the following structure after all; if (a) if (b) c(); d(); Note that we were missing that there is a d(), which executes if a is true. c() executes *only* if b is true. So I left that hunk unchanged, viz. the following: /* - * Skip if it's a partitioned table. Only RELKIND_RELATION - * relations (ie, leaf partitions) need to be scanned. + * Skip if the partition is itself a partitioned table. We can + * only ever scan RELKIND_RELATION relations. */ - if (part_rel != attachRel && - part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - heap_close(part_rel, NoLock); + if (part_rel != attachrel) + heap_close(part_rel, NoLock); continue; } You might ask why the earlier code worked if there was this kind of logical bug - accident; even if we failed skipping attachRel, the AT rewrite phase which is in charge of actually scanning the table knows to skip the partitioned tables, so no harm would be done. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Aug 3, 2017 at 1:04 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Alright, attached updated 0001 does that. Committed 0001 and 0002. 0003 needs a rebase. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017/08/04 3:29, Robert Haas wrote: > On Thu, Aug 3, 2017 at 1:04 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> Alright, attached updated 0001 does that. > > Committed 0001 and 0002. Thanks. > 0003 needs a rebase. Rebased patch attached. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Aug 3, 2017 at 8:45 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> 0003 needs a rebase. > > Rebased patch attached. Committed. I think 0004 is a new feature, so I'm leaving that for v11. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017/08/05 11:05, Robert Haas wrote: > On Thu, Aug 3, 2017 at 8:45 PM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> 0003 needs a rebase. >> >> Rebased patch attached. > > Committed. Thank you. > I think 0004 is a new feature, so I'm leaving that for v11. Sure. By the way, bulk of 0004 is refactoring which it seems is what Jeevan's default partition patch set also includes as one of the patches [1]. It got a decent amount review from Ashutosh. I broke it down into a separate patch, so that the patch to add the new feature is its own tiny patch. I also spotted a couple of comments referring to attachRel that we just recently renamed. So, attached are: 0001: s/attachRel/attachrel/g 0002: Refactoring to introduce a PartConstraintImpliedByRelConstraint 0003: Add the feature to skip the scan of individual leaf partitions Totally fine if you postpone 0002 and 0003 to when the tree opens up for PG 11. Thanks, Amit [1] https://www.postgresql.org/message-id/CAOgcT0OARciE2X%2BU0rjSKp9VuC279dYcCGkc3nCWKhHQ1_m2rw%40mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 2017/08/07 11:05, Amit Langote wrote: > By the way, bulk of 0004 is refactoring which it seems is what Jeevan's > default partition patch set also includes as one of the patches [1]. It > got a decent amount review from Ashutosh. I broke it down into a separate > patch, so that the patch to add the new feature is its own tiny patch. > > I also spotted a couple of comments referring to attachRel that we just > recently renamed. > > So, attached are: > > 0001: s/attachRel/attachrel/g > 0002: Refactoring to introduce a PartConstraintImpliedByRelConstraint > 0003: Add the feature to skip the scan of individual leaf partitions Since Jeevan Ladhe mentioned this patch [1] earlier this week, sending the rebased patches here for consideration. Actually there are only 2 patches now, because 0002 above is rendered unnecessary by ecfe59e50fb [2]. Thanks, Amit [1] https://www.postgresql.org/message-id/CAOgcT0MWwG8WBw8frFMtRYHAgDD=tpt6U7WcsO_L2k0KYpm4Jg@mail.gmail.com [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ecfe59e50fb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Sep 14, 2017 at 12:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Since Jeevan Ladhe mentioned this patch [1] earlier this week, sending the > rebased patches here for consideration. Actually there are only 2 patches > now, because 0002 above is rendered unnecessary by ecfe59e50fb [2]. Committed 0001 and back-patched to v10. Your 0002 and the patch from Jeevan Ladhe to which you refer seem to be covering closely related subjects. When I apply either patch by itself, the regression tests pass; when I apply both together, they fail. Could you and Jeevan sort that out? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers