Thread: Exclusion constraints on partitioned tables
Hello Hackers, I'm trying to get things going again on my temporal tables work, and here is a small patch to move that forward. It lets you create exclusion constraints on partitioned tables, similar to today's rules for b-tree primary keys & unique constraints: just as we permit a PK on a partitioned table when the PK's columns are a superset of the partition keys, so we could also allow an exclusion constraint when its columns are a superset of the partition keys. This patch also requires the matching constraint columns to use equality comparisons (`(foo WITH =)`), so it is really equivalent to the existing b-tree rule. Perhaps that is more conservative than necessary, but we can't permit an arbitrary operator, since some might require testing rows that fall into other partitions. For example `(foo WITH <>)` would obviously cause problems. The exclusion constraint may still include other columns beyond the partition keys, and those may use equality operators or something else. This patch is required to support temporal partitioned tables, because temporal tables use exclusion constraints as their primary key. Essentially they are `(id WITH =, valid_at with &&)`. Since the primary key is not a b-tree, partitioning them would be forbidden prior to this patch. But now you could partition that table on `id`, and we could still correctly validate the temporal PK without requiring rows from other partitions. This patch may be helpful beyond just temporal tables (or for DIY temporal tables), so it seems worth submitting it separately. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
Paul Jungwirth <pj@illuminatedcomputing.com> writes: > It lets you create exclusion constraints on partitioned tables, similar > to today's rules for b-tree primary keys & unique constraints: > just as we permit a PK on a partitioned table when the PK's columns are > a superset of the partition keys, so we could also allow an exclusion > constraint when its columns are a superset of the partition keys. OK. AFAICS that works in principle. > This patch also requires the matching constraint columns to use equality > comparisons (`(foo WITH =)`), so it is really equivalent to the existing > b-tree rule. That's not quite good enough: you'd better enforce that it's the same equality operator (and same collation, if relevant) as is being used in the partition key. Remember that we don't have a requirement that a datatype have only one equality operator; and these days I think collation can affect equality, too. Another problem is that while we can safely assume that we know what BTEqualStrategyNumber means in btree, we can NOT assume that we know what gist opclass strategy numbers mean: each opclass is free to define those as it sees fit. The part of your patch that is looking at RTEqualStrategyNumber seems dangerously broken to me. It might work better to consider the operator itself and ask if it's equality in the same btree opfamily that's used by the partition key. (Hm, do we use btree opfamilies for all types of partitioning?) Anyway, I think something can be made of this, but you need to be less fuzzy about matching the equality semantics of the partition key. regards, tom lane
On 12/15/22 16:12, Tom Lane wrote: >> This patch also requires the matching constraint columns to use equality >> comparisons (`(foo WITH =)`), so it is really equivalent to the existing >> b-tree rule. > > That's not quite good enough: you'd better enforce that it's the same > equality operator (and same collation, if relevant) as is being used > in the partition key. > [snip] > It might work better to consider the operator itself and ask if > it's equality in the same btree opfamily that's used by the > partition key. Thank you for taking a look! Here is a comparison on just the operator itself. I included a collation check too, but I'm not sure it's necessary. Exclusion constraints don't have a collation per se; it comes from the index, and we choose it just a little above in this function. (I'm not even sure how to elicit that new error message in a test case.) I'm not sure what to do about matching the opfamily. In practice an exclusion constraint will typically use gist, but the partition key will always use btree/hash. You're saying that the equals operator can be inconsistent between those access methods? That is surprising to me, but I admit op classes/families are still sinking in. (Even prior to this patch, isn't the code for hash-based partitions looking up ptkey_eqop via the hash opfamily, and then comparing it to idx_eqop looked up via the btree opfamily?) If partitions can only support btree-based exclusion constraints, you still wouldn't be able to partition a temporal table, because those constraints would always be gist. So I guess what I really want is to support gist index constraints on partitioned tables. Regards, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
Le vendredi 16 décembre 2022, 06:11:49 CET Paul Jungwirth a écrit : > On 12/15/22 16:12, Tom Lane wrote: > >> This patch also requires the matching constraint columns to use equality > >> comparisons (`(foo WITH =)`), so it is really equivalent to the existing > >> b-tree rule. > > > > That's not quite good enough: you'd better enforce that it's the same > > equality operator (and same collation, if relevant) as is being used > > in the partition key. > > [snip] > > It might work better to consider the operator itself and ask if > > it's equality in the same btree opfamily that's used by the > > partition key. > > Thank you for taking a look! Here is a comparison on just the operator > itself. > I've taken a look at the patch, and I'm not sure why you keep the restriction on the Gist operator being of the RTEqualStrategyNumber strategy. I don't think we have any other place where we expect those strategy numbers to match. For hash it's different, as the hash-equality is the only operator strategy and as such there is no other way to look at it. Can't we just enforce partition_operator == exclusion_operator without adding the RTEqualStrategyNumber for the opfamily into the mix ?
On 1/24/23 06:38, Ronan Dunklau wrote: > I've taken a look at the patch, and I'm not sure why you keep the restriction > on the Gist operator being of the RTEqualStrategyNumber strategy. I don't > think we have any other place where we expect those strategy numbers to > match. For hash it's different, as the hash-equality is the only operator > strategy and as such there is no other way to look at it. Can't we just > enforce partition_operator == exclusion_operator without adding the > RTEqualStrategyNumber for the opfamily into the mix ? Thank you for taking a look! I did some research on the history of the code here, and I think I understand Tom's concern about making sure the index uses the same equality operator as the partition. I was confused about his remarks about the opfamily, but I agree with you that if the operator is the same, we should be okay. I added the code about RTEqualStrategyNumber because that's what we need to find an equals operator when the index is GiST (except if it's using an opclass from btree_gist; then it needs to be BTEqual again). But then I realized that for exclusion constraints we have already figured out the operator (in RelationGetExclusionInfo) and put it in indexInfo->ii_ExclusionOps. So we can just compare against that. This works whether your index uses btree_gist or not. Here is an updated patch with that change (also rebased). I also included a more specific error message. If we find a matching column in the index but with the wrong operator, we should say so, and not say there is no matching column. Thanks, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
Le vendredi 17 mars 2023, 17:03:09 CET Paul Jungwirth a écrit : > I added the code about RTEqualStrategyNumber because that's what we need > to find an equals operator when the index is GiST (except if it's using > an opclass from btree_gist; then it needs to be BTEqual again). But then > I realized that for exclusion constraints we have already figured out > the operator (in RelationGetExclusionInfo) and put it in > indexInfo->ii_ExclusionOps. So we can just compare against that. This > works whether your index uses btree_gist or not. > > Here is an updated patch with that change (also rebased). Thanks ! This looks fine to me like this. > > I also included a more specific error message. If we find a matching > column in the index but with the wrong operator, we should say so, and > not say there is no matching column. > I agree that's a nicer improvement. Regards, -- Ronan Dunklau
On 17.03.23 17:03, Paul Jungwirth wrote: > Thank you for taking a look! I did some research on the history of the > code here, and I think I understand Tom's concern about making sure the > index uses the same equality operator as the partition. I was confused > about his remarks about the opfamily, but I agree with you that if the > operator is the same, we should be okay. > > I added the code about RTEqualStrategyNumber because that's what we need > to find an equals operator when the index is GiST (except if it's using > an opclass from btree_gist; then it needs to be BTEqual again). But then > I realized that for exclusion constraints we have already figured out > the operator (in RelationGetExclusionInfo) and put it in > indexInfo->ii_ExclusionOps. So we can just compare against that. This > works whether your index uses btree_gist or not. > > Here is an updated patch with that change (also rebased). > > I also included a more specific error message. If we find a matching > column in the index but with the wrong operator, we should say so, and > not say there is no matching column. This looks all pretty good to me. A few more comments: It seems to me that many of the test cases added in indexing.sql are redundant with create_table.sql/alter_table.sql (or vice versa). Is there a reason for this? This is not really a problem in your patch, but I think in - if (partitioned && (stmt->unique || stmt->primary)) + if (partitioned && (stmt->unique || stmt->primary || stmt->excludeOpNames != NIL)) the stmt->primary is redundant and should be removed. Right now "primary" is always a subset of "unique", but presumably a future patch of yours wants to change that. Furthermore, I think it would be more elegant in your patch if you wrote stmt->excludeOpNames without the "== NIL" or "!= NIL", so that it becomes a peer of stmt->unique. (I understand some people don't like that style. But it is already used in that file.) I would consider rearranging some of the conditionals more as a selection of cases, like "is it a unique constraint?", "else, is it an exclusion constraint?" -- rather than the current "is it an exclusion constraint?, "else, various old code". For example, instead of if (stmt->excludeOpNames != NIL) idx_eqop = indexInfo->ii_ExclusionOps[j]; else idx_eqop = get_opfamily_member(..., eq_strategy); consider if (stmt->unique) idx_eqop = get_opfamily_member(..., eq_strategy); else if (stmt->excludeOpNames) idx_eqop = indexInfo->ii_ExclusionOps[j]; Assert(idx_eqop); Also, I would push the code if (accessMethodId == BTREE_AM_OID) eq_strategy = BTEqualStrategyNumber; further down into the loop, so that you don't have to remember in which cases eq_strategy is assigned or not. (It's also confusing that the eq_strategy variable is used for two different things in this function, and that would clean that up.) Finally, this code + att = TupleDescAttr(RelationGetDescr(rel), + key->partattrs[i] - 1); + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot match partition key to index on column \"%s\" using non-equal operator \"%s\".", + NameStr(att->attname), get_opname(indexInfo->ii_ExclusionOps[j])))); could be simplified by using get_attname(). This is all just a bit of polishing. I think it would be good to go after that.
On Thu, Jul 6, 2023 at 1:03 AM Peter Eisentraut <peter@eisentraut.org> wrote: > This looks all pretty good to me. A few more comments: Thanks for the feedback! New patch attached here. Responses below: > It seems to me that many of the test cases added in indexing.sql are > redundant with create_table.sql/alter_table.sql (or vice versa). Is > there a reason for this? Yes, there is some overlap. I think that's just because there was overlap before, and I didn't want to delete the old tests completely. But since indexing.sql has a fuller list of tests and is a superset of the others, this new patch removes the redundant tests from {create,alter}_table.sql. Btw speaking of tests, I want to make sure this new feature will still work when you're using btree_gist and and `EXCLUDE WITH (myint =, mytsrange &&)` (and not just `(myint4range =, mytsrange &&)`). Some of my early attempts writing this patch worked w/o btree_gist but not w/ (or vice versa). But as far as I know there's no way to test that in regress. I wound up writing a private shell script that just does this: ``` -------- -- test against btree_gist since we can't do that in the postgres regress test suite: CREATE EXTENSION btree_gist; create table partitioned (id int, valid_at tsrange, exclude using gist (id with =, valid_at with &&)) partition by range (id); -- should fail with a good error message: create table partitioned2 (id int, valid_at tsrange, exclude using gist (id with <>, valid_at with &&)) partition by range (id); ``` Is there some place in the repo to include a test like that? It seems a little funny to put it in the btree_gist suite, but maybe that's the right answer. > This is not really a problem in your patch, but I think in > > - if (partitioned && (stmt->unique || stmt->primary)) > + if (partitioned && (stmt->unique || stmt->primary || > stmt->excludeOpNames != NIL)) > > the stmt->primary is redundant and should be removed. Right now > "primary" is always a subset of "unique", but presumably a future patch > of yours wants to change that. Done! I don't think my temporal work changes that primary ⊆ unique. It does change that some primary/unique constraints will have non-null excludeOpNames, which will require small changes here eventually. But that should be part of the temporal patches, not this one. > Furthermore, I think it would be more elegant in your patch if you wrote > stmt->excludeOpNames without the "== NIL" or "!= NIL", so that it > becomes a peer of stmt->unique. (I understand some people don't like > that style. But it is already used in that file.) Done. > I would consider rearranging some of the conditionals more as a > selection of cases, like "is it a unique constraint?", "else, is it an > exclusion constraint?" -- rather than the current "is it an exclusion > constraint?, "else, various old code". Done. > Also, I would push the code > > if (accessMethodId == BTREE_AM_OID) > eq_strategy = BTEqualStrategyNumber; > > further down into the loop, so that you don't have to remember in which > cases eq_strategy is assigned or not. > > (It's also confusing that the eq_strategy variable is used for two > different things in this function, and that would clean that up.) Agreed that it's confusing. Done. > Finally, this code > > + att = TupleDescAttr(RelationGetDescr(rel), > + key->partattrs[i] - 1); > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot match partition key > to index on column \"%s\" using non-equal operator \"%s\".", > + NameStr(att->attname), > get_opname(indexInfo->ii_ExclusionOps[j])))); > > could be simplified by using get_attname(). Okay, done. I changed the similar error message just below too. > This is all just a bit of polishing. I think it would be good to go > after that. Thanks! -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 09.07.23 03:21, Paul A Jungwirth wrote: >> It seems to me that many of the test cases added in indexing.sql are >> redundant with create_table.sql/alter_table.sql (or vice versa). Is >> there a reason for this? > > Yes, there is some overlap. I think that's just because there was > overlap before, and I didn't want to delete the old tests completely. > But since indexing.sql has a fuller list of tests and is a superset of > the others, this new patch removes the redundant tests from > {create,alter}_table.sql. This looks better. > Btw speaking of tests, I want to make sure this new feature will still > work when you're using btree_gist and and `EXCLUDE WITH (myint =, > mytsrange &&)` (and not just `(myint4range =, mytsrange &&)`). Some of > my early attempts writing this patch worked w/o btree_gist but not w/ > (or vice versa). But as far as I know there's no way to test that in > regress. I wound up writing a private shell script that just does > this: > Is there some place in the repo to include a test like that? It seems > a little funny to put it in the btree_gist suite, but maybe that's the > right answer. I'm not sure what value we would get from testing this with btree_gist, but if we wanted to do that, then adding a new test file to the btree_gist sql/ directory would seem reasonable to me. (I would make the test a little bit bigger than you had shown, like insert a few values.) If you want to do that, please send another patch. Otherwise, I'm ok to commit this one.
On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: > I'm not sure what value we would get from testing this with btree_gist, > but if we wanted to do that, then adding a new test file to the > btree_gist sql/ directory would seem reasonable to me. > > (I would make the test a little bit bigger than you had shown, like > insert a few values.) > > If you want to do that, please send another patch. Otherwise, I'm ok to > commit this one. I can get you a patch tonight or tomorrow. I think it's worth it since btree_gist uses different strategy numbers than ordinary gist. Thanks! Paul
On Mon, Jul 10, 2023 at 8:06 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > > On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > I'm not sure what value we would get from testing this with btree_gist, > > but if we wanted to do that, then adding a new test file to the > > btree_gist sql/ directory would seem reasonable to me. > > > > (I would make the test a little bit bigger than you had shown, like > > insert a few values.) > > > > If you want to do that, please send another patch. Otherwise, I'm ok to > > commit this one. > > I can get you a patch tonight or tomorrow. I think it's worth it since > btree_gist uses different strategy numbers than ordinary gist. Patch attached. Regards, Paul
Attachment
On 11.07.23 07:52, Paul A Jungwirth wrote: > On Mon, Jul 10, 2023 at 8:06 AM Paul A Jungwirth > <pj@illuminatedcomputing.com> wrote: >> >> On Mon, Jul 10, 2023 at 7:05 AM Peter Eisentraut <peter@eisentraut.org> wrote: >>> I'm not sure what value we would get from testing this with btree_gist, >>> but if we wanted to do that, then adding a new test file to the >>> btree_gist sql/ directory would seem reasonable to me. >>> >>> (I would make the test a little bit bigger than you had shown, like >>> insert a few values.) >>> >>> If you want to do that, please send another patch. Otherwise, I'm ok to >>> commit this one. >> >> I can get you a patch tonight or tomorrow. I think it's worth it since >> btree_gist uses different strategy numbers than ordinary gist. > > Patch attached. Looks good, committed. I had some second thoughts about the use of get_attname(). It seems the previous code used the dominant style of extracting the attribute name from the open relation handle, so I kept it that way. That's also more efficient than going via the syscache.