Re: SQL:2011 application time - Mailing list pgsql-hackers
From | Paul A Jungwirth |
---|---|
Subject | Re: SQL:2011 application time |
Date | |
Msg-id | CA+renyXHFFyq1n2+4cwcFoLXqzUhb6u6WCL6wjk8ZjP9WxTExw@mail.gmail.com Whole thread Raw |
In response to | Re: SQL:2011 application time (jian he <jian.universality@gmail.com>) |
Responses |
Re: SQL:2011 application time
|
List | pgsql-hackers |
Hello, Here is a new patch series addressing the last few feedback emails from Peter & Jian He. It mostly focuses on the FKs patch, trying to get it really ready to commit, but it also finishes restoring all the functionality to the PERIODs patch (that I removed temporarily when we changed PERIODs to GENERATED columns). I still want to restore a few more tests there, but all the functionality is back (e.g. PERIODs with foreign keys and FOR PORTION OF), so it proves the GENERATED idea works in principle. Specific feedback below: On Mon, Mar 11, 2024 at 12:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: > I had committed v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch a > little while ago. Thanks! It looks like you also fixed the pg_catalog docs which I missed. > Attached is a small patch that changes the PERIOD keyword to unreserved > for this patch. You had said earlier that this didn't work for you. > The attached patch works for me when applied on top of 0003. Applied and included here. > You wrote: > > > I'm not sure how else to do it. The issue is that `range_agg` returns > > a multirange, so the result > > type doesn't match the inputs. But other types will likely have the > > same problem: to combine boxes > > you may need a multibox. The combine mdranges you may need a > > multimdrange. > > Can we just hardcode the use of range_agg for this release? Might be > easier. I don't see all this generality being useful in the near future. Okay, I've hard-coded range_agg in the main patch and separated the support for multirange/etc in the next two patches. But there isn't much code there (mostly tests and docs). Since we can't hard-code the *operators*, most of the infrastructure is already there not to hard-code the aggregate function. Supporting multiranges is already a nice improvement. E.g. it should cut down on disk usage when a record gets updated frequently. Supporting arbitrary types also seems very powerful, and we already do that for PKs. > > Btw that part changed a bit since v24 because as jian he pointed out, > > our type system doesn't > > support anyrange inputs and an anyrange[] output. So I changed the > > support funcs to use SETOF. > > I didn't see any SETOF stuff in the patch, or I didn't know where to look. > > I'm not sure I follow all the details here. So more explanations of any > kind could be helpful. This is talking about the FOR PORTION OF patch, not the FKs patch. It is the function that gives the "leftovers" after a temporal UPDATE/DELETE. There is explanation in the preliminary patch (adding the support function) and the actual FOR PORTION OF patch, but if you think they need more let me know. But I'd love to talk more about this here: The reason for using a SETOF function is because you can't return an anyarray from a function that takes anyrange or anymultirange. Or rather if you do, the array elements match the rangetype's bounds' type, not the rangetype itself: `T[] f(rangetype<T>)`, not `rangetype<T>[] f(rangetype<T>)`, and we need the latter. So to get a list of rangetype objects we do a SETOF function that is `anyrange f(anyrange)`. Personally I think an improvement would be to add a broken-out patch to add pseudotypes called anyrangearray and anymultirangearray, but using SETOF works now, and I don't know if anyone is interested in such a patch. But it's not the first time I've hit this shortcoming in the pg type system, so I think it's worthwhile. And since FOR PORTION OF isn't getting into v17, there is time to do it. What do you think? If it's an acceptable idea I will get started. It should be a separate commitfest entry I think. > * v27-0003-Refactor-FK-operator-lookup.patch > > I suggest to skip this refactoring patch. I don't think the way this is > sliced up is all that great, and it doesn't actually help with the > subsequent patches. Okay. > - src/backend/catalog/pg_constraint.c > > FindFKPeriodOpersAndProcs() could use a bit more top-level > documentation. Where does the input opclass come from? What are the > three output values? What is the business with "symmetric types"? Added and tried to clarify about the types. > - src/backend/commands/indexcmds.c > > GetOperatorFromWellKnownStrategy() is apparently changed to accept > InvalidOid for rhstype, but the meaning of this is not explained in > the function header. It's also not clear to me why an existing caller > is changed. This should be explained more thoroughly. It's not so much changing a param as removing one and adding another. The old param was unneeded because it's just the opclass's opcintype, and we're already passing the opclass. Then the new param lets you optionally ask for an operator that is not `opcintype op opcintype` but `opcintype op rhstype`. We need this because FKs compare fkattr <@ range_agg(pkattr)`, and range_agg returns a multirange, not a range. Even if we hard-code range_agg, the easiest way to get the operator is to use this function, passing ANYMULTIRANGEOID (but better is to pass whatever the referencedagg support func returns, as the now-separate multirange/custom type patch does). > - src/backend/commands/tablecmds.c > > is_temporal and similar should be renamed to with_period or similar > throughout this patch. Done. > In transformFkeyGetPrimaryKey(): > > * Now build the list of PK attributes from the indkey definition (we > - * assume a primary key cannot have expressional elements) > + * assume a primary key cannot have expressional elements, unless it > + * has a PERIOD) > > I think the original statement is still true even with PERIOD. The > expressional elements refer to expression indexes. I don't think we can > have a PERIOD marker on an expression? You're right: I wrote this back before PERIODs became GENERATED columns. Updated now. > - src/backend/utils/adt/ri_triggers.c > > Please remove the separate trigger functions for the period case. They > are the same as the non-period ones, so we don't need separate ones. > The difference is handled lower in the call stack, which I think is a > good setup. Removing the separate functions also removes a lot of extra > code in other parts of the patch. Done. The later patch for FKs with CASCADE/SET NULL/SET DEFAULT still has separate functions (since they call actually-different implementations), but I will see if I can unify things a bit more there. > - src/include/catalog/pg_constraint.h > > Should also update catalogs.sgml accordingly. Looks like you did this already in 030e10ff1a. > - src/test/regress/expected/without_overlaps.out > - src/test/regress/sql/without_overlaps.sql > > A few general comments on the tests: > > - In the INSERT commands, specify the column names explicitly. This > makes the tests easier to read (especially since the column order > between the PK and the FK table is sometimes different). Okay. > - Let's try to make it so that the inserted literals match the values > shown in the various error messages, so it's easier to match them up. > So, change the int4range literals to half-open notation. And also maybe > change the date output format to ISO. Done. Also changed the tsrange cols to daterange and made them YYYY-MM-DD. This is much easier to read IMO. Note there were already a few tsrange columns in the PK tests, so I changed those separately in the very first patch here. > - In various comments, instead of test FK "child", maybe use > "referencing table"? Instead of "parent", use "referenced table" (or > primary key table). When I read child and parent I was looking for > inheritance. Done. > - Consider truncating the test tables before each major block of tests > and refilling them with fresh data. So it's easier to eyeball the > tests. Otherwise, there is too much dependency on what earlier tests > left behind. Done. This will also let me reuse ids in the FOR PORTION OF partitioned table tests, but that's not done yet. > A specific question: > > In this test, a PERIOD marker on the referenced site is automatically > inferred from the primary key: > > +-- with inferred PK on the referenced table: > +CREATE TABLE temporal_fk_rng2rng ( > + id int4range, > + valid_at tsrange, > + parent_id int4range, > + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT > OVERLAPS), > + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD > valid_at) > + REFERENCES temporal_rng > +); > > In your patch, this succeeds. According to the SQL standard, it should > not. In subclause 11.8, syntax rule 4b: > > """ > Otherwise, the table descriptor of the referenced table shall include a > unique constraint UC that specifies PRIMARY KEY. The table constraint > descriptor of UC shall not include an application time period name. > """ > > So this case is apparently explicitly ruled out. > > (It might be ok to make an extension here, but then we should be > explicit about it.) Okay, I agree it doesn't match the standard. IMO our behavior is better, but the patches here should let you go either way. The main FK patch keeps the old behavior, but there is a follow-up patch doing what the standard says. There are some interesting implications, which you can see by looking at the test changes in that patch. Basically you can never give an inferred REFERENCES against a temporal table. Either your FK has a PERIOD element, and it fails because we exclude the PK's WITHOUT OVERLAPS in the inferred attributes, or your FK does not have a PERIOD element, and it fails because you want a PK side that is genuinely unique, but the PK index has a temporal definition of "unique" (and is not B-tree but GiST). I don't see any drawbacks from supporting inferred REFERENCES with temporal tables, so my vote is to break from the standard here, and *not* apply that follow-up patch. Should I add some docs about that? Also skipping the patch will cause some annoying merge conflicts, so let me know if that's what you choose and I'll handle them right away. Btw I tried checking what other vendors do here, but no one supports temporal FKs yet! MS SQL Server doesn't support application time at all. Oracle and MariaDB don't support temporal PKs or FKs. And IBM DB2 only supports temporal PKs. Actually DB2's docs in 2019 were *claiming* they supported temporal FKs, but it didn't work for me or at least one other person posting in their forums. And the latest docs no longer mention it.[1] I wrote about trying to make it work in my survey of other vendors.[2] The old docs are now a 404,[3] as is the forums post.[4] My DB2 test code is below in case anyone else wants to try.[5] So there is no precedent here for us to follow. Incidentally, here are two non-standard things I would like to add "some day": 1. FKs from non-temporal tables to temporal tables. Right now temporal tables are "contagious", which can be annoying. Maybe a non-temporal record is valid as long as a referenced temporal row exists at *any time*. You can't do that today. You can't even add an additional UNIQUE constraint, because there are surely duplicates that invalidate it. This kind of FK would be satisfied if *at least one* reference exists. 2. FKs from a single-timestamp table to a temporal table. Maybe the referring table is an "event" with no duration, but it is valid as long as the referenced table contains it. A workaround is to have a range that is `[t,t]`, but that's annoying. Anyway that's not important for these patches. As far as I can tell, whatever we choose re inferred PERIOD in REFERENCES keeps our options open for those ideas. One more thought: if we wanted to be cheekily compatible with the standard, we could infer *range types* that are WITHOUT OVERLAPs but not true PERIOD objects. "The table constraint descriptor of UC shall not include an application time period name." If it's a rangetype column, then it doesn't include a period name. :-P. So then we would skip the follow-up patch here but I could work it into the final patch for PERIOD support. This is probably not the wisest choice, although I guess it does let us defer deciding what to do. On Mon, Mar 11, 2024 at 7:45 PM jian he <jian.universality@gmail.com> wrote: > typo "referenced_agg", in the gist-extensibility.html page is "referencedagg" > <literal>WITHOUT PORTION</literal> should be <literal>WITHOUT OVERLAPS</literal> Good catch! Fixed. > + While the non-<literal>PERIOD</literal> columns are treated normally > + (and there must be at least one of them), > + the <literal>PERIOD</literal> column is not compared for equality. > the above sentence didn't say what is "normally"? > maybe we can do the following: > + While the non-<literal>PERIOD</literal> columns are treated > + normally for equality > + (and there must be at least one of them), > + the <literal>PERIOD</literal> column is not compared for equality. Reworked the language here. > my_range_agg_transfn error message is inconsistent? > `elog(ERROR, "range_agg_transfn called in non-aggregate context");` > `elog(ERROR, "range_agg must be called with a range");` > maybe just `my_range_agg_transfn`, instead of mention > {range_agg_transfn|range_agg} > similarly my_range_agg_finalfn error is also inconsistent. This matches what other aggs do (e.g. array_agg, json_agg, etc.) as well as the actual core range_agg code. And I think it is an appropriate difference. You only hit the first error if you are invoking the transfn directly, so that's what we should say. OTOH you hit the second error by calling the aggregate function, but with the wrong type. So the error message should mention the aggregate function. > my_range_agg_finalfn need `type_is_multirange(mltrngtypoid)`? This isn't part of the core range_agg_finalfn, so I'd rather not include it here. And I don't think it is needed. You would only get a non-multirange if the transfn does something wrong, and even if it does, the error will be caught and reported in multirange_get_typcache. On Mon, Mar 11, 2024 at 7:47 PM jian he <jian.universality@gmail.com> wrote: > maybe just change the tsrange type to daterange, then the dot out file > will be far less verbose. Agreed, done. > minor issues while reviewing v27, 0001 to 0004. > transformFkeyGetPrimaryKey comments need to update, > since bool pk_period also returned. pk_period is no longer returned in this latest patch. > +/* > + * FindFKComparisonOperators - > + * > + * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut. > + * Sets old_check_ok if we can avoid re-validating the constraint. > + * Sets old_pfeqop_item to the old pfeqop values. > + */ > +static void > +FindFKComparisonOperators(Constraint *fkconstraint, > + AlteredTableInfo *tab, > + int i, > + int16 *fkattnum, > + bool *old_check_ok, > + ListCell **old_pfeqop_item, > + Oid pktype, Oid fktype, Oid opclass, > + Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut) > > I think the above comments is > `Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`. This whole function is removed. > + if (is_temporal) > + { > + if (!fkconstraint->fk_with_period) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > + errmsg("foreign key uses PERIOD on the referenced table but not the > referencing table"))); > + } > can be > if (is_temporal && !fkconstraint->fk_with_period) > ereport(ERROR, > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > errmsg("foreign key uses PERIOD on the referenced table but not the > referencing table"))); The patch about inferred REFERENCES moves things around a bit, so this no longer applies. > + if (is_temporal) > + { > + if (!fkconstraint->pk_with_period) > + /* Since we got pk_attrs, one should be a period. */ > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > + errmsg("foreign key uses PERIOD on the referencing table but not the > referenced table"))); > + } > can be > if (is_temporal && !fkconstraint->pk_with_period) > /* Since we got pk_attrs, one should be a period. */ > ereport(ERROR, > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > errmsg("foreign key uses PERIOD on the referencing table but not the > referenced table"))); Likewise. > refactor decompile_column_index_array seems unnecessary. > Peter already mentioned it at [1], I have tried to fix it at [2]. No, that conversation is about handling WITHOUT OVERLAPS, not PERIOD. Because the syntax is `valid_at WITHOUT OVERLAPS` but `PERIOD valid_at` (post vs pre), we must handle PERIOD inside the function. > I don't understand the "expression elements" in the comments, most of > the tests case is like Covered above in Peter's feedback. > ` > PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) > ` > + *pk_period = (indexStruct->indisexclusion); > can be > `+ *pk_period = indexStruct->indisexclusion;` No longer included here. On Wed, Mar 13, 2024 at 5:00 PM jian he <jian.universality@gmail.com> wrote: > @@ -118,12 +120,17 @@ typedef struct RI_ConstraintInfo > int16 confdelsetcols[RI_MAX_NUMKEYS]; /* attnums of cols to set on > * delete */ > char confmatchtype; /* foreign key's match type */ > + bool temporal; /* if the foreign key is temporal */ > int nkeys; /* number of key columns */ > int16 pk_attnums[RI_MAX_NUMKEYS]; /* attnums of referenced cols */ > int16 fk_attnums[RI_MAX_NUMKEYS]; /* attnums of referencing cols */ > Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */ > Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */ > Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */ > + Oid period_contained_by_oper; /* operator for PERIOD SQL */ > + Oid agged_period_contained_by_oper; /* operator for PERIOD SQL */ > + Oid period_referenced_agg_proc; /* proc for PERIOD SQL */ > + Oid period_referenced_agg_rettype; /* rettype for previous */ > > the comment seems not clear to me. Here is my understanding about it: > period_contained_by_oper is the operator where a single period/range > contained by a single period/range. > agged_period_contained_by_oper is the operator oid where a period > contained by a bound of periods > period_referenced_agg_proc is the oprcode of the agged_period_contained_by_oper. > period_referenced_agg_rettype is the function > period_referenced_agg_proc returning data type. Expanded these comments a bit. Thanks to you both for such detailed, careful feedback! Rebased to 605062227f. If anything else comes up re FKs I'll tackle that first, but otherwise I think I will work on some of the outstanding issues in the FOR PORTION OF patch (e.g. trigger transition table names). I may experiment with handling the leftover inserts as a separate executor node. If anyone has advice there I'm happy to hear it! Yours, Paul [1] https://www.ibm.com/docs/en/db2/11.5?topic=statements-alter-table [2] https://illuminatedcomputing.com/posts/2019/08/sql2011-survey/ [3] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/intro/src/tpc/db2z_integrity.html [4] https://www.ibm.com/developerworks/community/forums/html/topic?id=440e07ad-23ee-4b0a-ae23-8c747abca819 [5] Here is DB2 test code showing temporal FKs don't work. (Note they disobey the standard re declaring `PERIOD p (s, e)` not `PERIOD FOR p (s, e)`, and it must be named `business_time`.) ``` create table t (id integer not null, ds date not null, de date not null, name varchar(4000), period business_time (ds, de)); alter table t add constraint tpk primary key (id, business_time without overlaps) insert into t values (1, '2000-01-01', '2001-01-01', 'foo'); create table fk (id integer, ds date not null, de date not null, period business_time (ds, de)); -- all this fails: alter table fk add constraint fkfk foreign key (id, period business_time) references t (id, period business_time); alter table fk add constraint fkfk foreign key (id, business_time) references t (id, business_time); alter table fk add constraint fkfk foreign key (id, period business_time) references t; alter table fk add constraint fkfk foreign key (id, business_time) references t; alter table fk add constraint fkfk foreign key (id, period for business_time) references t; alter table fk add constraint fkfk foreign key (id, period for business_time) references t (id, period for business_time); alter table fk add constraint fkfk foreign key (id, business_time without overlaps) references t; alter table fk add constraint fkfk foreign key (id, business_time without overlaps) references t (id, business_time without overlaps); alter table fk add constraint fkfk foreign key (id) references t; alter table fk add constraint fkfk foreign key (id) references t (id); ```
Attachment
- v28-0004-Add-GiST-referencedagg-support-func.patch
- v28-0005-Support-multiranges-or-any-type-in-temporal-FKs.patch
- v28-0002-Add-temporal-FOREIGN-KEYs.patch
- v28-0001-Use-daterange-and-YMD-in-tests-instead-of-tsrang.patch
- v28-0003-Don-t-infer-PERIOD-on-PK-side-of-temporal-FK.patch
- v28-0006-Add-support-funcs-for-FOR-PORTION-OF.patch
- v28-0007-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v28-0008-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v28-0009-Add-PERIODs.patch
pgsql-hackers by date: