Thread: SQL:2011 application time
Hello, Here is a set of patches to add SQL:2011 application-time support (aka valid-time). Previous discussion was on https://www.postgresql.org/message-id/20200930073908.GQ1996@paquier.xyz but I thought I should update the email subject. There are four patches here: - Add PERIODs. - Add temporal PRIMARY KEY and UNIQUE constraints. - Add UPDATE/DELETE FOR PORTION OF. - Add temporal FOREIGN KEYs. The PERIOD patch is mostly Vik Fearing's work (submitted here a few years ago), so he should get credit for that! All patches have tests & documentation. I do have a few more tests I plan to write, and there are some questions for reviewers embedded in patches (mostly about when to lock and/or copy data structures). I've tried to format these as C++ comments to indicate they should be removed before committing. Throughout I've made sure that wherever SQL:2011 accepts a PERIOD, we also accept a range column. So in all these examples valid_at could be either one: PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) FOREIGN KEY (id, PERIOD valid_at) REFERENCES too (id, PERIOD valid_at) FOR PORTION OF valid_at FROM t1 TO t2 Range types are superior to PERIODs in many ways, so I think we should support both. For example you can SELECT them, WHERE them, GROUP BY them, pass them to functions, return them from functions, do arithmetic on them, index them, etc. In fact whether you use a PERIOD or a range, the implementation uses ranges a lot, since they are such a good fit. A temporal PK is really an exclusion constraint, etc. When you define a PERIOD, we find a matching range type and store its oid on the period record. If there are more than one range type we raise an error, but you can give a rangetype option to remove the ambiguity. This means we support PERIODs of any type (basically), not just dates & timestamps. According to SQL:2011 we should automatically set any columns used by a PERIOD to NOT NULL. I've ignored that requirement, since permitting nullable columns is strictly greater functionality: you can always make the columns NOT NULL if you like. Interpreting NULLs as unbounded fits better with our range types, and it means you don't have to use sentinels. (Timestamp has +-Infinity, but many types don't.) Oracle also accepts null PERIOD columns and treats them the same way. I don't think it would break anything though to force PERIOD columns to NOT NULL. If you hate sentinels you can just use range columns. But still I see no reason to force this on our users. In the FOR PORTION OF bounds I accept MINVALUE and MAXVALUE as special tokens. I chose the names to be consistent with partition syntax. This isn't part of the standard but seems nice. Here are a few other things to discuss: - My patch only adds application time. There is a separate patch to add system time: https://commitfest.postgresql.org/33/2316/ I don't foresee any serious conflicts between our work, and in general I think each patch implements its functionality at an appropriate (but different) level of abstraction. But I haven't looked at that patch recently. I'll try to give some comments during this commitfest. The one place they probably overlap is with defining PERIODs. Since system-time periods *must* be named SYSTEM_TIME, even that overlap should be slight, but it still might be worth accepting the PERIOD patch here before adopting either. Even SYSTEM_TIME ought to be recorded in information_schema.periods IIRC. - The biggest thing remaining to do is to add support for partitioned tables. I would love some help with that if anyone is interested. - Since temporal PKs are implemented with exclusion constraints they use GiST indexes, so you can't really use them without the btree_gist extension (unless *all* your key parts are ranges---which is how we test exclusion constraints). Personally I'm okay with this, since even exclusion constraints are pretty useless without that extension. But it seems like something to talk about. - At PgCon 2020 Vik suggested a different way of querying for FK checks, which he used in his own temporal tables extension. It is more complicated but he thinks it may be faster. I plan to try both and run some benchmarks. I'm not sure whether his approach will work with CASCADE/SET NULL/SET DEFAULT---but I haven't looked at it in a while. - It is hard to avoid a shift/reduce conflict in FOR PORTION OF <period_or_range> FROM <expr> TO <expr> because expressions may contain date INTERVALs that also may contain TO. So this is an error: FOR PORTION OF valid_at FROM '2018-03-01' AT TIME ZONE INTERVAL '2' HOUR TO '2019-01-01' but this works: FOR PORTION OF valid_at FROM ('2018-03-01' AT TIME ZONE INTERVAL '2' HOUR) TO '2019-01-01' I'm personally satisfied with that, but if anyone thinks it can be improved please let me know. It would be nice if the parser were smart enough to see that without a second TO, it must belong to FOR PORTION OF, not the interval. But *I'm* not smart enough to teach it that. :-) If only it could have a greater lookahead. . . . - Normally we return the number of rows affected by an UPDATE/DELETE. What do you think we should do when a FOR PORTION OF causes extra rows to be inserted? I'm not doing anything special here today. After all foreign keys don't do anything extra when they CASCADE/SET (to my knowledge). Also I think adding info about the inserted rows might be annoying, since I'd have to communicate it from within the trigger function. I'm really hoping no one asks for this. - Since PERIODs are a weird neither-fish-nor-foul thing (parsed a lot like a column, but also behaving like a constraint), they add a lot of tedious if-statements when they are used by an index or constraint. In many places I've used a zero attnum to signal that a component is really a PERIOD. (Range columns are easy since they really are a column.) I feel this approach is pretty ugly, so I will probably experiment a bit with a different way. If anyone else wants to take this on though, I'm grateful for the help. - It would be really cool if ON CONFLICT DO UPDATE had a temporal variant so it would INSERT the missing durations and UPDATE the existing ones. That's what Tom Johnston said the standard should have required in *Bitemporal Data*, and it does make things a lot easier on the client side. But that is something to do in a later patch. . . . Yours, Paul
Attachment
On Wed, Jun 30, 2021 at 10:39 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > Here is a set of patches to add SQL:2011 application-time support (aka > valid-time). Here is a small fix to prevent `FOR PORTION OF valid_at FROM MAXVALUE TO foo` or `FROM foo TO MINVALUE`. I rebased on latest master too. Yours, Paul
Attachment
On Fri, Jul 2, 2021 at 2:39 PM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > > On Wed, Jun 30, 2021 at 10:39 AM Paul A Jungwirth > <pj@illuminatedcomputing.com> wrote: > > Here is a set of patches to add SQL:2011 application-time support (aka > > valid-time). > > Here is a small fix to prevent `FOR PORTION OF valid_at FROM MAXVALUE > TO foo` or `FROM foo TO MINVALUE`. I rebased on latest master too. Here is a patch set that cleans up the catalog docs for pg_period. The columns have changed since that was written, and also we use a different sgml structure on those pages now. Note pg_period still contains a couple essentially-unused columns, perislocal and perinhcount. Those are intended for supporting table inheritance, so I've left them in. Paul
Attachment
On Sat, Jul 03, 2021 at 10:46:55AM -0700, Paul A Jungwirth wrote: > On Fri, Jul 2, 2021 at 2:39 PM Paul A Jungwirth > <pj@illuminatedcomputing.com> wrote: > > > > On Wed, Jun 30, 2021 at 10:39 AM Paul A Jungwirth > > <pj@illuminatedcomputing.com> wrote: > > > Here is a set of patches to add SQL:2011 application-time support (aka > > > valid-time). > > > > Here is a small fix to prevent `FOR PORTION OF valid_at FROM MAXVALUE > > TO foo` or `FROM foo TO MINVALUE`. I rebased on latest master too. > > Here is a patch set that cleans up the catalog docs for pg_period. The > columns have changed since that was written, and also we use a > different sgml structure on those pages now. Note pg_period still > contains a couple essentially-unused columns, perislocal and > perinhcount. Those are intended for supporting table inheritance, so > I've left them in. > Hi Paul, Thanks for working on this. It would be a great improvement. I wanted to test the patches but: patch 01: does apply but doesn't compile, attached the compile errors. patch 04: does not apply clean. Please fix and resend. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Attachment
On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > patch 01: does apply but doesn't compile, attached the compile errors. > patch 04: does not apply clean. Thanks for taking a look! I've rebased & made it compile again. v7 attached. Yours, Paul
Attachment
On Mon, Sep 6, 2021 at 12:53 PM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:
On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> patch 01: does apply but doesn't compile, attached the compile errors.
> patch 04: does not apply clean.
Thanks for taking a look! I've rebased & made it compile again. v7 attached.
Yours,
Paul
Hi,
For v7-0001-Add-PERIODs.patch :
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
It seems the year (2018) should be updated to 2021.
For RemovePeriodById(), it seems table_open() can be called after SearchSysCache1(). This way, if HeapTupleIsValid(tup) is true, table_open() can be skipped.
For tablecmds.c, AT_PASS_ADD_PERIOD is defined as 5 with AT_PASS_ADD_CONSTR etc moved upward. Do we need to consider compatibility ?
There are a few TODO's such as:
+ * TODO: What about periods?
Are they going to be addressed in the next round of patches ?
There seems to be some overlap between ATExecAddPeriod() and AddRelationNewPeriod().
Is it possible to reduce code duplication ?
Cheers
On Mon, Sep 06, 2021 at 12:52:37PM -0700, Paul A Jungwirth wrote: > On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova > <jcasanov@systemguards.com.ec> wrote: > > > > patch 01: does apply but doesn't compile, attached the compile errors. > > patch 04: does not apply clean. > > Thanks for taking a look! I've rebased & made it compile again. v7 attached. > patch 01: does apply but gives a compile warning (which is fixed by patch 02) """ parse_utilcmd.c: In function ‘generateClonedIndexStmt’: parse_utilcmd.c:1730:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] Period *p = makeNode(Period); ^~~~~~ """ patch 03: produces these compile errors. analyze.c: In function ‘transformForPortionOfBound’: analyze.c:1171:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] A_Const *n2 = makeNode(A_Const); ^~~~~~~ analyze.c:1172:10: error: ‘union ValUnion’ has no member named ‘type’ n2->val.type = T_Null; ^ analyze.c:1172:18: error: ‘T_Null’ undeclared (first use in this function) n2->val.type = T_Null; ^~~~~~ analyze.c:1172:18: note: each undeclared identifier is reported only once for each function it appears in -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
On Fri, Sep 10, 2021 at 6:50 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > patch 01: does apply but gives a compile warning (which is fixed by patch > 02) > [snip] > patch 03: produces these compile errors. I did a rebase and fixed this new error, as well as the warnings. On Mon, Sep 6, 2021 at 1:40 PM Zhihong Yu <zyu@yugabyte.com> wrote: > > + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group > > It seems the year (2018) should be updated to 2021. Done. > For RemovePeriodById(), it seems table_open() can be called after SearchSysCache1(). This way, if HeapTupleIsValid(tup)is true, table_open() can be skipped. This seems like it permits a race condition when two connections both try to drop the period, right? > For tablecmds.c, AT_PASS_ADD_PERIOD is defined as 5 with AT_PASS_ADD_CONSTR etc moved upward. Do we need to consider compatibility? I don't think there is a compatibility problem---can you explain? These symbols aren't used outside tablecmds.c and the values aren't saved anywhere AFAIK. > There are a few TODO's such as: > Are they going to be addressed in the next round of patches ? These are mostly questions I'm hoping a reviewer can help me answer, but I'll take a pass through them and see which I can remove myself. Several are for adding support for partitioned tables, where I would definitely appreciate help. > There seems to be some overlap between ATExecAddPeriod() and AddRelationNewPeriod(). > Is it possible to reduce code duplication ? I've refactored those functions to remove some duplication, but I think I prefer the old version---let me know if you have suggestions to avoid the duplication in a nicer way. Oh also I realized fp_triggers.c wasn't included in the last few patch files---I'm sorry about that! Latest files attached. Thanks for the reviews! Paul
Attachment
So I've been eagerly watching this thread and hoping to have time to devote to it. I've also been looking at the thread at https://www.postgresql.org/message-id/CALAY4q8Pp699qv-pJZc4toS-e2NzRJKrvaX-xqG1aqj2Q+Ww-w@mail.gmail.com that covers system versioning, and per our conversation far too long ago (again, my bad) it's obvious that the two efforts shouldn't do anything that would be in conflict with one another, as we eventually have to support bitemporal [1] tables: tables that have both system versioning and an application period.
Below is a list of observations and questions about this proposed patch of itself in isolation, but mostly about how it relates to the work being done for system versioning.
Below is a list of observations and questions about this proposed patch of itself in isolation, but mostly about how it relates to the work being done for system versioning.
1. This patch creates a pg_period catalog table, whereas the system versioning relies on additions to pg_attribute to identify the start/end columns. Initially I thought this was because it was somehow possible to have multiple application periods defined on a table, but in reading [1] I see that there are some design suppositions that would make a second application period impossible[2]. I can also see where having this table would facilitate the easy creation of INFORMATION_SCHEMA.PERIODS. I was previously unaware that this info schema table was a thing, but I have found references to it, though I'm unclear as to whether it's supposed to have information about system versioned tables in it as well.
Q 1.1. Would a bitemporal table have two entries in that view?
Q 1.2. Could you see being able to implement this without pg_period, using only additions to pg_attribute (start/end for system temporal, start/end for application, plus an addition for period name)?
Q 1.3. Can you see a way to represent the system versioning in pg_period such that bitemporal tables were possible?
Q 1.1. Would a bitemporal table have two entries in that view?
Q 1.2. Could you see being able to implement this without pg_period, using only additions to pg_attribute (start/end for system temporal, start/end for application, plus an addition for period name)?
Q 1.3. Can you see a way to represent the system versioning in pg_period such that bitemporal tables were possible?
2. The system versioning effort has chosen 'infinity' as their end-time value, whereas you have chosen NULL as that makes sense for an unbounded range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer, IIRC) whereas some others seem to used '2999-12-31 23:59:59' but those might have been home-rolled temporal implementations. To further add to the confusion, the syntax seems to specify the keyword of MAXVALUE, which further muddies things. The system versioning people went with 'infinity' seemingly because it prescribe and end to the world like SQLServer did, but also because it allowed for a primary key based on (id, endtime) and that's just not possible with NULL endtime values.
Q 2.1. Do you have any thoughts about how to resolve this notational logjam?
Q 2.1. Do you have any thoughts about how to resolve this notational logjam?
3. I noticed some inconsistency in the results from various "SELECT * FROM portion_of_test" examples. In some, the "valid_at" range is shown but not columns that make it up, and in some others, the "valid_from" and "valid_to" columns are shown, with no mention of the period. From what I've seen, the period column should be invisible unless invoked, like ctid or xmin.
4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE simply confounded me. I googled around for it, but could find no matches for postgres exception in mailing list discussions circa 2003. I tried it out myself and, lo and behold
# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;timezone---------------------2018-03-04 05:02:00(1 row)
I really didn't expect that to work, or even "work". I can see that it added 2 minutes to UTC's perspective on my local concept of midnight, but I don't understand what it's supposed to mean.
Q 4.1. What does it mean?
5. I haven't seen any actual syntax conflicts between this patch and the system versioning patch. Both teams added basically the same keywords, though I haven't dove more deeply into any bison incompatibilities. Still, it's a great start.
6. Overall, I'm really excited about what this will mean for data governance in postgres.
[2] In the bitemporal table example in [1] - the application period get the defined primary key, and the system_time period would be merely unique
On Mon, Sep 13, 2021 at 12:12 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:
On Fri, Sep 10, 2021 at 6:50 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> patch 01: does apply but gives a compile warning (which is fixed by patch
> 02)
> [snip]
> patch 03: produces these compile errors.
I did a rebase and fixed this new error, as well as the warnings.
On Mon, Sep 6, 2021 at 1:40 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
>
> It seems the year (2018) should be updated to 2021.
Done.
> For RemovePeriodById(), it seems table_open() can be called after SearchSysCache1(). This way, if HeapTupleIsValid(tup) is true, table_open() can be skipped.
This seems like it permits a race condition when two connections both
try to drop the period, right?
> For tablecmds.c, AT_PASS_ADD_PERIOD is defined as 5 with AT_PASS_ADD_CONSTR etc moved upward. Do we need to consider compatibility ?
I don't think there is a compatibility problem---can you explain?
These symbols aren't used outside tablecmds.c and the values aren't
saved anywhere AFAIK.
> There are a few TODO's such as:
> Are they going to be addressed in the next round of patches ?
These are mostly questions I'm hoping a reviewer can help me answer,
but I'll take a pass through them and see which I can remove myself.
Several are for adding support for partitioned tables, where I would
definitely appreciate help.
> There seems to be some overlap between ATExecAddPeriod() and AddRelationNewPeriod().
> Is it possible to reduce code duplication ?
I've refactored those functions to remove some duplication, but I
think I prefer the old version---let me know if you have suggestions
to avoid the duplication in a nicer way.
Oh also I realized fp_triggers.c wasn't included in the last few patch
files---I'm sorry about that!
Latest files attached. Thanks for the reviews!
Paul
Hi Corey, Thanks for all the good questions! > 1. This patch creates a pg_period catalog table, whereas the system versioning relies on additions to pg_attribute to identifythe start/end columns. Initially I thought this was because it was somehow possible to have multiple applicationperiods defined on a table, but in reading [1] I see that there are some design suppositions that would make asecond application period impossible[2]. I can also see where having this table would facilitate the easy creation of INFORMATION_SCHEMA.PERIODS.I was previously unaware that this info schema table was a thing, but I have found referencesto it, though I'm unclear as to whether it's supposed to have information about system versioned tables in it aswell. Yes, information_schema.periods is given by the standard. Having pg_period seems like a natural place to store periods, since they are separate entities. I think that is a better design than just storing them as extra fields in pg_attribute. It follows normal normalization rules. The standard forbids multiple application-time periods per table. From SQL:2011 in the SQL/Foundation section (7IWD2-02-Foundation-2011-12.pdf available from http://www.wiscorp.com/sql20nn.zip) under 11.27 <add table period definition>: > 5) If <table period definition> contains <application time period specification> ATPS, then: > b) The table descriptor of T shall not include a period descriptor other than a system-time period descriptor. In other words you can add both a SYSTEM TIME period and one other application-time period (whose name is your choice), but if you already have an application-time period, you can't add another one. I also checked other RDBMSes and none of them allow it either: In Mariadb 10.6.4 (the latest) I get "ERROR 4154 (HY000); Cannot specify more than one application-time period". Oracle disallows it with a vague error: SQL> create table t2 (id int, valid_from date, valid_til date, period for valid_at (valid_from, valid_til), period for valid_at2 valid_from, valid_til)); create table t2 (id int, valid_from date, valid_til date, period for valid_at (valid_from, valid_til), period for valid_at2 (valid_from, valid_til)) * ERROR at line 1: ORA-55603: invalid flashback archive or valid time period command (Using different start/end columns for each period doesn't change the result.) In IBM DB2 you can only have one because application-time periods must be named "business_time" (not joking). Mssql (2019) doesn't support application periods. Personally I feel like it's a weird limitation and I wouldn't mind supporting more, but my current implementation only allows for one, and I'd have to rethink some things to do it differently. Also: I think information_schema.periods *should* include SYSTEM_TIME periods. The spec says (in SQL/Schemata, file 7IWD2-11-Schemata-2011-12.pdf at the link above), "The PERIODS base table has one row for each period defined for a table. It effectively contains a representation of the period descriptors." It doesn't say anything about excluding system-time periods. I checked mariadb, mssql, oracle, and db2, and I could only find this table in db2, as syscat.periods. It includes both application-time and system-time periods. The spec calls for the columns table_catalog, table_schema, table_name, period_name, start_column_name, and end_column_name. There isn't a column to distinguish the period type, but since a period is a system-time period iff its name is "SYSTEM_TIME", technically such a column isn't needed. The db2 columns are periodname, tabschema, tabname, begincolname, endcolname, periodtype, historytabschema, and historytabname. The periodtype column is either A or S (for application-time or system-time). > Q 1.1. Would a bitemporal table have two entries in that view? Yes. > Q 1.2. Could you see being able to implement this without pg_period, using only additions to pg_attribute (start/end forsystem temporal, start/end for application, plus an addition for period name)? Not just period name, but also the range type associated with the period (which should be determined at period creation, so that you can pass an option to disambiguate if there are two ranges defined for the same base type), the constraint oid (to prevent end <= start), and some more data for inherited tables (not really used yet). It seems ugly to hang all these extra values on a pg_attribute record. > Q 1.3. Can you see a way to represent the system versioning in pg_period such that bitemporal tables were possible? Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd still include a pertype column to make distinguishing system vs application periods easier and more obvious. > 2. The system versioning effort has chosen 'infinity' as their end-time value, whereas you have chosen NULL as that makessense for an unbounded range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer, IIRC) whereas someothers seem to used '2999-12-31 23:59:59' but those might have been home-rolled temporal implementations. To furtheradd to the confusion, the syntax seems to specify the keyword of MAXVALUE, which further muddies things. The systemversioning people went with 'infinity' seemingly because it prescribe and end to the world like SQLServer did, butalso because it allowed for a primary key based on (id, endtime) and that's just not possible with NULL endtime values. I think it's a little weird that our system-time patch mutates your primary key. None of the other RDMBSes do that. I don't think it's incompatible (as long as the system time patch knows how to preserve the extra period/range data in an application-time temporal key), but it feels messy to me. I would prefer if system-time and application-time used the same value to mean "unbounded". Using null means we can support any type (not just types with +-Infinity). And it pairs nicely with range types. If the only reason for system-time to use Infinity is the primary key, I think it would be better not to mutate the primary key (and store the historical records in a separate table as other RDMSes do). Btw Oracle also uses NULL to mean "unbounded". We presently forbid PKs from including expressions, but my patch lifts that exception so it can index a rangetype expression built from the period start & end columns. So even if we must include the system-time end column in a PK, perhaps it can use a COALESCE expression to store Infinity even while using NULL to signify "currently true" from a user perspective. > 3. I noticed some inconsistency in the results from various "SELECT * FROM portion_of_test" examples. In some, the "valid_at"range is shown but not columns that make it up, and in some others, the "valid_from" and "valid_to" columns areshown, with no mention of the period. From what I've seen, the period column should be invisible unless invoked, likectid or xmin. In most cases the tests test the same functionality with both PERIODs and rangetype columns. For FKs they test all four combinations of PERIOD-referencing-PERIOD, PERIOD-referencing-range, range-referencing-PERIOD, and range-referencing-range. If valid_at is a genuine column, it is included in SELECT *, but not if it is a PERIOD. > 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE simply confounded me. Me too! I have no idea what that is supposed to mean. But that behavior predates my patch. I only had to deal with it because it creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x TO y`, where x & y are expressions. I asked about this syntax at my PgCon 2020 talk, but I haven't ever received an answer. Perhaps someone else knows what this kind of INTERVAL means (as a modifier of a time value). > 5. I haven't seen any actual syntax conflicts between this patch and the system versioning patch. Both teams added basicallythe same keywords, though I haven't dove more deeply into any bison incompatibilities. Still, it's a great start. I think that's right. Early on the other patch used `FOR PERIOD SYSTEM TIME (x, y)` instead of the standard `FOR PERIOD SYSTEM_TIME (x, y)` but I believe that was fixed, so that the period name is an identifier and not two keywords. > 6. Overall, I'm really excited about what this will mean for data governance in postgres. Me too, and thank you for the detailed review! Yours, Paul
In IBM DB2 you can only have one because application-time periods must
be named "business_time" (not joking).
I saw that as well, and it made me think that someone at IBM is a fan of Flight Of The Conchords.
Personally I feel like it's a weird limitation and I wouldn't mind
supporting more, but my current implementation only allows for one,
and I'd have to rethink some things to do it differently.
I'm satisfied that it's not something we need to do in the first MVP.
Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
still include a pertype column to make distinguishing system vs
application periods easier and more obvious.
SYSTEM_TIME seems to allow for DATE values in the start_time and end_time fields, though I cannot imagine how that would ever be practical, unless it were somehow desirable to reject subsequent updates within a 24 hour timeframe. I have seen instances where home-rolled application periods used date values, which had similar problems where certain intermediate updates would simply have to be discarded in favor of the one that was still standing at midnight.
> 2. The system versioning effort has chosen 'infinity' as their end-time value, whereas you have chosen NULL as that makes sense for an unbounded range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer, IIRC) whereas some others seem to used '2999-12-31 23:59:59' but those might have been home-rolled temporal implementations. To further add to the confusion, the syntax seems to specify the keyword of MAXVALUE, which further muddies things. The system versioning people went with 'infinity' seemingly because it prescribe and end to the world like SQLServer did, but also because it allowed for a primary key based on (id, endtime) and that's just not possible with NULL endtime values.
I think it's a little weird that our system-time patch mutates your
primary key. None of the other RDMBSes do that. I don't think it's
incompatible (as long as the system time patch knows how to preserve
the extra period/range data in an application-time temporal key), but
it feels messy to me.
Per outline below, I'm proposing an alternate SYSTEM_TIME implementation that would leave the PK as-is.
I would prefer if system-time and application-time used the same value
to mean "unbounded". Using null means we can support any type (not
just types with +-Infinity). And it pairs nicely with range types. If
the only reason for system-time to use Infinity is the primary key, I
think it would be better not to mutate the primary key (and store the
historical records in a separate table as other RDMSes do).
The two "big wins" of infinity seemed (to me) to be:
1. the ability to add "AND end_time = 'infinity'" as a cheap way to get current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time" would work. Granted, there's very specific new syntax to do that properly, but you know somebody's gonna see the columns and try to do it that way.
1. the ability to add "AND end_time = 'infinity'" as a cheap way to get current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time" would work. Granted, there's very specific new syntax to do that properly, but you know somebody's gonna see the columns and try to do it that way.
Btw Oracle also uses NULL to mean "unbounded".
Huh, I missed that one. That is good in that it gives some precedence to how you've approached it.
We presently forbid PKs from including expressions, but my patch lifts
that exception so it can index a rangetype expression built from the
period start & end columns. So even if we must include the system-time
end column in a PK, perhaps it can use a COALESCE expression to store
Infinity even while using NULL to signify "currently true" from a user
perspective.
Either way seems viable, but I understand why you want to leverage ranges in this way.
> 3. I noticed some inconsistency in the results from various "SELECT * FROM portion_of_test" examples. In some, the "valid_at" range is shown but not columns that make it up, and in some others, the "valid_from" and "valid_to" columns are shown, with no mention of the period. From what I've seen, the period column should be invisible unless invoked, like ctid or xmin.
In most cases the tests test the same functionality with both PERIODs
and rangetype columns. For FKs they test all four combinations of
PERIOD-referencing-PERIOD, PERIOD-referencing-range,
range-referencing-PERIOD, and range-referencing-range. If valid_at is
a genuine column, it is included in SELECT *, but not if it is a
PERIOD.
Ok, I'll have to look back over the test coverage to make sure that I understand the behavior now.
> 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE simply confounded me.
Me too! I have no idea what that is supposed to mean. But that
behavior predates my patch. I only had to deal with it because it
creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
TO y`, where x & y are expressions. I asked about this syntax at my
PgCon 2020 talk, but I haven't ever received an answer. Perhaps
someone else knows what this kind of INTERVAL means (as a modifier of
a time value).
I think I'll open this as a separate thread, because it would simplify matters if we can reject this nonsense syntax.
This was the alternative method of system versioning I proposed recently in the system versioning thread
1. The regular table remains unchanged, but a pg_class attribute named "relissystemversioned" would be set to true
2. I'm unsure if the standard allows dropping a column from a table while it is system versioned, and the purpose behind system versioning makes me believe the answer is a strong "no" and requiring DROP COLUMN to fail on relissystemversioned = 't' seems pretty straightforward.3. The history table would be given a default name of $FOO_history (space permitting), but could be overridden with the history_table option.4. The history table would have relkind = 'h'5. The history table will only have rows that are not current, so it is created empty.6. As such, the table is effectively append-only, in a way that vacuum can actually leverage, and likewise the fill factor of such a table should never be less than 100.7. The history table could only be updated only via system defined triggers (insert,update,delete, alter to add columns), or row migration similar to that found in partitioning. It seems like this would work as the two tables working as partitions of the same table, but presently we can't have multi-parent partitions.8. The history table would be indexed the same as the base table, except that all unique indexes would be made non-unique, and an index of pk + start_time + end_time would be added9. The primary key of the base table would remain the existing pk vals, and would basically function normally, with triggers to carry forth changes to the history table. The net effect of this is that the end_time value of all rows in the main table would always be the chosen "current" value (infinity, null, 9999-12-31, etc) and as such might not actually _need_ to be stored.10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table directly with no quals to add.11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a union of the base table and the history table with quals applied to both.12. It's a fair question whether the history table would be something that could be queried directly. I'm inclined to say no, because that allows for things like SELECT FOR UPDATE, which of course we'd have to reject.13. If a history table is directly referenceable, then SELECT permission can be granted or revoked as normal, but all insert/update/delete/truncate options would raise an error.14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along with the triggers that reference it, setting relissystemversioned = 'f' on the base table.
The benefits to your effort here would be:
1. No change to the primary key except for the ones dictated by application period
2. The INFORMATION_SCHEMA view need merely take into account The pg_class.relkind = 'h' entries
3. system versioning is no longer mutating (trigger on X updates X), which eliminates the possibility that application period triggers get into a loop
4. DROP SYSTEM VERSIONING would be entirely transparent to application versioning.
Thoughts?
On Sat, Sep 18, 2021 at 5:46 PM Corey Huinker <corey.huinker@gmail.com> wrote:
I agree that for SYSTEM_TIME it doesn't make much sense to use anything but the smallest time granularity.
Yes. OTOH there is equivalent syntax for ranges, e.g. `valid_at @> now()`. But if you had a real PERIOD then that wouldn't be available, since you can't use a PERIOD as an expression. Personally I think that's a shame, and I wonder if PERIODs should be another kind of expression (much like a column value) that evaluates to an equivalent range. Then you'd get all kinds of operators & functions that work with them, you could `SELECT` them, `GROUP BY` them, pass them to functions, etc.
The spec doesn't say anything about using PERIODs in those places, but it *does* have a section on period *predicates*, which seem to be allowed anywhere you can put an expression. The spec's discussion of this is in 4.14.2 ("Operations involving periods") and 8.20 ("<period predicate>"), and says there should be predicates for overlap, equals, contains, precedes, succeeds, immediately precedes, and immediately succeeds. So in the spec, the smallest possible "element" is not a bare PERIOD, but rather these predicates. My patch doesn't include these (it's a lot of new syntax), and no other RDBMS seems to have implemented them. I'm inclined to just treat PERIODs like ranges, or at least maybe let you cast from one to another. (Casting is weird though since if a bare PERIOD isn't a valid expression, what are you casting from/to?)
I should add that using +-Infinity for application-time bounds is completely acceptable under my patch; you just have the option to use NULL instead. So your examples of filtering above are fine. There aren't any operations where we have to set a bounded rangepart to unbounded, so we never pass a NULL; only the user would do that. We do bless NULLs by translating MINVALUE/MAXVALUE to NULL, but that is necessary to support arbitrary types. Even that could be refined so that we use +-Infinity when available but NULL elsewhere. Or we could just drop MINVALUE/MAXVALUE entirely. It's my own addition to make sentinels less arbitrary; it's not in the standard.
One of my design goals was to let people favor ranges over PERIODs if they like. Forcing people to use +-Infinity doesn't completely eliminate that goal, but it does mean your ranges are different than you're used to seeing (`[2020-01-01, Infinity)' vs [2020-01-01,)`. More importantly you can only use {date,ts,tstz}range for application-time periods, not other rangetypes. So I'd prefer to keep NULL bounds *possible*, even if MINVALUE/MAXVALUE aren't giving it a sanction.
+1 so far. Behavior of DDL in temporal tables is almost untouched even in the academic literature I've read. (My bibliography mentions a few places that at least mention that it's a hard problem.) Forbidding to drop a column seems pretty harsh---but on the other hand that's just the tip of the iceberg, so failing is probably the practical choice. For example what happens to old rows if you add a NOT NULL constraint? For application-time we can make the user responsible for figuring out the most sensible thing, but for SYSTEM_TIME we have to figure that out ourselves. But what about column type changes, or domains? What about removing an enum option? Or adding a CHECK constraint? With SYSTEM_TIME the user is supposed to be unable to change the history data, so they can't accommodate it to future requirements.
I don't think they should be sibling partitions, but I do think it would be cool if you could ask for the history table to be partitioned. Mariadb offers a way to do this (see my blog post comparing SQL:2011 implementations). It doesn't have to be in the first patch though, and it's not part of the standard.
Is there any value to indexing both start_time and end_time? Just one already takes you to a single row.
The system-time code would need to know how to handle application-time PKs since they are a little different, but that's not hard. And it still is just adding a column (or two if you think they should both be there).
The history table also should not have any FKs, and no FKs should reference it.
Interesting thought that we wouldn't really even need to store the end time. I don't have an opinion about whether the optimization is worth the complexity, but yeah it seems possible.
I like this, but it means people can't filter directly on the columns themselves as you suggest above. Can we detect when they're doing that? Keep in mind it might be happening inside a user-defined function, etc. So perhaps it is safer to always use the UNION.
It seems to break the abstraction to let people query the history table directly. OTOH sometimes it's helpful to see behind the curtain. I could go either way here, but I slightly favor letting people do it.
SYSTEM_TIME seems to allow for DATE values in the start_time and end_time fields,though I cannot imagine how that would ever be practical, unless it were somehowdesirable to reject subsequent updates within a 24 hour timeframe.
I agree that for SYSTEM_TIME it doesn't make much sense to use anything but the smallest time granularity.
The two "big wins" of infinity seemed (to me) to be:
1. the ability to add "AND end_time = 'infinity'" as a cheap way to get current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time" would work.
Yes. OTOH there is equivalent syntax for ranges, e.g. `valid_at @> now()`. But if you had a real PERIOD then that wouldn't be available, since you can't use a PERIOD as an expression. Personally I think that's a shame, and I wonder if PERIODs should be another kind of expression (much like a column value) that evaluates to an equivalent range. Then you'd get all kinds of operators & functions that work with them, you could `SELECT` them, `GROUP BY` them, pass them to functions, etc.
The spec doesn't say anything about using PERIODs in those places, but it *does* have a section on period *predicates*, which seem to be allowed anywhere you can put an expression. The spec's discussion of this is in 4.14.2 ("Operations involving periods") and 8.20 ("<period predicate>"), and says there should be predicates for overlap, equals, contains, precedes, succeeds, immediately precedes, and immediately succeeds. So in the spec, the smallest possible "element" is not a bare PERIOD, but rather these predicates. My patch doesn't include these (it's a lot of new syntax), and no other RDBMS seems to have implemented them. I'm inclined to just treat PERIODs like ranges, or at least maybe let you cast from one to another. (Casting is weird though since if a bare PERIOD isn't a valid expression, what are you casting from/to?)
I should add that using +-Infinity for application-time bounds is completely acceptable under my patch; you just have the option to use NULL instead. So your examples of filtering above are fine. There aren't any operations where we have to set a bounded rangepart to unbounded, so we never pass a NULL; only the user would do that. We do bless NULLs by translating MINVALUE/MAXVALUE to NULL, but that is necessary to support arbitrary types. Even that could be refined so that we use +-Infinity when available but NULL elsewhere. Or we could just drop MINVALUE/MAXVALUE entirely. It's my own addition to make sentinels less arbitrary; it's not in the standard.
One of my design goals was to let people favor ranges over PERIODs if they like. Forcing people to use +-Infinity doesn't completely eliminate that goal, but it does mean your ranges are different than you're used to seeing (`[2020-01-01, Infinity)' vs [2020-01-01,)`. More importantly you can only use {date,ts,tstz}range for application-time periods, not other rangetypes. So I'd prefer to keep NULL bounds *possible*, even if MINVALUE/MAXVALUE aren't giving it a sanction.
This was the alternative method of system versioning I proposed recently in the system versioning thread
1. The regular table remains unchanged, but a pg_class attribute named "relissystemversioned" would be set to true
2. I'm unsure if the standard allows dropping a column from a table while it is system versioned, and the purpose behind system versioning makes me believe the answer is a strong "no" and requiring DROP COLUMN to fail on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'
+1 so far. Behavior of DDL in temporal tables is almost untouched even in the academic literature I've read. (My bibliography mentions a few places that at least mention that it's a hard problem.) Forbidding to drop a column seems pretty harsh---but on the other hand that's just the tip of the iceberg, so failing is probably the practical choice. For example what happens to old rows if you add a NOT NULL constraint? For application-time we can make the user responsible for figuring out the most sensible thing, but for SYSTEM_TIME we have to figure that out ourselves. But what about column type changes, or domains? What about removing an enum option? Or adding a CHECK constraint? With SYSTEM_TIME the user is supposed to be unable to change the history data, so they can't accommodate it to future requirements.
5. The history table will only have rows that are not current, so it is created empty.
6. As such, the table is effectively append-only, in a way that vacuum can actually leverage, and likewise the fill factor of such a table should never be less than 100.
7. The history table could only be updated only via system defined triggers (insert,update,delete, alter to add columns), or row migration similar to that found in partitioning. It seems like this would work as the two tables working as partitions of the same table, but presently we can't have multi-parent partitions.
I don't think they should be sibling partitions, but I do think it would be cool if you could ask for the history table to be partitioned. Mariadb offers a way to do this (see my blog post comparing SQL:2011 implementations). It doesn't have to be in the first patch though, and it's not part of the standard.
8. The history table would be indexed the same as the base table, except that all unique indexes would be made non-unique, and an index of pk + start_time + end_time would be added
Is there any value to indexing both start_time and end_time? Just one already takes you to a single row.
The system-time code would need to know how to handle application-time PKs since they are a little different, but that's not hard. And it still is just adding a column (or two if you think they should both be there).
The history table also should not have any FKs, and no FKs should reference it.
9. The primary key of the base table would remain the existing pk vals, and would basically function normally, with triggers to carry forth changes to the history table. The net effect of this is that the end_time value of all rows in the main table would always be the chosen "current" value (infinity, null, 9999-12-31, etc) and as such might not actually _need_ to be stored.
Interesting thought that we wouldn't really even need to store the end time. I don't have an opinion about whether the optimization is worth the complexity, but yeah it seems possible.
10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table directly with no quals to add.
11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a union of the base table and the history table with quals applied to both.
I like this, but it means people can't filter directly on the columns themselves as you suggest above. Can we detect when they're doing that? Keep in mind it might be happening inside a user-defined function, etc. So perhaps it is safer to always use the UNION.
12. It's a fair question whether the history table would be something that could be queried directly. I'm inclined to say no, because that allows for things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission can be granted or revoked as normal, but all insert/update/delete/truncate options would raise an error.
It seems to break the abstraction to let people query the history table directly. OTOH sometimes it's helpful to see behind the curtain. I could go either way here, but I slightly favor letting people do it.
14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along with the triggers that reference it, setting relissystemversioned = 'f' on the base table.
I like this approach a lot, and I think it's a better design than carrying all the history inside the main table. I also like how bitemporal will Just Work^TM. One is in user-space and the other is controlled by Postgres---which fits the intention.
Yours,
Paul
Here are some new patches rebased on the latest master. I haven't made any substantive changes, but I should have time soon to take a stab at supporting partitioned tables and removing some of my own TODOs (things like making sure I'm locking things correctly). I don't think there is any outstanding feedback other than that. But in the meantime here are some up-to-date patches. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Tue, Nov 16, 2021 at 3:55 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
I haven't made any substantive changes, but I should have time soon to
take a stab at supporting partitioned tables and removing some of my own
TODOs (things like making sure I'm locking things correctly).
Hello,
Here are updated patches. They are rebased and clean up some of my TODOs. Here is what remains:
- Various TODOs asking for advice about concurrency things: where to lock, when to copy structs, etc. I'd appreciate some review on these from someone more experienced than me.
- Supporting FOR PORTION OF against updateable views. I'll keep working on this, but I thought there was enough progress to pass along new patches in the meantime.
- Support partitioned tables. I think this is a medium-size effort, and I'm not sure whether it's really needed for pg 15 or something we can add later. I'm going to do my best to get it done though. (I should have more time for this project now: having a sixth baby recently made side projects challenging for a while, but lately things have been getting easier.) Partitioning could use some design discussion though, both for application time alone and for bitemporal tables (so overlapping with the system time work). Here are some thoughts so far:
- Creating a PERIOD on a partitioned table should automatically create the PERIOD (and associated constraints) on the child tables. This one seems easy and I'll try to get it done soon.
- Sort of related, but not strictly partitioning: CREATE TABLE LIKE should have a new INCLUDING PERIODS option. (I'm tempted to include this under INCLUDING CONSTRAINTS, but I think a separate option is nicer since it gives more control.)
- If you partition by something in the scalar part of the temporal PK, that's easy. I don't think we have to do anything special there. I'd like to add some tests about it though.
- We should allow temporal primary keys on the top-level partitioned table, even though they are essentially exclusion constraints. Whereas in the general case an exclusion constraint cannot prove its validity across all the tables, a temporal PK *can* prove its validity so long the partition key includes at least one scalar part of the temporal PK (so that all records for one "entity" get routed to the same table).
- If you partition by the temporal part of the temporal PK, things are harder. I'm inclined to forbid this, at least for v15. Suppose you partition by the start time. Then you wind up with the same entity spread across several tables, so you can't validate the overall exclusion constraint anymore.
- OTOH you *could* partition by application-time itself (not start time alone nor end time alone) where each partition has application-time ranges/periods that are trimmed to fit within that partition's limits. Then since each partition is responsible for a non-overlapping time period, you could validate the overall exclusion constraint. You'd just have to add some logic to tuple re-routing that could transform single records into multiple records. For example if each partition holds a different year and you INSERT a record that is valid for a decade, you'd have to insert one row into ten partitions, and change the application-time range/period of each row appropriately. This is a special kind of range partitioning. I don't have any ideas how to make hash or list partitioning work on the temporal part of the PK. I don't think we should allow it.
- Partitioning by application time requires no special syntax. Partitioning by system time (if that's desired) would probably require extra (non-standard) syntax. Mariadb has this: https://mariadb.com/kb/en/system-versioned-tables/#storing-the-history-separately Perhaps that is orthogonal to application-time partitioning though. It sounds like people think we should store non-current system time in a separate table (I agree), and in that case I think a bitemporal table that is partitioned by scalar keys or application-time would just have a separate system-time history table for each partition, and that would Just Work. And if we *do* want to partition by system time too, then it would be transparent to the application-time logic.
- Since system time doesn't add anything to your PK (or at least it shouldn't), there is no extra complexity around dealing with exclusion constraints. We should just guarantee that all *current* rows land in the same partition, because for a bitemporal table that's the only one that needs a temporal PK. I guess that means you could partition by end system-time but not start system-time. This would be an exception to the rule that a PK must include the partition keys. Instead we'd say that all current (i.e. non-historical) records stay together (at the system-time level of partitioning).
- I don't think system-time partitioning needs to be in v15. It seems more complicated than ordinary partitioning.
Yours,
Here are updated patches. They are rebased and clean up some of my TODOs. Here is what remains:
- Various TODOs asking for advice about concurrency things: where to lock, when to copy structs, etc. I'd appreciate some review on these from someone more experienced than me.
- Supporting FOR PORTION OF against updateable views. I'll keep working on this, but I thought there was enough progress to pass along new patches in the meantime.
- Support partitioned tables. I think this is a medium-size effort, and I'm not sure whether it's really needed for pg 15 or something we can add later. I'm going to do my best to get it done though. (I should have more time for this project now: having a sixth baby recently made side projects challenging for a while, but lately things have been getting easier.) Partitioning could use some design discussion though, both for application time alone and for bitemporal tables (so overlapping with the system time work). Here are some thoughts so far:
- Creating a PERIOD on a partitioned table should automatically create the PERIOD (and associated constraints) on the child tables. This one seems easy and I'll try to get it done soon.
- Sort of related, but not strictly partitioning: CREATE TABLE LIKE should have a new INCLUDING PERIODS option. (I'm tempted to include this under INCLUDING CONSTRAINTS, but I think a separate option is nicer since it gives more control.)
- If you partition by something in the scalar part of the temporal PK, that's easy. I don't think we have to do anything special there. I'd like to add some tests about it though.
- We should allow temporal primary keys on the top-level partitioned table, even though they are essentially exclusion constraints. Whereas in the general case an exclusion constraint cannot prove its validity across all the tables, a temporal PK *can* prove its validity so long the partition key includes at least one scalar part of the temporal PK (so that all records for one "entity" get routed to the same table).
- If you partition by the temporal part of the temporal PK, things are harder. I'm inclined to forbid this, at least for v15. Suppose you partition by the start time. Then you wind up with the same entity spread across several tables, so you can't validate the overall exclusion constraint anymore.
- OTOH you *could* partition by application-time itself (not start time alone nor end time alone) where each partition has application-time ranges/periods that are trimmed to fit within that partition's limits. Then since each partition is responsible for a non-overlapping time period, you could validate the overall exclusion constraint. You'd just have to add some logic to tuple re-routing that could transform single records into multiple records. For example if each partition holds a different year and you INSERT a record that is valid for a decade, you'd have to insert one row into ten partitions, and change the application-time range/period of each row appropriately. This is a special kind of range partitioning. I don't have any ideas how to make hash or list partitioning work on the temporal part of the PK. I don't think we should allow it.
- Partitioning by application time requires no special syntax. Partitioning by system time (if that's desired) would probably require extra (non-standard) syntax. Mariadb has this: https://mariadb.com/kb/en/system-versioned-tables/#storing-the-history-separately Perhaps that is orthogonal to application-time partitioning though. It sounds like people think we should store non-current system time in a separate table (I agree), and in that case I think a bitemporal table that is partitioned by scalar keys or application-time would just have a separate system-time history table for each partition, and that would Just Work. And if we *do* want to partition by system time too, then it would be transparent to the application-time logic.
- Since system time doesn't add anything to your PK (or at least it shouldn't), there is no extra complexity around dealing with exclusion constraints. We should just guarantee that all *current* rows land in the same partition, because for a bitemporal table that's the only one that needs a temporal PK. I guess that means you could partition by end system-time but not start system-time. This would be an exception to the rule that a PK must include the partition keys. Instead we'd say that all current (i.e. non-historical) records stay together (at the system-time level of partitioning).
- I don't think system-time partitioning needs to be in v15. It seems more complicated than ordinary partitioning.
Yours,
Paul
Attachment
On 21.11.21 02:51, Paul A Jungwirth wrote: > Here are updated patches. They are rebased and clean up some of my > TODOs. This patch set looks very interesting. It's also very big, so it's difficult to see how to get a handle on it. I did a pass through it to see if there were any obvious architectural or coding style problems. I also looked at some of your TODO comments to see if I had something to contribute there. I'm confused about how to query tables based on application time periods. Online, I see examples using AS OF, but in the SQL standard I only see this used for system time, which we are not doing here. What is your understanding of that? v10-0001-Add-PERIODs.patch src/backend/commands/tablecmds.c Might be worth explaining somewhere why AT_PASS_ADD_PERIOD needs to be its own pass. -- Ah, this is explained in ATPrepCmd(). Maybe that is okay, but I would tend to prefer a comprehensive explanation here rather than sprinkled around. make_period_not_backward(): Hardcoding the name of the operator as "<" is not good. You should perhaps lookup the less-than operator in the type cache. Look around for TYPECACHE_LT_OPR for how this is usually done. validate_period(): Could use an explanatory comment. There are a bunch of output arguments, and it's not clear what all of this is supposed to do, and what "validating" is in this context. MergeAttributes(): I would perhaps initially just prohibit inheritance situations that involve periods on either side. (It should work for partitioning, IMO, but that should be easier to arrange.) AlterTableGetLockLevel(): The choice of AccessExclusiveLock looks correct. I think the whole thing can also be grouped with some of the other "affects concurrent SELECTs" cases? Maybe the node type Period could have a slightly more specific name, perhaps PeriodDef, analogous to ColumnDef? I didn't follow why indexes would have periods, for example, the new period field in IndexStmt. Is that explained anywhere? While reading this patch I kept wondering whether it would be possible to fold periods into pg_attribute, perhaps with negative attribute numbers. Have you looked into something like that? No doubt it's also complicated, but it might simplify some things, like the name conflict checking. v10-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch src/backend/catalog/Catalog.pm: I see you use this change in the subsequent patches, but I would recommend skipping all this. The comments added are kind of redundant with the descr fields anyway. transformIndexConstraint(): As above, we can't look up the && operator by name. In this case, I suppose we should look it up through the index AM support operators. Further, the additions to this function are very complicated and not fully explained. I'm suspicious about things like findNewOrOldColumn() -- generally we should look up columns by number not name. Perhaps you can add a header comment or split out the code further into smaller functions. pg_dump.c getIndexes() has been refactored since to make version-specific additions easier. But your patch is now failing to apply because of this. Of course, the main problem in this patch is that for most uses it requires btree_gist. I think we should consider moving that into core, or at least the support for types that are most relevant to this functionality, specifically the date/time types. Aside from user convenience, this would also allow writing more realistic test cases. v10-0003-Add-UPDATE-DELETE-FOR-PORTION-OF.patch Use of MINVALUE and MAXVALUE for unbounded seems problematic to me. (If it is some value, it is not really larger than any value.) We have the keyword UNBOUNDED, which seems better suited. src/backend/access/brin/brin_minmax_multi.c These renaming changes seem unrelated (but still seem like a good idea). Should they be progressed separately? Again, some hardcoded operator name lookup in this patch. I don't understand why a temporal primary key is required for doing UPDATE FOR PORTION OF. I don't see this in the standard. v10-0004-Add-temporal-FOREIGN-KEYs.patch Do we really need different trigger names depending on whether the foreign key is temporal? range_as_string() doesn't appear to be used anywhere. I ran out of steam on this patch, it's very big. But it seems sound in general. How to proceed. I suppose we could focus on committing 0001 and 0002 first. That would be a sensible feature set even if the remaining patches did not make a release. I do feel we need to get btree_gist into core. That might be a big job by itself. I'm also bemused why btree_gist is so bloated compared to btree_gin. btree_gin uses macros to eliminate duplicate code where btree_gist is full of copy-and-paste. So there are some opportunities there to make things more compact. Is there anything else you think we can do as preparatory work to make the main patches more manageable?
On Wed, Jan 5, 2022 at 11:07 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 21.11.21 02:51, Paul A Jungwirth wrote:
> Here are updated patches. They are rebased and clean up some of my
> TODOs.
This patch set looks very interesting. It's also very big, so it's
difficult to see how to get a handle on it. I did a pass through it
to see if there were any obvious architectural or coding style
problems. I also looked at some of your TODO comments to see if I had
something to contribute there.
I'm confused about how to query tables based on application time
periods. Online, I see examples using AS OF, but in the SQL standard
I only see this used for system time, which we are not doing here.
What is your understanding of that?
Paul has previously supplied me with this document https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf and that formed the basis of a lot of my questions a few months earlier.
There was similar work being done for system periods, which are a bit simpler but require a side (history) table to be created. I was picking people's brains about some aspects of system versioning to see if I could help bringing that into this already very large patchset, but haven't yet felt like I had done enough research to post it.
It is my hope that we can at least get the syntax for both application and system versioning committed, even if it's just stubbed in with not-yet-supported errors.
On Wed, Jan 5, 2022 at 8:07 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > This patch set looks very interesting. Thank you for the review! I'll work on your feedback but in the meantime here are replies to your questions: > I'm confused about how to query tables based on application time > periods. Online, I see examples using AS OF, but in the SQL standard > I only see this used for system time, which we are not doing here. Correct, the standard only gives it for system time. I think application time is intended to be more "in user space" so it's fine to use regular operators in your WHERE condition against the time columns, whereas system time is more of a managed thing---automatic, read-only, possibly stored in a separate table. Having a special syntax cue lets the RDBMS know it needs to involve the historical records. > validate_period(): Could use an explanatory comment. There are a > bunch of output arguments, and it's not clear what all of this is > supposed to do, and what "validating" is in this context. I'm not too happy with that function, but a previous reviewer asked me to factor out what was shared between the CREATE TABLE and ALTER TABLE cases. It does some sanity checks on the columns you've chosen, and along the way it collects info about those columns that we'll need later. But yeah all those out parameters are pretty ugly. I'll see if I can come up with a stronger abstraction for it, and at the very least I'll add some comments. > MergeAttributes(): I would perhaps initially just prohibit inheritance > situations that involve periods on either side. (It should work for > partitioning, IMO, but that should be easier to arrange.) Okay. I'm glad to hear you think partitioning won't be too hard. It is one of the last things, but to me it's a bit intimidating. > I didn't follow why indexes would have periods, for example, the new > period field in IndexStmt. Is that explained anywhere? When you create a primary key or a unique constraint (which are backed by a unique index), you can give a period name to make it a temporal constraint. We create the index first and then create the constraint as a side-effect of that (e.g. index_create calls index_constraint_create). The analysis phase generates an IndexStmt. So I think this was mostly a way to pass the period info down to the constraint. It probably doesn't actually need to be stored on pg_index though. Maybe it does for index_concurrently_create_copy. I'll add some comments, but if you think it's the wrong approach let me know. > While reading this patch I kept wondering whether it would be possible > to fold periods into pg_attribute, perhaps with negative attribute > numbers. Have you looked into something like that? No doubt it's > also complicated, but it might simplify some things, like the name > conflict checking. Hmm, I thought that sort of thing would be frowned upon. :-) But also it seems like periods really do have a bunch of details they need beyond what other attributes have (e.g. the two source attributes, the matching range type, the period type (application-vs-system), maybe some extra things for table inheritance. Also are you sure we aren't already using negative attnums somewhere already? I thought I saw something like that. > Of course, the main problem in this patch is that for most uses it > requires btree_gist. I think we should consider moving that into > core, or at least the support for types that are most relevant to this > functionality, specifically the date/time types. Aside from user > convenience, this would also allow writing more realistic test cases. I think this would be great too. How realistic do you think it is? I figured since exclusion constraints are also pretty useless without btree_gist, it wasn't asking too much to have people install the extension, but still it'd be better if it were all built in. > src/backend/access/brin/brin_minmax_multi.c > > These renaming changes seem unrelated (but still seem like a good > idea). Should they be progressed separately? I can pull this out into a separate patch. I needed to do it because when I added an `#include <rangetypes.h>` somewhere, these conflicted with the range_{de,}serialize functions declared there. > I don't understand why a temporal primary key is required for doing > UPDATE FOR PORTION OF. I don't see this in the standard. You're right, it's not in the standard. I'm doing that because creating the PK is when we add the triggers to implement UPDATE FOR PORTION OF. I thought it was acceptable since we also require a PK/unique constraint as the referent of a foreign key. But we could avoid it if I went back to the executor-based FOR PORTION OF implementation, since that doesn't depend on triggers. What do you think? Also: I noticed recently that you can't use FOR PORTION OF against an updatable view. I'm working on a new patch set to fix that. But the main reason is this PK check. So that's maybe another reason to go back to the executor implementation. > How to proceed. I suppose we could focus on committing 0001 and 0002 > first. That would be great! I don't think either is likely to conflict with future system-time work. > Is there anything else you think we can do as > preparatory work to make the main patches more manageable? I think it would be smart to have a rough plan for how this work will be compatible with system-time support. Corey & I have talked about that a lot, and In general they are orthogonal, but it would be nice to have details written down somewhere. Yours, Paul
On 1/5/22 11:03 PM, Corey Huinker wrote: > > There was similar work being done for system periods, which are a bit > simpler but require a side (history) table to be created. This is false. SYSTEM_TIME periods do not need any kind of history. This was one of the problems I had with Surafel's attempt because it was confusing the period with SYSTEM VERSIONING. Versioning needs the period but the inverse is not true. -- Vik Fearing
On Thu, Jan 6, 2022 at 6:45 AM Vik Fearing <vik@postgresfriends.org> wrote: > > On 1/5/22 11:03 PM, Corey Huinker wrote: > > > > There was similar work being done for system periods, which are a bit > > simpler but require a side (history) table to be created. > > This is false. SYSTEM_TIME periods do not need any kind of history. > This was one of the problems I had with Surafel's attempt because it was > confusing the period with SYSTEM VERSIONING. Versioning needs the > period but the inverse is not true. This is an interesting point. Syntactically, there are three different things: the generated started/end columns, the period declaration, and the WITH SYSTEM VERSIONING modifier to the table. You could declare a system period without making the table versioned. Practically speaking I don't know why you'd ever create a system period without a versioned table (do you know of any uses Vik?), but perhaps we can exploit the separation to add system periods in the same patch that adds application periods. The first two bits of syntax *are* tied together: you need columns with GENERATED ALWAYS AS ROW START/END to declare the system period, and less intuitively the standard says you can't use AS ROW START/END unless those columns appear in a system period (2.e.v.2 under Part 2: Foundation, 11.3 <table definition>). Personally I'd be willing to ignore that latter requirement. For one thing, what does Postgres do with the columns if you drop the period? Dropping the columns altogether seems very harsh, so I guess you'd just remove the GENERATED clause. Another weird thing is that you don't (can't) say STORED for those columns. But they are certainly stored somewhere. I would store the values just like any other column (even if non-current rows get moved to a separate table). Also then you don't have to do anything extra when the GENERATED clause is dropped. If we wanted to support system-time periods without building all of system versioning, what would that look like? At first I thought it would be a trivial addition to part-1 of the patch here, but the more I think about it the more it seems to deserve its own patch. One rule I think we should follow is that using a non-system-versioned table (with a system period) should get you to the same place as using a system-versioned table and then removing the system versioning. But the standard says that dropping system versioning should automatically drop all historical records (2 under Part 2: Foundation, 11.30 <drop system versioning clause>). That actually makes sense though: when you do DML we automatically update the start/end columns, but we don't save copies of the previous data (and incidentally the end column will always be the max value.) So there is a use case, albeit a thin one: you get a Rails-like updated_at column that is maintained automatically by your RDBMS. That is pretty easy, but I think I'd still break it out into a separate patch. I'm happy to work on that as something that builds on top of my part-1 patch here. Yours, Paul
But
the standard says that dropping system versioning should automatically
drop all historical records (2 under Part 2: Foundation, 11.30 <drop
system versioning clause>). That actually makes sense though: when you
do DML we automatically update the start/end columns, but we don't
save copies of the previous data (and incidentally the end column will
always be the max value.)
This is what I was referring to when I mentioned a side-table. deleting history would be an O(1) operation. Any other misunderstandings are all mine.
On 06.01.22 06:44, Paul A Jungwirth wrote: >> I didn't follow why indexes would have periods, for example, the new >> period field in IndexStmt. Is that explained anywhere? > > When you create a primary key or a unique constraint (which are backed > by a unique index), you can give a period name to make it a temporal > constraint. We create the index first and then create the constraint > as a side-effect of that (e.g. index_create calls > index_constraint_create). The analysis phase generates an IndexStmt. > So I think this was mostly a way to pass the period info down to the > constraint. It probably doesn't actually need to be stored on pg_index > though. Maybe it does for index_concurrently_create_copy. I'll add > some comments, but if you think it's the wrong approach let me know. This seems backwards. Currently, when you create a constraint, the index is created as a side effect and is owned, so to speak, by the constraint. What you are describing here sounds like the index owns the constraint. This needs to be reconsidered, I think. >> Of course, the main problem in this patch is that for most uses it >> requires btree_gist. I think we should consider moving that into >> core, or at least the support for types that are most relevant to this >> functionality, specifically the date/time types. Aside from user >> convenience, this would also allow writing more realistic test cases. > > I think this would be great too. How realistic do you think it is? I > figured since exclusion constraints are also pretty useless without > btree_gist, it wasn't asking too much to have people install the > extension, but still it'd be better if it were all built in. IMO, if this temporal feature is to happen, btree_gist needs to be moved into core first. Having to install an extension in order to use an in-core feature like this isn't going to be an acceptable experience. >> src/backend/access/brin/brin_minmax_multi.c >> >> These renaming changes seem unrelated (but still seem like a good >> idea). Should they be progressed separately? > > I can pull this out into a separate patch. I needed to do it because > when I added an `#include <rangetypes.h>` somewhere, these conflicted > with the range_{de,}serialize functions declared there. OK, I have committed this separately. >> I don't understand why a temporal primary key is required for doing >> UPDATE FOR PORTION OF. I don't see this in the standard. > > You're right, it's not in the standard. I'm doing that because > creating the PK is when we add the triggers to implement UPDATE FOR > PORTION OF. I thought it was acceptable since we also require a > PK/unique constraint as the referent of a foreign key. That part *is* in the standard. > But we could > avoid it if I went back to the executor-based FOR PORTION OF > implementation, since that doesn't depend on triggers. What do you > think? I think it's worth trying to do this without triggers. But if you are just looking for a way to create the triggers, why are they not just created when the table is created? > I think it would be smart to have a rough plan for how this work will > be compatible with system-time support. Corey & I have talked about > that a lot, and In general they are orthogonal, but it would be nice > to have details written down somewhere. I personally don't see why we need to worry about system time now. System time seems quite a complicated feature, since you have to figure out a system to store and clean the old data, whereas this application time feature is ultimately mostly syntax sugar around ranges and exclusion constraints. As long as we keep the standard syntax for system time available for future use (which is what your patch does), I don't see a need to go deeper right now.
Hi, On Sat, Nov 20, 2021 at 05:51:16PM -0800, Paul A Jungwirth wrote: > > Here are updated patches. They are rebased and clean up some of my TODOs. The cfbot reports that the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_2048.log === Applying patches on top of PostgreSQL commit ID 5513dc6a304d8bda114004a3b906cc6fde5d6274 === === applying patch ./v10-0001-Add-PERIODs.patch patching file src/backend/commands/tablecmds.c Hunk #1 FAILED at 40. [...] 1 out of 21 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej patching file src/bin/pg_dump/pg_dump.c Hunk #1 succeeded at 5906 with fuzz 2 (offset -454 lines). Hunk #2 FAILED at 6425. Hunk #3 succeeded at 6121 with fuzz 2 (offset -566 lines). Hunk #4 succeeded at 6203 (offset -561 lines). Hunk #5 succeeded at 8015 with fuzz 2 (offset -539 lines). Hunk #6 FAILED at 8862. Hunk #7 FAILED at 8875. Hunk #8 FAILED at 8917. [...] 4 out of 15 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej patching file src/bin/pg_dump/pg_dump.h Hunk #2 FAILED at 284. Hunk #3 FAILED at 329. Hunk #4 succeeded at 484 (offset 15 lines). 2 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.h.rej I also see that there were multiple reviews with unanswered comments, so I will switch the cf entry to Waiting on Author.
On 10.01.22 09:53, Peter Eisentraut wrote: >>> Of course, the main problem in this patch is that for most uses it >>> requires btree_gist. I think we should consider moving that into >>> core, or at least the support for types that are most relevant to this >>> functionality, specifically the date/time types. Aside from user >>> convenience, this would also allow writing more realistic test cases. >> >> I think this would be great too. How realistic do you think it is? I >> figured since exclusion constraints are also pretty useless without >> btree_gist, it wasn't asking too much to have people install the >> extension, but still it'd be better if it were all built in. > > IMO, if this temporal feature is to happen, btree_gist needs to be moved > into core first. Having to install an extension in order to use an > in-core feature like this isn't going to be an acceptable experience. I have started a separate thread about this question.
Hello, Thank you again for the review. Here is a patch with most of your feedback addressed. Sorry it has taken so long! These patches are rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e (May 3). The big change is switching from implementing FOR PORTION OF with triggers back to an executor node implementation. I think this is a lot simpler and means we don't have to be so "premeditated" (for example you just need a PERIOD/range, not a temporal PK). I've also made some progress on partitioning temporal tables. It still needs some work though, and also it depends on my separate commitfest entry (https://commitfest.postgresql.org/43/4065/). So I've left it out of the patches attached here. A few more details below: Back in January 2022, Peter Eisentraut wrote: > make_period_not_backward(): Hardcoding the name of the operator as "<" > is not good. You should perhaps lookup the less-than operator in the > type cache. Look around for TYPECACHE_LT_OPR for how this is usually done. > ... > transformIndexConstraint(): As above, we can't look up the && operator > by name. In this case, I suppose we should look it up through the > index AM support operators. I've changed most locations to look up the operators we need using strategy number. But in some places I need the range intersects operator (`*`) and we don't have a strategy number for that. I don't really understand the purpose of not hardcoding operator names here. Can you give me the reasons for that? Do you have any suggestions what I can do to use `*`? Also, when I'm doing these operator lookups, do I need permission checks similar to what I see in ComputeIndexAttrs? > Further, the additions to this function are very complicated and not > fully explained. I'm suspicious about things like > findNewOrOldColumn() -- generally we should look up columns by number > not name. Perhaps you can add a header comment or split out the code > further into smaller functions. I still have some work to do on this. I agree it's very complicated, so I'm going to see what kind of refactoring I can do. >>> I didn't follow why indexes would have periods, for example, the new >>> period field in IndexStmt. Is that explained anywhere? >> >> When you create a primary key or a unique constraint (which are backed >> by a unique index), you can give a period name to make it a temporal >> constraint. We create the index first and then create the constraint >> as a side-effect of that (e.g. index_create calls >> index_constraint_create). The analysis phase generates an IndexStmt. >> So I think this was mostly a way to pass the period info down to the >> constraint. It probably doesn't actually need to be stored on pg_index >> though. Maybe it does for index_concurrently_create_copy. I'll add >> some comments, but if you think it's the wrong approach let me know. > > This seems backwards. Currently, when you create a constraint, the index is created as a side effect and is owned, soto speak, by the constraint. What you are describing here sounds like the index owns the constraint. This needs to bereconsidered, I think. After looking at this again I do think to reference the period from the index, not vice versa. The period is basically one of the index elements (e.g. `PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)`). You can define a `PERIOD` without an index, but you can't define a WITHOUT OVERLAPS index without a period. In addition you could have multiple indexes using the same period (though this is probably unusual and technically disallowed by the standard, although in principal you could do it), but not multiple periods within the same index. I understand what you're saying about how constraints cause indexes as a by-product, but here the constraint isn't the PERIOD; it's the PRIMARY KEY or UNIQUE constraint. The PERIOD is just something the constraint & index refer to (like an expression indexElem). The dependency direction also suggests the period should be referenced by the index: you can drop the index without dropping the period, but dropping the period would cascade to dropping the index (or fail). I hope that makes sense. But let me know if you still disagree. > Do we really need different trigger names depending on whether the > foreign key is temporal? They don't have to be different. I used separate C functions because I didn't want standard FKs to be slowed/complicated by the temporal ones, and also I wanted to avoid merge conflicts with the work on avoiding SPI in RI checks. But you're just asking about the trigger names, right? I haven't changed those yet but it shouldn't take long. > IMO, if this temporal feature is to happen, btree_gist needs to be moved > into core first. Having to install an extension in order to use an > in-core feature like this isn't going to be an acceptable experience. As far as I can tell the conversation about moving this into core hasn't gone anywhere. Do you still think this is a prerequisite to this patch? Is there anything I can do to help move `btree_gist` forward? It seems like a large backwards compatibility challenge. I imagine that getting agreement on how to approach it is actually more work than doing the development. I'd be very happy for any suggestions here! Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 03.05.23 23:02, Paul Jungwirth wrote: > Thank you again for the review. Here is a patch with most of your > feedback addressed. Sorry it has taken so long! These patches are > rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e > (May 3). Here are a few small fixup patches to get your patch set compiling cleanly. Also, it looks like the patches 0002, 0003, and 0004 are not split up correctly. 0002 contains tests using the FOR PORTION OF syntax introduced in 0003, and 0003 uses the function build_period_range() from 0004.
Attachment
> On 8 May 2023, at 09:10, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 03.05.23 23:02, Paul Jungwirth wrote: >> Thank you again for the review. Here is a patch with most of your feedback addressed. Sorry it has taken so long! Thesepatches are rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e >> (May 3). > > Here are a few small fixup patches to get your patch set compiling cleanly. > > Also, it looks like the patches 0002, 0003, and 0004 are not split up correctly. 0002 contains tests using the FOR PORTIONOF syntax introduced in 0003, and 0003 uses the function build_period_range() from 0004. These patches no longer apply without a new rebase. Should this patch be closed in while waiting for the prequisite of adding btree_gist to core mentioned upthread? I see no patch registered in the CF for this unless I'm missing sometihng. -- Daniel Gustafsson
On 04.07.23 14:48, Daniel Gustafsson wrote: >> On 8 May 2023, at 09:10, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: >> >> On 03.05.23 23:02, Paul Jungwirth wrote: >>> Thank you again for the review. Here is a patch with most of your feedback addressed. Sorry it has taken so long! Thesepatches are rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e >>> (May 3). >> >> Here are a few small fixup patches to get your patch set compiling cleanly. >> >> Also, it looks like the patches 0002, 0003, and 0004 are not split up correctly. 0002 contains tests using the FOR PORTIONOF syntax introduced in 0003, and 0003 uses the function build_period_range() from 0004. > > These patches no longer apply without a new rebase. Should this patch be > closed in while waiting for the prequisite of adding btree_gist to core > mentioned upthread? I see no patch registered in the CF for this unless I'm > missing sometihng. I had talked to Paul about this offline a while ago. btree_gist to core is no longer considered a prerequisite. But Paul was planning to produce a new patch set that is arranged and sequenced a bit differently. Apparently, that new version is not done yet, so it would make sense to either close this entry as returned with feedback, or move it to the next commit fest as waiting on author.
> On 6 Jul 2023, at 10:12, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > it would make sense to either close this entry as returned with feedback, or move it to the next commit fest as waitingon author. Fair enough, done. -- Daniel Gustafsson
On Thu, Jul 6, 2023 at 1:13 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > I had talked to Paul about this offline a while ago. btree_gist to core > is no longer considered a prerequisite. But Paul was planning to > produce a new patch set that is arranged and sequenced a bit > differently. Apparently, that new version is not done yet, so it would > make sense to either close this entry as returned with feedback, or move > it to the next commit fest as waiting on author. Here are some new patch files based on discussions from PGCon. The patches are reorganized a bit to hopefully make them easier to review: Initially I implement all functionality on just range columns, without supporting PERIODs yet. There are patches for temporal PRIMARY KEY/UNIQUE constraints, for simple foreign keys (without CASCADE/SET NULL/SET DEFAULT), for UPDATE/DELETE FOR PORTION OF, and then for the rest of the FK support (which depends on FOR PORTION OF). If you compare these patches to the v11 ones, you'll see that a ton of clutter disappears by not supporting PERIODs as a separate "thing". Finally there is a patch adding PERIOD syntax, but with a new implementation where a PERIOD causes us to just define a GENERATED range column. That means we can support all the same things as before but without adding the clutter. This patch isn't quite working yet (especially ALTER TABLE), but I thought I'd send where I'm at so far, since it sounds like folks are interested in doing a review. Also it was a little tricky dealing with the dependency between the PERIOD and the GENERATED column. (See the comments in the patch.) If anyone has a suggestion there I'd be happy to hear it. My goal is to include another patch soon to support hidden columns, so that the period's GENERATED column can be hidden. I read the conversation about a recent patch attempt for something similar, and I think I can use most of that (but cut some of the things the community was worried about). All these patches need some polishing, but I think there is enough new here for them to be worth reading for anyone interested in temporal progress. I'll set this commitfest entry back to Needs Review. Thanks for taking a look! Paul
Attachment
On 07.07.23 03:03, Paul A Jungwirth wrote: > Here are some new patch files based on discussions from PGCon. Here are a few fixup patches to get things building without warnings and errors. The last patch (your 0005) fails the regression test for me and it didn't appear to be a trivial problem, so please take another look at that sometime. (Since it's the last patch, it's obviously lower priority.)
Attachment
On Fri, Jul 7, 2023 at 9:04 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > > On Thu, Jul 6, 2023 at 1:13 AM Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: > > > > I had talked to Paul about this offline a while ago. btree_gist to core > > is no longer considered a prerequisite. But Paul was planning to > > produce a new patch set that is arranged and sequenced a bit > > differently. Apparently, that new version is not done yet, so it would > > make sense to either close this entry as returned with feedback, or move > > it to the next commit fest as waiting on author. > > Here are some new patch files based on discussions from PGCon. The > patches are reorganized a bit to hopefully make them easier to review: > > Initially I implement all functionality on just range columns, without > supporting PERIODs yet. There are patches for temporal PRIMARY > KEY/UNIQUE constraints, for simple foreign keys (without CASCADE/SET > NULL/SET DEFAULT), for UPDATE/DELETE FOR PORTION OF, and then for the > rest of the FK support (which depends on FOR PORTION OF). If you > compare these patches to the v11 ones, you'll see that a ton of > clutter disappears by not supporting PERIODs as a separate "thing". > > Finally there is a patch adding PERIOD syntax, but with a new > implementation where a PERIOD causes us to just define a GENERATED > range column. That means we can support all the same things as before > but without adding the clutter. This patch isn't quite working yet > (especially ALTER TABLE), but I thought I'd send where I'm at so far, > since it sounds like folks are interested in doing a review. Also it > was a little tricky dealing with the dependency between the PERIOD and > the GENERATED column. (See the comments in the patch.) If anyone has a > suggestion there I'd be happy to hear it. > > My goal is to include another patch soon to support hidden columns, so > that the period's GENERATED column can be hidden. I read the > conversation about a recent patch attempt for something similar, and I > think I can use most of that (but cut some of the things the community > was worried about). > > All these patches need some polishing, but I think there is enough new > here for them to be worth reading for anyone interested in temporal > progress. > > I'll set this commitfest entry back to Needs Review. Thanks for taking a look! > > Paul due to change in: https://www.postgresql.org/message-id/flat/ec8b1d9b-502e-d1f8-e909-1bf9dffe6fa5@illuminatedcomputing.com git apply $DOWNLOADS/patches/v12-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch error: patch failed: src/backend/commands/indexcmds.c:940 error: src/backend/commands/indexcmds.c: patch does not apply probably need some adjustment.
On 7/12/23 01:24, Peter Eisentraut wrote: > On 07.07.23 03:03, Paul A Jungwirth wrote: >> Here are some new patch files based on discussions from PGCon. > > Here are a few fixup patches to get things building without warnings and > errors. > > The last patch (your 0005) fails the regression test for me and it > didn't appear to be a trivial problem, so please take another look at > that sometime. (Since it's the last patch, it's obviously lower priority.) Hello, Here are the latest patches for my temporal tables work. They are rebased on e8d74ad6 from Aug 31. I incorporated Peter's edits mentioned above and have made various other improvements. The most significant change is support for partitioned tables. Those now work with temporal PRIMARY KEY and UNIQUE constraints, FOR PORTION OF commands, and FOREIGN KEYs. I've tried to clean up the first four patches to get them ready for committing, since they could get committed before the PERIOD patch. I think there is a little more cleanup needed but they should be ready for a review. The PERIOD patch is not finished and includes some deliberately-failing tests. I did make some progress here finishing ALTER TABLE ADD PERIOD. I could use help handling the INTERNAL depenency from the PERIOD to its (hidden) GENERATED column. The problem is in findDependentObjects: if you drop the PERIOD, then Postgres automatically tries to drop the column (correctly), but then it seems to think it needs to drop the whole table. I think this is because a column's object address is the table's object address plus a subaddress equaling the attno, and findDependentObjects thinks it should drop the whole thing. I'm sure I can sort this out, but if anyone has a suggestion it might save me some time. Thanks for taking a look! Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
The PERIOD patch is not finished and includes some deliberately-failing
tests. I did make some progress here finishing ALTER TABLE ADD PERIOD.
If it's ok with you, I need PERIODs for System Versioning, and planned on developing a highly similar version, albeit closer to the standard. It shouldn't interfere with your work as you're heavily leveraging range types.
On 31.08.23 23:26, Paul Jungwirth wrote: > I've tried to clean up the first four patches to get them ready for > committing, since they could get committed before the PERIOD patch. I > think there is a little more cleanup needed but they should be ready for > a review. Looking at the patch 0001 "Add temporal PRIMARY KEY and UNIQUE constraints": Generally, this looks like a good direction. The patch looks comprehensive, with documentation and tests, and appears to cover all the required pieces (client programs, ruleutils, etc.). I have two conceptual questions that should be clarified before we go much further: 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT OVERLAPS clause attach to the last column, or to the whole column list? In the SQL standard, you can only have one period and it has to be listed last, so this question does not arise. But here we are building a more general facility to then build the SQL facility on top of. So I think it doesn't make sense that the range column must be last or that there can only be one. Also, your implementation requires at least one non-overlaps column, which also seems like a confusing restriction. I think the WITHOUT OVERLAPS clause should be per-column, so that something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would be possible. Then the WITHOUT OVERLAPS clause would directly correspond to the choice between equality or overlaps operator per column. An alternative interpretation would be that WITHOUT OVERLAPS applies to the whole column list, and we would take it to mean, for any range column, use the overlaps operator, for any non-range column, use the equals operator. But I think this would be confusing and would prevent the case of using the equality operator for some ranges and the overlaps operator for some other ranges in the same key. 2) The logic hinges on get_index_attr_temporal_operator(), to pick the equality and overlaps operator for each column. For btree indexes, the strategy numbers are fixed, so this is straightforward. But for gist indexes, the strategy numbers are more like recommendations. Are we comfortable with how this works? I mean, we could say, if you want to be able to take advantage of the WITHOUT OVERLAPS syntax, you have to use these numbers, otherwise you're on your own. It looks like the gist strategy numbers are already hardcoded in a number of places, so maybe that's all okay, but I feel we should be more explicit about this somewhere, maybe in the documentation, or at least in code comments. Besides that, some stylistic comments: * There is a lot of talk about "temporal" in this patch, but this functionality is more general than temporal. I would prefer to change this to more neutral terms like "overlaps". * The field ii_Temporal in IndexInfo doesn't seem necessary and could be handled via local variables. See [0] for a similar discussion: [0]: https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org * In gram.y, change withoutOverlapsClause -> without_overlaps_clause for consistency with the surrounding code. * No-op assignments like n->without_overlaps = NULL; can be omitted. (Or you should put them everywhere. But only in some places seems inconsistent and confusing.)
On 9/1/23 11:30, Peter Eisentraut wrote: > 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT > OVERLAPS clause attach to the last column, or to the whole column list? > In the SQL standard, you can only have one period and it has to be > listed last, so this question does not arise. But here we are building > a more general facility to then build the SQL facility on top of. So I > think it doesn't make sense that the range column must be last or that > there can only be one. Also, your implementation requires at least one > non-overlaps column, which also seems like a confusing restriction. > > I think the WITHOUT OVERLAPS clause should be per-column, so that > something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would > be possible. Then the WITHOUT OVERLAPS clause would directly correspond > to the choice between equality or overlaps operator per column. > > An alternative interpretation would be that WITHOUT OVERLAPS applies to > the whole column list, and we would take it to mean, for any range > column, use the overlaps operator, for any non-range column, use the > equals operator. But I think this would be confusing and would prevent > the case of using the equality operator for some ranges and the overlaps > operator for some other ranges in the same key. I prefer the first option. That is: WITHOUT OVERLAPS applies only to the column or expression it is attached to, and need not be last in line. -- Vik Fearing
On 9/1/23 03:50, Vik Fearing wrote: > On 9/1/23 11:30, Peter Eisentraut wrote: >> 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT >> OVERLAPS clause attach to the last column, or to the whole column >> list? In the SQL standard, you can only have one period and it has to >> be listed last, so this question does not arise. But here we are >> building a more general facility to then build the SQL facility on top >> of. So I think it doesn't make sense that the range column must be >> last or that there can only be one. Also, your implementation >> requires at least one non-overlaps column, which also seems like a >> confusing restriction. >> >> I think the WITHOUT OVERLAPS clause should be per-column, so that >> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) >> would be possible. Then the WITHOUT OVERLAPS clause would directly >> correspond to the choice between equality or overlaps operator per >> column. >> >> An alternative interpretation would be that WITHOUT OVERLAPS applies >> to the whole column list, and we would take it to mean, for any range >> column, use the overlaps operator, for any non-range column, use the >> equals operator. But I think this would be confusing and would >> prevent the case of using the equality operator for some ranges and >> the overlaps operator for some other ranges in the same key. > > I prefer the first option. That is: WITHOUT OVERLAPS applies only to > the column or expression it is attached to, and need not be last in line. I agree. The second option seems confusing and is more restrictive. I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any position) is a great recommendation that enables a lot of new functionality. Several books[1,2] about temporal databases describe a multi-dimensional temporal space (even beyond application time vs. system time), and the standard is pretty disappointing here. It's not a weird idea. But I just want to be explicit that this isn't something the standard describes. (I think everyone in the conversation so far understands that.) So far I've tried to be pretty scrupulous about following SQL:2011, although personally I'd rather see Postgres support this functionality. And it's not like it goes *against* what the standard says. But if there are any objections, I'd love to hear them before putting in the work. :-) If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE constraints, then surely we also allow multiple+anywhere PERIOD in FOREIGN KEY constraints too. (I guess the standard switched keywords because a FK is more like "MUST OVERLAPS". :-) Also if you have multiple application-time dimensions we probably need to allow multiple FOR PORTION OF clauses. I think the syntax would be: UPDATE t FOR PORTION OF valid_at FROM ... TO ... FOR PORTION OF asserted_at FROM ... TO ... [...] SET foo = bar Does that sound okay? I don't quite understand this part: >> Also, your implementation >> requires at least one non-overlaps column, which also seems like a >> confusing restriction. That's just a regular non-temporal constraint. Right? If I'm missing something let me know. [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014. -- Paul ~{:-) pj@illuminatedcomputing.com
On 9/1/23 21:56, Paul Jungwirth wrote: > On 9/1/23 03:50, Vik Fearing wrote: >> On 9/1/23 11:30, Peter Eisentraut wrote: >>> 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT >>> OVERLAPS clause attach to the last column, or to the whole column >>> list? In the SQL standard, you can only have one period and it has to >>> be listed last, so this question does not arise. But here we are >>> building a more general facility to then build the SQL facility on >>> top of. So I think it doesn't make sense that the range column must >>> be last or that there can only be one. Also, your implementation >>> requires at least one non-overlaps column, which also seems like a >>> confusing restriction. >>> >>> I think the WITHOUT OVERLAPS clause should be per-column, so that >>> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) >>> would be possible. Then the WITHOUT OVERLAPS clause would directly >>> correspond to the choice between equality or overlaps operator per >>> column. >>> >>> An alternative interpretation would be that WITHOUT OVERLAPS applies >>> to the whole column list, and we would take it to mean, for any range >>> column, use the overlaps operator, for any non-range column, use the >>> equals operator. But I think this would be confusing and would >>> prevent the case of using the equality operator for some ranges and >>> the overlaps operator for some other ranges in the same key. >> >> I prefer the first option. That is: WITHOUT OVERLAPS applies only to >> the column or expression it is attached to, and need not be last in line. > > I agree. The second option seems confusing and is more restrictive. > > I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any > position) is a great recommendation that enables a lot of new > functionality. Several books[1,2] about temporal databases describe a > multi-dimensional temporal space (even beyond application time vs. > system time), and the standard is pretty disappointing here. It's not a > weird idea. > > But I just want to be explicit that this isn't something the standard > describes. (I think everyone in the conversation so far understands > that.) So far I've tried to be pretty scrupulous about following > SQL:2011, although personally I'd rather see Postgres support this > functionality. And it's not like it goes *against* what the standard > says. But if there are any objections, I'd love to hear them before > putting in the work. :-) I have no problem with a first version doing exactly what the standard says and expanding it later. > If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE > constraints, then surely we also allow multiple+anywhere PERIOD in > FOREIGN KEY constraints too. (I guess the standard switched keywords > because a FK is more like "MUST OVERLAPS". :-) Seems reasonable. > Also if you have multiple application-time dimensions we probably need > to allow multiple FOR PORTION OF clauses. I think the syntax would be: > > UPDATE t > FOR PORTION OF valid_at FROM ... TO ... > FOR PORTION OF asserted_at FROM ... TO ... > [...] > SET foo = bar > > Does that sound okay? That sounds really cool. > [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational > Theory, Second Edition: Temporal Databases in the Relational Model and > SQL. 2nd edition, 2014. > [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014. Thanks! I have ordered these books. -- Vik Fearing
On Sat, Sep 2, 2023 at 5:58 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > > I don't quite understand this part: > > >> Also, your implementation > >> requires at least one non-overlaps column, which also seems like a > >> confusing restriction. > > That's just a regular non-temporal constraint. Right? If I'm missing > something let me know. > for a range primary key, is it fine to expect it to be unique, not null and also not overlap? (i am not sure how hard to implement it). ----------------------------------------------------------------- quote from 7IWD2-02-Foundation-2011-12.pdf. 4.18.3.2 Unique constraints, page 97 of 1483. 4.18.3.2 Unique constraints In addition to the components of every table constraint descriptor, a unique constraint descriptor includes: — An indication of whether it was defined with PRIMARY KEY or UNIQUE. — The names and positions of the unique columns specified in the <unique column list> — If <without overlap specification> is specified, then the name of the period specified. If the table descriptor for base table T includes a unique constraint descriptor indicating that the unique constraint was defined with PRIMARY KEY, then the columns of that unique constraint constitute the primary key of T. A table that has a primary key cannot have a proper supertable. A unique constraint that does not include a <without overlap specification> on a table T is satisfied if and only if there do not exist two rows R1 and R2 of T such that R1 and R2 have the same non-null values in the unique columns. If a unique constraint UC on a table T includes a <without overlap specification> WOS, then let <application time period name> ATPN be the contained in WOS. UC is satisfied if and only if there do not exist two rows R1 and R2 of T such that R1 and R2 have the same non-null values in the unique columns and the ATPN period values of R1 and R2 overlap. In addition, if the unique constraint was defined with PRIMARY KEY, then it requires that none of the values in the specified column or columns be a null value. ----------------------------------------------------------------- based on the above, the unique constraint does not specify that the column list must be range type. UNIQUE (a, c WITHOUT OVERLAPS). Here column "a" can be a range type (that have overlap property) and can be not. In fact, many of your primary key, foreign key regess test using something like '[11,11]' (which make it more easy to understand), which in logic is a non-range usage. So UNIQUE (a, c WITHOUT OVERLAPS), column "a" be a non-range data type does make sense?
hi. the following script makes the server crash (Segmentation fault). create schema test; set search_path to test; DROP TABLE IF EXISTS temporal_rng; CREATE TABLE temporal_rng (id int4range, valid_at daterange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); INSERT INTO temporal_rng VALUES ('[11,11]', daterange('2018-01-01', '2020-01-01')), ('[11,11]', daterange('2020-01-01', '2021-01-01')), ('[20,20]', daterange('2018-01-01', '2020-01-01')), ('[20,20]', daterange('2020-01-01', '2021-01-01')); DROP TABLE IF EXISTS temporal_fk_rng2rng; CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at tsrange, parent_id int4range, CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ); --------------------------------------------------------------- BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk; ALTER TABLE temporal_fk_rng2rng ALTER COLUMN parent_id SET DEFAULT '[-1,-1]', ALTER COLUMN valid_at SET DEFAULT tsrange('2018-01-01', '2019-11-11'); ALTER TABLE temporal_fk_rng2rng ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng on update set DEFAULT on delete set DEFAULT; ---------------------------------------------------------gdb related info:--------------- (gdb) continue Continuing. Program received signal SIGSEGV, Segmentation fault. FindFKComparisonOperators (fkconstraint=0x556450100bd8, tab=0x55644ff8f570, i=1, fkattnum=0x7ffeb3286ba0, old_check_ok=0x7ffeb3286b11, old_pfeqop_item=0x7ffeb3286b28, pktype=3912, fktype=3908, opclass=10078, is_temporal=true, for_overlaps=true, pfeqopOut=0x7ffeb3286da4, ppeqopOut=0x7ffeb3286e24, ffeqopOut=0x7ffeb3286ea4) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:11582 11582 pkattr_name = strVal(fkconstraint->pk_period); (gdb) where #0 FindFKComparisonOperators (fkconstraint=0x556450100bd8, tab=0x55644ff8f570, i=1, fkattnum=0x7ffeb3286ba0, old_check_ok=0x7ffeb3286b11, old_pfeqop_item=0x7ffeb3286b28, pktype=3912, fktype=3908, opclass=10078, is_temporal=true, for_overlaps=true, pfeqopOut=0x7ffeb3286da4, ppeqopOut=0x7ffeb3286e24, ffeqopOut=0x7ffeb3286ea4) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:11582 #1 0x000055644e53875a in ATAddForeignKeyConstraint (wqueue=0x7ffeb3287118, tab=0x55644ff8f570, rel=0x7fb2dc124430, fkconstraint=0x556450100bd8, recurse=true, recursing=false, lockmode=6) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:10395 #2 0x000055644e536cc2 in ATExecAddConstraint (wqueue=0x7ffeb3287118, tab=0x55644ff8f570, rel=0x7fb2dc124430, newConstraint=0x556450100bd8, recurse=true, is_readd=false, lockmode=6) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:9948 #3 0x000055644e528eaa in ATExecCmd (wqueue=0x7ffeb3287118, tab=0x55644ff8f570, cmd=0x5564500fae48, lockmode=6, cur_pass=10, context=0x7ffeb3287310) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:5711 #4 0x000055644e5283f6 in ATRewriteCatalogs (wqueue=0x7ffeb3287118, lockmode=6, context=0x7ffeb3287310) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:5569 #5 0x000055644e527031 in ATController (parsetree=0x55645000e228, rel=0x7fb2dc124430, cmds=0x55645000e1d8, recurse=true, lockmode=6, context=0x7ffeb3287310) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:5136 #6 0x000055644e526a9d in AlterTable (stmt=0x55645000e228, lockmode=6, context=0x7ffeb3287310) at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:4789 #7 0x000055644e92eb65 in ProcessUtilitySlow (pstate=0x55644ff8f460, pstmt=0x55645000e2d8, --Type <RET> for more, q to quit, c to continue without paging-- 55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD CONSTRAINT temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD valid_at)\n\tREFERENCES temporal_rng\n on update set DEFAULT \n on delete set DEFAULT;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55645000e698, qc=0x7ffeb3287970) at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/utility.c:1329 #8 0x000055644e92e24c in standard_ProcessUtility (pstmt=0x55645000e2d8, queryString=0x55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD CONSTRAINT temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD valid_at)\n\tREFERENCES temporal_rng\n on update set DEFAULT \n on delete set DEFAULT;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55645000e698, qc=0x7ffeb3287970) at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/utility.c:1078 #9 0x000055644e92c921 in ProcessUtility (pstmt=0x55645000e2d8, queryString=0x55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD CONSTRAINT temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD valid_at)\n\tREFERENCES temporal_rng\n on update set DEFAULT \n on delete set DEFAULT;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55645000e698, qc=0x7ffeb3287970) at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/utility.c:530 #10 0x000055644e92a83e in PortalRunUtility (portal=0x5564500a9840, pstmt=0x55645000e2d8, isTopLevel=true, setHoldSnapshot=false, dest=0x55645000e698, qc=0x7ffeb3287970) at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/pquery.c:1158 #11 0x000055644e92abdb in PortalRunMulti (portal=0x5564500a9840, isTopLevel=true, setHoldSnapshot=false, dest=0x55645000e698, altdest=0x55645000e698, qc=0x7ffeb3287970) at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/pquery.c:1315 #12 0x000055644e929b53 in PortalRun (portal=0x5564500a9840, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x55645000e698, altdest=0x55645000e698, qc=0x7ffeb3287970) at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/pquery.c:791 #13 0x000055644e91f206 in exec_simple_query ( query_string=0x55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD CONSTRAINT temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD valid_at)\n\tREFERENCES temporal_rng\n on update set DEFAULT \n on delete set DEFAULT;") at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/postgres.c:1274 --Type <RET> for more, q to quit, c to continue without paging-- #14 0x000055644e926c49 in PostgresMain (dbname=0x556450045610 "regression", username=0x5564500455f8 "jian") at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/postgres.c:4637 #15 0x000055644e7ff0e9 in BackendRun (port=0x5564500394f0) at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:4438 #16 0x000055644e7fe6a1 in BackendStartup (port=0x5564500394f0) at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:4166 #17 0x000055644e7f8aa0 in ServerLoop () at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:1780 #18 0x000055644e7f8042 in PostmasterMain (argc=3, argv=0x55644ff77e60) at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:1464 #19 0x000055644e67f884 in main (argc=3, argv=0x55644ff77e60) at ../../Desktop/pg_sources/main/postgres/src/backend/main/main.c:198
On Fri, Sep 8, 2023 at 2:35 AM jian he <jian.universality@gmail.com> wrote: > > hi. > the following script makes the server crash (Segmentation fault). > [snip] > > ALTER TABLE temporal_fk_rng2rng > ADD CONSTRAINT temporal_fk_rng2rng_fk > FOREIGN KEY (parent_id, PERIOD valid_at) > REFERENCES temporal_rng > on update set DEFAULT > on delete set DEFAULT; Thank you for the report! It looks like I forgot to handle implicit column names after REFERENCES. The PERIOD part needs to get looked up from the PK as we do for normal FK attrs. I'll add that to the next patch. Yours, Paul
hi I am confused by (pk,fk) on delete on update (restriction and no action) result based on v13. related post: https://stackoverflow.com/questions/14921668/difference-between-restrict-and-no-action Please check the following test and comments. ---common setup for test0, test1,test2,test3 BEGIN; DROP TABLE IF EXISTS temporal_rng, temporal_fk_rng2rng; CREATE TABLE temporal_rng ( id int4range,valid_at tsrange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at tsrange, parent_id int4range ); commit; ----------------no_action_vs_restriction test0 BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk; ALTER TABLE temporal_fk_rng2rng ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ON DELETE NO ACTION ON UPDATE NO ACTION; INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')), ('[5,5]', tsrange('2018-02-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05', '2018-01-10'), '[5,5]'); /* expect below to fail. since to be deleted range is being referenced (in temporal_fk_rng2rng) but the v13 patch won't fail. */ delete from temporal_rng FOR PORTION OF valid_at FROM '2018-01-06' TO '2018-01-11' WHERE id = '[5,5]' AND valid_at @> '2018-01-05'::timestamp; TABLE temporal_rng \; table temporal_fk_rng2rng; ROLLBACK; ----------------no_action_vs_restriction test1 BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk; ALTER TABLE temporal_fk_rng2rng ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ON DELETE RESTRICT ON UPDATE RESTRICT; INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')), ('[5,5]', tsrange('2018-02-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05', '2018-01-10'), '[5,5]'); /* expect the below command not to fail. since to be deleted range is not being referenced in temporal_fk_rng2rng) but the v13 patch will fail. */ delete from temporal_rng FOR PORTION OF valid_at FROM '2018-01-12' TO '2018-01-20' WHERE id = '[5,5]' AND valid_at @> '2018-01-05'::timestamp; ROLLBACK; ----------------no_action_vs_restriction test2 BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk; ALTER TABLE temporal_fk_rng2rng ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ON DELETE no action ON UPDATE no action; INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')), ('[5,5]', tsrange('2018-02-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05', '2018-01-10'), '[5,5]'); /* expect below command fail. since to be deleted range is being referenced (in temporal_fk_rng2rng) */ UPDATE temporal_rng FOR PORTION OF valid_at FROM '2018-01-06' TO '2018-01-08' SET id = '[7,7]' WHERE id = '[5,5]' AND valid_at @> '2018-01-05'::timestamp; TABLE temporal_rng \; table temporal_fk_rng2rng; ROLLBACK; ----------------no_action_vs_restriction test3 BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk; ALTER TABLE temporal_fk_rng2rng ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ON DELETE RESTRICT ON UPDATE RESTRICT; INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')), ('[5,5]', tsrange('2018-02-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05', '2018-01-10'), '[5,5]'); /* expect the below command not to fail. since to be deleted range is not being referenced in temporal_fk_rng2rng) but the v13 patch will fail. */ UPDATE temporal_rng FOR PORTION OF valid_at FROM '2018-01-12' TO '2018-01-20' SET id = '[7,7]' WHERE id = '[5,5]' AND valid_at @> '2018-01-05'::timestamp; ROLLBACK;
hi. some trivial issue: in src/backend/catalog/index.c /* * System attributes are never null, so no need to check. */ if (attnum <= 0) since you already checked attnum == 0 so here you can just attnum < 0? ------------------------------------------------- ERROR: column "valid_at" named in WITHOUT OVERLAPS is not a range type IMHO, "named" is unnecessary. ------------------------------------------------- doc/src/sgml/catalogs.sgml pg_constraint adds another attribute (column): contemporal, seems no doc entry. also the temporal in oxford definition is "relating to time", here we can deal with range. So maybe "temporal" is not that accurate?
On 9/7/23 18:24, jian he wrote: > for a range primary key, is it fine to expect it to be unique, not > null and also not overlap? (i am not sure how hard to implement it). > > ----------------------------------------------------------------- > quote from 7IWD2-02-Foundation-2011-12.pdf. 4.18.3.2 Unique > constraints, page 97 of 1483. > > ... > ----------------------------------------------------------------- > based on the above, the unique constraint does not specify that the > column list must be range type. UNIQUE (a, c WITHOUT OVERLAPS). > Here column "a" can be a range type (that have overlap property) and > can be not. > In fact, many of your primary key, foreign key regess test using > something like '[11,11]' (which make it more easy to understand), > which in logic is a non-range usage. > So UNIQUE (a, c WITHOUT OVERLAPS), column "a" be a non-range data type > does make sense? I'm not sure I understand this question, but here are a few things that might help clarify things: In SQL:2011, a temporal primary key, unique constraint, or foreign key may have one or more "scalar" parts (just like a regular key) followed by one "PERIOD" part, which is denoted with "WITHOUT OVERLAPS" (in PKs/UNIQUEs) or "PERIOD" (in FKs). Except for this last key part, everything is still compared for equality, just as in a traditional key. But this last part is compared for overlaps. It's exactly the same as `EXCLUDE (id WITH =, valid_at WITH &&)`. The overlap part must come last and you can have only one (but you may have more than one scalar part if you like). In the patch, I have followed that pattern, except I also allow a regular range column anywhere I allow a PERIOD. In fact PERIODs are mostly implemented on top of range types. (Until recently PERIOD support was in the first patch, not the last, and there was code all throughout for handling both, e.g. within indexes, etc. But at pgcon Peter suggested building everything on just range columns, and then having PERIODs create an "internal" GENERATED column, and that cleaned up the code considerably.) One possible source of confusion is that in the tests I'm using range columns *also* for the scalar key part. So valid_at is a tsrange, and int is an int4range. This is not normally how you'd use the feature, but you need the btree_gist extension to mix int & tsrange (e.g.), and that's not available in the regress tests. We are still comparing the int4range for regular equality and the tsrange for overlaps. If you search this thread there was some discussion about bringing btree_gist into core, but it sounds like it doesn't need to happen. (It might be still desirable independently. EXCLUDE constraints are also not really something you can use practically without it, and their tests use the same trick of comparing ranges for plain equality.) The piece of discussion you're replying to is about allowing *multiple* WITHOUT OVERLAPS modifiers on a PK/UNIQUE constraint, and in any position. I think that's a good idea, so I've started adapting the code to support it. (In fact there is a lot of code that assumes the overlaps key part will be in the last position, and I've never really been happy with that, so it's an excuse to make that more robust.) Here I'm saying (1) you will still need at least one scalar key part, (2) if there are no WITHOUT OVERLAPS parts then you just have a regular key, not a temporal one, (3) changing this obliges us to do the same for foreign keys and FOR PORTION OF. I hope that helps! I apologize if I've completely missed the point. If so please try again. :-) Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Thanks for the thorough review and testing! Here is a v14 patch with the segfault and incorrect handling of NO ACTION and RESTRICT fixed (and reproductions added to the test suite). A few more comments below on feedback from you and Peter: On 9/12/23 02:01, jian he wrote: > hi. some trivial issue: > > in src/backend/catalog/index.c > /* * System attributes are never null, so no need to check. */ > if (attnum <= 0) > > since you already checked attnum == 0 > so here you can just attnum < 0? I fixed the "/* *" typo here. I'm reluctant to change the attnum comparison since that's not a line I touched. (It was just part of the context around the updated comment.) Your suggestion does make sense though, so perhaps it should be a separate commit? > ERROR: column "valid_at" named in WITHOUT OVERLAPS is not a range type > > IMHO, "named" is unnecessary. Changed. > doc/src/sgml/catalogs.sgml > pg_constraint adds another attribute (column): contemporal, seems no doc entry. Added. > also the temporal in oxford definition is "relating to time", here we > can deal with range. > So maybe "temporal" is not that accurate? I agree if we allow multiple WITHOUT OVERLAPS/etc clauses, we should change the terminology. I'll include that with the multiple-range-keys change discussed upthread. On 9/1/23 02:30, Peter Eisentraut wrote: > * There is a lot of talk about "temporal" in this patch, but this > functionality is more general than temporal. I would prefer to change > this to more neutral terms like "overlaps". Okay, sounds like several of us agree on this. > * The field ii_Temporal in IndexInfo doesn't seem necessary and could > be handled via local variables. See [0] for a similar discussion: > > [0]: > https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org Done. > * In gram.y, change withoutOverlapsClause -> without_overlaps_clause > for consistency with the surrounding code. Done. > * No-op assignments like n->without_overlaps = NULL; can be omitted. > (Or you should put them everywhere. But only in some places seems > inconsistent and confusing.) Changed. That makes sense since newNode uses palloc0fast. FWIW there is quite a lot of other code in gram.y that sets NULL fields though, including in ConstraintElem, and it seems like it does improve the clarity a little. By "everywhere" I think you mean wherever the file calls makeNode(Constraint)? I might go back and do it that way later. I'll keep working on a patch to support multiple range keys, but I wanted to work through the rest of the feedback first. Also there is some fixing to do with partitions I believe, and then I'll finish the PERIOD support. So this v14 patch is just some minor fixes & tweaks from September feedback. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Fri, Sep 15, 2023 at 12:11 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > > I'll keep working on a patch to support multiple range keys, but I > wanted to work through the rest of the feedback first. Also there is > some fixing to do with partitions I believe, and then I'll finish the > PERIOD support. So this v14 patch is just some minor fixes & tweaks from > September feedback. > small issues so far I found, v14. IndexInfo struct definition comment still has Temporal related comment, should be removed. catalog-pg-index.html, no indperiod doc entry, also in table pg_index, column indperiod is junk value now. I think in UpdateIndexRelation, you need an add indperiod to build a pg_index tuple, similar to what you did in CreateConstraintEntry. seems to make the following query works, we need to bring btree_gist related code to core? CREATE TABLE temporal_fk_rng2rng22 (id int8, valid_at int4range, unique (id, valid_at WITHOUT OVERLAPS)); /* ---------------- * pg_period definition. cpp turns this into * typedef struct FormData_pg_period * ---------------- */ CATALOG(pg_period,8000,PeriodRelationId) { Oid oid; /* OID of the period */ NameData pername; /* name of period */ Oid perrelid; /* OID of relation containing this period */ int16 perstart; /* column for start value */ int16 perend; /* column for end value */ int16 perrange; /* column for range value */ Oid perconstraint; /* OID of (start < end) constraint */ } FormData_pg_period; no idea what the above comment "cpp'' refers to. The sixth field in FormData_pg_period: perrange, the comment conflict with catalogs.sgml >> perrngtype oid (references pg_type.oid) >> The OID of the range type associated with this period create table pt (id integer, ds date, de date, period for p (ds, de)); SELECT table_name, column_name, column_default, is_nullable, is_generated, generation_expression FROM information_schema.columns WHERE table_name = 'pt' ORDER BY 1, 2; the hidden generated column (p) is_nullable return NO. but ds, de is_nullable both return YES. so column p is_nullable should return YES?
On Fri, Sep 15, 2023 at 12:11 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > Thanks for the thorough review and testing! > > Here is a v14 patch with the segfault and incorrect handling of NO > ACTION and RESTRICT fixed (and reproductions added to the test suite). > another case: BEGIN; DROP TABLE IF EXISTS temporal_rng, temporal_fk_rng2rng; CREATE TABLE temporal_rng ( id int4range,valid_at tsrange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at tsrange, parent_id int4range ); INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')), ('[5,5]', tsrange('2018-02-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05','2018-01-10'), '[5,5]'); commit; BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk; ALTER TABLE temporal_fk_rng2rng ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ON DELETE NO ACTION ON UPDATE NO ACTION; ALTER TABLE temporal_fk_rng2rng ALTER CONSTRAINT temporal_fk_rng2rng_fk DEFERRABLE INITIALLY DEFERRED; delete from temporal_rng; ---should not fail. commit; ---fail in here. ------------------------------- seems in ATExecAlterConstrRecurse change to /* * Update deferrability of RI_FKey_noaction_del, * RI_FKey_noaction_upd, RI_FKey_check_ins and RI_FKey_check_upd * triggers, but not others; see createForeignKeyActionTriggers * and CreateFKCheckTrigger. */ if (tgform->tgfoid != F_RI_FKEY_NOACTION_DEL && tgform->tgfoid != F_TRI_FKEY_NOACTION_DEL && tgform->tgfoid != F_RI_FKEY_NOACTION_UPD && tgform->tgfoid != F_TRI_FKEY_NOACTION_UPD && tgform->tgfoid != F_RI_FKEY_CHECK_INS && tgform->tgfoid != F_TRI_FKEY_CHECK_INS && tgform->tgfoid != F_RI_FKEY_CHECK_UPD && tgform->tgfoid != F_TRI_FKEY_CHECK_UPD) continue; will work.
On 9/17/23 20:11, jian he wrote: > small issues so far I found, v14. Thank you again for the review! v15 is attached. > IndexInfo struct definition comment still has Temporal related > comment, should be removed. Fixed. > catalog-pg-index.html, no indperiod doc entry, also in table pg_index, > column indperiod is junk value now. You're right, it is just unneeded now that PERIODs are implemented by GENERATED columns. I've removed it. > I think in UpdateIndexRelation, you need an add indperiod to build a > pg_index tuple, similar to what you did in CreateConstraintEntry. It's gone now. > seems to make the following query works, we need to bring btree_gist > related code to core? > CREATE TABLE temporal_fk_rng2rng22 (id int8, valid_at int4range, > unique (id, valid_at WITHOUT OVERLAPS)); It doesn't need to be brought into core, but you would need to say `CREATE EXTENSION btree_gist` first. Since the regression tests don't assume we've built contrib, we have to use a workaround there. > /* ---------------- > * pg_period definition. cpp turns this into > * typedef struct FormData_pg_period > * ---------------- > */ > CATALOG(pg_period,8000,PeriodRelationId) > { > Oid oid; /* OID of the period */ > NameData pername; /* name of period */ > Oid perrelid; /* OID of relation containing this period */ > int16 perstart; /* column for start value */ > int16 perend; /* column for end value */ > int16 perrange; /* column for range value */ > Oid perconstraint; /* OID of (start < end) constraint */ > } FormData_pg_period; > > no idea what the above comment "cpp'' refers to. I believe cpp = C Pre-Processor. This comment is at the top of all the catalog/pg_*.h files. The next line is part of the same sentence (which took me a while to notice :-). > The sixth field in > FormData_pg_period: perrange, the comment conflict with catalogs.sgml >>> perrngtype oid (references pg_type.oid) >>> The OID of the range type associated with this period You're right, fixed! More cruft from the old PERIOD implementation. > create table pt (id integer, ds date, de date, period for p (ds, de)); > SELECT table_name, column_name, column_default, is_nullable, > is_generated, generation_expression > FROM information_schema.columns > WHERE table_name = 'pt' ORDER BY 1, 2; > > the hidden generated column (p) is_nullable return NO. but ds, de > is_nullable both return YES. so column p is_nullable should return > YES? The is_nullable behavior is correct I believe. In a range if the lower/upper value is NULL, it signifies the range has no lower/upper bound. So it's fine for ds or de to be NULL, but not the range itself (p). Technically the SQL spec says that the PERIOD start & end columns should be NOT NULL, but that forces people to use ugly sentinel values like '3999-01-01'. It's a shame to make people do that when NULL works so well instead. Our time-related types do have Infinity and -Infinity which is not as ugly, but many other types do not. Plus those values interact badly with ranges. For example `select '(,)'::daterange - '(,Infinity)'::daterange` gives the infinitesimal result `[infinity,)`. I've heard at least one report of that make a mess in a user's database. If a user wants to make the start/end columns NOT NULL they can, so I prefer not to force them. Continuing to your other email: On 9/18/23 05:49, jian he wrote: > BEGIN; > ... > ALTER TABLE temporal_fk_rng2rng ALTER CONSTRAINT > temporal_fk_rng2rng_fk DEFERRABLE INITIALLY DEFERRED; > > delete from temporal_rng; ---should not fail. > commit; ---fail in here. Great catch! This is fixed also. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 9/17/23 20:11, jian he wrote: > > small issues so far I found, v14. > > Thank you again for the review! v15 is attached. > hi. some tiny issues. IN src/backend/utils/adt/ri_triggers.c else { appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x", pk_only, pkrelname); } should change to else { appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x", pk_only, pkrelname); } ---- It would be better, we mention it somewhere: by default, you can only have a primary key(range_type[...], range_type WITHOUT OVERLAPS). preceding without overlaps, all columns (in primary key) data types only allowed range types. ------------------------------- The WITHOUT OVERLAPS value must be a range type and is used to constrain the record's applicability to just that interval (usually a range of dates or timestamps). "interval", I think "period" or "range" would be better. I am not sure we need to mention " must be a range type, not a multi range type". --------------------------------------------- I just `git apply`, then ran the test, and one test failed. Some minor changes need to make the test pass.
On 9/24/23 21:52, jian he wrote: > On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: >> >> On 9/17/23 20:11, jian he wrote: >>> small issues so far I found, v14. >> >> Thank you again for the review! v15 is attached. >> > > hi. some tiny issues. Rebased v16 patches attached. > IN src/backend/utils/adt/ri_triggers.c > > else { > appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x", > pk_only, pkrelname); > } > should change to > > else > { > appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x", > pk_only, pkrelname); > } Fixed. > It would be better, we mention it somewhere: > by default, you can only have a primary key(range_type[...], > range_type WITHOUT OVERLAPS). > > preceding without overlaps, all columns (in primary key) data types > only allowed range types. > ------------------------------- > The WITHOUT OVERLAPS value must be a range type and is used to > constrain the record's applicability to just that interval (usually a > range of dates or timestamps). > > "interval", I think "period" or "range" would be better. I am not sure > we need to mention " must be a range type, not a multi range type". I reworked those two paragraphs to incorporate those suggestions and hopefully clarify the idea bit further. (I'll revise these again once I support multiple WITHOUT OVERLAPS columns.) > I just `git apply`, then ran the test, and one test failed. Some minor > changes need to make the test pass. I couldn't reproduce this. If you're still seeing a failure please let me know what you're seeing. These patches also fix a problem I found with FKs when used with btree_gist. Privately I'm using the script below [1] to re-run all my tests with that extension and int+range columns. I'd like to add something similar to contrib/btree_gist. I'm open to advice how best to do that if anyone has any! [1] #!/bin/bash set -eu # without_overlaps cat ../src/test/regress/sql/without_overlaps.sql | \ sed -E 's/int4range/integer/g' | \ sed -E 's/valid_at integer/valid_at int4range/' | \ sed -E 's/int8range/bigint/g' | \ sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \ cat > ./sql/without_overlaps.sql cat ../src/test/regress/expected/without_overlaps.out | \ sed -E 's/int4range/integer/g' | \ sed -E 's/valid_at integer/valid_at int4range/' | \ sed -E 's/incompatible types: integer and tsrange/incompatible types: int4range and tsrange/' | \ sed -E 's/int8range/bigint/g' | \ sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \ sed -E 's/'"'"'\[(-?[[:digit:]]+),-?[[:digit:]]+\)'"'"'/\1/g' | \ sed -E 's/\[(-?[[:digit:]]+),\1\]/\1/g' | \ sed -E 's/\[(-?[[:digit:]]+),-?[[:digit:]]+\)/\1/g' | \ sed -E 'N;P;s/^ +id [^\n]+\n-+(\+.*)$/----\1/p;D' | \ sed -E 's/^----------\+-----------\+-----------\+----------\+---------$/----------+---------+-----------+----------+---------/' | \ sed -E 's/^----\+-------------------------\+--------\+-------$/----+-------------------------+-----+-------/' | \ cat > ./expected/without_overlaps.out # for_portion_of cat ../src/test/regress/sql/for_portion_of.sql | \ sed -E 's/int4range/integer/g' | \ sed -E 's/valid_at integer/valid_at int4range/' | \ sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \ cat > ./sql/for_portion_of.sql cat ../src/test/regress/expected/for_portion_of.out | \ sed -E 's/int4range/integer/g' | \ sed -E 's/valid_at integer/valid_at int4range/' | \ sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \ sed -E 's/'"'"'\[(-?[[:digit:]]+),-?[[:digit:]]+\)'"'"'/\1/g' | \ sed -E 's/\[(-?[[:digit:]]+),\1\]/\1/g' | \ sed -E 's/\[(-?[[:digit:]]+),-?[[:digit:]]+\)/\1/g' | \ sed -E 'N;P;s/^ +id [^\n]+\n-+(\+.*)$/----\1/p;D' | \ cat > ./expected/for_portion_of.out Regards, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 25.09.23 21:20, Paul Jungwirth wrote: > On 9/24/23 21:52, jian he wrote: >> On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth >> <pj@illuminatedcomputing.com> wrote: >>> >>> On 9/17/23 20:11, jian he wrote: >>>> small issues so far I found, v14. >>> >>> Thank you again for the review! v15 is attached. >>> >> >> hi. some tiny issues. > > Rebased v16 patches attached. Looking through the tests in v16-0001: +-- PK with no columns just WITHOUT OVERLAPS: +CREATE TABLE temporal_rng ( + valid_at tsrange, + CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS) +); +ERROR: syntax error at or near "WITHOUT" +LINE 3: CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV... + ^ I think this error is confusing. The SQL standard requires at least one non-period column in a PK. I don't know why that is or why we should implement it. But if we want to implement it, maybe we should enforce that in parse analysis rather than directly in the parser, to be able to produce a more friendly error message. +-- PK with a range column/PERIOD that isn't there: +CREATE TABLE temporal_rng ( + id INTEGER, + CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) +); +ERROR: range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist I think here we should just produce a "column doesn't exist" error message, the same as if the "id" column was invalid. We don't need to get into the details of what kind of column it should be. That is done in the next test +ERROR: column "valid_at" in WITHOUT OVERLAPS is not a range type Also, in any case it would be nice to have a location pointer here (for both cases). +-- PK with one column plus a range: +CREATE TABLE temporal_rng ( + -- Since we can't depend on having btree_gist here, + -- use an int4range instead of an int. + -- (The rangetypes regression test uses the same trick.) + id int4range, + valid_at tsrange, + CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) +); I'm confused why you are using int4range here (and in further tests) for the scalar (non-range) part of the primary key. Wouldn't a plaint int4 serve here? +SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE conname = 'temporal_rng_pk'; + pg_get_indexdef +------------------------------------------------------------------------------- + CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, valid_at) Shouldn't this somehow show the operator classes for the columns? We are using different operator classes for the id and valid_at columns, aren't we? +-- PK with USING INDEX (not possible): +CREATE TABLE temporal3 ( + id int4range, + valid_at tsrange +); +CREATE INDEX idx_temporal3_uq ON temporal3 USING gist (id, valid_at); +ALTER TABLE temporal3 + ADD CONSTRAINT temporal3_pk + PRIMARY KEY USING INDEX idx_temporal3_uq; +ERROR: "idx_temporal3_uq" is not a unique index +LINE 2: ADD CONSTRAINT temporal3_pk + ^ +DETAIL: Cannot create a primary key or unique constraint using such an index. Could you also add a test where the index is unique and the whole thing does work? Apart from the tests, how about renaming the column pg_constraint.contemporal to something like to conwithoutoverlaps?
Hi Peter et al, On 9/1/23 12:56, Paul Jungwirth wrote: >> On 9/1/23 11:30, Peter Eisentraut wrote: >>> I think the WITHOUT OVERLAPS clause should be per-column, so that >>> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) >>> would be possible. Then the WITHOUT OVERLAPS clause would directly >>> correspond to the choice between equality or overlaps operator per >>> column. > I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any > position) is a great recommendation that enables a lot of new > functionality. I've been working on implementing this, but I've come to think it is the wrong way to go. If we support this in primary key and unique constraints, then we must also support it for foreign keys and UPDATE/DELETE FOR PORTION OF. But implementing that logic is pretty tricky. For example take a foreign key on (id, PERIOD valid_at, PERIOD asserted_at). We need to ensure the referenced two-dimensional time space `contains` the referencing two-dimensional space. You can visualize a rectangle in two-dimensional space for each referencing record (which we validate one at a time). The referenced records must be aggregated and so form a polygon (of all right angles). For example the referencing record may be (1, [0,2), [0,2)) with referenced records of (1, [0,2), [0,1)) and (1, [0,1), [1,2)). (I'm using intranges since they're easier to read, but you could imagine these as dateranges like [2000-01-01,2002-01-01).) Now the range_agg of their valid_ats is [0,2) and of their asserted_ats is [0,2). But the referenced 2d space still doesn't contain the referencing space. It's got one corner missing. This is a well-known problem among game developers. We're lucky not to have arbitrary polygons, but it's still a tough issue. Besides `contains` we also need to compute `overlaps` and `intersects` to support these temporal features. Implementing that for 2d, 3d, etc looks very complicated, for something that is far outside the normal use case and also not part of the standard. It will cost a little performance for the normal 1d use case too. I think a better approach (which I want to attempt as an add-on patch, not in this main series) is to support not just range types, but any type with the necessary operators. Then you could have an mdrange (multi-dimensional range) or potentially even an arbitrary n-dimensional polygon. (PostGIS has something like this, but its `contains` operator compares (non-concave) *bounding boxes*, so it would not work for the example above. Still the similarity between temporal and spatial data is striking. I'm going to see if I can get some input from PostGIS folks about how useful any of this is to them.) This approach would also let us use multiranges: not for multiple dimensions, but for non-contiguous time spans stored in a single row. This puts the complexity in the types themselves (which seems more appropriate) and is ultimately more flexible (supporting not just mdrange but also multirange, and other things too). This approach also means that instead of storing a mask/list of which columns use WITHOUT OVERLAPS, I can just store one attnum. Again, this saves the common use-case from paying a performance penalty to support a much rarer one. I've still got my multi-WITHOUT OVERLAPS work, but I'm going to switch gears to what I've described here. Please let me know if you disagree! Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 9/25/23 14:00, Peter Eisentraut wrote: > Looking through the tests in v16-0001: > > +-- PK with no columns just WITHOUT OVERLAPS: > +CREATE TABLE temporal_rng ( > + valid_at tsrange, > + CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS) > +); > +ERROR: syntax error at or near "WITHOUT" > +LINE 3: CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV... > + ^ > > I think this error is confusing. The SQL standard requires at least one > non-period column in a PK. I don't know why that is or why we should > implement it. But if we want to implement it, maybe we should enforce > that in parse analysis rather than directly in the parser, to be able to > produce a more friendly error message. Okay. (I think the reason the standard requires one non-period column is to identify the "entity". If philosophically the row is an Aristotelian proposition about that thing, the period qualifies it as true just during some time span. So the scalar part is doing the work that a PK conventionally does, and the period part does something else. Perhaps a PK/UNIQUE constraint with no scalar part would still be useful, but not very often I think, and I'm not sure it makes sense to call it PK/UNIQUE.) > +-- PK with a range column/PERIOD that isn't there: > +CREATE TABLE temporal_rng ( > + id INTEGER, > + CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT > OVERLAPS) > +); > +ERROR: range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist > > I think here we should just produce a "column doesn't exist" error > message, the same as if the "id" column was invalid. We don't need to > get into the details of what kind of column it should be. That is done > in the next test I'll change it. The reason for the different wording is that it might not be a column at all. It might be a PERIOD. So what about just "column or PERIOD doesn't exist"? (Your suggestion is fine too though.) > +ERROR: column "valid_at" in WITHOUT OVERLAPS is not a range type > > Also, in any case it would be nice to have a location pointer here (for > both cases). Agreed. > +-- PK with one column plus a range: > +CREATE TABLE temporal_rng ( > + -- Since we can't depend on having btree_gist here, > + -- use an int4range instead of an int. > + -- (The rangetypes regression test uses the same trick.) > + id int4range, > + valid_at tsrange, > + CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT > OVERLAPS) > +); > > I'm confused why you are using int4range here (and in further tests) for > the scalar (non-range) part of the primary key. Wouldn't a plaint int4 > serve here? A plain int4 would be better, and it would match the normal use-case, but you must have btree_gist to create an index like that, and the regress tests can't assume we have that. Here is the part from sql/rangetypes.sql I'm referring to: -- -- Btree_gist is not included by default, so to test exclusion -- constraints with range types, use singleton int ranges for the "=" -- portion of the constraint. -- create table test_range_excl( room int4range, speaker int4range, during tsrange, exclude using gist (room with =, during with &&), exclude using gist (speaker with =, during with &&) ); > +SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE > conname = 'temporal_rng_pk'; > + pg_get_indexdef > +------------------------------------------------------------------------------- > + CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, > valid_at) > > Shouldn't this somehow show the operator classes for the columns? We > are using different operator classes for the id and valid_at columns, > aren't we? We only print the operator classes if they are not the default, so they don't appear here. I do suspect something more is desirable though. For exclusion constraints we replace everything before the columns with just "EXCLUDE USING gist". I could embed WITHOUT OVERLAPS but it's not valid syntax in CREATE INDEX. Let me know if you have any ideas. > +-- PK with USING INDEX (not possible): > +CREATE TABLE temporal3 ( > + id int4range, > + valid_at tsrange > +); > +CREATE INDEX idx_temporal3_uq ON temporal3 USING gist (id, valid_at); > +ALTER TABLE temporal3 > + ADD CONSTRAINT temporal3_pk > + PRIMARY KEY USING INDEX idx_temporal3_uq; > +ERROR: "idx_temporal3_uq" is not a unique index > +LINE 2: ADD CONSTRAINT temporal3_pk > + ^ > +DETAIL: Cannot create a primary key or unique constraint using such an > index. > > Could you also add a test where the index is unique and the whole thing > does work? No problem! > Apart from the tests, how about renaming the column > pg_constraint.contemporal to something like to conwithoutoverlaps? Is that too verbose? I've got some code already changing it to conoverlaps but I'm probably happier with conwithoutoverlaps, assuming no one else minds it. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 10/11/23 05:47, Paul Jungwirth wrote: >> +SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE >> conname = 'temporal_rng_pk'; >> + pg_get_indexdef >> +------------------------------------------------------------------------------- >> + CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, >> valid_at) >> >> Shouldn't this somehow show the operator classes for the columns? We >> are using different operator classes for the id and valid_at columns, >> aren't we? > > We only print the operator classes if they are not the default, so they > don't appear here. > > I do suspect something more is desirable though. For exclusion > constraints we replace everything before the columns with just "EXCLUDE > USING gist". I could embed WITHOUT OVERLAPS but it's not valid syntax in > CREATE INDEX. Let me know if you have any ideas. Why not? The standard does not mention indexes (although some discussions last week might change that) so we can change the syntax for it as we wish. Doing so would also allow us to use ALTER TABLE ... USING INDEX for such things. -- Vik Fearing
On Tue, Sep 26, 2023 at 4:21 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 9/24/23 21:52, jian he wrote: > > On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth > > <pj@illuminatedcomputing.com> wrote: > >> > >> On 9/17/23 20:11, jian he wrote: > >>> small issues so far I found, v14. > >> > >> Thank you again for the review! v15 is attached. > >> > > > > hi. some tiny issues. > > Rebased v16 patches attached. Can you rebase it? changes in https://git.postgresql.org/cgit/postgresql.git/log/src/backend/executor/nodeModifyTable.c https://git.postgresql.org/cgit/postgresql.git/log/src/backend/commands/tablecmds.c make it no longer applicable. I try to manually edit the patch to make it applicable. but failed at tablecmds.c
Hi. based on v16. /* Look up the FOR PORTION OF name requested. */ range_attno = attnameAttNum(targetrel, range_name, false); if (range_attno == InvalidAttrNumber) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_COLUMN), errmsg("column or period \"%s\" of relation \"%s\" does not exist", range_name, RelationGetRelationName(targetrel)), parser_errposition(pstate, forPortionOf->range_name_location))); attr = TupleDescAttr(targetrel->rd_att, range_attno - 1); // TODO: check attr->attisdropped (?), // and figure out concurrency issues with that in general. // It should work the same as updating any other column. I don't think we need to check attr->attisdropped here. because the above function attnameAttNum already does the job. -------------------------------------------- bool get_typname_and_namespace(Oid typid, char **typname, char **typnamespace) { HeapTuple tp; tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid)); if (HeapTupleIsValid(tp)) { Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp); *typname = pstrdup(NameStr(typtup->typname)); *typnamespace = get_namespace_name(typtup->typnamespace); ReleaseSysCache(tp); return *typnamespace; "return *typnamespace;" should be "return true"? Maybe name it to get_typname_and_typnamespace? ----------------------------------------------------------------------- if (!get_typname_and_namespace(attr->atttypid, &range_type_name, &range_type_namespace)) elog(ERROR, "missing range type %d", attr->atttypid); you can just `elog(ERROR, "missing range type %s", range_type_name);` ? Also, this should be placed just below if (!type_is_range(attr->atttypid))? ----------------------------------------------------------------------- src/backend/catalog/objectaddress.c if (OidIsValid(per->perrelid)) { StringInfoData rel; initStringInfo(&rel); getRelationDescription(&rel, per->perrelid, false); appendStringInfo(&buffer, _("period %s on %s"), NameStr(per->pername), rel.data); pfree(rel.data); } else { appendStringInfo(&buffer, _("period %s"), NameStr(per->pername)); } periods are always associated with the table, is the above else branch correct? ----------------------------------------------------------------------- File: src/backend/commands/tablecmds.c 7899: /* 7900: * this test is deliberately not attisdropped-aware, since if one tries to 7901: * add a column matching a dropped column name, it's gonna fail anyway. 7902: * 7903: * XXX: Does this hold for periods? 7904: */ 7905: attTuple = SearchSysCache2(ATTNAME, 7906: ObjectIdGetDatum(RelationGetRelid(rel)), 7907: PointerGetDatum(pername)); XXX: Does this hold for periods? Yes. we can add the following 2 sql for code coverage. alter table pt add period for tableoid (ds, de); alter table pt add period for "........pg.dropped.4........" (ds, de);
hi. also based on v16. -----------------tests. drop table if exists for_portion_of_test1; CREATE unlogged TABLE for_portion_of_test1 (id int4range, valid_at tsrange,name text ); INSERT INTO for_portion_of_test1 VALUES ('[1,1]', NULL, '[1,1]_NULL'),('[1,1]', '(,)', '()_[1,]') ,('[1,1]', 'empty', '[1,1]_empty'),(NULL,NULL, NULL), (nuLL, '(2018-01-01,2019-01-01)','misc'); --1 UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM NULL TO NULL SET name = 'for_portition_NULLtoNULL'; select * from for_portion_of_test1; --2 UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM null TO UNBOUNDED SET name = 'NULL_TO_UNBOUNDED'; select * from for_portion_of_test1; --3 UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM UNBOUNDED TO null SET name = 'UNBOUNDED__TO_NULL'; select * from for_portion_of_test1; --4 UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM UNBOUNDED TO UNBOUNDED SET name = 'UNBOUNDED__TO_UNBOUNDED'; select * from for_portion_of_test1; ------------------------ File: /src/backend/executor/nodeModifyTable.c 1277: oldRange = slot_getattr(oldtupleSlot, forPortionOf->rangeVar->varattno, &isNull); 1278: 1279: if (isNull) 1280: elog(ERROR, "found a NULL range in a temporal table"); 1281: oldRangeType = DatumGetRangeTypeP(oldRange); I wonder when this isNull will be invoked. the above tests won't invoke the error. also the above test, NULL seems equivalent to unbounded. FOR PORTION OF "from" and "to" both bound should not be null? which means the following code does not work as intended? I also cannot find a way to invoke the following elog error branch. File:src/backend/executor/nodeModifyTable.c 4458: exprState = ExecPrepareExpr((Expr *) forPortionOf->targetRange, estate); 4459: targetRange = ExecEvalExpr(exprState, econtext, &isNull); 4460: if (isNull) 4461: elog(ERROR, "Got a NULL FOR PORTION OF target range"); --------------------------- i also made some changes in the function range_leftover_internal, ExecForPortionOfLeftovers. please see the attached patch.
Attachment
On Wed, Oct 11, 2023 at 12:47 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 9/25/23 14:00, Peter Eisentraut wrote: > > Looking through the tests in v16-0001: > > > > +-- PK with no columns just WITHOUT OVERLAPS: > > +CREATE TABLE temporal_rng ( > > + valid_at tsrange, > > + CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS) > > +); > > +ERROR: syntax error at or near "WITHOUT" > > +LINE 3: CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV... > > + ^ > > > > I think this error is confusing. The SQL standard requires at least one > > non-period column in a PK. I don't know why that is or why we should > > implement it. But if we want to implement it, maybe we should enforce > > that in parse analysis rather than directly in the parser, to be able to > > produce a more friendly error message. > > Okay. > > (I think the reason the standard requires one non-period column is to > identify the "entity". If philosophically the row is an Aristotelian > proposition about that thing, the period qualifies it as true just > during some time span. So the scalar part is doing the work that a PK > conventionally does, and the period part does something else. Perhaps a > PK/UNIQUE constraint with no scalar part would still be useful, but not > very often I think, and I'm not sure it makes sense to call it PK/UNIQUE.) > > > +-- PK with a range column/PERIOD that isn't there: > > +CREATE TABLE temporal_rng ( > > + id INTEGER, > > + CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT > > OVERLAPS) > > +); > > +ERROR: range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist > > > > I think here we should just produce a "column doesn't exist" error > > message, the same as if the "id" column was invalid. We don't need to > > get into the details of what kind of column it should be. That is done > > in the next test > > I'll change it. The reason for the different wording is that it might > not be a column at all. It might be a PERIOD. So what about just "column > or PERIOD doesn't exist"? (Your suggestion is fine too though.) > > > +ERROR: column "valid_at" in WITHOUT OVERLAPS is not a range type > > > > Also, in any case it would be nice to have a location pointer here (for > > both cases). > > Agreed. > I refactored findNeworOldColumn to better handle error reports. please check the attached.
Attachment
V16 patch doc/src/sgml/html/sql-createtable.html doc SET NULL description: ` SET NULL [ ( column_name [, ... ] ) ] Set all of the referencing columns, or a specified subset of the referencing columns, to null. A subset of columns can only be specified for ON DELETE actions. In a temporal foreign key, the change will use FOR PORTION OF semantics to constrain the effect to the bounds of the referenced row. ` I think it means, if the foreign key has PERIOD column[s], then the PERIOD column[s] will not be set to NULL in {ON DELETE|ON UPDATE}. We can also use FOR PORTION OF semantics to constrain the effect to the bounds of the referenced row. see below demo: BEGIN; drop table if exists temporal_rng CASCADE; drop table if exists temporal_fk_rng2rng CASCADE; CREATE unlogged TABLE temporal_rng (id int4range,valid_at tsrange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); CREATE unlogged TABLE temporal_fk_rng2rng (id int4range,valid_at tsrange,parent_id int4range, CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng (id, PERIOD valid_at) on update set null ON DELETE SET NULL); INSERT INTO temporal_rng VALUES ('[11,11]', tsrange('2018-01-01', '2021-01-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[7,7]', tsrange('2018-01-01', '2021-01-01'), '[11,11]'); DELETE FROM temporal_rng WHERE id = '[11,11]'; table temporal_fk_rng2rng; commit; ----------------------------------------------------- also "REFERENCES temporal_rng (id, PERIOD valid_at) ON UPDATE SET NULL ON DELETE SET NULL)" is the same as "REFERENCES temporal_rng (id, PERIOD valid_at) ON UPDATE SET NULL ON DELETE SET NULL (parent_id)" in the current implementation. we might need to change the pg_constraint column "confdelsetcols" description. ------- the above also applies to SET DEFAULT. -------------------------------------------------------------------------------------------------------------------------- can you add the following for the sake of code coverage. I think src/test/regress/sql/without_overlaps.sql can be simplified. --- common template for test foreign key constraint. CREATE OR REPLACE PROCEDURE overlap_template() LANGUAGE SQL AS $$ DROP TABLE IF EXISTS temporal_rng CASCADE; DROP TABLE IF EXISTS temporal_fk_rng2rng CASCADE; CREATE UNLOGGED TABLE temporal_rng (id int4range,valid_at tsrange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); CREATE UNLOGGED TABLE temporal_fk_rng2rng ( id int4range, valid_at tsrange, parent_id int4range, CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng (id, PERIOD valid_at) ON UPDATE no action ON DELETE no action DEFERRABLE ); $$; call overlap_template(); --- on update/delete restrict -- coverage for TRI_FKey_restrict_upd,TRI_FKey_restrict_del. BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT temporal_fk_rng2rng_fk, ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng(id,PERIOD valid_at) ON UPDATE RESTRICT ON DELETE RESTRICT; INSERT INTO temporal_rng VALUES ('[11,11]', tsrange('2018-01-01', '2021-01-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[7,7]', tsrange('2018-01-01', '2020-01-01'), '[11,11]'); savepoint s; UPDATE temporal_rng FOR PORTION OF valid_at FROM '2018-01-01' TO '2018-01-03' SET id = '[9,9]' WHERE id = '[11,11]'; ROLLBACK to s; delete from temporal_rng FOR PORTION OF valid_at FROM '2018-01-01' TO '2020-01-01'; ROLLBACK to s; --this one should not have error. delete from temporal_rng FOR PORTION OF valid_at FROM '2020-01-01' TO '2021-01-01'; table temporal_rng; ROLLBACK; ------------- --- on delete set column list coverage for function tri_set. branch {if (riinfo->ndelsetcols != 0)} BEGIN; ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT temporal_fk_rng2rng_fk, ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng(id,PERIOD valid_at) ON DELETE set default(parent_id); ALTER TABLE temporal_fk_rng2rng ALTER COLUMN parent_id SET DEFAULT '[2,2]'; ALTER TABLE temporal_fk_rng2rng ALTER COLUMN valid_at SET DEFAULT tsrange'(,)'; INSERT INTO temporal_rng VALUES ('[11,11]', tsrange('2018-01-01', '2021-01-01')); INSERT INTO temporal_fk_rng2rng VALUES ('[7,7]', tsrange('2018-01-01', '2020-01-01'), '[11,11]'); insert into temporal_rng values('[2,2]','(,)'); savepoint s; delete from temporal_rng FOR PORTION OF valid_at FROM '2018-01-01' TO '2019-01-01' where id = '[11,11]'; -- delete from temporal_rng where id = '[11,11]'; table temporal_fk_rng2rng; rollback;
hi. * The attached patch makes foreign keys with PERIOD fail if any of the foreign key columns is "generated columns". * The following queries will cause segmentation fault. not sure the best way to fix it. the reason in LINE: numpks = transformColumnNameList(RelationGetRelid(pkrel), fkconstraint->pk_attrs, pkattnum, pktypoid); begin; drop table if exists temporal3,temporal_fk_rng2rng; CREATE TABLE temporal3 (id int4range,valid_at tsrange, CONSTRAINT temporal3_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)); CREATE TABLE temporal_fk_rng2rng ( id int4range,valid_at tsrange,parent_id int4range, CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal3 (id, valid_at) ); * change the function FindFKComparisonOperators's "eqstrategy" to make pg_constraint record correct {conpfeqop,conppeqop,conffeqop}. * fix the ON DELETE SET NULL/DEFAULT (columnlist). Now the following queries error will be more consistent. ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT temporal_fk_rng2rng_fk, ADD CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ON DELETE SET DEFAULT(valid_at); --ON DELETE SET NULL(valid_at); * refactor restrict_cascading_range function. * you did if (numfks != numpks) before if (is_temporal) {numfks += 1;}, So I changed the code order to make the error report more consistent. anyway, I put it in one patch. please check the attached.
Attachment
Thanks for all the feedback! Consolidating several emails below: > On Fri, Oct 20, 2023 at 5:45 AM jian he <jian.universality@gmail.com> wrote: > I don't think we need to check attr->attisdropped here Changed. > "return *typnamespace;" should be "return true"? No, but I added a comment to clarify. > Maybe name it get_typname_and_typnamespace? I could go either way on this but I left it as-is since it seems redundant, and there are other functions here that don't repeat the three-letter prefix. > you can just `elog(ERROR, "missing range type %s", range_type_name);` ? No, because this failure happens trying to look up the name. > Also, this should be placed just below if (!type_is_range(attr->atttypid))? We ereport there (not elog) because it's a user error (using a non-rangetype for the option), not an internal error. > periods are always associated with the table, is the above else branch correct? True but I'm following the code just above for OCLASS_CONSTRAINT. Even if this case is unexpected, it seems better to handle it gracefully than have a harder failure. > XXX: Does this hold for periods? > Yes. we can add the following 2 sql for code coverage. > alter table pt add period for tableoid (ds, de); > alter table pt add period for "........pg.dropped.4........" (ds, de); Added, thanks! > On Sun, Oct 22, 2023 at 5:01 PM jian he <jian.universality@gmail.com> wrote: > drop table if exists for_portion_of_test1; > CREATE unlogged TABLE for_portion_of_test1 (id int4range, valid_at > tsrange,name text ); > ... These are good tests, thanks! Originally FOR PORTION OF required a PRIMARY KEY or UNIQUE constraint, so we couldn't find NULLs here, but we changed that a while back, so it's good to verify it handles that case. > 1279: if (isNull) > 1280: elog(ERROR, "found a NULL range in a temporal table"); > 1281: oldRangeType = DatumGetRangeTypeP(oldRange); > > I wonder when this isNull will be invoked. The above tests won't > invoke the error. As far as I can tell it shouldn't happen, which is why it's elog. The new tests don't hit it because a NULL range should never match the range in the FROM+TO of the FOR PORTION OF clause. Maybe this should even be an assert, but I think I prefer elog for the nicer error message and less-local condition. > also the above test, NULL seems equivalent to unbounded. FOR PORTION > OF "from and "to" both bound should not be null? Correct, NULL and UNBOUNDED mean the same thing. This matches the meaning of NULL in ranges. > which means the following code does not work as intended? I also > cannot find a way to invoke the following elog error branch. > File:src/backend/executor/nodeModifyTable.c > 4458: exprState = ExecPrepareExpr((Expr *) forPortionOf->targetRange, estate); > 4459: targetRange = ExecEvalExpr(exprState, econtext, &isNull); > 4460: if (isNull) > 4461: elog(ERROR, "Got a NULL FOR PORTION OF target range"); Here we're checking the "target range", in other words the range built from the FROM+TO of the FOR PORTION OF clause---not a range from a tuple. Finding a NULL here *for the range itself* would indeed be an error. A NULL *bound* means "unbounded", but a NULL *range* should not be possible to construct. > I also made some changes in the function range_leftover_internal, I'm not really comfortable with these changes. "Leftover" doesn't refer to "left" vs "right" but to what *remains* (what is "left behind") after the UPDATE/DELETE. Also r1 and r2 are common parameter names throughout the rangetypes.c file, and they are more general than the names you've suggested. We shouldn't assume we will only ever call this function from the FOR PORTION OF context. > ExecForPortionOfLeftovers Thanks! I've made these code changes (with slight modifications, e.g. no need to call ExecFetchSlotHeapTuple if there are no leftovers). I'm not sure about the comment change though---I want to verify that myself (particularly the case when the partition key is updated so we have already been routed to a different partition than the old tuple). > On Tue, Oct 24, 2023 at 11:14 PM jian he <jian.universality@gmail.com> wrote: > I refactored findNewOrOldColumn to better handle error reports. Thanks, I like your changes here. Applied with some small adjustments. > On Sat, Oct 28, 2023 at 1:26 AM jian he <jian.universality@gmail.com> wrote: > I think it means, if the foreign key has PERIOD column[s], then the > PERIOD column[s] will not be set to NULL in {ON DELETE|ON UPDATE}. . . . I reworded this to explain that the PERIOD element will not be set to NULL (or the default value). > can you add the following for the sake of code coverage. I think > src/test/regress/sql/without_overlaps.sql can be simplified. > ... > call overlaps_template(); I'm not sure I want to add indirection like this to the tests, which I think makes them harder to read (and update). But there is indeed a tough combinatorial explosion, especially in the foreign key tests. We want to cover {ON DELETE,ON UPDATE} {NO ACTION,RESTRICT,CASCADE,SET NULL,SET DEFAULT} when {child inserts,child updates,parent updates,parent deletes} with {one,two} scalar columns and {,not} partitioned. Also ON DELETE SET {NULL,DEFAULT} against only a subset of columns. I updated the test cases to delete and re-use the same id values, so at least they are more isolated and thus easier to edit. I also added tests for `(parent_id1, parent2, PERIOD valid_at)` cases as well as `ON DELETE SET {NULL,DEFAULT} (parent_id1)`. (I think that last case covers what you are trying to do here, but if I misunderstood please let me know.) I haven't worked through your last email yet, but this seemed like enough changes to warrant an update. New patches attached (rebased to 0bc726d9). Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
hi. based on v17. I found several doc related issues. previously I didn't look closely.... + </para> + In a temporal foreign key, the delete/update will use + <literal>FOR PORTION OF</literal> semantics to constrain the + effect to the bounds being deleted/updated in the referenced row. + </para> The first "para" should be <para> ? --- There are many warnings after #define WRITE_READ_PARSE_PLAN_TREES see: http://cfbot.cputube.org/highlights/all.html#4308 Does that mean oue new change in gram.y is somehow wrong? ------ sgml/html/sql-update.html: "range_or_period_name The range column or period to use when performing a temporal update. This must match the range or period used in the table's temporal primary key." Is the second sentence unnecessary? since no primary can still do "for portion of update". sgml/html/sql-update.html: "start_time The earliest time (inclusive) to change in a temporal update. This must be a value matching the base type of the range or period from range_or_period_name. It may also be the special value MINVALUE to indicate an update whose beginning is unbounded." probably something like the following: "lower_bound" The lower bound (inclusive) to change in an overlap update. This must be a value matching the base type of the range or period from range_or_period_name. It may also be the special value UNBOUNDED to indicate an update whose beginning is unbounded." Obviously the "start_time" reference also needs to change, and the sql-delete.html reference also needs to change. ---------------------------------- UPDATE for_portion_of_test FOR PORTION OF valid_at FROM NULL TO "unbounded" SET name = 'NULL to NULL'; should fail, but not. double quoted unbounded is a column reference, I assume. That's why I am confused with the function transformForPortionOfBound. "if (nodeTag(n) == T_ColumnRef)" part. ----------------------------------- in create_table.sgml. you also need to add WITHOUT OVERLAPS related info into <varlistentry id="sql-createtable-parms-unique">
On 02.11.23 21:21, Paul Jungwirth wrote: > New patches attached (rebased to 0bc726d9). I went over the patch v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more detail. Attached is a fixup patch that addresses a variety of cosmetic issues. Some details: - Renamed contemporal to conwithoutoverlaps, as previously discussed. Also renamed various variables and function arguments similarly. - Rearranged text in CREATE TABLE reference page so there are no forward references. (Describe WITHOUT OVERLAPS under UNIQUE and then PRIMARY KEy says "see above", rather than describe it under PRIMARY KEY and have UNIQUE say "see below.) - Removed various bits that related to temporal foreign keys, which belong in a later patch. - Reverted some apparently unrelated changes in src/backend/catalog/index.c. - Removed the "temporal UNIQUE" constraint_type assignment in DefineIndex(). This is meant to be used in error messages and should refer to actual syntax. I think it's fine without it this change. - Field contemporal in NewConstraint struct is not used by this patch. - Rearrange the grammar so that the rule with WITHOUT OVERLAPS is just a Boolean attribute rather than column name plus keywords. This was kind of confusing earlier and led to weird error messages for invalid syntax. I kept the restriction that you need at least one non-overlaps column, but that is now enforced in parse analysis, not in the grammar. (But maybe we don't need it?) (After your earlier explanation, I'm content to just allow one WITHOUT OVERLAPS column for now.) - Some places looked at conexclop to check whether something is a WITHOUT OVERLAPS constraint, instead of looking at conwithoutoverlaps directly. - Removed some redundant "unlike" entries in the pg_dump tests. (This caused cfbot tests to fail.) - Moved the "without_overlaps" test later in the schedule. It should at least be after "constraints" so that normal constraints are tested first. Two areas that could be improved: 1) In src/backend/commands/indexcmds.c, get_index_attr_temporal_operator() has this comment: + * This seems like a hack + * but I can't find any existing lookup function + * that knows about pseudotypes. This doesn't see very confident. ;-) I don't quite understand this. Is this a gap in the currently available APIs, do we need to improve something here, or does this need more research? 2) In src/backend/parser/parse_utilcmd.c, transformIndexConstraint(), there is too much duplication between the normal and the if (constraint->without_overlaps) case, like the whole not-null constraints stuff at the end. This should be one code block with a few conditionals inside. Also, the normal case deals with things like table inheritance, which the added parts do not. Is this all complete? I'm not sure the validateWithoutOverlaps() function is needed at this point in the code. We just need to check that the column exists, which the normal code path already does, and then have the index creation code later check that an appropriate overlaps operator exists. We don't even need to restrict this to range types. Consider for example, it's possible that a type does not have a btree equality operator. We don't check that here either, but let the index code later check it. Overall, with these fixes, I think this patch is structurally okay. We just need to make sure we have all the weird corner cases covered.
Attachment
On 11/9/23 05:47, Peter Eisentraut wrote: > I went over the patch > v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more > detail Thanks Peter! I'm about halfway through jian he's last two emails. I'll address your feedback also. I wanted to reply to this without waiting though: > Overall, with these fixes, I think this patch is structurally okay. We > just need to make sure we have all the weird corner cases covered. One remaining issue I know about is with table partitions whose column order has changed. I've got an in-progress fix for that, but I've been prioritizing reviewer feedback the last few months. Just want to make sure you know about it for now. Thanks! -- Paul ~{:-) pj@illuminatedcomputing.com
based on v17. begin; drop table if exists s1; CREATE TABLE s1 (id numrange, misc int, misc1 text); create role test101 login; grant update, select on s1 to test101; insert into s1 VALUES ('[1,1000]',2); set session authorization test101; update s1 set id = '[1,1000]'; savepoint sp1; update s1 FOR PORTION OF id from 10 to 100 set misc1 = 'test'; table s1; savepoint sp2; insert into s1 VALUES ('[2,1000]',12); rollback; In UPDATE FOR PORTION OF from x to y, if range [x,y) overlaps with the "source" range then the UPDATE action would be UPDATE and INSERT. The above UPDATE FOR PORTION OF query should fail? UPDATE FOR PORTION OF, may need insert privilege. We also need to document this. Similarly, we also need to apply the above logic to DELETE FOR PORTION OF. ------------------------------------------------------- + <para> + If the table has a <link linkend="ddl-periods-application-periods">range column + or <literal>PERIOD</literal></link>, you may supply a should be + <para> + If the table has a range column or <link linkend="ddl-periods-application-periods"> + <literal>PERIOD</literal></link>, you may supply a similarly the doc/src/sgml/ref/delete.sgml the link reference also broken. -------------------------------------------------------- <para> If the table has a range column or <link linkend="ddl-periods-application-periods"> <literal>PERIOD</literal></link>, you may supply a <literal>FOR PORTION OF</literal> clause, and your update will only affect rows that overlap the given interval. Furthermore, if a row's span extends outside the <literal>FOR PORTION OF</literal> bounds, then it will be truncated to fit within the bounds, and new rows spanning the "cut off" duration will be inserted to preserve the old values. </para> "given interval", "cut off" these words, imho, feel not so clear. We also need a document that: "UPDATE FOR PORTION OF" is UPDATE and INSERT (if overlaps). If the "UPDATE FOR PORTION OF" range overlaps then It will invoke triggers in the following order: before row update, before row insert, after row insert. after row update. --------------------------------------- src/test/regress/sql/for_portion_of.sql You only need to create two triggers? since for_portion_of_trigger only raises notice to output the triggers meta info. CREATE TRIGGER trg_for_portion_of_before BEFORE INSERT OR UPDATE OR DELETE ON for_portion_of_test FOR EACH ROW EXECUTE FUNCTION for_portion_of_trigger(); CREATE TRIGGER trg_for_portion_of_after AFTER INSERT OR UPDATE OR DELETE ON for_portion_of_test FOR EACH ROW EXECUTE FUNCTION for_portion_of_trigger();
On 11/9/23 05:47, Peter Eisentraut wrote: > I went over the patch v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more > detail. Attached is a fixup patch that addresses a variety of cosmetic issues. Thanks for the review! This all looks great to me, and it's applied in the attached patch (with one typo correction in a C comment). The patch addresses some of jian he's feedback too but I'll reply to those emails separately. > Two areas that could be improved: > > 1) In src/backend/commands/indexcmds.c, get_index_attr_temporal_operator() has this comment: > > + * This seems like a hack > + * but I can't find any existing lookup function > + * that knows about pseudotypes. > > This doesn't see very confident. ;-) I don't quite understand this. Is this a gap in the currently > available APIs, do we need to improve something here, or does this need more research? I've improved this a bit but I'm still concerned about part of it. First the improved part: I realized I should be calling get_opclass_opfamily_and_input_type first and passing the opcintype to get_opfamily_member, which solves the problem of having a concrete rangetype but needing an operator that targets anyrange. We do the same thing with partition keys. But I feel the overall approach is wrong: originally I used hardcoded "=" and "&&" operators, and you asked me to look them up by strategy number instead. But that leads to trouble with core gist types vs btree_gist types. The core gist opclasses use RT*StrategyNumbers, but btree_gist creates opclasses with BT*StrategyNumbers. I don't see any way to ask ahead of time which class of strategy numbers are used by a given opclass. So I have code like this: *strat = RTEqualStrategyNumber; opname = "equality"; *opid = get_opfamily_member(opfamily, opcintype, opcintype, *strat); /* * For the non-overlaps key elements, * try both RTEqualStrategyNumber and BTEqualStrategyNumber. * If you're using btree_gist then you'll need the latter. */ if (!OidIsValid(*opid)) { *strat = BTEqualStrategyNumber; *opid = get_opfamily_member(opfamily, opcintype, opcintype, *strat); } I do a similar thing for foreign keys. But that can't be right. I added a scary comment there in this patch, but I'll explain here too: It's only by luck that RTEqualStrategyNumber (18) is bigger than any BT*StrategyNumber. If I checked in the reverse order, I would always find an operator---it would just sometimes be the wrong one! And what if someone has defined a new type+opclass with totally different strategy numbers? As far as I can tell, the gist AM doesn't require an opclass have any particular operators, only support functions, so the strategy numbers are "private" and can vary between opclasses. What we want is a way to ask which operators mean equality & overlaps for a given opclass. But the strategy numbers aren't meaningful terms to ask the question. So I think asking for "=" and "&&" is actually better here. Those will be correct for both core & btree_gist, and they should also match user expectations for custom types. They are what you'd use in a roll-your-own temporal constraint via EXCLUDE. We can also document that we implement WITHOUT OVERLAPS with those operator names, so people can get the right behavior from custom types. (This also maybe lets us implement WITHOUT OVERLAPS for more than rangetypes, as you suggested. See below for more about that.) It's taken me a while to grok the am/opclass/opfamily/amop interaction, and maybe I'm still missing something here. Let me know if that's the case! > 2) In src/backend/parser/parse_utilcmd.c, transformIndexConstraint(), there is too much duplication > between the normal and the if (constraint->without_overlaps) case, like the whole not-null > constraints stuff at the end. This should be one code block with a few conditionals inside. Also, > the normal case deals with things like table inheritance, which the added parts do not. Is this all > complete? Cleaned things up here. I agree it's much better now. And you're right, now you should be able to use an inherited column in a temporal PK/UQ constraint. I think I need a lot more test coverage for how this feature combines with inherited tables, so I'll work on that. > I'm not sure the validateWithoutOverlaps() function is needed at this point in the code. Agreed, I removed it and moved the is-it-a-rangetype check into the caller. > We don't even need to > restrict this to range types. Consider for example, it's possible that a type does not have a btree > equality operator. We don't check that here either, but let the index code later check it. That is very interesting. Perhaps we allow anything with equals and overlaps then? Note that we need more for FOR PORTION OF, foreign keys, and foreign keys with CASCADE/SET. So it might be confusing if a type works with temporal PKs but not those other things. But if we documented what operators you need for each feature then you could implement as much as you liked. I like this direction a lot. It matches what I suggested in the conversation about multiple WITHOUT OVERLAPS/PERIOD columns: rather than having foreign keys and FOR PORTION OF know how to find n-dimensional "leftovers" we could leave it up the type, and just call a documented operator. (We would need to add that operator for rangetypes btw, one that calls range_leftover_internal. It should return an array (not a multirange!) of the untouched parts of the record.) This makes it easy to support bi/tri/n-temporal, spatial, multiranges, etc. (For spatial you probably want PostGIS instead, and I'm wary of over-abstracting here, but I like how this "leaves the door open" for PostGIS to eventually support spatial PKs/FKs.) Please let me know what you think! Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
Thank you for continuing to review this submission! My changes are in the v18 patch I sent a few days ago. Details below. On Sun, Oct 29, 2023 at 5:01 PM jian he <jian.universality@gmail.com> wrote: > * The attached patch makes foreign keys with PERIOD fail if any of the > foreign key columns is "generated columns". I don't see anything like that included in your attachment. I do see the restriction on `ON DELETE SET NULL/DEFAULT (columnlist)`, which I included. But you are referring to something else I take it? Why do you think FKs should fail if the referred column is GENERATED? Is that a restriction you think should apply to all FKs or only temporal ones? > * The following queries will cause segmentation fault. not sure the > best way to fix it. > . . . > CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) > REFERENCES temporal3 (id, valid_at) > ); Fixed, with additional tests re PERIOD on one side but not the other. > * change the function FindFKComparisonOperators's "eqstrategy" to > make pg_constraint record correct {conpfeqop,conppeqop,conffeqop}. This change is incorrect because it causes foreign keys to fail when created with btree_gist. See my reply to Peter for more about that. My v18 patch also includes some new (very simple) tests in the btree_gist extension so it's easier to see whether temporal PKs & FKs work there. > * fix the ON DELETE SET NULL/DEFAULT (columnlist). Now the following > queries error will be more consistent. > ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT temporal_fk_rng2rng_fk, > ADD CONSTRAINT temporal_fk_rng2rng_fk > FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng > ON DELETE SET DEFAULT(valid_at); > --ON DELETE SET NULL(valid_at); Okay, thanks! > * refactor restrict_cascading_range function. It looks like your attachment only renames the column, but I think "restrict" is more expressive and accurate than "get", so I'd like to keep the original name here. > * you did if (numfks != numpks) before if (is_temporal) {numfks += > 1;}, So I changed the code order to make the error report more > consistent. Since we do numfks +=1 and numpks +=1, I don't see any inconsistency here. Also you are making things now happen before a permissions check, which may be important (I'm not sure). Can you explain what improvement is intended here? Your changes don't seem to cause any changes in the tests, so what is the goal? Perhaps I'm misunderstanding what you mean by "more consistent." Thanks! I'll reply to your Nov 6 email separately. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Mon, Nov 6, 2023 at 11:07 PM jian he <jian.universality@gmail.com> wrote: > + </para> > + In a temporal foreign key, the delete/update will use > + <literal>FOR PORTION OF</literal> semantics to constrain the > + effect to the bounds being deleted/updated in the referenced row. > + </para> > > The first "para" should be <para> ? Thanks, fixed (in v18)! > There are many warnings after #define WRITE_READ_PARSE_PLAN_TREES > see: http://cfbot.cputube.org/highlights/all.html#4308 > Does that mean oue new change in gram.y is somehow wrong? Fixed (in read+out node funcs). > sgml/html/sql-update.html: > "range_or_period_name > The range column or period to use when performing a temporal update. > This must match the range or period used in the table's temporal > primary key." > > Is the second sentence unnecessary? since no primary can still do "for > portion of update". You're right, this dates back to an older version of the patch. Removed. > sgml/html/sql-update.html: > "start_time > The earliest time (inclusive) to change in a temporal update. This > must be a value matching the base type of the range or period from > range_or_period_name. It may also be the special value MINVALUE to > indicate an update whose beginning is unbounded." > > probably something like the following: > "lower_bound" > The lower bound (inclusive) to change in an overlap update. This must > be a value matching the base type of the range or period from > range_or_period_name. It may also be the special value UNBOUNDED to > indicate an update whose beginning is unbounded." > > Obviously the "start_time" reference also needs to change, and the > sql-delete.html reference also needs to change. See below re UNBOUNDED.... > UPDATE for_portion_of_test FOR PORTION OF valid_at FROM NULL TO > "unbounded" SET name = 'NULL to NULL'; > should fail, but not. double quoted unbounded is a column reference, I assume. > > That's why I am confused with the function transformForPortionOfBound. > "if (nodeTag(n) == T_ColumnRef)" part. You're right, using a ColumnDef was probably not good here, and treating `"UNBOUNDED"` (with quotes from the user) as a keyword is no good. I couldn't find a way to make this work without reduce/reduce conflicts, so I just took it out. It was syntactic sugar for `FROM/TO NULL` and not part of the standard, so it's not too important. Also I see that UNBOUNDED causes difficult problems already with window functions (comments in gram.y). I hope I can find a way to make this work eventually, but it can go for now. > in create_table.sgml. you also need to add WITHOUT OVERLAPS related > info into <varlistentry id="sql-createtable-parms-unique"> You're right, fixed (though Peter's patch then changed this same spot). Thanks, -- Paul ~{:-) pj@illuminatedcomputing.com
On Sun, Nov 19, 2023 at 1:24 PM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote: > > Thank you for continuing to review this submission! My changes are in > the v18 patch I sent a few days ago. Details below. > > On Sun, Oct 29, 2023 at 5:01 PM jian he <jian.universality@gmail.com> wrote: > > * The attached patch makes foreign keys with PERIOD fail if any of the > > foreign key columns is "generated columns". > > I don't see anything like that included in your attachment. I do see > the restriction on `ON DELETE SET NULL/DEFAULT (columnlist)`, which I > included. But you are referring to something else I take it? Why do > you think FKs should fail if the referred column is GENERATED? Is that > a restriction you think should apply to all FKs or only temporal ones? > I believe the following part should fail. Similar tests on src/test/regress/sql/generated.sql. line begin 347. drop table if exists gtest23a,gtest23x cascade; CREATE TABLE gtest23a (x int4range, y int4range, CONSTRAINT gtest23a_pk PRIMARY KEY (x, y WITHOUT OVERLAPS)); CREATE TABLE gtest23x (a int4range, b int4range GENERATED ALWAYS AS ('empty') STORED, FOREIGN KEY (a, PERIOD b ) REFERENCES gtest23a(x, PERIOD y) ON UPDATE CASCADE); -- should be error? ------- > > > * you did if (numfks != numpks) before if (is_temporal) {numfks += > > 1;}, So I changed the code order to make the error report more > > consistent. > > Since we do numfks +=1 and numpks +=1, I don't see any inconsistency > here. Also you are making things now happen before a permissions > check, which may be important (I'm not sure). Can you explain what > improvement is intended here? Your changes don't seem to cause any > changes in the tests, so what is the goal? Perhaps I'm > misunderstanding what you mean by "more consistent." > begin; drop table if exists fk, pk cascade; CREATE TABLE pk (id int4range, valid_at int4range, CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ); CREATE TABLE fk ( id int4range,valid_at tsrange, parent_id int4range, CONSTRAINT fk FOREIGN KEY (parent_id, valid_at) REFERENCES pk ); rollback; -- the above query will return an error: number of referencing and referenced columns for foreign key disagree. but if you look at it closely, primary key and foreign key columns both are two! The error should be saying valid_at should be specified with "PERIOD". begin; drop table if exists fk, pk cascade; CREATE TABLE pk (id int4range, valid_at int4range, CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ); CREATE TABLE fk ( id int4range,valid_at int4range, parent_id int4range, CONSTRAINT fk FOREIGN KEY (parent_id, period valid_at) REFERENCES pk ); select conname,array_length(conkey,1),array_length(confkey,1) from pg_constraint where conname = 'fk'; rollback; ------------ I found out other issues in v18. I first do `git apply` then `git diff --check`, there is a white space error in v18-0005. You also need to change update.sgml and delete.sgml <title>Outputs</title> part. Since at most, it can return 'UPDATE 3' or 'DELETE 3'. --the following query should work? drop table pk; CREATE table pk(a numrange PRIMARY key,b text); insert into pk values('[1,10]'); create or replace function demo1() returns void as $$ declare lb numeric default 1; up numeric default 3; begin update pk for portion of a from lb to up set b = 'lb_to_up'; return; end $$ language plpgsql; select * from demo1();
On 17.11.23 19:39, Paul Jungwirth wrote: > But I feel the overall approach is wrong: originally I used hardcoded > "=" and "&&" operators, and you asked me to look them up by strategy > number instead. But that leads to trouble with core gist types vs > btree_gist types. The core gist opclasses use RT*StrategyNumbers, but > btree_gist creates opclasses with BT*StrategyNumbers. Ouch. That also provides the answer to my question #2 here: https://www.postgresql.org/message-id/6f010a6e-8e20-658b-dc05-dc9033a694da%40eisentraut.org I don't have a good idea about this right now. Could we just change btree_gist perhaps? Do we need a new API for this somewhere?
On 20.11.23 08:58, Peter Eisentraut wrote: > On 17.11.23 19:39, Paul Jungwirth wrote: >> But I feel the overall approach is wrong: originally I used hardcoded >> "=" and "&&" operators, and you asked me to look them up by strategy >> number instead. But that leads to trouble with core gist types vs >> btree_gist types. The core gist opclasses use RT*StrategyNumbers, but >> btree_gist creates opclasses with BT*StrategyNumbers. > > Ouch. > > That also provides the answer to my question #2 here: > https://www.postgresql.org/message-id/6f010a6e-8e20-658b-dc05-dc9033a694da%40eisentraut.org > > I don't have a good idea about this right now. Could we just change > btree_gist perhaps? Do we need a new API for this somewhere? After further thought, I think the right solution is to change btree_gist (and probably also btree_gin) to use the common RT* strategy numbers. The strategy numbers are the right interface to determine the semantics of index AM operators. It's just that until now, nothing external has needed this information from gist indexes (unlike btree, hash), so it has been a free-for-all. I don't see an ALTER OPERATOR CLASS command that could be used to implement this. Maybe we could get away with a direct catalog UPDATE. Or we need to make some DDL for this. Alternatively, this could be the time to reconsider moving this into core.
Thank you again for such thorough reviews! On Thu, Nov 16, 2023 at 11:12 PM jian he <jian.universality@gmail.com> wrote: > UPDATE FOR PORTION OF, may need insert privilege. We also need to document this. > Similarly, we also need to apply the above logic to DELETE FOR PORTION OF. I don't think UPDATE/DELETE FOR PORTION OF is supposed to require INSERT permission. Notionally the INSERTs are just to preserve what was there already, not to add new data. The idea is that a temporal table is equivalent to a table with one row for every "instant", i.e. one row per microsecond/second/day/whatever-time-resolution. Of course that would be too slow, so we use PERIODs/ranges instead, but the behavior should be the same. Date's book has a good discussion of this idea. I also checked the SQL:2011 draft standard, and there is a section called Access Rules in Part 2: SQL/Foundation for UPDATE and DELETE statements. Those sections say you need UPDATE/DELETE privileges, but say nothing about needing INSERT privileges. That is on page 949 and 972 of the PDFs from the "SQL:20nn Working Draft Documents" link at [1]. If someone has a copy of SQL:2016 maybe something was changed, but I would be surprised. I also checked MariaDB and IBM DB2, the only two RDBMSes that implement FOR PORTION OF to my knowledge. (It is not in Oracle or MSSQL.) I created a table with one row, then gave another user privileges to SELECT & UPDATE, but not INSERT. In both cases, that user could execute an UPDATE FOR PORTION OF that resulted in new rows, but could not INSERT genuinely new rows. [2,3] So instead of changing this I've updated the documentation to make it explicit that you do not need INSERT privilege to use FOR PORTION OF. I also documented which triggers will fire and in which order. > + <para> > + If the table has a <link > linkend="ddl-periods-application-periods">range column > + or <literal>PERIOD</literal></link>, you may supply a > > should be > > + <para> > + If the table has a range column or <link > linkend="ddl-periods-application-periods"> > + <literal>PERIOD</literal></link>, you may supply a > > similarly the doc/src/sgml/ref/delete.sgml the link reference also broken. Okay, changed. > "given interval", "cut off" these words, imho, feel not so clear. > We also need a document that: > "UPDATE FOR PORTION OF" is UPDATE and INSERT (if overlaps). > If the "UPDATE FOR PORTION OF" range overlaps then > It will invoke triggers in the following order: before row update, > before row insert, after row insert. after row update. Okay, reworked the docs for this. > src/test/regress/sql/for_portion_of.sql > You only need to create two triggers? > since for_portion_of_trigger only raises notice to output the triggers > meta info. Changed. v19 patch series attached, rebased to a11c9c42ea. [1] https://web.archive.org/web/20230923221106/https://www.wiscorp.com/SQLStandards.html [2] MariaDB test: First create a table as the root user: ``` create table t (id int, ds date, de date, name text, period for valid_at (ds, de)); insert into t values (1, '2000-01-01', '2001-01-01', 'foo'); ``` and give another user select & update privlege (but not insert): ``` create database paul; use paul; create user 'update_only'@'localhost' identified by 'test'; grant select, update on paul.t to 'update_only'@'localhost'; flush privileges; ``` Now as that user: ``` mysql -uupdate_only -p use paul; -- We can update the whole record: update t for portion of valid_at from '2000-01-01' to '2001-01-01' set name = 'bar'; -- We can update a part of the record: update t for portion of valid_at from '2000-01-01' to '2000-07-01' set name = 'baz'; select * from t; +------+------------+------------+------+ | id | ds | de | name | +------+------------+------------+------+ | 1 | 2000-01-01 | 2000-07-01 | baz | | 1 | 2000-07-01 | 2001-01-01 | bar | +------+------------+------------+------+ -- We cannot insert: insert into t values (2, '2000-01-01', '2001-01-01' 'another'); ERROR 1142 (42000): INSERT command denied to user 'update_only'@'localhost' for table `paul`.`t` ``` [3] IBM DB2 test: ``` mkdir ~/local/db2 cd ~/local/db2 tar xzvf ~/Downloads/v11.5.9_linuxx64_server_dec.tar.gz cd server_dev ./db2_install # should put something at ~/sqllib source ~/sqllib/db2profile db2start # but I got "The database manager is already active." db2 create database paul -- first time only, note no semicolon connect to paul create table t (id integer, ds date not null, de date not null, name varchar(4000), period business_time (ds, de)); insert into t values (1, '2000-01-01', '2001-01-01', 'foo'); grant connect on database to user james; grant select, update on t to user james; ``` Now as james: ``` source ~paul/sqllib/db2profile db2 connect to paul select * from paul.t; update paul.t for portion of business_time from '2000-01-01' to '2000-06-01' set name = 'bar'; DB20000I The SQL command completed successfully. select * from paul.t; insert into paul.t values (2, '2000-01-01', '2001-01-01', 'bar'); DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL0551N The statement failed because the authorization ID does not have the required authorization or privilege to perform the operation. Authorization ID: "JAMES". Operation: "INSERT". Object: "PAUL.T". SQLSTATE=42501 ``` Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Thu, Nov 23, 2023 at 1:08 AM Peter Eisentraut <peter@eisentraut.org> wrote: > After further thought, I think the right solution is to change > btree_gist (and probably also btree_gin) to use the common RT* strategy > numbers. Okay. That will mean bumping the version of btree_gist, and you must be running that version to use the new temporal features, or you will get silly results. Right? Is there a way to protect users against that and communicate they need to upgrade the extension? This also means temporal features may not work in custom GIST opclasses. What we're saying is they must have an appropriate operator for RTEqualStrategyNumber (18) and RTOverlapStrategyNumber (3). Equal matters for the scalar key part(s), overlap for the range part. So equal is more likely to be an issue, but overlap matters if we want to support non-ranges (which I'd say is worth doing). Also if they get it wrong, we won't really have any way to report an error. I did some research on other extensions in contrib, as well as PostGIS. Here is what I found: ## btree_gin: 3 is = 18 is undefined same for all types: macaddr8, int2, int4, int8, float4, float8, oid, timestamp, timestamptz, time, timetz, date, interval, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, anyenum, uuid, name, bool, bpchar ## cube 3 is && 18 is <=> ## intarray 3 is && 18 is undefined ## ltree 3 is = 18 is undefined ## hstore 3 and 18 are undefined ## seg 3 is && 18 is undefined ## postgis: geometry 3 is && 18 is undefined ## postgis: geometry_nd 3 is &&& 18 is undefined I thought about looking through pgxn for more, but I haven't yet. I may still do that. But already it seems like there is not much consistency. So what do you think of this idea instead?: We could add a new (optional) support function to GiST that translates "well-known" strategy numbers into the opclass's own strategy numbers. This would be support function 12. Then we can say translateStrategyNumber(RTEqualStrategyNumber) and look up the operator with the result. There is not a performance hit, because we do this for the DDL command (create pk/uq/fk), then store the operator in the index/constraint. If you don't provide this new support function, then creating the pk/uq/fk fails with a hint about what you can do to make it work. This approach means we don't change the rules about GiST opclasses: you can still use the stranums how you like. This function would also let me support non-range "temporal" foreign keys, where I'll need to build queries with && and maybe other operators. What do you think? -- Paul ~{:-) pj@illuminatedcomputing.com
On 12/2/23 19:11, Paul Jungwirth wrote: > Thank you again for such thorough reviews! > > On Thu, Nov 16, 2023 at 11:12 PM jian he <jian.universality@gmail.com> > wrote: > > UPDATE FOR PORTION OF, may need insert privilege. We also need to > document this. > > Similarly, we also need to apply the above logic to DELETE FOR > PORTION OF. > > I don't think UPDATE/DELETE FOR PORTION OF is supposed to require INSERT > permission. > > Notionally the INSERTs are just to preserve what was there already, not > to add new data. > The idea is that a temporal table is equivalent to a table with one row > for every "instant", > i.e. one row per microsecond/second/day/whatever-time-resolution. Of > course that would be too slow, > so we use PERIODs/ranges instead, but the behavior should be the same. > Date's book has a good discussion of this idea. > > I also checked the SQL:2011 draft standard, and there is a section > called Access Rules in Part 2: SQL/Foundation for UPDATE and DELETE > statements. Those sections say you need UPDATE/DELETE privileges, but > say nothing about needing INSERT privileges. That is on page 949 and 972 > of the PDFs from the "SQL:20nn Working Draft Documents" link at [1]. If > someone has a copy of SQL:2016 maybe something was changed, but I would > be surprised Nothing has changed here in SQL:2023 (or since). -- Vik Fearing
On 02.12.23 19:41, Paul Jungwirth wrote: > So what do you think of this idea instead?: > > We could add a new (optional) support function to GiST that translates > "well-known" strategy numbers into the opclass's own strategy numbers. > This would be support function 12. Then we can say > translateStrategyNumber(RTEqualStrategyNumber) and look up the operator > with the result. > > There is not a performance hit, because we do this for the DDL command > (create pk/uq/fk), then store the operator in the index/constraint. > > If you don't provide this new support function, then creating the > pk/uq/fk fails with a hint about what you can do to make it work. > > This approach means we don't change the rules about GiST opclasses: you > can still use the stranums how you like. > > This function would also let me support non-range "temporal" foreign > keys, where I'll need to build queries with && and maybe other operators. I had some conversations about this behind the scenes. I think this idea makes sense. The other idea was that we create new strategy numbers, like TemporalEqualsStrategy / TemporalOverlapsStrategy. But then you'd have the situation where some strategy numbers are reserved and others are not, so perhaps that is not so clean. I think your idea is good.
On Sun, Dec 3, 2023 at 2:11 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > v19 patch series attached, rebased to a11c9c42ea. > this TODO: * TODO: It sounds like FOR PORTION OF might need to do something here too? based on comments on ExprContext. I refactor a bit, and solved this TODO. tring to the following TODO: // TODO: Need to save context->mtstate->mt_transition_capture? (See comment on ExecInsert) but failed. I also attached the trial, and also added the related test. You can also use the test to check portion update with insert trigger with "referencing old table as old_table new table as new_table" situation.
Attachment
hi. some small issues.... diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c index e3ccf6c7f7..6781e55020 100644 --- a/src/backend/tcop/utility.c +++ b/src/backend/tcop/utility.c @@ -1560,7 +1560,7 @@ ProcessUtilitySlow(ParseState *pstate, true, /* check_rights */ true, /* check_not_in_use */ false, /* skip_build */ - false); /* quiet */ + false); /* quiet */ Is the above part unnecessary? diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 199eae525d..d04c75b398 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -78,6 +78,7 @@ OBJS = \ oracle_compat.o \ orderedsetaggs.o \ partitionfuncs.o \ + period.o \ pg_locale.o \ pg_lsn.o \ pg_upgrade_support.o \ diff --git a/src/backend/utils/adt/period.c b/src/backend/utils/adt/period.c new file mode 100644 index 0000000000..0ed4304e16 --- /dev/null +++ b/src/backend/utils/adt/period.c @@ -0,0 +1,56 @@ +/*------------------------------------------------------------------------- + * + * period.c + * Functions to support periods. + * + * + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/utils/adt/period.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "executor/tuptable.h" +#include "fmgr.h" +#include "nodes/primnodes.h" +#include "utils/fmgrprotos.h" +#include "utils/period.h" +#include "utils/rangetypes.h" + +Datum period_to_range(TupleTableSlot *slot, int startattno, int endattno, Oid rangetype) +{ + Datum startvalue; + Datum endvalue; + Datum result; + bool startisnull; + bool endisnull; + LOCAL_FCINFO(fcinfo, 2); + FmgrInfo flinfo; + FuncExpr *f; + + InitFunctionCallInfoData(*fcinfo, &flinfo, 2, InvalidOid, NULL, NULL); + f = makeNode(FuncExpr); + f->funcresulttype = rangetype; + flinfo.fn_expr = (Node *) f; + flinfo.fn_extra = NULL; + + /* compute oldvalue */ + startvalue = slot_getattr(slot, startattno, &startisnull); + endvalue = slot_getattr(slot, endattno, &endisnull); + + fcinfo->args[0].value = startvalue; + fcinfo->args[0].isnull = startisnull; + fcinfo->args[1].value = endvalue; + fcinfo->args[1].isnull = endisnull; + + result = range_constructor2(fcinfo); + if (fcinfo->isnull) + elog(ERROR, "function %u returned NULL", flinfo.fn_oid); + + return result; +} I am confused. so now I only apply v19, 0001 to 0003. period_to_range function never used. maybe we can move this part to 0005-Add PERIODs.patch? Also you add change in Makefile in 0003, meson.build change in 0005, better put it on in 0005? diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 5b110ca7fe..d54d84adf6 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y +/* + * We need to handle this shift/reduce conflict: + * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo. + * This is basically the classic "dangling else" problem, and we want a + * similar resolution: treat the TO as part of the INTERVAL, not as part of + * the FROM ... TO .... Users can add parentheses if that's a problem. + * TO just needs to be higher precedence than YEAR_P etc. + * TODO: I need to figure out a %prec solution before this gets committed! + */ +%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P +%nonassoc TO this part will never happen? since "FROM INTERVAL YEAR TO MONTH TO" means "valid_at" will be interval range data type, which does not exist now. ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, NULL, + targetRangeParam, targetRange, true, /* must detect new rows */ SPI_OK_SELECT); @@ -905,6 +922,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS) ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, NULL, + -1, 0, true, /* must detect new rows */ SPI_OK_DELETE); @@ -1026,6 +1044,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS) ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, newslot, + -1, 0, true, /* must detect new rows */ SPI_OK_UPDATE); @@ -1258,6 +1277,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind) ri_PerformCheck(riinfo, &qkey, qplan, fk_rel, pk_rel, oldslot, NULL, + -1, 0, true, /* must detect new rows */ SPI_OK_UPDATE); @@ -2520,6 +2540,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo, RI_QueryKey *qkey, SPIPlanPtr qplan, Relation fk_rel, Relation pk_rel, TupleTableSlot *oldslot, TupleTableSlot *newslot, + int forPortionOfParam, Datum forPortionOf, bool detectNewRows, int expect_OK) for all the refactor related to ri_PerformCheck, do you need (Datum) 0 instead of plain 0? + <para> + If the table has a range column or + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, + you may supply a <literal>FOR PORTION OF</literal> clause, and your delete will + only affect rows that overlap the given interval. Furthermore, if a row's span https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:text=A%20range%20is%20two%20numbers,or%20the%20difference%20between%20them So "range" is more accurate than "interval"? +/* ---------- + * ForPortionOfState() + * + * Copies a ForPortionOfState into the current memory context. + */ +static ForPortionOfState * +CopyForPortionOfState(ForPortionOfState *src) +{ + ForPortionOfState *dst = NULL; + if (src) { + MemoryContext oldctx; + RangeType *r; + TypeCacheEntry *typcache; + + /* + * Need to lift the FOR PORTION OF details into a higher memory context + * because cascading foreign key update/deletes can cause triggers to fire + * triggers, and the AfterTriggerEvents will outlive the FPO + * details of the original query. + */ + oldctx = MemoryContextSwitchTo(TopTransactionContext); should it be "Copy a ForPortionOfState into the TopTransactionContext"?
On Wed, Dec 6, 2023 at 12:59 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 02.12.23 19:41, Paul Jungwirth wrote: > > So what do you think of this idea instead?: > > > > We could add a new (optional) support function to GiST that translates > > "well-known" strategy numbers into the opclass's own strategy numbers. > > I had some conversations about this behind the scenes. I think this > idea makes sense. Here is a patch series with the GiST stratnum support function added. I put this into a separate patch (before all the temporal ones), so it's easier to review. Then in the PK patch (now #2) we call that function to figure out the = and && operators. I think this is a big improvement. I provide a couple "example" implementations: - An identity function that returns whatever you give it. The core gist opclasses use this since they use the RT* constants. Even though not all opclasses support all strategies, it is okay to return a stratnum with no amop entry. You will just get an error when you try to make a temporal PK with that type as the WITHOUT OVERLAPS part (which is natural for the types we're talking about). - A function that translates RT*StrategyNumbers to BT*StrategyNumbers when possible (just =/</<=/>/>=, and we really only need =). This is what the btree_gist opclasses use. (No other RT*StrategyNumber can be translated, which means you can only use these types for the non-WIHOUT OVERLAPS part, but again that is natural.) I didn't add a similar function to GIN. It's not possible to use GIN for temporal PKs, so I don't think it makes sense. ## Foreign Keys For FKs, I need a couple similar things: - The ContainedBy operator (<@ for rangetypes). - An aggregate function to combine referenced rows (instead of hardcoding range_agg as before). I look up ContainedBy just as I'm doing with Equal & Overlap for PKs. The aggregate function is another optional support function. I broke out that support function into another independent patch here. Then I updated by FKs patch to use it (and the ContainedBy operator). ## FOR PORTION OF Then for FOR PORTION OF I need an intersect operator (*) and a new "leftovers" operator. We have an intersect operator (for range & multirange at least), but no strategy number for it, thus no amop entry. My patch adds that, **but** it is neither a search operator ('s') nor ordering ('o'), so I've added a "portion" option ('p'). I'm not sure this is completely valid, since `FOR PORTION OF` is not really an *index* operation, but it does seem index-adjacent: you would only/usually use it on something with a temporal PK (which is an index). And it is an analogous situation, where pg_amop entries tell us how to implement the extensible parts. But if all this seems like the wrong approach please let me know. The new leftovers operator similarly has 'p' for amoppurpose and another amop entry. The leftovers operator returns an array of T, where T is the type of the valid_at column. Then we'll insert a new "leftovers" row for each array entry. So we aren't assuming only "before" and "after" (which doesn't work for multiranges or two-dimensional spaces as you'd have with bitemporal or spatial). But now that "leftovers" are becoming more of an external-facing part of Postgres, I feel we should have a less silly name. (That's too bad, because "leftovers" is memorable and non-ambiguous, and computer pioneers used all kinds of silly names, so if you tell me I don't have to be quite so professional, maybe I'll go back to it.) I considered things like "without" or "multi-subtract" or "except". I went with "without portion", which is nice because it evokes FOR PORTION OF and doesn't limit the scope to rangetypes. For the symbol I like `@-`. It conveys the similarity to subtraction, and "@" can be a mnemonic for "array". (Too bad we can't use `--`, ha ha.) I also thought about `@-@`, but that is used already by path_length and lseg_length, and maybe a non-commutative operator deserves a non-palindromic name. The {multi,}range_without_portion procs & operators are broken out into a separate commit, and the FPO patch now uses them in the exec node. It always made me a little uneasy to have rangetype code in nodeModifyTable.c, and now it's gone. Then the last thing I need for FOR PORTION OF is a "constructor". In SQL:2011 you use `FOR PORTION OF valid_at FROM '2000-01-01' TO '2010-01-01'`. But FROM and TO don't really work for non-range types. So I added an alternate syntax that is `FOR PORTION OF valid_at (tsmultirange(tsrange('2001-01-01', '2002-02-02'), tsrange('2003-03-03', '2004-04-04')))`. In other words parens wrapping a value of the type you're using. I still support FROM & TO for building a range type, so we follow the standard. That's it for now. Multiranges should be fully supported (but need lots more tests), as well as custom types. I've updated some of the docs, but I need to go through them and clarify where things don't necessarily have to be ranges. Rebased to cb44a8345e. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v20-0001-Add-stratnum-GiST-support-function.patch
- v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v20-0003-Add-GiST-referencedagg-support-func.patch
- v20-0004-Add-temporal-FOREIGN-KEYs.patch
- v20-0005-Add-multi-range_without_portion-proc-operator.patch
- v20-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v20-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v20-0008-Add-PERIODs.patch
On 12/31/23 00:51, Paul Jungwirth wrote: > That's it for now. Here is another update. I fixed FOR PORTION OF on partitioned tables, in particular when the attnums are different from the root partition. Rebased to cea89c93a1. -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v21-0001-Add-stratnum-GiST-support-function.patch
- v21-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v21-0003-Add-GiST-referencedagg-support-func.patch
- v21-0004-Add-temporal-FOREIGN-KEYs.patch
- v21-0005-Add-multi-range_without_portion-proc-operator.patch
- v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v21-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v21-0008-Add-PERIODs.patch
On Tue, Jan 2, 2024 at 9:59 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 12/31/23 00:51, Paul Jungwirth wrote: > > That's it for now. > > Here is another update. I fixed FOR PORTION OF on partitioned tables, in particular when the attnums > are different from the root partition. > > Rebased to cea89c93a1. > Hi. +/* + * range_without_portion_internal - Sets outputs and outputn to the ranges + * remaining and their count (respectively) after subtracting r2 from r1. + * The array should never contain empty ranges. + * The outputs will be ordered. We expect that outputs is an array of + * RangeType pointers, already allocated with two slots. + */ +void +range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1, + RangeType *r2, RangeType **outputs, int *outputn) +{ + int cmp_l1l2, + cmp_l1u2, + cmp_u1l2, + cmp_u1u2; + RangeBound lower1, + lower2; + RangeBound upper1, + upper2; + bool empty1, + empty2; + + range_deserialize(typcache, r1, &lower1, &upper1, &empty1); + range_deserialize(typcache, r2, &lower2, &upper2, &empty2); + + if (empty1) + { + /* if r1 is empty then r1 - r2 is empty, so return zero results */ + *outputn = 0; + return; + } + else if (empty2) + { + /* r2 is empty so the result is just r1 (which we know is not empty) */ + outputs[0] = r1; + *outputn = 1; + return; + } + + /* + * Use the same logic as range_minus_internal, + * but support the split case + */ + cmp_l1l2 = range_cmp_bounds(typcache, &lower1, &lower2); + cmp_l1u2 = range_cmp_bounds(typcache, &lower1, &upper2); + cmp_u1l2 = range_cmp_bounds(typcache, &upper1, &lower2); + cmp_u1u2 = range_cmp_bounds(typcache, &upper1, &upper2); + + if (cmp_l1l2 < 0 && cmp_u1u2 > 0) + { + lower2.inclusive = !lower2.inclusive; + lower2.lower = false; /* it will become the upper bound */ + outputs[0] = make_range(typcache, &lower1, &lower2, false, NULL); + + upper2.inclusive = !upper2.inclusive; + upper2.lower = true; /* it will become the lower bound */ + outputs[1] = make_range(typcache, &upper2, &upper1, false, NULL); + + *outputn = 2; + } + else if (cmp_l1u2 > 0 || cmp_u1l2 < 0) + { + outputs[0] = r1; + *outputn = 1; + } + else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) + { + *outputn = 0; + } + else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) + { + lower2.inclusive = !lower2.inclusive; + lower2.lower = false; /* it will become the upper bound */ + outputs[0] = make_range(typcache, &lower1, &lower2, false, NULL); + *outputn = 1; + } + else if (cmp_l1l2 >= 0 && cmp_u1u2 >= 0 && cmp_l1u2 <= 0) + { + upper2.inclusive = !upper2.inclusive; + upper2.lower = true; /* it will become the lower bound */ + outputs[0] = make_range(typcache, &upper2, &upper1, false, NULL); + *outputn = 1; + } + else + { + elog(ERROR, "unexpected case in range_without_portion"); + } +} I am confused. say condition: " (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)" the following code will only run PartA, never run PartB? ` else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) PartA else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) PartB ` minimum example: #include<stdio.h> #include<string.h> #include<stdlib.h> #include<assert.h> int main(void) { int cmp_l1l2; int cmp_u1u2; int cmp_u1l2; int cmp_l1u2; cmp_l1u2 = -1; cmp_l1l2 = 0; cmp_u1u2 = 0; cmp_u1l2 = 0; assert(cmp_u1l2 == 0); if (cmp_l1u2 > 0 || cmp_u1l2 < 0) printf("calling partA\n"); else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) printf("calling partB\n"); else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) printf("calling partC\n"); } I am confused with the name "range_without_portion", I think "range_not_overlap" would be better. select numrange(1.1, 2.2) @- numrange(2.0, 3.0); the result is not the same as select numrange(2.0, 3.0) @- numrange(1.1, 2.2); So your categorize oprkind as 'b' for operator "@-" is wrong? select oprname,oprkind,oprcanhash,oprcanmerge,oprleft,oprright,oprresult,oprcode from pg_operator where oprname = '@-'; aslo select count(*), oprkind from pg_operator group by oprkind; there are only 5% are prefix operators. maybe we should design it as: 1. if both inputs are empty range, the result array is empty. 2. if both inputs are non-empty and never overlaps, put both of them to the result array. 3. if one input is empty another one is not, then put the non-empty one into the result array. after applying the patch: now the catalog data seems not correct to me. SELECT a1.amopfamily ,a1.amoplefttype::regtype ,a1.amoprighttype ,a1.amopstrategy ,amoppurpose ,amopsortfamily ,amopopr ,op.oprname ,am.amname FROM pg_amop as a1 join pg_operator op on op.oid = a1.amopopr join pg_am am on am.oid = a1.amopmethod where amoppurpose = 'p'; output: amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopsortfamily | amopopr | oprname | amname ------------+---------------+---------------+--------------+-------------+----------------+---------+---------+-------- 2593 | box | 603 | 31 | p | 0 | 803 | # | gist 3919 | anyrange | 3831 | 31 | p | 0 | 3900 | * | gist 6158 | anymultirange | 4537 | 31 | p | 0 | 4394 | * | gist 3919 | anyrange | 3831 | 32 | p | 0 | 8747 | @- | gist 6158 | anymultirange | 4537 | 32 | p | 0 | 8407 | @- | gist (5 rows) select oprcode, oprname, oprleft::regtype from pg_operator opr where opr.oprname in ('#','*','@-') and oprleft = oprright and oprleft in (603,3831,4537); output: oprcode | oprname | oprleft ----------------------------+---------+--------------- box_intersect | # | box range_intersect | * | anyrange multirange_intersect | * | anymultirange range_without_portion | @- | anyrange multirange_without_portion | @- | anymultirange (5 rows) should amoppurpose = 'p' is true apply to ' @-' operator? catalog-pg-amop.html: ` amopsortfamily oid (references pg_opfamily.oid): The B-tree operator family this entry sorts according to, if an ordering operator; zero if a search operator ` you should also update the above entry, the amopsortfamily is also zero for "portion operator" for the newly implemented "portion operator". v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch create mode 100644 src/backend/utils/adt/period.c create mode 100644 src/include/utils/period.h you should put these two files to v21-0008-Add-PERIODs.patch. it's not related to that patch, it also makes people easy to review.
Getting caught up on reviews from November and December: On 11/19/23 22:57, jian he wrote: > > I believe the following part should fail. Similar tests on > src/test/regress/sql/generated.sql. line begin 347. > > drop table if exists gtest23a,gtest23x cascade; > CREATE TABLE gtest23a (x int4range, y int4range, > CONSTRAINT gtest23a_pk PRIMARY KEY (x, y WITHOUT OVERLAPS)); > CREATE TABLE gtest23x (a int4range, b int4range GENERATED ALWAYS AS > ('empty') STORED, > FOREIGN KEY (a, PERIOD b ) REFERENCES gtest23a(x, PERIOD y) ON UPDATE > CASCADE); -- should be error? Okay, I've added a restriction for temporal FKs too. But note this will change once the PERIODs patch (the last one here) is finished. When the generated column is for a PERIOD, there will be logic to "reroute" the updates to the constituent start/end columns instead. > begin; > drop table if exists fk, pk cascade; > CREATE TABLE pk (id int4range, valid_at int4range, > CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) > ); > CREATE TABLE fk ( > id int4range,valid_at tsrange, parent_id int4range, > CONSTRAINT fk FOREIGN KEY (parent_id, valid_at) > REFERENCES pk > ); > rollback; > -- > the above query will return an error: number of referencing and > referenced columns for foreign key disagree. > but if you look at it closely, primary key and foreign key columns both are two! > The error should be saying valid_at should be specified with "PERIOD". Ah okay, thanks for the clarification! This is tricky because the user left out the PERIOD on the fk side, and left out the entire pk side, so those columns are just implicit. So there is no PERIOD anywhere. But I agree that if the pk has WITHOUT OVERLAPS, we should expect a corresponding PERIOD modifier on the fk side and explain that that's what's missing. The attached patches include that. > I found out other issues in v18. > I first do `git apply` then `git diff --check`, there is a white > space error in v18-0005. Fixed, thanks! > You also need to change update.sgml and delete.sgml <title>Outputs</title> part. > Since at most, it can return 'UPDATE 3' or 'DELETE 3'. This doesn't sound correct to me. An UPDATE or DELETE can target many rows. Also I don't think the inserted "leftovers" should be included in these counts. They represent the rows updated/deleted. > --the following query should work? > drop table pk; > CREATE table pk(a numrange PRIMARY key,b text); > insert into pk values('[1,10]'); > create or replace function demo1() returns void as $$ > declare lb numeric default 1; up numeric default 3; > begin > update pk for portion of a from lb to up set b = 'lb_to_up'; > return; > end > $$ language plpgsql; > select * from demo1(); Hmm this is a tough one. It is correct that the `FROM __ TO __` values cannot be column references. They are computed up front, not per row. One reason is they are used to search the table. In fact the standard basically allows nothing but literal strings here. See section 14.14, page 971 then look up <point in time> on page 348 and <datetime value expression> on page 308. The most flexibility you get is you can add/subtract an interval to the datetime literal. We are already well past that by allowing expressions, (certain) functions, parameters, etc. OTOH in your plpgsql example they are not really columns. They just get represented as ColumnRefs and then passed to transformColumnRef. I'm surprised plpgsql does it that way. As a workaround you could use `EXECUTE format(...)`, but I'd love to make that work as you show instead. I'll keep working on this one but it's not done yet. Perhaps I can move the restriction into analysis/planning. If anyone has any advice it'd be welcome. On 12/6/23 05:22, jian he wrote: > this TODO: > * TODO: It sounds like FOR PORTION OF might need to do something here too? > based on comments on ExprContext. I refactor a bit, and solved this TODO. The patch looks wrong to me. We need to range targeted by `FROM __ TO __` to live for the whole statement, not just one tuple (see just above). That's why it gets computed in the Init function node. I don't think that TODO is needed anymore at all. Older versions of the patch had more expressions besides this one, and I think it was those I was concerned about. So I've removed the comment here. > tring to the following TODO: > // TODO: Need to save context->mtstate->mt_transition_capture? (See > comment on ExecInsert) > > but failed. > I also attached the trial, and also added the related test. > > You can also use the test to check portion update with insert trigger > with "referencing old table as old_table new table as new_table" > situation. Thank you for the test case! This is very helpful. So the problem is `referencing new table as new_table` gets lost. I don't have a fix yet but I'll work on it. On 12/11/23 00:31, jian he wrote: > - false); /* quiet */ > + false); /* quiet */ > > Is the above part unnecessary? Good catch! Fixed. > I am confused. so now I only apply v19, 0001 to 0003. > period_to_range function never used. maybe we can move this part to > 0005-Add PERIODs.patch? > Also you add change in Makefile in 0003, meson.build change in 0005, > better put it on in 0005? You're right, those changes should have been in the PERIODs patch. Moved. > +/* > + * We need to handle this shift/reduce conflict: > + * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo. > + * This is basically the classic "dangling else" problem, and we want a > + * similar resolution: treat the TO as part of the INTERVAL, not as part of > + * the FROM ... TO .... Users can add parentheses if that's a problem. > + * TO just needs to be higher precedence than YEAR_P etc. > + * TODO: I need to figure out a %prec solution before this gets committed! > + */ > +%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P > +%nonassoc TO > > this part will never happen? > since "FROM INTERVAL YEAR TO MONTH TO" > means "valid_at" will be interval range data type, which does not exist now. It appears still needed to me. Without those lines I get 4 shift/reduce conflicts. Are you seeing something different? Or if you have a better solution I'd love to add it. I definitely need to fix this before that patch gets applied. > for all the refactor related to ri_PerformCheck, do you need (Datum) 0 > instead of plain 0? Casts added. > + <para> > + If the table has a range column or > + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, > + you may supply a <literal>FOR PORTION OF</literal> clause, and > your delete will > + only affect rows that overlap the given interval. Furthermore, if > a row's span > > https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:text=A%20range%20is%20two%20numbers,or%20the%20difference%20between%20them > So "range" is more accurate than "interval"? I don't think we should be using R to define the terms "range" and "interval", which both already have meanings in Postgres, SQL, and the literature for temporal databases. But I'm planning to revise the docs' terminology here anyway. Some temporal database texts use "interval" in this sense, and I thought it was a decent term to mean "range or PERIOD". But now we need something to mean "range or multirange or custom type or PERIOD". Actually "portion" seems like maybe the best term, since the SQL syntax `FOR PORTION OF` reinforces that term. If you have suggestions I'm happy for ideas. > +/* ---------- > + * ForPortionOfState() > + * > + * Copies a ForPortionOfState into the current memory context. > + */ > +static ForPortionOfState * > +CopyForPortionOfState(ForPortionOfState *src) > +{ > + ForPortionOfState *dst = NULL; > + if (src) { > + MemoryContext oldctx; > + RangeType *r; > + TypeCacheEntry *typcache; > + > + /* > + * Need to lift the FOR PORTION OF details into a higher memory context > + * because cascading foreign key update/deletes can cause triggers to fire > + * triggers, and the AfterTriggerEvents will outlive the FPO > + * details of the original query. > + */ > + oldctx = MemoryContextSwitchTo(TopTransactionContext); > > should it be "Copy a ForPortionOfState into the TopTransactionContext"? You're right, the other function comments here use imperative mood. Changed. New patches attached, rebased to 43b46aae12. I'll work on your feedback from Jan 4 next. Thanks! -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v22-0001-Add-stratnum-GiST-support-function.patch
- v22-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v22-0003-Add-GiST-referencedagg-support-func.patch
- v22-0004-Add-temporal-FOREIGN-KEYs.patch
- v22-0005-Add-multi-range_without_portion-proc-operator.patch
- v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v22-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v22-0008-Add-PERIODs.patch
On Fri, Jan 5, 2024 at 1:06 PM jian he <jian.universality@gmail.com> wrote: > > On Tue, Jan 2, 2024 at 9:59 AM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: > > > > On 12/31/23 00:51, Paul Jungwirth wrote: > > > That's it for now. > > > > Here is another update. I fixed FOR PORTION OF on partitioned tables, in particular when the attnums > > are different from the root partition. > > > > Rebased to cea89c93a1. > > > > Hi. > > +/* > + * range_without_portion_internal - Sets outputs and outputn to the ranges > + * remaining and their count (respectively) after subtracting r2 from r1. > + * The array should never contain empty ranges. > + * The outputs will be ordered. We expect that outputs is an array of > + * RangeType pointers, already allocated with two slots. > + */ > +void > +range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1, > + RangeType *r2, RangeType **outputs, int *outputn) > I am confused with the name "range_without_portion", I think > "range_not_overlap" would be better. > range_intersect returns the intersection of two ranges. I think here we are doing the opposite. names the main SQL function "range_not_intersect" and the internal function as "range_not_intersect_internal" should be fine. so people don't need to understand the meaning of "portion".
On 1/8/24 06:54, jian he wrote: > On Fri, Jan 5, 2024 at 1:06 PM jian he <jian.universality@gmail.com> wrote: > > range_intersect returns the intersection of two ranges. > I think here we are doing the opposite. > names the main SQL function "range_not_intersect" and the internal > function as "range_not_intersect_internal" should be fine. > so people don't need to understand the meaning of "portion". Thank you for helping me figure out a name here! I realize that can be a bike-sheddy kind of discussion, so let me share some of my principles. Range and multirange are highly mathematically "pure", and that's something I value in them. It makes them more general-purpose, less encumbered by edge cases, easier to combine, and easier to reason about. Preserving that close connection to math is a big goal. What I've called `without_portion` is (like) a closed form of minus (hence `@-` for the operator). Minus isn't closed under everything (e.g. ranges), so `without_portion` adds arrays---much as to close subtraction we add negative numbers and to close division we add rationals). We get the same effect from multiranges, but that only buys us range support. It would be awesome to support arbitrary types: ranges, multiranges, mdranges, boxes, polygons, inets, etc., so I think an array is the way to go here. And then each array element is a "leftover". What do we call a closed form of minus that returns arrays? Using "not" suggests a function that returns true/false, but `@-` returns an array of things. So instead of "not" let's consider "complement". I think that's what you're expressing re intersection. But `@-` is not the same as the complement of intersection. For one thing, `@-` is not commutative. `old_range @- target_portion` is not the same as `target_portion @- old_range`. But `complement(old_range * target_portion)` *is* the same as `complement(target_portion * old_range)`. Or from another angle: it's true that `old_range @- target_portion = old_range @- (old_range * target_portion)`, but the intersection isn't "doing" anything here. It's true that intersection and minus both "reduce" what you put in, but minus is more accurate. So I think we want a name that captures that idea of "minus". Both "not" and "intersection" are misleading IMO. Of course "minus" is already taken (and you wouldn't expect it to return arrays anyway), which is why I'm thinking about names like "without" or "except". Or maybe "multi-minus". I still think "without portion" is the closest to capturing everything above (and avoids ambiguity with other SQL operations). And the "portion" ties the operator to `FOR PORTION OF`, which is its purpose. But I wouldn't be surprised if there were something better. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Sat, Jan 6, 2024 at 8:20 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > Getting caught up on reviews from November and December: > > > New patches attached, rebased to 43b46aae12. I'll work on your feedback from Jan 4 next. Thanks! > +/* + * ForPortionOfClause + * representation of FOR PORTION OF <period-name> FROM <ts> TO <te> + * or FOR PORTION OF <period-name> (<target>) + */ +typedef struct ForPortionOfClause +{ + NodeTag type; + char *range_name; + int range_name_location; + Node *target; + Node *target_start; + Node *target_end; +} ForPortionOfClause; "range_name_location" can be just "location"? generally most of the struct put the "location" to the last field in the struct. (that's the pattern I found all over other code) + if (isUpdate) + { + /* + * Now make sure we update the start/end time of the record. + * For a range col (r) this is `r = r * targetRange`. + */ + Expr *rangeSetExpr; + TargetEntry *tle; + + strat = RTIntersectStrategyNumber; + GetOperatorFromCanonicalStrategy(opclass, InvalidOid, "intersects", "FOR PORTION OF", &opid, &strat); + rangeSetExpr = (Expr *) makeSimpleA_Expr(AEXPR_OP, get_opname(opid), + (Node *) copyObject(rangeVar), targetExpr, + forPortionOf->range_name_location); + rangeSetExpr = (Expr *) transformExpr(pstate, (Node *) rangeSetExpr, EXPR_KIND_UPDATE_PORTION); + + /* Make a TLE to set the range column */ + result->rangeSet = NIL; + tle = makeTargetEntry(rangeSetExpr, range_attno, range_name, false); + result->rangeSet = lappend(result->rangeSet, tle); + + /* Mark the range column as requiring update permissions */ + target_perminfo->updatedCols = bms_add_member(target_perminfo->updatedCols, + range_attno - FirstLowInvalidHeapAttributeNumber); + } + else + result->rangeSet = NIL; I think the name "rangeSet" is misleading, since "set" is generally related to a set of records. but here it's more about the "range intersect". in ExecDelete we have following code pattern: ExecDeleteEpilogue(context, resultRelInfo, tupleid, oldtuple, changingPart); if (processReturning && resultRelInfo->ri_projectReturning) { .... if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid, SnapshotAny, slot)) elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING"); } } but the ExecForPortionOfLeftovers is inside ExecDeleteEpilogue. meaning even without ExecForPortionOfLeftovers, we can still call table_tuple_fetch_row_version also if it was *not* concurrently updated, then our current process holds the lock until the ending of the transaction, i think. So the following TODO is unnecessary? + /* + * Get the range of the old pre-UPDATE/DELETE tuple, + * so we can intersect it with the FOR PORTION OF target + * and see if there are any "leftovers" to insert. + * + * We have already locked the tuple in ExecUpdate/ExecDelete + * (TODO: if it was *not* concurrently updated, does table_tuple_update lock the tuple itself? + * I don't found the code for that yet, and maybe it depends on the AM?) + * and it has passed EvalPlanQual. + * Make sure we're looking at the most recent version. + * Otherwise concurrent updates of the same tuple in READ COMMITTED + * could insert conflicting "leftovers". + */ + if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc, tupleid, SnapshotAny, oldtupleSlot)) + elog(ERROR, "failed to fetch tuple for FOR PORTION OF"); + +/* ---------------------------------------------------------------- + * ExecForPortionOfLeftovers + * + * Insert tuples for the untouched timestamp of a row in a FOR + * PORTION OF UPDATE/DELETE + * ---------------------------------------------------------------- + */ +static void +ExecForPortionOfLeftovers(ModifyTableContext *context, + EState *estate, + ResultRelInfo *resultRelInfo, + ItemPointer tupleid) maybe change the comment to "Insert tuples for the not intersection of a row in a FOR PORTION OF UPDATE/DELETE." + deconstruct_array(DatumGetArrayTypeP(allLeftovers), typcache->type_id, typcache->typlen, + typcache->typbyval, typcache->typalign, &leftovers, NULL, &nleftovers); + + if (nleftovers > 0) + { I think add something like assert nleftovers >=0 && nleftovers <= 2 (assume only range not multirange) would improve readability. + <para> + If the table has a range column or + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, + you may supply a <literal>FOR PORTION OF</literal> clause, and your delete will + only affect rows that overlap the given interval. Furthermore, if a row's span + extends outside the <literal>FOR PORTION OF</literal> bounds, then your delete + will only change the span within those bounds. In effect you are deleting any + moment targeted by <literal>FOR PORTION OF</literal> and no moments outside. + </para> + + <para> + Specifically, after <productname>PostgreSQL</productname> deletes the existing row, + it will <literal>INSERT</literal> + new rows whose range or start/end column(s) receive the remaining span outside + the targeted bounds, containing the original values in other columns. + There will be zero to two inserted records, + depending on whether the original span extended before the targeted + <literal>FROM</literal>, after the targeted <literal>TO</literal>, both, or neither. + </para> + + <para> + These secondary inserts fire <literal>INSERT</literal> triggers. First + <literal>BEFORE DELETE</literal> triggers first, then + <literal>BEFORE INSERT</literal>, then <literal>AFTER INSERT</literal>, + then <literal>AFTER DELETE</literal>. + </para> + + <para> + These secondary inserts do not require <literal>INSERT</literal> privilege on the table. + This is because conceptually no new information has been added. The inserted rows only preserve + existing data about the untargeted time period. Note this may result in users firing <literal>INSERT</literal> + triggers who don't have insert privileges, so be careful about <literal>SECURITY DEFINER</literal> trigger functions! + </para> I think you need to wrap them into a big paragraph, otherwise they lose the context? please see the attached build sql-update.html. also I think + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, should shove into Add-PERIODs.patch. otherwise you cannot build Add-UPDATE-DELETE-FOR-PORTION-OF.patch without all the patches. I think the "FOR-PORTION-OF" feature is kind of independ? Because, IMHO, "for portion" is a range datum interacting with another single range datum, but the primary key with "WITHOUT OVERLAPS", is range datum interacting with a set of range datums. now I cannot just git apply v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch. That maybe would make it more difficult to get commited?
Attachment
On Tue, Jan 9, 2024 at 2:54 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 1/8/24 06:54, jian he wrote: > > On Fri, Jan 5, 2024 at 1:06 PM jian he <jian.universality@gmail.com> wrote: > > > > range_intersect returns the intersection of two ranges. > > I think here we are doing the opposite. > > names the main SQL function "range_not_intersect" and the internal > > function as "range_not_intersect_internal" should be fine. > > so people don't need to understand the meaning of "portion". > > Thank you for helping me figure out a name here! I realize that can be a bike-sheddy kind of > discussion, so let me share some of my principles. > > Range and multirange are highly mathematically "pure", and that's something I value in them. It > makes them more general-purpose, less encumbered by edge cases, easier to combine, and easier to > reason about. Preserving that close connection to math is a big goal. > > What I've called `without_portion` is (like) a closed form of minus (hence `@-` for the operator). > Minus isn't closed under everything (e.g. ranges), so `without_portion` adds arrays---much as to > close subtraction we add negative numbers and to close division we add rationals). We get the same > effect from multiranges, but that only buys us range support. It would be awesome to support > arbitrary types: ranges, multiranges, mdranges, boxes, polygons, inets, etc., so I think an array is > the way to go here. And then each array element is a "leftover". What do we call a closed form of > minus that returns arrays? > > Of course "minus" is already taken (and you wouldn't expect it to return arrays anyway), which is > why I'm thinking about names like "without" or "except". Or maybe "multi-minus". I still think > "without portion" is the closest to capturing everything above (and avoids ambiguity with other SQL > operations). And the "portion" ties the operator to `FOR PORTION OF`, which is its purpose. But I > wouldn't be surprised if there were something better. > Thanks for the deep explanation. I think the name range_without_portion is better than my range_not_intersect. I learned a lot. I also googled " bike-sheddy". haha. src5=# select range_without_portion(numrange(1.0,3.0,'[]'), numrange(1.5,2.0,'(]')); range_without_portion --------------------------- {"[1.0,1.5]","(2.0,3.0]"} (1 row) src5=# \gdesc Column | Type -----------------------+----------- range_without_portion | numeric[] (1 row) src5=# \df range_without_portion List of functions Schema | Name | Result data type | Argument data types | Type ------------+-----------------------+------------------+---------------------+------ pg_catalog | range_without_portion | anyarray | anyrange, anyrange | func (1 row) so apparently, you cannot from (anyrange, anyrange) get anyarray the element type is anyrange. I cannot find the documented explanation in https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC anyrange is POLYMORPHIC, anyarray is POLYMORPHIC, but I suppose, getting an anyarray the element type is anyrange would be hard.
On Sat, 6 Jan 2024 at 05:50, Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > Getting caught up on reviews from November and December: > > On 11/19/23 22:57, jian he wrote: > > > > I believe the following part should fail. Similar tests on > > src/test/regress/sql/generated.sql. line begin 347. > > > > drop table if exists gtest23a,gtest23x cascade; > > CREATE TABLE gtest23a (x int4range, y int4range, > > CONSTRAINT gtest23a_pk PRIMARY KEY (x, y WITHOUT OVERLAPS)); > > CREATE TABLE gtest23x (a int4range, b int4range GENERATED ALWAYS AS > > ('empty') STORED, > > FOREIGN KEY (a, PERIOD b ) REFERENCES gtest23a(x, PERIOD y) ON UPDATE > > CASCADE); -- should be error? > > Okay, I've added a restriction for temporal FKs too. But note this will > change once the PERIODs patch (the last one here) is finished. When the > generated column is for a PERIOD, there will be logic to "reroute" the > updates to the constituent start/end columns instead. > > > begin; > > drop table if exists fk, pk cascade; > > CREATE TABLE pk (id int4range, valid_at int4range, > > CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) > > ); > > CREATE TABLE fk ( > > id int4range,valid_at tsrange, parent_id int4range, > > CONSTRAINT fk FOREIGN KEY (parent_id, valid_at) > > REFERENCES pk > > ); > > rollback; > > -- > > the above query will return an error: number of referencing and > > referenced columns for foreign key disagree. > > but if you look at it closely, primary key and foreign key columns both are two! > > The error should be saying valid_at should be specified with "PERIOD". > > Ah okay, thanks for the clarification! This is tricky because the user > left out the PERIOD on the fk side, and left out the entire pk side, so > those columns are just implicit. So there is no PERIOD anywhere. > But I agree that if the pk has WITHOUT OVERLAPS, we should expect a > corresponding PERIOD modifier on the fk side and explain that that's > what's missing. The attached patches include that. > > > I found out other issues in v18. > > I first do `git apply` then `git diff --check`, there is a white > > space error in v18-0005. > > Fixed, thanks! > > > You also need to change update.sgml and delete.sgml <title>Outputs</title> part. > > Since at most, it can return 'UPDATE 3' or 'DELETE 3'. > > This doesn't sound correct to me. An UPDATE or DELETE can target many > rows. Also I don't think the inserted "leftovers" should be included in > these counts. They represent the rows updated/deleted. > > > --the following query should work? > > drop table pk; > > CREATE table pk(a numrange PRIMARY key,b text); > > insert into pk values('[1,10]'); > > create or replace function demo1() returns void as $$ > > declare lb numeric default 1; up numeric default 3; > > begin > > update pk for portion of a from lb to up set b = 'lb_to_up'; > > return; > > end > > $$ language plpgsql; > > select * from demo1(); > > Hmm this is a tough one. It is correct that the `FROM __ TO __` values cannot be column references. > They are computed up front, not per row. One reason is they are used to search the table. In fact > the standard basically allows nothing but literal strings here. See section 14.14, page 971 then > look up <point in time> on page 348 and <datetime value expression> on page 308. The most > flexibility you get is you can add/subtract an interval to the datetime literal. We are already well > past that by allowing expressions, (certain) functions, parameters, etc. > > OTOH in your plpgsql example they are not really columns. They just get represented as ColumnRefs > and then passed to transformColumnRef. I'm surprised plpgsql does it that way. As a workaround you > could use `EXECUTE format(...)`, but I'd love to make that work as you show instead. I'll keep > working on this one but it's not done yet. Perhaps I can move the restriction into > analysis/planning. If anyone has any advice it'd be welcome. > > On 12/6/23 05:22, jian he wrote: > > this TODO: > > * TODO: It sounds like FOR PORTION OF might need to do something here too? > > based on comments on ExprContext. I refactor a bit, and solved this TODO. > > The patch looks wrong to me. We need to range targeted by `FROM __ > TO __` to live for the whole statement, not just one tuple (see just > above). That's why it gets computed in the Init function node. > > I don't think that TODO is needed anymore at all. Older versions of the > patch had more expressions besides this one, and I think it was those I > was concerned about. So I've removed the comment here. > > > tring to the following TODO: > > // TODO: Need to save context->mtstate->mt_transition_capture? (See > > comment on ExecInsert) > > > > but failed. > > I also attached the trial, and also added the related test. > > > > You can also use the test to check portion update with insert trigger > > with "referencing old table as old_table new table as new_table" > > situation. > > Thank you for the test case! This is very helpful. So the problem is > `referencing new table as new_table` gets lost. I don't have a fix yet > but I'll work on it. > > On 12/11/23 00:31, jian he wrote: > > - false); /* quiet */ > > + false); /* quiet */ > > > > Is the above part unnecessary? > > Good catch! Fixed. > > > I am confused. so now I only apply v19, 0001 to 0003. > > period_to_range function never used. maybe we can move this part to > > 0005-Add PERIODs.patch? > > Also you add change in Makefile in 0003, meson.build change in 0005, > > better put it on in 0005? > > You're right, those changes should have been in the PERIODs patch. Moved. > > > +/* > > + * We need to handle this shift/reduce conflict: > > + * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo. > > + * This is basically the classic "dangling else" problem, and we want a > > + * similar resolution: treat the TO as part of the INTERVAL, not as part of > > + * the FROM ... TO .... Users can add parentheses if that's a problem. > > + * TO just needs to be higher precedence than YEAR_P etc. > > + * TODO: I need to figure out a %prec solution before this gets committed! > > + */ > > +%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P > > +%nonassoc TO > > > > this part will never happen? > > since "FROM INTERVAL YEAR TO MONTH TO" > > means "valid_at" will be interval range data type, which does not exist now. > > It appears still needed to me. Without those lines I get 4 shift/reduce > conflicts. Are you seeing something different? Or if you have a better > solution I'd love to add it. I definitely need to fix this before that > patch gets applied. > > > for all the refactor related to ri_PerformCheck, do you need (Datum) 0 > > instead of plain 0? > > Casts added. > > > + <para> > > + If the table has a range column or > > + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, > > + you may supply a <literal>FOR PORTION OF</literal> clause, and > > your delete will > > + only affect rows that overlap the given interval. Furthermore, if > > a row's span > > > > > https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:text=A%20range%20is%20two%20numbers,or%20the%20difference%20between%20them > > So "range" is more accurate than "interval"? > > I don't think we should be using R to define the terms "range" and > "interval", which both already have meanings in Postgres, SQL, and the > literature for temporal databases. But I'm planning to revise the docs' > terminology here anyway. Some temporal database texts use "interval" > in this sense, and I thought it was a decent term to mean "range or > PERIOD". But now we need something to mean "range or multirange or > custom type or PERIOD". Actually "portion" seems like maybe the best > term, since the SQL syntax `FOR PORTION OF` reinforces that term. If you > have suggestions I'm happy for ideas. > > > +/* ---------- > > + * ForPortionOfState() > > + * > > + * Copies a ForPortionOfState into the current memory context. > > + */ > > +static ForPortionOfState * > > +CopyForPortionOfState(ForPortionOfState *src) > > +{ > > + ForPortionOfState *dst = NULL; > > + if (src) { > > + MemoryContext oldctx; > > + RangeType *r; > > + TypeCacheEntry *typcache; > > + > > + /* > > + * Need to lift the FOR PORTION OF details into a higher memory context > > + * because cascading foreign key update/deletes can cause triggers to fire > > + * triggers, and the AfterTriggerEvents will outlive the FPO > > + * details of the original query. > > + */ > > + oldctx = MemoryContextSwitchTo(TopTransactionContext); > > > > should it be "Copy a ForPortionOfState into the TopTransactionContext"? > > You're right, the other function comments here use imperative mood. Changed. > > New patches attached, rebased to 43b46aae12. I'll work on your feedback from Jan 4 next. Thanks! One of the test has failed in CFBot at [1] with: diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/generated.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/generated.out --- /tmp/cirrus-ci-build/src/test/regress/expected/generated.out 2024-01-06 00:34:48.078691251 +0000 +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/generated.out 2024-01-06 00:42:08.782292390 +0000 @@ -19,7 +19,9 @@ table_name | column_name | dependent_column ------------+-------------+------------------ gtest1 | a | b -(1 row) + pt | de | p + pt | ds | p +(3 rows) More details of the failure is available at [2]. [1] - https://cirrus-ci.com/task/5739983420522496 [2] - https://api.cirrus-ci.com/v1/artifact/task/5739983420522496/log/src/test/recovery/tmp_check/log/regress_log_027_stream_regress Regards, Vignesh
On 31.12.23 09:51, Paul Jungwirth wrote: > On Wed, Dec 6, 2023 at 12:59 AM Peter Eisentraut <peter@eisentraut.org> > wrote: > > > > On 02.12.23 19:41, Paul Jungwirth wrote: > > > So what do you think of this idea instead?: > > > > > > We could add a new (optional) support function to GiST that translates > > > "well-known" strategy numbers into the opclass's own strategy numbers. > > > > I had some conversations about this behind the scenes. I think this > > idea makes sense. > > Here is a patch series with the GiST stratnum support function added. I > put this into a separate patch (before all the temporal ones), so it's > easier to review. Then in the PK patch (now #2) we call that function to > figure out the = and && operators. I think this is a big improvement. I like this solution. Here is some more detailed review of the first two patches. (I reviewed v20; I see you have also posted v21, but they don't appear very different for this purpose.) v20-0001-Add-stratnum-GiST-support-function.patch * contrib/btree_gist/Makefile Needs corresponding meson.build updates. * contrib/btree_gist/btree_gist--1.7--1.8.sql Should gist_stratnum_btree() live in contrib/btree_gist/ or in core? Are there other extensions that use the btree strategy numbers for gist? +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD + FUNCTION 12 (varbit, varbit) gist_stratnum_btree (int2) ; Is there a reason for the extra space after FUNCTION here (repeated throughout the file)? +-- added in 1.4: What is the purpose of these "added in" comments? v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch * contrib/btree_gist/Makefile Also update meson.build. * contrib/btree_gist/sql/without_overlaps.sql Maybe also insert a few values, to verify that the constraint actually does something? * doc/src/sgml/ref/create_table.sgml Is "must have a range type" still true? With the changes to the strategy number mapping, any type with a supported operator class should work? * src/backend/utils/adt/ruleutils.c Is it actually useful to add an argument to decompile_column_index_array()? Wouldn't it be easier to just print the " WITHOUT OVERLAPS" in the caller after returning from it? * src/include/access/gist_private.h The added function gistTranslateStratnum() isn't really "private" to gist. So access/gist.h would be a better place for it. Also, most other functions there appear to be named "GistSomething", so a more consistent name might be GistTranslateStratnum. * src/include/access/stratnum.h The added StrategyIsValid() doesn't seem that useful? Plenty of existing code just compares against InvalidStrategy, and there is only one caller for the new function. I suggest to do without it. * src/include/commands/defrem.h We are using two terms here, well-known strategy number and canonical strategy number, to mean the same thing (I think?). Let's try to stick with one. Or explain the relationship? If these points are addressed, and maybe with another round of checking that all corner cases are covered, I think these patches (0001 and 0002) are close to ready.
On Thu, Jan 11, 2024 at 10:44 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 31.12.23 09:51, Paul Jungwirth wrote: > > On Wed, Dec 6, 2023 at 12:59 AM Peter Eisentraut <peter@eisentraut.org> > > wrote: > > > > > > On 02.12.23 19:41, Paul Jungwirth wrote: > > > > So what do you think of this idea instead?: > > > > > > > > We could add a new (optional) support function to GiST that translates > > > > "well-known" strategy numbers into the opclass's own strategy numbers. > > > > > > I had some conversations about this behind the scenes. I think this > > > idea makes sense. > > > > Here is a patch series with the GiST stratnum support function added. I > > put this into a separate patch (before all the temporal ones), so it's > > easier to review. Then in the PK patch (now #2) we call that function to > > figure out the = and && operators. I think this is a big improvement. > > I like this solution. > > Here is some more detailed review of the first two patches. (I reviewed > v20; I see you have also posted v21, but they don't appear very > different for this purpose.) > > v20-0001-Add-stratnum-GiST-support-function.patch > > * contrib/btree_gist/Makefile > > Needs corresponding meson.build updates. fixed > > * contrib/btree_gist/btree_gist--1.7--1.8.sql > > Should gist_stratnum_btree() live in contrib/btree_gist/ or in core? > Are there other extensions that use the btree strategy numbers for > gist? > > +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD > + FUNCTION 12 (varbit, varbit) gist_stratnum_btree (int2) ; > > Is there a reason for the extra space after FUNCTION here (repeated > throughout the file)? > fixed. > +-- added in 1.4: > > What is the purpose of these "added in" comments? > > > v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch > > * contrib/btree_gist/Makefile > > Also update meson.build. fixed. > * contrib/btree_gist/sql/without_overlaps.sql > > Maybe also insert a few values, to verify that the constraint actually > does something? > I added an ok and failed INSERT. > * doc/src/sgml/ref/create_table.sgml > > Is "must have a range type" still true? With the changes to the > strategy number mapping, any type with a supported operator class > should work? > > * src/backend/utils/adt/ruleutils.c > > Is it actually useful to add an argument to > decompile_column_index_array()? Wouldn't it be easier to just print > the " WITHOUT OVERLAPS" in the caller after returning from it? fixed. i just print it right after decompile_column_index_array. > * src/include/access/gist_private.h > > The added function gistTranslateStratnum() isn't really "private" to > gist. So access/gist.h would be a better place for it. > > Also, most other functions there appear to be named "GistSomething", > so a more consistent name might be GistTranslateStratnum. > > * src/include/access/stratnum.h > > The added StrategyIsValid() doesn't seem that useful? Plenty of > existing code just compares against InvalidStrategy, and there is only > one caller for the new function. I suggest to do without it. > If more StrategyNumber are used in the future, will StrategyIsValid() make sense? > * src/include/commands/defrem.h > > We are using two terms here, well-known strategy number and canonical > strategy number, to mean the same thing (I think?). Let's try to > stick with one. Or explain the relationship? > In my words: for range type, well-known strategy number and canonical strategy number are the same thing. For types Gist does not natively support equality, like int4, GetOperatorFromCanonicalStrategy will pass RTEqualStrategyNumber from ComputeIndexAttrs and return BTEqualStrategyNumber. > If these points are addressed, and maybe with another round of checking > that all corner cases are covered, I think these patches (0001 and 0002) > are close to ready. > the following are my review: + /* exclusionOpNames can be non-NIL if we are creating a partition */ + if (iswithoutoverlaps && exclusionOpNames == NIL) + { + indexInfo->ii_ExclusionOps = palloc_array(Oid, nkeycols); + indexInfo->ii_ExclusionProcs = palloc_array(Oid, nkeycols); + indexInfo->ii_ExclusionStrats = palloc_array(uint16, nkeycols); + } I am not sure the above comment is related to the code +/* + * Returns the btree number for equals, otherwise invalid. + * + * This is for GiST opclasses in btree_gist (and maybe elsewhere) + * that use the BT*StrategyNumber constants. + */ +Datum +gist_stratnum_btree(PG_FUNCTION_ARGS) +{ + StrategyNumber strat = PG_GETARG_UINT16(0); + + switch (strat) + { + case RTEqualStrategyNumber: + PG_RETURN_UINT16(BTEqualStrategyNumber); + case RTLessStrategyNumber: + PG_RETURN_UINT16(BTLessStrategyNumber); + case RTLessEqualStrategyNumber: + PG_RETURN_UINT16(BTLessEqualStrategyNumber); + case RTGreaterStrategyNumber: + PG_RETURN_UINT16(BTGreaterStrategyNumber); + case RTGreaterEqualStrategyNumber: + PG_RETURN_UINT16(BTGreaterEqualStrategyNumber); + default: + PG_RETURN_UINT16(InvalidStrategy); + } the above comment seems not right? even though currently strat will only be RTEqualStrategyNumber. +void +GetOperatorFromCanonicalStrategy(Oid opclass, + Oid atttype, + const char *opname, + Oid *opid, + StrategyNumber *strat) +{ + Oid opfamily; + Oid opcintype; + StrategyNumber opstrat = *strat; + + *opid = InvalidOid; + + if (get_opclass_opfamily_and_input_type(opclass, + &opfamily, + &opcintype)) + { + /* + * Ask the opclass to translate to its internal stratnum + * + * For now we only need GiST support, but this could support + * other indexams if we wanted. + */ + *strat = gistTranslateStratnum(opclass, opstrat); + if (!StrategyIsValid(*strat)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("no %s operator found for WITHOUT OVERLAPS constraint", opname), + errdetail("Could not translate strategy number %u for opclass %d.", + opstrat, opclass), + errhint("Define a stratnum support function for your GiST opclass."))); + + *opid = get_opfamily_member(opfamily, opcintype, opcintype, *strat); + } + + if (!OidIsValid(*opid)) + { + HeapTuple opftuple; + Form_pg_opfamily opfform; + + /* + * attribute->opclass might not explicitly name the opfamily, + * so fetch the name of the selected opfamily for use in the + * error message. + */ + opftuple = SearchSysCache1(OPFAMILYOID, + ObjectIdGetDatum(opfamily)); + if (!HeapTupleIsValid(opftuple)) + elog(ERROR, "cache lookup failed for opfamily %u", + opfamily); + opfform = (Form_pg_opfamily) GETSTRUCT(opftuple); + + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("no %s operator found for WITHOUT OVERLAPS constraint", opname), + errdetail("There must be an %s operator within opfamily \"%s\" for type \"%s\".", + opname, + NameStr(opfform->opfname), + format_type_be(atttype)))); + } +} I refactored this function. GetOperatorFromCanonicalStrategy called both for normal and WITHOUT OVERLAPS. so errmsg("no %s operator found for WITHOUT OVERLAPS constraint", opname) would be misleading for columns without "WITHOUT OVERLAPS". Also since that error part was deemed unreachable, it would make the error verbose, I guess. --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -2379,6 +2379,10 @@ describeOneTableDetails(const char *schemaname, else appendPQExpBufferStr(&buf, ", false AS indisreplident"); appendPQExpBufferStr(&buf, ", c2.reltablespace"); + if (pset.sversion >= 170000) + appendPQExpBufferStr(&buf, ", con.conwithoutoverlaps"); + else + appendPQExpBufferStr(&buf, ", false AS conwithoutoverlaps"); I don't know how to verify it. I think it should be: + if (pset.sversion >= 170000) + appendPQExpBufferStr(&buf, ", con.conwithoutoverlaps"); I refactored the 0002 commit message. The original commit message seems outdated. I put all the related changes into one attachment.
Attachment
Hello, Here are new patches consolidating feedback from several emails. I haven't addressed everything but I think I'm overdue for a reply: On 1/4/24 21:06, jian he wrote: > > I am confused. > say condition: " (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)" > the following code will only run PartA, never run PartB? > > ` > else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) > PartA > else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) > PartB > ` > > minimum example: > #include<stdio.h> > #include<string.h> > #include<stdlib.h> > #include<assert.h> > int > main(void) > { > int cmp_l1l2; > int cmp_u1u2; > int cmp_u1l2; > int cmp_l1u2; > cmp_l1u2 = -1; > cmp_l1l2 = 0; > cmp_u1u2 = 0; > cmp_u1l2 = 0; > assert(cmp_u1l2 == 0); > if (cmp_l1u2 > 0 || cmp_u1l2 < 0) > printf("calling partA\n"); > else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0) > printf("calling partB\n"); > else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0) > printf("calling partC\n"); > } All of the branches are used. I've attached a `without_portion.c` minimal example showing different cases. For ranges it helps to go through the Allen relationships (https://en.wikipedia.org/wiki/Allen%27s_interval_algebra) to make a comprehensive check. (But note that our operators don't exactly match that terminology, and it's important to consider closed-vs-open and unbounded cases.) > I am confused with the name "range_without_portion", I think > "range_not_overlap" would be better. I think I covered this in my other reply and we are now in agreement, but if that's mistaken let know me. > select numrange(1.1, 2.2) @- numrange(2.0, 3.0); > the result is not the same as > select numrange(2.0, 3.0) @- numrange(1.1, 2.2); Correct, @- is not commutative. > So your categorize oprkind as 'b' for operator "@-" is wrong? > select oprname,oprkind,oprcanhash,oprcanmerge,oprleft,oprright,oprresult,oprcode > from pg_operator > where oprname = '@-'; 'b' is the correct oprkind. It is a binary (infix) operator. > aslo > select count(*), oprkind from pg_operator group by oprkind; > there are only 5% are prefix operators. > maybe we should design it as: > 1. if both inputs are empty range, the result array is empty. > 2. if both inputs are non-empty and never overlaps, put both of them > to the result array. > 3. if one input is empty another one is not, then put the non-empty > one into the result array. Also covered before, but if any of this still applies please let me know. > after applying the patch: now the catalog data seems not correct to me. > SELECT a1.amopfamily > ,a1.amoplefttype::regtype > ,a1.amoprighttype > ,a1.amopstrategy > ,amoppurpose > ,amopsortfamily > ,amopopr > ,op.oprname > ,am.amname > FROM pg_amop as a1 join pg_operator op on op.oid = a1.amopopr > join pg_am am on am.oid = a1.amopmethod > where amoppurpose = 'p'; > output: > amopfamily | amoplefttype | amoprighttype | amopstrategy | > amoppurpose | amopsortfamily | amopopr | oprname | amname > ------------+---------------+---------------+--------------+-------------+----------------+---------+---------+-------- > 2593 | box | 603 | 31 | p > | 0 | 803 | # | gist > 3919 | anyrange | 3831 | 31 | p > | 0 | 3900 | * | gist > 6158 | anymultirange | 4537 | 31 | p > | 0 | 4394 | * | gist > 3919 | anyrange | 3831 | 32 | p > | 0 | 8747 | @- | gist > 6158 | anymultirange | 4537 | 32 | p > | 0 | 8407 | @- | gist > (5 rows) > > select oprcode, oprname, oprleft::regtype > from pg_operator opr > where opr.oprname in ('#','*','@-') > and oprleft = oprright > and oprleft in (603,3831,4537); > output: > > oprcode | oprname | oprleft > ----------------------------+---------+--------------- > box_intersect | # | box > range_intersect | * | anyrange > multirange_intersect | * | anymultirange > range_without_portion | @- | anyrange > multirange_without_portion | @- | anymultirange > (5 rows) This seems correct. '#' is the name of the box overlaps operator. Probably I should add a box @- operator too. But see below. . . . > should amoppurpose = 'p' is true apply to ' @-' operator? Yes. > catalog-pg-amop.html: > ` > amopsortfamily oid (references pg_opfamily.oid): > The B-tree operator family this entry sorts according to, if an > ordering operator; zero if a search operator > ` > you should also update the above entry, the amopsortfamily is also > zero for "portion operator" for the newly implemented "portion > operator". Okay, done. > v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch > create mode 100644 src/backend/utils/adt/period.c > create mode 100644 src/include/utils/period.h > you should put these two files to v21-0008-Add-PERIODs.patch. > it's not related to that patch, it also makes people easy to review. You're right, sorry! On 1/8/24 16:00, jian he wrote: > > +/* > + * ForPortionOfClause > + * representation of FOR PORTION OF <period-name> FROM <ts> TO <te> > + * or FOR PORTION OF <period-name> (<target>) > + */ > +typedef struct ForPortionOfClause > +{ > + NodeTag type; > + char *range_name; > + int range_name_location; > + Node *target; > + Node *target_start; > + Node *target_end; > +} ForPortionOfClause; > > "range_name_location" can be just "location"? > generally most of the struct put the "location" to the last field in the struct. > (that's the pattern I found all over other code) Agreed, done. > + if (isUpdate) > + { > + /* > + * Now make sure we update the start/end time of the record. > + * For a range col (r) this is `r = r * targetRange`. > + */ > + Expr *rangeSetExpr; > + TargetEntry *tle; > + > + strat = RTIntersectStrategyNumber; > + GetOperatorFromCanonicalStrategy(opclass, InvalidOid, "intersects", > "FOR PORTION OF", &opid, &strat); > + rangeSetExpr = (Expr *) makeSimpleA_Expr(AEXPR_OP, get_opname(opid), > + (Node *) copyObject(rangeVar), targetExpr, > + forPortionOf->range_name_location); > + rangeSetExpr = (Expr *) transformExpr(pstate, (Node *) rangeSetExpr, > EXPR_KIND_UPDATE_PORTION); > + > + /* Make a TLE to set the range column */ > + result->rangeSet = NIL; > + tle = makeTargetEntry(rangeSetExpr, range_attno, range_name, false); > + result->rangeSet = lappend(result->rangeSet, tle); > + > + /* Mark the range column as requiring update permissions */ > + target_perminfo->updatedCols = bms_add_member(target_perminfo->updatedCols, > + range_attno - FirstLowInvalidHeapAttributeNumber); > + } > + else > + result->rangeSet = NIL; > I think the name "rangeSet" is misleading, since "set" is generally > related to a set of records. > but here it's more about the "range intersect". Okay, I can see that. I used "rangeSet" because we add it to the SET clause of the UPDATE command. Here I've changed it to rangeTargetList. I think this matches other code and better indicates what it holds. Any objections? In the PERIOD patch we will need two TLEs here (that's why it's a List): one for the start column and one for the end column. > in ExecDelete > we have following code pattern: > ExecDeleteEpilogue(context, resultRelInfo, tupleid, oldtuple, changingPart); > if (processReturning && resultRelInfo->ri_projectReturning) > { > .... > if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid, > SnapshotAny, slot)) > elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING"); > } > } > > but the ExecForPortionOfLeftovers is inside ExecDeleteEpilogue. > meaning even without ExecForPortionOfLeftovers, we can still call > table_tuple_fetch_row_version > also if it was *not* concurrently updated, then our current process > holds the lock until the ending of the transaction, i think. > So the following TODO is unnecessary? > > + /* > + * Get the range of the old pre-UPDATE/DELETE tuple, > + * so we can intersect it with the FOR PORTION OF target > + * and see if there are any "leftovers" to insert. > + * > + * We have already locked the tuple in ExecUpdate/ExecDelete > + * (TODO: if it was *not* concurrently updated, does > table_tuple_update lock the tuple itself? > + * I don't found the code for that yet, and maybe it depends on the AM?) > + * and it has passed EvalPlanQual. > + * Make sure we're looking at the most recent version. > + * Otherwise concurrent updates of the same tuple in READ COMMITTED > + * could insert conflicting "leftovers". > + */ > + if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc, > tupleid, SnapshotAny, oldtupleSlot)) > + elog(ERROR, "failed to fetch tuple for FOR PORTION OF"); I think you're right. According to the comments on TM_Result (returned by table_tuple_update), a TM_Ok indicates that the lock was acquired. > +/* ---------------------------------------------------------------- > + * ExecForPortionOfLeftovers > + * > + * Insert tuples for the untouched timestamp of a row in a FOR > + * PORTION OF UPDATE/DELETE > + * ---------------------------------------------------------------- > + */ > +static void > +ExecForPortionOfLeftovers(ModifyTableContext *context, > + EState *estate, > + ResultRelInfo *resultRelInfo, > + ItemPointer tupleid) > > maybe change the comment to > "Insert tuples for the not intersection of a row in a FOR PORTION OF > UPDATE/DELETE." Changed to "untouched portion". > + deconstruct_array(DatumGetArrayTypeP(allLeftovers), > typcache->type_id, typcache->typlen, > + typcache->typbyval, typcache->typalign, &leftovers, NULL, &nleftovers); > + > + if (nleftovers > 0) > + { > I think add something like assert nleftovers >=0 && nleftovers <= 2 > (assume only range not multirange) would improve readability. I added the first assert. The second is not true for non-range types. > + <para> > + If the table has a range column or > + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, > + you may supply a <literal>FOR PORTION OF</literal> clause, and > your delete will > + only affect rows that overlap the given interval. Furthermore, if > a row's span > + extends outside the <literal>FOR PORTION OF</literal> bounds, then > your delete > + will only change the span within those bounds. In effect you are > deleting any > + moment targeted by <literal>FOR PORTION OF</literal> and no moments outside. > + </para> > + > + <para> > + Specifically, after <productname>PostgreSQL</productname> deletes > the existing row, > + it will <literal>INSERT</literal> > + new rows whose range or start/end column(s) receive the remaining > span outside > + the targeted bounds, containing the original values in other columns. > + There will be zero to two inserted records, > + depending on whether the original span extended before the targeted > + <literal>FROM</literal>, after the targeted <literal>TO</literal>, > both, or neither. > + </para> > + > + <para> > + These secondary inserts fire <literal>INSERT</literal> triggers. First > + <literal>BEFORE DELETE</literal> triggers first, then > + <literal>BEFORE INSERT</literal>, then <literal>AFTER INSERT</literal>, > + then <literal>AFTER DELETE</literal>. > + </para> > + > + <para> > + These secondary inserts do not require <literal>INSERT</literal> > privilege on the table. > + This is because conceptually no new information has been added. > The inserted rows only preserve > + existing data about the untargeted time period. Note this may > result in users firing <literal>INSERT</literal> > + triggers who don't have insert privileges, so be careful about > <literal>SECURITY DEFINER</literal> trigger functions! > + </para> > > I think you need to wrap them into a big paragraph, otherwise they > lose the context? > please see the attached build sql-update.html. Still TODO. > also I think > + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>, > should shove into Add-PERIODs.patch. > > otherwise you cannot build Add-UPDATE-DELETE-FOR-PORTION-OF.patch > without all the patches. Fixed. > I think the "FOR-PORTION-OF" feature is kind of independ? > Because, IMHO, "for portion" is a range datum interacting with another > single range datum, but the primary key with "WITHOUT OVERLAPS", is > range datum interacting with a set of range datums. > now I cannot just git apply v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch. > That maybe would make it more difficult to get commited? Still TODO. On 1/8/24 21:33, jian he wrote: > > src5=# select range_without_portion(numrange(1.0,3.0,'[]'), > numrange(1.5,2.0,'(]')); > range_without_portion > --------------------------- > {"[1.0,1.5]","(2.0,3.0]"} > (1 row) > > src5=# \gdesc > Column | Type > -----------------------+----------- > range_without_portion | numeric[] > (1 row) > > src5=# \df range_without_portion > List of functions > Schema | Name | Result data type | Argument data > types | Type > ------------+-----------------------+------------------+---------------------+------ > pg_catalog | range_without_portion | anyarray | anyrange, > anyrange | func > (1 row) > > so apparently, you cannot from (anyrange, anyrange) get anyarray the > element type is anyrange. > I cannot find the documented explanation in > https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC > > anyrange is POLYMORPHIC, anyarray is POLYMORPHIC, > but I suppose, getting an anyarray the element type is anyrange would be hard. You're right, that is a problem. I think the right approach is to make intersect and without_portion just be support functions, not operators. Then I don't need to introduce the new 'p' amop strategy at all, which seemed like a dubious idea anyway. Then the without_portion function can return a SETOF instead of an array. Another idea is to add more polymorphic types, anyrangearray and anymultirangearray, but maybe that is too big a thing. OTOH I have wanted those same types before. I will take a stab at it. On 1/11/24 06:44, Peter Eisentraut wrote: > Here is some more detailed review of the first two patches. (I reviewed v20; I see you have also > posted v21, but they don't appear very different for this purpose.) > > v20-0001-Add-stratnum-GiST-support-function.patch > > * contrib/btree_gist/Makefile > > Needs corresponding meson.build updates. Fixed. > * contrib/btree_gist/btree_gist--1.7--1.8.sql > > Should gist_stratnum_btree() live in contrib/btree_gist/ or in core? > Are there other extensions that use the btree strategy numbers for > gist? Moved. None of our other contrib extensions use it. I thought it would be friendly to offer it to outside extensions, but maybe that is too speculative. > +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD > + FUNCTION 12 (varbit, varbit) gist_stratnum_btree (int2) ; > > Is there a reason for the extra space after FUNCTION here (repeated > throughout the file)? Fixed. > +-- added in 1.4: > > What is the purpose of these "added in" comments? I added those to help me make sure I was including every type in the extension, but I've taken them out here. > v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch > > * contrib/btree_gist/Makefile > > Also update meson.build. Done. > * contrib/btree_gist/sql/without_overlaps.sql > > Maybe also insert a few values, to verify that the constraint actually > does something? Done. > * doc/src/sgml/ref/create_table.sgml > > Is "must have a range type" still true? With the changes to the > strategy number mapping, any type with a supported operator class > should work? Updated. Probably more docs to come; I want to go through them all now that we support more types. > * src/backend/utils/adt/ruleutils.c > > Is it actually useful to add an argument to > decompile_column_index_array()? Wouldn't it be easier to just print > the " WITHOUT OVERLAPS" in the caller after returning from it? Okay, done. > * src/include/access/gist_private.h > > The added function gistTranslateStratnum() isn't really "private" to > gist. So access/gist.h would be a better place for it. Moved. > Also, most other functions there appear to be named "GistSomething", > so a more consistent name might be GistTranslateStratnum. > > * src/include/access/stratnum.h Changed. > The added StrategyIsValid() doesn't seem that useful? Plenty of > existing code just compares against InvalidStrategy, and there is only > one caller for the new function. I suggest to do without it. > > * src/include/commands/defrem.h Okay, removed. > We are using two terms here, well-known strategy number and canonical > strategy number, to mean the same thing (I think?). Let's try to > stick with one. Or explain the relationship? True. Changed everything to "well-known" which seems like a better match for what's going on. I haven't gone through jian he's Jan 13 patch yet, but since he was also implementing Peter's requests I thought I should share what I have. I did this work a while ago, but I was hoping to finish the TODOs above first, and then we got hit with a winter storm that knocked out power. Sorry to cause duplicate work! Rebased to 2f35c14cfb. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v23-0001-Add-stratnum-GiST-support-function.patch
- v23-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v23-0003-Add-GiST-referencedagg-support-func.patch
- v23-0004-Add-temporal-FOREIGN-KEYs.patch
- v23-0005-Add-multi-range_without_portion-proc-operator.patch
- v23-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v23-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v23-0008-Add-PERIODs.patch
2024-01 Commitfest. Hi, This patch has a CF status of "Needs Review" [1], but it seems there were CFbot test failures last time it was run [2]. Please have a look and post an updated version if necessary. ====== [1] https://commitfest.postgresql.org/46/4308/ [2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4308 Kind Regards, Peter Smith.
On 18.01.24 04:59, Paul Jungwirth wrote: > Here are new patches consolidating feedback from several emails. I have committed 0001 and 0002 (the primary key support). The only significant tweak I did was the error messages in GetOperatorFromWellKnownStrategy(), to make the messages translatable better and share wording with other messages. These messages are difficult to reach, so we'll probably have to wait for someone to actually encounter them to see if they are useful. I would like to work on 0003 and 0004 (the foreign key support) during February/March. The patches beyond that are probably too optimistic for PG17. I recommend you focus getting 0003/0004 in good shape soon.
On 1/24/24 08:32, Peter Eisentraut wrote: > On 18.01.24 04:59, Paul Jungwirth wrote: >> Here are new patches consolidating feedback from several emails. > > I have committed 0001 and 0002 (the primary key support). Thanks Peter! I noticed the comment on gist_stratnum_btree was out-of-date, so here is a tiny patch correcting it. Also the remaining patches with some updates: I fixed the dependency issues with PERIODs and their (hidden) GENERATED range columns. This has been causing test failures and bugging me since I reordered the patches at PgCon, so I'm glad to finally clean it up. The PERIOD should have an INTERNAL dependency on the range column, but then when you dropped the table the dependency code thought the whole table was part of the INTERNAL dependency, so the drop would fail. The PERIOD patch here fixes the dependency logic. (I guess this is the first time a column has been an internal dependency of something.) I also fixed an error message when you try to change the type of a start/end column used by a PERIOD. Previously the error message would complain about the GENERATED column, not the PERIOD, which seems confusing. In fact it was non-deterministic, depending on which pg_depend record the index returned first. On 12/6/23 05:22, jian he wrote: > tring to the following TODO: > // TODO: Need to save context->mtstate->mt_transition_capture? (See > comment on ExecInsert) > > but failed. > I also attached the trial, and also added the related test. > > You can also use the test to check portion update with insert trigger > with "referencing old table as old_table new table as new_table" > situation. Thank you for the very helpful test case here. I fixed the issue of not passing along the transition table. But there is still more work to do here I think: - The AFTER INSERT FOR EACH ROW triggers have *both* leftover rows in the NEW table. Now the docs do say that for AFTER triggers, a named transition table can see all the changes from the *statement* (although that seems pretty weird to me), but the inserts are two *separate* statements. I think the SQL:2011 standard is fairly clear about that. So each time the trigger fires we should still get just one row in the transition table. - The AFTER INSERT FOR EACH STATEMENT triggers never fire. That happens outside ExecInsert (in ExecModifyTable). In fact there is a bunch of stuff in ExecModifyTable that maybe we need to do when we insert leftovers. Do we even need a separate exec node, perhaps wrapping ExecModifyTable? I'm not sure that would give us the correct trigger ordering for the triggers on the implicit insert statement(s) vs the explicit update/delete statement, so maybe it does all need to be part of the single node. But still I think we need to be more careful about memory, especially the per-tuple context. I'll keep working on that, but at least in this round of patches the transition tables aren't missing completely. My plan is still to replace the 'p' amoppurpose operators with just support functions. I want to do that next, although as Peter requested I'll also start focusing more narrowly on the foreign key patches. Rebased to 46a0cd4cef. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v24-0001-Fix-comment-on-gist_stratnum_btree.patch
- v24-0002-Add-GiST-referencedagg-support-func.patch
- v24-0003-Add-temporal-FOREIGN-KEYs.patch
- v24-0004-Add-multi-range_without_portion-proc-operator.patch
- v24-0005-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v24-0006-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v24-0007-Add-PERIODs.patch
On 24.01.24 23:06, Paul Jungwirth wrote: > On 1/24/24 08:32, Peter Eisentraut wrote: > > On 18.01.24 04:59, Paul Jungwirth wrote: > >> Here are new patches consolidating feedback from several emails. > > > > I have committed 0001 and 0002 (the primary key support). > > Thanks Peter! I noticed the comment on gist_stratnum_btree was > out-of-date, so here is a tiny patch correcting it. committed that
I fixed your tests, some of your tests can be simplified, (mainly primary key constraint is unnecessary for the failed tests) also your foreign key patch test table, temporal_rng is created at line 141, and we use it at around line 320. it's hard to get the definition of temporal_rng. I drop the table and recreate it. So people can view the patch with tests more easily. + <para> + In a temporal foreign key, the delete/update will use + <literal>FOR PORTION OF</literal> semantics to constrain the + effect to the bounds being deleted/updated in the referenced row. + </para> in v24-0003-Add-temporal-FOREIGN-KEYs.patch <literal>FOR PORTION OF</literal> not yet implemented, so we should not mention it. + <para> + If the last column is marked with <literal>PERIOD</literal>, + it must be a period or range column, and the referenced table + must have a temporal primary key. can we change "it must be a period or range column" to "it must be a range column", maybe we can add it on another patch.
Attachment
On Mon, Jan 29, 2024 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
>
> I fixed your tests, some of your tests can be simplified, (mainly
> primary key constraint is unnecessary for the failed tests)
> also your foreign key patch test table, temporal_rng is created at
> line 141, and we use it at around line 320.
> it's hard to get the definition of temporal_rng. I drop the table
> and recreate it.
> So people can view the patch with tests more easily.
>
I've attached a new patch that further simplified the tests. (scope
v24 patch's 0002 and 0003)
Please ignore previous email attachments.
I've only applied the v24, 0002, 0003.
seems in doc/src/sgml/ref/create_table.sgml
lack the explanation of `<replaceable
class="parameter">temporal_interval</replaceable>`
since foreign key ON {UPDATE | DELETE} {CASCADE,SET NULL,SET DEFAULT}
not yet supported,
v24-0003 create_table.sgml should reflect that.
+ /*
+ * For FKs with PERIOD we need an operator and aggregate function
+ * to check whether the referencing row's range is contained
+ * by the aggregated ranges of the referenced row(s).
+ * For rangetypes this is fk.periodatt <@ range_agg(pk.periodatt).
+ * FKs will look these up at "runtime", but we should make sure
+ * the lookup works here.
+ */
+ if (is_temporal)
+ FindFKPeriodOpersAndProcs(opclasses[numpks - 1], &periodoperoid,
&periodprocoid);
within the function ATAddForeignKeyConstraint, you called
FindFKPeriodOpersAndProcs,
but never used the computed outputs: periodoperoid, periodprocoid, opclasses.
We validate these(periodoperoid, periodprocoid) at
lookupTRIOperAndProc, FindFKPeriodOpersAndProcs.
I'm not sure whether FindFKPeriodOpersAndProcs in
ATAddForeignKeyConstraint is necessary.
+ * Check if all key values in OLD and NEW are "equivalent":
+ * For normal FKs we check for equality.
+ * For temporal FKs we check that the PK side is a superset of its old value,
+ * or the FK side is a subset.
"or the FK side is a subset." is misleading, should it be something
like "or the FK side is a subset of X"?
+ if (indexStruct->indisexclusion) return i - 1;
+ else return i;
I believe our style should be (with proper indent)
if (indexStruct->indisexclusion)
return i - 1;
else
return i;
in transformFkeyCheckAttrs
+ if (found && is_temporal)
+ {
+ found = false;
+ for (j = 0; j < numattrs + 1; j++)
+ {
+ if (periodattnum == indexStruct->indkey.values[j])
+ {
+ opclasses[numattrs] = indclass->values[j];
+ found = true;
+ break;
+ }
+ }
+ }
can be simplified:
{
found = false;
if (periodattnum == indexStruct->indkey.values[numattrs])
{
opclasses[numattrs] = indclass->values[numattrs];
found = true;
}
}
Also wondering, at the end of the function transformFkeyCheckAttrs `if
(!found)` part:
do we need another error message handle is_temporal is true?
@@ -212,8 +213,11 @@ typedef struct NewConstraint
ConstrType contype; /* CHECK or FOREIGN */
Oid refrelid; /* PK rel, if FOREIGN */
Oid refindid; /* OID of PK's index, if FOREIGN */
+ bool conwithperiod; /* Whether the new FOREIGN KEY uses PERIOD */
Oid conid; /* OID of pg_constraint entry, if FOREIGN */
Node *qual; /* Check expr or CONSTR_FOREIGN Constraint */
+ Oid *operoids; /* oper oids for FOREIGN KEY with PERIOD */
+ Oid *procoids; /* proc oids for FOREIGN KEY with PERIOD */
ExprState *qualstate; /* Execution state for CHECK expr */
} NewConstraint;
primary key can only one WITHOUT OVERLAPS,
so *operoids and *procoids
can be replaced with just
`operoids, procoids`.
Also these two elements in struct NewConstraint not used in v24, 0002, 0003.
>
> I fixed your tests, some of your tests can be simplified, (mainly
> primary key constraint is unnecessary for the failed tests)
> also your foreign key patch test table, temporal_rng is created at
> line 141, and we use it at around line 320.
> it's hard to get the definition of temporal_rng. I drop the table
> and recreate it.
> So people can view the patch with tests more easily.
>
I've attached a new patch that further simplified the tests. (scope
v24 patch's 0002 and 0003)
Please ignore previous email attachments.
I've only applied the v24, 0002, 0003.
seems in doc/src/sgml/ref/create_table.sgml
lack the explanation of `<replaceable
class="parameter">temporal_interval</replaceable>`
since foreign key ON {UPDATE | DELETE} {CASCADE,SET NULL,SET DEFAULT}
not yet supported,
v24-0003 create_table.sgml should reflect that.
+ /*
+ * For FKs with PERIOD we need an operator and aggregate function
+ * to check whether the referencing row's range is contained
+ * by the aggregated ranges of the referenced row(s).
+ * For rangetypes this is fk.periodatt <@ range_agg(pk.periodatt).
+ * FKs will look these up at "runtime", but we should make sure
+ * the lookup works here.
+ */
+ if (is_temporal)
+ FindFKPeriodOpersAndProcs(opclasses[numpks - 1], &periodoperoid,
&periodprocoid);
within the function ATAddForeignKeyConstraint, you called
FindFKPeriodOpersAndProcs,
but never used the computed outputs: periodoperoid, periodprocoid, opclasses.
We validate these(periodoperoid, periodprocoid) at
lookupTRIOperAndProc, FindFKPeriodOpersAndProcs.
I'm not sure whether FindFKPeriodOpersAndProcs in
ATAddForeignKeyConstraint is necessary.
+ * Check if all key values in OLD and NEW are "equivalent":
+ * For normal FKs we check for equality.
+ * For temporal FKs we check that the PK side is a superset of its old value,
+ * or the FK side is a subset.
"or the FK side is a subset." is misleading, should it be something
like "or the FK side is a subset of X"?
+ if (indexStruct->indisexclusion) return i - 1;
+ else return i;
I believe our style should be (with proper indent)
if (indexStruct->indisexclusion)
return i - 1;
else
return i;
in transformFkeyCheckAttrs
+ if (found && is_temporal)
+ {
+ found = false;
+ for (j = 0; j < numattrs + 1; j++)
+ {
+ if (periodattnum == indexStruct->indkey.values[j])
+ {
+ opclasses[numattrs] = indclass->values[j];
+ found = true;
+ break;
+ }
+ }
+ }
can be simplified:
{
found = false;
if (periodattnum == indexStruct->indkey.values[numattrs])
{
opclasses[numattrs] = indclass->values[numattrs];
found = true;
}
}
Also wondering, at the end of the function transformFkeyCheckAttrs `if
(!found)` part:
do we need another error message handle is_temporal is true?
@@ -212,8 +213,11 @@ typedef struct NewConstraint
ConstrType contype; /* CHECK or FOREIGN */
Oid refrelid; /* PK rel, if FOREIGN */
Oid refindid; /* OID of PK's index, if FOREIGN */
+ bool conwithperiod; /* Whether the new FOREIGN KEY uses PERIOD */
Oid conid; /* OID of pg_constraint entry, if FOREIGN */
Node *qual; /* Check expr or CONSTR_FOREIGN Constraint */
+ Oid *operoids; /* oper oids for FOREIGN KEY with PERIOD */
+ Oid *procoids; /* proc oids for FOREIGN KEY with PERIOD */
ExprState *qualstate; /* Execution state for CHECK expr */
} NewConstraint;
primary key can only one WITHOUT OVERLAPS,
so *operoids and *procoids
can be replaced with just
`operoids, procoids`.
Also these two elements in struct NewConstraint not used in v24, 0002, 0003.
Attachment
I have done a review of the temporal foreign key patches in this patch series (0002 and 0003, v24). The patch set needs a rebase across c85977d8fef. I was able to do it manually, but it's a bit tricky, so perhaps you can post a new set to help future reviews. (Also, the last (0007) patch has some compiler warnings and also causes the pg_upgrade test to fail. I didn't check this further, but that's why the cfbot is all red.) In summary, in principle, this all looks more or less correct to me. As a general comment, we need to figure out the right terminology "period" vs. "temporal", especially if we are going to commit these features incrementally. But I didn't look at this too hard here yet. * v24-0002-Add-GiST-referencedagg-support-func.patch Do we really need this level of generality? Are there examples not using ranges that would need a different aggregate function? Maybe something with geometry (points and lines)? But it seems to me that then we'd also need some equivalent to "without portion" support for those types and a multirange equivalent (basically another gist support function wrapped around the 0004 patch). * v24-0003-Add-temporal-FOREIGN-KEYs.patch - contrib/btree_gist/expected/without_overlaps.out - contrib/btree_gist/sql/without_overlaps.sql typo "exusts" - doc/src/sgml/ref/create_table.sgml This mentions FOR PORTION OF from a later patch. It is not documented that SET NULL and SET DEFAULT are not supported, even though that is added in a later patch. (So this patch should say that it's not supported, and then the later patch should remove that.) - src/backend/commands/indexcmds.c The changes to GetOperatorFromWellKnownStrategy() don't work for message translations. We had discussed a similar issue for this function previously. I think it's ok to leave the function as it was. The additional context could be added with location pointers or errcontext() maybe, but it doesn't seem that important for now. - src/backend/commands/tablecmds.c The changes in ATAddForeignKeyConstraint(), which are the meat of the changes in this file, are very difficult to review in detail. I tried different git-diff options to get a sensible view, but it wasn't helpful. Do we need to do some separate refactoring here first? The error message "action not supported for temporal foreign keys" could be more detailed, mention the action. Look for example how the error for the generated columns is phrased. (But note that for generated columns, the actions are impossible to support, whereas here it is just something not done yet. So there should probably still be different error codes.) - src/backend/nodes/outfuncs.c - src/backend/nodes/readfuncs.c Perhaps you would like to review my patch 0001 in <https://www.postgresql.org/message-id/859d6155-e361-4a05-8db3-4aa1f007ff28@eisentraut.org>, which removes the custom out/read functions for the Constraint node. Then you could get rid of these changes. - src/backend/utils/adt/ri_triggers.c The added #include "catalog/pg_range.h" doesn't appear to be used for anything. Maybe we can avoid the added #include "commands/tablecmds.h" by putting the common function in some appropriate lower-level module. typo "PEROID" Renaming of ri_KeysEqual() to ri_KeysStable() doesn't improve clarity, I think. I think we can leave the old name and add a comment (as you have done). There is a general understanding around this feature set that "equal" sometimes means "contained" or something like that. The function ri_RangeAttributeNeedsCheck() could be documented better. It's bit terse and unclear. From the code, it looks like it is used instead of row equality checks. Maybe a different function name would be suitable. Various unnecessary reformatting in RI_FKey_check(). When assembling the SQL commands, you need to be very careful about fully quoting and schema-qualifying everything. See for example ri_GenerateQual(). Have you checked that the generated queries can use indexes and have suitable performance? Do you have example execution plans maybe? - src/backend/utils/adt/ruleutils.c This seems ok in principle, but it's kind of weird that the new argument of decompile_column_index_array() is called "withPeriod" (which seems appropriate seeing what it does), but what we are passing in is conwithoutoverlaps. Maybe we need to reconsider the naming of the constraint column? Sorry, I made you change it from "contemporal" or something, didn't I? Maybe "conperiod" would cover both meanings better? - src/backend/utils/cache/lsyscache.c get_func_name_and_namespace(): This function would at least need some identifier quoting. There is only one caller (lookupTRIOperAndProc), so let's just put this code inline there; it's not worth a separate global function. (Also, you could use psprintf() here to simplify palloc() + snprintf().) - src/include/catalog/pg_constraint.h You are changing in several comments "equality" to "comparison". I suspect you effectively mean "equality or containment"? Maybe "comparison" is too subtle to convey that meaning? Maybe be more explicit. You are changing a foreign key from DECLARE_ARRAY_FOREIGN_KEY to DECLARE_ARRAY_FOREIGN_KEY_OPT. Add a comment about it, like the one just above has. - src/include/catalog/pg_proc.dat For the names of the trigger functions, maybe instead of TRI_FKey_check_ins something like RI_FKey_period_check_ins so that all RI trigger functions group under a common prefix. On second thought, do we even need separate functions for this? Looking at ri_triggers.c, the temporal and non-temporal functions are the same, and all the differences are handled in the underlying implementation functions. - src/include/nodes/parsenodes.h The constants FKCONSTR_PERIOD_OP_CONTAINED_BY and FKCONSTR_PERIOD_PROC_REFERENCED_AGG could use more documentation here. For the Constraint struct, don't we just need a bool field saying "this is a period FK", and then we'd know that the last column is the period? Like we did for the primary keys (bool without_overlaps). - src/include/parser/kwlist.h For this patch, the keyword PERIOD can be unreserved. But it apparently will need to be reserved later for the patch that introduces PERIOD columns. Maybe it would make sense to leave it unreserved for this patch and upgrade it in the later one.
Hi more minor issues. + FindFKComparisonOperators( + fkconstraint, tab, i, fkattnum, + &old_check_ok, &old_pfeqop_item, + pktypoid[i], fktypoid[i], opclasses[i], + is_temporal, false, + &pfeqoperators[i], &ppeqoperators[i], &ffeqoperators[i]); + } + if (is_temporal) { + pkattnum[numpks] = pkperiodattnum; + pktypoid[numpks] = pkperiodtypoid; + fkattnum[numpks] = fkperiodattnum; + fktypoid[numpks] = fkperiodtypoid; - pfeqop = get_opfamily_member(opfamily, opcintype, fktyped, - eqstrategy); - if (OidIsValid(pfeqop)) - { - pfeqop_right = fktyped; - ffeqop = get_opfamily_member(opfamily, fktyped, fktyped, - eqstrategy); - } - else - { - /* keep compiler quiet */ - pfeqop_right = InvalidOid; - ffeqop = InvalidOid; - } + FindFKComparisonOperators( + fkconstraint, tab, numpks, fkattnum, + &old_check_ok, &old_pfeqop_item, + pkperiodtypoid, fkperiodtypoid, opclasses[numpks], + is_temporal, true, + &pfeqoperators[numpks], &ppeqoperators[numpks], &ffeqoperators[numpks]); + numfks += 1; + numpks += 1; + } opening curly brace should be the next line, also do you think it's good idea to add following in the `if (is_temporal)` branch ` Assert(OidIsValid(fkperiodtypoid) && OidIsValid(pkperiodtypoid)); Assert(OidIsValid(pkperiodattnum > 0 && fkperiodattnum > 0)); ` ` if (is_temporal)` branch, you can set the FindFKComparisonOperators 10th argument (is_temporal) to true, since you are already in the ` if (is_temporal)` branch. maybe we need some extra comments on ` + numfks += 1; + numpks += 1; ` since it might not be that evident? Do you think it's a good idea to list arguments line by line (with good indentation) is good format? like: FindFKComparisonOperators(fkconstraint, tab, i, fkattnum, &old_check_ok, &old_pfeqop_item, pktypoid[i], fktypoid[i], opclasses[i], false, false, &pfeqoperators[i], &ppeqoperators[i], &ffeqoperators[i]);
Hello, Here is another patch series for application time. It addresses the feedback from the last few emails. Details below: On 1/28/24 16:00, jian he wrote: > + <para> > + In a temporal foreign key, the delete/update will use > + <literal>FOR PORTION OF</literal> semantics to constrain the > + effect to the bounds being deleted/updated in the referenced row. > + </para> > > in v24-0003-Add-temporal-FOREIGN-KEYs.patch > <literal>FOR PORTION OF</literal> not yet implemented, so we should > not mention it. Fixed. > + <para> > + If the last column is marked with <literal>PERIOD</literal>, > + it must be a period or range column, and the referenced table > + must have a temporal primary key. > can we change "it must be a period or range column" to "it must be a > range column", maybe we can add it on another patch. Rewrote this section to be clearer. On 2/1/24 21:53, jian he wrote: > I've attached a new patch that further simplified the tests. (scope > v24 patch's 0002 and 0003) > Please ignore previous email attachments. Thanks, I've pulled in most of these changes to the tests. > I've only applied the v24, 0002, 0003. > seems in doc/src/sgml/ref/create_table.sgml > lack the explanation of `<replaceable > class="parameter">temporal_interval</replaceable>` You're right. Actually I think it is clearer without adding a separate name here, so I've updated the docs to use `column_name | period_name`. > since foreign key ON {UPDATE | DELETE} {CASCADE,SET NULL,SET DEFAULT} > not yet supported, > v24-0003 create_table.sgml should reflect that. Updated. > within the function ATAddForeignKeyConstraint, you called > FindFKPeriodOpersAndProcs, > but never used the computed outputs: periodoperoid, periodprocoid, opclasses. > We validate these(periodoperoid, periodprocoid) at > lookupTRIOperAndProc, FindFKPeriodOpersAndProcs. > I'm not sure whether FindFKPeriodOpersAndProcs in > ATAddForeignKeyConstraint is necessary. This is explained in the comment above: we will do the same lookup when the foreign key is checked, but we should make sure it works now so we can report the problem to the user. > + * Check if all key values in OLD and NEW are "equivalent": > + * For normal FKs we check for equality. > + * For temporal FKs we check that the PK side is a superset of its old value, > + * or the FK side is a subset. > "or the FK side is a subset." is misleading, should it be something > like "or the FK side is a subset of X"? Okay, changed. > + if (indexStruct->indisexclusion) return i - 1; > + else return i; > > I believe our style should be (with proper indent) > if (indexStruct->indisexclusion) > return i - 1; > else > return i; Fixed. > in transformFkeyCheckAttrs > + if (found && is_temporal) > + { > + found = false; > + for (j = 0; j < numattrs + 1; j++) > + { > + if (periodattnum == indexStruct->indkey.values[j]) > + { > + opclasses[numattrs] = indclass->values[j]; > + found = true; > + break; > + } > + } > + } > > can be simplified: > { > found = false; > if (periodattnum == indexStruct->indkey.values[numattrs]) > { > opclasses[numattrs] = indclass->values[numattrs]; > found = true; > } > } Changed. > Also wondering, at the end of the function transformFkeyCheckAttrs `if > (!found)` part: > do we need another error message handle is_temporal is true? I think the existing error message works well for both temporal and non-temporal cases. > @@ -212,8 +213,11 @@ typedef struct NewConstraint > ConstrType contype; /* CHECK or FOREIGN */ > Oid refrelid; /* PK rel, if FOREIGN */ > Oid refindid; /* OID of PK's index, if FOREIGN */ > + bool conwithperiod; /* Whether the new FOREIGN KEY uses PERIOD */ > Oid conid; /* OID of pg_constraint entry, if FOREIGN */ > Node *qual; /* Check expr or CONSTR_FOREIGN Constraint */ > + Oid *operoids; /* oper oids for FOREIGN KEY with PERIOD */ > + Oid *procoids; /* proc oids for FOREIGN KEY with PERIOD */ > ExprState *qualstate; /* Execution state for CHECK expr */ > } NewConstraint; > primary key can only one WITHOUT OVERLAPS, > so *operoids and *procoids > can be replaced with just > `operoids, procoids`. > Also these two elements in struct NewConstraint not used in v24, 0002, 0003. I've removed these entirely. Sorry, they were leftover from an earlier revision. On 2/12/24 01:55, Peter Eisentraut wrote: > (Also, the last (0007) patch has some compiler warnings and also > causes the pg_upgrade test to fail. I didn't check this further, but > that's why the cfbot is all red.) Fixed the pg_upgrade problem. I'm not seeing compiler warnings. If they still exist can you point me to those? > As a general comment, we need to figure out the right terminology > "period" vs. "temporal", especially if we are going to commit these > features incrementally. But I didn't look at this too hard here yet. Agreed. I think it is okay to use "temporal" in the docs for the feature in general, if we clarify that non-temporal values are also supported. That is what the rest of the world calls this kind of thing. The word "period" is confusing because it can be the `PERIOD` keyword used in temporal FKs, or also the SQL:2011 `PERIOD` object that is like our range types. And then we also have ranges, etc. In the past I was using "interval" to mean "range or PERIOD" (and "interval" is used by Date in his temporal book), but perhaps that is too idiosyncratic. I've removed "interval" from the FK docs, and instead I've tried to be very explicit and avoid ambiguity. (I haven't given as much attention to cleaning up the later patches' docs yet.) > * v24-0002-Add-GiST-referencedagg-support-func.patch > > Do we really need this level of generality? Are there examples not > using ranges that would need a different aggregate function? Maybe > something with geometry (points and lines)? But it seems to me that > then we'd also need some equivalent to "without portion" support for > those types and a multirange equivalent (basically another gist > support function wrapped around the 0004 patch). I'm not sure how else to do it. The issue is that `range_agg` returns a multirange, so the result type doesn't match the inputs. But other types will likely have the same problem: to combine boxes you may need a multibox. The combine mdranges you may need a multimdrange. I agree we need something to support "without portion" too. The patches here give implementations for ranges and multiranges. But that is for `FOR PORTION OF`, so it comes after the foreign key patches (part 5 here). Btw that part changed a bit since v24 because as jian he pointed out, our type system doesn't support anyrange inputs and an anyrange[] output. So I changed the support funcs to use SETOF. I could alternately add anyrangearray and anymultirangearray pseudotypes. It's not the first time I've wanted those, so I'd be happy to go that way if folks are open to it. It seems like it should be a totally separate patch though. > * v24-0003-Add-temporal-FOREIGN-KEYs.patch > > - contrib/btree_gist/expected/without_overlaps.out > - contrib/btree_gist/sql/without_overlaps.sql > > typo "exusts" Fixed. > - doc/src/sgml/ref/create_table.sgml > > This mentions FOR PORTION OF from a later patch. > > It is not documented that SET NULL and SET DEFAULT are not supported, > even though that is added in a later patch. (So this patch should say > that it's not supported, and then the later patch should remove that.) All fixed. > - src/backend/commands/indexcmds.c > > The changes to GetOperatorFromWellKnownStrategy() don't work for > message translations. We had discussed a similar issue for this > function previously. I think it's ok to leave the function as it was. > The additional context could be added with location pointers or > errcontext() maybe, but it doesn't seem that important for now. Okay I've tried a different approach here that should fit better with t9n. Let me know if it still needs work. > - src/backend/commands/tablecmds.c > > The changes in ATAddForeignKeyConstraint(), which are the meat of the > changes in this file, are very difficult to review in detail. I tried > different git-diff options to get a sensible view, but it wasn't > helpful. Do we need to do some separate refactoring here first? I moved the FindFKComparisonOperators refactor into a separate patch, and that seems to confuse git less. Your suggestion to group the PERIOD attribute with the others (below) also helped a lot to cut down the diff here. In fact it means I only call FindFKComparisonOperators once, so pulling it into a separate method is not even necessary anymore. But I do think it helps simplify what's already a very long function, so I've left it in. Let me know if more work is needed here. > The error message "action not supported for temporal foreign keys" > could be more detailed, mention the action. Look for example how the > error for the generated columns is phrased. (But note that for > generated columns, the actions are impossible to support, whereas here > it is just something not done yet. So there should probably still be > different error codes.) Fixed. > - src/backend/nodes/outfuncs.c > - src/backend/nodes/readfuncs.c > > Perhaps you would like to review my patch 0001 in > <https://www.postgresql.org/message-id/859d6155-e361-4a05-8db3-4aa1f007ff28@eisentraut.org>, > which removes the custom out/read functions for the Constraint node. > Then you could get rid of these changes. That is a nice improvement! > - src/backend/utils/adt/ri_triggers.c > > The added #include "catalog/pg_range.h" doesn't appear to be used for > anything. Removed. > Maybe we can avoid the added #include "commands/tablecmds.h" by > putting the common function in some appropriate lower-level module. Moved to pg_constraint.{c,h}. > typo "PEROID" Fixed. > Renaming of ri_KeysEqual() to ri_KeysStable() doesn't improve clarity, > I think. I think we can leave the old name and add a comment (as you > have done). There is a general understanding around this feature set > that "equal" sometimes means "contained" or something like that. Okay. > The function ri_RangeAttributeNeedsCheck() could be documented better. > It's bit terse and unclear. From the code, it looks like it is used > instead of row equality checks. Maybe a different function name would > be suitable. I realized I could simplify this a lot and reuse ri_AttributesEqual, so the whole method is gone now. > Various unnecessary reformatting in RI_FKey_check(). Fixed, sorry about that. > When assembling the SQL commands, you need to be very careful about > fully quoting and schema-qualifying everything. See for example > ri_GenerateQual(). Went through everything and added quoting & schemes to a few places that were missing it. > Have you checked that the generated queries can use indexes and have > suitable performance? Do you have example execution plans maybe? The plans look good to me. Here are some tests: -- test when inserting/updating the FK side: regression=# explain analyze select 1 from ( select valid_at as r from only temporal_rng x where id = '[8,8]' and valid_at && '[2010-01-01,2012-01-01)' for key share of x ) x1 having '[2010-01-01,2012-01-01)'::tsrange <@ range_agg(x1.r); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=8.19..8.20 rows=1 width=4) (actual time=0.165..0.167 rows=0 loops=1) Filter: ('["2010-01-01 00:00:00","2012-01-01 00:00:00")'::tsrange <@ range_agg(x1.r)) Rows Removed by Filter: 1 -> Subquery Scan on x1 (cost=0.14..8.18 rows=1 width=32) (actual time=0.152..0.153 rows=0 loops=1) -> LockRows (cost=0.14..8.17 rows=1 width=38) (actual time=0.151..0.151 rows=0 loops=1) -> Index Scan using temporal_rng_pk on temporal_rng x (cost=0.14..8.16 rows=1 width=38) (actual time=0.150..0.150 rows=0 loops=1) Index Cond: ((id = '[8,9)'::int4range) AND (valid_at && '["2010-01-01 00:00:00","2012-01-01 00:00:00")'::tsrange)) Planning Time: 0.369 ms Execution Time: 0.289 ms (9 rows) -- test when deleting/updating from the PK side: regression=# explain analyze select 1 from only temporal_rng x where id = '[8,8]' and valid_at && '[2010-01-01,2012-01-01)' for key share of x; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- LockRows (cost=0.14..8.17 rows=1 width=10) (actual time=0.079..0.079 rows=0 loops=1) -> Index Scan using temporal_rng_pk on temporal_rng x (cost=0.14..8.16 rows=1 width=10) (actual time=0.078..0.078 rows=0 loops=1) Index Cond: ((id = '[8,9)'::int4range) AND (valid_at && '["2010-01-01 00:00:00","2012-01-01 00:00:00")'::tsrange)) Planning Time: 0.249 ms Execution Time: 0.123 ms (5 rows) I will do some further tests with more rows, but I haven't yet. > - src/backend/utils/adt/ruleutils.c > > This seems ok in principle, but it's kind of weird that the new > argument of decompile_column_index_array() is called "withPeriod" > (which seems appropriate seeing what it does), but what we are passing > in is conwithoutoverlaps. Maybe we need to reconsider the naming of > the constraint column? Sorry, I made you change it from "contemporal" > or something, didn't I? Maybe "conperiod" would cover both meanings > better? Certainly conperiod is easier to read. Since we are using it for PK/UNIQUE/FKs, conperiod also seems like a better match. FKs don't use WITHOUT OVERLAPS syntax, and OTOH PK/UNIQUEs will still accept a PERIOD (eventually, also a range/etc now). I've renamed it, but since the old name was already committed with the PK patch, I've broken the renaming into a separate patch that could be committed without anything else. > - src/backend/utils/cache/lsyscache.c > > get_func_name_and_namespace(): This function would at least need some > identifier quoting. There is only one caller (lookupTRIOperAndProc), > so let's just put this code inline there; it's not worth a separate > global function. (Also, you could use psprintf() here to simplify > palloc() + snprintf().) Removed. > - src/include/catalog/pg_constraint.h > > You are changing in several comments "equality" to "comparison". I > suspect you effectively mean "equality or containment"? Maybe > "comparison" is too subtle to convey that meaning? Maybe be more > explicit. Okay, changed. > You are changing a foreign key from DECLARE_ARRAY_FOREIGN_KEY to > DECLARE_ARRAY_FOREIGN_KEY_OPT. Add a comment about it, like the one > just above has. I don't need this change at all now that we're using GENERATED columns for PERIODs, so I've taken it out. > - src/include/catalog/pg_proc.dat > > For the names of the trigger functions, maybe instead of > > TRI_FKey_check_ins > > something like > > RI_FKey_period_check_ins > > so that all RI trigger functions group under a common prefix. Renamed. > On second thought, do we even need separate functions for this? > Looking at ri_triggers.c, the temporal and non-temporal functions are > the same, and all the differences are handled in the underlying > implementation functions. My thinking was to avoid making the non-temporal functions suffer in performance and complexity. What do you think? I've kept the separate functions here but I can combine them if you like. > - src/include/nodes/parsenodes.h > > The constants FKCONSTR_PERIOD_OP_CONTAINED_BY and > FKCONSTR_PERIOD_PROC_REFERENCED_AGG could use more documentation here. Removed. They are obsolete now (and were already in v24---sorry!). > For the Constraint struct, don't we just need a bool field saying > "this is a period FK", and then we'd know that the last column is the > period? Like we did for the primary keys (bool without_overlaps). Okay, changed. Also in ATExecAddConstraint we can treat the PERIOD element like any other FK element, which simplifies the changes there a lot. > - src/include/parser/kwlist.h > > For this patch, the keyword PERIOD can be unreserved. But it > apparently will need to be reserved later for the patch that > introduces PERIOD columns. Maybe it would make sense to leave it > unreserved for this patch and upgrade it in the later one. I tried doing this but got a shift/reduce conflict, so it's still reserved here. Thanks, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v25-0001-Rename-conwithoutoverlaps-to-conperiod.patch
- v25-0002-Add-GiST-referencedagg-support-func.patch
- v25-0003-Refactor-FK-operator-lookup.patch
- v25-0004-Add-temporal-FOREIGN-KEYs.patch
- v25-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v25-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v25-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v25-0008-Add-PERIODs.patch
On 2/13/24 21:00, jian he wrote: > Hi > more minor issues. > > + FindFKComparisonOperators( > + fkconstraint, tab, i, fkattnum, > + &old_check_ok, &old_pfeqop_item, > + pktypoid[i], fktypoid[i], opclasses[i], > + is_temporal, false, > + &pfeqoperators[i], &ppeqoperators[i], &ffeqoperators[i]); > + } > + if (is_temporal) { > + pkattnum[numpks] = pkperiodattnum; > + pktypoid[numpks] = pkperiodtypoid; > + fkattnum[numpks] = fkperiodattnum; > + fktypoid[numpks] = fkperiodtypoid; > > - pfeqop = get_opfamily_member(opfamily, opcintype, fktyped, > - eqstrategy); > - if (OidIsValid(pfeqop)) > - { > - pfeqop_right = fktyped; > - ffeqop = get_opfamily_member(opfamily, fktyped, fktyped, > - eqstrategy); > - } > - else > - { > - /* keep compiler quiet */ > - pfeqop_right = InvalidOid; > - ffeqop = InvalidOid; > - } > + FindFKComparisonOperators( > + fkconstraint, tab, numpks, fkattnum, > + &old_check_ok, &old_pfeqop_item, > + pkperiodtypoid, fkperiodtypoid, opclasses[numpks], > + is_temporal, true, > + &pfeqoperators[numpks], &ppeqoperators[numpks], &ffeqoperators[numpks]); > + numfks += 1; > + numpks += 1; > + } > > opening curly brace should be the next line, Fixed in v25 (submitted in my other email). > also do you think it's > good idea to add following in the `if (is_temporal)` branch > ` > Assert(OidIsValid(fkperiodtypoid) && OidIsValid(pkperiodtypoid)); > Assert(OidIsValid(pkperiodattnum > 0 && fkperiodattnum > 0)); > ` > > ` if (is_temporal)` branch, you can set the FindFKComparisonOperators > 10th argument (is_temporal) > to true, since you are already in the ` if (is_temporal)` branch. > > maybe we need some extra comments on > ` > + numfks += 1; > + numpks += 1; > ` > since it might not be that evident? That branch doesn't exist anymore. Same with the increments. > Do you think it's a good idea to list arguments line by line (with > good indentation) is good format? like: > FindFKComparisonOperators(fkconstraint, > tab, > i, > fkattnum, > &old_check_ok, > &old_pfeqop_item, > pktypoid[i], > fktypoid[i], > opclasses[i], > false, > false, > &pfeqoperators[i], > &ppeqoperators[i], > &ffeqoperators[i]); There are places we do that, but most code I've seen tries to fill the line. I haven't followed that strictly here, but I'm trying to get better at doing what pg_indent wants. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 2/29/24 13:16, Paul Jungwirth wrote: > Hello, > > Here is another patch series for application time. Here is a v26 patch series to fix a cfbot failure in sepgsql. Rebased to 655dc31046. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v26-0001-Rename-conwithoutoverlaps-to-conperiod.patch
- v26-0002-Add-GiST-referencedagg-support-func.patch
- v26-0003-Refactor-FK-operator-lookup.patch
- v26-0004-Add-temporal-FOREIGN-KEYs.patch
- v26-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v26-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v26-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v26-0008-Add-PERIODs.patch
On 3/1/24 12:38, Paul Jungwirth wrote: > On 2/29/24 13:16, Paul Jungwirth wrote: > Here is a v26 patch series to fix a cfbot failure in sepgsql. Rebased to 655dc31046. v27 attached, fixing some cfbot failures from headerscheck+cpluspluscheck. Sorry for the noise! Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch
- v27-0002-Add-GiST-referencedagg-support-func.patch
- v27-0003-Refactor-FK-operator-lookup.patch
- v27-0004-Add-temporal-FOREIGN-KEYs.patch
- v27-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v27-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v27-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v27-0008-Add-PERIODs.patch
On 01.03.24 22:56, Paul Jungwirth wrote: > On 3/1/24 12:38, Paul Jungwirth wrote: >> On 2/29/24 13:16, Paul Jungwirth wrote: >> Here is a v26 patch series to fix a cfbot failure in sepgsql. Rebased >> to 655dc31046. > > v27 attached, fixing some cfbot failures from > headerscheck+cpluspluscheck. Sorry for the noise! I had committed v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch a little while ago. I have reviewed v27-0002 through 0004 now. I have one semantic question below, and there are a few places where more clarification of the interfaces could help. Other than that, I think this is pretty good. Attached is a small patch that changes the PERIOD keyword to unreserved for this patch. You had said earlier that this didn't work for you. The attached patch works for me when applied on top of 0003. * v27-0002-Add-GiST-referencedagg-support-func.patch You wrote: > I'm not sure how else to do it. The issue is that `range_agg` returns > a multirange, so the result > type doesn't match the inputs. But other types will likely have the > same problem: to combine boxes > you may need a multibox. The combine mdranges you may need a > multimdrange. Can we just hardcode the use of range_agg for this release? Might be easier. I don't see all this generality being useful in the near future. > Btw that part changed a bit since v24 because as jian he pointed out, > our type system doesn't > support anyrange inputs and an anyrange[] output. So I changed the > support funcs to use SETOF. I didn't see any SETOF stuff in the patch, or I didn't know where to look. I'm not sure I follow all the details here. So more explanations of any kind could be helpful. * v27-0003-Refactor-FK-operator-lookup.patch I suggest to skip this refactoring patch. I don't think the way this is sliced up is all that great, and it doesn't actually help with the subsequent patches. * v27-0004-Add-temporal-FOREIGN-KEYs.patch - src/backend/catalog/pg_constraint.c FindFKPeriodOpersAndProcs() could use a bit more top-level documentation. Where does the input opclass come from? What are the three output values? What is the business with "symmetric types"? - src/backend/commands/indexcmds.c GetOperatorFromWellKnownStrategy() is apparently changed to accept InvalidOid for rhstype, but the meaning of this is not explained in the function header. It's also not clear to me why an existing caller is changed. This should be explained more thoroughly. - src/backend/commands/tablecmds.c is_temporal and similar should be renamed to with_period or similar throughout this patch. In transformFkeyGetPrimaryKey(): * Now build the list of PK attributes from the indkey definition (we - * assume a primary key cannot have expressional elements) + * assume a primary key cannot have expressional elements, unless it + * has a PERIOD) I think the original statement is still true even with PERIOD. The expressional elements refer to expression indexes. I don't think we can have a PERIOD marker on an expression? - src/backend/utils/adt/ri_triggers.c Please remove the separate trigger functions for the period case. They are the same as the non-period ones, so we don't need separate ones. The difference is handled lower in the call stack, which I think is a good setup. Removing the separate functions also removes a lot of extra code in other parts of the patch. - src/include/catalog/pg_constraint.h Should also update catalogs.sgml accordingly. - src/test/regress/expected/without_overlaps.out - src/test/regress/sql/without_overlaps.sql A few general comments on the tests: - In the INSERT commands, specify the column names explicitly. This makes the tests easier to read (especially since the column order between the PK and the FK table is sometimes different). - Let's try to make it so that the inserted literals match the values shown in the various error messages, so it's easier to match them up. So, change the int4range literals to half-open notation. And also maybe change the date output format to ISO. - In various comments, instead of test FK "child", maybe use "referencing table"? Instead of "parent", use "referenced table" (or primary key table). When I read child and parent I was looking for inheritance. - Consider truncating the test tables before each major block of tests and refilling them with fresh data. So it's easier to eyeball the tests. Otherwise, there is too much dependency on what earlier tests left behind. A specific question: In this test, a PERIOD marker on the referenced site is automatically inferred from the primary key: +-- with inferred PK on the referenced table: +CREATE TABLE temporal_fk_rng2rng ( + id int4range, + valid_at tsrange, + parent_id int4range, + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) + REFERENCES temporal_rng +); In your patch, this succeeds. According to the SQL standard, it should not. In subclause 11.8, syntax rule 4b: """ Otherwise, the table descriptor of the referenced table shall include a unique constraint UC that specifies PRIMARY KEY. The table constraint descriptor of UC shall not include an application time period name. """ So this case is apparently explicitly ruled out. (It might be ok to make an extension here, but then we should be explicit about it.)
Attachment
+ <para> + If the last column is marked with <literal>PERIOD</literal>, + it is treated in a special way. + While the non-<literal>PERIOD</literal> columns are treated normally + (and there must be at least one of them), + the <literal>PERIOD</literal> column is not compared for equality. + Instead the constraint is considered satisfied + if the referenced table has matching records + (based on the non-<literal>PERIOD</literal> parts of the key) + whose combined <literal>PERIOD</literal> values completely cover + the referencing record's. + In other words, the reference must have a referent for its entire duration. + Normally this column would be a range or multirange type, + although any type whose GiST opclass has a "contained by" operator + and a <literal>referenced_agg</literal> support function is allowed. + (See <xref linkend="gist-extensibility"/>.) + In addition the referenced table must have a primary key + or unique constraint declared with <literal>WITHOUT PORTION</literal>. + </para> typo "referenced_agg", in the gist-extensibility.html page is "referencedagg" <literal>WITHOUT PORTION</literal> should be <literal>WITHOUT OVERLAPS</literal> + While the non-<literal>PERIOD</literal> columns are treated normally + (and there must be at least one of them), + the <literal>PERIOD</literal> column is not compared for equality. the above sentence didn't say what is "normally"? maybe we can do the following: + While the non-<literal>PERIOD</literal> columns are treated + normally for equality + (and there must be at least one of them), + the <literal>PERIOD</literal> column is not compared for equality. +<programlisting> +Datum +my_range_agg_transfn(PG_FUNCTION_ARGS) +{ + MemoryContext aggContext; + Oid rngtypoid; + ArrayBuildState *state; + + if (!AggCheckCallContext(fcinfo, &aggContext)) + elog(ERROR, "range_agg_transfn called in non-aggregate context"); + + rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1); + if (!type_is_range(rngtypoid)) + elog(ERROR, "range_agg must be called with a range"); + + if (PG_ARGISNULL(0)) + state = initArrayResult(rngtypoid, aggContext, false); + else + state = (ArrayBuildState *) PG_GETARG_POINTER(0); + + /* skip NULLs */ + if (!PG_ARGISNULL(1)) + accumArrayResult(state, PG_GETARG_DATUM(1), false, rngtypoid, aggContext); + + PG_RETURN_POINTER(state); +} + +Datum +my_range_agg_finalfn(PG_FUNCTION_ARGS) +{ + MemoryContext aggContext; + Oid mltrngtypoid; + TypeCacheEntry *typcache; + ArrayBuildState *state; + int32 range_count; + RangeType **ranges; + int i; + + if (!AggCheckCallContext(fcinfo, &aggContext)) + elog(ERROR, "range_agg_finalfn called in non-aggregate context"); + + state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0); + if (state == NULL) + /* This shouldn't be possible, but just in case.... */ + PG_RETURN_NULL(); + + /* Also return NULL if we had zero inputs, like other aggregates */ + range_count = state->nelems; + if (range_count == 0) + PG_RETURN_NULL(); + + mltrngtypoid = get_fn_expr_rettype(fcinfo->flinfo); + typcache = multirange_get_typcache(fcinfo, mltrngtypoid); + + ranges = palloc0(range_count * sizeof(RangeType *)); + for (i = 0; i < range_count; i++) + ranges[i] = DatumGetRangeTypeP(state->dvalues[i]); + + PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid, typcache->rngtype, range_count, ranges)); +} my_range_agg_transfn error message is inconsistent? `elog(ERROR, "range_agg_transfn called in non-aggregate context");` `elog(ERROR, "range_agg must be called with a range");` maybe just `my_range_agg_transfn`, instead of mention {range_agg_transfn|range_agg} similarly my_range_agg_finalfn error is also inconsistent. my_range_agg_finalfn need `type_is_multirange(mltrngtypoid)`?
On Mon, Mar 11, 2024 at 3:46 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > A few general comments on the tests: > > - In the INSERT commands, specify the column names explicitly. This > makes the tests easier to read (especially since the column order > between the PK and the FK table is sometimes different). > > - Let's try to make it so that the inserted literals match the values > shown in the various error messages, so it's easier to match them up. > So, change the int4range literals to half-open notation. And also maybe > change the date output format to ISO. > maybe just change the tsrange type to daterange, then the dot out file will be far less verbose. minor issues while reviewing v27, 0001 to 0004. transformFkeyGetPrimaryKey comments need to update, since bool pk_period also returned. +/* + * FindFKComparisonOperators - + * + * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut. + * Sets old_check_ok if we can avoid re-validating the constraint. + * Sets old_pfeqop_item to the old pfeqop values. + */ +static void +FindFKComparisonOperators(Constraint *fkconstraint, + AlteredTableInfo *tab, + int i, + int16 *fkattnum, + bool *old_check_ok, + ListCell **old_pfeqop_item, + Oid pktype, Oid fktype, Oid opclass, + Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut) I think the above comments is `Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`. + if (is_temporal) + { + if (!fkconstraint->fk_with_period) + ereport(ERROR, + (errcode(ERRCODE_INVALID_FOREIGN_KEY), + errmsg("foreign key uses PERIOD on the referenced table but not the referencing table"))); + } can be if (is_temporal && !fkconstraint->fk_with_period) ereport(ERROR, (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("foreign key uses PERIOD on the referenced table but not the referencing table"))); + + if (is_temporal) + { + if (!fkconstraint->pk_with_period) + /* Since we got pk_attrs, one should be a period. */ + ereport(ERROR, + (errcode(ERRCODE_INVALID_FOREIGN_KEY), + errmsg("foreign key uses PERIOD on the referencing table but not the referenced table"))); + } can be if (is_temporal && !fkconstraint->pk_with_period) /* Since we got pk_attrs, one should be a period. */ ereport(ERROR, (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("foreign key uses PERIOD on the referencing table but not the referenced table"))); refactor decompile_column_index_array seems unnecessary. Peter already mentioned it at [1], I have tried to fix it at [2]. @@ -12141,7 +12245,8 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid, /* * Now build the list of PK attributes from the indkey definition (we - * assume a primary key cannot have expressional elements) + * assume a primary key cannot have expressional elements, unless it + * has a PERIOD) */ *attnamelist = NIL; for (i = 0; i < indexStruct->indnkeyatts; i++) @@ -12155,6 +12260,8 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid *indexOid, makeString(pstrdup(NameStr(*attnumAttName(pkrel, pkattno))))); } + *pk_period = (indexStruct->indisexclusion); I don't understand the "expression elements" in the comments, most of the tests case is like ` PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ` + *pk_period = (indexStruct->indisexclusion); can be `+ *pk_period = indexStruct->indisexclusion;` [1] https://postgr.es/m/7be8724a-5c25-46d7-8325-1bd8be6fa523@eisentraut.org [2] https://postgr.es/m/CACJufxHVg65raNhG2zBwXgjrD6jqace4NZbePyMhP8-_Q=iT8w@mail.gmail.com
in GetOperatorFromWellKnownStrategy: *strat = GistTranslateStratnum(opclass, instrat); if (*strat == InvalidStrategy) { HeapTuple tuple; tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclass)); if (!HeapTupleIsValid(tuple)) elog(ERROR, "cache lookup failed for operator class %u", opclass); ereport(ERROR, errcode(ERRCODE_UNDEFINED_OBJECT), errmsg(errstr, format_type_be(opcintype)), errdetail("Could not translate strategy number %d for operator class \"%s\" for access method \"%s\".", instrat, NameStr(((Form_pg_opclass) GETSTRUCT(tuple))->opcname), "gist")); ReleaseSysCache(tuple); } last `ReleaseSysCache(tuple);` is unreachable? @@ -118,12 +120,17 @@ typedef struct RI_ConstraintInfo int16 confdelsetcols[RI_MAX_NUMKEYS]; /* attnums of cols to set on * delete */ char confmatchtype; /* foreign key's match type */ + bool temporal; /* if the foreign key is temporal */ int nkeys; /* number of key columns */ int16 pk_attnums[RI_MAX_NUMKEYS]; /* attnums of referenced cols */ int16 fk_attnums[RI_MAX_NUMKEYS]; /* attnums of referencing cols */ Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */ Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */ Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */ + Oid period_contained_by_oper; /* operator for PERIOD SQL */ + Oid agged_period_contained_by_oper; /* operator for PERIOD SQL */ + Oid period_referenced_agg_proc; /* proc for PERIOD SQL */ + Oid period_referenced_agg_rettype; /* rettype for previous */ the comment seems not clear to me. Here is my understanding about it: period_contained_by_oper is the operator where a single period/range contained by a single period/range. agged_period_contained_by_oper is the operator oid where a period contained by a bound of periods period_referenced_agg_proc is the oprcode of the agged_period_contained_by_oper. period_referenced_agg_rettype is the function period_referenced_agg_proc returning data type.
Hello, Here is a new patch series addressing the last few feedback emails from Peter & Jian He. It mostly focuses on the FKs patch, trying to get it really ready to commit, but it also finishes restoring all the functionality to the PERIODs patch (that I removed temporarily when we changed PERIODs to GENERATED columns). I still want to restore a few more tests there, but all the functionality is back (e.g. PERIODs with foreign keys and FOR PORTION OF), so it proves the GENERATED idea works in principle. Specific feedback below: On Mon, Mar 11, 2024 at 12:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: > I had committed v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch a > little while ago. Thanks! It looks like you also fixed the pg_catalog docs which I missed. > Attached is a small patch that changes the PERIOD keyword to unreserved > for this patch. You had said earlier that this didn't work for you. > The attached patch works for me when applied on top of 0003. Applied and included here. > You wrote: > > > I'm not sure how else to do it. The issue is that `range_agg` returns > > a multirange, so the result > > type doesn't match the inputs. But other types will likely have the > > same problem: to combine boxes > > you may need a multibox. The combine mdranges you may need a > > multimdrange. > > Can we just hardcode the use of range_agg for this release? Might be > easier. I don't see all this generality being useful in the near future. Okay, I've hard-coded range_agg in the main patch and separated the support for multirange/etc in the next two patches. But there isn't much code there (mostly tests and docs). Since we can't hard-code the *operators*, most of the infrastructure is already there not to hard-code the aggregate function. Supporting multiranges is already a nice improvement. E.g. it should cut down on disk usage when a record gets updated frequently. Supporting arbitrary types also seems very powerful, and we already do that for PKs. > > Btw that part changed a bit since v24 because as jian he pointed out, > > our type system doesn't > > support anyrange inputs and an anyrange[] output. So I changed the > > support funcs to use SETOF. > > I didn't see any SETOF stuff in the patch, or I didn't know where to look. > > I'm not sure I follow all the details here. So more explanations of any > kind could be helpful. This is talking about the FOR PORTION OF patch, not the FKs patch. It is the function that gives the "leftovers" after a temporal UPDATE/DELETE. There is explanation in the preliminary patch (adding the support function) and the actual FOR PORTION OF patch, but if you think they need more let me know. But I'd love to talk more about this here: The reason for using a SETOF function is because you can't return an anyarray from a function that takes anyrange or anymultirange. Or rather if you do, the array elements match the rangetype's bounds' type, not the rangetype itself: `T[] f(rangetype<T>)`, not `rangetype<T>[] f(rangetype<T>)`, and we need the latter. So to get a list of rangetype objects we do a SETOF function that is `anyrange f(anyrange)`. Personally I think an improvement would be to add a broken-out patch to add pseudotypes called anyrangearray and anymultirangearray, but using SETOF works now, and I don't know if anyone is interested in such a patch. But it's not the first time I've hit this shortcoming in the pg type system, so I think it's worthwhile. And since FOR PORTION OF isn't getting into v17, there is time to do it. What do you think? If it's an acceptable idea I will get started. It should be a separate commitfest entry I think. > * v27-0003-Refactor-FK-operator-lookup.patch > > I suggest to skip this refactoring patch. I don't think the way this is > sliced up is all that great, and it doesn't actually help with the > subsequent patches. Okay. > - src/backend/catalog/pg_constraint.c > > FindFKPeriodOpersAndProcs() could use a bit more top-level > documentation. Where does the input opclass come from? What are the > three output values? What is the business with "symmetric types"? Added and tried to clarify about the types. > - src/backend/commands/indexcmds.c > > GetOperatorFromWellKnownStrategy() is apparently changed to accept > InvalidOid for rhstype, but the meaning of this is not explained in > the function header. It's also not clear to me why an existing caller > is changed. This should be explained more thoroughly. It's not so much changing a param as removing one and adding another. The old param was unneeded because it's just the opclass's opcintype, and we're already passing the opclass. Then the new param lets you optionally ask for an operator that is not `opcintype op opcintype` but `opcintype op rhstype`. We need this because FKs compare fkattr <@ range_agg(pkattr)`, and range_agg returns a multirange, not a range. Even if we hard-code range_agg, the easiest way to get the operator is to use this function, passing ANYMULTIRANGEOID (but better is to pass whatever the referencedagg support func returns, as the now-separate multirange/custom type patch does). > - src/backend/commands/tablecmds.c > > is_temporal and similar should be renamed to with_period or similar > throughout this patch. Done. > In transformFkeyGetPrimaryKey(): > > * Now build the list of PK attributes from the indkey definition (we > - * assume a primary key cannot have expressional elements) > + * assume a primary key cannot have expressional elements, unless it > + * has a PERIOD) > > I think the original statement is still true even with PERIOD. The > expressional elements refer to expression indexes. I don't think we can > have a PERIOD marker on an expression? You're right: I wrote this back before PERIODs became GENERATED columns. Updated now. > - src/backend/utils/adt/ri_triggers.c > > Please remove the separate trigger functions for the period case. They > are the same as the non-period ones, so we don't need separate ones. > The difference is handled lower in the call stack, which I think is a > good setup. Removing the separate functions also removes a lot of extra > code in other parts of the patch. Done. The later patch for FKs with CASCADE/SET NULL/SET DEFAULT still has separate functions (since they call actually-different implementations), but I will see if I can unify things a bit more there. > - src/include/catalog/pg_constraint.h > > Should also update catalogs.sgml accordingly. Looks like you did this already in 030e10ff1a. > - src/test/regress/expected/without_overlaps.out > - src/test/regress/sql/without_overlaps.sql > > A few general comments on the tests: > > - In the INSERT commands, specify the column names explicitly. This > makes the tests easier to read (especially since the column order > between the PK and the FK table is sometimes different). Okay. > - Let's try to make it so that the inserted literals match the values > shown in the various error messages, so it's easier to match them up. > So, change the int4range literals to half-open notation. And also maybe > change the date output format to ISO. Done. Also changed the tsrange cols to daterange and made them YYYY-MM-DD. This is much easier to read IMO. Note there were already a few tsrange columns in the PK tests, so I changed those separately in the very first patch here. > - In various comments, instead of test FK "child", maybe use > "referencing table"? Instead of "parent", use "referenced table" (or > primary key table). When I read child and parent I was looking for > inheritance. Done. > - Consider truncating the test tables before each major block of tests > and refilling them with fresh data. So it's easier to eyeball the > tests. Otherwise, there is too much dependency on what earlier tests > left behind. Done. This will also let me reuse ids in the FOR PORTION OF partitioned table tests, but that's not done yet. > A specific question: > > In this test, a PERIOD marker on the referenced site is automatically > inferred from the primary key: > > +-- with inferred PK on the referenced table: > +CREATE TABLE temporal_fk_rng2rng ( > + id int4range, > + valid_at tsrange, > + parent_id int4range, > + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT > OVERLAPS), > + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD > valid_at) > + REFERENCES temporal_rng > +); > > In your patch, this succeeds. According to the SQL standard, it should > not. In subclause 11.8, syntax rule 4b: > > """ > Otherwise, the table descriptor of the referenced table shall include a > unique constraint UC that specifies PRIMARY KEY. The table constraint > descriptor of UC shall not include an application time period name. > """ > > So this case is apparently explicitly ruled out. > > (It might be ok to make an extension here, but then we should be > explicit about it.) Okay, I agree it doesn't match the standard. IMO our behavior is better, but the patches here should let you go either way. The main FK patch keeps the old behavior, but there is a follow-up patch doing what the standard says. There are some interesting implications, which you can see by looking at the test changes in that patch. Basically you can never give an inferred REFERENCES against a temporal table. Either your FK has a PERIOD element, and it fails because we exclude the PK's WITHOUT OVERLAPS in the inferred attributes, or your FK does not have a PERIOD element, and it fails because you want a PK side that is genuinely unique, but the PK index has a temporal definition of "unique" (and is not B-tree but GiST). I don't see any drawbacks from supporting inferred REFERENCES with temporal tables, so my vote is to break from the standard here, and *not* apply that follow-up patch. Should I add some docs about that? Also skipping the patch will cause some annoying merge conflicts, so let me know if that's what you choose and I'll handle them right away. Btw I tried checking what other vendors do here, but no one supports temporal FKs yet! MS SQL Server doesn't support application time at all. Oracle and MariaDB don't support temporal PKs or FKs. And IBM DB2 only supports temporal PKs. Actually DB2's docs in 2019 were *claiming* they supported temporal FKs, but it didn't work for me or at least one other person posting in their forums. And the latest docs no longer mention it.[1] I wrote about trying to make it work in my survey of other vendors.[2] The old docs are now a 404,[3] as is the forums post.[4] My DB2 test code is below in case anyone else wants to try.[5] So there is no precedent here for us to follow. Incidentally, here are two non-standard things I would like to add "some day": 1. FKs from non-temporal tables to temporal tables. Right now temporal tables are "contagious", which can be annoying. Maybe a non-temporal record is valid as long as a referenced temporal row exists at *any time*. You can't do that today. You can't even add an additional UNIQUE constraint, because there are surely duplicates that invalidate it. This kind of FK would be satisfied if *at least one* reference exists. 2. FKs from a single-timestamp table to a temporal table. Maybe the referring table is an "event" with no duration, but it is valid as long as the referenced table contains it. A workaround is to have a range that is `[t,t]`, but that's annoying. Anyway that's not important for these patches. As far as I can tell, whatever we choose re inferred PERIOD in REFERENCES keeps our options open for those ideas. One more thought: if we wanted to be cheekily compatible with the standard, we could infer *range types* that are WITHOUT OVERLAPs but not true PERIOD objects. "The table constraint descriptor of UC shall not include an application time period name." If it's a rangetype column, then it doesn't include a period name. :-P. So then we would skip the follow-up patch here but I could work it into the final patch for PERIOD support. This is probably not the wisest choice, although I guess it does let us defer deciding what to do. On Mon, Mar 11, 2024 at 7:45 PM jian he <jian.universality@gmail.com> wrote: > typo "referenced_agg", in the gist-extensibility.html page is "referencedagg" > <literal>WITHOUT PORTION</literal> should be <literal>WITHOUT OVERLAPS</literal> Good catch! Fixed. > + While the non-<literal>PERIOD</literal> columns are treated normally > + (and there must be at least one of them), > + the <literal>PERIOD</literal> column is not compared for equality. > the above sentence didn't say what is "normally"? > maybe we can do the following: > + While the non-<literal>PERIOD</literal> columns are treated > + normally for equality > + (and there must be at least one of them), > + the <literal>PERIOD</literal> column is not compared for equality. Reworked the language here. > my_range_agg_transfn error message is inconsistent? > `elog(ERROR, "range_agg_transfn called in non-aggregate context");` > `elog(ERROR, "range_agg must be called with a range");` > maybe just `my_range_agg_transfn`, instead of mention > {range_agg_transfn|range_agg} > similarly my_range_agg_finalfn error is also inconsistent. This matches what other aggs do (e.g. array_agg, json_agg, etc.) as well as the actual core range_agg code. And I think it is an appropriate difference. You only hit the first error if you are invoking the transfn directly, so that's what we should say. OTOH you hit the second error by calling the aggregate function, but with the wrong type. So the error message should mention the aggregate function. > my_range_agg_finalfn need `type_is_multirange(mltrngtypoid)`? This isn't part of the core range_agg_finalfn, so I'd rather not include it here. And I don't think it is needed. You would only get a non-multirange if the transfn does something wrong, and even if it does, the error will be caught and reported in multirange_get_typcache. On Mon, Mar 11, 2024 at 7:47 PM jian he <jian.universality@gmail.com> wrote: > maybe just change the tsrange type to daterange, then the dot out file > will be far less verbose. Agreed, done. > minor issues while reviewing v27, 0001 to 0004. > transformFkeyGetPrimaryKey comments need to update, > since bool pk_period also returned. pk_period is no longer returned in this latest patch. > +/* > + * FindFKComparisonOperators - > + * > + * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut. > + * Sets old_check_ok if we can avoid re-validating the constraint. > + * Sets old_pfeqop_item to the old pfeqop values. > + */ > +static void > +FindFKComparisonOperators(Constraint *fkconstraint, > + AlteredTableInfo *tab, > + int i, > + int16 *fkattnum, > + bool *old_check_ok, > + ListCell **old_pfeqop_item, > + Oid pktype, Oid fktype, Oid opclass, > + Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut) > > I think the above comments is > `Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`. This whole function is removed. > + if (is_temporal) > + { > + if (!fkconstraint->fk_with_period) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > + errmsg("foreign key uses PERIOD on the referenced table but not the > referencing table"))); > + } > can be > if (is_temporal && !fkconstraint->fk_with_period) > ereport(ERROR, > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > errmsg("foreign key uses PERIOD on the referenced table but not the > referencing table"))); The patch about inferred REFERENCES moves things around a bit, so this no longer applies. > + if (is_temporal) > + { > + if (!fkconstraint->pk_with_period) > + /* Since we got pk_attrs, one should be a period. */ > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_FOREIGN_KEY), > + errmsg("foreign key uses PERIOD on the referencing table but not the > referenced table"))); > + } > can be > if (is_temporal && !fkconstraint->pk_with_period) > /* Since we got pk_attrs, one should be a period. */ > ereport(ERROR, > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > errmsg("foreign key uses PERIOD on the referencing table but not the > referenced table"))); Likewise. > refactor decompile_column_index_array seems unnecessary. > Peter already mentioned it at [1], I have tried to fix it at [2]. No, that conversation is about handling WITHOUT OVERLAPS, not PERIOD. Because the syntax is `valid_at WITHOUT OVERLAPS` but `PERIOD valid_at` (post vs pre), we must handle PERIOD inside the function. > I don't understand the "expression elements" in the comments, most of > the tests case is like Covered above in Peter's feedback. > ` > PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) > ` > + *pk_period = (indexStruct->indisexclusion); > can be > `+ *pk_period = indexStruct->indisexclusion;` No longer included here. On Wed, Mar 13, 2024 at 5:00 PM jian he <jian.universality@gmail.com> wrote: > @@ -118,12 +120,17 @@ typedef struct RI_ConstraintInfo > int16 confdelsetcols[RI_MAX_NUMKEYS]; /* attnums of cols to set on > * delete */ > char confmatchtype; /* foreign key's match type */ > + bool temporal; /* if the foreign key is temporal */ > int nkeys; /* number of key columns */ > int16 pk_attnums[RI_MAX_NUMKEYS]; /* attnums of referenced cols */ > int16 fk_attnums[RI_MAX_NUMKEYS]; /* attnums of referencing cols */ > Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */ > Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */ > Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */ > + Oid period_contained_by_oper; /* operator for PERIOD SQL */ > + Oid agged_period_contained_by_oper; /* operator for PERIOD SQL */ > + Oid period_referenced_agg_proc; /* proc for PERIOD SQL */ > + Oid period_referenced_agg_rettype; /* rettype for previous */ > > the comment seems not clear to me. Here is my understanding about it: > period_contained_by_oper is the operator where a single period/range > contained by a single period/range. > agged_period_contained_by_oper is the operator oid where a period > contained by a bound of periods > period_referenced_agg_proc is the oprcode of the agged_period_contained_by_oper. > period_referenced_agg_rettype is the function > period_referenced_agg_proc returning data type. Expanded these comments a bit. Thanks to you both for such detailed, careful feedback! Rebased to 605062227f. If anything else comes up re FKs I'll tackle that first, but otherwise I think I will work on some of the outstanding issues in the FOR PORTION OF patch (e.g. trigger transition table names). I may experiment with handling the leftover inserts as a separate executor node. If anyone has advice there I'm happy to hear it! Yours, Paul [1] https://www.ibm.com/docs/en/db2/11.5?topic=statements-alter-table [2] https://illuminatedcomputing.com/posts/2019/08/sql2011-survey/ [3] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/intro/src/tpc/db2z_integrity.html [4] https://www.ibm.com/developerworks/community/forums/html/topic?id=440e07ad-23ee-4b0a-ae23-8c747abca819 [5] Here is DB2 test code showing temporal FKs don't work. (Note they disobey the standard re declaring `PERIOD p (s, e)` not `PERIOD FOR p (s, e)`, and it must be named `business_time`.) ``` create table t (id integer not null, ds date not null, de date not null, name varchar(4000), period business_time (ds, de)); alter table t add constraint tpk primary key (id, business_time without overlaps) insert into t values (1, '2000-01-01', '2001-01-01', 'foo'); create table fk (id integer, ds date not null, de date not null, period business_time (ds, de)); -- all this fails: alter table fk add constraint fkfk foreign key (id, period business_time) references t (id, period business_time); alter table fk add constraint fkfk foreign key (id, business_time) references t (id, business_time); alter table fk add constraint fkfk foreign key (id, period business_time) references t; alter table fk add constraint fkfk foreign key (id, business_time) references t; alter table fk add constraint fkfk foreign key (id, period for business_time) references t; alter table fk add constraint fkfk foreign key (id, period for business_time) references t (id, period for business_time); alter table fk add constraint fkfk foreign key (id, business_time without overlaps) references t; alter table fk add constraint fkfk foreign key (id, business_time without overlaps) references t (id, business_time without overlaps); alter table fk add constraint fkfk foreign key (id) references t; alter table fk add constraint fkfk foreign key (id) references t (id); ```
Attachment
- v28-0004-Add-GiST-referencedagg-support-func.patch
- v28-0005-Support-multiranges-or-any-type-in-temporal-FKs.patch
- v28-0002-Add-temporal-FOREIGN-KEYs.patch
- v28-0001-Use-daterange-and-YMD-in-tests-instead-of-tsrang.patch
- v28-0003-Don-t-infer-PERIOD-on-PK-side-of-temporal-FK.patch
- v28-0006-Add-support-funcs-for-FOR-PORTION-OF.patch
- v28-0007-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v28-0008-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v28-0009-Add-PERIODs.patch
Hi, minor issues from 00001 to 0005. + <row> + <entry><function>referencedagg</function></entry> + <entry>aggregates referenced rows' <literal>WITHOUT OVERLAPS</literal> + part</entry> + <entry>13</entry> + </row> comparing with surrounding items, maybe need to add `(optional)`? I think the explanation is not good as explained in referencedagg entry below: <para> An aggregate function. Given values of this opclass, it returns a value combining them all. The return value need not be the same type as the input, but it must be a type that can appear on the right hand side of the "contained by" operator. For example the built-in <literal>range_ops</literal> opclass uses <literal>range_agg</literal> here, so that foreign keys can check <literal>fkperiod @> range_agg(pkperiod)</literal>. </para> + In other words, the reference must have a referent for its entire duration. + This column must be a column with a range type. + In addition the referenced table must have a primary key + or unique constraint declared with <literal>WITHOUT PORTION</literal>. + </para> seems you missed replacing this one. in v28-0002, the function name is FindFKPeriodOpers, then in v28-0005 rename it to FindFKPeriodOpersAndProcs? renaming the function name in a set of patches seems not a good idea? + <para> + This is used for temporal foreign key constraints. + If you omit this support function, your type cannot be used + as the <literal>PERIOD</literal> part of a foreign key. + </para> in v28-0004, I think here "your type" should change to "your opclass"? +bool +check_amproc_is_aggregate(Oid funcid) +{ + bool result; + HeapTuple tp; + Form_pg_proc procform; + + tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); + if (!HeapTupleIsValid(tp)) + elog(ERROR, "cache lookup failed for function %u", funcid); + procform = (Form_pg_proc) GETSTRUCT(tp); + result = procform->prokind == 'a'; + ReleaseSysCache(tp); + return result; +} maybe ` change procform->prokind == 'a'; ` to ` procform->prokind == PROKIND_AGGREGATE; ` or we can put the whole function to cache/lsyscache.c name it just as proc_is_aggregate. - Added pg_dump support. - Show the correct syntax in psql \d output for foreign keys. in 28-0002, seems there is no work to correspond to these 2 items in the commit message? @@ -12335,7 +12448,8 @@ validateForeignKeyConstraint(char *conname, Relation rel, Relation pkrel, Oid pkindOid, - Oid constraintOid) + Oid constraintOid, + bool temporal) do you need to change the last argument of this function to "is_period"? + sprintf(paramname, "$%d", riinfo->nkeys); + sprintf(paramname, "$%d", riinfo->nkeys); do you think it worth the trouble to change to snprintf, I found related post on [1]. [1] https://stackoverflow.com/a/7316500/15603477
one more minor issue related to error reporting. I've only applied v28, 0001 to 0005. -- (parent_id, valid_at) REFERENCES [implicit] -- FOREIGN KEY part should specify PERIOD CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at daterange, parent_id int4range, CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, valid_at) REFERENCES temporal_rng ); ERROR: number of referencing and referenced columns for foreign key disagree -- (parent_id, PERIOD valid_at) REFERENCES (id) CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at daterange, parent_id int4range, CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng (id) ); ERROR: foreign key uses PERIOD on the referencing table but not the referenced table these error messages seem somehow inconsistent with the comments above? + else + { + /* + * Check it's a btree; currently this can never fail since no other + * index AMs support unique indexes. If we ever did have other types + * of unique indexes, we'd need a way to determine which operator + * strategy number is equality. (Is it reasonable to insist that + * every such index AM use btree's number for equality?) + */ + if (amid != BTREE_AM_OID) + elog(ERROR, "only b-tree indexes are supported for foreign keys"); + eqstrategy = BTEqualStrategyNumber; + } the comments say never fail. but it actually failed. see: +-- (parent_id) REFERENCES [implicit] +-- This finds the PK (omitting the WITHOUT OVERLAPS element), +-- but it's not a b-tree index, so it fails anyway. +-- Anyway it must fail because the two sides have a different definition of "unique". +CREATE TABLE temporal_fk_rng2rng ( + id int4range, + valid_at daterange, + parent_id int4range, + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id) + REFERENCES temporal_rng +); +ERROR: only b-tree indexes are supported for foreign keys because in transformFkeyGetPrimaryKey. we have `if (indexStruct->indisexclusion && i == indexStruct->indnatts - 1)` we have pk_with_period, fk_with_period in Constraint struct. maybe we can add a bool argument to transformFkeyGetPrimaryKey indicate, this primary key is a conperiod constraint. then we can check condition: the primary key is a conperiod constraint and fk_with_period or is pk_with_period is false I've made a patch to make these error reporting more accurate. you can further refine it.
Attachment
Hi All, A few more changes here: On 3/17/24 16:30, jian he wrote: > Hi, minor issues from 00001 to 0005. > + <row> > + <entry><function>referencedagg</function></entry> > + <entry>aggregates referenced rows' <literal>WITHOUT OVERLAPS</literal> > + part</entry> > + <entry>13</entry> > + </row> > comparing with surrounding items, maybe need to add `(optional)`? We do say this function is optional above, in the list of support functions. That seems to be the normal approach. The only other support function that mentions being optional elsewhere is sortsupport. > I think the explanation is not good as explained in referencedagg entry below: > <para> > An aggregate function. Given values of this opclass, > it returns a value combining them all. The return value > need not be the same type as the input, but it must be a > type that can appear on the right hand side of the "contained by" > operator. For example the built-in <literal>range_ops</literal> > opclass uses <literal>range_agg</literal> here, so that foreign > keys can check <literal>fkperiod @> range_agg(pkperiod)</literal>. > </para> Can you explain what you'd like to see improved here? > + In other words, the reference must have a referent for its > entire duration. > + This column must be a column with a range type. > + In addition the referenced table must have a primary key > + or unique constraint declared with <literal>WITHOUT PORTION</literal>. > + </para> > seems you missed replacing this one. I'm not sure what this is referring to. Replaced what? > in v28-0002, the function name is FindFKPeriodOpers, > then in v28-0005 rename it to FindFKPeriodOpersAndProcs? > renaming the function name in a set of patches seems not a good idea? We'll only apply part 5 if we support more than range types (though I think that would be great). It doesn't make sense to name this function FindFKPeriodOpersAndProcs when it isn't yet finding a proc. If it's a problem to rename it in part 5 perhaps the commits should be squashed by the committer? But I don't see the problem really. > + <para> > + This is used for temporal foreign key constraints. > + If you omit this support function, your type cannot be used > + as the <literal>PERIOD</literal> part of a foreign key. > + </para> > in v28-0004, I think here "your type" should change to "your opclass"? I think "your type" addresses what the user is more likely to care about, but I added some clarification here. > +bool > +check_amproc_is_aggregate(Oid funcid) > +{ > + bool result; > + HeapTuple tp; > + Form_pg_proc procform; > + > + tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); > + if (!HeapTupleIsValid(tp)) > + elog(ERROR, "cache lookup failed for function %u", funcid); > + procform = (Form_pg_proc) GETSTRUCT(tp); > + result = procform->prokind == 'a'; > + ReleaseSysCache(tp); > + return result; > +} > maybe > ` > change procform->prokind == 'a'; > ` > to > ` > procform->prokind == PROKIND_AGGREGATE; > ` > or we can put the whole function to cache/lsyscache.c > name it just as proc_is_aggregate. Added the constant reference. Since lsyscache.c already has get_func_prokind, I changed the gist validation function to call that directly. > - Added pg_dump support. > - Show the correct syntax in psql \d output for foreign keys. > in 28-0002, seems there is no work to correspond to these 2 items in > the commit message? The changes to psql and pg_dump happen in pg_get_constraintdef_worker and decompile_column_index_array (both in ruleutils.c). > @@ -12335,7 +12448,8 @@ validateForeignKeyConstraint(char *conname, > Relation rel, > Relation pkrel, > Oid pkindOid, > - Oid constraintOid) > + Oid constraintOid, > + bool temporal) > do you need to change the last argument of this function to "is_period"? Changed to hasperiod. > + sprintf(paramname, "$%d", riinfo->nkeys); > + sprintf(paramname, "$%d", riinfo->nkeys); > do you think it worth the trouble to change to snprintf, I found > related post on [1]. > > [1] https://stackoverflow.com/a/7316500/15603477 paramname holds 16 chars so I don't think there is any risk of an int overflowing here. The existing foreign key code already uses sprintf, so I don't think it makes sense to be inconsistent here. And if we want to change it it should probably be in a separate commit, not buried in a commit about adding temporal foreign keys. On 3/17/24 21:47, jian he wrote: > one more minor issue related to error reporting. > I've only applied v28, 0001 to 0005. > > -- (parent_id, valid_at) REFERENCES [implicit] > -- FOREIGN KEY part should specify PERIOD > CREATE TABLE temporal_fk_rng2rng ( > id int4range, > valid_at daterange, > parent_id int4range, > CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), > CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, valid_at) > REFERENCES temporal_rng > ); > ERROR: number of referencing and referenced columns for foreign key disagree > > -- (parent_id, PERIOD valid_at) REFERENCES (id) > CREATE TABLE temporal_fk_rng2rng ( > id int4range, > valid_at daterange, > parent_id int4range, > CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), > CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) > REFERENCES temporal_rng (id) > ); > ERROR: foreign key uses PERIOD on the referencing table but not the > referenced table > > these error messages seem somehow inconsistent with the comments above? Clarified the comments. > + else > + { > + /* > + * Check it's a btree; currently this can never fail since no other > + * index AMs support unique indexes. If we ever did have other types > + * of unique indexes, we'd need a way to determine which operator > + * strategy number is equality. (Is it reasonable to insist that > + * every such index AM use btree's number for equality?) > + */ > + if (amid != BTREE_AM_OID) > + elog(ERROR, "only b-tree indexes are supported for foreign keys"); > + eqstrategy = BTEqualStrategyNumber; > + } > > the comments say never fail. > but it actually failed. see: > > +-- (parent_id) REFERENCES [implicit] > +-- This finds the PK (omitting the WITHOUT OVERLAPS element), > +-- but it's not a b-tree index, so it fails anyway. > +-- Anyway it must fail because the two sides have a different > definition of "unique". > +CREATE TABLE temporal_fk_rng2rng ( > + id int4range, > + valid_at daterange, > + parent_id int4range, > + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), > + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id) > + REFERENCES temporal_rng > +); > +ERROR: only b-tree indexes are supported for foreign keys You're right, now that we have temporal primary keys the comment is out-of-date. You can reach that error message by creating a regular foreign key against a temporal primary key. Perhaps we should update the comment separately, although I haven't added a new patch for that here. I did update the comment as part of this FK patch. I also added "non-PERIOD" to the error message (which only makes sense in the FK patch). Since the error message was impossible before, I assume that is no problem. I think this is a simpler fix than what you have in your attached patch. In addition your patch doesn't work if we include part 3 here: see Peter's feedback about the SQL standard and my reply. Rebased to 846311051e. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v29-0001-Use-daterange-and-YMD-in-tests-instead-of-tsrang.patch
- v29-0002-Add-temporal-FOREIGN-KEYs.patch
- v29-0003-Don-t-infer-PERIOD-on-PK-side-of-temporal-FK.patch
- v29-0004-Add-GiST-referencedagg-support-func.patch
- v29-0005-Support-multiranges-or-any-type-in-temporal-FKs.patch
- v29-0006-Add-support-funcs-for-FOR-PORTION-OF.patch
- v29-0007-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v29-0008-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v29-0009-Add-PERIODs.patch
On Tue, Mar 19, 2024 at 6:49 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > Rebased to 846311051e. > Hi, I just found out some minor issues. + * types matching the PERIOD element. periodprocoid is a GiST support function to + * aggregate multiple PERIOD element values into a single value + * (whose return type need not match its inputs, + * e.g. many ranges can be aggregated into a multirange). * And aggedperiodoperoid is also a ContainedBy operator, - * but one whose rhs is anymultirange. + * but one whose rhs matches the type returned by aggedperiodoperoid. * That way foreign keys can compare fkattr <@ range_agg(pkattr). */ void -FindFKPeriodOpers(Oid opclass, - Oid *periodoperoid, - Oid *aggedperiodoperoid) +FindFKPeriodOpersAndProcs(Oid opclass, + Oid *periodoperoid, + Oid *aggedperiodoperoid, + Oid *periodprocoid) I think, aggedperiodoperoid is more descriptive than periodprocoid, in 0005, you don't need to rename it. aslo do we need to squash v29 0001 to 0005 together? --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1167,7 +1167,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM column(s) of some row of the referenced table. If the <replaceable class="parameter">refcolumn</replaceable> list is omitted, the primary key of the <replaceable class="parameter">reftable</replaceable> - is used. Otherwise, the <replaceable class="parameter">refcolumn</replaceable> + is used (omitting any part declared with <literal>WITHOUT OVERLAPS</literal>). + Otherwise, the <replaceable class="parameter">refcolumn</replaceable> list must refer to the columns of a non-deferrable unique or primary key constraint or be the columns of a non-partial unique index. </para> I think this does not express that foreign key is PERIOD, then the last column of refcolumn must specify PERIOD? + <para> + If the last column is marked with <literal>PERIOD</literal>, + it is treated in a special way. + While the non-<literal>PERIOD</literal> columns are compared for equality + (and there must be at least one of them), + the <literal>PERIOD</literal> column is not. + Instead the constraint is considered satisfied + if the referenced table has matching records + (based on the non-<literal>PERIOD</literal> parts of the key) + whose combined <literal>PERIOD</literal> values completely cover + the referencing record's. + In other words, the reference must have a referent for its entire duration. + This column must be a column with a range type. + In addition the referenced table must have a primary key + or unique constraint declared with <literal>WITHOUT PORTION</literal>. + </para> you forgot to change <literal>WITHOUT PORTION</literal> to <literal>WITHOUT OVERLAPS</literal> Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */ Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */ Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */ in struct RI_ConstraintInfo, these comments need to be updated?
On 16.03.24 22:37, Paul A Jungwirth wrote: > Here is a new patch series addressing the last few feedback emails > from Peter & Jian He. It mostly focuses on the FKs patch, trying to > get it really ready to commit, I have committed the test changes (range and date format etc.). The FOREIGN KEY patch looks okay to me now. Maybe check if any of the subsequent comments from jian should be applied. >> > I'm not sure how else to do it. The issue is that `range_agg` returns >> > a multirange, so the result >> > type doesn't match the inputs. But other types will likely have the >> > same problem: to combine boxes >> > you may need a multibox. The combine mdranges you may need a >> > multimdrange. >> >> Can we just hardcode the use of range_agg for this release? Might be >> easier. I don't see all this generality being useful in the near future. > > Okay, I've hard-coded range_agg in the main patch and separated the > support for multirange/etc in the next two patches. But there isn't > much code there (mostly tests and docs). Since we can't hard-code the > *operators*, most of the infrastructure is already there not to > hard-code the aggregate function. Supporting multiranges is already a > nice improvement. E.g. it should cut down on disk usage when a record > gets updated frequently. Supporting arbitrary types also seems very > powerful, and we already do that for PKs. I think we could also handle multiranges in a hardcoded way? Ranges and multiranges are hardcoded concepts anyway. It's just when we move to arbitrary types supporting containment, then it gets a bit more complicated. What would a patch that adds just multiranges on the FK side, but without the full pluggable gist support, look like? > I don't see any drawbacks from supporting inferred REFERENCES with > temporal tables, so my vote is to break from the standard here, and > *not* apply that follow-up patch. Should I add some docs about that? > Also skipping the patch will cause some annoying merge conflicts, so > let me know if that's what you choose and I'll handle them right away. I agree we can allow this.
On 3/19/24 04:02, Peter Eisentraut wrote: > On 16.03.24 22:37, Paul A Jungwirth wrote: >> Here is a new patch series addressing the last few feedback emails >> from Peter & Jian He. It mostly focuses on the FKs patch, trying to >> get it really ready to commit, > > I have committed the test changes (range and date format etc.). > > The FOREIGN KEY patch looks okay to me now. Maybe check if any of the subsequent comments from jian > should be applied. Okay, specifics below. > I think we could also handle multiranges in a hardcoded way? Ranges and multiranges are hardcoded > concepts anyway. It's just when we move to arbitrary types supporting containment, then it gets a > bit more complicated. > > What would a patch that adds just multiranges on the FK side, but without the full pluggable gist > support, look like? Attached a separate patch extending FKs to multiranges only. I'd still love to support arbitrary types eventually but it's not part of the patches here now. >> I don't see any drawbacks from supporting inferred REFERENCES with >> temporal tables, so my vote is to break from the standard here, and >> *not* apply that follow-up patch. Should I add some docs about that? >> Also skipping the patch will cause some annoying merge conflicts, so >> let me know if that's what you choose and I'll handle them right away. > > I agree we can allow this. Great, thanks! Took out those changes. On 3/19/24 02:01, jian he wrote: > + * types matching the PERIOD element. periodprocoid is a GiST support > function to > + * aggregate multiple PERIOD element values into a single value > + * (whose return type need not match its inputs, > + * e.g. many ranges can be aggregated into a multirange). > * And aggedperiodoperoid is also a ContainedBy operator, > - * but one whose rhs is anymultirange. > + * but one whose rhs matches the type returned by aggedperiodoperoid. > * That way foreign keys can compare fkattr <@ range_agg(pkattr). > */ > void > -FindFKPeriodOpers(Oid opclass, > - Oid *periodoperoid, > - Oid *aggedperiodoperoid) > +FindFKPeriodOpersAndProcs(Oid opclass, > + Oid *periodoperoid, > + Oid *aggedperiodoperoid, > + Oid *periodprocoid) > > I think, aggedperiodoperoid is more descriptive than periodprocoid, in > 0005, you don't need to rename it. > aslo do we need to squash v29 0001 to 0005 together? I changed the operator names to {,agged}containedbyoperoid. The proc names are not included now because we only need them for supporting more than ranges + multiranges. > --- a/doc/src/sgml/ref/create_table.sgml > +++ b/doc/src/sgml/ref/create_table.sgml > @@ -1167,7 +1167,8 @@ WITH ( MODULUS <replaceable > class="parameter">numeric_literal</replaceable>, REM > column(s) of some row of the referenced table. If the <replaceable > class="parameter">refcolumn</replaceable> list is omitted, the > primary key of the <replaceable class="parameter">reftable</replaceable> > - is used. Otherwise, the <replaceable > class="parameter">refcolumn</replaceable> > + is used (omitting any part declared with <literal>WITHOUT > OVERLAPS</literal>). > + Otherwise, the <replaceable class="parameter">refcolumn</replaceable> > list must refer to the columns of a non-deferrable unique or primary key > constraint or be the columns of a non-partial unique index. > </para> > I think this does not express that > foreign key is PERIOD, then the last column of refcolumn must specify PERIOD? Okay, added a sentence about that (and adjusted some other things re allowing implicit REFERENCES and only supporting ranges + multiranges). > + <para> > + If the last column is marked with <literal>PERIOD</literal>, > + it is treated in a special way. > + While the non-<literal>PERIOD</literal> columns are compared for equality > + (and there must be at least one of them), > + the <literal>PERIOD</literal> column is not. > + Instead the constraint is considered satisfied > + if the referenced table has matching records > + (based on the non-<literal>PERIOD</literal> parts of the key) > + whose combined <literal>PERIOD</literal> values completely cover > + the referencing record's. > + In other words, the reference must have a referent for its > entire duration. > + This column must be a column with a range type. > + In addition the referenced table must have a primary key > + or unique constraint declared with <literal>WITHOUT PORTION</literal>. > + </para> > you forgot to change <literal>WITHOUT PORTION</literal> to > <literal>WITHOUT OVERLAPS</literal> Oh! Thanks, I guess I was just blind. > Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */ > Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */ > Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */ > in struct RI_ConstraintInfo, these comments need to be updated? In earlier feedback Peter advised not changing the "equals" language (e.g. in KeysEqual). But I added a comment at the top of the struct to clarify. Rebased to 605721f819. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
hi. minor cosmetic issues, other than that, looks good. *pk_period = (indexStruct->indisexclusion); to *pk_period = indexStruct->indisexclusion; if (with_period) { if (!fkconstraint->fk_with_period) ereport(ERROR, (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("foreign key uses PERIOD on the referenced table but not the referencing table"))); } change to if (with_period && !fkconstraint->fk_with_period) ereport(ERROR, (errcode(ERRCODE_INVALID_FOREIGN_KEY), errmsg("foreign key uses PERIOD on the referenced table but not the referencing table")));
On 3/20/24 03:55, jian he wrote: > hi. > minor cosmetic issues, other than that, looks good. > > *pk_period = (indexStruct->indisexclusion); > to > *pk_period = indexStruct->indisexclusion; > > ... > > if (with_period && !fkconstraint->fk_with_period) > ereport(ERROR, > (errcode(ERRCODE_INVALID_FOREIGN_KEY), > errmsg("foreign key uses PERIOD on the referenced table but not the > referencing table"))); Both included in the new patches here. Rebased to a0390f6ca6. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
with foreign key "no action", in a transaction, we can first insert foreign key data, then primary key data. also the update/delete can fail at the end of transaction. based on [1] explanation about the difference between "no action" and "restrict". I only refactor the v31-0002-Support-multiranges-in-temporal-FKs.patch test. [1 https://stackoverflow.com/questions/14921668/difference-between-restrict-and-no-action
Attachment
On 20.03.24 17:21, Paul Jungwirth wrote: > On 3/20/24 03:55, jian he wrote: >> hi. >> minor cosmetic issues, other than that, looks good. >> >> *pk_period = (indexStruct->indisexclusion); >> to >> *pk_period = indexStruct->indisexclusion; >> >> ... > >> if (with_period && !fkconstraint->fk_with_period) >> ereport(ERROR, >> (errcode(ERRCODE_INVALID_FOREIGN_KEY), >> errmsg("foreign key uses PERIOD on the referenced table but not the >> referencing table"))); > > Both included in the new patches here. > > Rebased to a0390f6ca6. Two more questions: 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to ri_AttributesEqual(): - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], RIAttType(rel, attnums[i]), - oldvalue, newvalue)) + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), + newvalue, oldvalue)) But the declared arguments of ri_AttributesEqual() are oldvalue and newvalue, so passing them backwards is really confusing. And the change does matter in the tests. Can we organize this better? 2. There are some tests that error with ERROR: only b-tree indexes are supported for non-PERIOD foreign keys But this is an elog() error, so should not normally be visible. I suspect some other error should really show here, and the order of checks is a bit wrong or something?
v32 attached. On 3/21/24 07:57, Peter Eisentraut wrote: > Two more questions: > > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to ri_AttributesEqual(): > > - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], RIAttType(rel, attnums[i]), > - oldvalue, newvalue)) > + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), > + newvalue, oldvalue)) > > But the declared arguments of ri_AttributesEqual() are oldvalue and newvalue, so passing them > backwards is really confusing. And the change does matter in the tests. > > Can we organize this better? I renamed the params and actually the whole function. All it's doing is execute `oldvalue op newvalue`, casting if necessary. So I changed it to ri_CompareWithCast and added some documentation. In an earlier version of this patch I had a separate function for the PERIOD comparison, but it's just doing the same thing, so I think the best thing is to give the function a more accurate name and use it. > 2. There are some tests that error with > > ERROR: only b-tree indexes are supported for non-PERIOD foreign keys > > But this is an elog() error, so should not normally be visible. I suspect some other error should > really show here, and the order of checks is a bit wrong or something? At first I thought I should just make this ereport, because it is reachable now, but I didn't like the error message or where we were reaching it. The high-level problem is defining a non-temporal FK against a temporal PK, and we should check for that in those terms, not when looking at individual attribute opclasses. So I added a check prior to this and gave it a more descriptive error message. On 3/21/24 01:25, jian he wrote: > with foreign key "no action", > in a transaction, we can first insert foreign key data, then primary key data. > also the update/delete can fail at the end of transaction. > > based on [1] explanation about the difference between "no action" and > "restrict". > I only refactor the v31-0002-Support-multiranges-in-temporal-FKs.patch test. I added some tests for deferred NO ACTION checks. I added them for all of range/multirange/PERIOD. I also adopted your change ALTERing the constraint for NO ACTION (even though it's already that), to make each test section more independent. Your patch had a lot of other noisy changes, e.g. whitespace and reordering lines. If there are other things you intended to add to the tests, can you describe them? Rebased to 7e65ad197f. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Fri, Mar 22, 2024 at 8:35 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > Your patch had a lot of other noisy changes, e.g. > whitespace and reordering lines. If there are other things you intended to add to the tests, can you > describe them? i think on update restrict, on delete restrict cannot be deferred, even if you set it DEFERRABLE INITIALLY DEFERRED. based on this idea, I made minor change on v32-0002-Support-multiranges-in-temporal-FKs.patch other than that, v32, 0002 looks good.
Attachment
On 22.03.24 01:35, Paul Jungwirth wrote: > > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to > ri_AttributesEqual(): > > > > - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], > RIAttType(rel, attnums[i]), > > - oldvalue, newvalue)) > > + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), > > + newvalue, oldvalue)) > > > > But the declared arguments of ri_AttributesEqual() are oldvalue and > newvalue, so passing them > > backwards is really confusing. And the change does matter in the tests. > > > > Can we organize this better? > > I renamed the params and actually the whole function. All it's doing is > execute `oldvalue op newvalue`, casting if necessary. So I changed it to > ri_CompareWithCast and added some documentation. In an earlier version > of this patch I had a separate function for the PERIOD comparison, but > it's just doing the same thing, so I think the best thing is to give the > function a more accurate name and use it. Ok, I see now, and the new explanation is better. But after reading the comment in the function about collations, I think there could be trouble. As long as we are only comparing for equality (and we don't support nondeterministic global collations), then we can use any collation to compare for equality. But if we are doing contained-by, then the collation does matter, so we would need to get the actual collation somehow. So as written, this might not always work correctly. I think it would be safer for now if we just kept using the equality operation even for temporal foreign keys. If we did that, then in the case that you update a key to a new value that is contained by the old value, this function would say "not equal" and fire all the checks, even though it wouldn't need to. This is kind of similar to the "false negatives" that the comment already talks about. What do you think?
On Fri, Mar 22, 2024 at 11:49 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 22.03.24 01:35, Paul Jungwirth wrote: > > > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to > > ri_AttributesEqual(): > > > > > > - if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], > > RIAttType(rel, attnums[i]), > > > - oldvalue, newvalue)) > > > + if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]), > > > + newvalue, oldvalue)) > > > > > > But the declared arguments of ri_AttributesEqual() are oldvalue and > > newvalue, so passing them > > > backwards is really confusing. And the change does matter in the tests. > > > > > > Can we organize this better? > > > > I renamed the params and actually the whole function. All it's doing is > > execute `oldvalue op newvalue`, casting if necessary. So I changed it to > > ri_CompareWithCast and added some documentation. In an earlier version > > of this patch I had a separate function for the PERIOD comparison, but > > it's just doing the same thing, so I think the best thing is to give the > > function a more accurate name and use it. > > Ok, I see now, and the new explanation is better. > > But after reading the comment in the function about collations, I think > there could be trouble. As long as we are only comparing for equality > (and we don't support nondeterministic global collations), then we can > use any collation to compare for equality. But if we are doing > contained-by, then the collation does matter, so we would need to get > the actual collation somehow. So as written, this might not always work > correctly. > > I think it would be safer for now if we just kept using the equality > operation even for temporal foreign keys. If we did that, then in the > case that you update a key to a new value that is contained by the old > value, this function would say "not equal" and fire all the checks, even > though it wouldn't need to. This is kind of similar to the "false > negatives" that the comment already talks about. > > What do you think? > we don't need to worry about primary key and foreign key with different collation. because it will be error out as incompatible data type, foreign key constraint will not be created. if there are the same collation, when we build the query string, we don't need to worry about collation. because at runtime, the operator associated oprcode will fetch collation information later. main operator and the main oprcode related to this patch(0001, 0002) are: range_contained_by_multirange range_eq range_overlaps range_contained_by the first 3 functions will fetch collation information within range_cmp_bounds. range_contained_by will fetch collation information in range_contains_elem_internal. demo: CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); DROP TABLE IF exists temporal_fk_rng2rng; DROP TABLE IF exists temporal_rng; DROP TYPE textrange_case_insensitive; create type textrange_case_insensitive as range(subtype=text, collation=case_insensitive); CREATE TABLE temporal_rng (id int4range, valid_at textrange_case_insensitive); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at textrange_case_insensitive, parent_id int4range, CONSTRAINT temporal_fk_rng2rng_fk2 FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng (id, PERIOD valid_at) ); INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', textrange_case_insensitive('c', 'h','[]')); --fail INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('B', 'B','[]'), '[1,2)'); --fail. INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('a', 'F','[]'), '[1,2)'); --fail. INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('e', 'Z','[]'), '[1,2)'); --ok INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[1,2)', textrange_case_insensitive('d', 'F','[]'), '[1,2)');
v33 attached with minor changes. On 3/22/24 20:02, jian he wrote: > On Fri, Mar 22, 2024 at 11:49 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> But after reading the comment in the function about collations, I think >> there could be trouble. As long as we are only comparing for equality >> (and we don't support nondeterministic global collations), then we can >> use any collation to compare for equality. But if we are doing >> contained-by, then the collation does matter, so we would need to get >> the actual collation somehow. So as written, this might not always work >> correctly. >> >> I think it would be safer for now if we just kept using the equality >> operation even for temporal foreign keys. If we did that, then in the >> case that you update a key to a new value that is contained by the old >> value, this function would say "not equal" and fire all the checks, even >> though it wouldn't need to. This is kind of similar to the "false >> negatives" that the comment already talks about. >> >> What do you think? >> > > we don't need to worry about primary key and foreign key with > different collation. > because it will be error out as incompatible data type, > foreign key constraint will not be created. I agree with jian he here. Here is my own investigation: Rangetypes themselves are never collatable (see DefineRange in commands/typecmds.c). But rangetypes do store a collation for their base type. So you can say: paul=# create type textrange as range (subtype = text, collation = "C"); CREATE TYPE That is stored in pg_range.rngcollation, but pg_type.typcollation is always zero. So putting a collection on a rangetype column is an error: paul=# create table t (r1 textrange collate "en-US-x-icu"); ERROR: collations are not supported by type textrange And so is using an ad hoc collation with an operator: paul=# select '[J,J]'::textrange <@ '[a,z]'::textrange collate "en-US-x-icu"; ERROR: collations are not supported by type textrange LINE 1: select '[J,J]'::textrange <@ '[a,z]'::textrange collate "en-... Almost everything ranges do is built on range_cmp_bounds, which uses the base type's collation. There is no way to use a different one. So when ri_CompareWithCast calls `lhs <@ rhs`, it is using the collation for that range's base type. Indexes will use the same collation. You also can't mix different range types. Our textrange puts (English) lowercase after uppercase: paul=# select '[j,j]'::textrange <@ '[a,z]'::textrange; ?column? ---------- t (1 row) paul=# select '[J,J]'::textrange <@ '[a,z]'::textrange; ?column? ---------- f (1 row) We could create a rangetype that intermingles uppercase & lower: paul=# create type itextrange as range (subtype = text, collation = "en-US-x-icu"); CREATE TYPE paul=# select '[J,J]'::itextrange <@ '[a,z]'::itextrange; ?column? ---------- t (1 row) But you can't mix them: paul=# select '[J,J]'::itextrange <@ '[a,z]'::textrange; ERROR: operator does not exist: itextrange <@ textrange LINE 1: select '[J,J]'::itextrange <@ '[a,z]'::textrange; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. Even if I create casts, mixing still fails: paul=# create cast (textrange as itextrange) without function as implicit; CREATE CAST paul=# create cast (itextrange as textrange) without function as implicit; CREATE CAST paul=# select '[J,J]'::itextrange <@ '[a,z]'::textrange; ERROR: operator does not exist: itextrange <@ textrange LINE 1: select '[J,J]'::itextrange <@ '[a,z]'::textrange; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. That's because the operator parameters are anyrange, and in can_coerce_type we call check_generic_type_consistency which doesn't use casts. It just asks if all the concrete range types are the same (as with other polymorphic types). Adding a foreign key runs the same check: paul=# create table pk (id int4range, valid_at textrange, constraint pkpk primary key (id, valid_at without overlaps)); CREATE TABLE paul=# create table fk (id int4range, valid_at itextrange, parent_id int4range); CREATE TABLE paul=# alter table fk add constraint fkfk foreign key (parent_id, period valid_at) references pk; ERROR: foreign key constraint "fkfk" cannot be implemented DETAIL: Key columns "valid_at" and "valid_at" are of incompatible types: itextrange and textrange. I guess the user could define their own `textrange <@ itextrange` operator, using the lhs collation. We would choose that operator for pfeqop but not ppeqop or ffeqop. And we use ffeqop here, which would allow us to skip a check that pfeqop would fail. Is that an issue? It feels like the user is doing their best to get nonsense results at that point, and it's not really about the collation per se. Incidentally here is another separate issue with foreign keys and collations I noticed this morning: https://www.postgresql.org/message-id/78d824e0-b21e-480d-a252-e4b84bc2c24b%40illuminatedcomputing.com That comes from nondeterministic collations, which feel like a troublesome thing here. Probably foreign keys just weren't fully re-thought when we added them. But we avoid the issue from 59a85cb4 (discussion at https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com) about cascading changes when a PK experiences a not-binary-identical change that the collation considers equal. These days we only call ri_CompareWithCast for changes on the FK side. Now this is a long chain of reasoning to say rangetypes are safe. I added a comment. Note it doesn't apply to arbitrary types, so if we support those eventually we should just require a recheck always, or alternately use equals, not containedby. (That would require storing equals somewhere. It could go in ffeqop, but that feels like a footgun since pfeqop and ppeqop need overlaps.) On 3/21/24 22:33, jian he wrote: > i think on update restrict, on delete restrict cannot be deferred, > even if you set it DEFERRABLE INITIALLY DEFERRED. > based on this idea, I made minor change on > v32-0002-Support-multiranges-in-temporal-FKs.patch Okay, added those tests too. Thanks! Rebased to 697f8d266c. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 23.03.24 18:42, Paul Jungwirth wrote: > Now this is a long chain of reasoning to say rangetypes are safe. I > added a comment. Note it doesn't apply to arbitrary types, so if we > support those eventually we should just require a recheck always, or > alternately use equals, not containedby. (That would require storing > equals somewhere. It could go in ffeqop, but that feels like a footgun > since pfeqop and ppeqop need overlaps.) Ok, this explanation is good enough for now. I have committed the patches v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch (together).
On Sun, Mar 24, 2024 at 1:42 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > v33 attached with minor changes. > > Okay, added those tests too. Thanks! > > Rebased to 697f8d266c. > hi. minor issues I found in v33-0003. there are 29 of {check_amproc_signature?.*false} only one {check_amproc_signature(procform->amproc, opcintype, true} is this refactoring really worth it? We also need to refactor gistadjustmembers? + <row> + <entry><function>intersect</function></entry> + <entry>computes intersection with <literal>FOR PORTION OF</literal> + bounds</entry> + <entry>13</entry> + </row> + <row> + <entry><function>without_portion</function></entry> + <entry>computes remaining duration(s) outside + <literal>FOR PORTION OF</literal> bounds</entry> + <entry>14</entry> + </row> needs to add "(optional)". +<programlisting> +Datum +my_range_intersect(PG_FUNCTION_ARGS) +{ + RangeType *r1 = PG_GETARG_RANGE_P(0); + RangeType *r2 = PG_GETARG_RANGE_P(1); + TypeCacheEntry *typcache; + + /* Different types should be prevented by ANYRANGE matching rules */ + if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2)) elog(ERROR, "range types do not match"); + + typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1)); + + PG_RETURN_RANGE_P(range_intersect_internal(typcache, r1, r2)); +} +</programlisting> the elog, ERROR indentation is wrong? +/* + * range_without_portion_internal - Sets outputs and outputn to the ranges + * remaining and their count (respectively) after subtracting r2 from r1. + * The array should never contain empty ranges. + * The outputs will be ordered. We expect that outputs is an array of + * RangeType pointers, already allocated with two slots. + */ +void +range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1, + RangeType *r2, RangeType **outputs, int *outputn) the comments need to be refactored? there is nothing related to "slot"? not sure the "array" description is right. (my understanding is compute rangetype r1 and r2, and save the result to RangeType **outputs. select proisstrict, proname from pg_proc where proname = 'range_without_portion'; range_without_portion is strict. but select range_without_portion(NULL::int4range, int4range(11, 20,'[]')); return zero rows. Is this the expected behavior? 0003 seems simple enough. but it's more related to "for portion of". not sure we can push 0003 into v17.
On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches > v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch > (together). Hi Hackers, I found some problems with temporal primary keys and the idea of uniqueness, especially around the indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need but I'd like some feedback on. The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE. DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion). This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?) We should still forbid temporally-unique indexes for ON CONFLICT DO UPDATE, since there may be more than one row that conflicts. Ideally we would find and update all the conflicting rows, but we don't now, and we don't want to update just one: postgres=# create table t (id int4range, valid_at daterange, name text, constraint tpk primary key (id, valid_at without overlaps)); CREATE TABLE postgres=# insert into t values ('[1,2)', '[2000-01-01,2001-01-01)', 'a'), ('[1,2)', '[2001-01-01,2002-01-01)', 'b'); INSERT 0 2 postgres=# insert into t values ('[1,2)', '[2000-01-01,2002-01-01)', 'c') on conflict (id, valid_at) do update set name = excluded.name; INSERT 0 1 postgres=# select * from t; id | valid_at | name -------+-------------------------+------ [1,2) | [2001-01-01,2002-01-01) | b [1,2) | [2000-01-01,2001-01-01) | c (2 rows) So I also added code to prevent that. This is just preserving the old behavior for exclusion constraints, which was bypassed because of indisunique. All this is in the first patch. That got me thinking about indisunique and where else it could cause problems. Perhaps there are other places that assume only b-trees are unique. I couldn't find anywhere that just gives an error like ON CONFLICT, but I can imagine more subtle problems. A temporal PRIMARY KEY or UNIQUE constraint is unique in at least three ways: It is *metaphorically* unique: the conceit is that the scalar part is unique at every moment in time. You may have id 5 in your table more than once, as long as the records' application times don't overlap. And it is *officially* unique: the standard calls these constraints unique. I think it is correct for us to report them as unique in pg_index. But is it *literally* unique? Well two identical keys, e.g. (5, '[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5, 'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that key as many times as you like, despite a PK/UQ constraint: postgres=# insert into t values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar'); INSERT 0 2 postgres=# select * from t; id | valid_at | name -------+----------+------ [1,2) | empty | foo [1,2) | empty | bar (2 rows) Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would never cause an empty. But we should still make sure they don't cause problems. One place we should avoid temporally-unique indexes is REPLICA IDENTITY. Fortunately we already do that, but patch 2 adds a test to keep it that way. Uniqueness is an important property to the planner, too. We consider indisunique often for estimates, where it needn't be 100% true. Even if there are nullable columns or a non-indimmediate index, it still gives useful stats. Duplicates from 'empty' shouldn't cause any new problems there. In proof code we must be more careful. Patch 3 updates relation_has_unique_index_ext and rel_supports_distinctness to disqualify WITHOUT OVERLAPS indexes. Maybe that's more cautious than needed, but better safe than sorry. This patch has no new test though. I had trouble writing SQL that was wrong before its change. I'd be happy for help here! Another problem is GROUP BY and functional dependencies. This is wrong: postgres=# create table a (id int4range, valid_at daterange, name text, constraint apk primary key (id, valid_at without overlaps)); CREATE TABLE postgres=# insert into a values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar'); INSERT 0 2 postgres=# select * from a group by id, valid_at; id | valid_at | name -------+----------+------ [1,2) | empty | foo (1 row) One fix is to return false from check_functional_grouping for WITHOUT OVERLAPS primary keys. But I think there is a better fix that is less ad hoc. We should give temporal primary keys an internal CHECK constraint saying `NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a primary key. NULLs prevent two identical keys from ever comparing as equal. And just as a regular primary key cannot contain NULLs, so a temporal primary key should not contain empties. The standard effectively prevents this with PERIODs, because a PERIOD adds a constraint saying start < end. But our ranges enforce only start <= end. If you say `int4range(4,4)` you get `empty`. If we constrain primary keys as I'm suggesting, then they are literally unique, and indisunique seems safer. Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE key. We should try to pick what gives users more options, when possible. Even if it is questionably meaningful, I can see use cases for allowing empty ranges in a temporal table. For example it lets you "disable" a row, preserving its values but marking it as never true. Also it gives you a way to make a non-temporal foreign key reference to a temporal table. Normally temporal tables are "contagious", which is annoying. But if the referencing table had 'empty' for its temporal part, then references should succeed. For example this is true: 'empty'::daterange <@ '[2000-01-01,2001-01-01)'. (Technically this would require a small change to our FK SQL, because we do `pkperiod && fkperiod` as an optimization (to use the index more fully), and we would need to skip that when fkperiod is empty.) Finally, if we have a not-empty constraint on our primary keys, then the GROUP BY problem above goes away. And we can still use temporal primary keys in proofs (but maybe not other temporally-unique indexes). We can allow them in relation_has_unique_index_ext/rel_supports_distinctness. The drawback to putting a CHECK constraint on just PKs and not UNIQUEs is that indisunique may not be literally unique for them, if they have empty ranges. But even for traditional UNIQUE constraints, indisunique can be misleading: If they have nullable parts, identical keys are still "unique", so the code is already careful about them. Do note though the problems come from 'empty' values, not nullable values, so there might still be some planner rules we need to correct. Another drawback is that by using isempty we're limiting temporal PKs to just ranges and multiranges, whereas currently any type with appropriate operators is allowed. But since we decided to limit FKs already, I think this is okay. We can open it back up again later if we like (e.g. by adding a support function for the isempty concept). I'll start working on a patch for this too, but I'd be happy for early feedback/objections/etc. I guess an alternative would be to add a new operator, say &&&, that is the same as overlaps, except 'empty' overlaps everything instead of nothing. In a way that seems more consistent with <@. (How can a range contain something if it doesn't overlap it?) I don't love that a key like (5, 'empty') would conflict with every other 5, but you as I said it's not a meaningful value in a temporal table anyway. Or you could have 'empty' overlap nothing except itself. Maybe I prefer this solution to an internal CHECK constraint, but it feels like it has more unknown unknowns. Thoughts? Also I suspect there are still places where indisunique causes problems. I'll keep looking for them, but if others have thoughts please let me know. Patches here are generated against c627d944e6. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Wed, Apr 3, 2024 at 1:30 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches > > v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch > > (together). > > Hi Hackers, > > I found some problems with temporal primary keys and the idea of uniqueness, especially around the > indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need > but I'd like some feedback on. > > The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE. > > DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a > temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion). > This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT > OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?) > hi. for unique index, primary key: ii_ExclusionOps, ii_UniqueOps is enough to distinguish this index support without overlaps, we don't need another ii_HasWithoutOverlaps? (i didn't test it though) ON CONFLICT DO NOTHING ON CONFLICT (id, valid_at) DO NOTHING ON CONFLICT ON CONSTRAINT temporal_rng_pk DO NOTHING I am confused by the test. here temporal_rng only has one primary key, ON CONFLICT only deals with it. I thought these three are the same thing? DROP TABLE temporal_rng; CREATE TABLE temporal_rng (id int4range,valid_at daterange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); +-- ON CONFLICT +-- +TRUNCATE temporal_rng; +INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2000-01-01', '2010-01-01')); +-- with a conflict +INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2005-01-01', '2006-01-01')) ON CONFLICT DO NOTHING; +-- id matches but no conflict +TRUNCATE temporal_rng; +INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2000-01-01', '2010-01-01')); +-- with a conflict +INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2005-01-01', '2006-01-01')) ON CONFLICT (id, valid_at) DO NOTHING; +ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification +TRUNCATE temporal_rng; +INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2000-01-01', '2010-01-01')); +-- with a conflict +INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2005-01-01', '2006-01-01')) ON CONFLICT ON CONSTRAINT temporal_rng_pk DO NOTHING;
On Wed, Apr 3, 2024 at 1:30 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > I found some problems with temporal primary keys and the idea of uniqueness, especially around the > indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need > but I'd like some feedback on. I think this thread should be added to the open items list. You're raising questions about whether the feature that was committed to this release is fully correct. If it isn't, we shouldn't release it without fixing it. https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items -- Robert Haas EDB: http://www.enterprisedb.com
On 4/26/24 12:25, Robert Haas wrote: > I think this thread should be added to the open items list. Thanks! I sent a request to pgsql-www to get edit permission. I didn't realize there was a wiki page tracking things like this. I agree it needs to be fixed if we want to include the feature. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Fri, Apr 26, 2024 at 3:41 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > On 4/26/24 12:25, Robert Haas wrote: > > I think this thread should be added to the open items list. > > Thanks! I sent a request to pgsql-www to get edit permission. I didn't realize there was a wiki page > tracking things like this. I agree it needs to be fixed if we want to include the feature. Great, I see that it's on the list now. Peter, could you have a look at http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com and express an opinion about whether each of those proposals are (a) good or bad ideas and (b) whether they need to be fixed for the current release? Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
On 4/30/24 09:24, Robert Haas wrote: > Peter, could you have a look at > http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com > and express an opinion about whether each of those proposals are (a) > good or bad ideas and (b) whether they need to be fixed for the > current release? Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK constraint. I don't really consider it finished though, because it has these problems: - The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though, so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the Constraint node, but since ObjectAddress isn't a Node it doesn't work. - The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe not, but that's what we do with FK triggers. - When you create partitions you get a warning about the constraint already existing, because it gets created via the PK and then also the partitioning code tries to copy it. Solving the first issue here should solve this nicely though. Alternately we could just fix the GROUP BY functional dependency code to only accept b-tree indexes. But I think the CHECK constraint approach is a better solution. Thanks, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 4/30/24 09:24, Robert Haas wrote: > > Peter, could you have a look at > > http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com > > and express an opinion about whether each of those proposals are (a) > > good or bad ideas and (b) whether they need to be fixed for the > > current release? > > Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK > constraint. I don't really consider it finished though, because it has these problems: > > - The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop > it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though, > so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the > ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the > Constraint node, but since ObjectAddress isn't a Node it doesn't work. > > - The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe not, but that's what > we do with FK triggers. > > - When you create partitions you get a warning about the constraint already existing, because it > gets created via the PK and then also the partitioning code tries to copy it. Solving the first > issue here should solve this nicely though. > > Alternately we could just fix the GROUP BY functional dependency code to only accept b-tree indexes. > But I think the CHECK constraint approach is a better solution. > I will consider these issues later. The following are general ideas after applying your patches. CREATE TABLE temporal_rng1( id int4range, valid_at daterange, CONSTRAINT temporal_rng1_pk unique (id, valid_at WITHOUT OVERLAPS) ); insert into temporal_rng1(id, valid_at) values (int4range '[1,1]', 'empty'::daterange), ('[1,1]', 'empty'); table temporal_rng1; id | valid_at -------+---------- [1,2) | empty [1,2) | empty (2 rows) i hope i didn't miss something: exclude the 'empty' special value, WITHOUT OVERLAP constraint will be unique and is more restrictive? if so, then adding a check constraint to make the WITHOUT OVERLAP not include the special value 'empty' is better than writing a doc explaining that on some special occasion, a unique constraint is not meant to be unique ? in here https://www.postgresql.org/docs/devel/ddl-constraints.html#DDL-CONSTRAINTS-UNIQUE-CONSTRAINTS says: << Unique constraints ensure that the data contained in a column, or a group of columns, is unique among all the rows in the table. << + /* + * The WITHOUT OVERLAPS part (if any) must be + * a range or multirange type. + */ + if (constraint->without_overlaps && lc == list_last_cell(constraint->keys)) + { + Oid typid = InvalidOid; + + if (!found && cxt->isalter) + { + /* + * Look up the column type on existing table. + * If we can't find it, let things fail in DefineIndex. + */ + Relation rel = cxt->rel; + for (int i = 0; i < rel->rd_att->natts; i++) + { + Form_pg_attribute attr = TupleDescAttr(rel->rd_att, i); + const char *attname; + + if (attr->attisdropped) + break; + + attname = NameStr(attr->attname); + if (strcmp(attname, key) == 0) + { + typid = attr->atttypid; + break; + } + } + } + else + typid = typenameTypeId(NULL, column->typeName); + + if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("column \"%s\" in WITHOUT OVERLAPS is not a range or multirange type", key), + parser_errposition(cxt->pstate, constraint->location))); + } + if (attr->attisdropped) + break; it will break the loop? but here you want to continue the loop? + if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid)) didn't consider the case where typid is InvalidOid, maybe we can simplify to + if (!type_is_range(typid) && !type_is_multirange(typid)) + notnullcmds = lappend(notnullcmds, notemptycmd); seems weird. we can imitate notnullcmds related logic for notemptycmd, not associated notnullcmds in any way.
On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 4/30/24 09:24, Robert Haas wrote: > > Peter, could you have a look at > > http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com > > and express an opinion about whether each of those proposals are (a) > > good or bad ideas and (b) whether they need to be fixed for the > > current release? > > Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK > constraint. I don't really consider it finished though, because it has these problems: > > - The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop > it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though, > so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the > ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the > Constraint node, but since ObjectAddress isn't a Node it doesn't work. > hi. I hope I understand the problem correctly. my understanding is that we are trying to solve a corner case: create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS)); insert into t values ('[1,2]','empty'), ('[1,2]','empty'); I think the entry point is ATAddCheckNNConstraint and index_create. in a chain of DDL commands, you cannot be sure which one (primary key constraint or check constraint) is being created first, you just want to make sure that after both constraints are created, then add a dependency between primary key and check constraint. so you need to validate at different functions (ATAddCheckNNConstraint, index_create) that these two constraints are indeed created, only after that we have a dependency linking these two constraints. I've attached a patch trying to solve this problem. the patch is not totally polished, but works as expected, and also has lots of comments.
Attachment
On 30.04.24 18:39, Paul Jungwirth wrote: > On 4/30/24 09:24, Robert Haas wrote: >> Peter, could you have a look at >> http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com >> and express an opinion about whether each of those proposals are (a) >> good or bad ideas and (b) whether they need to be fixed for the >> current release? > > Here are the same patches but rebased. I have committed v2-0002-Add-test-for-REPLICA-IDENTITY-with-a-temporal-key.patch. About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think the ideas are right, but I wonder if we can fine-tune the new conditionals a bit. --- a/src/backend/executor/execIndexing.c +++ b/src/backend/executor/execIndexing.c @@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative) * If the indexes are to be used for speculative insertion, add extra * information required by unique index entries. */ - if (speculative && ii->ii_Unique) + if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps) BuildSpeculativeIndexInfo(indexDesc, ii); Here, I think we could check !indexDesc->rd_index->indisexclusion instead. So we wouldn't need ii_HasWithoutOverlaps. Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the rest if an exclusion constraint is passed, on the theory that all the speculative index info is already present in that case. --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root) */ if (indexOidFromConstraint == idxForm->indexrelid) { - if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE) + if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == ONCONFLICT_UPDATE) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints"))); Shouldn't this use only idxForm->indisexclusion anyway? Like + if (idxForm->indisexclusion && onconflict->action == ONCONFLICT_UPDATE) That matches what the error message is reporting afterwards. * constraints), so index under consideration can be immediately * skipped if it's not unique */ - if (!idxForm->indisunique) + if (!idxForm->indisunique || idxForm->indisexclusion) goto next; Maybe here we need a comment. Or make that a separate statement, like /* not supported yet etc. */ if (idxForm->indixexclusion) next;
Here are a couple new patches, rebased to e305f715, addressing Peter's feedback. I'm still working on integrating jian he's suggestions for the last patch, so I've omitted that one here. On 5/8/24 06:51, Peter Eisentraut wrote: > About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think the > ideas are right, but I wonder if we can fine-tune the new conditionals a bit. > > --- a/src/backend/executor/execIndexing.c > +++ b/src/backend/executor/execIndexing.c > @@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative) > * If the indexes are to be used for speculative insertion, add extra > * information required by unique index entries. > */ > - if (speculative && ii->ii_Unique) > + if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps) > BuildSpeculativeIndexInfo(indexDesc, ii); > > Here, I think we could check !indexDesc->rd_index->indisexclusion instead. So we > wouldn't need ii_HasWithoutOverlaps. Okay. > Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the rest > if an exclusion constraint is passed, on the theory that all the speculative index > info is already present in that case. I like how BuildSpeculativeIndexInfo starts with an Assert that it's given a unique index, so I've left the check outside the function. This seems cleaner anyway: the function stays more focused. > --- a/src/backend/optimizer/util/plancat.c > +++ b/src/backend/optimizer/util/plancat.c > @@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root) > */ > if (indexOidFromConstraint == idxForm->indexrelid) > { > - if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE) > + if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == > ONCONFLICT_UPDATE) > ereport(ERROR, > (errcode(ERRCODE_WRONG_OBJECT_TYPE), > errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints"))); > > Shouldn't this use only idxForm->indisexclusion anyway? Like > > + if (idxForm->indisexclusion && onconflict->action == ONCONFLICT_UPDATE) > > That matches what the error message is reporting afterwards. Agreed. > * constraints), so index under consideration can be immediately > * skipped if it's not unique > */ > - if (!idxForm->indisunique) > + if (!idxForm->indisunique || idxForm->indisexclusion) > goto next; > > Maybe here we need a comment. Or make that a separate statement, like Yes, that is nice. Done. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
Hi, I haven't really been following this thread, but after playing around a bit with the feature I feel there are new gaps in error messages. I also think there are gaps in the functionality regarding the (lack of) support for CREATE UNIQUE INDEX, and attaching these indexes to constraints. pg=# CREATE TABLE temporal_testing ( pg(# id bigint NOT NULL pg(# generated always as identity, pg(# valid_during tstzrange pg(# ); CREATE TABLE pg=# ALTER TABLE temporal_testing pg-# ADD CONSTRAINT temp_unique UNIQUE (id, valid_during WITHOUT OVERLAPS); ALTER TABLE pg=# \d+ temp_unique Index "public.temp_unique" Column | Type | Key? | Definition | Storage | Stats target --------------+-------------+------+--------------+----------+-------------- id | gbtreekey16 | yes | id | plain | valid_during | tstzrange | yes | valid_during | extended | unique, gist, for table "public.temporal_testing" -- ^^ note the "unique, gist" pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during); ERROR: access method "gist" does not support unique indexes Here we obviously have a unique GIST index in the catalogs, but they're "not supported" by GIST when we try to create such index ourselves (!). Either the error message needs updating, or we need to have a facility to actually support creating these unique indexes outside constraints. Additionally, because I can't create my own non-constraint-backing unique GIST indexes, I can't pre-create my unique constraints CONCURRENTLY as one could do for the non-temporal case: UNIQUE constraints hold ownership of the index and would drop the index if the constraint is dropped, too, and don't support a CONCURRENTLY modifier, nor an INVALID modifier. This means temporal unique constraints have much less administrative wiggle room than normal unique constraints, and I think that's not great. Kind regards, Matthias van de Meent.
I have committed the v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch from this (confusingly, there was also a v2 earlier in this thread), and I'll continue working on the remaining items. On 09.05.24 06:24, Paul Jungwirth wrote: > Here are a couple new patches, rebased to e305f715, addressing Peter's > feedback. I'm still working on integrating jian he's suggestions for the > last patch, so I've omitted that one here. > > On 5/8/24 06:51, Peter Eisentraut wrote: >> About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, >> I think the >> ideas are right, but I wonder if we can fine-tune the new conditionals >> a bit. >> >> --- a/src/backend/executor/execIndexing.c >> +++ b/src/backend/executor/execIndexing.c >> @@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool >> speculative) >> * If the indexes are to be used for speculative >> insertion, add extra >> * information required by unique index entries. >> */ >> - if (speculative && ii->ii_Unique) >> + if (speculative && ii->ii_Unique && >> !ii->ii_HasWithoutOverlaps) >> BuildSpeculativeIndexInfo(indexDesc, ii); >> >> Here, I think we could check !indexDesc->rd_index->indisexclusion >> instead. So we >> wouldn't need ii_HasWithoutOverlaps. > > Okay. > >> Or we could push this into BuildSpeculativeIndexInfo(); it could just >> skip the rest >> if an exclusion constraint is passed, on the theory that all the >> speculative index >> info is already present in that case. > > I like how BuildSpeculativeIndexInfo starts with an Assert that it's > given a unique index, so I've left the check outside the function. This > seems cleaner anyway: the function stays more focused. > >> --- a/src/backend/optimizer/util/plancat.c >> +++ b/src/backend/optimizer/util/plancat.c >> @@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root) >> */ >> if (indexOidFromConstraint == idxForm->indexrelid) >> { >> - if (!idxForm->indisunique && onconflict->action == >> ONCONFLICT_UPDATE) >> + if ((!idxForm->indisunique || idxForm->indisexclusion) && >> onconflict->action == ONCONFLICT_UPDATE) >> ereport(ERROR, >> (errcode(ERRCODE_WRONG_OBJECT_TYPE), >> errmsg("ON CONFLICT DO UPDATE not supported >> with exclusion constraints"))); >> >> Shouldn't this use only idxForm->indisexclusion anyway? Like >> >> + if (idxForm->indisexclusion && onconflict->action == >> ONCONFLICT_UPDATE) >> >> That matches what the error message is reporting afterwards. > > Agreed. > >> * constraints), so index under consideration can be immediately >> * skipped if it's not unique >> */ >> - if (!idxForm->indisunique) >> + if (!idxForm->indisunique || idxForm->indisexclusion) >> goto next; >> >> Maybe here we need a comment. Or make that a separate statement, like > > Yes, that is nice. Done. > > Yours, >
On Mon, May 6, 2024 at 11:01 AM jian he <jian.universality@gmail.com> wrote: > > On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: > > > > On 4/30/24 09:24, Robert Haas wrote: > > > Peter, could you have a look at > > > http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com > > > and express an opinion about whether each of those proposals are (a) > > > good or bad ideas and (b) whether they need to be fixed for the > > > current release? > > > > Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK > > constraint. I don't really consider it finished though, because it has these problems: > > > > - The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop > > it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though, > > so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the > > ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the > > Constraint node, but since ObjectAddress isn't a Node it doesn't work. > > > > hi. > I hope I understand the problem correctly. > my understanding is that we are trying to solve a corner case: > create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS)); > insert into t values ('[1,2]','empty'), ('[1,2]','empty'); > but we still not yet address for cases like: create table t10(a int4range, b int4range, unique (a, b WITHOUT OVERLAPS)); insert into t10 values ('[1,2]','empty'), ('[1,2]','empty'); one table can have more than one temporal unique constraint, for each temporal unique constraint adding a check isempty constraint seems not easy. for example: CREATE TABLE t ( id int4range, valid_at daterange, parent_id int4range, CONSTRAINT t1 unique (id, valid_at WITHOUT OVERLAPS), CONSTRAINT t2 unique (parent_id, valid_at WITHOUT OVERLAPS), CONSTRAINT t3 unique (valid_at, id WITHOUT OVERLAPS), CONSTRAINT t4 unique (parent_id, id WITHOUT OVERLAPS), CONSTRAINT t5 unique (id, parent_id WITHOUT OVERLAPS), CONSTRAINT t6 unique (valid_at, parent_id WITHOUT OVERLAPS) ); add 6 check isempty constraints for table "t" is challenging. so far, I see the challenging part: * alter table alter column data type does not drop previous check isempty constraint, and will also add a check isempty constraint, so overall it will add more check constraints. * adding more check constraints needs a way to resolve naming collisions. Maybe we can just mention that the special 'empty' range value makes temporal unique constraints not "unique". also we can make sure that FOREIGN KEY can only reference primary keys, not unique temporal constraints. so the unique temporal constraints not "unique" implication is limited. I played around with it, we can error out these cases in the function transformFkeyCheckAttrs.
On 5/11/24 17:00, jian he wrote: >> I hope I understand the problem correctly. >> my understanding is that we are trying to solve a corner case: >> create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS)); >> insert into t values ('[1,2]','empty'), ('[1,2]','empty'); >> > > > but we still not yet address for cases like: > create table t10(a int4range, b int4range, unique (a, b WITHOUT OVERLAPS)); > insert into t10 values ('[1,2]','empty'), ('[1,2]','empty'); > > one table can have more than one temporal unique constraint, > for each temporal unique constraint adding a check isempty constraint > seems not easy. I think we should add the not-empty constraint only for PRIMARY KEYs, not all UNIQUE constraints. The empty edge case is very similar to the NULL edge case, and while every PK column must be non-null, we do allow nulls in ordinary UNIQUE constraints. If users want to have 'empty' in those constraints, I think we should let them. And then the problems you give don't arise. > Maybe we can just mention that the special 'empty' range value makes > temporal unique constraints not "unique". Just documenting the behavior is also an okay solution here I think. I see two downsides though: (1) it makes rangetype temporal keys differ from PERIOD temporal keys (2) it could allow more planner/etc bugs than we have thought of. So I think it's worth adding the constraint instead. > also we can make sure that > FOREIGN KEY can only reference primary keys, not unique temporal constraints. > so the unique temporal constraints not "unique" implication is limited. > I played around with it, we can error out these cases in the function > transformFkeyCheckAttrs. I don't think it is a problem to reference a temporal UNIQUE constraint, even if it contains empty values. An empty value means you're not asserting that row at any time (though another row might assert the same thing for some time), so it could never contribute toward fulfilling a reference anyway. I do think it would be nice if the *reference* could contain empty values. Right now the FK SQL will cause that to never match, because we use `&&` as an optimization, but we could tweak the SQL (maybe for v18 instead) so that users could get away with that kind of thing. As I said in an earlier email, this would be you an escape hatch to reference a temporal table from a non-temporal table. Otherwise temporal tables are "contagious," which is a bit of a drawback. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 5/9/24 17:44, Matthias van de Meent wrote: > I haven't really been following this thread, but after playing around > a bit with the feature I feel there are new gaps in error messages. I > also think there are gaps in the functionality regarding the (lack of) > support for CREATE UNIQUE INDEX, and attaching these indexes to > constraints Thank you for trying this out and sharing your thoughts! I think these are good points about CREATE UNIQUE INDEX and then creating the constraint by handing it an existing index. This is something that I am hoping to add, but it's not covered by the SQL:2011 standard, so I think it needs some discussion, and I don't think it needs to go into v17. For instance you are saying: > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during); > ERROR: access method "gist" does not support unique indexes To me that error message seems correct. The programmer hasn't said anything about the special temporal behavior they are looking for. To get non-overlapping semantics from an index, this more explicit syntax seems better, similar to PKs in the standard: > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during WITHOUT OVERLAPS); > ERROR: access method "gist" does not support unique indexes We could also support *non-temporal* unique GiST indexes, particularly now that we have the stratnum support function. Those would use the syntax you gave, omitting WITHOUT OVERLAPS. But that seems like a separate effort to me. > Additionally, because I can't create my own non-constraint-backing > unique GIST indexes, I can't pre-create my unique constraints > CONCURRENTLY as one could do for the non-temporal case: UNIQUE > constraints hold ownership of the index and would drop the index if > the constraint is dropped, too, and don't support a CONCURRENTLY > modifier, nor an INVALID modifier. This means temporal unique > constraints have much less administrative wiggle room than normal > unique constraints, and I think that's not great. This is a great use-case for why we should support this eventually, even if it uses non-standard syntax. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Sun, 12 May 2024 at 05:26, Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > On 5/9/24 17:44, Matthias van de Meent wrote: > > I haven't really been following this thread, but after playing around > > a bit with the feature I feel there are new gaps in error messages. I > > also think there are gaps in the functionality regarding the (lack of) > > support for CREATE UNIQUE INDEX, and attaching these indexes to > > constraints > Thank you for trying this out and sharing your thoughts! I think these are good points about CREATE > UNIQUE INDEX and then creating the constraint by handing it an existing index. This is something > that I am hoping to add, but it's not covered by the SQL:2011 standard, so I think it needs some > discussion, and I don't think it needs to go into v17. Okay. > For instance you are saying: > > > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during); > > ERROR: access method "gist" does not support unique indexes > > To me that error message seems correct. The programmer hasn't said anything about the special > temporal behavior they are looking for. But I showed that I had a GIST index that does have the indisunique flag set, which shows that GIST does support indexes with unique semantics. That I can't use CREATE UNIQUE INDEX to create such an index doesn't mean the feature doesn't exist, which is what the error message implies. > To get non-overlapping semantics from an index, this more > explicit syntax seems better, similar to PKs in the standard: Yes, agreed on that part. > > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during WITHOUT OVERLAPS); > > ERROR: access method "gist" does not support unique indexes > > We could also support *non-temporal* unique GiST indexes, particularly now that we have the stratnum > support function. Those would use the syntax you gave, omitting WITHOUT OVERLAPS. But that seems > like a separate effort to me. No objection on that. Kind regards, Matthias van de Meent
On 5/12/24 05:55, Matthias van de Meent wrote: >> > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during); >> > ERROR: access method "gist" does not support unique indexes >> >> To me that error message seems correct. The programmer hasn't said anything about the special >> temporal behavior they are looking for. > > But I showed that I had a GIST index that does have the indisunique > flag set, which shows that GIST does support indexes with unique > semantics. > > That I can't use CREATE UNIQUE INDEX to create such an index doesn't > mean the feature doesn't exist, which is what the error message > implies. True, the error message is not really telling the truth anymore. I do think most people who hit this error are not thinking about temporal constraints at all though, and for non-temporal constraints it is still true. It's also true for CREATE INDEX, since WITHOUT OVERLAPS is only available on the *constraint*. So how about adding a hint, something like this?: ERROR: access method "gist" does not support unique indexes HINT: To create a unique constraint with non-overlap behavior, use ADD CONSTRAINT ... WITHOUT OVERLAPS. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 5/5/24 20:01, jian he wrote: > hi. > I hope I understand the problem correctly. > my understanding is that we are trying to solve a corner case: > create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS)); > insert into t values ('[1,2]','empty'), ('[1,2]','empty'); > > > I think the entry point is ATAddCheckNNConstraint and index_create. > in a chain of DDL commands, you cannot be sure which one > (primary key constraint or check constraint) is being created first, > you just want to make sure that after both constraints are created, > then add a dependency between primary key and check constraint. > > so you need to validate at different functions > (ATAddCheckNNConstraint, index_create) > that these two constraints are indeed created, > only after that we have a dependency linking these two constraints. > > > I've attached a patch trying to solve this problem. > the patch is not totally polished, but works as expected, and also has > lots of comments. Thanks for this! I've incorporated it into the CHECK constraint patch with some changes. In particular I thought index_create was a strange place to change the conperiod value of a pg_constraint record, and it is not actually needed if we are copying that value correctly. Some other comments on the patch file: > N.B. we also need to have special care for case > where check constraint was readded, e.g. ALTER TYPE. > if ALTER TYPE is altering the PERIOD column of the primary key, > alter column of primary key makes the index recreate, check constraint recreate, > however, former interally also including add a check constraint. > so we need to take care of merging two check constraint. This is a good point. I've included tests for this based on your patch. > N.B. the check constraint name is hard-wired, so if you create the constraint > with the same name, PERIOD primary key cannot be created. Yes, it may be worth doing something like other auto-named constraints and trying to avoid duplicates. I haven't taken that on yet; I'm curious what others have to say about it. > N.B. what about UNIQUE constraint? See my previous posts on this thread about allowing 'empty' in UNIQUE constraints. > N.B. seems ok to not care about FOREIGN KEY regarding this corner case? Agreed. v3 patches attached, rebased to 3ca43dbbb6. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 5/12/24 08:51, Paul Jungwirth wrote: > On 5/12/24 05:55, Matthias van de Meent wrote: >>> > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during); >>> > ERROR: access method "gist" does not support unique indexes >>> >>> To me that error message seems correct. The programmer hasn't said anything about the special >>> temporal behavior they are looking for. >> >> But I showed that I had a GIST index that does have the indisunique >> flag set, which shows that GIST does support indexes with unique >> semantics. >> >> That I can't use CREATE UNIQUE INDEX to create such an index doesn't >> mean the feature doesn't exist, which is what the error message >> implies. > > True, the error message is not really telling the truth anymore. I do think most people who hit this > error are not thinking about temporal constraints at all though, and for non-temporal constraints it > is still true. It's also true for CREATE INDEX, since WITHOUT OVERLAPS is only available on the > *constraint*. So how about adding a hint, something like this?: > > ERROR: access method "gist" does not support unique indexes > HINT: To create a unique constraint with non-overlap behavior, use ADD CONSTRAINT ... WITHOUT OVERLAPS. I thought a little more about eventually implementing WITHOUT OVERLAPS support for CREATE INDEX, and how it relates to this error message in particular. Even when that is done, it will still depend on the stratnum support function for the keys' opclasses, so the GiST AM itself will still have false amcanunique, I believe. Probably the existing error message is still the right one. The hint won't need to mention ADD CONSTRAINT anymore. It should still point users to WITHOUT OVERLAPS, and possibly the stratnum support function too. I think what we are doing for v17 is all compatible with that plan. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 03.04.24 07:30, Paul Jungwirth wrote: > But is it *literally* unique? Well two identical keys, e.g. (5, > '[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, > so the second is excluded. Normally a temporal unique index is *more* > restrictive than a standard one, since it forbids other values too (e.g. > (5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in > these keys do not overlap: (5, 'empty'), (5, 'empty'). With > ranges/multiranges, `'empty' && x` is false for all x. You can add that > key as many times as you like, despite a PK/UQ constraint: > > postgres=# insert into t values > ('[1,2)', 'empty', 'foo'), > ('[1,2)', 'empty', 'bar'); > INSERT 0 2 > postgres=# select * from t; > id | valid_at | name > -------+----------+------ > [1,2) | empty | foo > [1,2) | empty | bar > (2 rows) > > Cases like this shouldn't actually happen for temporal tables, since > empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would > never cause an empty. But we should still make sure they don't cause > problems. > We should give temporal primary keys an internal CHECK constraint saying > `NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a > primary key. NULLs prevent two identical keys from ever comparing as > equal. And just as a regular primary key cannot contain NULLs, so a > temporal primary key should not contain empties. > > The standard effectively prevents this with PERIODs, because a PERIOD > adds a constraint saying start < end. But our ranges enforce only start > <= end. If you say `int4range(4,4)` you get `empty`. If we constrain > primary keys as I'm suggesting, then they are literally unique, and > indisunique seems safer. > > Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm > inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE > key. We should try to pick what gives users more options, when possible. > Even if it is questionably meaningful, I can see use cases for allowing > empty ranges in a temporal table. For example it lets you "disable" a > row, preserving its values but marking it as never true. It looks like we missed some of these fundamental design questions early on, and it might be too late now to fix them for PG17. For example, the discussion on unique constraints misses that the question of null values in unique constraints itself is controversial and that there is now a way to change the behavior. So I imagine there is also a selection of possible behaviors you might want for empty ranges. Intuitively, I don't think empty ranges are sensible for temporal unique constraints. But anyway, it's a bit late now to be discussing this. I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems. Some of these issues might be design flaws in the underlying mechanisms, like range types and exclusion constraints. Like, if you're supposed to use this for scheduling but you can use empty ranges to bypass exclusion constraints, how is one supposed to use this? Yes, a check constraint using isempty() might be the right answer. But I don't see this documented anywhere. On the technical side, adding an implicit check constraint as part of a primary key constraint is quite a difficult implementation task, as I think you are discovering. I'm just reminded about how the patch for catalogued not-null constraints struggled with linking these not-null constraints to primary keys correctly. This sounds a bit similar. I'm afraid that these issues cannot be resolved in good time for this release, so we should revert this patch set for now.
On 5/13/24 03:11, Peter Eisentraut wrote: > It looks like we missed some of these fundamental design questions early on, and it might be too > late now to fix them for PG17. > > For example, the discussion on unique constraints misses that the question of null values in unique > constraints itself is controversial and that there is now a way to change the behavior. So I > imagine there is also a selection of possible behaviors you might want for empty ranges. > Intuitively, I don't think empty ranges are sensible for temporal unique constraints. But anyway, > it's a bit late now to be discussing this. > > I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan > to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems. > > Some of these issues might be design flaws in the underlying mechanisms, like range types and > exclusion constraints. Like, if you're supposed to use this for scheduling but you can use empty > ranges to bypass exclusion constraints, how is one supposed to use this? Yes, a check constraint > using isempty() might be the right answer. But I don't see this documented anywhere. > > On the technical side, adding an implicit check constraint as part of a primary key constraint is > quite a difficult implementation task, as I think you are discovering. I'm just reminded about how > the patch for catalogued not-null constraints struggled with linking these not-null constraints to > primary keys correctly. This sounds a bit similar. > > I'm afraid that these issues cannot be resolved in good time for this release, so we should revert > this patch set for now. I think reverting is a good idea. I'm not really happy with the CHECK constraint solution either. I'd be happy to have some more time to rework this for v18. A couple alternatives I'd like to explore: 1. Domain constraints instead of a CHECK constraint. I think this is probably worse, and I don't plan to spend much time on it, but I thought I'd mention it in case someone else thought otherwise. 2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' is true. But 'empty' with anything else could still be false (or not). That operator would prevent duplicates in an exclusion constraint. This also means we could support more types than just ranges & multiranges. I need to think about whether this combines badly with existing operators, but if not it has a lot of promise. If anything it might be *less* contradictory, because it fits better with 'empty' @> 'empty', which we say is true. Another thing a revert would give me some time to consider: even though it's not standard syntax, I wonder if we want to require syntax something like `PRIMARY KEY USING gist (id, valid_at WITHOUT OVERLAPS)`. Everywhere else we default to btree, so defaulting to gist feels a little weird. In theory we could even someday support WITHOUT OVERLAPS with btree, if we taught that AM to answer that question. (I admit there is probably not a lot of desire for that though.) Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Tue, May 14, 2024 at 7:30 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 5/13/24 03:11, Peter Eisentraut wrote: > > It looks like we missed some of these fundamental design questions early on, and it might be too > > late now to fix them for PG17. > > > > For example, the discussion on unique constraints misses that the question of null values in unique > > constraints itself is controversial and that there is now a way to change the behavior. So I > > imagine there is also a selection of possible behaviors you might want for empty ranges. > > Intuitively, I don't think empty ranges are sensible for temporal unique constraints. But anyway, > > it's a bit late now to be discussing this. > > > > I'm also concerned that if ranges have this fundamental incompatibility with periods, then the plan > > to eventually evolve this patch set to support standard periods will also have as-yet-unknown problems. > > > > Some of these issues might be design flaws in the underlying mechanisms, like range types and > > exclusion constraints. Like, if you're supposed to use this for scheduling but you can use empty > > ranges to bypass exclusion constraints, how is one supposed to use this? Yes, a check constraint > > using isempty() might be the right answer. But I don't see this documented anywhere. > > > > On the technical side, adding an implicit check constraint as part of a primary key constraint is > > quite a difficult implementation task, as I think you are discovering. I'm just reminded about how > > the patch for catalogued not-null constraints struggled with linking these not-null constraints to > > primary keys correctly. This sounds a bit similar. > > > > I'm afraid that these issues cannot be resolved in good time for this release, so we should revert > > this patch set for now. > > I think reverting is a good idea. I'm not really happy with the CHECK constraint solution either. > I'd be happy to have some more time to rework this for v18. > > A couple alternatives I'd like to explore: > > 1. Domain constraints instead of a CHECK constraint. I think this is probably worse, and I don't > plan to spend much time on it, but I thought I'd mention it in case someone else thought otherwise. > > 2. A slightly different overlaps operator, say &&&, where 'empty' &&& 'empty' is true. But 'empty' > with anything else could still be false (or not). That operator would prevent duplicates in an > exclusion constraint. This also means we could support more types than just ranges & multiranges. I > need to think about whether this combines badly with existing operators, but if not it has a lot of > promise. If anything it might be *less* contradictory, because it fits better with 'empty' @> > 'empty', which we say is true. > thanks for the idea, I roughly played around with it, seems doable. but the timing seems not good, reverting is a good idea. I also checked the commit. 6db4598fcb82a87a683c4572707e522504830a2b + +/* + * Returns the btree number for equals, otherwise invalid. + */ +Datum +gist_stratnum_btree(PG_FUNCTION_ARGS) +{ + StrategyNumber strat = PG_GETARG_UINT16(0); + + switch (strat) + { + case RTEqualStrategyNumber: + PG_RETURN_UINT16(BTEqualStrategyNumber); + case RTLessStrategyNumber: + PG_RETURN_UINT16(BTLessStrategyNumber); + case RTLessEqualStrategyNumber: + PG_RETURN_UINT16(BTLessEqualStrategyNumber); + case RTGreaterStrategyNumber: + PG_RETURN_UINT16(BTGreaterStrategyNumber); + case RTGreaterEqualStrategyNumber: + PG_RETURN_UINT16(BTGreaterEqualStrategyNumber); + default: + PG_RETURN_UINT16(InvalidStrategy); + } +} the comments seem not right?
On Tue, May 14, 2024 at 01:33:46PM +0800, jian he wrote: > thanks for the idea, I roughly played around with it, seems doable. > but the timing seems not good, reverting is a good idea. Please note that this is still an open item, and that time is running short until beta1. A revert seems to be the consensus reached, so, Peter, are you planning to do so? -- Michael
Attachment
On 15.05.24 08:13, Michael Paquier wrote: > On Tue, May 14, 2024 at 01:33:46PM +0800, jian he wrote: >> thanks for the idea, I roughly played around with it, seems doable. >> but the timing seems not good, reverting is a good idea. > > Please note that this is still an open item, and that time is running > short until beta1. A revert seems to be the consensus reached, so, > Peter, are you planning to do so? I'm on it. Here is the list of patches I have identified to revert: git show --oneline --no-patch 144c2ce0cc7 c3db1f30cba 482e108cd38 34768ee3616 5577a71fb0c a88c800deb6 030e10ff1a3 86232a49a43 46a0cd4cefb 6db4598fcb8 144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE 482e108cd38 Add test for REPLICA IDENTITY with a temporal key 34768ee3616 Add temporal FOREIGN KEY contraints 5577a71fb0c Use half-open interval notation in without_overlaps tests a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange. 030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod 86232a49a43 Fix comment on gist_stratnum_btree 46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints 6db4598fcb8 Add stratnum GiST support function Attached are the individual revert patches. I'm supplying these here mainly so that future efforts can use those instead of the original patches, since that would have to redo all the conflict resolution and also miss various typo fixes etc. that were applied in the meantime. I will commit this as one squashed patch.
Attachment
- 0001-Revert-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-tempora.patch
- 0002-Revert-doc-clarify-PERIOD-and-WITHOUT-OVERLAPS-in-CR.patch
- 0003-Revert-Add-test-for-REPLICA-IDENTITY-with-a-temporal.patch
- 0004-Revert-Add-temporal-FOREIGN-KEY-contraints.patch
- 0005-Revert-Use-half-open-interval-notation-in-without_ov.patch
- 0006-Revert-Use-daterange-and-YMD-in-without_overlaps-tes.patch
- 0007-Revert-Rename-pg_constraint.conwithoutoverlaps-to-co.patch
- 0008-Revert-Fix-comment-on-gist_stratnum_btree.patch
- 0009-Revert-Add-temporal-PRIMARY-KEY-and-UNIQUE-constrain.patch
- 0010-Revert-Add-stratnum-GiST-support-function.patch
On 15.05.24 11:39, Peter Eisentraut wrote: > Attached are the individual revert patches. I'm supplying these here > mainly so that future efforts can use those instead of the original > patches, since that would have to redo all the conflict resolution and > also miss various typo fixes etc. that were applied in the meantime. I > will commit this as one squashed patch. This has been done.
On Mon, 2024-05-13 at 12:11 +0200, Peter Eisentraut wrote: > Some of these issues might be design flaws in the underlying > mechanisms, > like range types and exclusion constraints. Like, if you're supposed > to > use this for scheduling but you can use empty ranges to bypass > exclusion > constraints, how is one supposed to use this? An empty range does not "bypass" the an exclusion constraint. The exclusion constraint has a documented meaning and it's enforced. Of course there are situations where an empty range doesn't make a lot of sense. For many domains zero doesn't make any sense, either. Consider receiving an email saying "thank you for purchasing 0 widgets!". Check constraints seem like a reasonable way to prevent those kinds of problems. Regards, Jeff Davis
On Thu, May 16, 2024 at 7:22 PM Jeff Davis <pgsql@j-davis.com> wrote: > An empty range does not "bypass" the an exclusion constraint. The > exclusion constraint has a documented meaning and it's enforced. > > Of course there are situations where an empty range doesn't make a lot > of sense. For many domains zero doesn't make any sense, either. > Consider receiving an email saying "thank you for purchasing 0 > widgets!". Check constraints seem like a reasonable way to prevent > those kinds of problems. I think that's true. Having infinitely many events zero-length events scheduled at the same point in time isn't necessarily a problem: I can attend an infinite number of simultaneous meetings if I only need to attend them for exactly zero time. What I think is less clear is what that means for temporal primary keys. As Paul pointed out upthread, in every other case, a temporal primary key is at least as unique as a regular primary key, but in this case, it isn't. And someone might reasonably think that a temporal primary key should exclude empty ranges just as all primary keys exclude nulls. Or they might think the opposite. At least, so it seems to me. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas@gmail.com> wrote:
What I think is less clear is what that means for temporal primary
keys. As Paul pointed out upthread, in every other case, a temporal
primary key is at least as unique as a regular primary key, but in
this case, it isn't. And someone might reasonably think that a
temporal primary key should exclude empty ranges just as all primary
keys exclude nulls. Or they might think the opposite.
Fascinating. I think you're absolutely right that it's clear that two empty intervals don't conflict. If somebody wants to claim two intervals conflict, they need to point to at least one instant in time that is common between them.
But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items are identified by a time range, non-overlapping or not, then the empty range can only identify one item (per value of whatever other columns are in the primary key). I think for a unique key the non-overlapping restriction has to be considered an additional restriction on top of the usual uniqueness restriction.
I suspect in many applications there will be a non-empty constraint; for example, it seems quite reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed they should behave in the mathematically appropriate way.
On Tue, 2024-05-21 at 13:57 -0400, Robert Haas wrote: > What I think is less clear is what that means for temporal primary > keys. Right. My message was specifically a response to the concern that there was some kind of design flaw in the range types or exclusion constraints mechanisms. I don't believe that empty ranges represent a design flaw. If they don't make sense for temporal constraints, then temporal constraints should forbid them. Regards, Jeff Davis
On 5/21/24 11:27, Isaac Morland wrote: > On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote: > > What I think is less clear is what that means for temporal primary > keys. As Paul pointed out upthread, in every other case, a temporal > primary key is at least as unique as a regular primary key, but in > this case, it isn't. And someone might reasonably think that a > temporal primary key should exclude empty ranges just as all primary > keys exclude nulls. Or they might think the opposite. > > > Fascinating. I think you're absolutely right that it's clear that two empty intervals don't > conflict. If somebody wants to claim two intervals conflict, they need to point to at least one > instant in time that is common between them. > > But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items > are identified by a time range, non-overlapping or not, then the empty range can only identify one > item (per value of whatever other columns are in the primary key). I think for a unique key the > non-overlapping restriction has to be considered an additional restriction on top of the usual > uniqueness restriction. > > I suspect in many applications there will be a non-empty constraint; for example, it seems quite > reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed > they should behave in the mathematically appropriate way. Finding a way forward for temporal PKs got a lot of discussion at pgconf.dev (thanks especially to Peter Eisentraut and Jeff Davis!), so I wanted to summarize some options and describe what I think is the best approach. First the problem: empty ranges! A temporal PK/UNIQUE constraint is basically an exclusion constraint that is `(id WITH =, valid_at WITH &&)`. But the special 'empty' value never overlaps anything, *including itself*. (Note it has no "position": [3,3) is the same as [4,4).) Since the exclusion constraint forbids overlapping ranges, and empties never overlap, your table can have duplicates. (I'm talking about "literal uniqueness" as discussed in [1].) For instance: CREATE EXTENSION btree_gist; CREATE TABLE t (id int, valid_at daterange, name text); ALTER TABLE t ADD CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); INSERT INTO t VALUES (1, 'empty', 'foo'); INSERT INTO t VALUES (1, 'empty', 'bar'); Multiranges have the same problem. So what do we do about that? **Option 0**: Allow it but document it. It shouldn't happen in practice: there is no reason for an empty range to get into a temporal table, and it arguably doesn't mean anything. The record is true at no time? But of course it will happen anyway. It's a footgun and will break expectations for at least some. It causes problems for us too. If you say `SELECT name FROM t GROUP BY id, valid_at`, we recognize that `name` is a functional dependency on the PK, so we allow it and give you the first row matching each key. You might get "foo" or you might get "bar". Also the planner uses not-nullable uniqueness to take many shortcuts. I couldn't create any concrete breakage there, but I bet someone else could. PKs that are not literally unique seems like something that would cause headaches for years. **Option 1**: Temporal PKs should automatically create a CHECK constraint that forbids empty ranges. Should UNIQUE constraints too? I'm tempted to say no, since sometimes users surprise us by coming up with new ways to use things. For instance one way to use empty ranges is to reference a temporal table from a non-temporal table, since `'empty' <@ anything` is always true (though this has questionable meaning or practical use). But probably we should forbid empties for UNIQUE constraints too. Forbidding them is more aligned with the SQL standard, which says that when you have a PERIOD, startcol < endcol (not <=). And it feels more consistent to treat both constraints the same way. Finally, if UNIQUEs do allow empties, we still risk confusing our planner. My last patch created these CHECK constraints for PKs (but not UNIQUEs) as INTERNAL dependencies. It's pretty clunky. There are lots of cases to handle, e.g. `ALTER COLUMN c TYPE` may reuse the PK index or may generate a new one. And what if the user already created the same constraint? Seeing all the trouble giving PKs automatic (cataloged) NOT NULL constraints makes me wary about this approach. It's not as bad, since there is no legacy, but it's still more annoying than I expected. Finally, hanging the CHECK constraint off the PK sets us up for problems when we add true PERIODs. Under 11.27 of SQL/Foundation, General Rules 2b says that defining a PERIOD should automatically add a CHECK constraint that startcol < endcol. That is already part of my last patch in this series. But that would be redundant with the constraint from the PK. And attaching the constraint to the PERIOD is a lot simpler than attaching it to the PK. **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does* overlap another empty range. Empty ranges should still not overlap anything else. This would fix the exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to people who want to use them. (We would still forbid them if you define a PERIOD, because those come with the CHECK constraint mentioned above.) And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps something small (something with zero width) but not something big. Surely if a && b and b <@ c, then a && c? So this feels like the kind of elegant hack that you eventually regret. **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above. Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for any table with constraints where conperiod is true. We'd also need to do this check on existing rows when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want relperiods. **Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least not in reference to the empties problem. But I was thinking about this request from Matthias for temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something muddled about all that. So how about we give the GiST AM handler amcanunique? As I understand it, GiST indexes are capable of uniqueness,[3] and indeed today you can create an exclusion constraint with the same effect, but in the past the core had no way of asking an opclass which operator gave equality. With the stratnum support proc from 6db4598fcb (part of this patch series, but reverted from v17), we could get a known operator for "equals". If the index's opclasses had that sproc and it gave non-zero for RTEqualStrategyNumber, then CREATE UNIQUE INDEX would succeed. We would just ("just") need to make GiST raise an error if it found a duplicate. And if *that* was happening, the empty ranges wouldn't cause a problem. I think Option 3 is good, but I like Option 4 a lot because (1) it doesn't assume ranges & multiranges (2) it allows empties if users have some reason for them (3) since the real problem is duplicates, forbidding them is a more precise solution, (4) it clears up the confusing situation of GiST not being canunique, even though you can create an index with indisunique. OTOH it is probably more work, and it is slower than just forbidding duplicates. (The unique check requires a separate index search, according to [3], as an exclusion constraint would do.) Also if we do it to make GiST be canunique, that can happen separately from the temporal work. So I'm proceeding with Option 3, which at worst can eventually become an optimization for Option 4. I don't think forbidding empty ranges is a great loss to be honest. But if anyone has any feedback, please share: ojections, alternatives, advice---all is welcome. [1] https://www.postgresql.org/message-id/47550967-260b-4180-9791-b224859fe63e%40illuminatedcomputing.com [2] https://www.postgresql.org/message-id/CAEze2Wh21V66udM8cbvBBsAgyQ_5x9nfR0d3sWzbmZk%2B%2Bey7xw%40mail.gmail.com [3] https://dsf.berkeley.edu/papers/sigmod97-gist.pdf Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Thu, May 9, 2024 at 5:44 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > Additionally, because I can't create my own non-constraint-backing > unique GIST indexes, I can't pre-create my unique constraints > CONCURRENTLY as one could do for the non-temporal case We talked about this a bit at pgconf.dev. I would like to implement it, since I agree it is an important workflow to support. Here are some thoughts about what would need to be done. First we could take a small step: allow non-temporal UNIQUE GiST indexes. This is possible according to [1], but in the past we had no way of knowing which strategy number an opclass was using for equality. With the stratnum support proc introduced by 6db4598fcb (reverted for v17), we could change amcanunique to true for the GiST AM handler. If the index's opclasses had that sproc and it gave non-zero for RTEqualStrategyNumber, we would have a reliable "definition of uniqueness". UNIQUE GiST indexes would raise an error if they detected a duplicate record. Incidentally, this would also let us correct the error message about GiST not supporting unique, fixing the problem you raised here: On Sun, May 12, 2024 at 8:51 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 5/12/24 05:55, Matthias van de Meent wrote: > >> > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during); > >> > ERROR: access method "gist" does not support unique indexes > >> > >> To me that error message seems correct. The programmer hasn't said anything about the special > >> temporal behavior they are looking for. > > > > But I showed that I had a GIST index that does have the indisunique > > flag set, which shows that GIST does support indexes with unique > > semantics. > > > > That I can't use CREATE UNIQUE INDEX to create such an index doesn't > > mean the feature doesn't exist, which is what the error message > > implies. > > True, the error message is not really telling the truth anymore. But that is just regular non-temporal indexes. To avoid a long table lock you'd need a way to build the index that is not just unique, but also does exclusion based on &&. We could borrow syntax from SQL:2011 and allow `CREATE INDEX idx ON t (id, valid_at WITHOUT OVERLAPS)`. But since CREATE INDEX is a lower-level concept than a constraint, it'd be better to do something more general. You can already give opclasses for each indexed column. How about allowing operators as well? For instance `CREATE UNIQUE INDEX idx ON t (id WITH =, valid_at WITH &&)`? Then the index would know to enforce those rules. This is the same data we store today in pg_constraint.conexclops. So that would get moved/copied to pg_index (probably moved). Then when you add the constraint, what is the syntax? Today when you say PRIMARY KEY/UNIQUE USING INDEX, you don't give the column names. So how do we know it's WITHOUT OVERLAPS? I guess if the underlying index has (foo WITH = [, bar WITH =], baz WITH &&) we just assume the user wants WITHOUT OVERLAPS, and otherwise they want a regular PK/UQ constraint? In addition this workflow only works if you can CREATE INDEX CONCURRENTLY. I'm not sure yet if we'll have problems there. I noticed that for REINDEX at least, there were plans in 2012 to support exclusion-constraint indexes,[2] but when the patch was committed in 2019 they had been dropped, with plans to add support eventually.[3] Today they are still not supported. Maybe whatever caused problems for REINDEX isn't an issue for just INDEX, but it would take more research to find out. [1] https://dsf.berkeley.edu/papers/sigmod97-gist.pdf [2] Original patch thread from 2012: https://www.postgresql.org/message-id/flat/CAB7nPqS%2BWYN021oQHd9GPe_5dSVcVXMvEBW_E2AV9OOEwggMHw%40mail.gmail.com#e1a372074cfdf37bf9e5b4e29ddf7b2d [3] Revised patch thread, committed in 2019: https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Wed, Jun 5, 2024 at 4:56 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does* > overlap another empty range. Empty ranges should still not overlap anything else. This would fix the > exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to > people who want to use them. (We would still forbid them if you define a PERIOD, because those come > with the CHECK constraint mentioned above.) > And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps > something small (something with zero width) but not something big. Surely if a && b and b <@ c, then > a && c? So this feels like the kind of elegant hack that you eventually regret. I think this might be fine. > **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the > executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above. > Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users > don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their > own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for > any table with constraints where conperiod is true. We'd also need to do this check on existing rows > when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have > relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want > relperiods. I don't really like the existing relhasWHATEVER fields and am not very keen about adding more of them. Maybe it will turn out to be the best way, but finding the right times to set and unset such fields has been challenging over the years, and we've had to fix some bugs. So, if you go this route, I recommend looking carefully at whether there's a reasonable way to avoid the need for such a field. Other than that, this idea seems reasonable. > **Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least > not in reference to the empties problem. But I was thinking about this request from Matthias for > temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has > indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE > indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something > muddled about all that. So how about we give the GiST AM handler amcanunique? > > As I understand it, GiST indexes are capable of uniqueness,[3] and indeed today you can create an > exclusion constraint with the same effect, but in the past the core had no way of asking an opclass > which operator gave equality. With the stratnum support proc from 6db4598fcb (part of this patch > series, but reverted from v17), we could get a known operator for "equals". If the index's opclasses > had that sproc and it gave non-zero for RTEqualStrategyNumber, then CREATE UNIQUE INDEX would > succeed. We would just ("just") need to make GiST raise an error if it found a duplicate. And if > *that* was happening, the empty ranges wouldn't cause a problem. Isn't this just a more hacky version of option (2)? -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, 5 Jun 2024 at 22:57, Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On Thu, May 9, 2024 at 5:44 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > Additionally, because I can't create my own non-constraint-backing > > unique GIST indexes, I can't pre-create my unique constraints > > CONCURRENTLY as one could do for the non-temporal case > > We talked about this a bit at pgconf.dev. I would like to implement it, since I agree it is an > important workflow to support. Here are some thoughts about what would need to be done. > > First we could take a small step: allow non-temporal UNIQUE GiST indexes. This is possible according > to [1], but in the past we had no way of knowing which strategy number an opclass was using for > equality. With the stratnum support proc introduced by 6db4598fcb (reverted for v17), we could > change amcanunique to true for the GiST AM handler. If the index's opclasses had that sproc and it > gave non-zero for RTEqualStrategyNumber, we would have a reliable "definition of uniqueness". UNIQUE > GiST indexes would raise an error if they detected a duplicate record. Cool. > But that is just regular non-temporal indexes. To avoid a long table lock you'd need a way to build > the index that is not just unique, but also does exclusion based on &&. We could borrow syntax from > SQL:2011 and allow `CREATE INDEX idx ON t (id, valid_at WITHOUT OVERLAPS)`. But since CREATE INDEX > is a lower-level concept than a constraint, it'd be better to do something more general. You can > already give opclasses for each indexed column. How about allowing operators as well? For instance > `CREATE UNIQUE INDEX idx ON t (id WITH =, valid_at WITH &&)`? Then the index would know to enforce > those rules. I think this looks fine. I'd like it even better if we could default to the equality operator that's used by the type's default btree opclass in this syntax; that'd make CREATE UNIQUE INDEX much less awkward for e.g. hash indexes. > This is the same data we store today in pg_constraint.conexclops. So that would get > moved/copied to pg_index (probably moved). I'd keep the pg_constraint.conexclops around: People are inevitably going to want to keep the current exclusion constraints' handling of duplicate empty ranges, which is different from expectations we see for UNIQUE INDEX's handling. > Then when you add the constraint, what is the syntax? Today when you say PRIMARY KEY/UNIQUE USING > INDEX, you don't give the column names. So how do we know it's WITHOUT OVERLAPS? I guess if the > underlying index has (foo WITH = [, bar WITH =], baz WITH &&) we just assume the user wants WITHOUT > OVERLAPS, and otherwise they want a regular PK/UQ constraint? Presumably you would know this based on the pg_index.indisunique flag? > In addition this workflow only works if you can CREATE INDEX CONCURRENTLY. I'm not sure yet if we'll > have problems there. I noticed that for REINDEX at least, there were plans in 2012 to support > exclusion-constraint indexes,[2] but when the patch was committed in 2019 they had been dropped, > with plans to add support eventually.[3] Today they are still not supported. Maybe whatever caused > problems for REINDEX isn't an issue for just INDEX, but it would take more research to find out. I don't quite see where exclusion constraints get into the picture? Isn't this about unique indexes, not exclusion constraints? I understand exclusion constraints are backed by indexes, but that doesn't have to make it a unique index, right? I mean, currently, you can write an exclusion constraint that makes sure that all rows with a certain prefix have the same suffix columns (given a btree-esque index type with <> -operator support), which seems exactly opposite of what unique indexes should do. Kind regards, Matthias van de Meent Neon (https://neon.tech)
On 6/12/24 07:31, Robert Haas wrote: > On Wed, Jun 5, 2024 at 4:56 PM Paul Jungwirth >> **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the >> executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above. >> Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users >> don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their >> own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for >> any table with constraints where conperiod is true. We'd also need to do this check on existing rows >> when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have >> relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want >> relperiods. > > I don't really like the existing relhasWHATEVER fields and am not very > keen about adding more of them. Maybe it will turn out to be the best > way, but finding the right times to set and unset such fields has been > challenging over the years, and we've had to fix some bugs. So, if you > go this route, I recommend looking carefully at whether there's a > reasonable way to avoid the need for such a field. Other than that, > this idea seems reasonable. Here is a reworked patch series following Option 3: rather than using a cataloged CHECK constraint, we just do the check in the executor (but in the same place we do CHECK constraints). We also make sure existing rows are empty-free when you add the index. I took the reverted commits from v17, squashed the minor fixes, rebased everything, and added a new patch to forbid empty ranges/multiranges wherever there is a WITHOUT OVERLAPS constraint. It comes right after the PK patch in the series. I don't intend it to be committed separately, but I thought it would make review easier, since the other code has been reviewed a lot already. I did add a relperiods column, but I have a mostly-complete branch here (not included in the patches) that does without. Not maintaining that new column is simpler for sure. The consequence is that the relcache must scan for WITHOUT OVERLAPS constraints on every table. That seems like a high performance cost for a feature most databases won't use. Since we try hard to avoid that kind of thing (e.g. [1]), I thought adding relperiods would be preferred. If that's the wrong tradeoff I can change it. One idea I considered was to include WITHOUT OVERLAPS constraints in the relchecks count. But that feels pretty hacky, and it is harder than it sounds, since index constraints are handled pretty far from where we update relchecks now. It doesn't save any complexity (but rather makes it worse), so the only reason to do it would be to avoid expanding pg_class records. These patches still add some if-clauses to psql and pg_dump that say `if (fout->remoteVersion >= 170000)`. But if I change them to 180000 I get failures in e.g. the pg_dump tests. What do other people do here before a release is cut? Rebased on 3e53492aa7. [1] https://github.com/postgres/postgres/blob/5d6c64d290978dab76c00460ba809156874be035/src/backend/utils/cache/relcache.c#L688-L713 Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v34-0001-Add-stratnum-GiST-support-function.patch
- v34-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v34-0003-Forbid-empty-ranges-multiranges-in-WITHOUT-OVERL.patch
- v34-0004-Add-temporal-FOREIGN-KEY-contraints.patch
- v34-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v34-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v34-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v34-0008-Add-PERIODs.patch
On Thu, Jun 27, 2024 at 5:56 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > I did add a relperiods column, but I have a mostly-complete branch here (not included in the > patches) that does without. Not maintaining that new column is simpler for sure. The consequence is > that the relcache must scan for WITHOUT OVERLAPS constraints on every table. That seems like a high > performance cost for a feature most databases won't use. Since we try hard to avoid that kind of > thing (e.g. [1]), I thought adding relperiods would be preferred. If that's the wrong tradeoff I can > change it. I'm sure that you are right that nobody is going to like an extra index scan just to find periods. So, suppose we do as you propose and add relperiods. In the situation where we are adding the first period (or whatever the right term is) to the table, what kind of lock are we holding on the table? Conversely, when we drop the last period, what kind of lock are we holding on the table? If, hypothetically, both answers were AccessExclusiveLock, this might not be too bad, but if you say "ShareLock" then we've got a lot of problems; that's not even self-exclusive. > These patches still add some if-clauses to psql and pg_dump that say `if (fout->remoteVersion >= > 170000)`. But if I change them to 180000 I get failures in e.g. the pg_dump tests. What do other > people do here before a release is cut? Sometimes I make a commit that bumps the version number (update major version in src/tools/version_stamp.pl, then run it, then run autoconf, then commit). Then I build my patch set on top of that. Once the actual major release bump happens, I just drop that commit from the stack. -- Robert Haas EDB: http://www.enterprisedb.com
Here is v35 of this patch series, with a few small changes. I renamed relperiods to relwithoutoverlaps, since that is more accurate about what we're counting. (PERIODs come in a later patch and we don't need to count them.) Also I cleaned up the branches in psql/pg_dump on version now that we're officially on v18. On 6/28/24 05:18, Robert Haas wrote: > On Thu, Jun 27, 2024 at 5:56 PM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: >> I did add a relperiods column, but I have a mostly-complete branch here (not included in the >> patches) that does without. Not maintaining that new column is simpler for sure. The consequence is >> that the relcache must scan for WITHOUT OVERLAPS constraints on every table. That seems like a high >> performance cost for a feature most databases won't use. Since we try hard to avoid that kind of >> thing (e.g. [1]), I thought adding relperiods would be preferred. If that's the wrong tradeoff I can >> change it. > > I'm sure that you are right that nobody is going to like an extra > index scan just to find periods. So, suppose we do as you propose and > add relperiods. In the situation where we are adding the first period > (or whatever the right term is) to the table, what kind of lock are we > holding on the table? Conversely, when we drop the last period, what > kind of lock are we holding on the table? If, hypothetically, both > answers were AccessExclusiveLock, this might not be too bad, but if > you say "ShareLock" then we've got a lot of problems; that's not even > self-exclusive. This happens when creating a PRIMARY KEY or UNIQUE constraint, so we already have an AccessExclusiveLock on the table (whether creating or dropping). If we ever supported CREATE INDEX CONCURRENTLY for this, we would need to be careful about where we update the new field, but today we don't support that for exclusion constraints. Rebased to 4b211003ec. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v35-0001-Add-stratnum-GiST-support-function.patch
- v35-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v35-0003-Forbid-empty-ranges-multiranges-in-WITHOUT-OVERL.patch
- v35-0004-Add-temporal-FOREIGN-KEY-contraints.patch
- v35-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v35-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v35-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v35-0008-Add-PERIODs.patch
On Thu, Jun 6, 2024 at 4:56 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 5/21/24 11:27, Isaac Morland wrote: > > On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote: > > > > What I think is less clear is what that means for temporal primary > > keys. As Paul pointed out upthread, in every other case, a temporal > > primary key is at least as unique as a regular primary key, but in > > this case, it isn't. And someone might reasonably think that a > > temporal primary key should exclude empty ranges just as all primary > > keys exclude nulls. Or they might think the opposite. > > > > > > Fascinating. I think you're absolutely right that it's clear that two empty intervals don't > > conflict. If somebody wants to claim two intervals conflict, they need to point to at least one > > instant in time that is common between them. > > > > But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items > > are identified by a time range, non-overlapping or not, then the empty range can only identify one > > item (per value of whatever other columns are in the primary key). I think for a unique key the > > non-overlapping restriction has to be considered an additional restriction on top of the usual > > uniqueness restriction. > > > > I suspect in many applications there will be a non-empty constraint; for example, it seems quite > > reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed > > they should behave in the mathematically appropriate way. > > Finding a way forward for temporal PKs got a lot of discussion at pgconf.dev (thanks especially to > Peter Eisentraut and Jeff Davis!), so I wanted to summarize some options and describe what I think > is the best approach. > > First the problem: empty ranges! A temporal PK/UNIQUE constraint is basically an exclusion > constraint that is `(id WITH =, valid_at WITH &&)`. But the special 'empty' value never overlaps > anything, *including itself*. (Note it has no "position": [3,3) is the same as [4,4).) Since the > exclusion constraint forbids overlapping ranges, and empties never overlap, your table can have > duplicates. (I'm talking about "literal uniqueness" as discussed in [1].) For instance: > > CREATE EXTENSION btree_gist; > CREATE TABLE t (id int, valid_at daterange, name text); > ALTER TABLE t ADD CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); > INSERT INTO t VALUES (1, 'empty', 'foo'); > INSERT INTO t VALUES (1, 'empty', 'bar'); > > Multiranges have the same problem. So what do we do about that? > > **Option 0**: Allow it but document it. It shouldn't happen in practice: there is no reason for an > empty range to get into a temporal table, and it arguably doesn't mean anything. The record is true > at no time? But of course it will happen anyway. It's a footgun and will break expectations for at > least some. > > It causes problems for us too. If you say `SELECT name FROM t GROUP BY id, valid_at`, we recognize > that `name` is a functional dependency on the PK, so we allow it and give you the first row matching > each key. You might get "foo" or you might get "bar". Also the planner uses not-nullable uniqueness > to take many shortcuts. I couldn't create any concrete breakage there, but I bet someone else could. > PKs that are not literally unique seems like something that would cause headaches for years. > > **Option 1**: Temporal PKs should automatically create a CHECK constraint that forbids empty ranges. > Should UNIQUE constraints too? I'm tempted to say no, since sometimes users surprise us by coming up > with new ways to use things. For instance one way to use empty ranges is to reference a temporal > table from a non-temporal table, since `'empty' <@ anything` is always true (though this has > questionable meaning or practical use). But probably we should forbid empties for UNIQUE constraints > too. Forbidding them is more aligned with the SQL standard, which says that when you have a PERIOD, > startcol < endcol (not <=). And it feels more consistent to treat both constraints the same way. > Finally, if UNIQUEs do allow empties, we still risk confusing our planner. > > My last patch created these CHECK constraints for PKs (but not UNIQUEs) as INTERNAL dependencies. > It's pretty clunky. There are lots of cases to handle, e.g. `ALTER COLUMN c TYPE` may reuse the PK > index or may generate a new one. And what if the user already created the same constraint? Seeing > all the trouble giving PKs automatic (cataloged) NOT NULL constraints makes me wary about this > approach. It's not as bad, since there is no legacy, but it's still more annoying than I expected. > > Finally, hanging the CHECK constraint off the PK sets us up for problems when we add true PERIODs. > Under 11.27 of SQL/Foundation, General Rules 2b says that defining a PERIOD should automatically add > a CHECK constraint that startcol < endcol. That is already part of my last patch in this series. But > that would be redundant with the constraint from the PK. And attaching the constraint to the PERIOD > is a lot simpler than attaching it to the PK. > > **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does* > overlap another empty range. Empty ranges should still not overlap anything else. This would fix the > exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to > people who want to use them. (We would still forbid them if you define a PERIOD, because those come > with the CHECK constraint mentioned above.) > And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps > something small (something with zero width) but not something big. Surely if a && b and b <@ c, then > a && c? So this feels like the kind of elegant hack that you eventually regret. > > **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the > executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above. > Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users > don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their > own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for > any table with constraints where conperiod is true. We'd also need to do this check on existing rows > when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have > relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want > relperiods. > > **Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least > not in reference to the empties problem. But I was thinking about this request from Matthias for > temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has > indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE > indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something > muddled about all that. So how about we give the GiST AM handler amcanunique? > I think we can Forbid empties,not not mess with pg_class. to make the communication smooth, i've set the base commit to 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92 {Add temporal PRIMARY KEY and UNIQUE constraints} https://git.postgresql.org/cgit/postgresql.git/commit/?id=46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92 you can git reset --hard 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92 then apply the attached patch. I hope I understand it correctly. previously revert is only because the special value: empty. i tried to use the operator &&&, new gist strategy number, pg_amop entry to solve the problem. Now with the applied patch, if the range column is specified WITHOUT OVERLAPS, then this column is not allowed to have any empty range value. logic work through: * duplicate logic of range_overlaps but disallow empty value. also have the operator &&&, (almost equivalent to &&) * add new gist strategy number * thanks to add stratnum GiST support function (https://git.postgresql.org/cgit/postgresql.git/commit/?id=6db4598fcb82a87a683c4572707e522504830a2b) now we can set the strategy number to the mapped new function (equivalent to range_overlaps, but error out empty value) * in ComputeIndexAttrs, set the strategy number to the newly created StrategyNumber in "else if (iswithoutoverlaps)" block. * Similarly refactor src/backend/utils/adt/rangetypes_gist.c make the index value validation using newly created function. function name, error message maybe not great now, but it works. ------full demo, also see the comments. DROP TABLE if exists temporal_rng; CREATE TABLE temporal_rng (id int4range, valid_at tsrange); ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); --should be fine. INSERT INTO temporal_rng VALUES ('empty', '[2022-01-01,2022-01-02]'); --will error out, period column, empty range not allowed INSERT INTO temporal_rng VALUES ('[3,3]', 'empty'); ALTER TABLE temporal_rng DROP CONSTRAINT temporal_rng_pk; --period constraint dropped, now should be fine. INSERT INTO temporal_rng VALUES ('[3,3]', 'empty'); --reinstall constraint, should error out --because existing one row has empty value. ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS); delete from temporal_rng where id = '[3,3]'; --reinstall constraint, should be fine, because empty value removed. ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
Attachment
On 7/9/24 00:15, jian he wrote: >> **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does* >> overlap another empty range. Empty ranges should still not overlap anything else. This would fix the >> exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to >> people who want to use them. (We would still forbid them if you define a PERIOD, because those come >> with the CHECK constraint mentioned above.) >> And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps >> something small (something with zero width) but not something big. Surely if a && b and b <@ c, then >> a && c? So this feels like the kind of elegant hack that you eventually regret. > I think we can Forbid empties,not not mess with pg_class. > > to make the communication smooth, i've set the base commit to > 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92 > {Add temporal PRIMARY KEY and UNIQUE constraints} > https://git.postgresql.org/cgit/postgresql.git/commit/?id=46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92 > you can git reset --hard 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92 > then apply the attached patch. > > I hope I understand it correctly. > previously revert is only because the special value: empty. > i tried to use the operator &&&, new gist strategy number, pg_amop > entry to solve the problem. > Now with the applied patch, if the range column is specified WITHOUT OVERLAPS, > then this column is not allowed to have any empty range value. > > logic work through: > * duplicate logic of range_overlaps but disallow empty value. also > have the operator &&&, (almost equivalent to &&) > * add new gist strategy number > * thanks to add stratnum GiST support function > (https://git.postgresql.org/cgit/postgresql.git/commit/?id=6db4598fcb82a87a683c4572707e522504830a2b) > now we can set the strategy number to the mapped new function > (equivalent to range_overlaps, but error out empty value) > * in ComputeIndexAttrs, set the strategy number to the newly created > StrategyNumber in "else if (iswithoutoverlaps)" block. > * Similarly refactor src/backend/utils/adt/rangetypes_gist.c make the > index value validation using newly created function. I like this approach a lot, but I'd like to hear what some other people think? Jian he's &&& operator is similar to what I proposed upthread, but when either operand is an empty value it simply raises an error. (It should be an ereport, not an elog, and I think multirange_overlaps_multirange_internal is missing the empty check, but I can clean things up when I integrate it into the patch series.) This is much simpler than everything I'm doing: checking for empties in the executor phase, adding a field to pg_class, setting things in the relcache, and checking for empties in existing rows when you add an index. This patch uses existing infrastructure to do all the work. It seems like a much cleaner solution. Unlike my proposed &&& operator, it doesn't have weird mathematical consequences. At first I thought raising an error was not great, but it's the same thing you get when you divide by zero. It's fine for an operator to have a restricted domain of inputs. And we would only use this internally for primary keys and unique constraints, where indeed raising an error is just what we want. If I don't hear objections (or think of something myself :-), I'm inclined to use this approach. But what do people think? Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 7/17/24 20:34, Paul Jungwirth wrote: > I like this approach a lot, but I'd like to hear what some other people think? > > Jian he's &&& operator is similar to what I proposed upthread, but when either operand is an empty > value it simply raises an error. (It should be an ereport, not an elog, and I think > multirange_overlaps_multirange_internal is missing the empty check, but I can clean things up when I > integrate it into the patch series.) I thought of a possible problem: this operator works great if there are already rows in the table, but what if the *first row you insert* has an empty range? Then there is nothing to compare against, so the operator will never be used. Right? Except when I test it, it still works! After running `make installcheck`, I did this: regression=# truncate temporal_rng cascade; NOTICE: truncate cascades to table "temporal_fk_rng2rng" TRUNCATE TABLE regression=# insert into temporal_rng values ('[1,2)', 'empty'); ERROR: range cannot be empty My mental model must be wrong. Can anyone explain what is happening there? Is it something we can depend on? So I swapped in the &&& patch, cleaned it up, and added tests. But something is wrong. After I get one failure from an empty, I keep getting failures, even though the table is empty: regression=# truncate temporal_rng cascade; NOTICE: truncate cascades to table "temporal_fk_rng2rng" TRUNCATE TABLE regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far INSERT 0 1 regression=# insert into temporal_rng values ('[1,2)', 'empty'); -- should fail and does ERROR: range cannot be empty regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- uh oh ERROR: range cannot be empty regression=# truncate temporal_rng cascade; NOTICE: truncate cascades to table "temporal_fk_rng2rng" TRUNCATE TABLE regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far INSERT 0 1 regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- ok now INSERT 0 1 It looks like the index is getting corrupted. Continuing from the above: regression=# create extension pageinspect; CREATE EXTENSION regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk'); gist_page_items ---------------------------------------------------------------------------- (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")") (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")") (2 rows) regression=# insert into temporal_rng values ('[1,2)', 'empty'); ERROR: range cannot be empty regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk'); gist_page_items ---------------------------------------------------------------------------- (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")") (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")") (3,"(0,3)",32,f,"(id, valid_at)=(""[1,2)"", empty)") (3 rows) So maybe this is a bad place to ereport? Or is this a deeper bug with GiST? Here is where we're doing it: #0 range_nonempty_overlaps_internal (typcache=0x635a7fbf67f0, r1=0x635a7fc11f20, r2=0x635a7fc11f40) at rangetypes.c:876 #1 0x0000635a7f06175d in range_gist_consistent_leaf_range (typcache=0x635a7fbf67f0, strategy=31, key=0x635a7fc11f20, query=0x635a7fc11f40) at rangetypes_gist.c:1076 #2 0x0000635a7f05fc9a in range_gist_consistent (fcinfo=0x7ffcd20f9f60) at rangetypes_gist.c:216 #3 0x0000635a7f12d780 in FunctionCall5Coll (flinfo=0x635a7fb44eb8, collation=0, arg1=140723832725648, arg2=109240340727454, arg3=31, arg4=0, arg5=140723832725567) at fmgr.c:1242 #4 0x0000635a7e999af6 in gistindex_keytest (scan=0x635a7fb44d50, tuple=0x7d155c0a3fd0, page=0x7d155c0a2000 "", offset=1, recheck_p=0x7ffcd20fa129, recheck_distances_p=0x7ffcd20fa12a) at gistget.c:221 #5 0x0000635a7e99a109 in gistScanPage (scan=0x635a7fb44d50, pageItem=0x7ffcd20fa1e0, myDistances=0x0, tbm=0x0, ntids=0x0) at gistget.c:436 #6 0x0000635a7e99a797 in gistgettuple (scan=0x635a7fb44d50, dir=ForwardScanDirection) at gistget.c:637 #7 0x0000635a7e9e4d38 in index_getnext_tid (scan=0x635a7fb44d50, direction=ForwardScanDirection) at indexam.c:590 #8 0x0000635a7e9e4f7d in index_getnext_slot (scan=0x635a7fb44d50, direction=ForwardScanDirection, slot=0x635a7fb44950) at indexam.c:682 #9 0x0000635a7ec5690b in check_exclusion_or_unique_constraint (heap=0x7d1560cea348, index=0x7d1560cedd98, indexInfo=0x635a7fb44c40, tupleid=0x635a7fb44580, values=0x7ffcd20faf00, isnull=0x7ffcd20faee0, estate=0x635a7fb434a0, newIndex=false, waitMode=CEOUC_WAIT, violationOK=false, conflictTid=0x0) at execIndexing.c:780 #10 0x0000635a7ec55c58 in ExecInsertIndexTuples (resultRelInfo=0x635a7fb43930, slot=0x635a7fb44550, estate=0x635a7fb434a0, update=false, noDupErr=false, specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:483 #11 0x0000635a7eca38a2 in ExecInsert (context=0x7ffcd20fb1b0, resultRelInfo=0x635a7fb43930, slot=0x635a7fb44550, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1145 Is there anything I can do to save this &&& idea? I've attached the patches I'm working with, rebased to cd85ae1114. If ereport just won't work, then I might explore other definitions of a &&& operator. It was really nice to have such a clean solution. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 7/18/24 11:39, Paul Jungwirth wrote: > So I swapped in the &&& patch, cleaned it up, and added tests. But something is wrong. After I get > one failure from an empty, I keep getting failures, even though the table is empty: > > regression=# truncate temporal_rng cascade; > NOTICE: truncate cascades to table "temporal_fk_rng2rng" > TRUNCATE TABLE > regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far > INSERT 0 1 > regression=# insert into temporal_rng values ('[1,2)', 'empty'); -- should fail and does > ERROR: range cannot be empty > regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- uh oh > ERROR: range cannot be empty > regression=# truncate temporal_rng cascade; > NOTICE: truncate cascades to table "temporal_fk_rng2rng" > TRUNCATE TABLE > regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far > INSERT 0 1 > regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- ok now > INSERT 0 1 > > It looks like the index is getting corrupted. Continuing from the above: > > regression=# create extension pageinspect; > CREATE EXTENSION > regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk'); > gist_page_items > ---------------------------------------------------------------------------- > (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")") > (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")") > (2 rows) > > regression=# insert into temporal_rng values ('[1,2)', 'empty'); > ERROR: range cannot be empty > regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk'); > gist_page_items > ---------------------------------------------------------------------------- > (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")") > (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")") > (3,"(0,3)",32,f,"(id, valid_at)=(""[1,2)"", empty)") > (3 rows) I realized this isn't index corruption, just MVCC. The exclusion constraint is checked after we update the index, which is why the row gets left behind. But it doesn't cause any wrong answers, and if you vacuum the table the row goes away. This also explains my confusion here: > I thought of a possible problem: this operator works great if there are already rows in the table, > but what if the *first row you insert* has an empty range? Then there is nothing to compare against, > so the operator will never be used. Right? > > Except when I test it, it still works! The first row still does a comparison because when we check the exclusion constraint, there is a comparison between the query and the key we just inserted. (When I say "query" I don't mean a SQL query, but the value used to search the index that is compared against its keys.) So I'm glad I didn't stumble on a GiST bug, but I think it means ereporting from an exclusion operator is not a workable approach. Failures leave behind invalid tuples, and future (valid) tuples can fail if we compare to those invalid tuples. Since MVCC visibility is stored in the heap, not in the index, it's not really accessible to us here. So far I don't have any ideas to rescue this idea, even though I like it a lot. So I will go back to the executor idea we discussed at pgconf.dev. One tempting alternative though is to let exclusion constraints do the not-empty check, instead of putting it in the executor. It would be an extra check we do only when the constraint has pg_constraint.conperiod. Then we don't need to add & maintain pg_class.relwithoutoverlaps, and we don't need a relcache change, and we don't need so much extra code to check existing rows when you add the constraint. It doesn't use the existing available exclusion constraint functionality, but if we're willing to extend the executor to know about WITHOUT OVERLAPS, I guess we could teach exclusion constraints about it instead. Doing the check there does seem to have better locality with the feature. So I think I will try that out as well. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Wed, Jul 24, 2024 at 12:08 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 7/18/24 11:39, Paul Jungwirth wrote: > > So I swapped in the &&& patch, cleaned it up, and added tests. But something is wrong. After I get > > one failure from an empty, I keep getting failures, even though the table is empty: > > > > regression=# truncate temporal_rng cascade; > > NOTICE: truncate cascades to table "temporal_fk_rng2rng" > > TRUNCATE TABLE > > regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far > > INSERT 0 1 > > regression=# insert into temporal_rng values ('[1,2)', 'empty'); -- should fail and does > > ERROR: range cannot be empty > > regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- uh oh > > ERROR: range cannot be empty > > regression=# truncate temporal_rng cascade; > > NOTICE: truncate cascades to table "temporal_fk_rng2rng" > > TRUNCATE TABLE > > regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far > > INSERT 0 1 > > regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- ok now > > INSERT 0 1 > > > > It looks like the index is getting corrupted. Continuing from the above: > > > > regression=# create extension pageinspect; > > CREATE EXTENSION > > regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk'); > > gist_page_items > > ---------------------------------------------------------------------------- > > (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")") > > (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")") > > (2 rows) > > > > regression=# insert into temporal_rng values ('[1,2)', 'empty'); > > ERROR: range cannot be empty > > regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk'); > > gist_page_items > > ---------------------------------------------------------------------------- > > (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")") > > (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")") > > (3,"(0,3)",32,f,"(id, valid_at)=(""[1,2)"", empty)") > > (3 rows) > > I realized this isn't index corruption, just MVCC. The exclusion constraint is checked after we > update the index, which is why the row gets left behind. But it doesn't cause any wrong answers, and > if you vacuum the table the row goes away. > > This also explains my confusion here: > > > I thought of a possible problem: this operator works great if there are already rows in the table, > > but what if the *first row you insert* has an empty range? Then there is nothing to compare against, > > so the operator will never be used. Right? > > > > Except when I test it, it still works! > > The first row still does a comparison because when we check the exclusion constraint, there is a > comparison between the query and the key we just inserted. (When I say "query" I don't mean a SQL > query, but the value used to search the index that is compared against its keys.) > > So I'm glad I didn't stumble on a GiST bug, but I think it means ereporting from an exclusion operator > is not a workable approach. Failures leave behind invalid tuples, and future (valid) tuples can fail if > we compare to those invalid tuples. Since MVCC visibility is stored in the heap, not in the index, it's > not really accessible to us here. So far I don't have any ideas to rescue this idea, even though I like > it a lot. So I will go back to the executor idea we discussed at pgconf.dev. > another kind of crazy idea. instead of "ERROR: range cannot be empty" let it return true. so 'empty'::int4range &&& 'empty'; return true. one downside is, if your first row period column is empty, then you can not insert any new rows that have the same non-period key column. for example: drop table if exists temporal_rng1 ; CREATE TABLE temporal_rng1 ( id int4range, valid_at int4range, CONSTRAINT temporal_rng1_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ); insert into temporal_rng1 values ('[1,2]', 'empty'); In this context, now, you cannot insert any new rows whose id is equal to '[1,2]'. ----but if your first row is not empty, then you won't have empty. truncate temporal_rng1; insert into temporal_rng1 values ('[1,2]', '[3,4]'); then insert into temporal_rng1 values ('[1,2]', 'empty'); --will fail. In summary, you will have exactly one empty, no other values (if the first row is empty). or you will have values and not empty values at all.
On 7/23/24 09:08, Paul Jungwirth wrote: > One tempting alternative though is to let exclusion constraints do the not-empty check, instead of > putting it in the executor. It would be an extra check we do only when the constraint has > pg_constraint.conperiod. Then we don't need to add & maintain pg_class.relwithoutoverlaps, and we don't > need a relcache change, and we don't need so much extra code to check existing rows when you add the > constraint. It doesn't use the existing available exclusion constraint functionality, but if we're > willing to extend the executor to know about WITHOUT OVERLAPS, I guess we could teach exclusion > constraints about it instead. Doing the check there does seem to have better locality with the feature. > So I think I will try that out as well. Here is a patch moving the not-empty check into check_exclusion_or_unique_constraint. That is a more logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part of the index constraint (not a CHECK constraint). At that point we've already looked up all the information we need. So there is no extra cost for non-temporal tables, and no need to change pg_class or add to the relcache. Also putting it there means we don't need any extra code to enforce non-empties when we build the index or do anything else with it. I think this is the nicest solution we can expect. It is even cleaner than the &&& ideas. So hopefully this gets us back to where we were when we decided to commit PKs & FKs to v17. As before, I've left the nonempty check as a separate patch to make reviewing easier, but when committing I would squash it with the PK patch. Rebased to 05faf06e9c. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v37-0001-Add-stratnum-GiST-support-function.patch
- v37-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v37-0003-Forbid-empty-ranges-multiranges-in-WITHOUT-OVERL.patch
- v37-0004-Add-temporal-FOREIGN-KEY-contraints.patch
- v37-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v37-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v37-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v37-0008-Add-PERIODs.patch
On 7/25/24 08:52, Paul Jungwirth wrote: > Here is a patch moving the not-empty check into check_exclusion_or_unique_constraint. That is a more > logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part of the index constraint > (not a CHECK constraint). At that point we've already looked up all the information we need. So > there is no extra cost for non-temporal tables, and no need to change pg_class or add to the > relcache. Also putting it there means we don't need any extra code to enforce non-empties when we > build the index or do anything else with it. > > I think this is the nicest solution we can expect. It is even cleaner than the &&& ideas. So > hopefully this gets us back to where we were when we decided to commit PKs & FKs to v17. > > As before, I've left the nonempty check as a separate patch to make reviewing easier, but when > committing I would squash it with the PK patch. Hello, Here is an updated set of patches, rebased because the old patches no longer applied. Also I have a question about foreign key RESTRICT behavior and the SQL spec. I added some tests for a particular condition: there are two adjacent referenced rows (sharing a scalar key part), and a single referencing row whose time spans the transition between the referenced rows. So graphing the records on a timeline, they look like this: PK: |-----|-----| FK: |-----| Now suppose you simultaneously update both referenced rows to be like so: PK: |---------|-| FK: |-----| Note that the FK's condition is still fulfilled. In a NO ACTION constraint, we clearly should not raise an error (and we don't). In a RESTRICT constraint, we *do* raise an error (but maybe we shouldn't). Here is some specific SQL (added to the tests in these patches): -- A PK update sliding the edge between two referenced rows: INSERT INTO temporal_rng (id, valid_at) VALUES ('[6,7)', daterange('2018-01-01', '2018-02-01')), ('[6,7)', daterange('2018-02-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[4,5)', daterange('2018-01-15', '2018-02-15'), '[6,7)'); UPDATE temporal_rng SET valid_at = CASE WHEN lower(valid_at) = '2018-01-01' THEN daterange('2018-01-01', '2018-01-05') WHEN lower(valid_at) = '2018-02-01' THEN daterange('2018-01-05', '2018-03-01') END WHERE id = '[6,7)'; or if you prefer PERIODs: -- A PK update sliding the edge between two referenced rows: INSERT INTO temporal_per (id, valid_from, valid_til) VALUES ('[6,7)', '2018-01-01', '2018-02-01'), ('[6,7)', '2018-02-01', '2018-03-01'); INSERT INTO temporal_fk_per2per (id, valid_from, valid_til, parent_id) VALUES ('[4,5)', '2018-01-15', '2018-02-15', '[6,7)'); UPDATE temporal_per SET valid_from = CASE WHEN valid_from = '2018-01-01' THEN '2018-01-01' WHEN valid_from = '2018-02-01' THEN '2018-01-05' END::date, valid_til = CASE WHEN valid_from = '2018-01-01' THEN '2018-01-05' WHEN valid_from = '2018-02-01' THEN '2018-03-01' END::date WHERE id = '[6,7)'; Here is what the SQL:2011 spec says (section 4.18.3.3 from Part 2 Foundation): > ON UPDATE RESTRICT: any change to a referenced column in the referenced table is prohibited if there is a matching row. So that says we should raise an error. But it seems clearly written with only non-temporal constraints in mind. Is it really correct in the scenario above? The reference is still valid. Does anyone know if the text has been updated in more recent versions of the standard? Part of me is happy the standard says this, because not raising an error is harder to implement. Maybe a lot harder. On the other hand, what if we have just one row in each table, and we *expand* the referenced range? In other words, from this: PK: |-----| FK: |-| to this: PK: |-------| FK: |-| Should that raise an error too? Currently it does not. But I think that is correct. As usual I go back to Date's model about "one row per millisecond". The referenced milliseconds didn't get updated, only the unreferenced ones. So I think what we are doing is okay. Likewise that same principle indicates we are doing the right thing in the original case: we did update the referenced milliseconds. Even though we swapped in replacements, we have to raise an error. This is no different than the non-temporal case. So my conclusion is we are doing the right thing in all places. But here is an opportunity for people to disagree. :-) Rebased to f5f30c22ed. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v38-0001-Add-stratnum-GiST-support-function.patch
- v38-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v38-0003-Forbid-empty-ranges-multiranges-in-WITHOUT-OVERL.patch
- v38-0004-Add-temporal-FOREIGN-KEY-contraints.patch
- v38-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v38-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v38-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v38-0008-Add-PERIODs.patch
On Fri, Aug 2, 2024 at 1:09 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote: > > On 7/25/24 08:52, Paul Jungwirth wrote: > > Here is a patch moving the not-empty check into check_exclusion_or_unique_constraint. That is a more > > logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part of the index constraint > > (not a CHECK constraint). At that point we've already looked up all the information we need. So > > there is no extra cost for non-temporal tables, and no need to change pg_class or add to the > > relcache. Also putting it there means we don't need any extra code to enforce non-empties when we > > build the index or do anything else with it. > > > > I think this is the nicest solution we can expect. It is even cleaner than the &&& ideas. So > > hopefully this gets us back to where we were when we decided to commit PKs & FKs to v17. > > > > As before, I've left the nonempty check as a separate patch to make reviewing easier, but when > > committing I would squash it with the PK patch. > > Hello, > > Here is an updated set of patches, rebased because the old patches no longer applied. > void ExecWithoutOverlapsNotEmpty(Relation rel, Datum attval, Oid typtype, Oid atttypid); should this just be a static function? I am not so sure. Oid typtype should be char typtype ? errmsg("new row for relation \"%s\" contains empty WITHOUT OVERLAPS value", we already have Form_pg_attribute via "TupleDesc tupdesc = RelationGetDescr(heap);" we can make the error message be: errmsg("cannot be empty range value for WITHOUT OVERLAPS column \"%s\" in relation \"%s\", colname, RelationGetRelationName(rel)) elog(ERROR, "Got unknown type for WITHOUT OVERLAPS column: %d", atttypid); people will wonder if domain over range works or not. but currently not, better error message would be: elog(ERROR, "WITHOUT OVERLAPS column \"%s\" is not a range or multirange type ", colname); This part is unlikely to be reachable, so I don't have a strong opinion on it. + if (!found) + column = NULL; this part no need? because if not found, the column would be last element in ColumnDef type list columns also the following change also make sense: + if (!OidIsValid(typid) && column) + typid = typenameTypeId(NULL, column->typeName); + /* The WITHOUT OVERLAPS part (if any) must be a range or multirange type. */ + if (constraint->without_overlaps && lc == list_last_cell(constraint->keys)) + { + if (!found && cxt->isalter) + { + /* + * Look up the column type on existing table. + * If we can't find it, let things fail in DefineIndex. + */ + Relation rel = cxt->rel; + for (int i = 0; i < rel->rd_att->natts; i++) + { + Form_pg_attribute attr = TupleDescAttr(rel->rd_att, i); + const char *attname; + + if (attr->attisdropped) + break; + + attname = NameStr(attr->attname); + if (strcmp(attname, key) == 0) + { + typid = attr->atttypid; + break; + } + } + } + if (found) +{ +} I am confused with this change? you found out the typid,but didn't using this information, should it be + if (strcmp(attname, key) == 0) + { + typid = attr->atttypid; + found = true; + break; + } so the failing error message be same for the following two cases: CREATE TABLE t1 (id int4range,valid_at tsrange,b text, CONSTRAINT temporal_rng_pk PRIMARY KEY (id, b WITHOUT OVERLAPS) ); CREATE TABLE t1 (id int4range,valid_at tsrange,b text); alter table t1 add CONSTRAINT temporal_rng_pk PRIMARY KEY (id, b WITHOUT OVERLAPS);
On Tue, Aug 6, 2024 at 10:02 AM jian he <jian.universality@gmail.com> wrote: > > On Fri, Aug 2, 2024 at 1:09 AM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: > > > > On 7/25/24 08:52, Paul Jungwirth wrote: > > > Here is a patch moving the not-empty check into check_exclusion_or_unique_constraint. That is a more > > > logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part of the index constraint > > > (not a CHECK constraint). At that point we've already looked up all the information we need. So > > > there is no extra cost for non-temporal tables, and no need to change pg_class or add to the > > > relcache. Also putting it there means we don't need any extra code to enforce non-empties when we > > > build the index or do anything else with it. > > > > > > I think this is the nicest solution we can expect. It is even cleaner than the &&& ideas. So > > > hopefully this gets us back to where we were when we decided to commit PKs & FKs to v17. > > > > > > As before, I've left the nonempty check as a separate patch to make reviewing easier, but when > > > committing I would squash it with the PK patch. > > > > Hello, > > > > Here is an updated set of patches, rebased because the old patches no longer applied. > > hi. some minor issues. in generateClonedIndexStmt index->iswithoutoverlaps = (idxrec->indisprimary || idxrec->indisunique) && idxrec->indisexclusion; this case, the index accessMethod will be "gist" only? do you think it's necessary to: index->iswithoutoverlaps = (idxrec->indisprimary || idxrec->indisunique) && idxrec->indisexclusion && strcmp(index->accessMethod, "gist") == 0); src/bin/pg_dump/pg_dump.c and src/bin/psql/describe.c should be "if (pset.sversion >= 180000)"? + (This is sometimes called a + temporal key, if the column is a range of dates or timestamps, but + PostgreSQL allows ranges over any base type.) PostgreSQL should be decorated as <productname>PostgreSQL</productname> ? in DefineIndex we have: if (stmt->unique && !stmt->iswithoutoverlaps && !amRoutine->amcanunique) if (stmt->indexIncludingParams != NIL && !amRoutine->amcaninclude) if (numberOfKeyAttributes > 1 && !amRoutine->amcanmulticol) if (exclusion && amRoutine->amgettuple == NULL) maybe we can add: if (stmt->iswithoutoverlaps && strcmp(accessMethodName, "gist") != 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("access method \"%s\" does not support WITHOUT OVERLAPS constraints", accessMethodName))); + /* exclusionOpNames can be non-NIL if we are creating a partition */ + if (iswithoutoverlaps && exclusionOpNames == NIL) + { + indexInfo->ii_ExclusionOps = palloc_array(Oid, nkeycols); + indexInfo->ii_ExclusionProcs = palloc_array(Oid, nkeycols); + indexInfo->ii_ExclusionStrats = palloc_array(uint16, nkeycols); + } the comment is not 100% correct, i think. creating a partition, "create table like INCLUDING ALL", both will go through generateClonedIndexStmt. generateClonedIndexStmt will produce exclusionOpNames if this index supports exclusion constraint.
Here are some fixes based on outstanding feedback (some old some new). Details below: On 3/25/24 17:00, jian he wrote: > hi. > minor issues I found in v33-0003. > there are 29 of {check_amproc_signature?.*false} > only one {check_amproc_signature(procform->amproc, opcintype, true} > is this refactoring really worth it? I could add a separate function, for example check_amproc_retset_signature, but it would require duplicating almost the whole existing function, so a param seems better here. > We also need to refactor gistadjustmembers? You're right, added the new support procs there. > + <row> > + <entry><function>intersect</function></entry> > + <entry>computes intersection with <literal>FOR PORTION OF</literal> > + bounds</entry> > + <entry>13</entry> > + </row> > + <row> > + <entry><function>without_portion</function></entry> > + <entry>computes remaining duration(s) outside > + <literal>FOR PORTION OF</literal> bounds</entry> > + <entry>14</entry> > + </row> > needs to add "(optional)". Added. > +<programlisting> > +Datum > +my_range_intersect(PG_FUNCTION_ARGS) > +{ > + RangeType *r1 = PG_GETARG_RANGE_P(0); > + RangeType *r2 = PG_GETARG_RANGE_P(1); > + TypeCacheEntry *typcache; > + > + /* Different types should be prevented by ANYRANGE matching rules */ > + if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2)) > elog(ERROR, "range > types do not match"); > + > + typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1)); > + > + PG_RETURN_RANGE_P(range_intersect_internal(typcache, r1, r2)); > +} > +</programlisting> > the elog, ERROR indentation is wrong? Fixed. > +/* > + * range_without_portion_internal - Sets outputs and outputn to the ranges > + * remaining and their count (respectively) after subtracting r2 from r1. > + * The array should never contain empty ranges. > + * The outputs will be ordered. We expect that outputs is an array of > + * RangeType pointers, already allocated with two slots. > + */ > +void > +range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1, > + RangeType *r2, RangeType **outputs, int *outputn) > the comments need to be refactored? > there is nothing related to "slot"? > not sure the "array" description is right. > (my understanding is compute rangetype r1 and r2, and save the result to > RangeType **outputs. Changed "slots" to "elements". Everything else looks correct to me. > select proisstrict, proname from pg_proc where proname = > 'range_without_portion'; > range_without_portion is strict. > but > select range_without_portion(NULL::int4range, int4range(11, 20,'[]')); > return zero rows. > Is this the expected behavior? Returning zero rows is correct if the function is never called (which is what strict does). I see other strict retset functions, e.g. json_array_elements. That also returns zero rows if you say SELECT json_array_elements(NULL); On 4/14/24 17:00, jian he wrote: > for unique index, primary key: > ii_ExclusionOps, ii_UniqueOps is enough to distinguish this index > support without overlaps, > we don't need another ii_HasWithoutOverlaps? > (i didn't test it though) I think it is worth having something named. But also ii_Exclusion is not set in index_concurrently_create_copy, so inferring when we have WITHOUT OVERLAPS will not work in that case. > ON CONFLICT DO NOTHING > ON CONFLICT (id, valid_at) DO NOTHING > ON CONFLICT ON CONSTRAINT temporal_rng_pk DO NOTHING > I am confused by the test. > here temporal_rng only has one primary key, ON CONFLICT only deals with it. > I thought these three are the same thing? They all have somewhat different code paths in infer_arbiter_indexes, and they mean different things. I recall when I first started dealing with empty ranges several of these test cases caught different bugs (as well as the DO UPDATE cases). On 8/5/24 19:02, jian he wrote: > void > ExecWithoutOverlapsNotEmpty(Relation rel, Datum attval, Oid typtype, > Oid atttypid); > > should this just be a static function? > I am not so sure. Changed. In a previous version I was calling this from two places, but I'm not anymore. > Oid typtype > should be > char typtype > ? Oops, you're right! Fixed. > errmsg("new row for relation \"%s\" contains empty > WITHOUT OVERLAPS value", > we already have Form_pg_attribute via "TupleDesc tupdesc = > RelationGetDescr(heap);" > we can make the error message be: > errmsg("cannot be empty range value for WITHOUT > OVERLAPS column \"%s\" in relation \"%s\", colname, > RelationGetRelationName(rel)) Yes, it's nicer to report the column name. Changed. > elog(ERROR, "Got unknown type for WITHOUT OVERLAPS column: %d", atttypid); > people will wonder if domain over range works or not. but currently > not, better error message would be: > elog(ERROR, "WITHOUT OVERLAPS column \"%s\" is not a range > or multirange type ", colname); > This part is unlikely to be reachable, so I don't have a strong opinion on it. Likewise. > + if (!found) > + column = NULL; > this part no need? > because if not found, the column would be last element in ColumnDef > type list columns We can later set `found` to true from inheritance (or it being a system column), and then `column` is set but wrong. So setting `column` to null seems generally clearer. But concretely, I use `column` below to give me the type (which I otherwise don't have in CREATE TABLE), so I can forbid types other than range and multirange. > also the following change also make sense: > > + if (!OidIsValid(typid) && column) > + typid = typenameTypeId(NULL, column->typeName); This is because in CREATE TABLE I need to get the type from the `column` variable. > I am confused with this change? > you found out the typid,but didn't using this information, should it be > + if (strcmp(attname, key) == 0) > + { > + typid = attr->atttypid; > + found = true; > + break; > + } Yes. Actually that is in the PERIOD patch file, but it should be in Forbid-empty-ranges. Moved. > so the failing error message be same for the following two cases: > CREATE TABLE t1 (id int4range,valid_at tsrange,b text, > CONSTRAINT temporal_rng_pk PRIMARY KEY (id, b WITHOUT OVERLAPS) > ); > > CREATE TABLE t1 (id int4range,valid_at tsrange,b text); > alter table t1 add CONSTRAINT temporal_rng_pk PRIMARY KEY (id, b > WITHOUT OVERLAPS); I think the same error message is the right thing to do here. It looks like that's what we're doing. If I've misunderstand what you want, can you clarify? On 8/6/24 07:50, jian he wrote: > in generateClonedIndexStmt > index->iswithoutoverlaps = (idxrec->indisprimary || > idxrec->indisunique) && idxrec->indisexclusion; > this case, the index accessMethod will be "gist" only? > > do you think it's necessary to: > index->iswithoutoverlaps = (idxrec->indisprimary || > idxrec->indisunique) && idxrec->indisexclusion > && strcmp(index->accessMethod, "gist") == 0); This doesn't seem necessary, and maybe we'll support non-gist someday, when this condition would be misleading. > src/bin/pg_dump/pg_dump.c and src/bin/psql/describe.c > should be "if (pset.sversion >= 180000)"? Ah, thanks. Changing these from 170000 also landed in the wrong patch file. Fixed. > + (This is sometimes called a > + temporal key, if the column is a range of dates or timestamps, but > + PostgreSQL allows ranges over any base type.) > > PostgreSQL should be decorated as > <productname>PostgreSQL</productname> Done. > in DefineIndex we have: > if (stmt->unique && !stmt->iswithoutoverlaps && !amRoutine->amcanunique) > if (stmt->indexIncludingParams != NIL && !amRoutine->amcaninclude) > if (numberOfKeyAttributes > 1 && !amRoutine->amcanmulticol) > if (exclusion && amRoutine->amgettuple == NULL) > > maybe we can add: > if (stmt->iswithoutoverlaps && strcmp(accessMethodName, "gist") != 0) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("access method \"%s\" does not support WITHOUT > OVERLAPS constraints", > accessMethodName))); Okay. > + /* exclusionOpNames can be non-NIL if we are creating a partition */ > + if (iswithoutoverlaps && exclusionOpNames == NIL) > + { > + indexInfo->ii_ExclusionOps = palloc_array(Oid, nkeycols); > + indexInfo->ii_ExclusionProcs = palloc_array(Oid, nkeycols); > + indexInfo->ii_ExclusionStrats = palloc_array(uint16, nkeycols); > + } > the comment is not 100% correct, i think. > creating a partition, "create table like INCLUDING ALL", both will go > through generateClonedIndexStmt. > generateClonedIndexStmt will produce exclusionOpNames if this index > supports exclusion constraint. I think the comment is correct, but non-NIL is a confusing double negative, and it's not clear that the comment is giving the motivation for the second half of the condition. I re-wrote it to be more clear. I also adjusted the `if` to avoid parsing operator names when not needed. Rebased to e56ccc8e42. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
- v39-0001-Add-stratnum-GiST-support-function.patch
- v39-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
- v39-0003-Forbid-empty-ranges-multiranges-in-WITHOUT-OVERL.patch
- v39-0004-Add-temporal-FOREIGN-KEY-contraints.patch
- v39-0005-Add-support-funcs-for-FOR-PORTION-OF.patch
- v39-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
- v39-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch
- v39-0008-Add-PERIODs.patch
On 07.08.24 22:54, Paul Jungwirth wrote: > Here are some fixes based on outstanding feedback (some old some new). I have studied your patches v39-0001 through v39-0004, which correspond to what had been reverted plus the new empty range check plus various minor fixes. This looks good to me now, so I propose to go ahead with that. Btw., in your 0003 you point out that this prevents using the WITHOUT OVERLAPS functionality for non-range types. But I think this could be accomplished by adding an "is empty" callback as a support function or something like that. I'm not suggesting to do that here, but it might be worth leaving a comment about that possibility.
On Thu, Sep 5, 2024 at 5:09 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 07.08.24 22:54, Paul Jungwirth wrote: > > Here are some fixes based on outstanding feedback (some old some new). > > I have studied your patches v39-0001 through v39-0004, which correspond > to what had been reverted plus the new empty range check plus various > minor fixes. This looks good to me now, so I propose to go ahead with that. Sounds good. Thanks! > Btw., in your 0003 you point out that this prevents using the WITHOUT > OVERLAPS functionality for non-range types. But I think this could be > accomplished by adding an "is empty" callback as a support function or > something like that. I'm not suggesting to do that here, but it might > be worth leaving a comment about that possibility. Yes, I was thinking the same. Agreed as well: it should be a follow-up patch, not needed for the base functionality. If we wanted a more generic name it could be "canWithoutOverlap" instead of "[!]isempty", but even "isempty" is probably still completely accurate. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Mon, Feb 12, 2024 at 3:55 AM Peter Eisentraut <peter@eisentraut.org> wrote: > Have you checked that the generated queries can use indexes and have > suitable performance? Do you have example execution plans maybe? This took longer than expected, but I wrote a long blog post about it here: https://illuminatedcomputing.com/posts/2024/09/benchmarking-temporal-foreign-keys/ The short answer is that yes we use the index, and the query plan is reasonable. I compared performance against two alternate implementations, and range_agg was fastest most of the time. When you have a lot of invalid FK checks, the implementation in Snodgrass's book wins, because it can short-circuit the plan and return a false result without executing most of it. But that seems like an unusual situation, and we should optimize for mostly-valid FK checks instead. There are some more experiments I'd like to do (see the end of that post), but for now I plan to prioritize getting the FOR PORTION OF patch ready to commit. But if there is anything you'd like to know more urgently, let me know. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 05.09.24 14:09, Peter Eisentraut wrote: > On 07.08.24 22:54, Paul Jungwirth wrote: >> Here are some fixes based on outstanding feedback (some old some new). > > I have studied your patches v39-0001 through v39-0004, which correspond > to what had been reverted plus the new empty range check plus various > minor fixes. This looks good to me now, so I propose to go ahead with > that. > > Btw., in your 0003 you point out that this prevents using the WITHOUT > OVERLAPS functionality for non-range types. But I think this could be > accomplished by adding an "is empty" callback as a support function or > something like that. I'm not suggesting to do that here, but it might > be worth leaving a comment about that possibility. I have committed these, as explained here. I look forward to an updated patch set from you to review the "FOR PORTION OF" patches next.
Foreign key violation errors are incorrectly raised in a few cases for a temporal foreign key with default ON UPDATE NO ACTION. Test is based on the commited v39 patches (used a snapshot version of PG18 devel available from PGDG). If there exists a single referencing row for a foreign key (with default ON UPDATE NO ACTION) with a range such as: c d |----------| and a single row in the referenced table, and the referenced row's range is updated as in one of the following cases: a b c d e f X>>>>>>>>>>>|==============================| ERROR 1: [a,f) updated to [b,f) or |==============================|<<<<<<<<<<<X [a,f) updated to [a,e) |==================|<<<<<<<<<<<<<<< ERROR 2: [b,) updated to [b,e) X>>>>>>>>>>>|================================== ERROR 3: [a,) updated to [b,) then an error is incorrectly raised (also, if the referencing range is [c,) instead of [c,d), then the last case also fails). See SQL-code below for how to reproduce the errors. --- CREATE TABLE temporal_rng ( id int4range, valid_at daterange, CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ); CREATE TABLE temporal_fk_rng2rng ( id int4range, valid_at daterange, parent_id int4range, CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS), CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng ); -- ERROR 1 INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2018-01-01', '2018-03-01')); INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[2,3)', daterange('2018-01-15', '2018-02-01'), '[1,2)'); -- ERROR: update or delete on table "temporal_rng" violates foreign key constraint "temporal_fk_rng2rng_fk" on table "temporal_fk_rng2rng" -- DETAIL: Key (id, valid_at)=([1,2), [2018-01-01,2018-03-01)) is still referenced from table "temporal_fk_rng2rng". UPDATE temporal_rng SET valid_at = daterange('2018-01-05', '2018-03-01') WHERE id = '[1,2)' AND valid_at = daterange('2018-01-01', '2018-03-01'); -- ERROR: update or delete on table "temporal_rng" violates foreign key constraint "temporal_fk_rng2rng_fk" on table "temporal_fk_rng2rng" -- DETAIL: Key (id, valid_at)=([1,2), [2018-01-01,2018-03-01)) is still referenced from table "temporal_fk_rng2rng". UPDATE temporal_rng SET valid_at = daterange('2018-01-01', '2018-02-15') WHERE id = '[1,2)' AND valid_at = daterange('2018-01-01', '2018-03-01'); -- ERROR 2 TRUNCATE temporal_rng, temporal_fk_rng2rng; INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2018-01-05', NULL)); INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[2,3)', daterange('2018-01-15', '2018-02-01'), '[1,2)'); -- ERROR: update or delete on table "temporal_rng" violates foreign key constraint "temporal_fk_rng2rng_fk" on table "temporal_fk_rng2rng" -- DETAIL: Key (id, valid_at)=([1,2), [2018-01-05,)) is still referenced from table "temporal_fk_rng2rng". UPDATE temporal_rng SET valid_at = daterange('2018-01-05', '2018-02-15') WHERE id = '[1,2)' AND valid_at = daterange('2018-01-05', NULL); -- ERROR 3 TRUNCATE temporal_rng, temporal_fk_rng2rng; INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)', daterange('2018-01-01', NULL)); INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES ('[2,3)', daterange('2018-01-15', '2018-02-01'), '[1,2)'); -- ERROR: update or delete on table "temporal_rng" violates foreign key constraint "temporal_fk_rng2rng_fk" on table "temporal_fk_rng2rng" -- DETAIL: Key (id, valid_at)=([1,2), [2018-01-01,)) is still referenced from table "temporal_fk_rng2rng". UPDATE temporal_rng SET valid_at = daterange('2018-01-05', NULL) WHERE id = '[1,2)' AND valid_at = daterange('2018-01-01', NULL); --- I think the problem is the check in ri_restrict: SELECT 1 FROM [ONLY] <fktable> x WHERE $1 = fkatt1 [AND ...] FOR KEY SHARE OF x it will be performed in the NO ACTION case when ri_Check_Pk_Match returns false, and it'll then incorrectly assume that the presence of a referencing row in the <fktable> is an error. However, ri_Check_Pk_Match only tests wheter a temporal primary key's old range is contained by the multirange that includes its new updated range. If that's true, then all references are necessarily still valid. However, even if it is not contained, all references can still be valid. So, only testing for the presence of a referencing row is not enough. For example, for ERROR1, the range [a,f) is updated to [b,f): a b c d f X>>>>>>>>>>>|==============================| Clearly the old range: a c d f |==========================================| is no longer contained by (the multirange returned by range_agg of) the new range: b c d f |==============================| So ri_Check_Pk_Match returns false. Though the row in the referencing table: c d |----------| only specifies the range [c,d), so the temporal referential integrity still holds. However, the ri_restrict test will find a row in the referencing table and because of that raise an error. In the temporal NO ACTION case something similar to this (though with appropriate locks) could perhaps be tested in ri_restrict (when ri_Check_Pk_Match returns false): SELECT 1 FROM (SELECT range_agg(pkperiodatt) AS r FROM <pktable> WHERE pkatt1 = $1 [AND ...] AND pkperiodatt && $n) AS pktable, (SELECT fkperiodatt AS r FROM <fktable> WHERE fkatt1 = $1 [AND ...] AND fkperiodatt && $n) AS fktable WHERE NOT fktable.r <@ pktable.r /Sam
On 11/4/24 13:16, Sam Gabrielsson wrote: > Foreign key violation errors are incorrectly raised in a few cases for a temporal foreign key with > default ON UPDATE NO ACTION. Test is based on the commited v39 patches (used a snapshot version of > PG18 devel available from PGDG). Thank you for the report! I confirmed that this is a problem. In ri_restrict we fail if any fk records still match the being-changed pk, but for temporal if you're merely shrinking the pk range, fk references could still wind up being valid (if you're only shrinking it a little). So we need to do more work. > In the temporal NO ACTION case something similar to this (though with appropriate locks) could > perhaps be tested in ri_restrict (when ri_Check_Pk_Match returns false): > > SELECT 1 > FROM (SELECT range_agg(pkperiodatt) AS r > FROM <pktable> > WHERE pkatt1 = $1 [AND ...] > AND pkperiodatt && $n) AS pktable, > (SELECT fkperiodatt AS r > FROM <fktable> > WHERE fkatt1 = $1 [AND ...] > AND fkperiodatt && $n) AS fktable > WHERE NOT fktable.r <@ pktable.r This solution looks like it will work to me. Basically: find FKs that still match the PK, but only fail if they are no longer covered. IIRC for RESTRICT it is *correct* to reject the change, so we would want to keep the old SQL there, and only update it for NOACTION. I'll work on a fix and submit another set of patches. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
I have committed the documentation patches v43-0001-Add-WITHOUT-OVERLAPS-and-PERIOD-to-ALTER-TABLE-r.patch v43-0002-Update-conexclop-docs-for-WITHOUT-OVERLAPS.patch For the logical replication fixes v43-0003-Fix-logical-replication-for-temporal-tables.patch can you summarize what the issues currently are? Is it currently broken, or just not working as well as it could? AFAICT, there might be two separate issues. One is that you can't use a temporal index as replica identity, because ALTER TABLE rejects it. The other is that a subscriber fails to make use of a replica identity index, because it uses the wrong strategy numbers. This conditional is really hard to understand: + /* + * The AM must support uniqueness, and the index must in fact be unique. + * If we have a WITHOUT OVERLAPS constraint (identified by uniqueness + + * exclusion), we can use that too. + */ + if ((!indexRel->rd_indam->amcanunique || + !indexRel->rd_index->indisunique) && + !(indexRel->rd_index->indisunique && indexRel->rd_index->indisexclusion)) Why can we have a indisunique index when the AM is not amcanunique? Are we using the fields wrong? - return get_equal_strategy_number_for_am(am); + /* For GiST indexes we need to ask the opclass what strategy number to use. */ + if (am == GIST_AM_OID) + return GistTranslateStratnum(opclass, RTEqualStrategyNumber); + else + return get_equal_strategy_number_for_am(am); This code should probably be pushed into get_equal_strategy_number_for_am(). That function already has a switch based on index AM OIDs. Also, there are other callers of get_equal_strategy_number_for_am(), which also might want to become aware of GiST support. For the new test file, remember to add it to src/test/subscription/meson.build. Also, maybe add a introductory comment in the test file to describe generally what it's trying to test.
A quick comment on the patch v43-0005-Add-UPDATE-DELETE-FOR-PORTION-OF.patch regarding the code in transformForPortionOfClause() and the additions you made to lsyscache.c: What you are doing is taking a type OID and a function OID and then converting them back to name and namespace and then building a node and then feeding that node back through the parse analysis transformation. This all seems quite fishy and cumbersome. I think instead of building a FuncCall and transforming it, try to build a FuncExpr directly. Then you wouldn't need these new helper functions, which would also reduce the surface area of your patch. Additional mini-comment: #include "utils/rangetypes.h" in src/include/nodes/execnodes.h appears to be unnecessary (but it is then required in src/backend/commands/trigger.c).
I committed a few fixes in this area today. Has everything here been addressed? On 16.08.24 04:12, jian he wrote: > On Thu, Aug 8, 2024 at 4:54 AM Paul Jungwirth > <pj@illuminatedcomputing.com> wrote: >> >> Rebased to e56ccc8e42. > > I only applied to 0001-0003. > in create_table.sgml, I saw the WITHOUT OVERLAPS change is mainly in > table_constraint. > but we didn't touch alter_table.sgml. > Do we also need to change alter_table.sgml correspondingly? > > > + if (constraint->without_overlaps) > + { > + /* > + * This enforces that there is at least one equality column > + * besides the WITHOUT OVERLAPS columns. This is per SQL > + * standard. XXX Do we need this? > + */ > + if (list_length(constraint->keys) < 2) > + ereport(ERROR, > + errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("constraint using WITHOUT OVERLAPS needs at least two columns")); > + > + /* WITHOUT OVERLAPS requires a GiST index */ > + index->accessMethod = "gist"; > + } > if Constraint->conname is not NULL, we can > + errmsg("constraint \"%s\" using WITHOUT OVERLAPS needs at least two > columns")); > > "XXX Do we need this?" > I think currently we need this, otherwise the following create_table > synopsis will not be correct. > UNIQUE [ NULLS [ NOT ] DISTINCT ] ( column_name [, ... ] [, > column_name WITHOUT OVERLAPS ] ) > PRIMARY KEY ( column_name [, ... ] [, column_name WITHOUT OVERLAPS ] ) > > > we add a column in catalog-pg-constraint. > do we need change column conexclop, > "If an exclusion constraint, list of the per-column exclusion operators" > but currently, primary key, unique constraint both have valid conexclop. > > > +static void > +ExecWithoutOverlapsNotEmpty(Relation rel, NameData attname, Datum > attval, char typtype, Oid atttypid) > +{ > + bool isempty; > + RangeType *r; > + MultirangeType *mr; > + > + switch (typtype) > + { > + case TYPTYPE_RANGE: > + r = DatumGetRangeTypeP(attval); > + isempty = RangeIsEmpty(r); > + break; > + case TYPTYPE_MULTIRANGE: > + mr = DatumGetMultirangeTypeP(attval); > + isempty = MultirangeIsEmpty(mr); > + break; > + default: > + elog(ERROR, "WITHOUT OVERLAPS column \"%s\" is not a range or multirange", > + NameStr(attname)); > + } > + > + /* Report a CHECK_VIOLATION */ > + if (isempty) > + ereport(ERROR, > + (errcode(ERRCODE_CHECK_VIOLATION), > + errmsg("empty WITHOUT OVERLAPS value found in column \"%s\" in > relation \"%s\"", > + NameStr(attname), RelationGetRelationName(rel)))); > +} > I think in the default branch, you need at least set the isempty > value, otherwise maybe there will be a compiler warning > because later your use isempty, but via default branch is value undefined? > > > + /* > + * If this is a WITHOUT OVERLAPS constraint, > + * we must also forbid empty ranges/multiranges. > + * This must happen before we look for NULLs below, > + * or a UNIQUE constraint could insert an empty > + * range along with a NULL scalar part. > + */ > + if (indexInfo->ii_WithoutOverlaps) > + { > + ExecWithoutOverlapsNotEmpty(heap, att->attname, > + } > previously we found out that if this happens later, then it won't work. > but this comment didn't explain why this must have happened earlier. > I didn't dig deep enough to find out why. > but explaining it would be very helpful. > > > I think some tests are duplicated, so I did the refactoring.
On 11/13/24 02:11, Peter Eisentraut wrote: > I have committed the documentation patches > > v43-0001-Add-WITHOUT-OVERLAPS-and-PERIOD-to-ALTER-TABLE-r.patch > v43-0002-Update-conexclop-docs-for-WITHOUT-OVERLAPS.patch Thanks! > For the logical replication fixes > > v43-0003-Fix-logical-replication-for-temporal-tables.patch > > can you summarize what the issues currently are? Is it currently broken, or just not working as > well as it could? > > AFAICT, there might be two separate issues. One is that you can't use a temporal index as replica > identity, because ALTER TABLE rejects it. The other is that a subscriber fails to make use of a > replica identity index, because it uses the wrong strategy numbers. Correct, there are two issues this commit fixes: On the publisher side: You can use REPLICA IDENTITY DEFAULT with a temporal PK/UNIQUE index. There is no validation step, and sending the changes works fine. But REPLICA IDENTITY USING INDEX fails because the validation step rejects the non-btree index. Then on the subscriber side, we are not applying changes correctly, because we assume the strategy numbers are btree numbers. > This conditional is really hard to understand: > > + /* > + * The AM must support uniqueness, and the index must in fact be unique. > + * If we have a WITHOUT OVERLAPS constraint (identified by uniqueness + > + * exclusion), we can use that too. > + */ > + if ((!indexRel->rd_indam->amcanunique || > + !indexRel->rd_index->indisunique) && > + !(indexRel->rd_index->indisunique && indexRel->rd_index->indisexclusion)) > > Why can we have a indisunique index when the AM is not amcanunique? Are we using the fields wrong? > > - return get_equal_strategy_number_for_am(am); > + /* For GiST indexes we need to ask the opclass what strategy number to use. */ > + if (am == GIST_AM_OID) > + return GistTranslateStratnum(opclass, RTEqualStrategyNumber); > + else > + return get_equal_strategy_number_for_am(am); > > This code should probably be pushed into get_equal_strategy_number_for_am(). That function already > has a switch based on index AM OIDs. Also, there are other callers of > get_equal_strategy_number_for_am(), which also might want to become aware of GiST support. > > For the new test file, remember to add it to src/test/subscription/meson.build. > > Also, maybe add a introductory comment in the test file to describe generally what it's trying to test. Okay, I'll make these changes and re-send the patch. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Just sharing my progress here since it's been a week: On 11/6/24 17:03, Paul Jungwirth wrote: > On 11/4/24 13:16, Sam Gabrielsson wrote: >> Foreign key violation errors are incorrectly raised in a few cases for a temporal foreign key with >> default ON UPDATE NO ACTION. Test is based on the commited v39 patches (used a snapshot version of >> PG18 devel available from PGDG). > > Thank you for the report! I confirmed that this is a problem. In ri_restrict we fail if any fk > records still match the being-changed pk, but for temporal if you're merely shrinking the pk range, > fk references could still wind up being valid (if you're only shrinking it a little). So we need to > do more work. I'm nearly done with a patch for this. I'll try to wrap it up today and get it sent this evening. > IIRC for RESTRICT it is *correct* to reject the change, so we would want to keep the old SQL there, > and only update it for NOACTION. I realized there are problems with the RESTRICT case also. I've got a fix written for that too, but it needs some tidying up. I'll submit both patches together. The RESTRICT case needs to find the *lost* time span(s) (because it might not be the whole thing) and check for references to those. To do that, it calls our without_portion support function. That function was intended to support FOR PORTION OF, but it happens to be exactly what we need here. So I'm reordering the patches a bit and adjusting the documentation there. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Wed, 13 Nov 2024, 11:11 Peter Eisentraut, <peter@eisentraut.org> wrote: > This conditional is really hard to understand: > > + /* > + * The AM must support uniqueness, and the index must in fact be > unique. > + * If we have a WITHOUT OVERLAPS constraint (identified by > uniqueness + > + * exclusion), we can use that too. > + */ > + if ((!indexRel->rd_indam->amcanunique || > + !indexRel->rd_index->indisunique) && > + !(indexRel->rd_index->indisunique && > indexRel->rd_index->indisexclusion)) > > Why can we have a indisunique index when the AM is not amcanunique? Are > we using the fields wrong? I called this issue out earlier this year: amcanunique implies btree-style uniqueness, and allows CREATE UNIQUE INDEX. However, that IndexAmRoutine field seems to be ignored for indexes that are created to back temporal unique constraints, which thus get indrel->indisunique = true. This causes interesting issues when you look at the index catalog and errors: there are indexes with indisunique using gist, but CREATE UNIQUE INDEX USING gist (...) throws the nice "access method "gist" does not support unique indexes" error. It'd be nice if there was a better internal API to describe what types of uniqueness each index supports, so CREATE UNIQUE INDEX could work with gist for WITHOUT OVERLAPS, and these WITHOUT OVERLAPS unique indexes could be attached to primary keys without taking O(>= tablesize) of effort. Kind regards, Matthias van de Meent Neon (https://neon.tech)
On 11/14/24 10:14, Matthias van de Meent wrote: > I called this issue out earlier this year: amcanunique implies > btree-style uniqueness, and allows CREATE UNIQUE INDEX. However, that > IndexAmRoutine field seems to be ignored for indexes that are created > to back temporal unique constraints, which thus get > indrel->indisunique = true. This causes interesting issues when you > look at the index catalog and errors: there are indexes with > indisunique using gist, but CREATE UNIQUE INDEX USING gist (...) > throws the nice "access method "gist" does not support unique indexes" > error. > > It'd be nice if there was a better internal API to describe what types > of uniqueness each index supports, so CREATE UNIQUE INDEX could work > with gist for WITHOUT OVERLAPS, and these WITHOUT OVERLAPS unique > indexes could be attached to primary keys without taking O(>= > tablesize) of effort. I think the issue is that a specific GiST opclass may support uniqueness (if it defines the support proc to communicate its equality stratnum), but the AM as a whole doesn't support uniqueness. So amcanunique is false. We could make the stratnum support proc required, but that seems like it would break too many extensions. Or maybe we could change canunique to an opclass-level property? Probably we could support `CREATE UNIQUE INDEX USING gist (...)` *if* the opclasses involved have stratnum support funcs. I'm happy to write/assist a patch for that. It would use exclusion constraints behind the scenes, as we're doing with temporal PKs/UNIQUEs. But that still wouldn't give you CONCURRENTLY (which is the main motivation), because we don't support creating exclusion constraints concurrently yet. The work in 2014 on REINDEX CONCURRENTLY originally tried to support exclusion constraints, but it didn't make it into the final version. I'm not sure what needs to be done there. But one thing that seems tricky is that the *index* doesn't know about the exclusion rules; it's all in pg_constraint. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On 14.11.24 18:25, Paul Jungwirth wrote: > On 11/13/24 02:11, Peter Eisentraut wrote: >> I have committed the documentation patches >> >> v43-0001-Add-WITHOUT-OVERLAPS-and-PERIOD-to-ALTER-TABLE-r.patch >> v43-0002-Update-conexclop-docs-for-WITHOUT-OVERLAPS.patch > > Thanks! > >> For the logical replication fixes >> >> v43-0003-Fix-logical-replication-for-temporal-tables.patch >> >> can you summarize what the issues currently are? Is it currently >> broken, or just not working as well as it could? >> >> AFAICT, there might be two separate issues. One is that you can't use >> a temporal index as replica identity, because ALTER TABLE rejects it. >> The other is that a subscriber fails to make use of a replica identity >> index, because it uses the wrong strategy numbers. > > Correct, there are two issues this commit fixes: > > On the publisher side: You can use REPLICA IDENTITY DEFAULT with a > temporal PK/UNIQUE index. There is no validation step, and sending the > changes works fine. But REPLICA IDENTITY USING INDEX fails because the > validation step rejects the non-btree index. Ok, I have committed the fix for this, and I'll continue working through the rest of the patches. > Then on the subscriber side, we are not applying changes correctly, > because we assume the strategy numbers are btree numbers.
On Thu, Nov 21, 2024 at 01:56:36PM +0100, Peter Eisentraut wrote: > Ok, I have committed the fix for this, and I'll continue working through the > rest of the patches. nitpick: I think this one needs a pgindent [0]. [0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=koel&dt=2024-11-21%2013%3A49%3A02 -- nathan
On 12/4/24 03:15, Peter Eisentraut wrote: > I did some more work on this approach, with the attached patches resulting. This is essentially > what I'm describing above, which in turn is a variation of your patch v45-0001-Fix-logical- > replication-for-temporal-tables.patch, with your tests added at the end. > > I also did some more work on IsIndexUsableForReplicaIdentityFull() to make the various claims in the > comments reflected by actual code. With all of this, it can now also use gist indexes on the > subscriber side in cases of REPLICA IDENTITY FULL. This isn't immediately visible in the tests, but > you can see that the tests are using it internally by adding debugging elogs or something like that. > > Altogether, I think this fixes the original problem of temporal keys not being handled properly in > logical replication subscribers, and it makes things less hardcoded around btree and hash in general. > > Please review. Thanks! Between U.S. Thanksgiving and the flu I haven't had a chance to look at your previous patches, but I should have an evening or two this week to review what you've got here. I appreciate your keeping the ball moving! I also have some notes about the Index AMI work I will write up on that thread. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
On Wed, 4 Dec 2024 at 16:45, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 26.11.24 13:18, Peter Eisentraut wrote: > > I think this is the right idea, but after digging around a bit more, I > > think more could/should be done. > > > > After these changes, the difference between > > get_equal_strategy_number_for_am() and get_equal_strategy_number() is > > kind of pointless. We should really just use > > get_equal_strategy_number() for all purposes. > > > > But then you have the problem that IsIndexUsableForReplicaIdentityFull() > > doesn't have the opclass IDs available in the IndexInfo structure. You > > appear to have worked around that by writing > > > > + if (get_equal_strategy_number_for_am(indexInfo->ii_Am, InvalidOid) > > == InvalidStrategy) > > > > which I suppose will have the same ultimate result as before that patch, > > but it seems kind of incomplete. > > > > I figure this could all be simpler if > > IsIndexUsableForReplicaIdentityFull() used the index relcache entry > > directly instead of going the detour through IndexInfo. Then we have > > all the information available, and this should ultimately all work > > properly for suitable GiST indexes as well. > > > > I have attached three patches that show how that could be done. (This > > would work in conjunction with your new tests. (Although now we could > > also test GiST with replica identity full?)) > > > > The comment block for IsIndexUsableForReplicaIdentityFull() makes a > > bunch of claims that are not all explicitly supported by the code. The > > code doesn't actually check the AM, this is all only done indirectly via > > other checks. The second point (about tuples_equal()) appears to be > > slightly wrong, because while you need an equals operator from the type > > cache, that shouldn't prevent you from also using a different index AM > > than btree or hash for the replica identity index. And the stuff about > > amgettuple, if that is important, why is it only checked for assert builds? > > I did some more work on this approach, with the attached patches > resulting. This is essentially what I'm describing above, which in turn > is a variation of your patch > v45-0001-Fix-logical-replication-for-temporal-tables.patch, with your > tests added at the end. > > I also did some more work on IsIndexUsableForReplicaIdentityFull() to > make the various claims in the comments reflected by actual code. With > all of this, it can now also use gist indexes on the subscriber side in > cases of REPLICA IDENTITY FULL. This isn't immediately visible in the > tests, but you can see that the tests are using it internally by adding > debugging elogs or something like that. > > Altogether, I think this fixes the original problem of temporal keys not > being handled properly in logical replication subscribers, and it makes > things less hardcoded around btree and hash in general. > > Please review. I started having look at the patch, here are some comments while doing the initial review: 1) wait_for_catchup and data validation can be done after insertion itself, update and delete error validation can happen later: +($result, $stdout, $stderr) = $node_publisher->psql('postgres', + "DELETE FROM temporal_no_key WHERE id = '[3,4)'"); +is( $stderr, + qq(psql:<stdin>:1: ERROR: cannot delete from table "temporal_no_key" because it does not have a replica identity and publishes deletes +HINT: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.), + "can't DELETE temporal_no_key DEFAULT"); + +$node_publisher->wait_for_catchup('sub1'); + +$result = $node_subscriber->safe_psql('postgres', + "SELECT * FROM temporal_no_key ORDER BY id, valid_at"); +is( $result, qq{[1,2)|[2000-01-01,2010-01-01)|a +[2,3)|[2000-01-01,2010-01-01)|a +[3,4)|[2000-01-01,2010-01-01)|a +[4,5)|[2000-01-01,2010-01-01)|a}, 'replicated temporal_no_key DEFAULT'); 2) Copyright need not mention "2021-" diff --git a/src/test/subscription/t/034_temporal.pl b/src/test/subscription/t/034_temporal.pl new file mode 100644 index 00000000000..0f501f1cee8 --- /dev/null +++ b/src/test/subscription/t/034_temporal.pl @@ -0,0 +1,673 @@ + +# Copyright (c) 2021-2024, PostgreSQL Global Development Group + +# Logical replication tests for temporal tables +# 3) This statement seems very long in a single line, could we split it into multiple lines: @@ -844,6 +842,15 @@ IsIndexUsableForReplicaIdentityFull(Relation idxrel, AttrMap *attrmap) Assert(idxrel->rd_index->indnatts >= 1); + indclass = (oidvector *) DatumGetPointer(SysCacheGetAttrNotNull(INDEXRELID, idxrel->rd_indextuple, Anum_pg_index_indclass)); + + /* Ensure that the index has a valid equal strategy for each key column */ + for (int i = 0; i < idxrel->rd_index->indnkeyatts; i++) + { + if (get_equal_strategy_number(indclass->values[i]) == InvalidStrategy) + return false; + } 4) The commit message had a small typo, should "fulfull" be "fulfill": IsIndexUsableForReplicaIdentityFull() described a number of conditions that a suitable index has to fulfull. But not all of these were 5) temporal_no_key table is not dropped: +$result = $node_subscriber->safe_psql('postgres', + "SELECT * FROM temporal_unique ORDER BY id, valid_at"); +is( $result, qq{[1,2)|[2000-01-01,2010-01-01)|a +[2,3)|[2000-01-01,2010-01-01)|a +[3,4)|[2000-01-01,2010-01-01)|a +[4,5)|[2000-01-01,2010-01-01)|a}, 'replicated temporal_unique NOTHING'); + +# cleanup + +$node_publisher->safe_psql('postgres', "DROP TABLE temporal_pk"); +$node_publisher->safe_psql('postgres', "DROP TABLE temporal_unique"); +$node_publisher->safe_psql('postgres', "DROP PUBLICATION pub1"); +$node_subscriber->safe_psql('postgres', "DROP TABLE temporal_pk"); +$node_subscriber->safe_psql('postgres', "DROP TABLE temporal_unique"); +$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1"); + + +done_testing(); -- 2.47.1 6) Since this is common to first and last test we can have it in a subroutine and use it for both: +# create tables on publisher + +$node_publisher->safe_psql('postgres', + "CREATE TABLE temporal_no_key (id int4range, valid_at daterange, a text)" +); + +$node_publisher->safe_psql('postgres', + "CREATE TABLE temporal_pk (id int4range, valid_at daterange, a text, PRIMARY KEY (id, valid_at WITHOUT OVERLAPS))" +); + +$node_publisher->safe_psql('postgres', + "CREATE TABLE temporal_unique (id int4range NOT NULL, valid_at daterange NOT NULL, a text, UNIQUE (id, valid_at WITHOUT OVERLAPS))" +); + +# create tables on subscriber + +$node_subscriber->safe_psql('postgres', + "CREATE TABLE temporal_no_key (id int4range, valid_at daterange, a text)" +); + +$node_subscriber->safe_psql('postgres', + "CREATE TABLE temporal_pk (id int4range, valid_at daterange, a text, PRIMARY KEY (id, valid_at WITHOUT OVERLAPS))" +); + +$node_subscriber->safe_psql('postgres', + "CREATE TABLE temporal_unique (id int4range NOT NULL, valid_at daterange NOT NULL, a text, UNIQUE (id, valid_at WITHOUT OVERLAPS))" +); + 7) Similarly the drop tables can be moved to a subroutine and used as it is being used in multiple tests: +# cleanup + +$node_publisher->safe_psql('postgres', "DROP TABLE temporal_no_key"); +$node_publisher->safe_psql('postgres', "DROP TABLE temporal_pk"); +$node_publisher->safe_psql('postgres', "DROP TABLE temporal_unique"); +$node_publisher->safe_psql('postgres', "DROP PUBLICATION pub1"); +$node_subscriber->safe_psql('postgres', "DROP TABLE temporal_no_key"); +$node_subscriber->safe_psql('postgres', "DROP TABLE temporal_pk"); +$node_subscriber->safe_psql('postgres', "DROP TABLE temporal_unique"); +$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1"); Regards, Vignesh
On 07.12.24 20:29, Paul Jungwirth wrote: > These five patches all look good to me. > > Note that my tests already include a section for REPLICA IDENTITY FULL, > which passed. But the subscriber was using a SeqScan to look up tuples > to update. > > Here are the steps (mostly just because it was confusing for me at > first): First in FindUsableIndexForReplicaIdentityFull, we would call > IsIndexUsableForReplicaIdentityFull, get back false, and decide there > was no index to use. Then in FindReplTupleInLocalRel, localidxoid was 0, > so we woudln't call IsIndexUsableForReplicaIdentityFull at all. > > After applying the five patches, I can see that we choose the index and > call IsIndexUsableForReplicaIdentityFull from both sites. This should > make applying changes a lot faster. I have committed these. I will continue with reviewing v45-0002 and following now.