Thread: Proposal: temporal extension "period" data type
I maintain the project "Temporal PostgreSQL" on pgfoundry: http://pgfoundry.org/projects/temporal Reference docs available here: http://temporal.projects.postgresql.org/reference.html This project provides a data type called "period" (formerly t_interval). This type is an interval in the mathematical sense, that is, it has a beginning time and end time (timestamptz, to be specific). This is not like the SQL type "interval" which is not anchored at any specific point in time. This is valuable for many applications that need to be able to manipulate time. This functionality is sometimes called "bi-temporal", because people sometimes represent it as two timestamps in two columns. Representing the information in one column of "periods" has important advantages such as:* It's much simpler and less error-prone to write queries expressing "overlaps", "contains", "intersection", etc.* We can effectively index this type with GiST. There's no effective way to index two separate timestamp columns. During PGCon 2008, several people encouraged me to submit the code for inclusion in the core. The advantages of including it in core are that some features can't be done from pgfoundry, such as:* temporal foreign keys* temporal joins* syntax like "ALTER TABLE ... ADD LOG". On the other hand, I don't currently have proposals for any of those things. And including in core has the usual drawbacks, like waiting for a new PostgreSQL release just to get some improvement for the data type. 1. Should it be included in core, or remain on pgfoundry? 2. If it should be included in core, I'd like to know if any changes should be made to the API, available operators, or the names of anything (see the reference docs). The current name of the type is "period" to avoid confusion with SQL's misnamed "interval" type. The operators are mostly self-explanatory, but I'm open to suggestion for better names for those, too. 3. I'd like to get some help with details like the optimal GiST picksplit function to use, and selectivity functions, and analyze functions. 4. Should we replace the undocumented type "tinterval"? Regards,Jeff Davis
Jeff Davis wrote: > During PGCon 2008, several people encouraged me to submit the code for > inclusion in the core. > > The advantages of including it in core are that some features can't be > done from pgfoundry, such as: > * temporal foreign keys > * temporal joins > Maybe I'm being dense, but how does inclusion in core help you do these things? Or, more precisely, how does non-inclusion in core prevent them. > * syntax like "ALTER TABLE ... ADD LOG". > What does this mean? cheers andrew
On Mon, 26 May 2008, Jeff Davis wrote: > And including in core has the usual drawbacks, like waiting for a new > PostgreSQL release just to get some improvement for the data type. How stable has the core code implementation been recently? If major changes to it are unlikely anyway because it's matured, this downside is a lot less important. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, 2008-05-26 at 06:49 -0400, Andrew Dunstan wrote: > > * temporal foreign keys > > * temporal joins > > > > Maybe I'm being dense, but how does inclusion in core help you do these > things? Or, more precisely, how does non-inclusion in core prevent them. Temporal FK's can be implemented with triggers, but can't be implemented with the current FK syntax and mechanism. Any extensions to the FK syntax to support this would need to be done in postgres. Temporal joins can be implemented by joining with the "overlaps" operator and selecting the intersection of the two period columns. Anything added would be syntactic sugar. > > > * syntax like "ALTER TABLE ... ADD LOG". > > > > What does this mean? > The general idea is the same as the idea behind pgfoundry projects like: http://pgfoundry.org/projects/tablelog/ and http://pgfoundry.org/projects/aupg/ The idea is to record all deleted or updated tuples in a table so that you can see a historical state of the table (i.e. a snapshot query) or see some changes to the data over time. Having some kind of syntax to back up a feature like that could be useful for simplifying the creation of a log. Also, people may be more comfortable using some built-in syntax if they are creating an audit log for security purposes. Regards,Jeff Davis
On Mon, 2008-05-26 at 10:59 -0700, Jeff Davis wrote: > On Mon, 2008-05-26 at 06:49 -0400, Andrew Dunstan wrote: > > > * temporal foreign keys > > > * temporal joins > > > > > > > Maybe I'm being dense, but how does inclusion in core help you do these > > things? Or, more precisely, how does non-inclusion in core prevent them. > > Temporal FK's can be implemented with triggers, but can't be > implemented with the current FK syntax and mechanism. Any extensions to > the FK syntax to support this would need to be done in postgres. > > Temporal joins can be implemented by joining with the "overlaps" > operator and selecting the intersection of the two period columns. > Anything added would be syntactic sugar. Either overlaps or contains seem valid join operators Will there be UNIQUE (non-overlapping) indexes for period ? -------------- Hannu
Jeff Davis wrote: > The advantages of including it in core are that some features can't be > done from pgfoundry, such as: > * temporal foreign keys > * temporal joins > * syntax like "ALTER TABLE ... ADD LOG". > On the other hand, I don't currently have proposals for any of those > things. And including in core has the usual drawbacks, like waiting for > a new PostgreSQL release just to get some improvement for the data type. > > 1. Should it be included in core, or remain on pgfoundry? I think it's a good candidate for inclusion in core. Obviously it's not ready for that yet, but it does seem that a complete solution, with joins etc., can't be achieved purely with user-defined operators. > 2. If it should be included in core, I'd like to know if any changes > should be made to the API, available operators, or the names of anything > (see the reference docs). The current name of the type is "period" to > avoid confusion with SQL's misnamed "interval" type. The operators are > mostly self-explanatory, but I'm open to suggestion for better names for > those, too. Instead of one datatype, I'd like to see the capability of defining period types based on any datatype with suitable opclass. Periods of dates and floats, for example, would be just as useful as periods of timestamps. > 4. Should we replace the undocumented type "tinterval"? I didn't realize we have one. It seems that tinterval provides operators like overlaps and contains as well. What's the difference between what you're working on and tinterval? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, 2008-05-26 at 19:11 +0100, Heikki Linnakangas wrote: > I think it's a good candidate for inclusion in core. Obviously it's not > ready for that yet, but it does seem that a complete solution, with > joins etc., can't be achieved purely with user-defined operators. Joins can be acheived with user-defined operators by using overlaps and intersection. However, it's ugly enough that not many people would actually use them. > > 2. If it should be included in core, I'd like to know if any changes > > should be made to the API, available operators, or the names of anything > > (see the reference docs). The current name of the type is "period" to > > avoid confusion with SQL's misnamed "interval" type. The operators are > > mostly self-explanatory, but I'm open to suggestion for better names for > > those, too. > > Instead of one datatype, I'd like to see the capability of defining > period types based on any datatype with suitable opclass. Periods of > dates and floats, for example, would be just as useful as periods of > timestamps. I agree, although we couldn't use the word "period" to describe floats (at least, that's not how I understand the word). Do we have any mechanism for that kind of a type generator? Typmod might work, although that seems awkward. > > 4. Should we replace the undocumented type "tinterval"? > > I didn't realize we have one. It seems that tinterval provides operators > like overlaps and contains as well. What's the difference between what > you're working on and tinterval? "period" is based on timestamptz, "tinterval" is based on "abstime". Also, my type can be inclusive or exclusive as needed, I couldn't find a way to make "tinterval" exclusive on either side. Regards,Jeff Davis
On Mon, 2008-05-26 at 21:10 +0300, Hannu Krosing wrote: > Either overlaps or contains seem valid join operators > "Overlaps" is the join operator as defined in [1]. In the case where you have a temporal FK, they are the same. But yes, either is useful. > Will there be UNIQUE (non-overlapping) indexes for period ? > I discussed this with several people and I believe this is possible with some modifications to GiST. I will be discussing it further and I will try to put together a proposal. Regards,Jeff Davis 1. _Temporal Data and the Relational Model_, C.J. Date, Hugh Darwen, Nikos A. Lorentzos, p. 149.
On Mon, 2008-05-26 at 08:49 -0400, Greg Smith wrote: > On Mon, 26 May 2008, Jeff Davis wrote: > > > And including in core has the usual drawbacks, like waiting for a new > > PostgreSQL release just to get some improvement for the data type. > > How stable has the core code implementation been recently? If major > changes to it are unlikely anyway because it's matured, this downside is a > lot less important. Aside from name changes, the code has been fairly static. However, I expect that others will have suggestions about improving the picksplit algorithm, selectivity functions, etc. Regards,Jeff Davis
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, Le 26 mai 08 à 20:36, Jeff Davis a écrit : >> Instead of one datatype, I'd like to see the capability of defining >> period types based on any datatype with suitable opclass. Periods of >> dates and floats, for example, would be just as useful as periods of >> timestamps. > > I agree, although we couldn't use the word "period" to describe floats > (at least, that's not how I understand the word). In order to get a good enough implementation of text based prefix searches (where prefix_column @> 'literal') I coded a prefix_range datatype, which would be the text period here... http://pgfoundry.org/projects/prefix So I'd propose "range" as the generic "period" name, and let's extend it to text too. And while at it maybe we could index suffix searches in addition to prefix searches... Some tests show that even without specific index support, prefix @> text is quicker than text like 'prefix%', so maybe it would be possible to have some generic way to use ranges under the hood to speed-up some specific queries... Applied to timestamps and "period", this would allow using period contains operator for handling between, e.g. Dunno if this part is a good idea, but I nevertheless felt like raising it. HTH, - -- dim -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iEYEARECAAYFAkg7JNsACgkQZnqmTbDljfBGFgCeJ43Q4XQPPl/mEX+AuJmGeMs+ e0sAnR1eD5/B6J+8VADK4Vuj3USXdZuv =WhNz -----END PGP SIGNATURE-----