Thread: Proposal: temporal extension "period" data type

Proposal: temporal extension "period" data type

From
Jeff Davis
Date:
I maintain the project "Temporal PostgreSQL" on pgfoundry:

http://pgfoundry.org/projects/temporal

Reference docs available here:
http://temporal.projects.postgresql.org/reference.html

This project provides a data type called "period" (formerly t_interval).
This type is an interval in the mathematical sense, that is, it has a
beginning time and end time (timestamptz, to be specific). This is not
like the SQL type "interval" which is not anchored at any specific point
in time.

This is valuable for many applications that need to be able to
manipulate time. This functionality is sometimes called "bi-temporal",
because people sometimes represent it as two timestamps in two columns.

Representing the information in one column of "periods" has important
advantages such as:* It's much simpler and less error-prone to write queries expressing
"overlaps", "contains", "intersection", etc.* We can effectively index this type with GiST. There's no effective
way to index two separate timestamp columns.

During PGCon 2008, several people encouraged me to submit the code for
inclusion in the core.

The advantages of including it in core are that some features can't be
done from pgfoundry, such as:* temporal foreign keys* temporal joins* syntax like "ALTER TABLE ... ADD LOG".
On the other hand, I don't currently have proposals for any of those
things. And including in core has the usual drawbacks, like waiting for
a new PostgreSQL release just to get some improvement for the data type.

1. Should it be included in core, or remain on pgfoundry?

2. If it should be included in core, I'd like to know if any changes
should be made to the API, available operators, or the names of anything
(see the reference docs). The current name of the type is "period" to
avoid confusion with SQL's misnamed "interval" type. The operators are
mostly self-explanatory, but I'm open to suggestion for better names for
those, too. 

3. I'd like to get some help with details like the optimal GiST
picksplit function to use, and selectivity functions, and analyze
functions. 

4. Should we replace the undocumented type "tinterval"?

Regards,Jeff Davis



Re: Proposal: temporal extension "period" data type

From
Andrew Dunstan
Date:

Jeff Davis wrote:
> During PGCon 2008, several people encouraged me to submit the code for
> inclusion in the core.
>
> The advantages of including it in core are that some features can't be
> done from pgfoundry, such as:
>  * temporal foreign keys
>  * temporal joins
>   

Maybe I'm being dense, but how does inclusion in core help you do these 
things? Or, more precisely, how does non-inclusion in core prevent them.

>  * syntax like "ALTER TABLE ... ADD LOG".
>   

What does this mean?


cheers

andrew



Re: Proposal: temporal extension "period" data type

From
Greg Smith
Date:
On Mon, 26 May 2008, Jeff Davis wrote:

> And including in core has the usual drawbacks, like waiting for a new 
> PostgreSQL release just to get some improvement for the data type.

How stable has the core code implementation been recently?  If major 
changes to it are unlikely anyway because it's matured, this downside is a 
lot less important.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Proposal: temporal extension "period" data type

From
Jeff Davis
Date:
On Mon, 2008-05-26 at 06:49 -0400, Andrew Dunstan wrote:
> >  * temporal foreign keys
> >  * temporal joins
> >   
> 
> Maybe I'm being dense, but how does inclusion in core help you do these 
> things? Or, more precisely, how does non-inclusion in core prevent them.

Temporal FK's can be implemented with triggers, but can't be
implemented with the current FK syntax and mechanism. Any extensions to
the FK syntax to support this would need to be done in postgres. 

Temporal joins can be implemented by joining with the "overlaps"
operator and selecting the intersection of the two period columns.
Anything added would be syntactic sugar.

> 
> >  * syntax like "ALTER TABLE ... ADD LOG".
> >   
> 
> What does this mean?
> 

The general idea is the same as the idea behind pgfoundry projects like:

http://pgfoundry.org/projects/tablelog/ and
http://pgfoundry.org/projects/aupg/

The idea is to record all deleted or updated tuples in a table so that
you can see a historical state of the table (i.e. a snapshot query) or
see some changes to the data over time.

Having some kind of syntax to back up a feature like that could be
useful for simplifying the creation of a log. Also, people may be more
comfortable using some built-in syntax if they are creating an audit log
for security purposes.

Regards,Jeff Davis



Re: Proposal: temporal extension "period" data type

From
Hannu Krosing
Date:
On Mon, 2008-05-26 at 10:59 -0700, Jeff Davis wrote:
> On Mon, 2008-05-26 at 06:49 -0400, Andrew Dunstan wrote:
> > >  * temporal foreign keys
> > >  * temporal joins
> > >   
> > 
> > Maybe I'm being dense, but how does inclusion in core help you do these 
> > things? Or, more precisely, how does non-inclusion in core prevent them.
> 
> Temporal FK's can be implemented with triggers, but can't be
> implemented with the current FK syntax and mechanism. Any extensions to
> the FK syntax to support this would need to be done in postgres. 
> 
> Temporal joins can be implemented by joining with the "overlaps"
> operator and selecting the intersection of the two period columns.
> Anything added would be syntactic sugar.

Either overlaps or contains seem valid join operators

Will there be UNIQUE (non-overlapping) indexes for period ?

--------------
Hannu




Re: Proposal: temporal extension "period" data type

From
"Heikki Linnakangas"
Date:
Jeff Davis wrote:
> The advantages of including it in core are that some features can't be
> done from pgfoundry, such as:
>  * temporal foreign keys
>  * temporal joins
>  * syntax like "ALTER TABLE ... ADD LOG".
> On the other hand, I don't currently have proposals for any of those
> things. And including in core has the usual drawbacks, like waiting for
> a new PostgreSQL release just to get some improvement for the data type.
> 
> 1. Should it be included in core, or remain on pgfoundry?

I think it's a good candidate for inclusion in core. Obviously it's not 
ready for that yet, but it does seem that a complete solution, with 
joins etc., can't be achieved purely with user-defined operators.

> 2. If it should be included in core, I'd like to know if any changes
> should be made to the API, available operators, or the names of anything
> (see the reference docs). The current name of the type is "period" to
> avoid confusion with SQL's misnamed "interval" type. The operators are
> mostly self-explanatory, but I'm open to suggestion for better names for
> those, too. 

Instead of one datatype, I'd like to see the capability of defining 
period types based on any datatype with suitable opclass. Periods of 
dates and floats, for example, would be just as useful as periods of 
timestamps.

> 4. Should we replace the undocumented type "tinterval"?

I didn't realize we have one. It seems that tinterval provides operators 
like overlaps and contains as well. What's the difference between what 
you're working on and tinterval?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Proposal: temporal extension "period" data type

From
Jeff Davis
Date:
On Mon, 2008-05-26 at 19:11 +0100, Heikki Linnakangas wrote:
> I think it's a good candidate for inclusion in core. Obviously it's not 
> ready for that yet, but it does seem that a complete solution, with 
> joins etc., can't be achieved purely with user-defined operators.

Joins can be acheived with user-defined operators by using overlaps and
intersection. However, it's ugly enough that not many people would
actually use them. 

> > 2. If it should be included in core, I'd like to know if any changes
> > should be made to the API, available operators, or the names of anything
> > (see the reference docs). The current name of the type is "period" to
> > avoid confusion with SQL's misnamed "interval" type. The operators are
> > mostly self-explanatory, but I'm open to suggestion for better names for
> > those, too. 
> 
> Instead of one datatype, I'd like to see the capability of defining 
> period types based on any datatype with suitable opclass. Periods of 
> dates and floats, for example, would be just as useful as periods of 
> timestamps.

I agree, although we couldn't use the word "period" to describe floats
(at least, that's not how I understand the word).

Do we have any mechanism for that kind of a type generator? Typmod might
work, although that seems awkward. 

> > 4. Should we replace the undocumented type "tinterval"?
> 
> I didn't realize we have one. It seems that tinterval provides operators 
> like overlaps and contains as well. What's the difference between what 
> you're working on and tinterval?

"period" is based on timestamptz, "tinterval" is based on "abstime".
Also, my type can be inclusive or exclusive as needed, I couldn't find a
way to make "tinterval" exclusive on either side.

Regards,Jeff Davis






Re: Proposal: temporal extension "period" data type

From
Jeff Davis
Date:
On Mon, 2008-05-26 at 21:10 +0300, Hannu Krosing wrote:
> Either overlaps or contains seem valid join operators
> 

"Overlaps" is the join operator as defined in [1]. In the case where you
have a temporal FK, they are the same. But yes, either is useful.

> Will there be UNIQUE (non-overlapping) indexes for period ?
> 

I discussed this with several people and I believe this is possible with
some modifications to GiST. I will be discussing it further and I will
try to put together a proposal.

Regards,Jeff Davis

1. _Temporal Data and the Relational Model_, C.J. Date, Hugh Darwen,
Nikos A. Lorentzos, p. 149.



Re: Proposal: temporal extension "period" data type

From
Jeff Davis
Date:
On Mon, 2008-05-26 at 08:49 -0400, Greg Smith wrote:
> On Mon, 26 May 2008, Jeff Davis wrote:
> 
> > And including in core has the usual drawbacks, like waiting for a new 
> > PostgreSQL release just to get some improvement for the data type.
> 
> How stable has the core code implementation been recently?  If major 
> changes to it are unlikely anyway because it's matured, this downside is a 
> lot less important.

Aside from name changes, the code has been fairly static. However, I
expect that others will have suggestions about improving the picksplit
algorithm, selectivity functions, etc.

Regards,Jeff Davis



Re: Proposal: temporal extension "period" data type

From
Dimitri Fontaine
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Le 26 mai 08 à 20:36, Jeff Davis a écrit :
>> Instead of one datatype, I'd like to see the capability of defining
>> period types based on any datatype with suitable opclass. Periods of
>> dates and floats, for example, would be just as useful as periods of
>> timestamps.
>
> I agree, although we couldn't use the word "period" to describe floats
> (at least, that's not how I understand the word).

In order to get a good enough implementation of text based prefix
searches (where prefix_column @> 'literal') I coded a prefix_range
datatype, which would be the text period here...  http://pgfoundry.org/projects/prefix

So I'd propose "range" as the generic "period" name, and let's extend
it to text too.
And while at it maybe we could index suffix searches in addition to
prefix searches...

Some tests show that even without specific index support, prefix @>
text is quicker than text like 'prefix%', so maybe it would be
possible to have some generic way to use ranges under the hood to
speed-up some specific queries...
Applied to timestamps and "period", this would allow using period
contains operator for handling between, e.g. Dunno if this part is a
good idea, but I nevertheless felt like raising it.

HTH,
- --
dim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkg7JNsACgkQZnqmTbDljfBGFgCeJ43Q4XQPPl/mEX+AuJmGeMs+
e0sAnR1eD5/B6J+8VADK4Vuj3USXdZuv
=WhNz
-----END PGP SIGNATURE-----