Thread: Proposal - temporal contrib module

Proposal - temporal contrib module

From
Scott Bailey
Date:
I would like to add a temporal contrib module. The most important piece 
would be adding a period data type and some support functions. Jeff 
Davis and I both have temporal projects on pgFoundry, and we've been 
collaborating for a while. But there are some areas we'd like to get 
some advice on.

Disk format - A period can be represented as [closed-closed], 
(open-open), [closed-open) or (open-closed] intervals. Right now we 
convert these to the most common form, closed-open and store as two 
timestamptz's.

Nulls - A common use case for periods is for modeling valid time. Often 
the end point is not known.  For instance, you know when an employee has 
been hired but the termination time typically wouldn't be known ahead of 
time. We can either represent these with a null end time or with 
infinity. But I'm not sure how to deal with them. Obviously we can test 
for containment and overlap. But what about length or set operations?

Non-contiguous Sets - A period defines a contiguous set of time. But 
many times we need to work with non-contiguous sets (work shifts in a 
week, bus schedules, etc).  Right now, I'm using period arrays. But 
period arrays can contain overlapping and adjacent periods. And we have 
no way to indicate that a period array has been coalesced into a 
non-contiguous set. And what indexing strategies could be used with 
non-contiguous sets?

Temporal Keys - We need two types of temporal keys. A primary key, 
exclusion type prevents overlap so someone isn't at two places at the 
same time. And a foreign key, inclusion type so we can check that the 
valid time of a child is contained with in the valid time of the parent. 
Jeff is working on the former, but there is no easy way to do the latter.


There is actually a lot of theory out there but very few 
implementations. Although not an official standard, we try to follow the 
TSQL2 spec pretty closely. Further reading:

Developing Time-Oriented Database Applications - Snodgrass
http://www.cs.arizona.edu/~rts/tdbbook.pdf

TSQL2 spec ftp://ftp.cs.arizona.edu/tsql/tsql2/spec.pdf

Temporal Data and the Relational Model - Date et al
http://books.google.com/books?isbn=1558608559

Dozens of publications
http://timecenter.cs.aau.dk/pub.htm


Regards,

Scott Bailey


Re: Proposal - temporal contrib module

From
Pavel Stehule
Date:
2009/10/29 Scott Bailey <artacus@comcast.net>:
> I would like to add a temporal contrib module. The most important piece
> would be adding a period data type and some support functions. Jeff Davis
> and I both have temporal projects on pgFoundry, and we've been collaborating
> for a while. But there are some areas we'd like to get some advice on.
>

I thing, so it is very good idea. Temporar operation is very common task.

Regards
Pavel Stehule

> Disk format - A period can be represented as [closed-closed], (open-open),
> [closed-open) or (open-closed] intervals. Right now we convert these to the
> most common form, closed-open and store as two timestamptz's.
>
> Nulls - A common use case for periods is for modeling valid time. Often the
> end point is not known.  For instance, you know when an employee has been
> hired but the termination time typically wouldn't be known ahead of time. We
> can either represent these with a null end time or with infinity. But I'm
> not sure how to deal with them. Obviously we can test for containment and
> overlap. But what about length or set operations?
>
> Non-contiguous Sets - A period defines a contiguous set of time. But many
> times we need to work with non-contiguous sets (work shifts in a week, bus
> schedules, etc).  Right now, I'm using period arrays. But period arrays can
> contain overlapping and adjacent periods. And we have no way to indicate
> that a period array has been coalesced into a non-contiguous set. And what
> indexing strategies could be used with non-contiguous sets?
>
> Temporal Keys - We need two types of temporal keys. A primary key, exclusion
> type prevents overlap so someone isn't at two places at the same time. And a
> foreign key, inclusion type so we can check that the valid time of a child
> is contained with in the valid time of the parent. Jeff is working on the
> former, but there is no easy way to do the latter.
>
>
> There is actually a lot of theory out there but very few implementations.
> Although not an official standard, we try to follow the TSQL2 spec pretty
> closely. Further reading:
>
> Developing Time-Oriented Database Applications - Snodgrass
> http://www.cs.arizona.edu/~rts/tdbbook.pdf
>
> TSQL2 spec ftp://ftp.cs.arizona.edu/tsql/tsql2/spec.pdf
>
> Temporal Data and the Relational Model - Date et al
> http://books.google.com/books?isbn=1558608559
>
> Dozens of publications
> http://timecenter.cs.aau.dk/pub.htm
>
>
> Regards,
>
> Scott Bailey
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: Proposal - temporal contrib module

From
Heikki Linnakangas
Date:
Scott Bailey wrote:
> I would like to add a temporal contrib module. The most important piece
> would be adding a period data type and some support functions. Jeff
> Davis and I both have temporal projects on pgFoundry, and we've been
> collaborating for a while.

I presume you're going to need some backend support and possibly new
syntax for some of the operations, right? That seems more urgent to
discuss than the possible inclusion into contrib.

I'm very pleased to see people working on temporal issues, BTW! I used
to work on a database that did a lot of temporal operations, but the
DBMS didn't have any temporal data types or operations so we had to use
a lot of triggers etc. to achieve that, and it didn't perform well.

> Nulls - A common use case for periods is for modeling valid time. Often
> the end point is not known.  For instance, you know when an employee has
> been hired but the termination time typically wouldn't be known ahead of
> time. We can either represent these with a null end time or with
> infinity. But I'm not sure how to deal with them. Obviously we can test
> for containment and overlap. But what about length or set operations?

Hmm. Infinity feels like a better match. The behavior of length and set
operations falls out of that naturally. For example, length of a period
with an infinite beginning or end is infinite. For set operations, for
example the intersection of [123, infinity] and [100, 160] would be
[123, 160].

> Non-contiguous Sets - A period defines a contiguous set of time. But
> many times we need to work with non-contiguous sets (work shifts in a
> week, bus schedules, etc).  Right now, I'm using period arrays. But
> period arrays can contain overlapping and adjacent periods. And we have
> no way to indicate that a period array has been coalesced into a
> non-contiguous set. And what indexing strategies could be used with
> non-contiguous sets?

I'd stick to your current definition that a period is a contiguous set
of time. A non-contiguous set consists of multiple contiguous periods,
so it can be represented as multiple rows in a table.

> Temporal Keys - We need two types of temporal keys. A primary key,
> exclusion type prevents overlap so someone isn't at two places at the
> same time. And a foreign key, inclusion type so we can check that the
> valid time of a child is contained with in the valid time of the parent.
> Jeff is working on the former, but there is no easy way to do the latter.

I'm very excited about this. Foreign keys don't seem that hard, you'll
need foreign key triggers like we have today, but check for "within"
instead of "equal".

> Temporal Data and the Relational Model - Date et al
> http://books.google.com/books?isbn=1558608559

+1 for the approach in this book. I'm not familiar enough with the TSQL2
spec to say whether it follows it.

It should also be kept in mind that although this class of problems are
generally thought of as temporal issues, IOW dealing with time, the same
approach works with ranges of integers or any other datatype with a
well-defined sort order. It would be nice if the temporal data type
would allow that too.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Proposal - temporal contrib module

From
Richard Huxton
Date:
Heikki Linnakangas wrote:
> Scott Bailey wrote:
>> I would like to add a temporal contrib module. 

> I'm very pleased to see people working on temporal issues, BTW! 

Me too - common use-case and difficult to handle without the right
types/operators.

>> Nulls - A common use case for periods is for modeling valid time. Often
>> the end point is not known.  For instance, you know when an employee has
>> been hired but the termination time typically wouldn't be known ahead of
>> time. We can either represent these with a null end time or with
>> infinity. But I'm not sure how to deal with them. Obviously we can test
>> for containment and overlap. But what about length or set operations?
> 
> Hmm. Infinity feels like a better match. The behavior of length and set
> operations falls out of that naturally. For example, length of a period
> with an infinite beginning or end is infinite. For set operations, for
> example the intersection of [123, infinity] and [100, 160] would be
> [123, 160].

There are cases where one time is genuinely unknown, and there we need
a null. For the "until further notice" scenarios, infinity seems the
sensible choice. Where a null is present length is clearly null, and
sets I guess should propagate the nulls. [123,null] intersecting
[100,160] should be [123,null]. That's assuming we've got a guarantee
that from<=to for all periods.

>> Temporal Keys - We need two types of temporal keys. A primary key,
>> exclusion type prevents overlap so someone isn't at two places at the
>> same time. 

You're going to upset a lot of managers if they can't do that ;-)

--  Richard Huxton Archonet Ltd


Re: Proposal - temporal contrib module

From
Jeff Davis
Date:
On Thu, 2009-10-29 at 00:31 -0700, Scott Bailey wrote:
> Nulls - A common use case for periods is for modeling valid time. Often 
> the end point is not known.  For instance, you know when an employee has 
> been hired but the termination time typically wouldn't be known ahead of 
> time. We can either represent these with a null end time or with 
> infinity. But I'm not sure how to deal with them. Obviously we can test 
> for containment and overlap. But what about length or set operations?

I think we should allow the database designer flexibility here. For
instance, there are good arguments for using separate relations for
things with a known begin time and no known end, and things with a
definite begin and end time. However, infinity also makes sense,
particularly in cases where something that is true can never again be
false.

NULL support is a little stranger. We can allow it by extending the data
representation, but the semantics might not match what people expect
from NULL. Should it be treated like a NULL in an array, or a NULL in a
record value, or what? If we allow NULL on one side of a period, that
may require some backend support, depending on the semantics.

My feeling right now is to not provide a way for one side of the period
to be NULL, but if we come up with clear enough semantics maybe it's
useful. I tend to think it would cause more confusion than anything.

For any kind of built-in audit log functionality (transaction-time
based), I don't see any utility for NULL.

> Temporal Keys - We need two types of temporal keys. A primary key, 
> exclusion type prevents overlap so someone isn't at two places at the 
> same time. And a foreign key, inclusion type so we can check that the 
> valid time of a child is contained with in the valid time of the parent. 
> Jeff is working on the former, but there is no easy way to do the latter.

I believe temporal foreign keys can be implemented the same way foreign
keys are now (except with "contained by" rather than "equals"). We
should provide some support to make that easier, but I don't think
that's a major issue.

Regards,Jeff Davis



Re: Proposal - temporal contrib module

From
Jeff Davis
Date:
On Thu, 2009-10-29 at 09:37 +0000, Richard Huxton wrote:
> There are cases where one time is genuinely unknown, and there we need
> a null.

The semantics of a period with one side NULL require a more clear
definition. I don't personally see a lot of utility in trying to support
NULL semantics, but if we want to support it, it needs to be clearly
defined.

Does TSQL-2 offer any guidance here?

> That's assuming we've got a guarantee
> that from<=to for all periods.

Of course. Except that means that a NULL on one side of a period is a
little less unknown than other kinds of NULLs ;)

Regards,Jeff Davis



Re: Proposal - temporal contrib module

From
Scott Bailey
Date:
>> I would like to add a temporal contrib module. The most important piece
>> would be adding a period data type and some support functions. Jeff
>> Davis and I both have temporal projects on pgFoundry, and we've been
>> collaborating for a while.
> 
> I presume you're going to need some backend support and possibly new
> syntax for some of the operations, right? That seems more urgent to
> discuss than the possible inclusion into contrib.

Jeff Davis is already working on solving these issues for 8.5. But 
rather than wait until 8.6 or later to get a period data type added to 
core, I felt it was important to get the period type out in front of 
people to start using and testing. Plus we wanted to gauge interest from 
the community. Should we forge ahead and try to become the first general 
purpose database with support for temporal databases? Or should we wait 
another 20 years and see if an official specification materializes?

> I'm very pleased to see people working on temporal issues, BTW! I used
> to work on a database that did a lot of temporal operations, but the
> DBMS didn't have any temporal data types or operations so we had to use
> a lot of triggers etc. to achieve that, and it didn't perform well.
> 
>> Nulls - A common use case for periods is for modeling valid time. Often
>> the end point is not known.  For instance, you know when an employee has
>> been hired but the termination time typically wouldn't be known ahead of
>> time. We can either represent these with a null end time or with
>> infinity. But I'm not sure how to deal with them. Obviously we can test
>> for containment and overlap. But what about length or set operations?
> 
> Hmm. Infinity feels like a better match. The behavior of length and set
> operations falls out of that naturally. For example, length of a period
> with an infinite beginning or end is infinite. For set operations, for
> example the intersection of [123, infinity] and [100, 160] would be
> [123, 160].

Two different answers from two respondents. And is there a conceptual 
difference between NULL and +/- infinity? Nothing lasts forever. So when 
would it make sense to use one verses the other? So in the example I gave

>> Non-contiguous Sets - A period defines a contiguous set of time. But
>> many times we need to work with non-contiguous sets (work shifts in a
>> week, bus schedules, etc).  Right now, I'm using period arrays. But
>> period arrays can contain overlapping and adjacent periods. And we have
>> no way to indicate that a period array has been coalesced into a
>> non-contiguous set. And what indexing strategies could be used with
>> non-contiguous sets?
> 
> I'd stick to your current definition that a period is a contiguous set
> of time. A non-contiguous set consists of multiple contiguous periods,
> so it can be represented as multiple rows in a table.

That's pretty much my sentiments exactly. But Jeff wanted to be sure 
that we didn't make a decision now that would limit it's usefulness later.

>> Temporal Keys - We need two types of temporal keys. A primary key,
>> exclusion type prevents overlap so someone isn't at two places at the
>> same time. And a foreign key, inclusion type so we can check that the
>> valid time of a child is contained with in the valid time of the parent.
>> Jeff is working on the former, but there is no easy way to do the latter.
> 
> I'm very excited about this. Foreign keys don't seem that hard, you'll
> need foreign key triggers like we have today, but check for "within"
> instead of "equal".
> 
>> Temporal Data and the Relational Model - Date et al
>> http://books.google.com/books?isbn=1558608559
> 
> +1 for the approach in this book. I'm not familiar enough with the TSQL2
> spec to say whether it follows it.
> 
> It should also be kept in mind that although this class of problems are
> generally thought of as temporal issues, IOW dealing with time, the same
> approach works with ranges of integers or any other datatype with a
> well-defined sort order. It would be nice if the temporal data type
> would allow that too.

The period concept relates very closely to mathematical intervals. (In 
fact, I would argue that the SQL interval should actually be named 
period and the SQL period should be named interval so they matched their 
mathematical counterparts.) My primary concern is timestamp intervals, 
but I see no reason the exact same concepts wouldn't apply to intervals 
of integers, floats, dates, etc.

And actually there is a fair amount of overlap with spatial. The main 
difference being the number of dimensions. But the concepts of overlap, 
containment, and set operations like union and intersection are the same.

Scott Bailey


Re: Proposal - temporal contrib module

From
Jeff Davis
Date:
On Thu, 2009-10-29 at 10:54 +0200, Heikki Linnakangas wrote:
> I presume you're going to need some backend support and possibly new
> syntax for some of the operations, right? That seems more urgent to
> discuss than the possible inclusion into contrib.

There are various areas that need work inside the backend:

* Semantics 1. Allow temporal keys -- this is accomplished via operator exclusion     constraints, which I hope can be
committedin the next commit fest. 2. Allow postgres to understand types like PERIOD, so that it can    find important
operatorslike "@>", "&&", "<<".
 

* Syntax Sugar 1. Temporal keys 2. Temporal FKs 3. Temporal Join 4. Creating a simple audit log 5. Possible PERIOD
constructorsyntax sugar?
 

* Performance 1. Modified merge join

And I believe the rest can be done using the existing type
infrastructure, i.e. as a contrib module. I think it makes a lot of
sense to discuss and develop the backend changes and PERIOD data type in
parallel.

> Hmm. Infinity feels like a better match. The behavior of length and set
> operations falls out of that naturally. For example, length of a period
> with an infinite beginning or end is infinite. For set operations, for
> example the intersection of [123, infinity] and [100, 160] would be
> [123, 160].

I agree. If TSQL-2 addresses NULL semantics clearly enough, we might
want to allow it. I think it will just cause confusion, however.

> I'd stick to your current definition that a period is a contiguous set
> of time. A non-contiguous set consists of multiple contiguous periods,
> so it can be represented as multiple rows in a table.

I think there is a lot of value in non-contiguous sets, but PERIOD is a
good start.

> It should also be kept in mind that although this class of problems are
> generally thought of as temporal issues, IOW dealing with time, the same
> approach works with ranges of integers or any other datatype with a
> well-defined sort order.

Agreed.

> It would be nice if the temporal data type
> would allow that too.

If I understand what you're saying, you're alluding to a type where you
can do things like: RANGE(timestamptz)
which would be equivalent to a PERIOD.

Typmod almost provides enough flexibility, but it can't store a full
OID, so we'd need to get creative. There are probably some other issues
here as well, because the current type system isn't really designed for
this kind of thing. Do you have any ideas or guidance here?

Regards,Jeff Davis




Re: Proposal - temporal contrib module

From
Dimitri Fontaine
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> If I understand what you're saying, you're alluding to a type where you
> can do things like:
>   RANGE(timestamptz)
> which would be equivalent to a PERIOD.

The RANGE approach sounds so much better from here, as I have the
prefix_range example nearby... it'd be nice if it could benefit.

> Typmod almost provides enough flexibility, but it can't store a full
> OID, so we'd need to get creative. There are probably some other issues
> here as well, because the current type system isn't really designed for
> this kind of thing. Do you have any ideas or guidance here?

When talking about the extension facility it has been said PostGIS is
being creative for lacking of typmod capabilities. It could mean it's
past time for a typmod reality check?

Regards,
-- 
dim


Re: Proposal - temporal contrib module

From
Chris Browne
Date:
artacus@comcast.net (Scott Bailey) writes:
> Disk format - A period can be represented as [closed-closed],
> (open-open), [closed-open) or (open-closed] intervals. Right now we
> convert these to the most common form, closed-open and store as two
> timestamptz's.

I mentioned this at the 2009 PGCon, and it was pointed out to me that
PostgreSQL already has geometric types which already offer many of the
semantics and operators that are likely to be desired.
 <http://www.postgresql.org/docs/8.4/static/functions-geometry.html>

If direct analogy may be applied so that portions of the functionality
are drawn from previously-accepted geometric contributions, it's likely
to be a bit easier to get this into 8.5 (or so!)

FYI, I *love* the idea of having the temporal types and operators.  I'm
a lot less certain about the merits of PK/FK constraints - it is a lot
less obvious what forms of constraints will be able to be applied to
particular applications.
-- 
"I really only meant to point out how nice InterOp was for someone who
doesn't  have the weight of the  Pentagon behind him.   I really don't
imagine that the Air Force will ever be  able to operate like a small,
competitive enterprise like GM or IBM." -- Kent England


Re: Proposal - temporal contrib module

From
Jeff Davis
Date:
On Wed, 2009-11-04 at 12:08 -0500, Chris Browne wrote:
> I'm
> a lot less certain about the merits of PK/FK constraints - it is a lot
> less obvious what forms of constraints will be able to be applied to
> particular applications.

Can you clarify, a little?

A temporal key just means "non-overlapping periods of time", and that
has a very clear meaning with respect to scheduling.

Regards,Jeff Davis



Re: Proposal - temporal contrib module

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Nov 04, 2009 at 09:47:03AM -0800, Jeff Davis wrote:
> On Wed, 2009-11-04 at 12:08 -0500, Chris Browne wrote:
> > I'm
> > a lot less certain about the merits of PK/FK constraints - it is a lot
> > less obvious what forms of constraints will be able to be applied to
> > particular applications.
> 
> Can you clarify, a little?
> 
> A temporal key just means "non-overlapping periods of time", and that
> has a very clear meaning with respect to scheduling.

That would be the main application, I imagine. But "contained in" (and
its reverse) seem good candidates as well. As might be "befoe" and
"after", strictly or not.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFK8o3GBcgs9XrR2kYRAjWOAJ4lrIjJxf1UAN4tyXaVhGtlLGt8SgCeKJje
AvC6Kce87NU3nyOz82ccXaQ=
=Lrk8
-----END PGP SIGNATURE-----