Thread: Ranges for well-ordered types
I've been interested in representing and manipulating time ranges in PostgreSQL, where a time range is defined by a start time and an end time. Time ranges are useful, for example, in representing when some predicate was known valid. Similarly, time ranges can be used to represent "transaction time": the version history of the data itself. This is similar to the "time travel" feature in previous versions of PostgreSQL. (Tables that include both valid time and transaction time information are sometimes called bitemporal tables.) Ranges can also be useful in scheduling applications. The original use case that prompted me to explore this area was tracking what time periods teachers were assigned to schools (a valid- time table). Here's one way to represent this in PostgreSQL: create table teachers__schools_1 ( teacher text not null , school text not null , from_date date not null , to_date date not null , check (from_date<= to_date) , unique (teacher, school, from_date) , unique (teacher, school, to_date) ); Each row of this table represents the time range (from from_date to to_date) during which a teacher was assigned to a particular school. (Teachers can be assigned to more than one school at a time.) The check constraint guarantees that [from_date, to_date] represents a valid closed-closed interval (where the end points are included in the range). Two unique constraints are necessary to guarantee uniqueness for each row. In my use case, it would also be desirable to apply constraints to prevent overlapping and ensure continuity-- that teachers are always at some school. This can be done using constraint triggers, though making sure all of the comparisons for four dates (beginning and end points for two ranges) are done properly can be a little daunting and prone to bugs. The problem of representing time ranges in a relational database has been worked on for quite some time. Indeed, the PostgreSQL source still includes a tinterval type, where the beginning and end points are of type abstime, though this type is no longer included in the documentation. Drafts of SQL3 included the PERIOD constructor to represent date, time, and timestamp ranges as part of SQL/Temporal, but this section of the specification was not included in the final SQL:2003 standard. At least two relatively prominent books [1][2] as well as numerous papers have been written on the subject. An aside regarding terminology: In the literature I've read, time ranges are most often referred to as intervals. However, SQL already uses INTERVAL to refer to another distinct type. As mentioned above, SQL3 used the term PERIOD for a constructor to create types with a beginning point and an end point. RANGE is a reserved keyword in SQL: 2003 (related, I believe, to windowed tables). Range also has a distinct meaning in relational calculus. I'm at a bit of a loss as to how to refer to these structures with a beginning and an end point with a term that isn't already reserved in SQL or may be in the future. Suggestions welcome :) Span? Reuse tinterval? For the time being, I'll arbitrarily continue to use range. In the first six months of 2006 alone, I've noted quite a few threads related to time ranges in the various PostgreSQL mailing lists, so it seems this issue is ripe for a solution. Just two weeks ago, Albert Hertroys started work on a vector type[3] , where he rightly notes that time ranges are just an application of a general range (or vector, to use his term) that could just as easily be used with "integers, reals, points, dates, etc". For the past couple of months, I've been working with composite types and PL/pgSQL to define and manipulate date ranges and integer ranges, and see what can be abstracted out to a general range type. In the general case, a particular range value can be represented as two point values. For example, the date range [2006-01-01, 2006-05-31] (using the closed-closed interval representation) is represented by the two date "point" values 2006-01-01 and 2006-05-31. The interval range [3,9] is represented by the two integer "point" values 3 and 9. A range can be formed for any point type, where a point type is any type that's well-ordered:* the range of values is bounded (the number of values in the type is finite)* comparisons are well-defined for any two values, and* for any point p, the next point can be found using a successor function Given a well-ordered "point" type, common questions of ranges can be be answered, such as, for two ranges r1 and r2* Do r1 and r2 overlap?* Does r1 meet r2?* Is the union of r1 and r2 defined,and if so, what is it?* Is the intersection of r1 and r2 defined, and if so, what is it? The way I've thought of implementing ranges in PostgreSQL is through an additional system catalog table (called pg_ordered_type for convenience) that would store the necessary information for each point type:* ordtype: the point type (referencing pg_type.oid)* ordcmpfunc: the comparison function (referencing pg_proc.oid)*ordsucc: the successor function (referencing pg_proc.oid)* ordmin, ordmax the minimum and maximum values forthe point type (not sure how this would be represented) From one point of view, the successor function and minimum and maximum values (and perhaps the comparison function) are intrinsic attributes of the type itself. However, I can see potential usefulness using the same "base" type with different successor functions or different maximum and minimum values. For example, to represent a point type of even numbers, the base type could be integer, and the successor function would "p + 2". (This raises the question of how to restrict the values of the type to even numbers: perhaps using domains would be a solution.) Another example would be using timestamps of different precision: timestamp(6) has a different successor function than timestamp(0). It may be desirable to include in pg_ordered_type a "predecessor" function (returning the point prior to a point p) and a duration function (providing a more efficient way to return the number of points included in the interval than iterating over the successor function). With comparison and the successor function, one can define the 13 predicates (often called Allen operators) describing the relationship between any two range values: equals, before, meets, overlaps, starts, during, finishes, and inverses of the last six. Note that two of the built-in numeric types--real and double- precision--are not well-ordered, and therefore ranges for real and double-precision would not be defined. The primary reason for this is that without a successor function, it's not possible to determine whether or not two range values meet (determined by the meets Allen operator). And without a definition for meets, it's not possible to define continuity--to use the above example, that a teacher must always be assigned to some school. In that respect, the implementation I'm proposing here differs from the vector type proposed by Albert Hertroys. Perhaps there's a way to have a "looser" form of ranges that do not include a successor function, and for those looser ranges, continuity constraints couldn't be defined. However, the issues arising from the inexactness of real and double precision calls into question the usefulness of functions that rely on testing for equality (e.g., equals, meets, starts, finishes, and inverses of the last three). Another issue for a general range implementation is uniqueness. Currently btree indexes are used to enforce uniqueness in PostgreSQL. By somewhat arbitrarily defining a comparison result for the 13 possible predicates, I've been able to define an opclass for ranges so btree indexes can be defined. Note that in defining an opclass, comparision is already required, so perhaps the comparison function could be stored someplace other than the pg_ordered_type table. As for other indexes, I've looked at some of the GiST material and think that it probably can be usefully applied to ranges: for example the ip4r type[4], which defines ranges of IPv4 addresses, uses GiST for indexing. However, GiST is currently well over my head, so this is speculation on my part. Returning to my original example, with a "date_range" type, the table could be defined as: create table teachers__schools_2 ( teacher text not null , school text not null , period date_range not null , primary key (teacher, school, period) ); The original check constraint is handled by the date_range type and the two unique constraints are replaced by a single primary key constraint. Constraints for overlapping and continuity are still handled using constraint triggers, but are easier to implement using functions available to compare ranges rather than handling beginning and end points individually. I believe ranges in PostgreSQL would be useful and satisfy a need expressed in numerous mailing list threads. I hope to do my part in their implementation. I look forward to your feedback. Michael Glaesemann grzm seespotcode net [1] Richard T. Snodgrass, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000. ISBN 1-55860-436-7. Out of print, but available from the author as a PDF. (http://www.cs.arizona.edu/people/rts/tdbbook.pdf) [2] Chris Date, Hugh Darwen, Nikolas Lorentzos, Temporal Data and the Relational Model, Morgan Kaufmann, 2002. ISBN 1-55860-855-9 (http:// www.amazon.com/gp/product/1558608559/) [3] Vector type (Re: [GENERAL] challenging constraint situation - how do I make it) (http://archives.postgresql.org/pgsql-general/2006-05/ msg01279.php) [4] ip4r (http://pgfoundry.org/projects/ip4r/)
On 6/10/06, Michael Glaesemann <grzm@seespotcode.net> wrote:
Returning to my original example, with a "date_range" type, the table
could be defined as:
create table teachers__schools_2
(
teacher text not null
, school text not null
, period date_range not null
, primary key (teacher, school, period)
);
The original check constraint is handled by the date_range type and
the two unique constraints are replaced by a single primary key
constraint. Constraints for overlapping and continuity are still
handled using constraint triggers, but are easier to implement using
functions available to compare ranges rather than handling beginning
and end points individually.
I've done similar date range things by creating a composite type consisting of the lower and upper bounds, and then implementing a btree opclass where the comparator returns 0 if two ranges overlap - this allows a current btree index to enforce non-overlapping ranges, and allows indexed lookup of which range contains a particular value.
Not sure whether this covers your scenario, but it works fairly well for me :)
Ian
"Ian Caulfield" <ian.caulfield@gmail.com> writes: > I've done similar date range things by creating a composite type consisting > of the lower and upper bounds, and then implementing a btree opclass where > the comparator returns 0 if two ranges overlap - this allows a current btree > index to enforce non-overlapping ranges, and allows indexed lookup of which > range contains a particular value. And how hard did you test this? Non-transitive "equality" is certain to confuse btree, leading to wrong answers. regards, tom lane
Ian Caulfield wrote: > > On 6/10/06, *Michael Glaesemann* <grzm@seespotcode.net > <mailto:grzm@seespotcode.net>> wrote: > > Returning to my original example, with a "date_range" type, the table > could be defined as: > > create table teachers__schools_2 > ( > teacher text not null > , school text not null > , period date_range not null > , primary key (teacher, school, period) > ); > > The original check constraint is handled by the date_range type and > the two unique constraints are replaced by a single primary key > constraint. Constraints for overlapping and continuity are still > handled using constraint triggers, but are easier to implement using > functions available to compare ranges rather than handling beginning > and end points individually. > > > I've done similar date range things by creating a composite type > consisting of the lower and upper bounds, and then implementing a > btree opclass where the comparator returns 0 if two ranges overlap - > this allows a current btree index to enforce non-overlapping ranges, > and allows indexed lookup of which range contains a particular value. > > Not sure whether this covers your scenario, but it works fairly well > for me :) Why not define a start_date and end_date to determine range, and then use the date overlap functions in postgresql? Joshua D Drake > > Ian > >
On Jun 10, 2006, at 23:51 , Michael Glaesemann wrote: > A range can be formed for any point type, where a point type is > any type that's well-ordered: > * the range of values is bounded (the number of values in the type > is finite) > * comparisons are well-defined for any two values, and > * for any point p, the next point can be found using a successor > function It was pointed out to me off list that I got my definition of well- ordered wrong. I was confusing the definition of well-ordered with the overall requirements that I was using to define ranges. Well-ordered is just that for any two values a and b of a given type, a < b is defined. That's what I was attempting to get at in the second point above. The added requirements of having the type bounded (which is going to happen on a computer anyway) and having a successor function are still required for the range definition, but not part of the definition of well-orderedness per se. Michael Glaesemann grzm seespotcode net
On Jun 11, 2006, at 0:54 , Ian Caulfield wrote: > I've done similar date range things by creating a composite type > consisting of the lower and upper bounds, and then implementing a > btree opclass where the comparator returns 0 if two ranges overlap > - this allows a current btree index to enforce non-overlapping > ranges, and allows indexed lookup of which range contains a > particular value. As Tom already pointed out, this method leads to problems with btree indexes. I haven't heavily tested my own implementation (below), but it only returns 0 for equality, which is what btree expects. All other possible relationships between two ranges have a well-defined result of -1 or 1. I believe this should be enough to prevent any transitivity issues with btree. Michael Glaesemann grzm seespotcode net create type interval_date as ( _1 point_date , _2 point_date ); comment on type interval_date is 'The internal representation of date intervals, representing the closed-closed ' 'interval [_1,_2]'; create function interval_cmp( interval_date -- $1 i1 , interval_date -- $2 i2 ) returns integer strict immutable security definer language plpgsql as ' declare i1 alias for $1; i2 alias for $2; cmp integer; begin perform check_intervals(i1,i2); cmp := 1; if i1._1 = i2._1 and i1._2 = i2._2 then cmp := 0; else if (i1._2 < i2._2) or (i1._2 =i2._2 and i1._1 > i2._1) then cmp = -1; end if; end if; return cmp; end; ';
On Jun 11, 2006, at 2:34 , Michael Glaesemann wrote: > > On Jun 11, 2006, at 0:54 , Ian Caulfield wrote: > >> I've done similar date range things by creating a composite type >> consisting of the lower and upper bounds, and then implementing a >> btree opclass where the comparator returns 0 if two ranges overlap >> - this allows a current btree index to enforce non-overlapping >> ranges, and allows indexed lookup of which range contains a >> particular value. > > As Tom already pointed out, this method leads to problems with > btree indexes. I haven't heavily tested my own implementation > (below), but it only returns 0 for equality, which is what btree > expects. All other possible relationships between two ranges have a > well-defined result of -1 or 1. I believe this should be enough to > prevent any transitivity issues with btree. Of course, this method doesn't provide the non-overlapping constraint. That still needs to be handled by a constraint trigger. Michael Glaesemann grzm seespotcode net
On Sat, Jun 10, 2006 at 23:51:58 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote: > Each row of this table represents the time range (from from_date to > to_date) during which a teacher was assigned to a particular school. > (Teachers can be assigned to more than one school at a time.) The > check constraint guarantees that [from_date, to_date] represents a > valid closed-closed interval (where the end points are included in > the range). Two unique constraints are necessary to guarantee I think you might want to reconsider your design. It works well for dates because sets of dates are made of of isolated points and such sets are both open and closed. If you are using time, I think it will be more convenient to use a closed, open representation.
On Jun 11, 2006, at 5:15 , Bruno Wolff III wrote: > I think you might want to reconsider your design. It works well for > dates > because sets of dates are made of of isolated points and such sets are > both open and closed. If you are using time, I think it will be > more convenient > to use a closed, open representation. Under design I proposed, closed-closed and closed-open are just two different representations of the same range: to the commonly used notation, the closed-open range [p1, p2) is equivalent to the closed- closed range [p1, next(p2)], where next() is the successor function. I agree than depending on the context, it may be better to use one representation than the other (a budget meeting that lasts from 10:00 until 11:00 meets but doesn't share any points with an early lunch meeting that starts at 11:00). Perhaps there should be probably some to_char functions to format the range in the desired form. Time (and timestamp) is a bit of a issue conceptually. The "default" successor function would depend on the precision of the timestamp. timestamp(0) would have a successor function of + 1 second, while timestamp(3) would have a successor function of + .001 second. In the above example, Monday's budget meeting in Tokyo from 10:00 until 11:00 could be represented with ranges of timestamp(0) with time zone as [2006-06-12 10:00:00+09, 2006-06-12 11:00:00+09) or as [2006-06-12 10:00:00+09, 2006-06-12 10:59:59+09] With timestamp(3) with time zone, that'd be [2006-06-12 10:00:00.000+09, 2006-06-12 11:00:00.000+09) or as [2006-06-12 10:00:00.000+09, 2006-06-12 10:59:59.999+09] Most people would be more comfortable with the first representation of each pair, but the two representations in each pair represent the same range. For a lot of scheduling applications, using timestamps with a precision greater that 0 probably wouldn't be very useful (and when not using integer datetimes, not all that exact). Indeed, some scheduling applications may want a precision of 1 minute, rather than 1 second, or perhaps a precision of 15 minutes, or even an hour. I see this as a limitation of the timestamp type, and perhaps a workaround could be found using check constraints and more sophisticated successor functions. For example, a first cut of a successor function for a timestamp with precision of 1 hour might use + 3600 seconds, but the difference in seconds between the top of any two hours may not necessarily be 3600 seconds in some corner cases when the calendar has changed. In those cases, the successor function would need to be sure to return the next hour, rather than the previous hour + 3600 seconds. (Perhaps the calendar has never made a change where this would be a problem, but for some arbitrary timestamp precision, for example 1 day, this could be true. I haven't done enough research yet to determine how much of a problem this is. In those cases it might be better to use dates than timestamps.) With time zones and daylight saving time, this becomes even more interesting, especially for time zone offsets that aren't integral hours (e.g., South Australia Standard Time +9:30, Iran Time +3:30, India Time +5:30). A 1 hour precision requirement would need to include the applicable time zone. There's been previous discussion of including such time zone information in the timestamp value, but as far as I know, no work has been done in that direction yet. These are interesting questions, and improvements in timestamp can make ranges even more convenient. I still see utility in ranges using the current timestamp implementation as well. Michael Glaesemann grzm seespotcode net
On Sun, Jun 11, 2006 at 10:18:11AM +0900, Michael Glaesemann wrote: > > On Jun 11, 2006, at 5:15 , Bruno Wolff III wrote: > > >I think you might want to reconsider your design. It works well for > >dates > >because sets of dates are made of of isolated points and such sets are > >both open and closed. If you are using time, I think it will be > >more convenient > >to use a closed, open representation. > > Under design I proposed, closed-closed and closed-open are just two > different representations of the same range: to the commonly used > notation, the closed-open range [p1, p2) is equivalent to the closed- > closed range [p1, next(p2)], where next() is the successor function. Why try messing aronud with a successor function when you can just use < instead of <= ? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Sun, Jun 11, 2006 at 10:18:11 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote: > > Time (and timestamp) is a bit of a issue conceptually. The "default" > successor function would depend on the precision of the timestamp. And in the ideal case it doesn't exist. That is why I think a closed, open interval is a better way to go.
On Jun 11, 2006, at 10:57 , Jim C. Nasby wrote: > On Sun, Jun 11, 2006 at 10:18:11AM +0900, Michael Glaesemann wrote: >> >> Under design I proposed, closed-closed and closed-open are just two >> different representations of the same range: to the commonly used >> notation, the closed-open range [p1, p2) is equivalent to the >> closed- >> closed range [p1, next(p2)], where next() is the successor function. > > Why try messing aronud with a successor function when you can just > use < > instead of <= ? If I understand you correctly, you're pointing out that the constraints for a valid range in closed-closed and closed-open representation are different: for a valid closed-open range r1 [a1, b1), a1 < b1 for a valid closed-closed range r2 [a2, b2], a2 <= b2 That's different from being able to show equivalence between two ranges in different representations, e.g., r1 = r2 iff a1 = a2 and b1 = next(b2). As Bruno pointed out earlier, in some cases, a closed- open representation is desirable, and I think that in others, such as date ranges, a closed-closed representation is useful. Another place where I'd use a closed-closed representation would be for describing score ranges for grades (e.g., 70-79 is a C, 80-89 is a B). I'm not sure how to go about converting between these two representations without using a successor (or predecessor) function. Another way of looking at not having a successor function is not restricting ranges to be comprised of discrete point types. In that case, you can't really use closed-closed representation at all (or open-open, for that matter), because the successor function is necessary to define the meets and before predicates, or to convert between closed-closed and closed-open representations, if one is using closed-open representation internally. Another wrinkle in cases without a defined successor function is boundary cases: ranges that include one or both of the bounds of the point type. For example, with a closed-open representation, a range can't include the upper bound of the point type. Perhaps a way around this would be to allow infinity as a possible value: the date range ['2006-06-11', 'infinity') would describe a range from June 11, 2006 through the upper bound of the date type (5874897-12-31 on my machine, though interestingly enough: postgres=# SELECT '5874897-12-31'::date; date --------------- 5874897-12-31 (1 row) postgres=# SELECT '5874897-12-31'::date + 2; ?column? --------------- 5874898-01-02 (1 row) postgres=# SELECT '5874898-01-02'::date; ERROR: date out of range: "5874898-01-02" postgres=# SELECT ('5874897-12-31'::date + 2)::date; date --------------- 5874898-01-02 (1 row) postgres=# SELECT '5874897-12-31'::date + interval '2 days'; ?column? ---------------------------- 29357-07-06 15:41:44.48384 (1 row) postgres=# SELECT ('5874897-12-31'::date + interval '2 days')::date; date ------------- 29357-07-06 (1 row) postgres=# select version(); version ------------------------------------------------------------------------ ---------------------------------------------------------------------- PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5341) (1 row) That looks a bit odd. :( Also, a successor function is very useful in testing other predicates. To keep things simpler, I'm going to use the same representation for both ranges, as internally you'd probably convert the ranges to some canonical representation for comparison. Whether that canonical representation is closed-closed, closed-open, or a point and an interval doesn't really matter. Practically, I think being able to use both closed-closed and closed- open representations as needed is useful, and for most purposes, the types can be considered discrete and bounded. Is there a lot to be gained by not defining a successor function? Michael Glaesemann grzm seespotcode net
On Jun 11, 2006, at 14:45 , Bruno Wolff III wrote: > On Sun, Jun 11, 2006 at 10:18:11 +0900, > Michael Glaesemann <grzm@seespotcode.net> wrote: >> >> Time (and timestamp) is a bit of a issue conceptually. The "default" >> successor function would depend on the precision of the timestamp. > > And in the ideal case it doesn't exist. That is why I think a > closed, open > interval is a better way to go. How would you go about converting a closed-open representation to a closed-closed representation for display purposes? Or inserting data that is provided in closed-closed representation? Michael Glaesemann grzm seespotcode net
On Sun, Jun 11, 2006 at 03:22:55PM +0900, Michael Glaesemann wrote: > > On Jun 11, 2006, at 14:45 , Bruno Wolff III wrote: > > >On Sun, Jun 11, 2006 at 10:18:11 +0900, > > Michael Glaesemann <grzm@seespotcode.net> wrote: > >> > >>Time (and timestamp) is a bit of a issue conceptually. The "default" > >>successor function would depend on the precision of the timestamp. > > > >And in the ideal case it doesn't exist. That is why I think a > >closed, open > >interval is a better way to go. > > How would you go about converting a closed-open representation to a > closed-closed representation for display purposes? Or inserting data > that is provided in closed-closed representation? Perhaps it makes more sense to consider the four possibilities { ( ), ( ], [ ), [ ] } as different data types. I realize that mathematically, there's probably plenty of reasons to convert between closed and open on both ends (though I admit I can't think of any reason to do this), but my focus is on what I believe to be (by far and away) the most common PostgreSQL use case: defining timestamp ranges. And that's something that needs to be closed, open. (Sadly, I see people mess that up frequently.) If this case can be well satisfied by an interval type that depends on a sucessor function without a bunch of complexities (in code or operation), then that's great. I'm worried that that's not the case (the issue of various timestamp precisions is worrying enough by itself), and I'd much rather see a solid closed, open interval added than nothing at all. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Michael, First off, mark me down as one of the people interested in having this ... I've been hacking a lot of the same functionality using data-push functions and triggers, and boy howdy, it's messy. I do think Jim is right, though, in that we may want to look for portions of the functionality which are achievable in the context of one PostgreSQL version, unless you're going to be working full-time on this patch. Or maybe queue it up for next summer's Summer of Code. Not sure what that portion would be, though. In real-world calendaring applications, I *certainly* see the need for a successor function. However, that would require being able to define timestamps with a variable precision, e.g. TIMESTAMP('5 minutes'). This, by itself would be a significant effort, yet useful ... maybe that's where to start? You're probably going to have to give up on B-Tree indexes for PERIODs, and look towards GiST. For one thing, I would see UNIQUE in the context of a PERIOD defined as non-overlapping. e.g.: Given UNIQUE (name, period) Name Period Joe 2006-06-11 14:00:00->17:35:00 Marsha 2006-06-11 15:00:00->19:00:00 Ralph 2006-06-11 15:15:00->15:30:00 I would want (in a calendaring application) an attempt to insert: Joe 2006-06-11 17:00:00->19:00:00 ... to fail, as well as any attempt to: UPDATE periods SET name = 'Marsha' WHERE name = 'Ralph'; ... although it's possible that UNIQUE is not really the right name for that kind of constraint, although it serves the same purpose. But I don't think in this context that a B-Tree would be capable of fulfilling it. -- Josh Berkus PostgreSQL @ Sun San Francisco
On Sun, Jun 11, 2006 at 11:22:41AM -0700, Josh Berkus wrote: > You're probably going to have to give up on B-Tree indexes for PERIODs, and > look towards GiST. For one thing, I would see UNIQUE in the context of a > PERIOD defined as non-overlapping. e.g.: If GiST could define UNIQUE indexes, this probably would've been done already. Fix that, and the index itself will appear shortly afterwards... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Sun, Jun 11, 2006 at 15:13:39 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote: > > That's different from being able to show equivalence between two > ranges in different representations, e.g., r1 = r2 iff a1 = a2 and b1 > = next(b2). As Bruno pointed out earlier, in some cases, a closed- > open representation is desirable, and I think that in others, such as > date ranges, a closed-closed representation is useful. Another place > where I'd use a closed-closed representation would be for describing > score ranges for grades (e.g., 70-79 is a C, 80-89 is a B). I'm not > sure how to go about converting between these two representations > without using a successor (or predecessor) function. Date ranges are really closed open as well (as finite sets of isolated points are both open and closed). The only oddity would be that the date used to indicate the open end of the range might not be what the user expects.
Thanks to everyone for the feedback that I've received so far. It's clear that there's interest in this. On Jun 12, 2006, at 3:22 , Josh Berkus wrote: > I do think Jim is right, though, in that we may want to look for > portions of > the functionality which are achievable in the context of one > PostgreSQL > version, unless you're going to be working full-time on this patch. I definitely agree with implementing it in parts. I doubt it's possible, but perhaps a first bit might make it into 8.2 :) > In real-world calendaring applications, I *certainly* see the need > for a > successor function. However, that would require being able to define > timestamps with a variable precision, e.g. TIMESTAMP('5 minutes'). > This, by > itself would be a significant effort, yet useful ... maybe that's > where to > start? As mentioned in an earlier email, I think calendaring applications in particular would benefit from timestamp precisions of less than 1 second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However, I think this is a thorny problem. To elaborate, I believe the precision has to be relative to some "baseline". From 12:00, 30 minute precision would presumably allow 12:00, 12:30, 13:00, 13:30, and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and so on. But these are relative to the time zone they're in. While 12:00 in Tokyo (+9) would be a timestamp value with 1 hour precision, that same timestamp is 4:30 in Tehran (+3:30) if I got the math right. Is 4:30 a timestamp value with 1 hour precision? Because of this, I think timestamp precisions of less than 1 second (timestamp (0)) require storing the time zone as part of the timestamp value. Pushing this even further, would we allow arbitrary precision? For example, would 45-minute precision be allowed? In that case, I believe we'd need to go further than storing just the time zone with the timestamp value. The timestamp value would have to be relative to some baseline timestamp to be able to calculate whether or not the difference between any particular timestamp and the baseline timestamp is integral. Perhaps this could be accomplished using domains and some additional checking function? I'm not sure. It's enough to make me want to forget about the idea of disallowing any precision that is not an evenly divided into the next larger "time part": any precision between 0 seconds and 1 minute would have to be a number of seconds evenly divided into 60; between 1 hour and 1 day, precision would have to be one of the values 1, 2, 3, 4, 6, 8, or 12 hours. I've been able to discuss the issue of timestamp precision without bringing up successor functions or ranges at all, and indeed I think it's orthogonal to the range implementation. I think they're both concepts that should be included in PostgreSQL, but as for myself, I'm more interested in the range implementation than the the timestamp precision issue. By the way, anyone care to weigh in on what term we should use when discussing this? Josh has used PERIOD. Should we go with that for now? A somewhat related issue: would we want any implementation to follow (at least part) of the not-yet-standard SQL/Temporal draft? Or would it be more desirable to steer clear of using any terms/syntax that was included in an attempt to prevent any possible conflict with a future SQL spec? > You're probably going to have to give up on B-Tree indexes for > PERIODs, and > look towards GiST. For one thing, I would see UNIQUE in the > context of a > PERIOD defined as non-overlapping. e.g.: I think that a non-overlapping constraint goes above and beyond what UNIQUE requires. In my opinion, UNIQUE should test for equality, rather than non-overlapping, as that keeps the meaning of UNIQUE consistent across all types and may actually be useful in some instances. I do think it would be convenient to have some sort of syntax that would provide a non-overlapping constraint rather than having to code up a constraint trigger every time you wanted to do this. As Martijn pointed out, when GiST can be used for a UNIQUE constraint, we should be able to define the non-overlapping constraint quite easily. So this could be thought of as a third orthogonal issue for ranges, the first two being the range type constructor and timestamp precision < 1 second. Any one of these three could be done independently and improve PostgreSQL. In combination they are definitely a very nice package. On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote: > Date ranges are really closed open as well (as finite sets of > isolated points > are both open and closed). The only oddity would be that the date > used to > indicate the open end of the range might not be what the user expects. I think it's definitely a matter of interpretation. [2006-01-01, 2006-12-31] and [2006-01-01, 2007-01-01) both include the same days. Who's to say which is the "real" representation? For all practical purposes (i.e., what can be represented within the database) [2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01 00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with time zone ranges as well. While one might idealize time to be continuous, as far as I know there isn't a way to represent time that way in a computer, at the very least, not in PostgreSQL. And for the very reason that it might not be what the user expects, if there's a way to convert between closed-open and closed-closed as appropriate, I think it makes it much more use friendly to do so. For example, the closed-closed representation is equivalent to what BETWEEN does. It would be very nice to be able to provide sometime equivalent with ranges. As for the successor function itself: Any "exact" datatype, such as timestamp (at least with --enable-integer-datetimes), date, integer, or numeric, has some built-in precision anyway and a successor function follows quite directly from that precision. I don't see that as problematic or even very difficult. Thanks again for your comments, past, present and future! It's been very helpful for me to hear from others on this. Michael Glaesemann grzm seespotcode net
On Wed, Jun 14, 2006 at 15:47:16 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote: > > >Date ranges are really closed open as well (as finite sets of > >isolated points > >are both open and closed). The only oddity would be that the date > >used to > >indicate the open end of the range might not be what the user expects. > > I think it's definitely a matter of interpretation. [2006-01-01, > 2006-12-31] and [2006-01-01, 2007-01-01) both include the same days. > Who's to say which is the "real" representation? For all practical They are both real. In part my point was the reason the closed, closed form works well for overlap checking is because the sets are also closed, open and behave like that as well. (Though the user visible names are different.) > purposes (i.e., what can be represented within the database) > [2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01 > 00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with > time zone ranges as well. While one might idealize time to be > continuous, as far as I know there isn't a way to represent time that > way in a computer, at the very least, not in PostgreSQL. Which is a good reason to used the Closed, Open definition. Then you don't have to work about whether postgres has been built with integer timestamps or the details of the floating point hardware your database is running on. > And for the very reason that it might not be what the user expects, > if there's a way to convert between closed-open and closed-closed as > appropriate, I think it makes it much more use friendly to do so. For > example, the closed-closed representation is equivalent to what > BETWEEN does. It would be very nice to be able to provide sometime > equivalent with ranges. I don't think it is unreasonable to have a different external representation for date ranges and timestamp ranges. It isn't conistant, but I think it is more readily understandable. Internally, it will probably be better to use closed, open for both to keep the code consistant to allow for reuse and better understandibility at that level. > As for the successor function itself: Any "exact" datatype, such as > timestamp (at least with --enable-integer-datetimes), date, integer, > or numeric, has some built-in precision anyway and a successor > function follows quite directly from that precision. I don't see that > as problematic or even very difficult. The successor function for timestamp when not using integer datetimes is going to depend on the underlying hardware. I think that will be a problem unless you are planning to force the use of integer datetimes. I don't think the project is ready for that yet, though in the long run I think it is a better way to go.