Thread: Ranges for well-ordered types

Ranges for well-ordered types

From
Michael Glaesemann
Date:
I've been interested in representing and manipulating time ranges in  
PostgreSQL, where a time range is defined by a start time and an end  
time. Time ranges are useful, for example, in representing when some  
predicate was known valid. Similarly, time ranges can be used to  
represent "transaction time": the version history of the data itself.  
This is similar to the "time travel" feature in previous versions of  
PostgreSQL. (Tables that include both valid time and transaction time  
information are sometimes called bitemporal tables.) Ranges can also  
be useful in scheduling applications.

The original use case that prompted me to explore this area was  
tracking what time periods teachers were assigned to schools (a valid- 
time table). Here's one way to represent this in PostgreSQL:

create table teachers__schools_1
(    teacher text not null    , school text not null    , from_date date not null    , to_date date not null    , check
(from_date<= to_date)    , unique (teacher, school, from_date)    , unique (teacher, school, to_date)
 
);

Each row of this table represents the time range (from from_date to  
to_date) during which a teacher was assigned to a particular school.  
(Teachers can be assigned to more than one school at a time.) The  
check constraint guarantees that [from_date, to_date] represents a  
valid closed-closed interval (where the end points are included in  
the range). Two unique constraints are necessary to guarantee  
uniqueness for each row. In my use case, it would also be desirable  
to apply constraints to prevent overlapping and ensure continuity-- 
that teachers are always at some school. This can be done using  
constraint triggers, though making sure all of the comparisons for  
four dates (beginning and end points for two ranges) are done  
properly can be a little daunting and prone to bugs.

The problem of representing time ranges in a relational database has  
been worked on for quite some time. Indeed, the PostgreSQL source  
still includes a tinterval type, where the beginning and end points  
are of type abstime, though this type is no longer included in the  
documentation. Drafts of SQL3 included the PERIOD constructor to  
represent date, time, and timestamp ranges as part of SQL/Temporal,  
but this section of the specification was not included in the final  
SQL:2003 standard. At least two relatively prominent books [1][2] as  
well as numerous papers have been written on the subject.

An aside regarding terminology: In the literature I've read, time  
ranges are most often referred to as intervals. However, SQL already  
uses INTERVAL to refer to another distinct type. As mentioned above,  
SQL3 used the term PERIOD for a constructor to create types with a  
beginning point and an end point. RANGE is a reserved keyword in SQL: 
2003 (related, I believe, to windowed tables). Range also has a  
distinct meaning in relational calculus. I'm at a bit of a loss as to  
how to refer to these structures with a beginning and an end point  
with a term that isn't already reserved in SQL or may be in the  
future. Suggestions welcome :) Span? Reuse tinterval? For the time  
being, I'll arbitrarily continue to use range.

In the first six months of 2006 alone, I've noted quite a few threads  
related to time ranges in the various PostgreSQL mailing lists, so it  
seems this issue is ripe for a solution. Just two weeks ago, Albert  
Hertroys started work on a vector type[3] , where he rightly notes  
that time ranges are just an application of a general range (or  
vector, to use his term) that could just as easily be used with  
"integers, reals, points, dates, etc". For the past couple of months,  
I've been working with composite types and PL/pgSQL to define and  
manipulate date ranges and integer ranges, and see what can be  
abstracted out to a general range type.

In the general case, a particular range value can be represented as  
two point values. For example, the date range [2006-01-01,  
2006-05-31] (using the closed-closed interval representation) is  
represented by the two date "point" values 2006-01-01 and 2006-05-31.  
The interval range [3,9] is represented by the two integer "point"  
values 3 and 9. A range can be formed for any point type, where a  
point type is any type that's well-ordered:* the range of values is bounded (the number of values in the type  
is finite)* comparisons are well-defined for any two values, and* for any point p, the next point can be found using a
successor 
 
function

Given a well-ordered "point" type, common questions of ranges can be  
be answered, such as, for two ranges r1 and r2* Do r1 and r2 overlap?* Does r1 meet r2?* Is the union of r1 and r2
defined,and if so, what is it?* Is the intersection of r1 and r2 defined, and if so, what is it?
 

The way I've thought of implementing ranges in PostgreSQL is through  
an additional system catalog table (called pg_ordered_type for  
convenience) that would store the necessary information for each  
point type:* ordtype: the point type (referencing pg_type.oid)* ordcmpfunc: the comparison function (referencing
pg_proc.oid)*ordsucc: the successor function (referencing pg_proc.oid)* ordmin, ordmax the minimum and maximum values
forthe point type  
 
(not sure how this would be represented)
From one point of view, the successor function and minimum and  
maximum values (and perhaps the comparison function) are intrinsic  
attributes of the type itself. However, I can see potential  
usefulness using the same "base" type with different successor  
functions or different maximum and minimum values. For example, to  
represent a point type of even numbers, the base type could be  
integer, and the successor function would "p + 2". (This raises the  
question of how to restrict the values of the type to even numbers:  
perhaps using domains would be a solution.) Another example would be  
using timestamps of different precision: timestamp(6) has a different  
successor function than timestamp(0).

It may be desirable to include in pg_ordered_type a "predecessor"  
function (returning the point prior to a point p) and a duration  
function (providing a more efficient way to return the number of  
points included in the interval than iterating over the successor  
function).

With comparison and the successor function, one can define the 13  
predicates (often called Allen operators) describing the relationship  
between any two range values: equals, before, meets, overlaps,  
starts, during, finishes, and inverses of the last six.

Note that two of the built-in numeric types--real and double- 
precision--are not well-ordered, and therefore ranges for real and  
double-precision would not be defined. The primary reason for this is  
that without a successor function, it's not possible to determine  
whether or not two range values meet (determined by the meets Allen  
operator). And without a definition for meets, it's not possible to  
define continuity--to use the above example, that a teacher must  
always be assigned to some school. In that respect, the  
implementation I'm proposing here differs from the vector type  
proposed by Albert Hertroys. Perhaps there's a way to have a "looser"  
form of ranges that do not include a successor function, and for  
those looser ranges, continuity constraints couldn't be defined.  
However, the issues arising from the inexactness of real and double  
precision calls into question the usefulness of functions that rely  
on testing for equality (e.g., equals, meets, starts, finishes, and  
inverses of the last three).

Another issue for a general range implementation is uniqueness.  
Currently btree indexes are used to enforce uniqueness in PostgreSQL.  
By somewhat arbitrarily defining a comparison result for the 13  
possible predicates, I've been able to define an opclass for ranges  
so btree indexes can be defined. Note that in defining an opclass,  
comparision is already required, so perhaps the comparison function
could be stored someplace other than the pg_ordered_type table.

As for other indexes, I've looked at some of the GiST material and  
think that it probably can be usefully applied to ranges: for example  
the ip4r type[4], which defines ranges of IPv4 addresses, uses GiST  
for indexing. However, GiST is currently well over my head, so this  
is speculation on my part.

Returning to my original example, with a "date_range" type, the table  
could be defined as:

create table teachers__schools_2
(    teacher text not null    , school text not null    , period date_range not null    , primary key (teacher, school,
period)
);

The original check constraint is handled by the date_range type and  
the two unique constraints are replaced by a single primary key  
constraint. Constraints for overlapping and continuity are still  
handled using constraint triggers, but are easier to implement using  
functions available to compare ranges rather than handling beginning  
and end points individually.

I believe ranges in PostgreSQL would be useful and satisfy a need  
expressed in numerous mailing list threads. I hope to do my part in  
their implementation. I look forward to your feedback.

Michael Glaesemann
grzm seespotcode net

[1] Richard T. Snodgrass, Developing Time-Oriented Database  
Applications in SQL, Morgan Kaufmann, 2000. ISBN 1-55860-436-7. Out  
of print, but available from the author as a PDF.
(http://www.cs.arizona.edu/people/rts/tdbbook.pdf)
[2] Chris Date, Hugh Darwen, Nikolas Lorentzos, Temporal Data and the  
Relational Model, Morgan Kaufmann, 2002. ISBN 1-55860-855-9 (http:// 
www.amazon.com/gp/product/1558608559/)
[3] Vector type (Re: [GENERAL] challenging constraint situation - how  
do I make it) (http://archives.postgresql.org/pgsql-general/2006-05/ 
msg01279.php)
[4] ip4r (http://pgfoundry.org/projects/ip4r/)



Re: Ranges for well-ordered types

From
"Ian Caulfield"
Date:

On 6/10/06, Michael Glaesemann <grzm@seespotcode.net> wrote:
Returning to my original example, with a "date_range" type, the table
could be defined as:

create table teachers__schools_2
(
    teacher text not null
    , school text not null
    , period date_range not null
    , primary key (teacher, school, period)
);

The original check constraint is handled by the date_range type and
the two unique constraints are replaced by a single primary key
constraint. Constraints for overlapping and continuity are still
handled using constraint triggers, but are easier to implement using
functions available to compare ranges rather than handling beginning
and end points individually.
 
I've done similar date range things by creating a composite type consisting of the lower and upper bounds, and then implementing a btree opclass where the comparator returns 0 if two ranges overlap - this allows a current btree index to enforce non-overlapping ranges, and allows indexed lookup of which range contains a particular value.
 
Not sure whether this covers your scenario, but it works fairly well for me :)
 
Ian

 

Re: Ranges for well-ordered types

From
Tom Lane
Date:
"Ian Caulfield" <ian.caulfield@gmail.com> writes:
> I've done similar date range things by creating a composite type consisting
> of the lower and upper bounds, and then implementing a btree opclass where
> the comparator returns 0 if two ranges overlap - this allows a current btree
> index to enforce non-overlapping ranges, and allows indexed lookup of which
> range contains a particular value.

And how hard did you test this?  Non-transitive "equality" is certain to
confuse btree, leading to wrong answers.
        regards, tom lane


Re: Ranges for well-ordered types

From
"Joshua D. Drake"
Date:
Ian Caulfield wrote:
>
> On 6/10/06, *Michael Glaesemann* <grzm@seespotcode.net 
> <mailto:grzm@seespotcode.net>> wrote:
>
>     Returning to my original example, with a "date_range" type, the table
>     could be defined as:
>
>     create table teachers__schools_2
>     (
>         teacher text not null
>         , school text not null
>         , period date_range not null
>         , primary key (teacher, school, period)
>     );
>
>     The original check constraint is handled by the date_range type and
>     the two unique constraints are replaced by a single primary key
>     constraint. Constraints for overlapping and continuity are still
>     handled using constraint triggers, but are easier to implement using
>     functions available to compare ranges rather than handling beginning
>     and end points individually.
>
>  
> I've done similar date range things by creating a composite type 
> consisting of the lower and upper bounds, and then implementing a 
> btree opclass where the comparator returns 0 if two ranges overlap - 
> this allows a current btree index to enforce non-overlapping ranges, 
> and allows indexed lookup of which range contains a particular value.
>  
> Not sure whether this covers your scenario, but it works fairly well 
> for me :)

Why not define a start_date and end_date to determine range, and then 
use the date overlap functions in postgresql?

Joshua D Drake


>  
> Ian
>
>  



Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
On Jun 10, 2006, at 23:51 , Michael Glaesemann wrote:

>  A range can be formed for any point type, where a point type is  
> any type that's well-ordered:
>     * the range of values is bounded (the number of values in the type  
> is finite)
>     * comparisons are well-defined for any two values, and
>     * for any point p, the next point can be found using a successor  
> function

It was pointed out to me off list that I got my definition of well- 
ordered wrong. I was confusing the definition of well-ordered with  
the overall requirements that I was using to define ranges.

Well-ordered is just that for any two values a and b of a given type,  
a < b is defined. That's what I was attempting to get at in the  
second point above. The added requirements of having the type bounded  
(which is going to happen on a computer anyway) and having a  
successor function are still required for the range definition, but  
not part of the definition of well-orderedness per se.

Michael Glaesemann
grzm seespotcode net





Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
On Jun 11, 2006, at 0:54 , Ian Caulfield wrote:

> I've done similar date range things by creating a composite type  
> consisting of the lower and upper bounds, and then implementing a  
> btree opclass where the comparator returns 0 if two ranges overlap  
> - this allows a current btree index to enforce non-overlapping  
> ranges, and allows indexed lookup of which range contains a  
> particular value.

As Tom already pointed out, this method leads to problems with btree  
indexes. I haven't heavily tested my own implementation (below), but  
it only returns 0 for equality, which is what btree expects. All  
other possible relationships between two ranges have a well-defined  
result of -1 or 1. I believe this should be enough to prevent any  
transitivity issues with btree.

Michael Glaesemann
grzm seespotcode net


create type interval_date as
(    _1 point_date    , _2 point_date
);
comment on type interval_date is
'The internal representation of date intervals, representing the  
closed-closed '
'interval [_1,_2]';

create function interval_cmp(    interval_date -- $1 i1    , interval_date -- $2 i2    ) returns integer
strict
immutable
security definer
language plpgsql as '
declare    i1 alias for $1;    i2 alias for $2;    cmp integer;
begin    perform check_intervals(i1,i2);
    cmp := 1;
    if i1._1 = i2._1        and i1._2 = i2._2    then cmp := 0;    else        if (i1._2 < i2._2)            or (i1._2
=i2._2                and i1._1 > i2._1)        then cmp = -1;        end if;    end if;
 
    return cmp;
end;
';



Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
On Jun 11, 2006, at 2:34 , Michael Glaesemann wrote:

>
> On Jun 11, 2006, at 0:54 , Ian Caulfield wrote:
>
>> I've done similar date range things by creating a composite type  
>> consisting of the lower and upper bounds, and then implementing a  
>> btree opclass where the comparator returns 0 if two ranges overlap  
>> - this allows a current btree index to enforce non-overlapping  
>> ranges, and allows indexed lookup of which range contains a  
>> particular value.
>
> As Tom already pointed out, this method leads to problems with  
> btree indexes. I haven't heavily tested my own implementation  
> (below), but it only returns 0 for equality, which is what btree  
> expects. All other possible relationships between two ranges have a  
> well-defined result of -1 or 1. I believe this should be enough to  
> prevent any transitivity issues with btree.

Of course, this method doesn't provide the non-overlapping  
constraint. That still needs to be handled by a constraint trigger.

Michael Glaesemann
grzm seespotcode net





Re: Ranges for well-ordered types

From
Bruno Wolff III
Date:
On Sat, Jun 10, 2006 at 23:51:58 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote:
> Each row of this table represents the time range (from from_date to  
> to_date) during which a teacher was assigned to a particular school.  
> (Teachers can be assigned to more than one school at a time.) The  
> check constraint guarantees that [from_date, to_date] represents a  
> valid closed-closed interval (where the end points are included in  
> the range). Two unique constraints are necessary to guarantee  

I think you might want to reconsider your design. It works well for dates
because sets of dates are made of of isolated points and such sets are
both open and closed. If you are using time, I think it will be more convenient
to use a closed, open representation.


Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
On Jun 11, 2006, at 5:15 , Bruno Wolff III wrote:

> I think you might want to reconsider your design. It works well for  
> dates
> because sets of dates are made of of isolated points and such sets are
> both open and closed. If you are using time, I think it will be  
> more convenient
> to use a closed, open representation.

Under design I proposed, closed-closed and closed-open are just two  
different representations of the same range: to the commonly used  
notation, the closed-open range [p1, p2) is equivalent to  the closed- 
closed range [p1, next(p2)], where next() is the successor function.  
I agree than depending on the context, it may be better to use one  
representation than the other (a budget meeting that lasts from 10:00  
until 11:00 meets but doesn't share any points with an early lunch  
meeting that starts at 11:00). Perhaps there should be probably some  
to_char functions to format the range in the desired form.

Time (and timestamp) is a bit of a issue conceptually. The "default"  
successor function would depend on the precision of the timestamp.  
timestamp(0) would have a successor function of + 1 second, while  
timestamp(3) would have a successor function of + .001 second. In the  
above example, Monday's budget meeting in Tokyo from 10:00 until  
11:00 could be represented with ranges of timestamp(0) with time zone as
[2006-06-12 10:00:00+09, 2006-06-12 11:00:00+09)
or as
[2006-06-12 10:00:00+09, 2006-06-12 10:59:59+09]

With timestamp(3) with time zone, that'd be
[2006-06-12 10:00:00.000+09, 2006-06-12 11:00:00.000+09)
or as
[2006-06-12 10:00:00.000+09, 2006-06-12 10:59:59.999+09]

Most people would be more comfortable with the first representation  
of each pair, but the two representations in each pair represent the  
same range.

For a lot of scheduling applications, using timestamps with a  
precision greater that 0 probably wouldn't be very useful (and when  
not using integer datetimes, not all that exact). Indeed, some  
scheduling applications may want a precision of 1 minute, rather than  
1 second, or perhaps a precision of 15 minutes, or even an hour. I  
see this as a limitation of the timestamp type, and perhaps a  
workaround could be found using check constraints and more  
sophisticated successor functions.

For example, a first cut of a successor function for a timestamp with  
precision of 1 hour might use + 3600 seconds, but the difference in  
seconds between the top of any two hours may not necessarily be 3600  
seconds in some corner cases when the calendar has changed. In those  
cases, the successor function would need to be sure to return the  
next hour, rather than the previous hour + 3600 seconds. (Perhaps the  
calendar has never made a change where this would be a problem, but  
for some arbitrary timestamp precision, for example 1 day, this could  
be true. I haven't done enough research yet to determine how much of  
a problem this is. In those cases it might be better to use dates  
than timestamps.)

With time zones and daylight saving time, this becomes even more  
interesting, especially for time zone offsets that aren't integral  
hours (e.g., South Australia Standard Time +9:30, Iran Time +3:30,  
India Time +5:30). A 1 hour precision requirement would need to  
include the applicable time zone. There's been previous discussion of  
including such time zone information in the timestamp value, but as  
far as I know, no work has been done in that direction yet.

These are interesting questions, and improvements in timestamp can  
make ranges even more convenient. I still see utility in ranges using  
the current timestamp implementation as well.


Michael Glaesemann
grzm seespotcode net





Re: Ranges for well-ordered types

From
"Jim C. Nasby"
Date:
On Sun, Jun 11, 2006 at 10:18:11AM +0900, Michael Glaesemann wrote:
> 
> On Jun 11, 2006, at 5:15 , Bruno Wolff III wrote:
> 
> >I think you might want to reconsider your design. It works well for  
> >dates
> >because sets of dates are made of of isolated points and such sets are
> >both open and closed. If you are using time, I think it will be  
> >more convenient
> >to use a closed, open representation.
> 
> Under design I proposed, closed-closed and closed-open are just two  
> different representations of the same range: to the commonly used  
> notation, the closed-open range [p1, p2) is equivalent to  the closed- 
> closed range [p1, next(p2)], where next() is the successor function.  
Why try messing aronud with a successor function when you can just use <
instead of <= ?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Ranges for well-ordered types

From
Bruno Wolff III
Date:
On Sun, Jun 11, 2006 at 10:18:11 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote:
> 
> Time (and timestamp) is a bit of a issue conceptually. The "default"  
> successor function would depend on the precision of the timestamp.  

And in the ideal case it doesn't exist. That is why I think a closed, open
interval is a better way to go.


Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
On Jun 11, 2006, at 10:57 , Jim C. Nasby wrote:

> On Sun, Jun 11, 2006 at 10:18:11AM +0900, Michael Glaesemann wrote:
>>
>> Under design I proposed, closed-closed and closed-open are just two
>> different representations of the same range: to the commonly used
>> notation, the closed-open range [p1, p2) is equivalent to  the  
>> closed-
>> closed range [p1, next(p2)], where next() is the successor function.
>
> Why try messing aronud with a successor function when you can just  
> use <
> instead of <= ?

If I understand you correctly, you're pointing out that the  
constraints for a valid range in closed-closed and closed-open  
representation are different:

for a valid closed-open range r1 [a1, b1), a1 < b1
for a valid closed-closed range r2 [a2, b2], a2 <= b2

That's different from being able to show equivalence between two  
ranges in different representations, e.g., r1 = r2 iff a1 = a2 and b1  
= next(b2). As Bruno pointed out earlier, in some cases, a closed- 
open representation is desirable, and I think that in others, such as  
date ranges, a closed-closed representation is useful. Another place  
where I'd use a closed-closed representation would be for describing  
score ranges for grades (e.g., 70-79 is a C, 80-89 is a B). I'm not  
sure how to go about converting between these two representations  
without using a successor (or predecessor) function.

Another way of looking at not having a successor function is not  
restricting ranges to be comprised of discrete point types. In that  
case, you can't really use closed-closed representation at all (or  
open-open, for that matter), because the successor function is  
necessary to define the meets and before predicates, or to convert  
between closed-closed and closed-open representations, if one is  
using closed-open representation internally.

Another wrinkle in cases without a defined successor function is  
boundary cases: ranges that include one or both of the bounds of the  
point type. For example, with a closed-open representation, a range  
can't include the upper bound of the point type. Perhaps a way around  
this would be to allow infinity as a possible value: the date range  
['2006-06-11', 'infinity') would describe a range from June 11, 2006  
through the upper bound of the date type (5874897-12-31 on my  
machine, though interestingly enough:

postgres=# SELECT '5874897-12-31'::date;     date
---------------
5874897-12-31
(1 row)

postgres=# SELECT '5874897-12-31'::date + 2;   ?column?
---------------
5874898-01-02
(1 row)

postgres=# SELECT '5874898-01-02'::date;
ERROR:  date out of range: "5874898-01-02"
postgres=# SELECT ('5874897-12-31'::date + 2)::date;     date
---------------
5874898-01-02
(1 row)

postgres=# SELECT '5874897-12-31'::date + interval '2 days';          ?column?
----------------------------
29357-07-06 15:41:44.48384
(1 row)

postgres=# SELECT ('5874897-12-31'::date + interval '2 days')::date;    date
-------------
29357-07-06
(1 row)

postgres=# select version();                                                                    
version
------------------------------------------------------------------------ 
----------------------------------------------------------------------
PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC  
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.  
build 5341)
(1 row)

That looks a bit odd. :(

Also, a successor function is very useful in testing other  
predicates. To keep things simpler, I'm going to use the same  
representation for both ranges, as internally you'd probably convert  
the ranges to some canonical representation for comparison. Whether  
that canonical representation is closed-closed, closed-open, or a  
point and an interval doesn't really matter.

Practically, I think being able to use both closed-closed and closed- 
open representations as needed is useful, and for most purposes, the  
types can be considered discrete and bounded. Is there a lot to be  
gained by not defining a successor function?

Michael Glaesemann
grzm seespotcode net



Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
On Jun 11, 2006, at 14:45 , Bruno Wolff III wrote:

> On Sun, Jun 11, 2006 at 10:18:11 +0900,
>   Michael Glaesemann <grzm@seespotcode.net> wrote:
>>
>> Time (and timestamp) is a bit of a issue conceptually. The "default"
>> successor function would depend on the precision of the timestamp.
>
> And in the ideal case it doesn't exist. That is why I think a  
> closed, open
> interval is a better way to go.

How would you go about converting a closed-open representation to a  
closed-closed representation for display purposes? Or inserting data  
that is provided in closed-closed representation?

Michael Glaesemann
grzm seespotcode net





Re: Ranges for well-ordered types

From
"Jim C. Nasby"
Date:
On Sun, Jun 11, 2006 at 03:22:55PM +0900, Michael Glaesemann wrote:
> 
> On Jun 11, 2006, at 14:45 , Bruno Wolff III wrote:
> 
> >On Sun, Jun 11, 2006 at 10:18:11 +0900,
> >  Michael Glaesemann <grzm@seespotcode.net> wrote:
> >>
> >>Time (and timestamp) is a bit of a issue conceptually. The "default"
> >>successor function would depend on the precision of the timestamp.
> >
> >And in the ideal case it doesn't exist. That is why I think a  
> >closed, open
> >interval is a better way to go.
> 
> How would you go about converting a closed-open representation to a  
> closed-closed representation for display purposes? Or inserting data  
> that is provided in closed-closed representation?

Perhaps it makes more sense to consider the four possibilities
{ ( ), ( ], [ ), [ ] }
as different data types.

I realize that mathematically, there's probably plenty of reasons to
convert between closed and open on both ends (though I admit I can't
think of any reason to do this), but my focus is on what I believe to be
(by far and away) the most common PostgreSQL use case: defining
timestamp ranges. And that's something that needs to be closed, open.
(Sadly, I see people mess that up frequently.)

If this case can be well satisfied by an interval type that depends on a
sucessor function without a bunch of complexities (in code or
operation), then that's great. I'm worried that that's not the case (the
issue of various timestamp precisions is worrying enough by itself), and
I'd much rather see a solid closed, open interval added than nothing at
all.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Ranges for well-ordered types

From
Josh Berkus
Date:
Michael,

First off, mark me down as one of the people interested in having this ... 
I've been hacking a lot of the same functionality using data-push functions 
and triggers, and boy howdy, it's messy.

I do think Jim is right, though, in that we may want to look for portions of 
the functionality which are achievable in the context of one PostgreSQL 
version, unless you're going to be working full-time on this patch.   Or 
maybe queue it up for next summer's Summer of Code.  Not sure what that 
portion would be, though.

In real-world calendaring applications, I *certainly* see the need for a 
successor function.  However, that would require being able to define 
timestamps with a variable precision, e.g. TIMESTAMP('5 minutes').  This, by 
itself would be a significant effort, yet useful ... maybe that's where to 
start?

You're probably going to have to give up on B-Tree indexes for PERIODs, and 
look towards GiST.  For one thing, I would see UNIQUE in the context of a 
PERIOD defined as non-overlapping.  e.g.:

Given
UNIQUE (name, period)
Name        Period
Joe            2006-06-11 14:00:00->17:35:00
Marsha        2006-06-11 15:00:00->19:00:00
Ralph        2006-06-11 15:15:00->15:30:00

I would want (in a calendaring application) an attempt to insert:
Joe            2006-06-11 17:00:00->19:00:00
... to fail, as well as any attempt to:
UPDATE periods SET name = 'Marsha' WHERE name = 'Ralph';

... although it's possible that UNIQUE is not really the right name for that 
kind of constraint, although it serves the same purpose.   But I don't think 
in this context that a B-Tree would be capable of fulfilling it.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: Ranges for well-ordered types

From
Martijn van Oosterhout
Date:
On Sun, Jun 11, 2006 at 11:22:41AM -0700, Josh Berkus wrote:
> You're probably going to have to give up on B-Tree indexes for PERIODs, and
> look towards GiST.  For one thing, I would see UNIQUE in the context of a
> PERIOD defined as non-overlapping.  e.g.:

If GiST could define UNIQUE indexes, this probably would've been done
already. Fix that, and the index itself will appear shortly
afterwards...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Ranges for well-ordered types

From
Bruno Wolff III
Date:
On Sun, Jun 11, 2006 at 15:13:39 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote:
> 
> That's different from being able to show equivalence between two  
> ranges in different representations, e.g., r1 = r2 iff a1 = a2 and b1  
> = next(b2). As Bruno pointed out earlier, in some cases, a closed- 
> open representation is desirable, and I think that in others, such as  
> date ranges, a closed-closed representation is useful. Another place  
> where I'd use a closed-closed representation would be for describing  
> score ranges for grades (e.g., 70-79 is a C, 80-89 is a B). I'm not  
> sure how to go about converting between these two representations  
> without using a successor (or predecessor) function.

Date ranges are really closed open as well (as finite sets of isolated points
are both open and closed). The only oddity would be that the date used to
indicate the open end of the range might not be what the user expects.


Re: Ranges for well-ordered types

From
Michael Glaesemann
Date:
Thanks to everyone for the feedback that I've received so far. It's  
clear that there's interest in this.

On Jun 12, 2006, at 3:22 , Josh Berkus wrote:

> I do think Jim is right, though, in that we may want to look for  
> portions of
> the functionality which are achievable in the context of one  
> PostgreSQL
> version, unless you're going to be working full-time on this patch.

I definitely agree with implementing it in parts. I doubt it's  
possible, but perhaps a first bit might make it into 8.2 :)

> In real-world calendaring applications, I *certainly* see the need  
> for a
> successor function.  However, that would require being able to define
> timestamps with a variable precision, e.g. TIMESTAMP('5 minutes').   
> This, by
> itself would be a significant effort, yet useful ... maybe that's  
> where to
> start?

As mentioned in an earlier email, I think calendaring applications in  
particular would benefit from timestamp precisions of less than 1  
second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However,  
I think this is a thorny problem. To elaborate, I believe the  
precision has to be relative to some "baseline". From 12:00, 30  
minute precision would presumably allow 12:00, 12:30, 13:00, 13:30,  
and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and  
so on. But these are relative to the time zone they're in. While  
12:00 in Tokyo (+9) would be a timestamp value with 1 hour precision,  
that same timestamp is 4:30 in Tehran (+3:30) if I got the math  
right. Is 4:30 a timestamp value with 1 hour precision? Because of  
this, I think timestamp precisions of less than 1 second (timestamp 
(0)) require storing the time zone as part of the timestamp value.

Pushing this even further, would we allow arbitrary precision? For  
example, would 45-minute precision be allowed? In that case, I  
believe we'd need to go further than storing just the time zone with  
the timestamp value. The timestamp value would have to be relative to  
some baseline timestamp to be able to calculate whether or not the  
difference between any particular timestamp and the baseline  
timestamp is integral. Perhaps this could be accomplished using  
domains and some additional checking function? I'm not sure. It's  
enough to make me want to forget about the idea of disallowing any  
precision that is not an evenly divided into the next larger "time  
part": any precision between 0 seconds and 1 minute would have to be  
a number of seconds evenly divided into 60; between 1 hour and 1 day,  
precision would have to be one of the values 1, 2, 3, 4, 6, 8, or 12  
hours.

I've been able to discuss the issue of timestamp precision without  
bringing up successor functions or ranges at all, and indeed I think  
it's orthogonal to the range implementation. I think they're both  
concepts that should be included in PostgreSQL, but as for myself,  
I'm more interested in the range implementation than the the  
timestamp precision issue.

By the way, anyone care to weigh in on what term we should use when  
discussing this? Josh has used PERIOD. Should we go with that for now?

A somewhat related issue: would we want any implementation to follow  
(at least part) of the not-yet-standard SQL/Temporal draft? Or would  
it be more desirable to steer clear of using any terms/syntax that  
was included in an attempt to prevent any possible conflict with a  
future SQL spec?

> You're probably going to have to give up on B-Tree indexes for  
> PERIODs, and
> look towards GiST.  For one thing, I would see UNIQUE in the  
> context of a
> PERIOD defined as non-overlapping.  e.g.:

I think that a non-overlapping constraint goes above and beyond what  
UNIQUE requires. In my opinion, UNIQUE should test for equality,  
rather than non-overlapping, as that keeps the meaning of UNIQUE  
consistent across all types and may actually be useful in some  
instances. I do think it would be convenient to have some sort of  
syntax that would provide a non-overlapping constraint rather than  
having to code up a constraint trigger every time you wanted to do  
this. As Martijn pointed out, when GiST can be used for a UNIQUE  
constraint, we should be able to define the non-overlapping  
constraint quite easily. So this could be thought of as a third  
orthogonal issue for ranges, the first two being the range type  
constructor and timestamp precision < 1 second. Any one of these  
three could be done independently and improve PostgreSQL. In  
combination they are definitely a very nice package.

On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote:

> Date ranges are really closed open as well (as finite sets of  
> isolated points
> are both open and closed). The only oddity would be that the date  
> used to
> indicate the open end of the range might not be what the user expects.

I think it's definitely a matter of interpretation. [2006-01-01,  
2006-12-31] and [2006-01-01, 2007-01-01) both include the same days.  
Who's to say which is the "real" representation? For all practical  
purposes (i.e., what can be represented within the database)  
[2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01  
00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with  
time zone ranges as well. While one might idealize time to be  
continuous, as far as I know there isn't a way to represent time that  
way in a computer, at the very least, not in PostgreSQL.

And for the very reason that it might not be what the user expects,  
if there's a way to convert between closed-open and closed-closed as  
appropriate, I think it makes it much more use friendly to do so. For  
example, the closed-closed representation is equivalent to what  
BETWEEN  does. It would be very nice to be able to provide sometime  
equivalent with ranges.

As for the successor function itself: Any "exact" datatype, such as  
timestamp (at least with --enable-integer-datetimes), date, integer,  
or numeric, has some built-in precision anyway and a successor  
function follows quite directly from that precision. I don't see that  
as problematic or even very difficult.

Thanks again for your comments, past, present and future! It's been  
very helpful for me to hear from others on this.

Michael Glaesemann
grzm seespotcode net



Re: Ranges for well-ordered types

From
Bruno Wolff III
Date:
On Wed, Jun 14, 2006 at 15:47:16 +0900, Michael Glaesemann <grzm@seespotcode.net> wrote:
> 
> On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote:
> 
> >Date ranges are really closed open as well (as finite sets of  
> >isolated points
> >are both open and closed). The only oddity would be that the date  
> >used to
> >indicate the open end of the range might not be what the user expects.
> 
> I think it's definitely a matter of interpretation. [2006-01-01,  
> 2006-12-31] and [2006-01-01, 2007-01-01) both include the same days.  
> Who's to say which is the "real" representation? For all practical  

They are both real. In part my point was the reason the closed, closed form
works well for overlap checking is because the sets are also closed, open
and behave like that as well. (Though the user visible names are different.)

> purposes (i.e., what can be represented within the database)  
> [2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01  
> 00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with  
> time zone ranges as well. While one might idealize time to be  
> continuous, as far as I know there isn't a way to represent time that  
> way in a computer, at the very least, not in PostgreSQL.

Which is a good reason to used the Closed, Open definition. Then you don't
have to work about whether postgres has been built with integer timestamps
or the details of the floating point hardware your database is running on.

> And for the very reason that it might not be what the user expects,  
> if there's a way to convert between closed-open and closed-closed as  
> appropriate, I think it makes it much more use friendly to do so. For  
> example, the closed-closed representation is equivalent to what  
> BETWEEN  does. It would be very nice to be able to provide sometime  
> equivalent with ranges.

I don't think it is unreasonable to have a different external representation
for date ranges and timestamp ranges. It isn't conistant, but I think it is
more readily understandable. Internally, it will probably be better to use
closed, open for both to keep the code consistant to allow for reuse and
better understandibility at that level.

> As for the successor function itself: Any "exact" datatype, such as  
> timestamp (at least with --enable-integer-datetimes), date, integer,  
> or numeric, has some built-in precision anyway and a successor  
> function follows quite directly from that precision. I don't see that  
> as problematic or even very difficult.

The successor function for timestamp when not using integer datetimes is
going to depend on the underlying hardware. I think that will be a problem
unless you are planning to force the use of integer datetimes. I don't
think the project is ready for that yet, though in the long run I think it
is a better way to go.