Re: Ranges for well-ordered types - Mailing list pgsql-hackers

From Michael Glaesemann
Subject Re: Ranges for well-ordered types
Date
Msg-id F329B457-0572-49FC-BED7-47A81D2B3D5A@seespotcode.net
Whole thread Raw
In response to Re: Ranges for well-ordered types  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Ranges for well-ordered types  (Bruno Wolff III <bruno@wolff.to>)
List pgsql-hackers
Thanks to everyone for the feedback that I've received so far. It's  
clear that there's interest in this.

On Jun 12, 2006, at 3:22 , Josh Berkus wrote:

> I do think Jim is right, though, in that we may want to look for  
> portions of
> the functionality which are achievable in the context of one  
> PostgreSQL
> version, unless you're going to be working full-time on this patch.

I definitely agree with implementing it in parts. I doubt it's  
possible, but perhaps a first bit might make it into 8.2 :)

> In real-world calendaring applications, I *certainly* see the need  
> for a
> successor function.  However, that would require being able to define
> timestamps with a variable precision, e.g. TIMESTAMP('5 minutes').   
> This, by
> itself would be a significant effort, yet useful ... maybe that's  
> where to
> start?

As mentioned in an earlier email, I think calendaring applications in  
particular would benefit from timestamp precisions of less than 1  
second, e.g., TIMESTAMP('5 minutes') or TIMESTAMP('1 hour'). However,  
I think this is a thorny problem. To elaborate, I believe the  
precision has to be relative to some "baseline". From 12:00, 30  
minute precision would presumably allow 12:00, 12:30, 13:00, 13:30,  
and so on. Precision of '1 hour' would allow 12:00, 13:00, 14:00, and  
so on. But these are relative to the time zone they're in. While  
12:00 in Tokyo (+9) would be a timestamp value with 1 hour precision,  
that same timestamp is 4:30 in Tehran (+3:30) if I got the math  
right. Is 4:30 a timestamp value with 1 hour precision? Because of  
this, I think timestamp precisions of less than 1 second (timestamp 
(0)) require storing the time zone as part of the timestamp value.

Pushing this even further, would we allow arbitrary precision? For  
example, would 45-minute precision be allowed? In that case, I  
believe we'd need to go further than storing just the time zone with  
the timestamp value. The timestamp value would have to be relative to  
some baseline timestamp to be able to calculate whether or not the  
difference between any particular timestamp and the baseline  
timestamp is integral. Perhaps this could be accomplished using  
domains and some additional checking function? I'm not sure. It's  
enough to make me want to forget about the idea of disallowing any  
precision that is not an evenly divided into the next larger "time  
part": any precision between 0 seconds and 1 minute would have to be  
a number of seconds evenly divided into 60; between 1 hour and 1 day,  
precision would have to be one of the values 1, 2, 3, 4, 6, 8, or 12  
hours.

I've been able to discuss the issue of timestamp precision without  
bringing up successor functions or ranges at all, and indeed I think  
it's orthogonal to the range implementation. I think they're both  
concepts that should be included in PostgreSQL, but as for myself,  
I'm more interested in the range implementation than the the  
timestamp precision issue.

By the way, anyone care to weigh in on what term we should use when  
discussing this? Josh has used PERIOD. Should we go with that for now?

A somewhat related issue: would we want any implementation to follow  
(at least part) of the not-yet-standard SQL/Temporal draft? Or would  
it be more desirable to steer clear of using any terms/syntax that  
was included in an attempt to prevent any possible conflict with a  
future SQL spec?

> You're probably going to have to give up on B-Tree indexes for  
> PERIODs, and
> look towards GiST.  For one thing, I would see UNIQUE in the  
> context of a
> PERIOD defined as non-overlapping.  e.g.:

I think that a non-overlapping constraint goes above and beyond what  
UNIQUE requires. In my opinion, UNIQUE should test for equality,  
rather than non-overlapping, as that keeps the meaning of UNIQUE  
consistent across all types and may actually be useful in some  
instances. I do think it would be convenient to have some sort of  
syntax that would provide a non-overlapping constraint rather than  
having to code up a constraint trigger every time you wanted to do  
this. As Martijn pointed out, when GiST can be used for a UNIQUE  
constraint, we should be able to define the non-overlapping  
constraint quite easily. So this could be thought of as a third  
orthogonal issue for ranges, the first two being the range type  
constructor and timestamp precision < 1 second. Any one of these  
three could be done independently and improve PostgreSQL. In  
combination they are definitely a very nice package.

On Jun 13, 2006, at 13:25 , Bruno Wolff III wrote:

> Date ranges are really closed open as well (as finite sets of  
> isolated points
> are both open and closed). The only oddity would be that the date  
> used to
> indicate the open end of the range might not be what the user expects.

I think it's definitely a matter of interpretation. [2006-01-01,  
2006-12-31] and [2006-01-01, 2007-01-01) both include the same days.  
Who's to say which is the "real" representation? For all practical  
purposes (i.e., what can be represented within the database)  
[2006-01-01 00:00:00+0, 2006-12-31 23:59:59] and [2006-01-01  
00:00:00, 2007-01-01 00:00:00+0] represent the same timestamp(0) with  
time zone ranges as well. While one might idealize time to be  
continuous, as far as I know there isn't a way to represent time that  
way in a computer, at the very least, not in PostgreSQL.

And for the very reason that it might not be what the user expects,  
if there's a way to convert between closed-open and closed-closed as  
appropriate, I think it makes it much more use friendly to do so. For  
example, the closed-closed representation is equivalent to what  
BETWEEN  does. It would be very nice to be able to provide sometime  
equivalent with ranges.

As for the successor function itself: Any "exact" datatype, such as  
timestamp (at least with --enable-integer-datetimes), date, integer,  
or numeric, has some built-in precision anyway and a successor  
function follows quite directly from that precision. I don't see that  
as problematic or even very difficult.

Thanks again for your comments, past, present and future! It's been  
very helpful for me to hear from others on this.

Michael Glaesemann
grzm seespotcode net



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: CSV mode option for pg_dump
Next
From: Martijn van Oosterhout
Date:
Subject: Re: postgresql and process titles