Re: Range types - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Range types |
Date | |
Msg-id | 1260984677.15987.1415.camel@jdavis |
In response to | Range types (Scott Bailey <artacus@comcast.net>) |
List | pgsql-hackers |
On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote:
> So basically I have an anyrange pseudo type with the functions prev,
> next, last, etc defined. So instead of hard coding range types, we would
> allow the user to define their own range types. Basically if we are able
> to determine the previous and next values of the base types we'd be able
> to define a range type. I'm envisioning in a manner much like defining
> an enum type.

After an off-list discussion with Scott, I think there may be a solution here that works for everyone if we don't try so hard to unify the implementation of discrete and continuous ranges. The API should be very similar, of course, but the implementation doesn't need to be.

Continuous ranges absolutely require the following information: start, end, and inclusivity information.

But discrete ranges can instead be stored by counting the number of granules from the start point. For instance, it could be stored as: start, num_granules.

That has a lot of benefits for discrete ranges of time. First of all, it allows the algebra to work reasonably well for the "days" and "months" part of the interval, so we can allow a granule of 1 day/week/month/year for a timestamp range. For output of the range, we can then just multiply the granule by the number of granules, and add that to the start time; thus avoiding the "incremental addition" problem with date math. I think this works reasonably well for timestamp/date ranges -- let me know if there is a problem here (aside from timestamptz, which I address below).

Secondly, in the case of a timestamp range, we can use 7 bytes for storing the number of granules rather than another full 8-byte timestamp, leaving one byte for flags to represent NULL boundaries, infinite boundaries, etc. For timestamps that would still mean that an interval could be 2000 years long with '1 microsecond' granularity. For dates, 3 bytes is sufficient for a date range 45000 years long with granules of '1 day'.
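The start-plus-count storage and the "incremental addition" problem mentioned above can be illustrated with a small sketch. It is written in Python for brevity rather than in the backend's C, and it is not the proposed implementation: `add_months`, `range_end`, and `range_end_incremental` are hypothetical helpers, the granule is assumed to be '1 month', and month arithmetic is assumed to clamp to the last day of the target month.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole months, clamping the day to the end of the target month.

    The clamping rule is an assumption made for this illustration.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def range_end(start: date, num_granules: int) -> date:
    """End of a discrete range stored as (start, num_granules).

    The granule (1 month) is multiplied by the count first, and the
    product is added to the start in a single step.
    """
    return add_months(start, 1 * num_granules)


def range_end_incremental(start: date, num_granules: int) -> date:
    """Same range, but adding one granule at a time (the problematic way)."""
    current = start
    for _ in range(num_granules):
        current = add_months(current, 1)
    return current


start = date(2009, 1, 31)
print(range_end(start, 2))              # 2009-03-31
print(range_end_incremental(start, 2))  # 2009-03-28
```

Stepping one month at a time passes through 2009-02-28 and never recovers the 31st, while multiplying first and adding once lands on 2009-03-31.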
That means that we can get back down to a 16 byte representation for timestamp ranges, or 8 byte representation for date ranges. There are a few details, like infinite ranges, but those can be pretty easily solved with flags as well.

There's one problem, and that's for timestamptz ranges with intervals that include days and months. Timezone adjustments are just not well-defined for that kind of granule (nor would it be particularly useful even if it magically worked), so this would have to be blocked somehow. I think that's a special case, and we could provide the user with a nice error message telling the user to use a date or timestamp range instead.

So, the idea is to default to a continuous range type, but if the user supplies a granule, prior and next functions, and other necessary details, then it becomes a discrete range type.

* continuous ranges can still have everything that everyone wants, including flags to indicate special values.
* discrete range granule is specified explicitly, so it's not an "implementation detail"
* discrete ranges can have a compact representation
* discrete ranges would still have room for flags to indicate special values

Comments?

Regards,
Jeff Davis
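As a rough check of the sizes discussed in this message, the following Python sketch packs a discrete timestamp range into 16 bytes and a discrete date range into 8 bytes, and recomputes the maximum range lengths. The field order, the little-endian byte order, and the function names are assumptions made only for this illustration; the message does not specify an on-disk layout.

```python
import struct

TS_COUNT_BYTES = 7    # granule count for timestamp ranges
DATE_COUNT_BYTES = 3  # granule count for date ranges


def pack_timestamp_range(start_us: int, num_granules: int, flags: int) -> bytes:
    """8-byte start (microseconds) + 7-byte granule count + 1 flag byte."""
    if not 0 <= num_granules < 2 ** (8 * TS_COUNT_BYTES):
        raise ValueError("granule count does not fit in 7 bytes")
    if not 0 <= flags < 256:
        raise ValueError("flags must fit in one byte")
    return (struct.pack("<q", start_us)
            + num_granules.to_bytes(TS_COUNT_BYTES, "little")
            + bytes([flags]))


def unpack_timestamp_range(buf: bytes) -> tuple[int, int, int]:
    """Inverse of pack_timestamp_range: (start_us, num_granules, flags)."""
    (start_us,) = struct.unpack("<q", buf[:8])
    num_granules = int.from_bytes(buf[8:15], "little")
    return start_us, num_granules, buf[15]


def pack_date_range(start_day: int, num_granules: int, flags: int) -> bytes:
    """4-byte start (day number) + 3-byte granule count + 1 flag byte."""
    if not 0 <= num_granules < 2 ** (8 * DATE_COUNT_BYTES):
        raise ValueError("granule count does not fit in 3 bytes")
    if not 0 <= flags < 256:
        raise ValueError("flags must fit in one byte")
    return (struct.pack("<i", start_day)
            + num_granules.to_bytes(DATE_COUNT_BYTES, "little")
            + bytes([flags]))


# Longest representable ranges, in years of 365.25 days.
max_ts_years = 2 ** 56 / 1_000_000 / 86_400 / 365.25   # '1 microsecond' granule
max_date_years = 2 ** 24 / 365.25                       # '1 day' granule

print(len(pack_timestamp_range(0, 1, 0)))  # 16
print(len(pack_date_range(0, 1, 0)))       # 8
print(max_ts_years > 2000, max_date_years > 45000)  # True True
```

The one flag byte in each layout is where NULL or infinite boundaries would be recorded; which bit means what is left open here.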