Re: Range types - Mailing list pgsql-hackers

From Scott Bailey
Subject Re: Range types
Date
Msg-id 4B2921D6.3010204@comcast.net
Whole thread Raw
In response to Re: Range types  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Range types  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Jeff Davis wrote:
> On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote:
>> So basically I have an anyrange pseudo type with the functions prev, 
>> next, last, etc defined. So instead of hard coding range types, we would 
>> allow the user to define their own range types. Basically if we are able 
>> to determine the previous and next values of the base types we'd be able 
>> to define a range type. I'm envisioning in a manner much like defining 
>> an enum type.
> 
> After an off-list discussion with Scott, I think there may be a solution
> here that works for everyone if we don't try so hard to unify the
> implementation of discrete and continuous ranges. The API should be very
> similar, of course, but the implementation doesn't need to be.
> 
> Continuous ranges absolutely require the following information: start,
> end, and inclusivity information.
> 
> But discrete ranges can instead be stored by counting the number of
> granules from the start point. For instance, it could be stored as:
> start, num_granules.
> 
> That has a lot of benefits for discrete ranges of time. First of all, it
> allows the algebra to work reasonably well for the "days" and "months"
> part of the interval, so we can allow a granule of 1 day/week/month/year
> for a timestamp range. For output of the range, we can then just
> multiply the granule by the number of granules, and add that to the
> start time; thus avoiding the "incremental addition" problem with date
> math. I think this works reasonably well for timestamp/date ranges --
> let me know if there is a problem here (aside from timestamptz, which I
> address below).
> 
> Secondly, in the case of a timestamp range, we can use 7 bytes for
> storing the number of granules rather than another full 8-byte
> timestamp, leaving one byte for flags to represent NULL boundaries,
> infinite boundaries, etc. For timestamps that would still mean that an
> interval could be 2000 years long with '1 microsecond' granularity. For
> dates, 3 bytes is sufficient for a date range 45000 years long with
> granules of '1 day'. That means that we can get back down to a 16 byte
> representation for timestamp ranges, or 8 byte representation for date
> ranges. There are a few details, like infinite ranges, but those can be
> pretty easily solved with flags as well.
> 
> There's one problem, and that's for timestamptz ranges with intervals
> that include days and months. Timezone adjustments are just not
> well-defined for that kind of granule (nor would it be particularly
> useful even if it magically worked), so this would have to be blocked
> somehow. I think that's a special case, and we could provide the user
> with a nice error message telling the user to use a date or timestamp
> range instead.
> 
> So, the idea is to default to a continuous range type, but if the user
> supplies a granule, prior and next functions, and other necessary
> details, then it becomes a discrete range type.
> 
>  * continuous ranges can still have everything that everyone wants, 
>    including flags to indicate special values.
>  * discrete range granule is specified explicitly, so it's not an 
>    "implementation detail"
>  * discrete ranges can have a compact representation
>  * discrete ranges would still have room for flags to indicate special 
>    values
>  
> Comments?

As I pointed out off-list, I think the granularity for timestamp range 
should be limited to hours and smaller. Anything larger is asking for 
trouble. And quite honestly if they wanted day granularity, they should 
use date range. Also, I think the granule should be same type as 
returned when subtracting two subtypes. So granule of date range should 
be int not interval. And if user wanted something with month 
granularity, perhaps an enum range of 'YYYYMM' would be better.

Quite honestly the following 3 cases would probably meet 99% of need:
CREATE TYPE period AS RANGE(timestamptz(0), interval '1 s');
CREATE TYPE period AS RANGE(timestamptz(3), interval '1 ms');
CREATE TYPE period AS RANGE(timestamptz, interval '1 us');


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Range types
Next
From: Alvaro Herrera
Date:
Subject: Re: Update on true serializable techniques in MVCC