WIP: Range Types - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | WIP: Range Types |
Date | |
Msg-id | 1294126185.18031.3395.camel@jdavis Whole thread Raw |
Responses |
Re: WIP: Range Types
Re: WIP: Range Types Re: WIP: Range Types Re: WIP: Range Types Re: WIP: Range Types |
List | pgsql-hackers |
I have been updating my work in progress here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Right now, it's not in a reviewable state, but those interested can glance through the code. Quick synopsis (for illustration purposes only; don't expect much from the current code): CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); Much of the previous discussion seemed to focus on two issues: 1. Reconciling discrete ranges (like ranges of integers) and continuous ranges (like ranges of numeric). I liked Robert's suggestion here: http://archives.postgresql.org/message-id/AANLkTiks_x93_k82b4f_ga634wCi0oeb9fTrUrF28EGM@mail.gmail.com which says that the user can just define a "canonicalize" function that will take a range as input (or perhaps the logical pieces of a range) and put it into an appropriate canonical representation. For instance, int4range_canonical might take (1,4] and turn it into [2,4]. This is similar to a few other ideas, but Robert's idea seems to require the least effort by the person defining the range type, because postgresql can still handle representation. It doesn't allow for all of the suggested features. In particular, it would not allow "granules" to be specified for discrete ranges. But on balance, it seems like this is the most conceptually simple and I think it satisfies the primary use cases. 2. Representational issues. There are many possibilities here: a. flags for inclusivity, start, and offset b. flags for inclusivity,start, and end c. if it's a discrete range, start and end only might suffice d. if it's a discrete range, perhapssomething involving "granules" (a) might be interesting, and for some data types might be more compact, but it introduces a new datatype that is distinct from the range's subtype: the "difference type" (that is, for timestamps it's "interval"). This approach seemed reasonable on paper, but it involves a lot of extra complexity, and introduces some strange assumptions (using an offset of "1 month" versus "30 days" can't be allowed). (c) and (d) are rejected because they require different code paths for discrete and continuous ranges. I chose (b). This is the simplest. If desired, we could still allow the user to specify their own serialize/deserialize functions, which can get most of the benefits of the other ones anyway. Other issues: 1. I plan to introduce an ANYRANGE type. 2. We need to use the subtype's IO functions, but those may not be immutable. So, rather than create new IO functions for each range type, I was thinking that I'd use just three (anyrange_i_in, anyrange_s_in, and anyrange_v_in), and select the right one at definition time, based on the subtype's IO functions' volatility. That seems like a bit of a hack -- any better ideas? 3. Right now I allow user-defined parse/deparse functions to be specified. In almost all cases, I would think that we want the text format to be something like: [ 2010-01-01, 2011-01-01 ) where the brackets denote inclusivity, and the left and right sides can be optionally double-quoted. Is it even worth having these parse/deparse functions, or should we just force the "obvious" format? 4. For the GiST penalty function, and perhaps some picksplit algorithms, it might be nice to know the length of a range, or do some other kinds of math. It introduces a lot of complexity to try to define math functions for each subtype, and try to make sure they behave sanely. So I was thinking that the user might need to specify a function that converts the subtype into a float that approximates a value's position in the total order. Any better ideas? Overall: I think this is one of the simpler designs. Conceptually, defining new ranges of different granularity with ease sounds like a great idea -- but it introduces a lot of complexity (and generated a lot of different opinions), so it was not included in this design. Similarly, I am leaning away from lots of user-specified options unless there is a real use case. Any suggestions or comments welcome. Regards,Jeff Davis
pgsql-hackers by date: