Thread: WIP: RangeTypes
Ok, I have made some progress. This is still a proof-of-concept patch, but the important pieces are working together. Synopsis: CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); SELECT range_eq('[1,2.2)'::numrange,'[1,2.2]'); SELECT range_lbound('(3.7,9]'::numrange); SELECT range(6.7); SELECT '-'::numrange; -- empty SELECT '[1, NULL]'::numrange; -- ] will become ) SELECT '(INF, 3)'::numrange; I haven't completed many of the other generic functions, because I'd like to make sure I'm on the right track first. The important thing about the functions above is that they show ANYRANGE working in conjunction with ANYELEMENT in various combinations, which was a significant part of this patch. Here are the open items: 1. Generic functions -- most of which are fairly obvious. However, I want to make sure I'm on the right track first. 2. GiST -- I'll need a mechanism to implement the "penalty" function, and perhaps I'll also need additional support for the picksplit function. For the "penalty" function, I think I'll need to require a function to convert the subtype into a float, and I can use that to find a distance (which can be the penalty). That should also satisfy anything that picksplit might need. 3. Typmod -- There is still one annoyance about typmod remaining. I need to treat it like an array in find_typmod_coercion_function(), and then create a coercion expression. Is it worth it? Would typmod on a range be confusing, or should I just finish this item up? 4. Docs 5. Tests 6. pg_dump -- should be pretty easy; I just want to settle some of the other stuff first. 7. Right now the parse function is quite dumb. Is there some example code I should follow to make sure I get this right? 8. In order to properly support the various combinations of ANYRANGE and ANYELEMENT in a function definition (which are all important), we need to be able to determine the range type given a subtype. That means that each subtype can only have one associated range, which sounds somewhat limiting, but it can be worked around by using domains. I don't think this is a major limitation. Comments? 9. Representation -- right now I store the OID of the range type in the range itself, much like arrays, in order to call the find the functions to operate on the subtype. Robert has some justifiable concerns about that 4-byte overhead. Possible ideas: * Forget about ANYRANGE altogether, and generate new catalog entries for the generic functions for each new range type defined. I don't particularly like this approach because it makes it very difficult to define new generic functions. * Somehow fix the type system so that we know the specific types of arguments in all situations. I don't know if this is feasible. * Store a 8- or 16-bit unique number in pg_range, and store that number in the representation. That would be pretty ugly, and limit the total possible range types defined at once, but it saves a couple bytes per value. * Try to somehow mimic what records do. Records use a global array and use the typmod as an index into that array. It looks like a hack to me, but might be worth borrowing anyway. Also related to representation: * Right now I always align the subtypes within the range according to typalign. I could avoid that by packing the bytes tightly, and then copying them around later. Suggestions? And what should the overall alignment of the range type be? * If it's a fixed-length type, we can save the varlena header byte on the overall range; but we lose the ability to save space when one of the boundaries of the range is missing (NULL or INF), and it would complicate the code a little. Thoughts? Regards, Jeff Davis
Attachment
On Tue, Jan 11, 2011 at 01:16:47AM -0800, Jeff Davis wrote: > Ok, I have made some progress. This is still a proof-of-concept patch, > but the important pieces are working together. > > Synopsis: > > CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, > SUBTYPE_CMP=numeric_cmp); > > SELECT range_eq('[1,2.2)'::numrange,'[1,2.2]'); > SELECT range_lbound('(3.7,9]'::numrange); > SELECT range(6.7); > SELECT '-'::numrange; -- empty > SELECT '[1, NULL]'::numrange; -- ] will become ) > SELECT '(INF, 3)'::numrange; > > I haven't completed many of the other generic functions, because I'd > like to make sure I'm on the right track first. The important thing > about the functions above is that they show ANYRANGE working in > conjunction with ANYELEMENT in various combinations, which was a > significant part of this patch. > > Here are the open items: > > 1. Generic functions -- most of which are fairly obvious. However, I > want to make sure I'm on the right track first. > > 2. GiST -- I'll need a mechanism to implement the "penalty" function, > and perhaps I'll also need additional support for the picksplit > function. For the "penalty" function, I think I'll need to require a > function to convert the subtype into a float, and I can use that to find > a distance (which can be the penalty). That should also satisfy anything > that picksplit might need. > > 3. Typmod -- There is still one annoyance about typmod remaining. I need > to treat it like an array in find_typmod_coercion_function(), and then > create a coercion expression. Is it worth it? Would typmod on a range be > confusing, or should I just finish this item up? Probably not worth it for the first round. > 4. Docs Happy to help evenings this week :) > 5. Tests Same. What do you have so far? > 6. pg_dump -- should be pretty easy; I just want to settle some of the > other stuff first. > > 7. Right now the parse function is quite dumb. Is there some example > code I should follow to make sure I get this right? KISS is a fine principle. Do you really need it smart on the first round? :) > 8. In order to properly support the various combinations of ANYRANGE and > ANYELEMENT in a function definition (which are all important), we need > to be able to determine the range type given a subtype. That means that > each subtype can only have one associated range, which sounds somewhat > limiting, but it can be worked around by using domains. I don't think > this is a major limitation. Comments? As we get a more nuanced type system, this is one of the things that will need to get reworked, so I'd say it's better not to put too much effort into things that a refactor of the type system <http://wiki.postgresql.org/wiki/Refactor_Type_System> would make much better, at least right now. > Also related to representation: > > * Right now I always align the subtypes within the range according to > typalign. I could avoid that by packing the bytes tightly, and then > copying them around later. Suggestions? And what should the overall > alignment of the range type be? For the first cut, the simplest possible. > * If it's a fixed-length type, we can save the varlena header byte on > the overall range; but we lose the ability to save space when one of the > boundaries of the range is missing (NULL or INF), and it would > complicate the code a little. Thoughts? Probably not worth complicating the code at this stage. KISS again :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, 2011-01-11 at 11:13 -0800, David Fetter wrote: > > 3. Typmod -- There is still one annoyance about typmod remaining. I need > > to treat it like an array in find_typmod_coercion_function(), and then > > create a coercion expression. Is it worth it? Would typmod on a range be > > confusing, or should I just finish this item up? > > Probably not worth it for the first round. OK, I'll block typmods for range types for now. > > 4. Docs > > Happy to help evenings this week :) > > > 5. Tests > > Same. What do you have so far? Great! I think the best tests would be around the ANYRANGE type mechanism to see if anything seems wrong or limiting. Particularly, its interaction with ANYELEMENT. > > 7. Right now the parse function is quite dumb. Is there some example > > code I should follow to make sure I get this right? > > KISS is a fine principle. Do you really need it smart on the first > round? :) Well, it needs to be correct ;) Specifically, I think there will be a problem if there is a multibyte character following a backslash. There may be other problems, as well. I could probably get these fixed, but it might be better to follow patterns in other code. I'll look into it. > > 8. In order to properly support the various combinations of ANYRANGE and > > ANYELEMENT in a function definition (which are all important), we need > > to be able to determine the range type given a subtype. That means that > > each subtype can only have one associated range, which sounds somewhat > > limiting, but it can be worked around by using domains. I don't think > > this is a major limitation. Comments? > > As we get a more nuanced type system, this is one of the things that > will need to get reworked, so I'd say it's better not to put too much > effort into things that a refactor of the type system > <http://wiki.postgresql.org/wiki/Refactor_Type_System> would make much > better, at least right now. Sounds good. I don't think this is an actual problem, so I'll consider this a non-issue unless someone else has a comment. > > Also related to representation: > > > > * Right now I always align the subtypes within the range according to > > typalign. I could avoid that by packing the bytes tightly, and then > > copying them around later. Suggestions? And what should the overall > > alignment of the range type be? > > For the first cut, the simplest possible. OK. It's already about as simple as it can get, but might be fairly wasteful. > > * If it's a fixed-length type, we can save the varlena header byte on > > the overall range; but we lose the ability to save space when one of the > > boundaries of the range is missing (NULL or INF), and it would > > complicate the code a little. Thoughts? > > Probably not worth complicating the code at this stage. KISS again :) OK. Regards,Jeff Davis
Updated patch. Summary of changes: * More generic functions * pg_dump support * remove typmod support until it can be done correctly * added some tests There is still quite a bit left, including (numbers match up with previous TODO list): 1. Generic functions -- still more work to do here. Handling the combination of continuous range semantics with NULLs requires quite a lot of special cases, because it's hard to share code among functions. Even something as simple as "equals" is not as trivial as it sounds. Perhaps I'm missing some cleaner abstractions, or perhaps I'm over-thinking the null semantics. 3. perhaps fix typmod 4. documentation 5. more tests 7. better parser Regards, Jeff Davis
Attachment
When defining generic range functions, there is quite a bit of extra complexity needed to handle special cases. The special cases are due to:* empty ranges* ranges with infinite boundaries* ranges with NULL boundaries* ranges with exclusivebounds (e.g. "(" or ")"). Infinite bounds, and exclusive bounds can both be handled somewhat reasonably, and the complexity can be somewhat hidden. Empty ranges are a special case, but can be handled at the top of the generic function in a straightforward way. NULL bounds, however, have been causing me a little frustration. A reasonable interpretation of boolean operators that operate on ranges might be: "true or false if we can prove it from only the inputs; else NULL". This gets a little interesting because a NULL value as a range boundary isn't 100% unknown: it's known to be on one side of the other bound (assuming that the other side is known). This is similar to how AND and OR behave for NULL. For instance, take the simple definition of "contains": r1.a <= r2.a AND r1.b >= r2.b (where "a" is the lower bound and "b" is the upper) Consider r1: [NULL, 10], r2: [20, NULL]. Contains should return "false" according to our rule above, because no matter what the values of r1.a and r2.b, the ranges can't possibly overlap. So, now, more complexity needs to be added. We can be more redundant and do: r1.a <= r2.a AND r1.b <= r2 AND r1.a <= r2.b AND r1.b >= r2.a That seems a little error-prone and harder to understand. Then, when we have functions that operate on ranges and return ranges, we're not dealing with 3VL exactly, but some other intuition about what NULL should do. The semantics get a lot more complicated and hard to reason about. For instance, what about: (NULL, 5) INTERSECT (3, NULL) Should that evaluate to NULL, (NULL, NULL), or throw an error? What about: (NULL, 5) MINUS (NULL, 7) (NULL, 5) MINUS (3, NULL) I feel like I'm making this too complicated. Should I just scope out NULL range boundaries for the first cut, and leave room in the representation so that it can be added when there is a more thorough proposal for NULL range boundaries? Regards,Jeff Davis
On Mon, Jan 17, 2011 at 01:09:26PM -0800, Jeff Davis wrote: > When defining generic range functions, there is quite a bit of extra > complexity needed to handle special cases. > > The special cases are due to: > * empty ranges > * ranges with infinite boundaries > * ranges with NULL boundaries > * ranges with exclusive bounds (e.g. "(" or ")"). > > Infinite bounds, and exclusive bounds can both be handled somewhat > reasonably, and the complexity can be somewhat hidden. Empty ranges > are a special case, but can be handled at the top of the generic > function in a straightforward way. > > NULL bounds, however, have been causing me a little frustration. > [Explanation and illustrations]. In that case, let's leave them out for this cut. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Jeff Davis <pgsql@j-davis.com> writes: > I feel like I'm making this too complicated. Should I just scope out > NULL range boundaries for the first cut, and leave room in the > representation so that it can be added when there is a more thorough > proposal for NULL range boundaries? +1. I'm far from convinced that a null boundary is sane at all. If you don't know the value, how do you know it's greater/less than the other bound? regards, tom lane
On 1/17/11 1:09 PM, Jeff Davis wrote: > I feel like I'm making this too complicated. Should I just scope out > NULL range boundaries for the first cut, and leave room in the > representation so that it can be added when there is a more thorough > proposal for NULL range boundaries? Well, NULL range boundaries aren't usable with Temporal, and yet I wrote a whole scheduling application around it. So I think it's OK to have them as a TODO and raise an error for now. Heck, we had arrays which didn't accept NULLs for years. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
New patch. I added a lot of generic range functions, and a lot of operators. There is still more work to do, this is just an updated patch. The latest can be seen on the git repository, as well: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Regards, Jeff Davis
Attachment
On Thu, Jan 20, 2011 at 4:29 AM, Jeff Davis <pgsql@j-davis.com> wrote: > New patch. I added a lot of generic range functions, and a lot of > operators. > > There is still more work to do, this is just an updated patch. The > latest can be seen on the git repository, as well: So is this 9.2 material at this point? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 2011-01-21 at 12:31 -0500, Robert Haas wrote: > On Thu, Jan 20, 2011 at 4:29 AM, Jeff Davis <pgsql@j-davis.com> wrote: > > New patch. I added a lot of generic range functions, and a lot of > > operators. > > > > There is still more work to do, this is just an updated patch. The > > latest can be seen on the git repository, as well: > > So is this 9.2 material at this point? Regardless of whether it's eligible to be in 9.1, I plan to keep working on it. I would appreciate some overall feedback during this commitfest. Much of the code is there, so it would be helpful if we could settle issues like representation, functionality, interface, catalog, API, grammar, and naming. Otherwise, those issues will just be a reason to bounce it from commitfest-next, as well. Regards,Jeff Davis
On Fri, Jan 21, 2011 at 2:30 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, 2011-01-21 at 12:31 -0500, Robert Haas wrote: >> On Thu, Jan 20, 2011 at 4:29 AM, Jeff Davis <pgsql@j-davis.com> wrote: >> > New patch. I added a lot of generic range functions, and a lot of >> > operators. >> > >> > There is still more work to do, this is just an updated patch. The >> > latest can be seen on the git repository, as well: >> >> So is this 9.2 material at this point? > > Regardless of whether it's eligible to be in 9.1, I plan to keep working > on it. > > I would appreciate some overall feedback during this commitfest. Much of > the code is there, so it would be helpful if we could settle issues like > representation, functionality, interface, catalog, API, grammar, and > naming. Otherwise, those issues will just be a reason to bounce it from > commitfest-next, as well. Agreed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Updated patch. Changes: * Documentation for operators/functions * a comprehensive set of operators and functions * BTree opclass * Hash opclass * built-in range types: - PERIOD (timestamp) - PERIODTZ (timestamptz) - DATERANGE (date) - INTRANGE (int4) - NUMRANGE (numeric) * added subtype float function to the API, which will be useful for GiST * created canonical functions for intrange and daterange, so that: '[1,5]'::intrange = '[1,6)'::intrange * added length() function, written in SQL as: select upper($1) - lower($1) which uses polymorphic "-" operator to avoid the need to give the subtype subtract function and return type to the generic API Open items: * More documentation work * Settle any representation/alignment concerns * Should the new length() function be marked as immutable, stable, or volatile? It uses the polymorphic "-" operator, and I suppose someone could define a non-immutable version of that before calling length(). Then again, it is likely to be inlined anyway, right? * GiST - docs - catalog work - implementation * typmod support (optional) This is nearing completion. GiST is by far the most amount of effort remaining that I'm aware of. Comments about the API, naming, representation, interface, funcationality, grammar, etc. are welcome. Regards, Jeff Davis
Attachment
On Thu, Jan 27, 2011 at 11:45:30PM -0800, Jeff Davis wrote: > Updated patch. > > Changes: > > * Documentation for operators/functions > * a comprehensive set of operators and functions > * BTree opclass Yay! > * Hash opclass > * built-in range types: > - PERIOD (timestamp) > - PERIODTZ (timestamptz) For consistency, and in order not to continue our atrocious naming tradition, I'd like to propose that the above be named timestamprange (tsrange for short) and timestamptzrange (tstzrange for short). > - DATERANGE (date) Yay! > - INTRANGE (int4) int4range/intrange and the missing bigintrange/int8range > - NUMRANGE (numeric) numericrange/numrange. Should there also be a timerange and a timetzrange? > * added subtype float function to the API, which will be useful for > GiST w00t! > * created canonical functions for intrange and daterange, so that: > '[1,5]'::intrange = '[1,6)'::intrange Excellent! > * added length() function, written in SQL as: > select upper($1) - lower($1) > which uses polymorphic "-" operator to avoid the need to > give the subtype subtract function and return type to the generic > API > > Open items: > > * More documentation work > * Settle any representation/alignment concerns > * Should the new length() function be marked as immutable, stable, > or volatile? It uses the polymorphic "-" operator, and I suppose > someone could define a non-immutable version of that before calling > length(). Then again, it is likely to be inlined anyway, right? > * GiST > - docs > - catalog work > - implementation > * typmod support (optional) > > This is nearing completion. GiST is by far the most amount of effort > remaining that I'm aware of. Comments about the API, naming, > representation, interface, funcationality, grammar, etc. are welcome. > > Regards, > Jeff Davis I'd offer to help, but personal matters press this weekend :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, 2011-01-28 at 09:17 -0800, David Fetter wrote: > For consistency, and in order not to continue our atrocious naming > tradition, I'd like to propose that the above be named timestamprange > (tsrange for short) and timestamptzrange (tstzrange for short). No real objection, but I'd like to see if someone else will second it. Also, I don't think aliases are very easy to define. They appear to all be special cases in the backend code, without catalog support. Should I use domains? If not, I think we'll have to stick to one name. > > - INTRANGE (int4) > > int4range/intrange and the missing bigintrange/int8range I thought about adding int8range, and the first time around that's what I tried. But then I realized that the literal "4" is interpreted as an int4, meaning that "range(1,10)" would be interpreted as int4range, so int8range was slightly annoying to use because you have to cast the literals. Also, the storage is not particularly efficient right now anyway, so if you need int8range, you could probably use numrange instead. I don't mind either way. If you think someone will use it, I'll add it. > Should there also be a timerange and a timetzrange? I thought about it, and I realized that I've never seen the "time" type used. Again, I'll add it if someone will use it. Keep in mind that it's fairly easy for people to add their own range types. The most difficult part is defining the "canonical" function if it is applicable, and the "subtype_float" function which is necessary for GiST. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2011-01-28 at 09:17 -0800, David Fetter wrote: >> For consistency, and in order not to continue our atrocious naming >> tradition, I'd like to propose that the above be named timestamprange >> (tsrange for short) and timestamptzrange (tstzrange for short). > No real objection, but I'd like to see if someone else will second it. > Also, I don't think aliases are very easy to define. They are not, and should be avoided. I don't think we have *any* typename aliases except for cases required by SQL standard. >> Should there also be a timerange and a timetzrange? > I thought about it, and I realized that I've never seen the "time" type > used. Again, I'll add it if someone will use it. I have no idea what the semantics of timetzrange would be. Even timerange would be a bit funny --- is 11PM before or after 1AM? regards, tom lane
On Jan 28, 2011, at 9:48 AM, Jeff Davis wrote: > On Fri, 2011-01-28 at 09:17 -0800, David Fetter wrote: >> For consistency, and in order not to continue our atrocious naming >> tradition, I'd like to propose that the above be named timestamprange >> (tsrange for short) and timestamptzrange (tstzrange for short). > > No real objection, but I'd like to see if someone else will second it. +1 in principal. I think we should try to avoid the user of the term "period" if possible, and I see definite benefits toa simple model of $typename . 'range'; > Keep in mind that it's fairly easy for people to add their own range > types. The most difficult part is defining the "canonical" function if > it is applicable, and the "subtype_float" function which is necessary > for GiST. Is there GIN support? GIN seems to be the preferred index type for this sort of thing, no? Best, David
pgsql@j-davis.com (Jeff Davis) writes: > On Fri, 2011-01-28 at 09:17 -0800, David Fetter wrote: >> For consistency, and in order not to continue our atrocious naming >> tradition, I'd like to propose that the above be named timestamprange >> (tsrange for short) and timestamptzrange (tstzrange for short). > > No real objection, but I'd like to see if someone else will second it. > > Also, I don't think aliases are very easy to define. They appear to all > be special cases in the backend code, without catalog support. Should I > use domains? If not, I think we'll have to stick to one name. Somehow, rangets, rangetstz seem better to me, but that's not a deep issue. I'm not certain of the basis for *truly* preferring an ordering of the components (ts/timestamp, tz, range). As long as it's rational, and not too terribly inconsistent with other prefix/suffix handlings, I'm fine with it. Mind you, timestamptzrange seems a mite *long* to me. >> > - INTRANGE (int4) >> >> int4range/intrange and the missing bigintrange/int8range > > I thought about adding int8range, and the first time around that's what > I tried. But then I realized that the literal "4" is interpreted as an > int4, meaning that "range(1,10)" would be interpreted as int4range, so > int8range was slightly annoying to use because you have to cast the > literals. > > Also, the storage is not particularly efficient right now anyway, so if > you need int8range, you could probably use numrange instead. > > I don't mind either way. If you think someone will use it, I'll add it. Making sure it's consistent with int4, int8, bigint sure seems like a good idea. >> Should there also be a timerange and a timetzrange? > > I thought about it, and I realized that I've never seen the "time" type > used. Again, I'll add it if someone will use it. > > Keep in mind that it's fairly easy for people to add their own range > types. The most difficult part is defining the "canonical" function if > it is applicable, and the "subtype_float" function which is necessary > for GiST. I don't see much use for "time"; it is *so* likely that you'll need date overlaps that it's difficult for it to be useful without making it extremely magical (e.g. - stowing a lot of logic inside that adds in date information behind the scenes). FYI, it's compiling and testing fine for me. This one strikes me as an exciting change, once GIST is in place. Well, actually, even without it :-). postgres@localhost-> insert into foo (dr) values ('[2010-01-01,2011-12-31)'); INSERT 0 1 postgres@localhost-> select * from foo;id | dr ----+---------------------------- 1 | [ 2010-01-01, 2011-12-31 ) (1 row) -- let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];; http://linuxfinances.info/info/rdbms.html If vegetarians eat vegetables, what do humanitarians eat?
On 28 January 2011 07:45, Jeff Davis <pgsql@j-davis.com> wrote: > Updated patch. > > Changes: > > * Documentation for operators/functions > * a comprehensive set of operators and functions > * BTree opclass > * Hash opclass > * built-in range types: > - PERIOD (timestamp) > - PERIODTZ (timestamptz) > - DATERANGE (date) > - INTRANGE (int4) > - NUMRANGE (numeric) > * added subtype float function to the API, which will be useful for > GiST > * created canonical functions for intrange and daterange, so that: > '[1,5]'::intrange = '[1,6)'::intrange > * added length() function, written in SQL as: > select upper($1) - lower($1) > which uses polymorphic "-" operator to avoid the need to > give the subtype subtract function and return type to the generic > API > > Open items: > > * More documentation work > * Settle any representation/alignment concerns > * Should the new length() function be marked as immutable, stable, > or volatile? It uses the polymorphic "-" operator, and I suppose > someone could define a non-immutable version of that before calling > length(). Then again, it is likely to be inlined anyway, right? > * GiST > - docs > - catalog work > - implementation > * typmod support (optional) > > This is nearing completion. GiST is by far the most amount of effort > remaining that I'm aware of. Comments about the API, naming, > representation, interface, funcationality, grammar, etc. are welcome. > > Regards, > Jeff Davis Very nice work Jeff! This is not very graceful: postgres=# CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, SUBTYPE_CMP=numeric_cmp); ERROR: duplicate key value violates unique constraint "pg_range_rgnsubtype_index" DETAIL: Key (rngsubtype)=(1700) already exists. Also, if I try the same, but with a different name for the type, I get the same error. Why does that restriction exist? Can't you have types which happen to use the exact same subtype? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On 28 January 2011 20:28, Thom Brown <thom@linux.com> wrote: > On 28 January 2011 07:45, Jeff Davis <pgsql@j-davis.com> wrote: >> Updated patch. >> >> Changes: >> >> * Documentation for operators/functions >> * a comprehensive set of operators and functions >> * BTree opclass >> * Hash opclass >> * built-in range types: >> - PERIOD (timestamp) >> - PERIODTZ (timestamptz) >> - DATERANGE (date) >> - INTRANGE (int4) >> - NUMRANGE (numeric) >> * added subtype float function to the API, which will be useful for >> GiST >> * created canonical functions for intrange and daterange, so that: >> '[1,5]'::intrange = '[1,6)'::intrange >> * added length() function, written in SQL as: >> select upper($1) - lower($1) >> which uses polymorphic "-" operator to avoid the need to >> give the subtype subtract function and return type to the generic >> API >> >> Open items: >> >> * More documentation work >> * Settle any representation/alignment concerns >> * Should the new length() function be marked as immutable, stable, >> or volatile? It uses the polymorphic "-" operator, and I suppose >> someone could define a non-immutable version of that before calling >> length(). Then again, it is likely to be inlined anyway, right? >> * GiST >> - docs >> - catalog work >> - implementation >> * typmod support (optional) >> >> This is nearing completion. GiST is by far the most amount of effort >> remaining that I'm aware of. Comments about the API, naming, >> representation, interface, funcationality, grammar, etc. are welcome. >> >> Regards, >> Jeff Davis > > Very nice work Jeff! > > This is not very graceful: > > postgres=# CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, > SUBTYPE_CMP=numeric_cmp); > ERROR: duplicate key value violates unique constraint > "pg_range_rgnsubtype_index" > DETAIL: Key (rngsubtype)=(1700) already exists. > > Also, if I try the same, but with a different name for the type, I get > the same error. Why does that restriction exist? Can't you have > types which happen to use the exact same subtype? Also, how do you remove a range type which coincides with a system range type. For example: postgres=# CREATE TYPE numrange AS RANGE (SUBTYPE=interval, SUBTYPE_CMP=interval_cmp); CREATE TYPE postgres=# drop type numrange; ERROR: cannot drop type numrange because it is required by the database system Is this because I shouldn't have been able to create this type in the first place? -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On Fri, 2011-01-28 at 21:52 +0000, Thom Brown wrote: > > This is not very graceful: > > > > postgres=# CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, > > SUBTYPE_CMP=numeric_cmp); > > ERROR: duplicate key value violates unique constraint > > "pg_range_rgnsubtype_index" > > DETAIL: Key (rngsubtype)=(1700) already exists. You're right, that should be a much nicer error message. > > Also, if I try the same, but with a different name for the type, I get > > the same error. Why does that restriction exist? Can't you have > > types which happen to use the exact same subtype? At first, that's how I designed it. Then, I realized that the type system needs to know the range type from the element type in order for something like ANYRANGE to work. There's a workaround though: create a domain over numeric, and then create a range over mynumeric. =# create domain mynumeric as numeric; CREATE DOMAIN =# create type numrange2 as range (subtype=numeric, subtype_cmp=numeric_cmp); ERROR: duplicate key value violates unique constraint "pg_range_rgnsubtype_index" DETAIL: Key (rngsubtype)=(1700) already exists. =# create type numrange2 as range (subtype=mynumeric, subtype_cmp=numeric_cmp); CREATE TYPE =# select range(1.1::mynumeric,2.2::mynumeric); range --------------[ 1.1, 2.2 ) (1 row) > Also, how do you remove a range type which coincides with a system > range type. For example: > > postgres=# CREATE TYPE numrange AS RANGE (SUBTYPE=interval, > SUBTYPE_CMP=interval_cmp); > CREATE TYPE > postgres=# drop type numrange; > ERROR: cannot drop type numrange because it is required by the database system > > Is this because I shouldn't have been able to create this type in the > first place? The types are in two different schemas. It's just as though you created a table called pg_class. To drop the one you created, do: DROP TYPE public.numrange; Regards,Jeff Davis
On Fri, 2011-01-28 at 10:41 -0800, David E. Wheeler wrote: > +1 in principal. I think we should try to avoid the user of the term > "period" if possible, and I see definite benefits to a simple model of > $typename . 'range'; Interesting, I didn't realize that PERIOD was such an undesirable type name. > Is there GIN support? GIN seems to be the preferred index type for > this sort of thing, no? GiST is the natural index access method if we approach ranges as a spatial type. I don't quite know what you have in mind for GIN; what keys would you extract from the value '[1.23,4.56)' ? Regards,Jeff Davis
On 29 January 2011 18:52, Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, 2011-01-28 at 21:52 +0000, Thom Brown wrote: >> Also, how do you remove a range type which coincides with a system >> range type. For example: >> >> postgres=# CREATE TYPE numrange AS RANGE (SUBTYPE=interval, >> SUBTYPE_CMP=interval_cmp); >> CREATE TYPE >> postgres=# drop type numrange; >> ERROR: cannot drop type numrange because it is required by the database system >> >> Is this because I shouldn't have been able to create this type in the >> first place? > > The types are in two different schemas. It's just as though you created > a table called pg_class. > > To drop the one you created, do: > DROP TYPE public.numrange; *facepalm* Of course. :) My bad. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On Jan 29, 2011, at 10:57 AM, Jeff Davis wrote: > On Fri, 2011-01-28 at 10:41 -0800, David E. Wheeler wrote: >> +1 in principal. I think we should try to avoid the user of the term >> "period" if possible, and I see definite benefits to a simple model of >> $typename . 'range'; > > Interesting, I didn't realize that PERIOD was such an undesirable type > name. It's not *hugely* undesirable. I just tend to think that "range" is more so. >> Is there GIN support? GIN seems to be the preferred index type for >> this sort of thing, no? > > GiST is the natural index access method if we approach ranges as a > spatial type. I don't quite know what you have in mind for GIN; what > keys would you extract from the value '[1.23,4.56)' ? I think I'm just revealing my ignorance of these index types and what they're good for. My impression has been that GIN wasa better but less-full-featured alternative to GiST and getting better with Tom's recent fixes for its handling of NULLs.But, uh, obviously not. Best, David
On Fri, 2011-01-28 at 14:15 -0500, Chris Browne wrote: > Mind you, timestamptzrange seems a mite *long* to me. Right. I think we might need to compromise here an use some shorter names. tsrange/tstzrange/numrange seem reasonable to me. > Making sure it's consistent with int4, int8, bigint sure seems like a > good idea. OK, I'll change intrange to int4range, and add int8range. int2range doesn't seem useful, though. Regards,Jeff Davis
On Sat, 2011-01-29 at 11:00 -0800, David E. Wheeler wrote: > I think I'm just revealing my ignorance of these index types and what > they're good for. My impression has been that GIN was a better but > less-full-featured alternative to GiST and getting better with Tom's > recent fixes for its handling of NULLs. But, uh, obviously not. The idea of GIN is that you store multiple entries for each tuple you insert. So, inserting a tuple containing the document 'hello world' would store the keys "hello" and "world" both pointing back to that tuple. It also makes sense for arrays. But ranges are arbitrarily long, and don't have any defined "step", so that means an infinite number of keys. GiST works better for that. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2011-01-28 at 21:52 +0000, Thom Brown wrote: > Also, if I try the same, but with a different name for the type, I get > the same error. Why does that restriction exist? Can't you have > types which happen to use the exact same subtype? > At first, that's how I designed it. Then, I realized that the type > system needs to know the range type from the element type in order for > something like ANYRANGE to work. That seems like a fairly bad restriction. In a datatype with multiple useful sort orderings, it'd be desirable to be able to create a range type for each such ordering, no? I'd be inclined to think of a range type as being defined by element type plus a btree opfamily. Maybe it'd be okay to insist on that combination as being unique. regards, tom lane
On Sat, 2011-01-29 at 14:42 -0500, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > On Fri, 2011-01-28 at 21:52 +0000, Thom Brown wrote: > > Also, if I try the same, but with a different name for the type, I get > > the same error. Why does that restriction exist? Can't you have > > types which happen to use the exact same subtype? > > > At first, that's how I designed it. Then, I realized that the type > > system needs to know the range type from the element type in order for > > something like ANYRANGE to work. > > That seems like a fairly bad restriction. In a datatype with multiple > useful sort orderings, it'd be desirable to be able to create a range > type for each such ordering, no? I'd be inclined to think of a range > type as being defined by element type plus a btree opfamily. Maybe it'd > be okay to insist on that combination as being unique. I couldn't find another way to make a function with a definition like: range(ANYELEMENT, ANYELEMENT) returns ANYRANGE work. And it seemed worse to live without a constructor like that. Ideas? Also, it's not based on the btree opfamily right now. It's just based on a user-supplied compare function. I think I could change it to store the opfamily instead, if you think that's a better idea. Regards,Jeff Davis
On 29 January 2011 19:53, Jeff Davis <pgsql@j-davis.com> wrote: > On Sat, 2011-01-29 at 14:42 -0500, Tom Lane wrote: >> Jeff Davis <pgsql@j-davis.com> writes: >> > On Fri, 2011-01-28 at 21:52 +0000, Thom Brown wrote: >> > Also, if I try the same, but with a different name for the type, I get >> > the same error. Why does that restriction exist? Can't you have >> > types which happen to use the exact same subtype? >> >> > At first, that's how I designed it. Then, I realized that the type >> > system needs to know the range type from the element type in order for >> > something like ANYRANGE to work. >> >> That seems like a fairly bad restriction. In a datatype with multiple >> useful sort orderings, it'd be desirable to be able to create a range >> type for each such ordering, no? I'd be inclined to think of a range >> type as being defined by element type plus a btree opfamily. Maybe it'd >> be okay to insist on that combination as being unique. > > I couldn't find another way to make a function with a definition like: > > range(ANYELEMENT, ANYELEMENT) returns ANYRANGE > > work. And it seemed worse to live without a constructor like that. > Ideas? > > Also, it's not based on the btree opfamily right now. It's just based on > a user-supplied compare function. I think I could change it to store the > opfamily instead, if you think that's a better idea. Probably ignorance here, but why does the following not work? postgres=# select '[18,20]'::numrange @> 19; ERROR: operator does not exist: numrange @> integer LINE 1: select '[18,20]'::numrange @> 19; ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. I can see both the wiki page on range types and the pg_operator table appear to indicate this should work: postgres=# select o.oprname, tl.typname as lefttype, tr.typname as righttype from pg_operator o left join pg_type tl on o.oprleft = tl.oid left join pg_type tr on o.oprright = tr.oid where 'anyrange' in (tl.typname, tr.typname) and oprname = '@>';oprname | lefttype | righttype ---------+----------+-------------@> | anyrange | anynonarray@> | anyrange | anyrange (2 rows) -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On 30 January 2011 02:55, Thom Brown <thom@linux.com> wrote: > On 29 January 2011 19:53, Jeff Davis <pgsql@j-davis.com> wrote: >> On Sat, 2011-01-29 at 14:42 -0500, Tom Lane wrote: >>> Jeff Davis <pgsql@j-davis.com> writes: >>> > On Fri, 2011-01-28 at 21:52 +0000, Thom Brown wrote: >>> > Also, if I try the same, but with a different name for the type, I get >>> > the same error. Why does that restriction exist? Can't you have >>> > types which happen to use the exact same subtype? >>> >>> > At first, that's how I designed it. Then, I realized that the type >>> > system needs to know the range type from the element type in order for >>> > something like ANYRANGE to work. >>> >>> That seems like a fairly bad restriction. In a datatype with multiple >>> useful sort orderings, it'd be desirable to be able to create a range >>> type for each such ordering, no? I'd be inclined to think of a range >>> type as being defined by element type plus a btree opfamily. Maybe it'd >>> be okay to insist on that combination as being unique. >> >> I couldn't find another way to make a function with a definition like: >> >> range(ANYELEMENT, ANYELEMENT) returns ANYRANGE >> >> work. And it seemed worse to live without a constructor like that. >> Ideas? >> >> Also, it's not based on the btree opfamily right now. It's just based on >> a user-supplied compare function. I think I could change it to store the >> opfamily instead, if you think that's a better idea. > > Probably ignorance here, but why does the following not work? > > postgres=# select '[18,20]'::numrange @> 19; > ERROR: operator does not exist: numrange @> integer > LINE 1: select '[18,20]'::numrange @> 19; > ^ > HINT: No operator matches the given name and argument type(s). You > might need to add explicit type casts. > > > I can see both the wiki page on range types and the pg_operator table > appear to indicate this should work: > > postgres=# select o.oprname, tl.typname as lefttype, tr.typname as > righttype from pg_operator o left join pg_type tl on o.oprleft = > tl.oid left join pg_type tr on o.oprright = tr.oid where 'anyrange' in > (tl.typname, tr.typname) and oprname = '@>'; > oprname | lefttype | righttype > ---------+----------+------------- > @> | anyrange | anynonarray > @> | anyrange | anyrange > (2 rows) As for docs, anyrange will need mentioning as part of the information about polymorphic types: http://developer.postgresql.org/pgdocs/postgres/extend-type-system.html And on the pseudo-types page: http://developer.postgresql.org/pgdocs/postgres/datatype-pseudo.html -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
[ trying a third time to send this message, apparently there were infrastructure problems before ] On Sun, 2011-01-30 at 02:55 +0000, Thom Brown wrote: > postgres=# select '[18,20]'::numrange @> 19; > ERROR: operator does not exist: numrange @> integer > LINE 1: select '[18,20]'::numrange @> 19; > ^ > HINT: No operator matches the given name and argument type(s). You > might need to add explicit type casts. It's because it doesn't know the type on the right side, and assumes it's an int4. select '[18,20]'::numrange @> 19.0; works. Regards,Jeff Davis
On 30 January 2011 21:33, Jeff Davis <pgsql@j-davis.com> wrote: > [ trying a third time to send this message, apparently there were > infrastructure problems before ] > > On Sun, 2011-01-30 at 02:55 +0000, Thom Brown wrote: >> postgres=# select '[18,20]'::numrange @> 19; >> ERROR: operator does not exist: numrange @> integer >> LINE 1: select '[18,20]'::numrange @> 19; >> ^ >> HINT: No operator matches the given name and argument type(s). You >> might need to add explicit type casts. > > It's because it doesn't know the type on the right side, and assumes > it's an int4. > > select '[18,20]'::numrange @> 19.0; > > works. My misapprehension stems from the assumption that the anyrange,anynonarray entry for the @> operator, and the contains(anyrange, anynonarray) function would resolve since numrange is a subset of anyrange and int4 is a subset of anynonarray. Obviously it shouldn't work as the underlying type of the range isn't an integer, but just trying to understand how the error message came about. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
Jeff Davis <pgsql@j-davis.com> writes: > On Sun, 2011-01-30 at 02:55 +0000, Thom Brown wrote: >> postgres=# select '[18,20]'::numrange @> 19; >> ERROR: operator does not exist: numrange @> integer >> LINE 1: select '[18,20]'::numrange @> 19; >> ^ >> HINT: No operator matches the given name and argument type(s). You >> might need to add explicit type casts. > It's because it doesn't know the type on the right side, and assumes > it's an int4. Well, yeah, it is an int4. The question ought to be phrased "why does the parser fail to promote the int4 to numeric?". There might be some excuse for an "operator is not unique" here, but I don't understand the above failure --- it should be able to use an implicit coercion from int4 to numeric. regards, tom lane
Another updated patch. Improvements: * Full GiST support - Thanks to Alexander Korotkov for sending me a new picksplit algorithm for my "temporal" project on pgfoundry. I modified it for use with range types, including a (hopefully) intelligent way of handling empty and unbounded ranges. * Quite a few tests added, some cleanup done Open items: * naming issues: - period -> tsrange ? - periodtz -> tstzrange ? - intrange -> int4range * add int8range * Documentation improvements - CREATE TYPE - ANYRANGE - Data Types section * Thom Brown and Tom Lane pointed out that the type inferencing should be able to promote int4 to numeric for queries like: select '[18,20]'::numrange @> 19; * Should the SQL function length(), which relies on polymorphic "-", be marked immutable, stable, or volatile? * representation or alignment issues * parser should be improved to handle spaces and quoting better * Should btree_gist be pulled into core to make it easier to use exclusion constraints with range types? * Typmod (optional) Regards, Jeff Davis
Attachment
On Sun, 2011-01-30 at 17:14 -0500, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > On Sun, 2011-01-30 at 02:55 +0000, Thom Brown wrote: > >> postgres=# select '[18,20]'::numrange @> 19; > >> ERROR: operator does not exist: numrange @> integer > >> LINE 1: select '[18,20]'::numrange @> 19; > >> ^ > >> HINT: No operator matches the given name and argument type(s). You > >> might need to add explicit type casts. > > > It's because it doesn't know the type on the right side, and assumes > > it's an int4. > > Well, yeah, it is an int4. The question ought to be phrased "why does > the parser fail to promote the int4 to numeric?". There might be some > excuse for an "operator is not unique" here, but I don't understand the > above failure --- it should be able to use an implicit coercion from > int4 to numeric. The problem exists for arrays, as well, so I think this is just a limitation of the type system. Regards,Jeff Davis postgres=# select ARRAY[1.4,1.5,1.6]::numeric[] || 5.0; ?column? -------------------{1.4,1.5,1.6,5.0} (1 row) postgres=# select ARRAY[1.4,1.5,1.6]::numeric[] || 5; ERROR: operator does not exist: numeric[] || integer LINE 1: select ARRAY[1.4,1.5,1.6]::numeric[] || 5; ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
On sön, 2011-01-30 at 14:52 -0800, Jeff Davis wrote: > * naming issues: > - period -> tsrange ? > - periodtz -> tstzrange ? > - intrange -> int4range Have you considered a grammar approach like for arrays, so that you would write something like CREATE TABLE ... ( foo RANGE OF int ); instead of explicitly creating a range type for every scalar type in existence? I think that that might be easier to use in the common case. I guess the trick might be how to store and pass the operator class and some other parameters.
On Mon, 2011-02-07 at 20:32 +0200, Peter Eisentraut wrote: > Have you considered a grammar approach like for arrays, so that you > would write something like > > CREATE TABLE ... ( > foo RANGE OF int > ); > > instead of explicitly creating a range type for every scalar type in > existence? I think that that might be easier to use in the common case. It would be nice, but the type system just isn't powerful enough to express things like that right now, as far as I can tell. That works for arrays because every type in PG has a second pg_type entry for the array type. I don't think we want to do something similar for range types -- especially if there are alternative range types for a given base type. Regards,Jeff Davis