Thread: Range Types
New patch. All known TODO items are closed, although I should do a cleanup pass over the code and docs. Fixed in this patch: * Many documentation improvements * Added INT8RANGE * Renamed PERIOD[TZ] -> TS[TZ]RANGE * Renamed INTRANGE -> INT4RANGE * Improved parser's handling of whitespace and quotes * Support for PL/pgSQL functions with ANYRANGE arguments/returns * Make "subtype_float" function no longer a requirement for GiST, but it should still be supplied for the penalty function to be useful. There are some open issues remaining, however: * Is typmod worth doing? I could complete it pretty quickly, but it involves introducing a new Node type, which seems a little messy for the benefit. * Should pg_range reference the btree opclass or the compare function directly? * Should subtype_cmp default to the default btree opclass's compare function? * Right now only superusers can define a range type. Should we open it up to normal users? * Should the SQL (inlinable) function "length", which relies on polymorphic "-", be immutable, strict, or volatile? * Later we might consider whether we should include btree_gist in core, to make range types more useful with exclusion constraints out-of-the-box. This should be left for later, I'm just including this for the archives so it doesn't get lost. Regards, Jeff Davis
Attachment
On Sun, February 6, 2011 07:41, Jeff Davis wrote: > New patch. All known TODO items are closed, although I should do a I've been testing and find the patch to be generally very stable. More review follows ASAP, but I wanted to mention this curious 'bug' already. (below all with latest git + patch 2011.02.05; but occurred also in 2011.01.30 version) I was trying where intrange @> integer which admittedly is not in the documentation, but does already half work, and would be really convenient to have. As it stands the construct seems to fail after ANALYZE, when there is more than 1 row: -- file t/cache_lookup_failure2.sql drop table if exists t; create table t (ir int4range); insert into t values (range(1,11)); select count(*) from t; select * from t where ir @> 5; --> OK analyze t; select * from t where ir @> 5; --> OK insert into t values (range(1,11)); select * from t where ir @> 5; analyze t; select * from t where ir @> 5; ------------------------ running that file gives me PGPORT=6563 PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.range_types/data PGDATABASE=testdb 2011.02.06 20:03:03 rijkers@denkraam:/var/data1/pg_stuff/pg_sql/pgsql.range_types [0] $ clear; psql -Xqa -d testdb -f t/cache_lookup_failure2.sql drop table if exists t; create table t (ir int4range); insert into t values (range(1,11)); select count(*) from t;count ------- 1 (1 row) select * from t where ir @> 5; --> OK ir -----------[ 1, 11 ) (1 row) analyze t; select * from t where ir @> 5; --> OK ir -----------[ 1, 11 ) (1 row) insert into t values (range(1,11)); select * from t where ir @> 5; ir -----------[ 1, 11 )[ 1, 11 ) (2 rows) analyze t; select * from t where ir @> 5; psql:t/cache_lookup_failure2.sql:11: ERROR: cache lookup failed for function 0 (of course I realize that ... WHERE ir @> range(5); "fixes" the problem, but it would be really nice to have it work on integers) Thanks, Erik Rijkers
Hi, Jeff Davis <pgsql@j-davis.com> writes: > * Should pg_range reference the btree opclass or the compare function > directly? I would say yes. We use the btree opclass in other similar situations. > * Should subtype_cmp default to the default btree opclass's compare > function? My vote is yes too. > * Right now only superusers can define a range type. Should we open it > up to normal users? Is there any reason to restrict who's get to use the feature? I don't see any… > * Should the SQL (inlinable) function "length", which relies on > polymorphic "-", be immutable, strict, or volatile? I would think stable: polymorphic means that the function implementing the "-" operator depends on the argument. I don't recall that it depends on them in a volatile way… except if you change the operator definition, which is possible to do (so not immutable). > * Later we might consider whether we should include btree_gist in > core, to make range types more useful with exclusion constraints > out-of-the-box. This should be left for later, I'm just including this > for the archives so it doesn't get lost. I would expect the extension to have something to offer here. ~=# create extension btree_gist with schema pg_catalog; CREATE EXTENSION Now you can act as if it were part of core. Including not being able to ALTER EXTENSION SET SCHEMA (cannot remove dependency on schema pg_catalog because it is a system object), but maybe the recent changes that Tom done on the usage of DEPENDENCY_INTERNAL in the extension patch will open that again. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, 2011-02-06 at 20:10 +0100, Erik Rijkers wrote: > I was trying > where intrange @> integer > > which admittedly is not in the documentation, > but does already half work, and would be really > convenient to have. As it stands the construct > seems to fail after ANALYZE, when there is more > than 1 row: Thank you for the report! I actually did make some mention of that in the documentation, albeit brief (in the operators table, using timestamp as an example). I have a fix for it. There may still be an issue with the constructors like range(1,2), so I'll look into it a little more, but an updated patch should come soon. Regards,Jeff Davis
On Mon, 2011-02-07 at 13:33 +0100, Dimitri Fontaine wrote: > Hi, > > Jeff Davis <pgsql@j-davis.com> writes: > > * Should pg_range reference the btree opclass or the compare function > > directly? > > I would say yes. We use the btree opclass in other similar situations. Ok, but what should the parameter to CREATE TYPE ... AS RANGE be then? CREATE TYPE foo AS RANGE ( SUBTYPE = ... SUBTYPE_BTREE_OPERATOR_CLASS = ... ); is a little verbose. Ideas? > Is there any reason to restrict who's get to use the feature? I don't > see any… Mostly just paranoia. If they define a strange canonical function, maybe that would cause a problem. Then again, they would have to define that in C to cause a problem anyway. I'll leave it as superuser-only for now, and see if anyone else raises potential problems. > > * Should the SQL (inlinable) function "length", which relies on > > polymorphic "-", be immutable, strict, or volatile? > > I would think stable: polymorphic means that the function > implementing the "-" operator depends on the argument. I don't recall > that it depends on them in a volatile way… except if you change the > operator definition, which is possible to do (so not immutable). I was concerned about someone defining "-" with a stable or volatile function as the definition. I'm not sure if that is a reasonable concern or not. > > * Later we might consider whether we should include btree_gist in > > core, to make range types more useful with exclusion constraints > > out-of-the-box. This should be left for later, I'm just including this > > for the archives so it doesn't get lost. > > I would expect the extension to have something to offer here. Yes. With extensions and PGXN, I would hope that installing btree_gist would not be much of a problem. However, I eventually intend to submit features like "RANGE KEY", a language extension that would need something like btree_gist to work very well at all. Technically btree_gist is not required, but in practice it is necessary to use ranges and exclusion constraints together effectively. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Ok, but what should the parameter to CREATE TYPE ... AS RANGE be then? > > CREATE TYPE foo AS RANGE ( > SUBTYPE = ... > SUBTYPE_BTREE_OPERATOR_CLASS = ... > ); > > is a little verbose. Ideas? I would think CREATE TYPE foo AS RANGE (bar) USING (btree_ops); The USING clause is optional, because you generally have a default btree opclass for the datatype. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Mon, 2011-02-07 at 18:23 +0100, Dimitri Fontaine wrote: > I would think > > CREATE TYPE foo AS RANGE (bar) USING (btree_ops); > > The USING clause is optional, because you generally have a default btree > opclass for the datatype. There are other options, like "CANONICAL", so where do those fit? If CREATE TYPE already has an options list, it seems a little strange to add grammar to support this feature. "USING" doesn't seem to mean a lot, except that we happen to use it in other contexts to mean "operator class". Regards,Jeff Davis
On Sun, February 6, 2011 07:41, Jeff Davis wrote: > New patch. All known TODO items are closed, although I should do a > cleanup pass over the code and docs. > > Fixed in this patch: > > * Many documentation improvements > * Added INT8RANGE > * Renamed PERIOD[TZ] -> TS[TZ]RANGE > * Renamed INTRANGE -> INT4RANGE > * Improved parser's handling of whitespace and quotes > * Support for PL/pgSQL functions with ANYRANGE arguments/returns > * Make "subtype_float" function no longer a requirement for GiST, > but it should still be supplied for the penalty function to be > useful. > I'm afraid even the review is WIP, but I thought I'd post what I have. Context: At the moment we use postbio (see below) range functionality, to search ranges and overlap in large DNA databases ('genomics'). We would be happy if a core data type could replace that. It does look like the present patch is ready to do those same tasks, of which the main one for us is gist-indexed ranges. We also use btree_gist with that, so to include that in core would make sense in this regard. test config: ./ configure \ --prefix=/var/data1/pg_stuff/pg_installations/pgsql.range_types \ --with-pgport=6563 \ --enable-depend \ --enable-cassert\ --enable-debug \ --with-perl \ --with-openssl \ --with-libxml \ --enable-dtrace compile, make, check, install all OK. ------------------ Submission review: ------------------ * Is the patch in context diff format? Yes. * Does it apply cleanly to the current git master? It applied cleanly. (after the large serialisation commit 2011.02.08 it will need some changes/rebase) * Does it include reasonable tests, necessary doc patches, etc? Yes, there are many tests; the documentation is good. Small improvements below. ----------------- Usability review ----------------- Read what the patch is supposed to do, and consider: * Does the patch actually implement that? Yes. * Do we want that? Yes. * Do we already have it? contrib/seg has some similar functionalities: "seg is a data type for representing line segments, or floating point intervals". And on pgFoundry there is a seg spin-off "postbio", tailored to genomic data (with gist-indexing). (see postbio manual: http://postbio.projects.postgresql.org/ ) * Does it follow SQL spec, or the community-agreed behavior? I don't know - I couldn't find much in the SQL-spec on a range datatype. The ranges behaviour has been discussed on -hackers. * Does it include pg_dump support (if applicable)? dump/restore were fine in the handful of range-tables which I moved between machines. * Are there dangers? Not that I could find. * Have all the bases been covered? I think the functionality looks fairly complete. ------------- Feature test: ------------- * Does the feature work as advertised? The patch seems very stable. My focus has been mainly on the intranges. I tested by parsing documentation- and regression examples, and parametrising them in a perl harness, to generate many thousands of range combinations. I found only a single small problem (posted earlier - Jeff Davis solved it already apparently). see: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00387.php * Are there corner cases the author has failed to consider? No. * Are there any assertion failures or crashes? I haven't seen a single one. -------------- Documentation: -------------- Section 9.18: table 9-42. range functions: The following functions are missing (I encountered them in the regression tests): contained_by() range_eq() section 'Constructing Ranges' (8.16.6): In the code example, remove the following line: "-- the int4range result will appearin the canonical format" it doesn't make sense there. At this place "canonical format" has not been discussed; maybe it is not even discussed anywhere. also (same place): 'where "_" is used to mean "exclusive" and "" is used to mean "inclusive".' should be: 'where "_" is used to mean "exclusive" and "i" is used to mean "inclusive".' And btw: it should mention here that the range*inf* functions, an underscore to separate 'range' from the rest of the function name, e.g.: range_linfi_() => infinite lower bound, inclusiveupper bound I still want to do Performance review and Coding review. FWIW, I would like to repeat that my impression is that the patch is very stable, especially with regard to the intranges (tested extensively). regards, Erik Rijkers
On Tue, 2011-02-08 at 09:10 -0800, Jeff Davis wrote: > On Mon, 2011-02-07 at 18:23 +0100, Dimitri Fontaine wrote: > > I would think > > > > CREATE TYPE foo AS RANGE (bar) USING (btree_ops); > > > > The USING clause is optional, because you generally have a default btree > > opclass for the datatype. > > There are other options, like "CANONICAL", so where do those fit? > > If CREATE TYPE already has an options list, it seems a little strange to > add grammar to support this feature. "USING" doesn't seem to mean a lot, > except that we happen to use it in other contexts to mean "operator > class". For the user-facing part, how about just passing it as a parameter called "SUBTYPE_OPCLASS"? It sounds a little on the "internal detail" side, but so do some other type definition parameters. As for the catalog, I'm inclined to leave the compare function in there directly and just add a dependency on the opclass. That way, it's only one syscache lookup rather than two, to get the compare function oid. Then again, perhaps that doesn't matter anyway. Thoughts? Regards,Jeff Davis
On Tue, 2011-02-08 at 20:43 +0100, Erik Rijkers wrote: > -------------- > Documentation: > -------------- > > Section 9.18: > table 9-42. range functions: > The following functions are missing (I encountered them in the regression tests): > contained_by() > range_eq() I opted not to document functions that mostly exist to implement operators. Some people might prefer the names, but then they don't benefit from GiST index searches. So, I left this as-is for now. > section 'Constructing Ranges' (8.16.6): > In the code example, remove the following line: > "-- the int4range result will appear in the canonical format" > it doesn't make sense there. At this place "canonical format" has not been discussed; > maybe it is not even discussed anywhere. Thank you. I have added a forward reference. > also (same place): > 'where "_" is used to mean "exclusive" and "" is used to mean "inclusive".' > should be: > 'where "_" is used to mean "exclusive" and "i" is used to mean "inclusive".' Thank you, fixed. > And btw: it should mention here that the range*inf* functions, > an underscore to separate 'range' from the rest of the function name, e.g.: > range_linfi_() => infinite lower bound, inclusive upper bound I tried to clarify that section. Thank you for the review! Updated patch forthcoming. Regards,Jeff Davis
Updated patch. Changes: * Addressed Erik's review comments. * Fixed issue with "range @> elem" found by Erik. * Merged with latest HEAD * Changed representation to be more efficient and more robust (could use some testing though, because I just did this tonight) TODO: * send/recv -- just noticed this tonight, no reason not to do it Open Items: * Maybe typmod * grammar -- ask for btree opclass, or compare function? * catalog -- store btree opclass, or compare function? * should non-superusers be able to create range types? * constructor issues I just posted about * SQL length function --immutable/stable/volatile? As always, my repo is here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Regards, Jeff Davis
Attachment
On Wed, February 9, 2011 09:35, Jeff Davis wrote: > Updated patch. > Thanks! I just wanted to mention that this latest patch doesn't quite apply as-is, because of catversion changes. I've removed the change to catversion.h (18 lines, starting at 4985) from the patch file; then it applies cleanly. Erik Rijkers > > As always, my repo is here: > > http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes > > Regards, > Jeff Davis > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
"Erik Rijkers" <er@xs4all.nl> writes: > On Wed, February 9, 2011 09:35, Jeff Davis wrote: >> Updated patch. > I just wanted to mention that this latest patch doesn't quite apply as-is, because of catversion changes. Just a note: standard practice is for submitted patches to *not* touch catversion.h. The committer will add that change before committing. Otherwise, it's just guaranteed to cause merge problems such as this one. (It's not unreasonable to mention the need for a catversion bump in the description of the patch, if you think the committer might not realize it.) regards, tom lane
On Thu, 2011-02-10 at 12:04 -0500, Tom Lane wrote: > "Erik Rijkers" <er@xs4all.nl> writes: > > On Wed, February 9, 2011 09:35, Jeff Davis wrote: > >> Updated patch. > > > I just wanted to mention that this latest patch doesn't quite apply as-is, because of catversion changes. > > Just a note: standard practice is for submitted patches to *not* touch > catversion.h. The committer will add that change before committing. > Otherwise, it's just guaranteed to cause merge problems such as this > one. (It's not unreasonable to mention the need for a catversion bump > in the description of the patch, if you think the committer might not > realize it.) OK, I'll remove that then. I originally put it there so that I wouldn't mix up data directories with a patch I'm reviewing, but I agree that it seems easier this way. Regards,Jeff Davis
On tor, 2011-02-10 at 09:28 -0800, Jeff Davis wrote: > I originally put it there so that I wouldn't mix up data directories > with a patch I'm reviewing, but I agree that it seems easier this way. FWIW, I disagree with Tom and do recommend putting the catversion change in the patch.
On Thu, 2011-02-10 at 15:38 +0100, Erik Rijkers wrote: > I've removed the change to catversion.h (18 lines, starting at 4985) from the patch file; then it > applies cleanly. I should mention that the last patch changed the representation to be more compact. So, if you have any existing test data it will need to be reloaded to work with the latest. Regards,Jeff Davis
On 10.02.2011 20:01, Peter Eisentraut wrote: > On tor, 2011-02-10 at 09:28 -0800, Jeff Davis wrote: >> I originally put it there so that I wouldn't mix up data directories >> with a patch I'm reviewing, but I agree that it seems easier this way. > > FWIW, I disagree with Tom and do recommend putting the catversion change > in the patch. I'm very bad at remembering to bump it, so I also won't mind patch authors doing it. The ideal reminder would be some special comment you could put on the catversion line that would cause "git push" to fail if it's still there when I try to push the commit to the repository. There doesn't seem to be a "pre-push" hook in git, although some googling suggests that it would be quite easy to write a small wrapper shell script to check that. I'm seriously considering to do that, given that I more often forget to bump catversion than not. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Feb 10, 2011 at 1:23 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 10.02.2011 20:01, Peter Eisentraut wrote: >> >> On tor, 2011-02-10 at 09:28 -0800, Jeff Davis wrote: >>> >>> I originally put it there so that I wouldn't mix up data directories >>> with a patch I'm reviewing, but I agree that it seems easier this way. >> >> FWIW, I disagree with Tom and do recommend putting the catversion change >> in the patch. > > I'm very bad at remembering to bump it, so I also won't mind patch authors > doing it. > > The ideal reminder would be some special comment you could put on the > catversion line that would cause "git push" to fail if it's still there when > I try to push the commit to the repository. There doesn't seem to be a > "pre-push" hook in git, although some googling suggests that it would be > quite easy to write a small wrapper shell script to check that. I'm > seriously considering to do that, given that I more often forget to bump > catversion than not. And I share Tom's preference, which is to not include it, because I usually apply patches using patch, and when diff hunks fail it's a nuisance for me. So basically, do whatever you want, someone won't like it no matter what. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, February 9, 2011 09:35, Jeff Davis wrote: > Updated patch. > The operators << >> and -|- have the following behavior with empty ranges: testdb=# select '-'::int4range << range(200,300); ERROR: empty range testdb=# select '-'::int4range >> range(200,300); ERROR: empty range testdb=# select '-'::int4range -|- range(200,300); ERROR: empty range I'm not sure if that is deliberate behavior, but they seem almost bugs to me. Wouldn't it be better (and more practical) if these would return false (or perhaps NULL, for 'unknown') ? (the same goes for all the other range types, btw.) Erik Rijkers
On Fri, 2011-02-11 at 15:09 +0100, Erik Rijkers wrote: > On Wed, February 9, 2011 09:35, Jeff Davis wrote: > > Updated patch. > > > > The operators << >> and -|- have the following behavior with empty ranges: > > testdb=# select '-'::int4range << range(200,300); > ERROR: empty range > testdb=# select '-'::int4range >> range(200,300); > ERROR: empty range > testdb=# select '-'::int4range -|- range(200,300); > ERROR: empty range > > I'm not sure if that is deliberate behavior, but they seem > almost bugs to me. It's deliberate, but it looks like the error messages could use some improvement. > Wouldn't it be better (and more practical) if these would > return false (or perhaps NULL, for 'unknown') ? I'm hesitant to return NULL when the inputs are known. If we were to define these functions for empty ranges, I would think they would all return true. "<<" and ">>" ("strictly left of" and "strictly right of", respectively) could be seen to start out as true and return false if it finds a point overlapping or on the other side. The primary use case for "-|-" (adjacent) is to see if your ranges are contiguous and non-overlapping. For empty ranges, that seems to be true. I'm not disagreeing with your interpretation really. I think that different people will assume different behavior, and so it's more likely to cause confusion. An error early on will allow them to do something like: CASE WHEN myrange? THEN myrange -|- range(10,20) ELSE TRUE END So that they (and anyone who reads their query) can see explicitly what's happening, without looking in the manual for details. I'm open to suggestion, however. If we can get a reasonable consensus on the values these functions should return, I'll change it. Regards,Jeff Davis
On Fri, Feb 11, 2011 at 11:55 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, 2011-02-11 at 15:09 +0100, Erik Rijkers wrote: >> On Wed, February 9, 2011 09:35, Jeff Davis wrote: >> > Updated patch. >> > >> >> The operators << >> and -|- have the following behavior with empty ranges: >> >> testdb=# select '-'::int4range << range(200,300); >> ERROR: empty range >> testdb=# select '-'::int4range >> range(200,300); >> ERROR: empty range >> testdb=# select '-'::int4range -|- range(200,300); >> ERROR: empty range >> >> I'm not sure if that is deliberate behavior, but they seem >> almost bugs to me. > > It's deliberate, but it looks like the error messages could use some > improvement. > >> Wouldn't it be better (and more practical) if these would >> return false (or perhaps NULL, for 'unknown') ? > > I'm hesitant to return NULL when the inputs are known. > > If we were to define these functions for empty ranges, I would think > they would all return true. > > "<<" and ">>" ("strictly left of" and "strictly right of", respectively) > could be seen to start out as true and return false if it finds a point > overlapping or on the other side. > > The primary use case for "-|-" (adjacent) is to see if your ranges are > contiguous and non-overlapping. For empty ranges, that seems to be true. > > I'm not disagreeing with your interpretation really. I think that > different people will assume different behavior, and so it's more likely > to cause confusion. An error early on will allow them to do something > like: > CASE WHEN myrange? THEN myrange -|- range(10,20) ELSE TRUE END > So that they (and anyone who reads their query) can see explicitly > what's happening, without looking in the manual for details. > > I'm open to suggestion, however. If we can get a reasonable consensus on > the values these functions should return, I'll change it. For what it's worth, my completely uninformed opinion is that comparison operators shouldn't error out. I haven't read the patch so I'm not sure what those operators are defined to do, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 2011-02-11 at 12:03 -0500, Robert Haas wrote: > For what it's worth, my completely uninformed opinion is that > comparison operators shouldn't error out. I haven't read the patch so > I'm not sure what those operators are defined to do, though. ">>" means "strictly right of" "<<" means "strictly left of" "-|-" means "adjacent" (touching but not overlapping) I'm open to suggestion about how those behave with empty ranges. Regards,Jeff Davis
On Fri, Feb 11, 2011 at 12:26 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, 2011-02-11 at 12:03 -0500, Robert Haas wrote: >> For what it's worth, my completely uninformed opinion is that >> comparison operators shouldn't error out. I haven't read the patch so >> I'm not sure what those operators are defined to do, though. > > ">>" means "strictly right of" > "<<" means "strictly left of" > "-|-" means "adjacent" (touching but not overlapping) > > I'm open to suggestion about how those behave with empty ranges. Hmm, so an empty range is a range that includes nothing at all, right?Not "everything in the world"? Are we sure we even want to have that concept? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Jeff Davis <pgsql@j-davis.com> wrote: > ">>" means "strictly right of" > "<<" means "strictly left of" > "-|-" means "adjacent" (touching but not overlapping) > > I'm open to suggestion about how those behave with empty ranges. OK, that still leaves a lot to the imagination, though. To try to clarify in *my* mind: "empty range" ============= Zero length? If so, is it fixed at some point, but empty? '(x,x)'? '[x,x)'? Is it everything? '[-inf,+inf]'? Is it really meaningfully distinct from NULL? Where do you see it being useful? -Kevin
On 02/11/2011 12:36 PM, Robert Haas wrote: > On Fri, Feb 11, 2011 at 12:26 PM, Jeff Davis<pgsql@j-davis.com> wrote: >> On Fri, 2011-02-11 at 12:03 -0500, Robert Haas wrote: >>> For what it's worth, my completely uninformed opinion is that >>> comparison operators shouldn't error out. I haven't read the patch so >>> I'm not sure what those operators are defined to do, though. >> ">>" means "strictly right of" >> "<<" means "strictly left of" >> "-|-" means "adjacent" (touching but not overlapping) >> >> I'm open to suggestion about how those behave with empty ranges. > Hmm, so an empty range is a range that includes nothing at all, right? > Not "everything in the world"? > > Are we sure we even want to have that concept? I have no particular opinion on that, but if we do then ISTM all the above (and particularly the last) should return false if either operand is an empty range. cheers andrew
> "empty range" > ============= > Zero length? > If so, is it fixed at some point, but empty? > '(x,x)'? > '[x,x)'? Neither of the above should be possible, I think. The expression "(x" logically excludes the expression "x)". However, "[x,x]" would be valid, and would be a zero-length interval at the point "x". > Is it everything? > '[-inf,+inf]'? No, that's "everything" which is a different concept. The above also ought to be possible, and overlap everything. > Is it really meaningfully distinct from NULL? Yes. NULL means "I don't know". If a range type IS NULL, then any operation performed with it ought to be NULL. Hence: IF y > x, THEN: [x,x] << [y,z) == TRUE [x,x] -|- (x,y] == TRUE NULL << [y,z} IS NULL [-inf,+inf] << [y,z) == FALSE I can imagine using all of these constructs in actual applications. In fact, I have *already* used [-inf,+inf] -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Fri, February 11, 2011 19:02, Josh Berkus wrote: > >> "empty range" >> ============= >> Zero length? >> If so, is it fixed at some point, but empty? >> '(x,x)'? >> '[x,x)'? > > Neither of the above should be possible, I think. The expression "(x" > logically excludes the expression "x)". > > However, "[x,x]" would be valid, and would be a zero-length interval at > the point "x". > >> Is it everything? >> '[-inf,+inf]'? > > No, that's "everything" which is a different concept. The above also > ought to be possible, and overlap everything. > >> Is it really meaningfully distinct from NULL? > > Yes. NULL means "I don't know". If a range type IS NULL, then any > operation performed with it ought to be NULL. Hence: > > IF y > x, THEN: > > [x,x] << [y,z) == TRUE > [x,x] -|- (x,y] == TRUE > NULL << [y,z} IS NULL > [-inf,+inf] << [y,z) == FALSE > > I can imagine using all of these constructs in actual applications. In > fact, I have *already* used [-inf,+inf] > You say yes, but you don't mention "empty range". Maybe we can indeed do without the concept of an empty-range?; it might simplify implementation as well as usage. I'm not decided, but I do find it hard to find a plausible use-case for it.
Josh Berkus <josh@agliodbs.com> wrote: > I can imagine using all of these constructs in actual > applications. But which of them, if any, is an "empty range"? -Kevin
Where are we on this? --------------------------------------------------------------------------- Erik Rijkers wrote: > On Wed, February 9, 2011 09:35, Jeff Davis wrote: > > Updated patch. > > > > The operators << >> and -|- have the following behavior with empty ranges: > > testdb=# select '-'::int4range << range(200,300); > ERROR: empty range > testdb=# select '-'::int4range >> range(200,300); > ERROR: empty range > testdb=# select '-'::int4range -|- range(200,300); > ERROR: empty range > > I'm not sure if that is deliberate behavior, but they seem > almost bugs to me. > > Wouldn't it be better (and more practical) if these would > return false (or perhaps NULL, for 'unknown') ? > > (the same goes for all the other range types, btw.) > > > Erik Rijkers > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Fri, 2011-03-11 at 08:37 -0500, Bruce Momjian wrote: > Where are we on this? The options are: 1. Rip out empty ranges. Several people have been skeptical of their usefulness, but I don't recall anyone directly saying that they should be removed. Robert Haas made the point that range types aren't closed under UNION: http://archives.postgresql.org/pgsql-hackers/2011-02/msg01045.php So the additional nice mathematical properties provided by empty ranges are not as important (because it wouldn't be perfect anyway). 2. Change the semantics. Erik Rijkers suggested that we define all operators for empty ranges, perhaps using NULL semantics: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00942.php And Kevin Grittner suggested that there could be discrete ranges of zero length yet a defined starting point: http://archives.postgresql.org/pgsql-hackers/2011-02/msg01042.php 3. Leave empty ranges with the existing "empty set" semantics. Nathan Boley made a good point here: http://archives.postgresql.org/pgsql-hackers/2011-02/msg01108.php Right now it's #3, and I lean pretty strongly toward keeping it. Without #3, people will get confused when fairly simple operations fail in a data-dependent way (at runtime). With #3, people will run into problems only in situations where it is fairly dubious to have an empty range anyway (and therefore likely a real error), such as finding ranges "left of" an empty range. Otherwise, I'd prefer #1 to #2. I think #2 is a bad path to take, and we'll end up with a lot of unintuitive and error-prone operators. Regards,Jeff Davis
On Fri, Mar 11, 2011 at 12:37 PM, Jeff Davis <pgsql@j-davis.com> wrote: > Right now it's #3, and I lean pretty strongly toward keeping it. Without > #3, people will get confused when fairly simple operations fail in a > data-dependent way (at runtime). With #3, people will run into problems > only in situations where it is fairly dubious to have an empty range > anyway (and therefore likely a real error), such as finding ranges "left > of" an empty range. That seems pretty apropos to me. > Otherwise, I'd prefer #1 to #2. I think #2 is a bad path to take, and > we'll end up with a lot of unintuitive and error-prone operators. I think back to your essay on the nonintuitiveness of NULL (<http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/>), and suggest the thought that picking #2 would add to the already existent confusion. -- http://linuxfinances.info/info/linuxdistributions.html