Thread: Range Types

Range Types

From

Jeff Davis

Date:

06 February 2011, 02:49:38

New patch. All known TODO items are closed, although I should do a
cleanup pass over the code and docs.

Fixed in this patch:

  * Many documentation improvements
  * Added INT8RANGE
  * Renamed PERIOD[TZ] -> TS[TZ]RANGE
  * Renamed INTRANGE -> INT4RANGE
  * Improved parser's handling of whitespace and quotes
  * Support for PL/pgSQL functions with ANYRANGE arguments/returns
  * Make "subtype_float" function no longer a requirement for GiST,
    but it should still be supplied for the penalty function to be
    useful.

There are some open issues remaining, however:

  * Is typmod worth doing? I could complete it pretty quickly, but it
involves introducing a new Node type, which seems a little messy for the
benefit.

  * Should pg_range reference the btree opclass or the compare function
directly?

  * Should subtype_cmp default to the default btree opclass's compare
function?

  * Right now only superusers can define a range type. Should we open it
up to normal users?

  * Should the SQL (inlinable) function "length", which relies on
polymorphic "-", be immutable, strict, or volatile?

  * Later we might consider whether we should include btree_gist in
core, to make range types more useful with exclusion constraints
out-of-the-box. This should be left for later, I'm just including this
for the archives so it doesn't get lost.

Regards,
    Jeff Davis

Attachment

rangetypes-20110205.patch.gz

Re: Range Types - cache lookup failed for function

From

"Erik Rijkers"

Date:

06 February 2011, 15:20:40

On Sun, February 6, 2011 07:41, Jeff Davis wrote:
> New patch. All known TODO items are closed, although I should do a



I've been testing and find the patch to be
generally very stable.

More review follows ASAP, but I wanted to mention
this curious 'bug' already.

(below all with latest git + patch 2011.02.05; but
occurred also in 2011.01.30 version)

I was trying   where intrange @> integer

which admittedly is not in the documentation,
but does already half work, and would be really
convenient to have.  As it stands the construct
seems to fail after ANALYZE, when there is more
than 1 row:

-- file t/cache_lookup_failure2.sql
drop table if exists t;
create table t (ir int4range);
insert into t values (range(1,11));
select count(*) from t;
select * from t where ir @> 5; --> OK
analyze t;
select * from t where ir @> 5; --> OK
insert into t values (range(1,11));
select * from t where ir @> 5;
analyze t;
select * from t where ir @> 5;
------------------------
running that file gives me

PGPORT=6563
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.range_types/data
PGDATABASE=testdb

2011.02.06 20:03:03 rijkers@denkraam:/var/data1/pg_stuff/pg_sql/pgsql.range_types [0]
$ clear; psql -Xqa -d testdb -f t/cache_lookup_failure2.sql
drop table if exists t;
create table t (ir int4range);
insert into t values (range(1,11));
select count(*) from t;count
-------    1
(1 row)

select * from t where ir @> 5; --> OK   ir
-----------[ 1, 11 )
(1 row)

analyze t;
select * from t where ir @> 5; --> OK   ir
-----------[ 1, 11 )
(1 row)

insert into t values (range(1,11));
select * from t where ir @> 5;   ir
-----------[ 1, 11 )[ 1, 11 )
(2 rows)

analyze t;
select * from t where ir @> 5;
psql:t/cache_lookup_failure2.sql:11: ERROR:  cache lookup failed for function 0


(of course I realize that  ... WHERE ir @> range(5);
"fixes" the problem, but it would be
really nice to have it work on integers)

Thanks,

Erik Rijkers

Re: Range Types

From

Dimitri Fontaine

Date:

07 February 2011, 08:33:34

Hi,

Jeff Davis <pgsql@j-davis.com> writes:
>   * Should pg_range reference the btree opclass or the compare function
> directly?

I would say yes.  We use the btree opclass in other similar situations.

>   * Should subtype_cmp default to the default btree opclass's compare
> function?

My vote is yes too.

>   * Right now only superusers can define a range type. Should we open it
> up to normal users?

Is there any reason to restrict who's get to use the feature?  I don't
see any…

>   * Should the SQL (inlinable) function "length", which relies on
> polymorphic "-", be immutable, strict, or volatile?

I would think stable: polymorphic means that the function
implementing the "-" operator depends on the argument.  I don't recall
that it depends on them in a volatile way… except if you change the
operator definition, which is possible to do (so not immutable).

>   * Later we might consider whether we should include btree_gist in
> core, to make range types more useful with exclusion constraints
> out-of-the-box. This should be left for later, I'm just including this
> for the archives so it doesn't get lost.

I would expect the extension to have something to offer here.
 ~=# create extension btree_gist with schema pg_catalog; CREATE EXTENSION

Now you can act as if it were part of core.  Including not being able to
ALTER EXTENSION SET SCHEMA (cannot remove dependency on schema
pg_catalog because it is a system object), but maybe the recent changes
that Tom done on the usage of DEPENDENCY_INTERNAL in the extension patch
will open that again.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Range Types - cache lookup failed for function

From

Jeff Davis

Date:

07 February 2011, 12:53:06

On Sun, 2011-02-06 at 20:10 +0100, Erik Rijkers wrote:
> I was trying
>     where intrange @> integer
> 
> which admittedly is not in the documentation,
> but does already half work, and would be really
> convenient to have.  As it stands the construct
> seems to fail after ANALYZE, when there is more
> than 1 row:

Thank you for the report! I actually did make some mention of that in
the documentation, albeit brief (in the operators table, using timestamp
as an example).

I have a fix for it. There may still be an issue with the constructors
like range(1,2), so I'll look into it a little more, but an updated
patch should come soon.

Regards,Jeff Davis

Re: Range Types

From

Jeff Davis

Date:

07 February 2011, 13:02:29

On Mon, 2011-02-07 at 13:33 +0100, Dimitri Fontaine wrote:
> Hi,
> 
> Jeff Davis <pgsql@j-davis.com> writes:
> >   * Should pg_range reference the btree opclass or the compare function
> > directly?
> 
> I would say yes.  We use the btree opclass in other similar situations.

Ok, but what should the parameter to CREATE TYPE ... AS RANGE be then?

CREATE TYPE foo AS RANGE ( SUBTYPE = ... SUBTYPE_BTREE_OPERATOR_CLASS = ...
);

is a little verbose. Ideas?

> Is there any reason to restrict who's get to use the feature?  I don't
> see any…

Mostly just paranoia. If they define a strange canonical function, maybe
that would cause a problem. Then again, they would have to define that
in C to cause a problem anyway. I'll leave it as superuser-only for now,
and see if anyone else raises potential problems.

> >   * Should the SQL (inlinable) function "length", which relies on
> > polymorphic "-", be immutable, strict, or volatile?
> 
> I would think stable: polymorphic means that the function
> implementing the "-" operator depends on the argument.  I don't recall
> that it depends on them in a volatile way… except if you change the
> operator definition, which is possible to do (so not immutable).

I was concerned about someone defining "-" with a stable or volatile
function as the definition. I'm not sure if that is a reasonable concern
or not.

> >   * Later we might consider whether we should include btree_gist in
> > core, to make range types more useful with exclusion constraints
> > out-of-the-box. This should be left for later, I'm just including this
> > for the archives so it doesn't get lost.
> 
> I would expect the extension to have something to offer here.

Yes. With extensions and PGXN, I would hope that installing btree_gist
would not be much of a problem. However, I eventually intend to submit
features like "RANGE KEY", a language extension that would need
something like btree_gist to work very well at all. Technically
btree_gist is not required, but in practice it is necessary to use
ranges and exclusion constraints together effectively.

Regards,Jeff Davis

Re: Range Types

From

Dimitri Fontaine

Date:

07 February 2011, 13:23:46

Jeff Davis <pgsql@j-davis.com> writes:
> Ok, but what should the parameter to CREATE TYPE ... AS RANGE be then?
>
> CREATE TYPE foo AS RANGE (
>   SUBTYPE = ...
>   SUBTYPE_BTREE_OPERATOR_CLASS = ...
> );
>
> is a little verbose. Ideas?

I would think
 CREATE TYPE foo AS RANGE (bar) USING (btree_ops);

The USING clause is optional, because you generally have a default btree
opclass for the datatype.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Range Types

From

Jeff Davis

Date:

08 February 2011, 13:10:46

On Mon, 2011-02-07 at 18:23 +0100, Dimitri Fontaine wrote:
> I would think
> 
>   CREATE TYPE foo AS RANGE (bar) USING (btree_ops);
> 
> The USING clause is optional, because you generally have a default btree
> opclass for the datatype.

There are other options, like "CANONICAL", so where do those fit?

If CREATE TYPE already has an options list, it seems a little strange to
add grammar to support this feature. "USING" doesn't seem to mean a lot,
except that we happen to use it in other contexts to mean "operator
class".

Regards,Jeff Davis

Re: REVIEW Range Types

From

"Erik Rijkers"

Date:

08 February 2011, 15:43:28

On Sun, February 6, 2011 07:41, Jeff Davis wrote:
> New patch. All known TODO items are closed, although I should do a
> cleanup pass over the code and docs.
>
> Fixed in this patch:
>
>   * Many documentation improvements
>   * Added INT8RANGE
>   * Renamed PERIOD[TZ] -> TS[TZ]RANGE
>   * Renamed INTRANGE -> INT4RANGE
>   * Improved parser's handling of whitespace and quotes
>   * Support for PL/pgSQL functions with ANYRANGE arguments/returns
>   * Make "subtype_float" function no longer a requirement for GiST,
>     but it should still be supplied for the penalty function to be
>     useful.
>

I'm afraid even the review is WIP, but I thought I'd post what I have.

Context: At the moment we use postbio (see below) range functionality, to search ranges and
overlap in large DNA databases ('genomics').  We would be happy if a core data type could replace
that.  It does look like the present patch is ready to do those same tasks, of which the main one
for us is gist-indexed ranges. We also use btree_gist with that, so to include that in core would
make sense in this regard.

test config:

./ configure \ --prefix=/var/data1/pg_stuff/pg_installations/pgsql.range_types \ --with-pgport=6563 \ --enable-depend \
--enable-cassert\ --enable-debug \ --with-perl \ --with-openssl \ --with-libxml \ --enable-dtrace

compile, make, check, install all OK.

------------------
Submission review:
------------------

* Is the patch in context diff format?

Yes.

* Does it apply cleanly to the current git master?

It applied cleanly. (after the large serialisation commit 2011.02.08 it will need some
changes/rebase)

* Does it include reasonable tests, necessary doc patches, etc?

Yes, there are many tests; the documentation is good. Small improvements below.

-----------------
Usability review
-----------------

Read what the patch is supposed to do, and consider:

* Does the patch actually implement that?

Yes.

* Do we want that?

Yes.

* Do we already have it?

contrib/seg has some similar functionalities: "seg is a data type for representing line segments,
or floating point intervals".

And on pgFoundry there is a seg spin-off  "postbio", tailored to genomic data (with gist-indexing).
(see postbio manual: http://postbio.projects.postgresql.org/ )

* Does it follow SQL spec, or the community-agreed behavior?

I don't know - I couldn't find much in the SQL-spec on a range datatype.

The ranges behaviour has been discussed on -hackers.

* Does it include pg_dump support (if applicable)?

dump/restore were fine in the handful of range-tables
which I moved between machines.

* Are there dangers?

Not that I could find.

* Have all the bases been covered?

I think the functionality looks fairly complete.

-------------
Feature test:
-------------

* Does the feature work as advertised?

The patch seems very stable.  My focus has been mainly on the intranges. I tested by parsing
documentation- and regression examples, and parametrising them in a perl harness, to generate many
thousands of range combinations.  I found only a single small problem (posted earlier - Jeff Davis
solved it already apparently).

see: http://archives.postgresql.org/pgsql-hackers/2011-02/msg00387.php

* Are there corner cases the author has failed to consider?

No.

* Are there any assertion failures or crashes?

I haven't seen a single one.

--------------
Documentation:
--------------

Section 9.18: table 9-42. range functions: The following functions are missing (I encountered them in the regression
tests):  contained_by()   range_eq()

section 'Constructing Ranges' (8.16.6): In the code example, remove the following line:   "-- the int4range result will
appearin the canonical format" it doesn't make sense there.  At this place "canonical format" has not been discussed;

maybe it is not even discussed anywhere.

also (same place):      'where "_" is used to mean "exclusive" and "" is used to mean "inclusive".'
should be:      'where "_" is used to mean "exclusive" and "i" is used to mean "inclusive".'

And btw: it should mention here that the range*inf* functions,
an underscore to separate 'range' from the rest of the function name, e.g.:  range_linfi_()  =>  infinite lower bound,
inclusiveupper bound

I still want to do Performance review and Coding review.

FWIW, I would like to repeat that my impression is that the patch is very stable, especially  with
regard to the intranges (tested extensively).

regards,

Erik Rijkers

Re: Range Types

From

Jeff Davis

Date:

09 February 2011, 01:32:00

On Tue, 2011-02-08 at 09:10 -0800, Jeff Davis wrote:
> On Mon, 2011-02-07 at 18:23 +0100, Dimitri Fontaine wrote:
> > I would think
> > 
> >   CREATE TYPE foo AS RANGE (bar) USING (btree_ops);
> > 
> > The USING clause is optional, because you generally have a default btree
> > opclass for the datatype.
> 
> There are other options, like "CANONICAL", so where do those fit?
> 
> If CREATE TYPE already has an options list, it seems a little strange to
> add grammar to support this feature. "USING" doesn't seem to mean a lot,
> except that we happen to use it in other contexts to mean "operator
> class".

For the user-facing part, how about just passing it as a parameter
called "SUBTYPE_OPCLASS"? It sounds a little on the "internal detail"
side, but so do some other type definition parameters.

As for the catalog, I'm inclined to leave the compare function in there
directly and just add a dependency on the opclass. That way, it's only
one syscache lookup rather than two, to get the compare function oid.
Then again, perhaps that doesn't matter anyway. Thoughts?

Regards,Jeff Davis

Re: REVIEW Range Types

From

Jeff Davis

Date:

09 February 2011, 01:59:41

On Tue, 2011-02-08 at 20:43 +0100, Erik Rijkers wrote:
> --------------
> Documentation:
> --------------
> 
> Section 9.18:
>   table 9-42. range functions:
>   The following functions are missing (I encountered them in the regression tests):
>     contained_by()
>     range_eq()

I opted not to document functions that mostly exist to implement
operators. Some people might prefer the names, but then they don't
benefit from GiST index searches. So, I left this as-is for now.

> section 'Constructing Ranges' (8.16.6):
>   In the code example, remove the following line:
>     "-- the int4range result will appear in the canonical format"
>   it doesn't make sense there.  At this place "canonical format" has not been discussed;
> maybe it is not even discussed anywhere.

Thank you. I have added a forward reference.

> also (same place):
>        'where "_" is used to mean "exclusive" and "" is used to mean "inclusive".'
> should be:
>        'where "_" is used to mean "exclusive" and "i" is used to mean "inclusive".'

Thank you, fixed.

> And btw: it should mention here that the range*inf* functions,
> an underscore to separate 'range' from the rest of the function name, e.g.:
>    range_linfi_()  =>  infinite lower bound,  inclusive upper bound

I tried to clarify that section.

Thank you for the review! Updated patch forthcoming.

Regards,Jeff Davis

Re: Range Types

From

Jeff Davis

Date:

09 February 2011, 04:35:43

Updated patch.

Changes:
  * Addressed Erik's review comments.
  * Fixed issue with "range @> elem" found by Erik.
  * Merged with latest HEAD
  * Changed representation to be more efficient and more robust
    (could use some testing though, because I just did this tonight)

TODO:
  * send/recv -- just noticed this tonight, no reason not to do it

Open Items:
  * Maybe typmod
  * grammar -- ask for btree opclass, or compare function?
  * catalog -- store btree opclass, or compare function?
  * should non-superusers be able to create range types?
  * constructor issues I just posted about
  * SQL length function --immutable/stable/volatile?

As always, my repo is here:

http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

Regards,
    Jeff Davis

Attachment

rangetypes-20110208.patch.gz

Re: Range Types (catversion.h)

From

"Erik Rijkers"

Date:

10 February 2011, 10:38:25

On Wed, February 9, 2011 09:35, Jeff Davis wrote:
> Updated patch.
>

Thanks!

I just wanted to mention that this latest patch doesn't quite apply as-is, because of catversion
changes.

I've removed the change to catversion.h (18 lines, starting at 4985) from the patch file; then it
applies cleanly.

Erik Rijkers

>
> As always, my repo is here:
>
> http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes
>
> Regards,
>     Jeff Davis
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Range Types (catversion.h)

From

Tom Lane

Date:

10 February 2011, 13:04:28

"Erik Rijkers" <er@xs4all.nl> writes:
> On Wed, February 9, 2011 09:35, Jeff Davis wrote:
>> Updated patch.

> I just wanted to mention that this latest patch doesn't quite apply as-is, because of catversion changes.

Just a note: standard practice is for submitted patches to *not* touch
catversion.h.  The committer will add that change before committing.
Otherwise, it's just guaranteed to cause merge problems such as this
one.  (It's not unreasonable to mention the need for a catversion bump
in the description of the patch, if you think the committer might not
realize it.)
        regards, tom lane

Re: Range Types (catversion.h)

From

Jeff Davis

Date:

10 February 2011, 13:29:00

On Thu, 2011-02-10 at 12:04 -0500, Tom Lane wrote:
> "Erik Rijkers" <er@xs4all.nl> writes:
> > On Wed, February 9, 2011 09:35, Jeff Davis wrote:
> >> Updated patch.
> 
> > I just wanted to mention that this latest patch doesn't quite apply as-is, because of catversion changes.
> 
> Just a note: standard practice is for submitted patches to *not* touch
> catversion.h.  The committer will add that change before committing.
> Otherwise, it's just guaranteed to cause merge problems such as this
> one.  (It's not unreasonable to mention the need for a catversion bump
> in the description of the patch, if you think the committer might not
> realize it.)

OK, I'll remove that then.

I originally put it there so that I wouldn't mix up data directories
with a patch I'm reviewing, but I agree that it seems easier this way.

Regards,Jeff Davis

Re: Range Types (catversion.h)

From

Peter Eisentraut

Date:

10 February 2011, 14:01:54

On tor, 2011-02-10 at 09:28 -0800, Jeff Davis wrote:
> I originally put it there so that I wouldn't mix up data directories
> with a patch I'm reviewing, but I agree that it seems easier this way.

FWIW, I disagree with Tom and do recommend putting the catversion change
in the patch.

Re: Range Types (catversion.h)

From

Jeff Davis

Date:

10 February 2011, 14:17:30

On Thu, 2011-02-10 at 15:38 +0100, Erik Rijkers wrote:
> I've removed the change to catversion.h (18 lines, starting at 4985) from the patch file; then it
> applies cleanly.

I should mention that the last patch changed the representation to be
more compact. So, if you have any existing test data it will need to be
reloaded to work with the latest.

Regards,Jeff Davis

Re: Range Types (catversion.h)

From

Heikki Linnakangas

Date:

10 February 2011, 14:23:33

On 10.02.2011 20:01, Peter Eisentraut wrote:
> On tor, 2011-02-10 at 09:28 -0800, Jeff Davis wrote:
>> I originally put it there so that I wouldn't mix up data directories
>> with a patch I'm reviewing, but I agree that it seems easier this way.
>
> FWIW, I disagree with Tom and do recommend putting the catversion change
> in the patch.

I'm very bad at remembering to bump it, so I also won't mind patch 
authors doing it.

The ideal reminder would be some special comment you could put on the 
catversion line that would cause "git push" to fail if it's still there 
when I try to push the commit to the repository. There doesn't seem to 
be a "pre-push" hook in git, although some googling suggests that it 
would be quite easy to write a small wrapper shell script to check that. 
I'm seriously considering to do that, given that I more often forget to 
bump catversion than not.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Range Types (catversion.h)

From

Robert Haas

Date:

10 February 2011, 14:39:51

On Thu, Feb 10, 2011 at 1:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 10.02.2011 20:01, Peter Eisentraut wrote:
>>
>> On tor, 2011-02-10 at 09:28 -0800, Jeff Davis wrote:
>>>
>>> I originally put it there so that I wouldn't mix up data directories
>>> with a patch I'm reviewing, but I agree that it seems easier this way.
>>
>> FWIW, I disagree with Tom and do recommend putting the catversion change
>> in the patch.
>
> I'm very bad at remembering to bump it, so I also won't mind patch authors
> doing it.
>
> The ideal reminder would be some special comment you could put on the
> catversion line that would cause "git push" to fail if it's still there when
> I try to push the commit to the repository. There doesn't seem to be a
> "pre-push" hook in git, although some googling suggests that it would be
> quite easy to write a small wrapper shell script to check that. I'm
> seriously considering to do that, given that I more often forget to bump
> catversion than not.

And I share Tom's preference, which is to not include it, because I
usually apply patches using patch, and when diff hunks fail it's a
nuisance for me.

So basically, do whatever you want, someone won't like it no matter what.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types: << >> -|- ops vs empty range

From

"Erik Rijkers"

Date:

11 February 2011, 10:09:12

On Wed, February 9, 2011 09:35, Jeff Davis wrote:
> Updated patch.
>

The operators  <<  >>  and -|-  have the following behavior with empty ranges:

testdb=# select '-'::int4range << range(200,300);
ERROR:  empty range
testdb=# select '-'::int4range >> range(200,300);
ERROR:  empty range
testdb=# select '-'::int4range -|- range(200,300);
ERROR:  empty range

I'm not sure if that is deliberate behavior, but they seem
almost bugs to me.

Wouldn't it be better (and more practical) if these would
return false (or perhaps NULL, for 'unknown') ?

(the same goes for all the other range types, btw.)

Erik Rijkers

Re: Range Types: << >> -|- ops vs empty range

From

Jeff Davis

Date:

11 February 2011, 12:55:57

On Fri, 2011-02-11 at 15:09 +0100, Erik Rijkers wrote:
> On Wed, February 9, 2011 09:35, Jeff Davis wrote:
> > Updated patch.
> >
> 
> The operators  <<  >>  and -|-  have the following behavior with empty ranges:
> 
> testdb=# select '-'::int4range << range(200,300);
> ERROR:  empty range
> testdb=# select '-'::int4range >> range(200,300);
> ERROR:  empty range
> testdb=# select '-'::int4range -|- range(200,300);
> ERROR:  empty range
> 
> I'm not sure if that is deliberate behavior, but they seem
> almost bugs to me.

It's deliberate, but it looks like the error messages could use some
improvement.

> Wouldn't it be better (and more practical) if these would
> return false (or perhaps NULL, for 'unknown') ?

I'm hesitant to return NULL when the inputs are known.

If we were to define these functions for empty ranges, I would think
they would all return true.

"<<" and ">>" ("strictly left of" and "strictly right of", respectively)
could be seen to start out as true and return false if it finds a point
overlapping or on the other side. 

The primary use case for "-|-" (adjacent) is to see if your ranges are
contiguous and non-overlapping. For empty ranges, that seems to be true.

I'm not disagreeing with your interpretation really. I think that
different people will assume different behavior, and so it's more likely
to cause confusion. An error early on will allow them to do something
like: CASE WHEN myrange? THEN myrange -|- range(10,20) ELSE TRUE END
So that they (and anyone who reads their query) can see explicitly
what's happening, without looking in the manual for details.

I'm open to suggestion, however. If we can get a reasonable consensus on
the values these functions should return, I'll change it.

Regards,Jeff Davis

Re: Range Types: << >> -|- ops vs empty range

From

Robert Haas

Date:

11 February 2011, 13:03:40

On Fri, Feb 11, 2011 at 11:55 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2011-02-11 at 15:09 +0100, Erik Rijkers wrote:
>> On Wed, February 9, 2011 09:35, Jeff Davis wrote:
>> > Updated patch.
>> >
>>
>> The operators  <<  >>  and -|-  have the following behavior with empty ranges:
>>
>> testdb=# select '-'::int4range << range(200,300);
>> ERROR:  empty range
>> testdb=# select '-'::int4range >> range(200,300);
>> ERROR:  empty range
>> testdb=# select '-'::int4range -|- range(200,300);
>> ERROR:  empty range
>>
>> I'm not sure if that is deliberate behavior, but they seem
>> almost bugs to me.
>
> It's deliberate, but it looks like the error messages could use some
> improvement.
>
>> Wouldn't it be better (and more practical) if these would
>> return false (or perhaps NULL, for 'unknown') ?
>
> I'm hesitant to return NULL when the inputs are known.
>
> If we were to define these functions for empty ranges, I would think
> they would all return true.
>
> "<<" and ">>" ("strictly left of" and "strictly right of", respectively)
> could be seen to start out as true and return false if it finds a point
> overlapping or on the other side.
>
> The primary use case for "-|-" (adjacent) is to see if your ranges are
> contiguous and non-overlapping. For empty ranges, that seems to be true.
>
> I'm not disagreeing with your interpretation really. I think that
> different people will assume different behavior, and so it's more likely
> to cause confusion. An error early on will allow them to do something
> like:
>  CASE WHEN myrange? THEN myrange -|- range(10,20) ELSE TRUE END
> So that they (and anyone who reads their query) can see explicitly
> what's happening, without looking in the manual for details.
>
> I'm open to suggestion, however. If we can get a reasonable consensus on
> the values these functions should return, I'll change it.

For what it's worth, my completely uninformed opinion is that
comparison operators shouldn't error out.  I haven't read the patch so
I'm not sure what those operators are defined to do, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types: << >> -|- ops vs empty range

From

Jeff Davis

Date:

11 February 2011, 13:26:35

On Fri, 2011-02-11 at 12:03 -0500, Robert Haas wrote:
> For what it's worth, my completely uninformed opinion is that
> comparison operators shouldn't error out.  I haven't read the patch so
> I'm not sure what those operators are defined to do, though.

">>" means "strictly right of"
"<<" means "strictly left of"
"-|-" means "adjacent" (touching but not overlapping)

I'm open to suggestion about how those behave with empty ranges.

Regards,Jeff Davis

Re: Range Types: << >> -|- ops vs empty range

From

Robert Haas

Date:

11 February 2011, 13:36:31

On Fri, Feb 11, 2011 at 12:26 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2011-02-11 at 12:03 -0500, Robert Haas wrote:
>> For what it's worth, my completely uninformed opinion is that
>> comparison operators shouldn't error out.  I haven't read the patch so
>> I'm not sure what those operators are defined to do, though.
>
> ">>" means "strictly right of"
> "<<" means "strictly left of"
> "-|-" means "adjacent" (touching but not overlapping)
>
> I'm open to suggestion about how those behave with empty ranges.

Hmm, so an empty range is a range that includes nothing at all, right?Not "everything in the world"?

Are we sure we even want to have that concept?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Range Types: << >> -|- ops vs empty range

From

"Kevin Grittner"

Date:

11 February 2011, 13:46:42

Jeff Davis <pgsql@j-davis.com> wrote:
> ">>" means "strictly right of"
> "<<" means "strictly left of"
> "-|-" means "adjacent" (touching but not overlapping)
> 
> I'm open to suggestion about how those behave with empty ranges.
OK, that still leaves a lot to the imagination, though.  To try to
clarify in *my* mind:
"empty range"
=============
Zero length? If so, is it fixed at some point, but empty?   '(x,x)'?   '[x,x)'?
Is it everything? '[-inf,+inf]'?
Is it really meaningfully distinct from NULL?
Where do you see it being useful?
-Kevin

Re: Range Types: << >> -|- ops vs empty range

From

Andrew Dunstan

Date:

11 February 2011, 13:53:24


On 02/11/2011 12:36 PM, Robert Haas wrote:
> On Fri, Feb 11, 2011 at 12:26 PM, Jeff Davis<pgsql@j-davis.com>  wrote:
>> On Fri, 2011-02-11 at 12:03 -0500, Robert Haas wrote:
>>> For what it's worth, my completely uninformed opinion is that
>>> comparison operators shouldn't error out.  I haven't read the patch so
>>> I'm not sure what those operators are defined to do, though.
>> ">>" means "strictly right of"
>> "<<" means "strictly left of"
>> "-|-" means "adjacent" (touching but not overlapping)
>>
>> I'm open to suggestion about how those behave with empty ranges.
> Hmm, so an empty range is a range that includes nothing at all, right?
>   Not "everything in the world"?
>
> Are we sure we even want to have that concept?

I have no particular opinion on that, but if we do then ISTM all the 
above (and particularly the last) should return false if either operand 
is an empty range.

cheers

andrew

Re: Range Types: << >> -|- ops vs empty range

From

Josh Berkus

Date:

11 February 2011, 14:02:36

> "empty range"
> =============
> Zero length?
>   If so, is it fixed at some point, but empty?
>     '(x,x)'?
>     '[x,x)'?

Neither of the above should be possible, I think.  The expression "(x"
logically excludes the expression "x)".

However, "[x,x]" would be valid, and would be a zero-length interval at
the point "x".

> Is it everything?
>   '[-inf,+inf]'?

No, that's "everything" which is a different concept.  The above also
ought to be possible, and overlap everything.

> Is it really meaningfully distinct from NULL?

Yes.  NULL means "I don't know".  If a range type IS NULL, then any
operation performed with it ought to be NULL.  Hence:

IF y > x, THEN:

[x,x] << [y,z) == TRUE
[x,x] -|- (x,y] == TRUE
NULL << [y,z} IS NULL
[-inf,+inf] << [y,z) == FALSE

I can imagine using all of these constructs in actual applications.  In
fact, I have *already* used [-inf,+inf]

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: Range Types: << >> -|- ops vs empty range

From

"Erik Rijkers"

Date:

11 February 2011, 14:09:08

On Fri, February 11, 2011 19:02, Josh Berkus wrote:
>
>> "empty range"
>> =============
>> Zero length?
>>   If so, is it fixed at some point, but empty?
>>     '(x,x)'?
>>     '[x,x)'?
>
> Neither of the above should be possible, I think.  The expression "(x"
> logically excludes the expression "x)".
>
> However, "[x,x]" would be valid, and would be a zero-length interval at
> the point "x".
>
>> Is it everything?
>>   '[-inf,+inf]'?
>
> No, that's "everything" which is a different concept.  The above also
> ought to be possible, and overlap everything.
>
>> Is it really meaningfully distinct from NULL?
>
> Yes.  NULL means "I don't know".  If a range type IS NULL, then any
> operation performed with it ought to be NULL.  Hence:
>
> IF y > x, THEN:
>
> [x,x] << [y,z) == TRUE
> [x,x] -|- (x,y] == TRUE
> NULL << [y,z} IS NULL
> [-inf,+inf] << [y,z) == FALSE
>
> I can imagine using all of these constructs in actual applications.  In
> fact, I have *already* used [-inf,+inf]
>

You say yes, but you don't mention "empty range".

Maybe we can indeed do without the concept of an empty-range?;
it might simplify implementation as well as usage.

I'm not decided, but I do find it hard to find a plausible use-case for it.

Re: Range Types: << >> -|- ops vs empty range

From

"Kevin Grittner"

Date:

11 February 2011, 14:10:09

Josh Berkus <josh@agliodbs.com> wrote:
> I can imagine using all of these constructs in actual
> applications.
But which of them, if any, is an "empty range"?
-Kevin

Re: Range Types: << >> -|- ops vs empty range

From

Bruce Momjian

Date:

11 March 2011, 09:38:13

Where are we on this?

---------------------------------------------------------------------------

Erik Rijkers wrote:
> On Wed, February 9, 2011 09:35, Jeff Davis wrote:
> > Updated patch.
> >
> 
> The operators  <<  >>  and -|-  have the following behavior with empty ranges:
> 
> testdb=# select '-'::int4range << range(200,300);
> ERROR:  empty range
> testdb=# select '-'::int4range >> range(200,300);
> ERROR:  empty range
> testdb=# select '-'::int4range -|- range(200,300);
> ERROR:  empty range
> 
> I'm not sure if that is deliberate behavior, but they seem
> almost bugs to me.
> 
> Wouldn't it be better (and more practical) if these would
> return false (or perhaps NULL, for 'unknown') ?
> 
> (the same goes for all the other range types, btw.)
> 
> 
> Erik Rijkers
> 
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Range Types: << >> -|- ops vs empty range

From

Jeff Davis

Date:

11 March 2011, 13:37:21

On Fri, 2011-03-11 at 08:37 -0500, Bruce Momjian wrote:
> Where are we on this?

The options are:

1. Rip out empty ranges. Several people have been skeptical of their
usefulness, but I don't recall anyone directly saying that they should
be removed. Robert Haas made the point that range types aren't closed
under UNION:

http://archives.postgresql.org/pgsql-hackers/2011-02/msg01045.php

So the additional nice mathematical properties provided by empty ranges
are not as important (because it wouldn't be perfect anyway).


2. Change the semantics. Erik Rijkers suggested that we define all
operators for empty ranges, perhaps using NULL semantics:

http://archives.postgresql.org/pgsql-hackers/2011-02/msg00942.php

And Kevin Grittner suggested that there could be discrete ranges of zero
length yet a defined starting point:

http://archives.postgresql.org/pgsql-hackers/2011-02/msg01042.php


3. Leave empty ranges with the existing "empty set" semantics. Nathan
Boley made a good point here:

http://archives.postgresql.org/pgsql-hackers/2011-02/msg01108.php


Right now it's #3, and I lean pretty strongly toward keeping it. Without
#3, people will get confused when fairly simple operations fail in a
data-dependent way (at runtime). With #3, people will run into problems
only in situations where it is fairly dubious to have an empty range
anyway (and therefore likely a real error), such as finding ranges "left
of" an empty range.

Otherwise, I'd prefer #1 to #2. I think #2 is a bad path to take, and
we'll end up with a lot of unintuitive and error-prone operators.

Regards,Jeff Davis

Re: Range Types: << >> -|- ops vs empty range

From

Christopher Browne

Date:

11 March 2011, 13:48:25

On Fri, Mar 11, 2011 at 12:37 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Right now it's #3, and I lean pretty strongly toward keeping it. Without
> #3, people will get confused when fairly simple operations fail in a
> data-dependent way (at runtime). With #3, people will run into problems
> only in situations where it is fairly dubious to have an empty range
> anyway (and therefore likely a real error), such as finding ranges "left
> of" an empty range.

That seems pretty apropos to me.

> Otherwise, I'd prefer #1 to #2. I think #2 is a bad path to take, and
> we'll end up with a lot of unintuitive and error-prone operators.

I think back to your essay on the nonintuitiveness of NULL
(<http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/>),
and suggest the thought that picking #2 would add to the already
existent confusion.
-- 
http://linuxfinances.info/info/linuxdistributions.html