Thread: range_adjacent and discrete ranges

range_adjacent and discrete ranges

From
Jeff Davis
Date:
While thinking about range_cmp_bounds, I started to think that the way
range_adjacent works is wrong.

range_adjacent() depends on the bounds of two ranges to match up, such
that the boundary values are equal, but one is exclusive and the other
inclusive, and one is a lower bound and the other an upper bound.

That makes perfect sense for continuous ranges because that is the only
way they can possibly be adjacent. It also works for the discrete ranges
as defined so far, because they use a canonical function that translates
the values to [) form. But if someone were to write a canonical function
that translates the ranges to [] or () form, range_adjacent would be
useless.

There are a few approaches to solving it:

1. Make the canonical function accept a "format" argument much like the
constructor, and have an output format stored in the catalog that would
be passed in under most circumstances. However, range_adjacent could
pass in its own form that would be useful for its purposes.

2. Replace the canonical function with "inc" and "dec" functions, or
some variation thereof. We'd still need some kind of output format to
match the current behavior, otherwise every range would always be output
in [) form (I don't necessarily object to that, but it would be a
behavior difference for user-defined range types).

3. Remove range_adjacent.

4. Do nothing, and document that the canonical function should translate
to either [) or (] if you want range_adjacent to work.

Thoughts?

Regards,Jeff Davis



Re: range_adjacent and discrete ranges

From
Florian Pflug
Date:
On Nov18, 2011, at 09:25 , Jeff Davis wrote:
> While thinking about range_cmp_bounds, I started to think that the way
> range_adjacent works is wrong.
> 
> range_adjacent() depends on the bounds of two ranges to match up, such
> that the boundary values are equal, but one is exclusive and the other
> inclusive, and one is a lower bound and the other an upper bound.

> That makes perfect sense for continuous ranges because that is the only
> way they can possibly be adjacent. It also works for the discrete ranges
> as defined so far, because they use a canonical function that translates
> the values to [) form. But if someone were to write a canonical function
> that translates the ranges to [] or () form, range_adjacent would be
> useless.

Hm, the problem here is for range_adjacent to recognize that [1,2] is
adjacent to [3,4] when treated as integer ranges, but that they're not
adjacent when treated as float ranges. The reason being, of course, that
there's isn't any integer between 2 and 3, but there are floats between
2 and 3.

That information, however, *is* already contained in the canonical
functions, because those function know that (2,3) are empty as an integer
range, but non-empty as a float range.

For example, [1,2] is adjacent to [3,4] as integer ranges because (2,3)
is empty as an integer range. Conversely, since (2,3) is *not* empty as a
float range, [1,2] and [3,4] are *not* adjacent as float ranges.

More formally, let there be two arbitrary ranges a,b,i_a,i_b c,d,i_c,i_d
where the first two parameters are the lower respectively upper bound, and
the last two are booleans saying whether the lower respectively upper bound
is inclusive (true) or exclusive (false).

These ranges are then adjacent exactly if the range b,c,!i_b,!i_c
is empty.

This definition does not depend on any specific canonical form of ranges,
only on the canonicalize function's ability to detect empty ranges.

best regards,
Florian Pflug



Re: range_adjacent and discrete ranges

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> ...This definition does not depend on any specific canonical form of ranges,
> only on the canonicalize function's ability to detect empty ranges.

Hmm, well, now that you mention it, I don't think the current canonical
functions handle empty ranges very nicely at all.  They tend to spit up:

regression=# select int4range(4,4,'[]');int4range 
-----------[4,5)
(1 row)

regression=# select int4range(4,4,'(]');
ERROR:  range lower bound must be less than or equal to range upper bound
regression=# select int4range(4,4,'()');
ERROR:  range lower bound must be less than or equal to range upper bound

Would it be better for them to silently transform such cases to "empty"?
        regards, tom lane


Re: range_adjacent and discrete ranges

From
Jeff Davis
Date:
On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote:
> regression=# select int4range(4,4,'(]');
> ERROR:  range lower bound must be less than or equal to range upper bound
> regression=# select int4range(4,4,'()');
> ERROR:  range lower bound must be less than or equal to range upper bound
> 
> Would it be better for them to silently transform such cases to "empty"?

That had crossed my mind, but I read the first as saying that it
includes 4 and doesn't include 4, which is a little confusing.

But I wouldn't object to making them return empty ranges. Seeing that we
removed some other errors in favor of returning something, it might be a
little more consistent to return empty when possible.

I wouldn't like to extend that to int4range(4,3), however. When the
upper bound is less than the lower bound, it's almost certainly a
mistake, and the user should be informed.

By the way, what does this have to do with canonical functions? This
seems more like a constructor issue, and there is already a
zero-argument constructor to make empty ranges.

Regards,Jeff Davis



Re: range_adjacent and discrete ranges

From
Jeff Davis
Date:
On Fri, 2011-11-18 at 13:32 +0100, Florian Pflug wrote:
> That information, however, *is* already contained in the canonical
> functions, because those function know that (2,3) are empty as an integer
> range, but non-empty as a float range.

Very good point. Thank you.

Regards,Jeff Davis



Re: range_adjacent and discrete ranges

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote:
>> Would it be better for them to silently transform such cases to "empty"?

> I wouldn't like to extend that to int4range(4,3), however. When the
> upper bound is less than the lower bound, it's almost certainly a
> mistake, and the user should be informed.

Yeah, probably not.  However, I don't like the idea of
'(3,4)'::int4range throwing an error, as it currently does, because it
seems to require the application to have quite a lot of knowledge of the
range semantics to avoid having errors sprung on it.

> By the way, what does this have to do with canonical functions? This
> seems more like a constructor issue, and there is already a
> zero-argument constructor to make empty ranges.

What I was concerned about was whether Florian's idea of implementing
range_adjacent by testing for empty intervening range would work, or
would fail because of errors getting thrown.
        regards, tom lane


Re: range_adjacent and discrete ranges

From
Jeff Davis
Date:
On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote:
> Yeah, probably not.  However, I don't like the idea of
> '(3,4)'::int4range throwing an error, as it currently does, because it
> seems to require the application to have quite a lot of knowledge of the
> range semantics to avoid having errors sprung on it.

OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be
OK as well (for any range type), because at least it's consistent.

The one that I find strange is [3,3), but I think that needs to work for
the range_adjacent idea to work. Seeing it as useful in the context of
range_adjacent might mean that it's useful elsewhere, too, so now I'm
leaning toward supporting [3,3) as an empty range.

Regards,Jeff Davis



Re: range_adjacent and discrete ranges

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote:
>> Yeah, probably not.  However, I don't like the idea of
>> '(3,4)'::int4range throwing an error, as it currently does, because it
>> seems to require the application to have quite a lot of knowledge of the
>> range semantics to avoid having errors sprung on it.

> OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be
> OK as well (for any range type), because at least it's consistent.

> The one that I find strange is [3,3), but I think that needs to work for
> the range_adjacent idea to work. Seeing it as useful in the context of
> range_adjacent might mean that it's useful elsewhere, too, so now I'm
> leaning toward supporting [3,3) as an empty range.

OK, so the thought is that the only actual error condition would be
lower bound value > upper bound value?  Otherwise, if the range
specification represents an empty set, that's fine, we just normalize it
to 'empty'.  Seems consistent to me.
        regards, tom lane


Re: range_adjacent and discrete ranges

From
Florian Pflug
Date:
On Nov19, 2011, at 22:03 , Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote:
>>> Yeah, probably not.  However, I don't like the idea of
>>> '(3,4)'::int4range throwing an error, as it currently does, because it
>>> seems to require the application to have quite a lot of knowledge of the
>>> range semantics to avoid having errors sprung on it.
> 
>> OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be
>> OK as well (for any range type), because at least it's consistent.
> 
>> The one that I find strange is [3,3), but I think that needs to work for
>> the range_adjacent idea to work. Seeing it as useful in the context of
>> range_adjacent might mean that it's useful elsewhere, too, so now I'm
>> leaning toward supporting [3,3) as an empty range.
> 
> OK, so the thought is that the only actual error condition would be
> lower bound value > upper bound value?  Otherwise, if the range
> specification represents an empty set, that's fine, we just normalize it
> to 'empty'.  Seems consistent to me.

+1. Making the error condition independent from the boundary type makes
it easy for callers to avoid the error conditions. And it should still
catch mistakes that stem from swapping the lower and upper bound.

best regards,
Florian Pflug





Re: range_adjacent and discrete ranges

From
Tom Lane
Date:
Florian Pflug <fgp@phlo.org> writes:
> More formally, let there be two arbitrary ranges
>   a,b,i_a,i_b
>   c,d,i_c,i_d
> where the first two parameters are the lower respectively upper bound, and
> the last two are booleans saying whether the lower respectively upper bound
> is inclusive (true) or exclusive (false).

> These ranges are then adjacent exactly if the range
>   b,c,!i_b,!i_c
> is empty.

I tried to implement this, and I think it has a small bug.  It works as
stated if we have b < c.  However, if we have b == c, then we want to
consider the ranges adjacent if i_b != i_c (ie, only one of them claims
the common boundary point).  But a singleton range is empty unless it's
inclusive on both sides.  So we have to have a special case when the
bounds are equal.

(If b > c, then of course we have to consider the two ranges in the
opposite order.)

Attached is a draft patch for this.  It passes regression tests but I've
not tried to exercise it with a canonical function that actually does
something different.  It's going to be a bit slower than Jeff's
original, because it does not only range_cmp_bound_values but also a
make_range cycle (in most cases).  So I guess the question is how much
we care about supporting canonical functions with non-default policies.
Thoughts?

            regards, tom lane

diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c
index 3326cb17c895273fd01c4eda5eb0d65a521d0168..890b6850cb6f5611e9afa922b85b0c64125f31bb 100644
*** a/src/backend/utils/adt/rangetypes.c
--- b/src/backend/utils/adt/rangetypes.c
*************** range_adjacent(PG_FUNCTION_ARGS)
*** 699,704 ****
--- 699,706 ----
                  upper2;
      bool        empty1,
                  empty2;
+     RangeType  *r3;
+     int            cmp;

      /* Different types should be prevented by ANYRANGE matching rules */
      if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2))
*************** range_adjacent(PG_FUNCTION_ARGS)
*** 714,736 ****
          PG_RETURN_BOOL(false);

      /*
!      * For two ranges to be adjacent, the lower boundary of one range has to
!      * match the upper boundary of the other. However, the inclusivity of
!      * those two boundaries must also be different.
!      *
!      * The semantics for range_cmp_bounds aren't quite what we need here, so
!      * we do the comparison more directly.
       */
!     if (lower1.inclusive != upper2.inclusive)
      {
!         if (range_cmp_bound_values(typcache, &lower1, &upper2) == 0)
!             PG_RETURN_BOOL(true);
      }
!
!     if (upper1.inclusive != lower2.inclusive)
      {
!         if (range_cmp_bound_values(typcache, &upper1, &lower2) == 0)
!             PG_RETURN_BOOL(true);
      }

      PG_RETURN_BOOL(false);
--- 716,758 ----
          PG_RETURN_BOOL(false);

      /*
!      * Given two ranges A..B and C..D, where B < C, the ranges are adjacent
!      * if and only if the range B..C is empty, where inclusivity of these two
!      * bounds is inverted compared to the original bounds.  For discrete
!      * ranges, we have to rely on the canonicalization function to normalize
!      * B..C to empty if it contains no elements of the subtype.  If B == C,
!      * the ranges are adjacent only if these bounds have different inclusive
!      * flags (i.e., exactly one of the ranges includes the common boundary).
!      * And if B > C then the ranges cannot be adjacent in this order, but we
!      * must consider the other order (i.e., check D <= A).
       */
!     cmp = range_cmp_bound_values(typcache, &upper1, &lower2);
!     if (cmp < 0)
      {
!         upper1.inclusive = !upper1.inclusive;
!         upper1.lower = true;
!         lower2.inclusive = !lower2.inclusive;
!         lower2.lower = false;
!         r3 = make_range(typcache, &upper1, &lower2, false);
!         PG_RETURN_BOOL(RangeIsEmpty(r3));
      }
!     if (cmp == 0)
      {
!         PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive);
!     }
!     cmp = range_cmp_bound_values(typcache, &upper2, &lower1);
!     if (cmp < 0)
!     {
!         upper2.inclusive = !upper2.inclusive;
!         upper2.lower = true;
!         lower1.inclusive = !lower1.inclusive;
!         lower1.lower = false;
!         r3 = make_range(typcache, &upper2, &lower1, false);
!         PG_RETURN_BOOL(RangeIsEmpty(r3));
!     }
!     if (cmp == 0)
!     {
!         PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive);
      }

      PG_RETURN_BOOL(false);

Re: range_adjacent and discrete ranges

From
Tom Lane
Date:
I wrote:
> Attached is a draft patch for this.  It passes regression tests but I've
> not tried to exercise it with a canonical function that actually does
> something different.

I hacked up int4range_canonical to produce []-style ranges, and
confirmed that this version of range_adjacent seems to work with them.

> It's going to be a bit slower than Jeff's
> original, because it does not only range_cmp_bound_values but also a
> make_range cycle (in most cases).  So I guess the question is how much
> we care about supporting canonical functions with non-default policies.
> Thoughts?

I did a little bit of performance testing on an x86_64 machine (Fedora 14),
and found that the time to execute a clause likeWHERE int4range(1,2) -|- int4range(x, 10000000)
(x being an integer Var) grows from 0.37 us to 0.56 us if we adopt the
patched version of range_adjacent.  With float8 ranges it grows from
0.35 us to 0.54 us.  So these are noticeable penalties but they don't
seem like show-stoppers.  Since the alternative is to document that
the apparent freedom to choose a canonicalization policy is illusory,
I'm inclined to think we should change it.
        regards, tom lane


Re: range_adjacent and discrete ranges

From
Tom Lane
Date:
I wrote:
> I did a little bit of performance testing on an x86_64 machine (Fedora 14),
> and found that the time to execute a clause like
>     WHERE int4range(1,2) -|- int4range(x, 10000000)
> (x being an integer Var) grows from 0.37 us to 0.56 us if we adopt the
> patched version of range_adjacent.  With float8 ranges it grows from
> 0.35 us to 0.54 us.  So these are noticeable penalties but they don't
> seem like show-stoppers.  Since the alternative is to document that
> the apparent freedom to choose a canonicalization policy is illusory,
> I'm inclined to think we should change it.

It occurred to me that we can easily buy back the extra time for range
types that don't have a canonical function (ie, continuous ranges).
If there's no such function, it's impossible for B..C to normalize to
empty when B < C, so we can skip the extra logic.  The attached version
is no slower than the original code for continuous ranges, and doesn't
seem measurably different from my previous patch for discrete ranges.

            regards, tom lane

*** /home/postgres/pgsql/src/backend/utils/adt/rangetypes.c    Tue Nov 22 23:18:51 2011
--- new/rangetypes.c    Wed Nov 23 16:29:20 2011
***************
*** 699,704 ****
--- 699,706 ----
                  upper2;
      bool        empty1,
                  empty2;
+     RangeType  *r3;
+     int            cmp;

      /* Different types should be prevented by ANYRANGE matching rules */
      if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2))
***************
*** 714,736 ****
          PG_RETURN_BOOL(false);

      /*
!      * For two ranges to be adjacent, the lower boundary of one range has to
!      * match the upper boundary of the other. However, the inclusivity of
!      * those two boundaries must also be different.
       *
!      * The semantics for range_cmp_bounds aren't quite what we need here, so
!      * we do the comparison more directly.
       */
!     if (lower1.inclusive != upper2.inclusive)
      {
!         if (range_cmp_bound_values(typcache, &lower1, &upper2) == 0)
!             PG_RETURN_BOOL(true);
      }

!     if (upper1.inclusive != lower2.inclusive)
      {
!         if (range_cmp_bound_values(typcache, &upper1, &lower2) == 0)
!             PG_RETURN_BOOL(true);
      }

      PG_RETURN_BOOL(false);
--- 716,774 ----
          PG_RETURN_BOOL(false);

      /*
!      * Given two ranges A..B and C..D, where B < C, the ranges are adjacent
!      * if and only if the range B..C is empty, where inclusivity of these two
!      * bounds is inverted compared to the original bounds.  For discrete
!      * ranges, we have to rely on the canonicalization function to normalize
!      * B..C to empty if it contains no elements of the subtype.  (If there is
!      * no canonicalization function, it's impossible for such a range to
!      * normalize to empty, so we needn't bother to try.)
!      *
!      * If B == C, the ranges are adjacent only if these bounds have different
!      * inclusive flags (i.e., exactly one of the ranges includes the common
!      * boundary point).
       *
!      * And if B > C then the ranges cannot be adjacent in this order, but we
!      * must consider the other order (i.e., check D <= A).
       */
!     cmp = range_cmp_bound_values(typcache, &upper1, &lower2);
!     if (cmp < 0)
!     {
!         /* in a continuous subtype, there are assumed to be points between */
!         if (!OidIsValid(typcache->rng_canonical_finfo.fn_oid))
!             PG_RETURN_BOOL(false);
!         /* flip the inclusion flags */
!         upper1.inclusive = !upper1.inclusive;
!         lower2.inclusive = !lower2.inclusive;
!         /* change upper/lower labels to avoid Assert failures */
!         upper1.lower = true;
!         lower2.lower = false;
!         r3 = make_range(typcache, &upper1, &lower2, false);
!         PG_RETURN_BOOL(RangeIsEmpty(r3));
!     }
!     if (cmp == 0)
      {
!         PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive);
      }

!     cmp = range_cmp_bound_values(typcache, &upper2, &lower1);
!     if (cmp < 0)
!     {
!         /* in a continuous subtype, there are assumed to be points between */
!         if (!OidIsValid(typcache->rng_canonical_finfo.fn_oid))
!             PG_RETURN_BOOL(false);
!         /* flip the inclusion flags */
!         upper2.inclusive = !upper2.inclusive;
!         lower1.inclusive = !lower1.inclusive;
!         /* change upper/lower labels to avoid Assert failures */
!         upper2.lower = true;
!         lower1.lower = false;
!         r3 = make_range(typcache, &upper2, &lower1, false);
!         PG_RETURN_BOOL(RangeIsEmpty(r3));
!     }
!     if (cmp == 0)
      {
!         PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive);
      }

      PG_RETURN_BOOL(false);