Thread: range_adjacent and discrete ranges
While thinking about range_cmp_bounds, I started to think that the way range_adjacent works is wrong. range_adjacent() depends on the bounds of two ranges to match up, such that the boundary values are equal, but one is exclusive and the other inclusive, and one is a lower bound and the other an upper bound. That makes perfect sense for continuous ranges because that is the only way they can possibly be adjacent. It also works for the discrete ranges as defined so far, because they use a canonical function that translates the values to [) form. But if someone were to write a canonical function that translates the ranges to [] or () form, range_adjacent would be useless. There are a few approaches to solving it: 1. Make the canonical function accept a "format" argument much like the constructor, and have an output format stored in the catalog that would be passed in under most circumstances. However, range_adjacent could pass in its own form that would be useful for its purposes. 2. Replace the canonical function with "inc" and "dec" functions, or some variation thereof. We'd still need some kind of output format to match the current behavior, otherwise every range would always be output in [) form (I don't necessarily object to that, but it would be a behavior difference for user-defined range types). 3. Remove range_adjacent. 4. Do nothing, and document that the canonical function should translate to either [) or (] if you want range_adjacent to work. Thoughts? Regards,Jeff Davis
On Nov18, 2011, at 09:25 , Jeff Davis wrote: > While thinking about range_cmp_bounds, I started to think that the way > range_adjacent works is wrong. > > range_adjacent() depends on the bounds of two ranges to match up, such > that the boundary values are equal, but one is exclusive and the other > inclusive, and one is a lower bound and the other an upper bound. > That makes perfect sense for continuous ranges because that is the only > way they can possibly be adjacent. It also works for the discrete ranges > as defined so far, because they use a canonical function that translates > the values to [) form. But if someone were to write a canonical function > that translates the ranges to [] or () form, range_adjacent would be > useless. Hm, the problem here is for range_adjacent to recognize that [1,2] is adjacent to [3,4] when treated as integer ranges, but that they're not adjacent when treated as float ranges. The reason being, of course, that there's isn't any integer between 2 and 3, but there are floats between 2 and 3. That information, however, *is* already contained in the canonical functions, because those function know that (2,3) are empty as an integer range, but non-empty as a float range. For example, [1,2] is adjacent to [3,4] as integer ranges because (2,3) is empty as an integer range. Conversely, since (2,3) is *not* empty as a float range, [1,2] and [3,4] are *not* adjacent as float ranges. More formally, let there be two arbitrary ranges a,b,i_a,i_b c,d,i_c,i_d where the first two parameters are the lower respectively upper bound, and the last two are booleans saying whether the lower respectively upper bound is inclusive (true) or exclusive (false). These ranges are then adjacent exactly if the range b,c,!i_b,!i_c is empty. This definition does not depend on any specific canonical form of ranges, only on the canonicalize function's ability to detect empty ranges. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > ...This definition does not depend on any specific canonical form of ranges, > only on the canonicalize function's ability to detect empty ranges. Hmm, well, now that you mention it, I don't think the current canonical functions handle empty ranges very nicely at all. They tend to spit up: regression=# select int4range(4,4,'[]');int4range -----------[4,5) (1 row) regression=# select int4range(4,4,'(]'); ERROR: range lower bound must be less than or equal to range upper bound regression=# select int4range(4,4,'()'); ERROR: range lower bound must be less than or equal to range upper bound Would it be better for them to silently transform such cases to "empty"? regards, tom lane
On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote: > regression=# select int4range(4,4,'(]'); > ERROR: range lower bound must be less than or equal to range upper bound > regression=# select int4range(4,4,'()'); > ERROR: range lower bound must be less than or equal to range upper bound > > Would it be better for them to silently transform such cases to "empty"? That had crossed my mind, but I read the first as saying that it includes 4 and doesn't include 4, which is a little confusing. But I wouldn't object to making them return empty ranges. Seeing that we removed some other errors in favor of returning something, it might be a little more consistent to return empty when possible. I wouldn't like to extend that to int4range(4,3), however. When the upper bound is less than the lower bound, it's almost certainly a mistake, and the user should be informed. By the way, what does this have to do with canonical functions? This seems more like a constructor issue, and there is already a zero-argument constructor to make empty ranges. Regards,Jeff Davis
On Fri, 2011-11-18 at 13:32 +0100, Florian Pflug wrote: > That information, however, *is* already contained in the canonical > functions, because those function know that (2,3) are empty as an integer > range, but non-empty as a float range. Very good point. Thank you. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2011-11-18 at 10:33 -0500, Tom Lane wrote: >> Would it be better for them to silently transform such cases to "empty"? > I wouldn't like to extend that to int4range(4,3), however. When the > upper bound is less than the lower bound, it's almost certainly a > mistake, and the user should be informed. Yeah, probably not. However, I don't like the idea of '(3,4)'::int4range throwing an error, as it currently does, because it seems to require the application to have quite a lot of knowledge of the range semantics to avoid having errors sprung on it. > By the way, what does this have to do with canonical functions? This > seems more like a constructor issue, and there is already a > zero-argument constructor to make empty ranges. What I was concerned about was whether Florian's idea of implementing range_adjacent by testing for empty intervening range would work, or would fail because of errors getting thrown. regards, tom lane
On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote: > Yeah, probably not. However, I don't like the idea of > '(3,4)'::int4range throwing an error, as it currently does, because it > seems to require the application to have quite a lot of knowledge of the > range semantics to avoid having errors sprung on it. OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be OK as well (for any range type), because at least it's consistent. The one that I find strange is [3,3), but I think that needs to work for the range_adjacent idea to work. Seeing it as useful in the context of range_adjacent might mean that it's useful elsewhere, too, so now I'm leaning toward supporting [3,3) as an empty range. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote: >> Yeah, probably not. However, I don't like the idea of >> '(3,4)'::int4range throwing an error, as it currently does, because it >> seems to require the application to have quite a lot of knowledge of the >> range semantics to avoid having errors sprung on it. > OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be > OK as well (for any range type), because at least it's consistent. > The one that I find strange is [3,3), but I think that needs to work for > the range_adjacent idea to work. Seeing it as useful in the context of > range_adjacent might mean that it's useful elsewhere, too, so now I'm > leaning toward supporting [3,3) as an empty range. OK, so the thought is that the only actual error condition would be lower bound value > upper bound value? Otherwise, if the range specification represents an empty set, that's fine, we just normalize it to 'empty'. Seems consistent to me. regards, tom lane
On Nov19, 2011, at 22:03 , Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: >> On Fri, 2011-11-18 at 14:47 -0500, Tom Lane wrote: >>> Yeah, probably not. However, I don't like the idea of >>> '(3,4)'::int4range throwing an error, as it currently does, because it >>> seems to require the application to have quite a lot of knowledge of the >>> range semantics to avoid having errors sprung on it. > >> OK, then let's make '(3,4)'::int4range the empty range. (3,3) might be >> OK as well (for any range type), because at least it's consistent. > >> The one that I find strange is [3,3), but I think that needs to work for >> the range_adjacent idea to work. Seeing it as useful in the context of >> range_adjacent might mean that it's useful elsewhere, too, so now I'm >> leaning toward supporting [3,3) as an empty range. > > OK, so the thought is that the only actual error condition would be > lower bound value > upper bound value? Otherwise, if the range > specification represents an empty set, that's fine, we just normalize it > to 'empty'. Seems consistent to me. +1. Making the error condition independent from the boundary type makes it easy for callers to avoid the error conditions. And it should still catch mistakes that stem from swapping the lower and upper bound. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > More formally, let there be two arbitrary ranges > a,b,i_a,i_b > c,d,i_c,i_d > where the first two parameters are the lower respectively upper bound, and > the last two are booleans saying whether the lower respectively upper bound > is inclusive (true) or exclusive (false). > These ranges are then adjacent exactly if the range > b,c,!i_b,!i_c > is empty. I tried to implement this, and I think it has a small bug. It works as stated if we have b < c. However, if we have b == c, then we want to consider the ranges adjacent if i_b != i_c (ie, only one of them claims the common boundary point). But a singleton range is empty unless it's inclusive on both sides. So we have to have a special case when the bounds are equal. (If b > c, then of course we have to consider the two ranges in the opposite order.) Attached is a draft patch for this. It passes regression tests but I've not tried to exercise it with a canonical function that actually does something different. It's going to be a bit slower than Jeff's original, because it does not only range_cmp_bound_values but also a make_range cycle (in most cases). So I guess the question is how much we care about supporting canonical functions with non-default policies. Thoughts? regards, tom lane diff --git a/src/backend/utils/adt/rangetypes.c b/src/backend/utils/adt/rangetypes.c index 3326cb17c895273fd01c4eda5eb0d65a521d0168..890b6850cb6f5611e9afa922b85b0c64125f31bb 100644 *** a/src/backend/utils/adt/rangetypes.c --- b/src/backend/utils/adt/rangetypes.c *************** range_adjacent(PG_FUNCTION_ARGS) *** 699,704 **** --- 699,706 ---- upper2; bool empty1, empty2; + RangeType *r3; + int cmp; /* Different types should be prevented by ANYRANGE matching rules */ if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2)) *************** range_adjacent(PG_FUNCTION_ARGS) *** 714,736 **** PG_RETURN_BOOL(false); /* ! * For two ranges to be adjacent, the lower boundary of one range has to ! * match the upper boundary of the other. However, the inclusivity of ! * those two boundaries must also be different. ! * ! * The semantics for range_cmp_bounds aren't quite what we need here, so ! * we do the comparison more directly. */ ! if (lower1.inclusive != upper2.inclusive) { ! if (range_cmp_bound_values(typcache, &lower1, &upper2) == 0) ! PG_RETURN_BOOL(true); } ! ! if (upper1.inclusive != lower2.inclusive) { ! if (range_cmp_bound_values(typcache, &upper1, &lower2) == 0) ! PG_RETURN_BOOL(true); } PG_RETURN_BOOL(false); --- 716,758 ---- PG_RETURN_BOOL(false); /* ! * Given two ranges A..B and C..D, where B < C, the ranges are adjacent ! * if and only if the range B..C is empty, where inclusivity of these two ! * bounds is inverted compared to the original bounds. For discrete ! * ranges, we have to rely on the canonicalization function to normalize ! * B..C to empty if it contains no elements of the subtype. If B == C, ! * the ranges are adjacent only if these bounds have different inclusive ! * flags (i.e., exactly one of the ranges includes the common boundary). ! * And if B > C then the ranges cannot be adjacent in this order, but we ! * must consider the other order (i.e., check D <= A). */ ! cmp = range_cmp_bound_values(typcache, &upper1, &lower2); ! if (cmp < 0) { ! upper1.inclusive = !upper1.inclusive; ! upper1.lower = true; ! lower2.inclusive = !lower2.inclusive; ! lower2.lower = false; ! r3 = make_range(typcache, &upper1, &lower2, false); ! PG_RETURN_BOOL(RangeIsEmpty(r3)); } ! if (cmp == 0) { ! PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive); ! } ! cmp = range_cmp_bound_values(typcache, &upper2, &lower1); ! if (cmp < 0) ! { ! upper2.inclusive = !upper2.inclusive; ! upper2.lower = true; ! lower1.inclusive = !lower1.inclusive; ! lower1.lower = false; ! r3 = make_range(typcache, &upper2, &lower1, false); ! PG_RETURN_BOOL(RangeIsEmpty(r3)); ! } ! if (cmp == 0) ! { ! PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive); } PG_RETURN_BOOL(false);
I wrote: > Attached is a draft patch for this. It passes regression tests but I've > not tried to exercise it with a canonical function that actually does > something different. I hacked up int4range_canonical to produce []-style ranges, and confirmed that this version of range_adjacent seems to work with them. > It's going to be a bit slower than Jeff's > original, because it does not only range_cmp_bound_values but also a > make_range cycle (in most cases). So I guess the question is how much > we care about supporting canonical functions with non-default policies. > Thoughts? I did a little bit of performance testing on an x86_64 machine (Fedora 14), and found that the time to execute a clause likeWHERE int4range(1,2) -|- int4range(x, 10000000) (x being an integer Var) grows from 0.37 us to 0.56 us if we adopt the patched version of range_adjacent. With float8 ranges it grows from 0.35 us to 0.54 us. So these are noticeable penalties but they don't seem like show-stoppers. Since the alternative is to document that the apparent freedom to choose a canonicalization policy is illusory, I'm inclined to think we should change it. regards, tom lane
I wrote: > I did a little bit of performance testing on an x86_64 machine (Fedora 14), > and found that the time to execute a clause like > WHERE int4range(1,2) -|- int4range(x, 10000000) > (x being an integer Var) grows from 0.37 us to 0.56 us if we adopt the > patched version of range_adjacent. With float8 ranges it grows from > 0.35 us to 0.54 us. So these are noticeable penalties but they don't > seem like show-stoppers. Since the alternative is to document that > the apparent freedom to choose a canonicalization policy is illusory, > I'm inclined to think we should change it. It occurred to me that we can easily buy back the extra time for range types that don't have a canonical function (ie, continuous ranges). If there's no such function, it's impossible for B..C to normalize to empty when B < C, so we can skip the extra logic. The attached version is no slower than the original code for continuous ranges, and doesn't seem measurably different from my previous patch for discrete ranges. regards, tom lane *** /home/postgres/pgsql/src/backend/utils/adt/rangetypes.c Tue Nov 22 23:18:51 2011 --- new/rangetypes.c Wed Nov 23 16:29:20 2011 *************** *** 699,704 **** --- 699,706 ---- upper2; bool empty1, empty2; + RangeType *r3; + int cmp; /* Different types should be prevented by ANYRANGE matching rules */ if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2)) *************** *** 714,736 **** PG_RETURN_BOOL(false); /* ! * For two ranges to be adjacent, the lower boundary of one range has to ! * match the upper boundary of the other. However, the inclusivity of ! * those two boundaries must also be different. * ! * The semantics for range_cmp_bounds aren't quite what we need here, so ! * we do the comparison more directly. */ ! if (lower1.inclusive != upper2.inclusive) { ! if (range_cmp_bound_values(typcache, &lower1, &upper2) == 0) ! PG_RETURN_BOOL(true); } ! if (upper1.inclusive != lower2.inclusive) { ! if (range_cmp_bound_values(typcache, &upper1, &lower2) == 0) ! PG_RETURN_BOOL(true); } PG_RETURN_BOOL(false); --- 716,774 ---- PG_RETURN_BOOL(false); /* ! * Given two ranges A..B and C..D, where B < C, the ranges are adjacent ! * if and only if the range B..C is empty, where inclusivity of these two ! * bounds is inverted compared to the original bounds. For discrete ! * ranges, we have to rely on the canonicalization function to normalize ! * B..C to empty if it contains no elements of the subtype. (If there is ! * no canonicalization function, it's impossible for such a range to ! * normalize to empty, so we needn't bother to try.) ! * ! * If B == C, the ranges are adjacent only if these bounds have different ! * inclusive flags (i.e., exactly one of the ranges includes the common ! * boundary point). * ! * And if B > C then the ranges cannot be adjacent in this order, but we ! * must consider the other order (i.e., check D <= A). */ ! cmp = range_cmp_bound_values(typcache, &upper1, &lower2); ! if (cmp < 0) ! { ! /* in a continuous subtype, there are assumed to be points between */ ! if (!OidIsValid(typcache->rng_canonical_finfo.fn_oid)) ! PG_RETURN_BOOL(false); ! /* flip the inclusion flags */ ! upper1.inclusive = !upper1.inclusive; ! lower2.inclusive = !lower2.inclusive; ! /* change upper/lower labels to avoid Assert failures */ ! upper1.lower = true; ! lower2.lower = false; ! r3 = make_range(typcache, &upper1, &lower2, false); ! PG_RETURN_BOOL(RangeIsEmpty(r3)); ! } ! if (cmp == 0) { ! PG_RETURN_BOOL(upper1.inclusive != lower2.inclusive); } ! cmp = range_cmp_bound_values(typcache, &upper2, &lower1); ! if (cmp < 0) ! { ! /* in a continuous subtype, there are assumed to be points between */ ! if (!OidIsValid(typcache->rng_canonical_finfo.fn_oid)) ! PG_RETURN_BOOL(false); ! /* flip the inclusion flags */ ! upper2.inclusive = !upper2.inclusive; ! lower1.inclusive = !lower1.inclusive; ! /* change upper/lower labels to avoid Assert failures */ ! upper2.lower = true; ! lower1.lower = false; ! r3 = make_range(typcache, &upper2, &lower1, false); ! PG_RETURN_BOOL(RangeIsEmpty(r3)); ! } ! if (cmp == 0) { ! PG_RETURN_BOOL(upper2.inclusive != lower1.inclusive); } PG_RETURN_BOOL(false);