Thread: Range Types and length function

Range Types and length function

From
Jeff Davis
Date:
Currently, there is no way to define a generic "length" function over
range types, which would give you the distance between the boundary
points.

It sounds simple, but the system actually needs quite a lot more
information to accomplish that:* a function that subtracts two values of the range's subtype* it needs to know the
resulttype of that function, which might not be
 
the subtype (for instance, for timestamp the difference type would be
interval)* it needs to know the "zero" value of the subtype for empty ranges* it also needs to know how to canonicalize
discreteranges for
 
meaningful results -- what's the length of [10,10]? If you write a
difference "canonical" function should the result be different? I
suppose so.

Even if the system knows all of that, we might run into problems with
the type system, because if you have a generic function: f(anyrange) -> anyelement
how would it know whether "anyelement" should be the subtype (e.g. if
"f" is the function "upper") or the difference type (e.g. if "f" is the
function "length")?

My solution to all of this is somewhat simplistic, but the best idea I
have so far:

create function length(anyrange) returns anyelement
language sql as
$$ select case when $1? then upper($1) - lower($1) else '0' end;
$$;

And then, for timestamp[tz] and date, just define specific functions for
those like:

create function length(tsrange) returns interval
language sql as
$$ select case when $1? then upper($1) - lower($1) else '0 s' end;
$$;

In other words, special case the range types where the "difference type"
is not the same as the subtype, and rely on function overloading to sort
them out.

These work for the most part, but they have a few problems:

1. It assumes that "-" really means "minus" and is defined effectively
over the subtypes.

2. It assumes that '0' is valid input for the "zero" value of the
subtype.

3. If the difference type is not the same as the subtype, and you forget
to define the special-case function, then you are bound to get a cryptic
error.

I suppose the "right" way to solve these problems would be:

1. Force users to supply the "minus" function.

2. Force users to supply the "zero" value as a constant of the same type
as the minus function's return value.

3. Check to see if the minus function's return type is different from
the subtype. If so, automatically create a new entry in the catalog for
the "length" function.

I suppose it's not out of the question to do all of that work, but it
seems like a little much just to get the generic length() function.

I don't mind leaving it as-is, and I think it's a fairly reasonable
solution. But I thought I would re-open it for discussion in case
someone has a better idea. The length() function is obviously an
important function to provide.

Regards,Jeff Davis



Re: Range Types and length function

From
Greg Stark
Date:
On Sun, Jun 26, 2011 at 8:18 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>  * it needs to know the result type of that function, which might not be
> the subtype (for instance, for timestamp the difference type would be
> interval)

What's the use case for the length() function? Is it for users to be
able to display useful information about their ranges? Or is it for
implementing things like GIST indexes?

For the latter a length function that always returns a float might be
more useful. Even if it isn't guaranteed to always be perfectly
precise, that is if ranges of similar length sometimes returned
identical values, at least it could be used for things like penalty().


--
greg


Re: Range Types and length function

From
Jeff Davis
Date:
On Sun, 2011-06-26 at 13:45 +0100, Greg Stark wrote:
> On Sun, Jun 26, 2011 at 8:18 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> >  * it needs to know the result type of that function, which might not be
> > the subtype (for instance, for timestamp the difference type would be
> > interval)
> 
> What's the use case for the length() function? Is it for users to be
> able to display useful information about their ranges? Or is it for
> implementing things like GIST indexes?

Here I was talking about something for logical use, not GiST. It's
pretty common to want to know how long a range is.

> For the latter a length function that always returns a float might be
> more useful. Even if it isn't guaranteed to always be perfectly
> precise, that is if ranges of similar length sometimes returned
> identical values, at least it could be used for things like penalty().

I already have a function like that. It's actually a function that takes
the subtype and returns a float, and the GiST code does the subtraction.

But you're right, I could have a length function that always returns a
float instead, and that would do the job. Do you see an advantage?

If I had a length function that returned the subtype, I wouldn't need
that. Except for those pesky types like timestamp -- because then, even
if I had a length() function, I'd also need a total order on the
"interval" type.

Regards,Jeff Davis



Re: Range Types and length function

From
Florian Pflug
Date:
On Jun26, 2011, at 09:18 , Jeff Davis wrote:
> Currently, there is no way to define a generic "length" function over
> range types, which would give you the distance between the boundary
> points.

I actually wouldn't expect there to be one. From what I gathered
during the last discussion, the ideal behind range types is that they
model sets of the form {x in T | a <= x < b} for arbitrary types
T, with the only requirement being that T be ordered. To compute
a length, you additionally need either an algebraic structure on
T which defines an operation "minus", or some metric which defines
distance(a,b). Both are *much* stronger concepts than simply being
ordered. The problems you outline below seem to me to all root in
this discrepancy.

Strings are a nice example of an ordered type on which no "intuitive"
definition of either "s1 - s2" or "distance(s1,s2)" exists.

> I suppose the "right" way to solve these problems would be:
> 
> 1. Force users to supply the "minus" function.
> 
> 2. Force users to supply the "zero" value as a constant of the same type
> as the minus function's return value.
> 
> 3. Check to see if the minus function's return type is different from
> the subtype. If so, automatically create a new entry in the catalog for
> the "length" function.
> 
> I suppose it's not out of the question to do all of that work, but it
> seems like a little much just to get the generic length() function.

That sounds like a poor man's version of type interfaces - i.e. of
a general-purpose way of having various algebraic structures defined
on a type. Having real type interface would be cool, but I don't think
that ranges should introduce a simplistic version of them.

> I don't mind leaving it as-is, and I think it's a fairly reasonable
> solution. But I thought I would re-open it for discussion in case
> someone has a better idea.

I suggest to simply provide no length() function at all. It's beyond
the realm of the mental model behind range types, and providing some
ad-hoc definition is IMHO bound to create more confusion than it'll
help. Especially since "upper(range) - lower(range)" isn't that much
longer to write than "length(range)" anyway.

> The length() function is obviously an
> important function to provide.


I'd say it isn't, but maybe I'm missing some use-case that you have
in mind.

best regards,
Florian Pflug



Re: Range Types and length function

From
Jeff Davis
Date:
On Mon, 2011-06-27 at 00:37 +0200, Florian Pflug wrote:
> I actually wouldn't expect there to be one. From what I gathered
> during the last discussion, the ideal behind range types is that they
> model sets of the form {x in T | a <= x < b} for arbitrary types
> T, with the only requirement being that T be ordered. To compute
> a length, you additionally need either an algebraic structure on
> T which defines an operation "minus", or some metric which defines
> distance(a,b). Both are *much* stronger concepts than simply being
> ordered. The problems you outline below seem to me to all root in
> this discrepancy.

I agree with you here. It does seem like supporting length() increases
the the complexity of range types quite a bit.

> Strings are a nice example of an ordered type on which no "intuitive"
> definition of either "s1 - s2" or "distance(s1,s2)" exists.

Another good point. There's no logical "length()" function at all for a
text range.

> > The length() function is obviously an
> > important function to provide.
> 
> 
> I'd say it isn't, but maybe I'm missing some use-case that you have
> in mind.

The reason I said that is because, if making only a single range type
for, say, timestamptz, I would make a length() function without even
thinking about it.

There are a few types of queries where that kind of thing is useful,
like billing based on the amount of time some resource is allocated to
you.

But I think you're right, it shouldn't be the responsibility of range
types. Perhaps I should leave length() as some inlinable SQL functions
like I mentioned, or perhaps I should remove them completely.

Regards,Jeff Davis



Re: Range Types and length function

From
Florian Pflug
Date:
On Jun27, 2011, at 03:12 , Jeff Davis wrote:
> But I think you're right, it shouldn't be the responsibility of range
> types. Perhaps I should leave length() as some inlinable SQL functions
> like I mentioned, or perhaps I should remove them completely.

Does the current definition of length(range), i.e. upper(range) - lower(range)
deal correctly with open vs. closed ranges and unbounded ranges? I'm thinking
that it probably doesn't - what would be the results of length('[0,1]'::intrange) -- Should be 2
length('[0,1)'::intrange)-- Should be 1 length('[0,inf]'::intrange) -- Should be infinity, but ints can't
                represent that, can't they?
 

If it cannot be easily made to support these cases, than I vote for
removing it all together.

best regards,
Florian Pflug



Re: Range Types and length function

From
Jeff Davis
Date:
On Mon, 2011-06-27 at 12:25 +0200, Florian Pflug wrote:
> Does the current definition of length(range), i.e.
>   upper(range) - lower(range)
> deal correctly with open vs. closed ranges and unbounded ranges? I'm thinking
> that it probably doesn't - what would be the results of
>   length('[0,1]'::intrange) -- Should be 2
>   length('[0,1)'::intrange) -- Should be 1

I alluded to this problem in an earlier email.

I think this would need to be handled by the "canonical" function. If
the canonical function is specified to return values in [) or (] form,
then we'd get the behavior above.

However, it's a little strange, because for discrete ranges you probably
want cardinality, not length. I don't have a clear idea on exactly what
behavior users will expect in this case, which is a pretty good argument
to leave length() out.

>   length('[0,inf]'::intrange) -- Should be infinity, but ints can't
>                                  represent that, can't they?

That would throw an exception currently, for exactly the reason you
mention.

> If it cannot be easily made to support these cases, than I vote for
> removing it all together.

I now agree. I think you've brought up some good reasons for that. If
users write upper(r)-lower(r), then they know what the semantics will
be; or they can easily write their own length() function (perhaps
specific to a range type).

Regards,Jeff Davis