Thread: Range Types: empty ranges

Range Types: empty ranges

From
Jeff Davis
Date:
Do we want empty ranges?

The philosophy is that they are essentially the "zero" value of any
range type. Like the number zero, it allows closure over operations that
would otherwise return an error.

For instance, the number zero is useful because you can do things like: f(x) = 5x + 3;
And even if x is zero, the function is still defined, and even produces
a more "real" number like 3. Sure, when you try to divide by zero, you
have a problem, but otherwise it works.

Similarly, "intersection" of ranges is somewhat analogous to
multiplication of numbers.

I have a feeling that there are a lot of applications that might use
intersection in combination with other operators, like overlaps or
contains. If we do allow empty ranges, I think it will be seamless in
most of those situations because "overlaps" and "contains" are
well-defined for empty ranges. But if we don't allow empty ranges, I
suspect that it will cause some user surprise, because depending on the
order of operations an empty range may be created (causing an error) or
not.

The cost, of course, is that not all operations are well-defined for
empty ranges. I think those are mostly operators like those mentioned in
the other thread: ">>" (strictly right of), "<<" (strictly left of), and
"-|-" (adjacent); and perhaps "&>" and "&<". These are probably used a
little less frequently, and should probably not be used in a context
where empty ranges are permitted (if they are, it's likely a mistake and
an error should be thrown).

My feeling is that we should let the operation proceed as far as it is
well-defined, and no further; and I think that means empty ranges should
be allowed.

Thoughts? Do the benefits outweigh the costs?

Regards,Jeff Davis



Re: Range Types: empty ranges

From
Josh Berkus
Date:
On 2/11/11 10:11 AM, Jeff Davis wrote:
> Thoughts? Do the benefits outweigh the costs?

I guess I'm having trouble tying the concept of empty ranges to any
reality external to the database.

For example, what would the time range:

'('15:15:00','15:15:00')'

... represent exactly?  "A non-existant point in time which might or
might not be near 3:15 PM"?  For my personal use of ranges, I'm very
reluctant to embrace a concept in the database which can't possibly
depict something concrete.

Really, it seems like you're trying to fix NULL by separating the
concept of EMPTY from NULL.  Which is a good idea in general, I'm just
not sure that this is the way to do it.

HOWEVER, I also recognize that range types might be used for scientific
applications, in which the mathematical concepts of imaginary ranges and
empty ranges might be valid.  So I wouldn't want to prohibit this
feature for the people who need it.

BUT ... if I, in one of my applications, accidentally defined something
as having the range '('15:15:00','15:15:00')', I would *want* the
database to through an error and not accept it.

So, if we allow empty ranges of this kind, I would want a GUC for
"allow_empty_ranges".

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
Jeff Davis <pgsql@j-davis.com> wrote:
> The philosophy is that they are essentially the "zero" value of
> any range type. Like the number zero, it allows closure over
> operations that would otherwise return an error.
Well, zero has a pretty well defined and easy to understand meaning.
How many ostriches do you have in your back yard?  In my case,
zero.  That's distinct from NULL, which would mean the answer is
unknown.
> Similarly, "intersection" of ranges is somewhat analogous to
> multiplication of numbers.
Ah, so the point is that the intersection of '[13:00,15:00)' and
'[08:00,10:00)' is the empty range, not NULL -- because you *know*
they don't overlap?
That seems useful.
What seems not useful is to say that such a value is both strictly
to the right *and* strictly to the left of a range.  That seems to
fly in the face of the normal meaning of "strict".  Nor does it seem
reasonable to say that it is adjacent to a range.  It seems like it
would be useful to be able to say that if A is adjacent to B and B
is adjacent to C that the three ranges could be considered as one
contiguous range.  You'd get in trouble with such an assumption if
an empty B was considered adjacent to all ranges.  I see similar
problems with an empty B if A is strictly to the left of B and B is
strictly to the left of C, where C is strictly to the left of A.
I'm somewhat inclined to agree with Robert that such comparisons
should not throw errors.  At this point, my inclination is to say
that UNKNOWN should be the result (which I believe is equivalent to
NULL::boolean).
-Kevin


Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
Josh Berkus <josh@agliodbs.com> wrote:
> if I, in one of my applications, accidentally defined something
> as having the range '('15:15:00','15:15:00')', I would *want* the
> database to through an error and not accept it.
I can agree with that, but I think that range '[15:15:00,15:15:00)'
should be valid as a zero-length range between, for example,
'[15:00:00,15:15:00)' and '[15:15:00,15:30:00)'.
-Kevin


Re: Range Types: empty ranges

From
Robert Haas
Date:
On Fri, Feb 11, 2011 at 1:11 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Similarly, "intersection" of ranges is somewhat analogous to
> multiplication of numbers.

I had a feeling that we might be going in this direction.  It strikes
me that this case is a bit like division by zero.  It's kind of a
nuisance that dividing by zero throws an error and we COULD fix that
by making it return NULL or NaN or some new distinguished value DbZ.
But then we'd have to define what happens when you feed DbZ into every
other operation in the system, and similarly here.  If we define two
non-overlapping ranges as intersecting to NULL, or as throwing an
error, then everything else is clear after that.  I'm not sure it's
worth complicating the representation and the definitions of other
operations to cater to this case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types: empty ranges

From
Robert Haas
Date:
On Fri, Feb 11, 2011 at 1:50 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Josh Berkus <josh@agliodbs.com> wrote:
>
>> if I, in one of my applications, accidentally defined something
>> as having the range '('15:15:00','15:15:00')', I would *want* the
>> database to through an error and not accept it.
>
> I can agree with that, but I think that range '[15:15:00,15:15:00)'
> should be valid as a zero-length range between, for example,
> '[15:00:00,15:15:00)' and '[15:15:00,15:30:00)'.

How would that actually work?  I kind of agree with Josh: I'd be
inclined to make the type input function boot such constructs at the
door.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types: empty ranges

From
Jeff Davis
Date:
On Fri, 2011-02-11 at 10:28 -0800, Josh Berkus wrote:
> I guess I'm having trouble tying the concept of empty ranges to any
> reality external to the database.

That's true, but in the same sense as zero has no meaning outside of the
database.

It's really that it has nice mathematical properties coming from set
theory. Take the distributive law:
 A UNION (B INTERSECT C) = (A UNION B) INTERSECT (A UNION C)

If (B INTERSECT C) is empty, then the result should be A. With empty
ranges, that holds on both sides. Without them, it doesn't hold on the
left side.

People implicitly use this kind of logic all the time when constructing
queries. If one form works, and the other throws an error, it will cause
confusion.

> For example, what would the time range:
> 
> '('15:15:00','15:15:00')'
> 
> ... represent exactly?  "A non-existant point in time which might or
> might not be near 3:15 PM"?

That's meaningless and will throw an error. An empty range is not
anchored at any particular point, so any two empty ranges are equal.

> BUT ... if I, in one of my applications, accidentally defined something
> as having the range '('15:15:00','15:15:00')', I would *want* the
> database to through an error and not accept it.

Absolutely. That kind of input should throw an error (and does).

> So, if we allow empty ranges of this kind, I would want a GUC for
> "allow_empty_ranges".

I think that would be the least desirable option. If we don't like empty
ranges, let's prohibit them entirely.

Or, there are always check constraints...

Regards,Jeff Davis




Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
>> I think that range '[15:15:00,15:15:00)' should be valid as a
>> zero-length range between, for example, '[15:00:00,15:15:00)' and
>> '[15:15:00,15:30:00)'.
> 
> How would that actually work?  I kind of agree with Josh: I'd be
> inclined to make the type input function boot such constructs at
> the door.
It makes more sense in the context of a range of some type with a
clearly defined granularity.  Our accounting system, for example,
can assign a new range of receipt IDs for each calendar year.  If
you want a variable to represent the receipts for traffic receipts
for 2012, you might, in preparation for the upcoming year, define
something to declare the range as '[12TR000001,12TR000001)'.  When
the first receipt is assigned as 12TR000001, that is updated to
'[12TR000001,12TR000002)'.  Just as an off-the-cuff example.
Basically, with a type having well-defined granularity, a [) range
could usefully represent, "start to last used", and start out empty.
-Kevin


Re: Range Types: empty ranges

From
Robert Haas
Date:
On Fri, Feb 11, 2011 at 2:06 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2011-02-11 at 10:28 -0800, Josh Berkus wrote:
>> I guess I'm having trouble tying the concept of empty ranges to any
>> reality external to the database.
>
> That's true, but in the same sense as zero has no meaning outside of the
> database.
>
> It's really that it has nice mathematical properties coming from set
> theory. Take the distributive law:
>
>  A UNION (B INTERSECT C) = (A UNION B) INTERSECT (A UNION C)

But the basic range type isn't even closed under UNION.  It seems like
what you really need if you want this stuff to work out smoothly is a
MULTIRANGE type.  Then this problem goes away: the union or
intersection of two ranges is a multirange, and after that everything
is closed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Basically, with a type having well-defined granularity, a [) range
> could usefully represent, "start to last used", and start out
> empty.
I guess that would actually be "start to next available"....
-Kevin


Re: Range Types: empty ranges

From
Jeff Davis
Date:
On Fri, 2011-02-11 at 13:08 -0600, Kevin Grittner wrote:
> It makes more sense in the context of a range of some type with a
> clearly defined granularity.  Our accounting system, for example,
> can assign a new range of receipt IDs for each calendar year.  If
> you want a variable to represent the receipts for traffic receipts
> for 2012, you might, in preparation for the upcoming year, define
> something to declare the range as '[12TR000001,12TR000001)'.  When
> the first receipt is assigned as 12TR000001, that is updated to
> '[12TR000001,12TR000002)'.  Just as an off-the-cuff example.
>  
> Basically, with a type having well-defined granularity, a [) range
> could usefully represent, "start to last used", and start out empty.

I think this is trying to make a range into something that's not. A
range is a set of values with the additional constraint that there are
no "gaps".

Trying to incorporate a "start value" is adding extra information in
there, and it's not really a part of the same algebra. It sounds more
like a contiguous sequence with a "start value" and a "current value" to
me.

Sequences have other useful operations, like "next", and things like
"overlaps" don't really seem to make sense (at least not in a practical
way that I can tell).

Regards,Jeff Davis



Re: Range Types: empty ranges

From
"David E. Wheeler"
Date:
On Feb 11, 2011, at 10:28 AM, Josh Berkus wrote:

> So, if we allow empty ranges of this kind, I would want a GUC for
> "allow_empty_ranges".

GUC you, josh. ;-P

D


Re: Range Types: empty ranges

From
Jeff Davis
Date:
On Fri, 2011-02-11 at 13:50 -0500, Robert Haas wrote: 
> On Fri, Feb 11, 2011 at 1:11 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> > Similarly, "intersection" of ranges is somewhat analogous to
> > multiplication of numbers.
> 
> I had a feeling that we might be going in this direction.  It strikes
> me that this case is a bit like division by zero.

Except that we do happen to allow the value zero and wait 'til someone
divides by it before throwing an error. So I think that's more of a
point toward allowing empty ranges than rejecting them.

>   It's kind of a
> nuisance that dividing by zero throws an error and we COULD fix that
> by making it return NULL or NaN or some new distinguished value DbZ.

But empty ranges are actually quite well-defined, in a way similar to an
empty set.
* it can meaningfully result in a non-empty range at a later stage of computation
* it increases the number of tautologies, rather than decreasing them like NULL

I guess what I'm saying is that DbZ doesn't seem particularly useful to
carry along, while and empty range plausibly is.

> But then we'd have to define what happens when you feed DbZ into every
> other operation in the system, and similarly here.

If your point is that empty ranges need to be handled specially
sometimes, I agree. That is the semantic cost which I identified in the
original email. Are the benefits worth it?

> If we define two
> non-overlapping ranges as intersecting to NULL, or as throwing an
> error, then everything else is clear after that.

Well, there is a certain amount of localized clarity, I will agree with
that. The complexity comes when you accidentally rely on some
transformation which seems logically sound, but could result in a
transient empty range, which then throws an error.

Regards,Jeff Davis



Re: Range Types: empty ranges

From
Jeff Davis
Date:
On Fri, 2011-02-11 at 14:14 -0500, Robert Haas wrote:
> > It's really that it has nice mathematical properties coming from set
> > theory. Take the distributive law:
> >
> >  A UNION (B INTERSECT C) = (A UNION B) INTERSECT (A UNION C)
> 
> But the basic range type isn't even closed under UNION.

An excellent point. Allow me to move the target a little:
 WHERE A && B AND A && C
and: WHERE A && (B INTERSECT C)

That seems like a logically sound transformation, but if (B INTERSECT C)
is empty, it relies on the empty range for those two to be equivalent.
And that would be a runtime error, caught during testing only if you're
lucky.

Now, I agree that lack of closure on UNION exhibits many of the problems
that I am pointing out related to forbidding empty ranges. However, I'm
not sure if that means we should give up on either.

Regards,Jeff Davis



Re: Range Types: empty ranges

From
Robert Haas
Date:
On Fri, Feb 11, 2011 at 3:03 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Well, there is a certain amount of localized clarity, I will agree with
> that. The complexity comes when you accidentally rely on some
> transformation which seems logically sound, but could result in a
> transient empty range, which then throws an error.

But by this argument you also need to support discontiguous ranges, don't you?

I mean, if you want to insist that A intersect B has to still be a
legal range, what about A union B?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
Jeff Davis <pgsql@j-davis.com> wrote:
> Trying to incorporate a "start value" is adding extra information
> in there, and it's not really a part of the same algebra. It
> sounds more like a contiguous sequence with a "start value" and a
> "current value" to me.
Well, in the receipt number example there are multiple ranges in use
for each year, and ranges for multiple years.  If we get to the idea
of a multi-ranges, this would be very handy for certain types of
reports -- especially for auditors.  It's not that we can't do with
with discrete begin and end columns -- we do that now; but it seemed
a potentially beneficial use of ranges for us, if they can represent
the needed states.  People already talk about these as ranges, just
in terms of the common English understanding of the word.
Perhaps it was a mistake to get so concrete rather than conceptual
-- basically, it seems like it could be a useful concept for any
planned or scheduled range with an indeterminate end point, which
you want to "reserve" up front and record in progress until
complete.  The alternative would be that such "ranges to be" have a
parallel "planned start value" column of the same type as the range,
to be used as the start of the range once it is not empty.  Or, as
another way to put it, it seems potentially useful to me to have an
empty range which is pinned to a location, in *addition* to the
unpinned empty ranges such as would be needed to represent the
intersection of two non-overlapping ranges.
Of course, the *most* useful places for our shop to have ranges are
temporal.  Many (most?) of those are situations where you start with
a range with an unknown end and later (often years later) fill in
the end of the range based on some event which finally closes it. 
Again, two discreet dates with a null-capable end-date work, but I
can see where ranges could potentially be more powerful.
-Kevin


Re: Range Types: empty ranges

From
Jeff Davis
Date:
On Fri, 2011-02-11 at 14:19 -0600, Kevin Grittner wrote:
> Well, in the receipt number example there are multiple ranges in use
> for each year, and ranges for multiple years.  If we get to the idea
> of a multi-ranges, this would be very handy for certain types of
> reports -- especially for auditors.  It's not that we can't do with
> with discrete begin and end columns -- we do that now; but it seemed
> a potentially beneficial use of ranges for us, if they can represent
> the needed states.  People already talk about these as ranges, just
> in terms of the common English understanding of the word.

I think that might indicate that the word "range" is a little too loose.
The definition that I've been using is more like a mathematical
interval.

> Perhaps it was a mistake to get so concrete rather than conceptual
> -- basically, it seems like it could be a useful concept for any
> planned or scheduled range with an indeterminate end point, which
> you want to "reserve" up front and record in progress until
> complete.

Maybe this is a range... would it be served by: (5, INF)
or: [5, INF)
?

That's already supported, and it means "all points greater than 5".

> Of course, the *most* useful places for our shop to have ranges are
> temporal.  Many (most?) of those are situations where you start with
> a range with an unknown end and later (often years later) fill in
> the end of the range based on some event which finally closes it. 
> Again, two discreet dates with a null-capable end-date work, but I
> can see where ranges could potentially be more powerful.

Ranges support infinite boundaries, but do not support NULL (previous
discussion concluded that NULL boundaries were likely to be confusing
and served no obvious use case not handled by infinity).

Regards,Jeff Davis



Re: Range Types: empty ranges

From
Jeff Davis
Date:
On Fri, 2011-02-11 at 15:14 -0500, Robert Haas wrote:
> On Fri, Feb 11, 2011 at 3:03 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> > Well, there is a certain amount of localized clarity, I will agree with
> > that. The complexity comes when you accidentally rely on some
> > transformation which seems logically sound, but could result in a
> > transient empty range, which then throws an error.
> 
> But by this argument you also need to support discontiguous ranges, don't you?
> 
> I mean, if you want to insist that A intersect B has to still be a
> legal range, what about A union B?
> 

I responded to a similar question/point here:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg01073.php

Regards,Jeff Davis



Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
Jeff Davis <pgsql@j-davis.com> wrote:
>> Perhaps it was a mistake to get so concrete rather than
>> conceptual -- basically, it seems like it could be a useful
>> concept for any planned or scheduled range with an indeterminate
>> end point, which you want to "reserve" up front and record in
>> progress until complete.
> 
> Maybe this is a range... would it be served by:
>   (5, INF)
> or:
>   [5, INF)
> ?
> 
> That's already supported, and it means "all points greater than
> 5".
Well that would create a range of infinite size until you started
using it, at which point it would drop to tiny and start to increase
until completion, which would be kinda weird as well as error-prone.
At least for the uses I'm considering.  I'm sure we can work around
it if it isn't supported, but for our purposes, [x,x) ranges would
be a useful construct.
>> Of course, the *most* useful places for our shop to have ranges
>> are temporal.  Many (most?) of those are situations where you
>> start with a range with an unknown end and later (often years
>> later) fill in the end of the range based on some event which
>> finally closes it.  Again, two discreet dates with a null-capable
>> end-date work, but I can see where ranges could potentially be
>> more powerful.
> 
> Ranges support infinite boundaries, but do not support NULL
> (previous discussion concluded that NULL boundaries were likely to
> be confusing and served no obvious use case not handled by
> infinity).
Yeah, infinity works fine as long as you realize it's only
temporary.  I don't know whether there are any practical uses where
on the low end it doesn't mean "some particular date we don't know
or don't care enough to determine" and on the high end doesn't mean
"some particular date which we don't know or which hasn't yet been
fixed".  For the latter we have, so far, been using NULL, since we
were trying to stay portable and not all products support infinity.
We're committing to PostgreSQL now, so we could start to use
infinity for these temporarily indefinite values, and I would expect
to do so in ranges.
-Kevin


Re: Range Types: empty ranges

From
Nathan Boley
Date:
FWIW, a very informal survey of probabilists didn't yield any reason
for trying to put an order on the empty set ( unless the metric was
cardinality or other equivalence relation ).

I think the problem here is that the idea of union and intersection
forming a ring over sets is being conflated with the order relation.
Clearly, combining the two notions can be inconsistent.  However...

>> >  A UNION (B INTERSECT C) = (A UNION B) INTERSECT (A UNION C)
>>
>> But the basic range type isn't even closed under UNION.
>
> An excellent point. Allow me to move the target a little:
>
>  WHERE A && B AND A && C
> and:
>  WHERE A && (B INTERSECT C)
>
> That seems like a logically sound transformation, but if (B INTERSECT C)
> is empty, it relies on the empty range for those two to be equivalent.
>
> Now, I agree that lack of closure on UNION exhibits many of the problems
> that I am pointing out related to forbidding empty ranges. However, I'm
> not sure if that means we should give up on either.

This seems potentially very useful, because we can transform WHERE A
&& B AND A && C from a bitmap scan into WHERE A && (B INTERSECT C), a
simple index scan. In the union case ( even if we had a type that
supported disjoint intervals), I doubt we would ever make that
transformation because the index will probably still be over connected
intervals.

So, +1 for keeping it how it is ( but maybe with a better error message ).

Best,
Nathan


Re: Range Types: empty ranges

From
Martijn van Oosterhout
Date:
On Fri, Feb 11, 2011 at 10:11:45AM -0800, Jeff Davis wrote:
> The cost, of course, is that not all operations are well-defined for
> empty ranges. I think those are mostly operators like those mentioned in
> the other thread: ">>" (strictly right of), "<<" (strictly left of), and
> "-|-" (adjacent); and perhaps "&>" and "&<". These are probably used a
> little less frequently, and should probably not be used in a context
> where empty ranges are permitted (if they are, it's likely a mistake and
> an error should be thrown).

But surely this is just a matter of your definitions? If you define
"A strictly left of B" as "all points in A are less than all points in
B", then you might have a problem with "all points" of an empty range.

If you define it as "not (any points in A to the right of any points in
B)" then the answer for an empty range is well defined (namely true).

I think once you make workable definitions then empty ranges should
pose no problems. Sure, it may be that an empty range will be both to
the left and to the right of every other set. That doesn't make it
wrong.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: Range Types: empty ranges

From
Jan Wieck
Date:
On 2/11/2011 1:50 PM, Kevin Grittner wrote:
> Josh Berkus<josh@agliodbs.com>  wrote:
>
>>  if I, in one of my applications, accidentally defined something
>>  as having the range '('15:15:00','15:15:00')', I would *want* the
>>  database to through an error and not accept it.
>
> I can agree with that, but I think that range '[15:15:00,15:15:00)'
> should be valid as a zero-length range between, for example,
> '[15:00:00,15:15:00)' and '[15:15:00,15:30:00)'.

Does ['15:15:00','15:15:00') make any more sense? Doesn't this 
essentially mean
    >= '15:15:00' && < '15:15:00'

which again doesn't include a single point on the time line?


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


Re: Range Types: empty ranges

From
"Kevin Grittner"
Date:
Jan Wieck <JanWieck@Yahoo.com> wrote:
> Does ['15:15:00','15:15:00') make any more sense? Doesn't this 
> essentially mean
> 
>      >= '15:15:00' && < '15:15:00'
> 
> which again doesn't include a single point on the time line?
It defines a position in time with zero duration.
Some of the graphics programming I've done in the past was based on
a system where, at the pixel level, the coordinates referred to the
boundaries *between* the pixels.  If you were to have a number of
horizontal bars, for example, the size of which you would adjust to
represent fluctuating data, you might have an x coordinate of 20 for
the left edge, and [20,30) would paint 10 pixels.  I guess you
*could* destroy and recreate the object when the number dropped to
zero and came off it again; but the concept of [20,20) to draw zero
pixels but maintain the positional anchor can be convenient.  I see
parallel concepts for some data domains in a database.
-Kevin