Thread: Range Types, discrete and/or continuous

Range Types, discrete and/or continuous

From
Jeff Davis
Date:
I'd like to open the discussion for Range Types again. This is a fairly
long email because several issues are intertwined, and I don't think
they can be neatly pulled apart. Previous discussions:
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg01162.php
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01126.php

Wiki page:
 http://wiki.postgresql.org/wiki/RangeTypes

Last development cycle, one of the questions that was unresolved was
whether to handle ranges like a discrete set (that is, [1,5) = [1,4] )
or continuous or both.

I think that discrete ranges are required. For instance, day range and
IP address ranges are two examples where treating them continuously
would clearly cause confusion. "Monday until Thursday" is the same as
"Monday through Wednesday," and it would be dangerous to treat them as
different values.

Timestamp ranges are a little trickier because the resolution is much
finer. There's a certain intuition for either representation:* We could look at them as continuous, and say that the
resolutionis
 
merely an implementation detail.* We could look at them as a discrete range, saying that it just
happens to have a very fine granularity.

I don't think intuition helps us much here, because different people
have different intuitions for different problems. The second seems more
mathematically sound to me, because the resolution will ultimately
affect the semantics at some point (whether it's seconds or days), so it
can't be purely an implementation detail.

For instance, let's say you have two ranges:
R1 = [ 2009-01-01 01:00:00, 2009-01-01 01:00:10 ]R2 = [ 2009-01-01 01:00:00, 2009-01-01 01:00:10.000001 )

If we treat those as discrete, then R1 = R2, R1 contains R2, R2 contains
R1, and R2 - R1 = R1 - R2 = empty. However, if we treat those as
continuous, then we get a contradiction:
R2 contains R1R1 does not contain R2R2 - R1 = ( 2009-01-01 01:00:10, 2009-01-01 01:00:10.000001 ) = empty?

So clearly, the granularity can make a difference in the semantics. It
can only be treated as an implementation detail if we say "nobody will
ever have timestamps that close," which seems pretty flimsy. I think
everyone would agree that seconds are plenty large enough that conflicts
are likely, and there are only a million microseconds per second. It's
easy to imagine that such cases might appear if you are using interval
math or if you are collecting data from multiple inputs constantly
(network log data, sensor networks, etc).

A couple problems though: one is float types. Technically they can be
treated as discrete, but the granularity is variable, which may pose
some implementation awkwardness (for instance, trying to make use of
division or multiplication for granules won't work). I suppose we can
write off float4/8 because the users should expect some fuzziness, and
we can treat them as continuous. But if we want floating-point
timestamps to be supported by range types at all, do we treat those as
continuous while integer timestamps are discrete? Seems odd.

Also, there's a good case for continuous ranges for types like NUMERIC,
but still continuous ranges don't seem quite as important overall. One
way to make NUMERIC ranges discrete is by specifying a granule size; and
that has the added benefit that you can use that for other types as well
(like timestamp ranges where you only care about second granularity).
But then there are problems, like someone trying to say "float4 range
with granularity 0.1", which is asking for trouble. I suppose we could
say again that people using float4 are expecting some fuzziness.

The way I see it, there are three approaches to resolving these issues:

1. Support only discrete ranges, possibly allowing the specification of
the granule.
2. Support two different ways to specify range types: one for discrete
and one for continuous.
3. Combine the discrete and continuous interfaces by making the API a
higher level and forcing the type definition to make all of the
decisions. I think it's possible to do this in such a way that postgres
doesn't really care whether a type is continuous or discrete, based on
some ideas from Nathan Boley. This is "Approach 2" on wiki page.

#1 allows for #2 to happen later, so I don't think it makes sense to
choose #2 at this stage. However, #1 may preclude doing #3 later.

Right now I'm leaning toward #3. I like this approach quite a lot, and
it only has two real disadvantages:
 * Harder to do sanity checks at the time of definition. * No easy way to declare a range type that's slightly
differentfrom
 
an existing type. For instance, you can't just specify a different
granule with simple syntax, you'd need to handle that in the functions
used to define the range type.

I don't think either of those things are a major problem. Defining a new
range type requires a bit of thought to get right, and I don't think we
can come up with one interface that forces users to get it right. So,
it's reasonable to say that defining a new range type that's not
built-in should be an extension, rather than just a little DDL. The
important part of this feature is that postgres will know what types are
range types and how to manipulate them.

Also, if we start with #3, we might be able to add some syntax sugar on
top to make it easy to define new range types with simple DDL in some
simple situations.

Thoughts?

Regards,Jeff Davis






Re: Range Types, discrete and/or continuous

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Last development cycle, one of the questions that was unresolved was
> whether to handle ranges like a discrete set (that is, [1,5) = [1,4] )
> or continuous or both.

Put me in the camp that says you need both.  I really seriously dislike
the idea of representing [1, 2) as [1, 2-epsilon], mainly because there
is often no portable value for epsilon.  Dump-and-restore would be quite
hazardous.

> If we treat those as discrete, then R1 = R2, R1 contains R2, R2 contains
> R1, and R2 - R1 = R1 - R2 = empty. However, if we treat those as
> continuous, then we get a contradiction:
>  R2 contains R1
>  R1 does not contain R2
>  R2 - R1 = ( 2009-01-01 01:00:10, 2009-01-01 01:00:10.000001 ) = empty?

This is a circular argument: your conclusion that there's a
contradiction in the concept of continuous ranges depends on the
assumption that the datatype is discrete; and with such an assumption
*of course* you can get a contradiction.

But the real problem is that if the user wants to think in terms of
continuous ranges, the only way that he can convert those to discrete
ranges is to assume an epsilon for the datatype, and he shouldn't be
forced to do that; not even if the datatype does have a well-defined
epsilon at the implementation level, which several of ours don't.
        regards, tom lane


Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Sun, Oct 24, 2010 at 6:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> Last development cycle, one of the questions that was unresolved was
>> whether to handle ranges like a discrete set (that is, [1,5) = [1,4] )
>> or continuous or both.
>
> Put me in the camp that says you need both.  I really seriously dislike
> the idea of representing [1, 2) as [1, 2-epsilon], mainly because there
> is often no portable value for epsilon.  Dump-and-restore would be quite
> hazardous.

+1.  I think the right way to do this is - in general, a range should
be stored as a start value, a stop value, and 2 extra bits indicating
whether each end is open or closed.  (And perhaps a few bits beyond
that if you want to support things like (4, inf) over a base type that
has no infinite value, which I'm not sure about.)  Then you can
optionally allow a "convert-to-closed" operation, which will convert a
[3, 5) over int to [3, 4], but be undefined for float/timestamp
ranges.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
David Fetter
Date:
On Sun, Oct 24, 2010 at 06:59:34PM -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > Last development cycle, one of the questions that was unresolved
> > was whether to handle ranges like a discrete set (that is, [1,5) =
> > [1,4] ) or continuous or both.
> 
> Put me in the camp that says you need both.  I really seriously
> dislike the idea of representing [1, 2) as [1, 2-epsilon], mainly
> because there is often no portable value for epsilon.
> Dump-and-restore would be quite hazardous.

It wouldn't be stored as (1, 2-epsilon).  It would be stored more like
(1, 2, closed, open).

> > If we treat those as discrete, then R1 = R2, R1 contains R2, R2 contains
> > R1, and R2 - R1 = R1 - R2 = empty. However, if we treat those as
> > continuous, then we get a contradiction:
> >  R2 contains R1
> >  R1 does not contain R2
> >  R2 - R1 = ( 2009-01-01 01:00:10, 2009-01-01 01:00:10.000001 ) = empty?
> 
> This is a circular argument: your conclusion that there's a
> contradiction in the concept of continuous ranges depends on the
> assumption that the datatype is discrete; and with such an
> assumption *of course* you can get a contradiction.

There's a reason all the temporal database theory is done with
discrete time, even though physics, etc., have been known to model
time as continuous.

If you have a coherent, worked-out theory of continuous ranges, please
feel free to develop and publish it just as Snodgrass, et al., have
done with discrete ranges, but please *don't* feel free to assume that
you can just wave a magic wand and make continuous time ranges "just
work" because it pleases you aesthetically.

The burden of proof of that would be on you to show that the theory so
far developed extends into continuous ranges, then back into floating
point.

You would need to start with the first part.  I don't think it's fair
to make the project however long it takes you to explore this,
especially if it turns out that continuous ranges *don't* work in some
way that's congruent with discrete ranges.

> But the real problem is that if the user wants to think in terms of
> continuous ranges, the only way that he can convert those to
> discrete ranges is to assume an epsilon for the datatype, and he
> shouldn't be forced to do that; not even if the datatype does have a
> well-defined epsilon at the implementation level, which several of
> ours don't.

They're all well defined, but not uniform.  This is true even in the
case of integer timestamps.  We don't actually have a real "real"
type, despite the fact that "real" is synonymous with some kind of
floating point thingy.

Let's start with discrete range types, and if it ever turns out that
we actually want some kind of model of continuous range types, we can
do the basic research that would involve, etc., later.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Range Types, discrete and/or continuous

From
Heikki Linnakangas
Date:
On 25.10.2010 01:59, Tom Lane wrote:
> Jeff Davis<pgsql@j-davis.com>  writes:
>> If we treat those as discrete, then R1 = R2, R1 contains R2, R2 contains
>> R1, and R2 - R1 = R1 - R2 = empty. However, if we treat those as
>> continuous, then we get a contradiction:
>>   R2 contains R1
>>   R1 does not contain R2
>>   R2 - R1 = ( 2009-01-01 01:00:10, 2009-01-01 01:00:10.000001 ) = empty?
>
> This is a circular argument: your conclusion that there's a
> contradiction in the concept of continuous ranges depends on the
> assumption that the datatype is discrete; and with such an assumption
> *of course* you can get a contradiction.

Let's open that up a bit:
>> R2 - R1 = ( 2009-01-01 01:00:10, 2009-01-01 01:00:10.000001 )

Correct.
>> ( 2009-01-01 01:00:10, 2009-01-01 01:00:10.000001 ) = empty?

No. The problem here is the unpack operator, ie. getting all discrete 
points within a range. It depends on the discreteness.

I'm not sure what the ramifications of that are. It means that 
PACK(UNPACK(r)) != r, and I believe many of the other operators are 
defined in terms of pack/unpack, even though there's more practical 
implementations of them. Can we get away without pack/unpack? Can we 
define all the range operations without them?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Range Types, discrete and/or continuous

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Sun, Oct 24, 2010 at 06:59:34PM -0400, Tom Lane wrote:
>> Put me in the camp that says you need both.  I really seriously
>> dislike the idea of representing [1, 2) as [1, 2-epsilon], mainly
>> because there is often no portable value for epsilon.
>> Dump-and-restore would be quite hazardous.

> It wouldn't be stored as (1, 2-epsilon).  It would be stored more like
> (1, 2, closed, open).

Fine ...

> If you have a coherent, worked-out theory of continuous ranges, please
> feel free to develop and publish it just as Snodgrass, et al., have
> done with discrete ranges, but please *don't* feel free to assume that
> you can just wave a magic wand and make continuous time ranges "just
> work" because it pleases you aesthetically.

That is FUD, and nothing more.  If you know a concrete reason why
Postgres shouldn't provide both closed and open ranges, you need to
explain it, not claim that there might be a reason someplace and it's
someone else's problem to prove your point for you.

I don't have any problem with specific operations failing for open-ended
ranges, if there isn't a meaningful result for the case; but that
doesn't lead me to the conclusion that every operation is meaningless
for open-ended ranges.

>> But the real problem is that if the user wants to think in terms of
>> continuous ranges, the only way that he can convert those to
>> discrete ranges is to assume an epsilon for the datatype, and he
>> shouldn't be forced to do that; not even if the datatype does have a
>> well-defined epsilon at the implementation level, which several of
>> ours don't..

> They're all well defined, but not uniform.

And that's not even FUD, it's simply wrong.  Even if you're prepared to
claim that users should understand the precise behavior of their local
floating-point type, what about NUMERIC?
        regards, tom lane


Re: Range Types, discrete and/or continuous

From
David Fetter
Date:
On Mon, Oct 25, 2010 at 10:21:49AM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > On Sun, Oct 24, 2010 at 06:59:34PM -0400, Tom Lane wrote:
> >> Put me in the camp that says you need both.  I really seriously
> >> dislike the idea of representing [1, 2) as [1, 2-epsilon], mainly
> >> because there is often no portable value for epsilon.
> >> Dump-and-restore would be quite hazardous.
> 
> > It wouldn't be stored as (1, 2-epsilon).  It would be stored more
> > like (1, 2, closed, open).
> 
> Fine ...
> 
> > If you have a coherent, worked-out theory of continuous ranges,
> > please feel free to develop and publish it just as Snodgrass, et
> > al., have done with discrete ranges, but please *don't* feel free
> > to assume that you can just wave a magic wand and make continuous
> > time ranges "just work" because it pleases you aesthetically.
> 
> That is FUD, and nothing more.  If you know a concrete reason why
> Postgres shouldn't provide both closed and open ranges, you need to
> explain it, not claim that there might be a reason someplace and
> it's someone else's problem to prove your point for you.

That you're confusing the open/closed ranges with discrete/continuous
tells me that it's *you* who doesn't understand the issues at hand.

If you can present or develop a coherent theory of continuous ranges,
that's great.  If you can present or develop a theory that merges that
one with discrete ranges, that's great too.  Until then, let's get the
discrete ranges going and disallow ones based on the continuum, i.e.
floats.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Sun, 2010-10-24 at 18:59 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > Last development cycle, one of the questions that was unresolved was
> > whether to handle ranges like a discrete set (that is, [1,5) = [1,4] )
> > or continuous or both.
> 
> Put me in the camp that says you need both.  I really seriously dislike
> the idea of representing [1, 2) as [1, 2-epsilon], mainly because there
> is often no portable value for epsilon.  Dump-and-restore would be quite
> hazardous.
> 

OK. I tried to present a couple approaches for achieving that. To
summarize:

The most obvious way would be different code paths and DDL options that
let postgresql know whether it's continuous or discrete. That may make
it easier to create new range types with just DDL and without defining
any low-level functions, and postgresql may be able to take care of
representational issues.

Another way, suggested by Nathan Boley, is to require the type
definition to do a lot of work and define its own representation that's
opaque to postgres. Then, postgres would ask for information through
accessors like min (null if open at beginning), max (null if open at
end), upper bound, lower bound, and flags (to indicate null or infinite
boundaries). This requires more work to define a new range type, and it
certainly couldn't be done with DDL only. However, it seems to allow
discrete and continuous ranges to work together more seamlessly and
share more code. I am leaning toward this approach.

Regards,Jeff Davis



Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Mon, Oct 25, 2010 at 12:51 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sun, 2010-10-24 at 18:59 -0400, Tom Lane wrote:
>> Jeff Davis <pgsql@j-davis.com> writes:
>> > Last development cycle, one of the questions that was unresolved was
>> > whether to handle ranges like a discrete set (that is, [1,5) = [1,4] )
>> > or continuous or both.
>>
>> Put me in the camp that says you need both.  I really seriously dislike
>> the idea of representing [1, 2) as [1, 2-epsilon], mainly because there
>> is often no portable value for epsilon.  Dump-and-restore would be quite
>> hazardous.
>>
>
> OK. I tried to present a couple approaches for achieving that. To
> summarize:
>
> The most obvious way would be different code paths and DDL options that
> let postgresql know whether it's continuous or discrete. That may make
> it easier to create new range types with just DDL and without defining
> any low-level functions, and postgresql may be able to take care of
> representational issues.
>
> Another way, suggested by Nathan Boley, is to require the type
> definition to do a lot of work and define its own representation that's
> opaque to postgres. Then, postgres would ask for information through
> accessors like min (null if open at beginning), max (null if open at
> end), upper bound, lower bound, and flags (to indicate null or infinite
> boundaries). This requires more work to define a new range type, and it
> certainly couldn't be done with DDL only. However, it seems to allow
> discrete and continuous ranges to work together more seamlessly and
> share more code. I am leaning toward this approach.

I'm still confused.  It seems to me (and maybe I'm full of it) that
the distinction between continuous ranges and discrete ranges is
pretty minor.  Suppose you have continuous ranges done, and working.
The only thing you need to add for discrete ranges (I think) is a
canonicalization function that converts a range with one or both ends
open to a range with both ends closed.  Then you just apply this
canonicalization functions to every value supplied by the user before
doing anything else with it.  Poof, discrete ranges!  What am I
missing?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
"Kevin Grittner"
Date:
Jeff Davis <pgsql@j-davis.com> wrote:
> Last development cycle, one of the questions that was unresolved
> was whether to handle ranges like a discrete set (that is, [1,5) =
> [1,4] ) or continuous or both.
> 
> I think that discrete ranges are required. For instance, day range
> and IP address ranges are two examples where treating them
> continuously would clearly cause confusion. "Monday until
> Thursday" is the same as "Monday through Wednesday," and it would
> be dangerous to treat them as different values.
All of the use cases I've been able to imagine as useful for our
shop would work fine with discrete ranges.  Continuous ranges seem
as though they would be more clumsy and dangerous for these use
cases.  Having not seen or imagined a practical use case for
continuous ranges, I'm indifferent to support for them.
It would be very useful to be able to specify a granularity -- for
example timestamps with a five minute granularity would be useful
for scheduling appointments. In some cases the granularity might be
inferred -- if we have a domain defined as numeric(13,2), it would
be nice if the default granularity was 0.01::numeric.
-Kevin


Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-10-25 at 12:20 -0500, Kevin Grittner wrote:
> It would be very useful to be able to specify a granularity -- for
> example timestamps with a five minute granularity would be useful
> for scheduling appointments. In some cases the granularity might be
> inferred -- if we have a domain defined as numeric(13,2), it would
> be nice if the default granularity was 0.01::numeric.

I don't think typmod really helps us much. It's more a property of the
column than the type.

To specify different granularities, I don't think we can avoid
specifying new types with their own entries in pg_type.

Regards,Jeff Davis



Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Mon, Oct 25, 2010 at 2:01 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2010-10-25 at 12:20 -0500, Kevin Grittner wrote:
>> It would be very useful to be able to specify a granularity -- for
>> example timestamps with a five minute granularity would be useful
>> for scheduling appointments. In some cases the granularity might be
>> inferred -- if we have a domain defined as numeric(13,2), it would
>> be nice if the default granularity was 0.01::numeric.
>
> I don't think typmod really helps us much. It's more a property of the
> column than the type.
>
> To specify different granularities, I don't think we can avoid
> specifying new types with their own entries in pg_type.

Why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-10-25 at 13:00 -0400, Robert Haas wrote:
> I'm still confused.  It seems to me (and maybe I'm full of it) that
> the distinction between continuous ranges and discrete ranges is
> pretty minor.  Suppose you have continuous ranges done, and working.
> The only thing you need to add for discrete ranges (I think) is a
> canonicalization function that converts a range with one or both ends
> open to a range with both ends closed.  Then you just apply this
> canonicalization functions to every value supplied by the user before
> doing anything else with it.  Poof, discrete ranges!  What am I
> missing?

That's not too far from what I'm suggesting. On the wiki page, under
"approach 2" you'll see that one of the functions needed is a
"constructor" which would put it into a canonical form (if applicable)
and construct the representation.

I think the difference is that I assumed that the UDFs used for the type
definition would handle both canonicalization and representation. I
think what you're suggesting is that postgres could handle
representation, and just always call the UDF to put it in canonical form
first. That might make it easier to define new types, but might limit
any representation optimizations that certain range types may be able to
exploit. Either approach seems reasonable to me.

I know the wiki page isn't very formal about the approaches yet, but as
we start to coalesce around a basic idea I'll write it up in more
detail.

Regards,Jeff Davis



Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-10-25 at 14:11 -0400, Robert Haas wrote:
> On Mon, Oct 25, 2010 at 2:01 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Mon, 2010-10-25 at 12:20 -0500, Kevin Grittner wrote:
> >> It would be very useful to be able to specify a granularity -- for
> >> example timestamps with a five minute granularity would be useful
> >> for scheduling appointments. In some cases the granularity might be
> >> inferred -- if we have a domain defined as numeric(13,2), it would
> >> be nice if the default granularity was 0.01::numeric.
> >
> > I don't think typmod really helps us much. It's more a property of the
> > column than the type.
> >
> > To specify different granularities, I don't think we can avoid
> > specifying new types with their own entries in pg_type.
> 
> Why?

Because typmod doesn't survive through a function call. Even if it did,
I don't think typmod has a real answer for type promotion, implicit
casting etc.

If we lose the typmod (and therefore the granularity), then a function
like "adjacent" is difficult to answer if we use a closed-closed
canonical representation (as you suggested); and if we use a closed-open
representation then it's difficult to answer a question like whether a
range contains a specific timestamp.

Can I turn the question around and ask how you intend to make it work
without new entries in pg_type?

Regards,Jeff Davis



Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Mon, Oct 25, 2010 at 2:27 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2010-10-25 at 13:00 -0400, Robert Haas wrote:
>> I'm still confused.  It seems to me (and maybe I'm full of it) that
>> the distinction between continuous ranges and discrete ranges is
>> pretty minor.  Suppose you have continuous ranges done, and working.
>> The only thing you need to add for discrete ranges (I think) is a
>> canonicalization function that converts a range with one or both ends
>> open to a range with both ends closed.  Then you just apply this
>> canonicalization functions to every value supplied by the user before
>> doing anything else with it.  Poof, discrete ranges!  What am I
>> missing?
>
> That's not too far from what I'm suggesting. On the wiki page, under
> "approach 2" you'll see that one of the functions needed is a
> "constructor" which would put it into a canonical form (if applicable)
> and construct the representation.
>
> I think the difference is that I assumed that the UDFs used for the type
> definition would handle both canonicalization and representation. I
> think what you're suggesting is that postgres could handle
> representation, and just always call the UDF to put it in canonical form
> first. That might make it easier to define new types, but might limit
> any representation optimizations that certain range types may be able to
> exploit. Either approach seems reasonable to me.

<reads wiki page>

Hmm.  Do you have some concrete examples of cases where a range type
might want to do some representational optimization?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Mon, Oct 25, 2010 at 2:44 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2010-10-25 at 14:11 -0400, Robert Haas wrote:
>> On Mon, Oct 25, 2010 at 2:01 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> > On Mon, 2010-10-25 at 12:20 -0500, Kevin Grittner wrote:
>> >> It would be very useful to be able to specify a granularity -- for
>> >> example timestamps with a five minute granularity would be useful
>> >> for scheduling appointments. In some cases the granularity might be
>> >> inferred -- if we have a domain defined as numeric(13,2), it would
>> >> be nice if the default granularity was 0.01::numeric.
>> >
>> > I don't think typmod really helps us much. It's more a property of the
>> > column than the type.
>> >
>> > To specify different granularities, I don't think we can avoid
>> > specifying new types with their own entries in pg_type.
>>
>> Why?
>
> Because typmod doesn't survive through a function call. Even if it did,
> I don't think typmod has a real answer for type promotion, implicit
> casting etc.
>
> If we lose the typmod (and therefore the granularity), then a function
> like "adjacent" is difficult to answer if we use a closed-closed
> canonical representation (as you suggested); and if we use a closed-open
> representation then it's difficult to answer a question like whether a
> range contains a specific timestamp.
>
> Can I turn the question around and ask how you intend to make it work
> without new entries in pg_type?

Oh, maybe I'm confused.  Are you saying you'd need multiple copies of
the base type, or multiple range types based on a single base type?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-10-25 at 18:28 -0400, Robert Haas wrote:
> Oh, maybe I'm confused.  Are you saying you'd need multiple copies of
> the base type, or multiple range types based on a single base type?

The latter. That is, if you want a timestamp range with granularity 1
second, and a timestamp range with granularity 1 minute, I think those
need to have their own entries in pg_type.

The way I look at it, typmod just doesn't help at all. It's useful
perhaps for constraining what a column can hold (like a different kind
of CHECK constraint), or perhaps for display purposes. But typmod isn't
really a part of the type system itself.

There may be some utility in a pseudo-type like "anyrange", but I think
that's a separate issue.

Regards,Jeff Davis




Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-10-25 at 18:03 -0400, Robert Haas wrote:
> Hmm.  Do you have some concrete examples of cases where a range type
> might want to do some representational optimization?

Let's say for instance you want to keep an timestamp range in 16 bytes.
You could have an 8-byte timestamp, a 7-byte integer that represents the
offset from that timestamp in microseconds, and one byte for flags (e.g.
NULL or infinite boundaries, etc.). I'm not sure that you can make that
representation work in a generic way.

It's not critical, and perhaps not even desirable. But it crossed my
mind because alignment might make a 17-byte type look like 24 bytes,
which seems pretty wasteful to me.

Regards,Jeff Davis




Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Mon, Oct 25, 2010 at 8:01 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2010-10-25 at 18:28 -0400, Robert Haas wrote:
>> Oh, maybe I'm confused.  Are you saying you'd need multiple copies of
>> the base type, or multiple range types based on a single base type?
>
> The latter. That is, if you want a timestamp range with granularity 1
> second, and a timestamp range with granularity 1 minute, I think those
> need to have their own entries in pg_type.

OK, I agree with that.  Sorry.

> The way I look at it, typmod just doesn't help at all. It's useful
> perhaps for constraining what a column can hold (like a different kind
> of CHECK constraint), or perhaps for display purposes. But typmod isn't
> really a part of the type system itself.

I view that as a problem in need of fixing, but that's another discussion.

> There may be some utility in a pseudo-type like "anyrange", but I think
> that's a separate issue.

Yeah, interesting idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Mon, Oct 25, 2010 at 8:13 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2010-10-25 at 18:03 -0400, Robert Haas wrote:
>> Hmm.  Do you have some concrete examples of cases where a range type
>> might want to do some representational optimization?
>
> Let's say for instance you want to keep an timestamp range in 16 bytes.
> You could have an 8-byte timestamp, a 7-byte integer that represents the
> offset from that timestamp in microseconds, and one byte for flags (e.g.
> NULL or infinite boundaries, etc.). I'm not sure that you can make that
> representation work in a generic way.

See, that gets complicated, because now you're restricting the range
of values that can be expressed by the range type to something less
than the natural range of the data type.  I am not sure the value of
supporting that is sufficient to justify the amount of extra code that
will be required to make it work.  I'd say for a first version, nail
down the representation.  Perhaps in a future version you could have
compress/uncompress methods sort of like GIST, but for a first cut it
seems highly desirable to be able to say something like:

CREATE TYPE int_range AS RANGE (BASETYPE = int);

I hear you complaining that we don't know the values you've called
dtype, cmpfunc, addfunc, and subfunc.  It seems pretty reasonable to
extract cmpfunc, if unspecified, from the default btree opclass for
the data type.  For the rest, I'm inclined to propose that we support
something like:

ALTER TYPE timestamptz   ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),   ADD INTERFACE
decrementtimestamp_mi_interval(timestamptz, interval); 

or

ALTER TYPE integer  ADD INTERFACE increment int4pl (integer, integer),  ADD INTERFACE decrement int4mi (integer,
integer), ADD VALUE addative_unit 1::integer,  ADD VALUE addative_identity 0::integer; 

IIRC, Window functions need this information too, so there's value in
associating it with the base type, even if we want to allow users to
override it when creating ranges.

> It's not critical, and perhaps not even desirable. But it crossed my
> mind because alignment might make a 17-byte type look like 24 bytes,
> which seems pretty wasteful to me.

There's no requirement that you set typalign='d'; it's just that if
you don't the entries may not be aligned.  But it may be that that's a
small price to pay for shrinking the on-disk footprint.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Hitoshi Harada
Date:
2010/10/26 Robert Haas <robertmhaas@gmail.com>:
> On Mon, Oct 25, 2010 at 8:13 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Mon, 2010-10-25 at 18:03 -0400, Robert Haas wrote:
>>> Hmm.  Do you have some concrete examples of cases where a range type
>>> might want to do some representational optimization?
>>
>> Let's say for instance you want to keep an timestamp range in 16 bytes.
>> You could have an 8-byte timestamp, a 7-byte integer that represents the
>> offset from that timestamp in microseconds, and one byte for flags (e.g.
>> NULL or infinite boundaries, etc.). I'm not sure that you can make that
>> representation work in a generic way.
>
> See, that gets complicated, because now you're restricting the range
> of values that can be expressed by the range type to something less
> than the natural range of the data type.  I am not sure the value of
> supporting that is sufficient to justify the amount of extra code that
> will be required to make it work.  I'd say for a first version, nail
> down the representation.  Perhaps in a future version you could have
> compress/uncompress methods sort of like GIST, but for a first cut it
> seems highly desirable to be able to say something like:
>
> CREATE TYPE int_range AS RANGE (BASETYPE = int);
>
> I hear you complaining that we don't know the values you've called
> dtype, cmpfunc, addfunc, and subfunc.  It seems pretty reasonable to
> extract cmpfunc, if unspecified, from the default btree opclass for
> the data type.  For the rest, I'm inclined to propose that we support
> something like:
>
> ALTER TYPE timestamptz
>    ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),
>    ADD INTERFACE decrement timestamp_mi_interval(timestamptz, interval);
>
> or
>
> ALTER TYPE integer
>   ADD INTERFACE increment int4pl (integer, integer),
>   ADD INTERFACE decrement int4mi (integer, integer),
>   ADD VALUE addative_unit 1::integer,
>   ADD VALUE addative_identity 0::integer;
>
> IIRC, Window functions need this information too, so there's value in
> associating it with the base type, even if we want to allow users to
> override it when creating ranges.

Yeah, window functions will require 'how to add or subtract value' of
ORDER BY expression data type to search window frame boundary when we
want to support RANGE BETWEEN frame option. I tried to retrieve the
information by hard-coding '+'/'-' to get it from pg_oper in the
previous patch, but actually we found out we need more general way
like above. This will help it. But I don't have blue-print of catalog
format for it yet, between knn gist and range types.

Regards,


--
Hitoshi Harada


Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-10-25 at 21:07 -0400, Robert Haas wrote:
> See, that gets complicated, because now you're restricting the range
> of values that can be expressed by the range type to something less
> than the natural range of the data type.  I am not sure the value of
> supporting that is sufficient to justify the amount of extra code that
> will be required to make it work.  I'd say for a first version, nail
> down the representation.  Perhaps in a future version you could have
> compress/uncompress methods sort of like GIST,

OK, I can live with that. 

> ALTER TYPE timestamptz
>     ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),
>     ADD INTERFACE decrement timestamp_mi_interval(timestamptz, interval);

I think we chatted about this before. Sounds like a good idea to me
(except the name -- "increment" is not the same as "plus").

However, this is orthogonal, I think. I can always ask the user to
specify everything when creating a Range Type, and then we can make them
default to use the interface functions later. Some, like "plus" might be
constant, but people certainly might want to specify alternate
comparators.

Regards,Jeff Davis




Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Tue, Oct 26, 2010 at 1:26 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2010-10-25 at 21:07 -0400, Robert Haas wrote:
>> See, that gets complicated, because now you're restricting the range
>> of values that can be expressed by the range type to something less
>> than the natural range of the data type.  I am not sure the value of
>> supporting that is sufficient to justify the amount of extra code that
>> will be required to make it work.  I'd say for a first version, nail
>> down the representation.  Perhaps in a future version you could have
>> compress/uncompress methods sort of like GIST,
>
> OK, I can live with that.
>
>> ALTER TYPE timestamptz
>>     ADD INTERFACE increment timestamp_pl_interval(timestamptz, interval),
>>     ADD INTERFACE decrement timestamp_mi_interval(timestamptz, interval);
>
> I think we chatted about this before. Sounds like a good idea to me
> (except the name -- "increment" is not the same as "plus").
>
> However, this is orthogonal, I think. I can always ask the user to
> specify everything when creating a Range Type, and then we can make them
> default to use the interface functions later. Some, like "plus" might be
> constant, but people certainly might want to specify alternate
> comparators.

If it were me, I would go design and implement the type interface part
first.   But it's not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 26, 2010 at 1:26 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> However, this is orthogonal, I think. I can always ask the user to
>> specify everything when creating a Range Type, and then we can make them
>> default to use the interface functions later. Some, like "plus" might be
>> constant, but people certainly might want to specify alternate
>> comparators.

> If it were me, I would go design and implement the type interface part
> first.   But it's not.

I agree with Jeff's plan: seems like taking a first cut at the higher
level is worthwhile, to make sure you know what you need from the
type-system interfaces.

FWIW, I don't agree with the proposed syntax.  We already have a
perfectly extensible CREATE TYPE syntax, so there is no reason to
implement this as an ALTER TYPE operation.  What's more, altering
existing datatype declarations is fraught with all kinds of fun
risks, as we were reminded with the recent enum patch.
        regards, tom lane


Re: Range Types, discrete and/or continuous

From
Robert Haas
Date:
On Tue, Oct 26, 2010 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Oct 26, 2010 at 1:26 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> However, this is orthogonal, I think. I can always ask the user to
>>> specify everything when creating a Range Type, and then we can make them
>>> default to use the interface functions later. Some, like "plus" might be
>>> constant, but people certainly might want to specify alternate
>>> comparators.
>
>> If it were me, I would go design and implement the type interface part
>> first.   But it's not.
>
> I agree with Jeff's plan: seems like taking a first cut at the higher
> level is worthwhile, to make sure you know what you need from the
> type-system interfaces.
>
> FWIW, I don't agree with the proposed syntax.  We already have a
> perfectly extensible CREATE TYPE syntax, so there is no reason to
> implement this as an ALTER TYPE operation.  What's more, altering
> existing datatype declarations is fraught with all kinds of fun
> risks, as we were reminded with the recent enum patch.

Fair enough.  I'm not wedded to the syntax (or the order of development).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Range Types, discrete and/or continuous

From
Dimitri Fontaine
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Also, there's a good case for continuous ranges for types like NUMERIC,
> but still continuous ranges don't seem quite as important overall.

Well it seems to me that the prefix_range type is continuous, so I would
tend to disagree here :)

A prefix_range is currently written as e.g. '012[3-5]' but could be
worked out to be represented '0123-0125'. In any case here are some
example of values that are contained in the mentioned range:
 0123 0124 012345 0123456789 0125221956

You get the idea, I suppose. The main use case is telephony routing:
searching for a prefix in your operator table when you know the phone
number that's calling you.

So let's not forget about continuous ranges, because there's at least
one real-world case that needs them. Oh and it's based on text really,
not on numbers, so that could be the range 'abc-abe'.

I'm too tired to try to understand how your proposal fits with what
would be needed to cover prefix ranges tonight, but I wanted to share
the use case at least.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Range Types, discrete and/or continuous

From
Dimitri Fontaine
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Would you be comfortable writing that '012[3-5]' range as
> '[0123, 0126)' or something similar?  What benefits do you see to
> using a range for prefixes versus a regular expression?

Your proposed syntax would do fine, sure. Something like this is even on
the TODO list for prefix indexing, but for the internal representation,
as I think there might be some optimisation potential there. Meanwhile,
it would be easy enough to accept alternative input syntax.

I don't see what benefits I'd get from regexp alike input syntax as all
we need to support is 'foo.*', or if you prefer LIKE syntax, 'foo%'.

Now please remind that the literal is a full phone number, and the table
has the prefixes. So the table would have regular expressions and the
indexing would be about optimising searches of which regexp "best" fits
the input literal. It seems to me the idea of a range makes it easier
here.

Oh, and there's a meaningful overlap notion too, even if depending on
the application you can't enforce non-overlapping ranges (a carrier
might own almost all the '01234' prefix, but another one owns the
'012345' prefix). It gets very funny if you include the country code in
the prefix, too, as they run from one to 5 digits according to
 http://en.wikipedia.org/wiki/List_of_country_calling_codes

Still, we're talking about continuous ranges I think, because there's no
way to count the elements that fits in any given prefix_range.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Range Types, discrete and/or continuous

From
Jeff Davis
Date:
On Mon, 2010-11-01 at 20:36 +0100, Dimitri Fontaine wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > Would you be comfortable writing that '012[3-5]' range as
> > '[0123, 0126)' or something similar?  What benefits do you see to
> > using a range for prefixes versus a regular expression?
> 
> Your proposed syntax would do fine, sure. Something like this is even on
> the TODO list for prefix indexing, but for the internal representation,
> as I think there might be some optimisation potential there. Meanwhile,
> it would be easy enough to accept alternative input syntax.

Interesting example of a situation where the representation can be
optimized. I suspected that this was the case, but perhaps my example
wasn't as compelling:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01842.php

This suggests that there should be some way for the user to specify
their own representation function.

Regards,Jeff Davis