Re: Range Types and extensions - Mailing list pgsql-hackers

From Darren Duncan
Subject Re: Range Types and extensions
Date
Msg-id 4DED49BE.9070902@darrenduncan.net
Whole thread Raw
In response to Re: Range Types and extensions  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Range Types and extensions
List pgsql-hackers
Jeff Davis wrote:
> On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:
>> Jeff Davis wrote:
>>> I'd like to take another look at Range Types and whether part of it
>>> should be an extension. Some of these issues relate to extensions in
>>> general, not just range types.
>>>
>>> First of all, what are the advantages to being in core?
>> I believe that ranges aka intervals are widely useful generic types, next after 
>> relations/tuples/arrays, and they *should* be supported in core, same as arrays are.
> 
> I think we all agree that ranges are important. I am not suggesting that
> we sacrifice on the semantics to make it an extension; I'm just trying
> to see if involving extensions for some of the approximately 5000 lines
> would be a good idea.

Generally speaking, the best way to go about this is to define the *generic* 
data type in the core, and leave most operators to extensions.  So, in core, we 
need to have the way to select a range value over ANYTYPE either completely as a 
value literal or in terms of endpoint values from arbitrary expressions or 
variables, store the range value in a database, retrieve it, and access its 
component attributes (endpoints, open/closed) in user-defined constraint and 
operator definitions.

The fundamental value of ranges is the fact that they're a concise way to store 
and express an interval over an ordered type, and to either compare such 
intervals or test whether individual values or sets of values are in intervals.  And people do that a *lot* (such as
withdates), so I see having this range 
 
type, which is generic and orthogonal to other types in the same way as arrays 
or tables are, in core just makes the most sense, and as previously illustrated, 
ranges are useful in places one might not always think about.

Ranges are also much more flexible than BETWEEN for what it does, because AFAIK 
you can't indicate open or closed with BETWEEN.

You should not need to define separate range types or operators for each ordered 
type, same as you should not have to do so for arrays, or where such 
functionality is defined should be similar; whatever functionality for arrays 
you do or don't define in core, do corresponding things for ranges.

>> Now assuming that a range/interval value is generally defined in terms of a pair 
>> of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or 
>> {<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is 
>> capable of distinguishing open and closed intervals.
> 
> Right, it already does that explicitly. I'd appreciate your input on
> some of the previous discussion though.

On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about ranges 
is actually being overloaded for 2 very distinct concepts, which are probably 
best dealt with separately, possibly as distinct types.

This discussion came up in the development of Perl 6 too, and that discussion is 
probably worth looking into.

Ranges/intervals in the general sense can *not* be used to enumerate a list of 
values in a standard type-sensical manner, such as FOREACH requires. 
Ranges/intervals are about *comparison*, meaning combinations of tests of how 2 
arbitrary values of an ordered type sort relative to each other, and that's it.  This usage works for integers, other
numbers,strings, dates, and so on, all 
 
in a natural manner.

Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and are 
used in distinct contexts.  Enumeration requires next/prev-value operators, 
while ranges/intervals in general do not.  Enumeration requires discrete types 
(or the faking of such) like integers while ranges work for continuous types.

Moreover, in practice, one probably wants enumerations to be more flexible than 
just monotonic increases.  With enumerations you'd probably want to start go 
top-down or bottom-up, you might want to increase geometrically or by some other 
formula rather than incrementally.

I totally agree with sharing syntax and using ranges/intervals to define 
sequence generators, but a range value should be considered immutable like a 
number or string while a sequence generator may mutate.

For syntax, one could use "x..y" to define an interval while "x...y" for a 
sequence generator, or that's what Perl 6 does.

See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that 
talks about how Perl 6 does ranges.

>> Also, if Postgres has some concept of type-generic special values -Inf and +Inf 
>> (which always sort before or after any other value in the type system), those 
>> can be used as endpoints to indicate that the interval is unbounded.
> 
> I already introduced +/- infinity to range types. They are not generic
> outside of ranges, however -- therefore you can't select the upper bound
> of an upper-infinite range.

Well, what you have is the least one would want.

>> Unless you have some other syntax in mind, I suggest lifting the range literal 
>> syntax from Perl 6, where ".." is an infix operator building a range between its 
>> arguments, and a "^" on either side means that side is open, I think; so there 
>> are 4 variants: {..,^..,..^,^..^}.
> 
> Oh, interesting syntax. That might make a good operator version of a
> constructor. Unfortunately, "." is not valid in an operator name in PG.
> Maybe I can use tilde or dash?

Can Pg be changed to support "." in operator names as long as they don't just 
appear by themselves?  What would this break to do so?

>> Any operation that wants to deal with a range somehow, such as the BETWEEN 
>> syntax, could instead use a range/interval; for example, both of:
>>
>>    foo in 1..10
> 
> I don't know if it's reasonable to introduce syntax like "in" here.
> Maybe we could just still use "between" and it would recognize that the
> RHS is a range?

I believe it is quite reasonable to treat ranges like sets, in an abstract 
sense, and so using set membership syntax like "in" is valid.  Same as one 
should be able to use "in" to test whether a value is in an array.  I would 
expect "in" to be a polymorphic infix operator same as "<" or "=" etc are, 
aren't they?  This shouldn't conflict with testing tuples in relations as they 
are different types, same as you can use the same "<" for numbers and strings, 
can't you?

We could add parenthesis if that helps:
  foo in (1..10)

>> The LIMIT clause could take a range to specify take and skip count at once.
> 
> Interesting idea.
> 
>> Array slicing can be done using foo[first..last] or such.
> 
> I like that, but we already have foo[3:7], so it might be better not to
> introduce redundancy. Too bad I can't use ":" as an operator.

On that note, some languages use ":" for defining intervals rather than "..".

Some languages also use round parenthesis or curly braces to define intervals, 
but I really don't like that and we shouldn't use it.

>> A random number generator that takes endpoints can take a range argument.
> 
> Sounds useful because it would make it more explicit whether the
> endpoints are possible results.

Exactly.

>> An array or relation of these range can represent ranges with holes, and the 
>> general results of range union operations.
> 
> Right, that's been brought up before as well. In particular, Scott
> Bailey has done some thinking/writing on this topic.

I also see these as considerably less important and useful in practice than the 
continuous intervals.  Facilities for discontinuous intervals could more easily 
be left to extensions than those for continuous ones.  I see the continuous as 
more fundamental, at least in the same manner as seeing integers as more 
fundamental than rationals (you can define the latter with the former), though 
one could define things in the opposite manner too.

> Regards,
>     Jeff Davis

-- Darren Duncan


pgsql-hackers by date:

Previous
From: Jeff Janes
Date:
Subject: Re: WALInsertLock tuning
Next
From: Alvaro Herrera
Date:
Subject: Re: Postmaster holding unlinked files for pg_largeobject table