Thread: range_agg

range_agg

From

Paul Jungwirth

Date:

03 May 2019, 22:56:41

Hello,

I wrote an extension to add a range_agg function with similar behavior 
to existing *_agg functions, and I'm wondering if folks would like to 
have it in core? Here is the repo: https://github.com/pjungwir/range_agg

I'm also working on a patch for temporal foreign keys, and having 
range_agg would make the FK check easier and faster, which is why I'd 
like to get it added. But also it just seems useful, like array_agg, 
json_agg, etc.

One question is how to aggregate ranges that would leave gaps and/or 
overlaps. So in my extension there is a one-param version that forbids 
gaps & overlaps, but I let you permit them by passing extra parameters, 
so the signature is:

     range_agg(r anyrange, permit_gaps boolean, permit_overlaps boolean)

Perhaps another useful choice would be to return NULL if a gap/overlap 
is found, so that each param would have three choices instead of just 
two: accept the inputs, raise an error, return a NULL.

What do people think? I plan to work on a patch regardless, so that I 
can use it for temporal FKs, but I'd appreciate some feedback on the 
"user interface".

Thanks,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

David Fetter

Date:

04 May 2019, 01:41:47

On Fri, May 03, 2019 at 03:56:41PM -0700, Paul Jungwirth wrote:
> Hello,
> 
> I wrote an extension to add a range_agg function with similar behavior to
> existing *_agg functions, and I'm wondering if folks would like to have it
> in core? Here is the repo: https://github.com/pjungwir/range_agg
> 
> I'm also working on a patch for temporal foreign keys, and having range_agg
> would make the FK check easier and faster, which is why I'd like to get it
> added. But also it just seems useful, like array_agg, json_agg, etc.
> 
> One question is how to aggregate ranges that would leave gaps and/or
> overlaps.

This suggests two different ways to extend ranges over aggregation:
one which is a union of (in general) disjoint intervals, two others
are a union of intervals, each of which has a weight.  Please pardon
the ASCII art.

The aggregation of:

[1, 4)
  [2, 5)
             [8, 10)

could turn into:

{[1,5), [8, 10)} (union without weight)

{{[1,2),1}, {[2,4),2}, {[4,5),1}, {[8,10),1}} (strictly positive weights which don't (in general) cover the space)

{{[1,2),1}, {[2,4),2}, {[4,5),1}, {[5,8),0}, {[8,10),1}} (non-negative weights which guarantee the space is covered)

There is no principled reason to choose one over the others.

> What do people think? I plan to work on a patch regardless, so that I can
> use it for temporal FKs, but I'd appreciate some feedback on the "user
> interface".

I think the cases above, or at least the first two of them, should be
available. They could be called range_agg, weighted_range_agg, and
covering_range_agg.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Corey Huinker

Date:

04 May 2019, 22:11:34

One question is how to aggregate ranges that would leave gaps and/or
overlaps. So in my extension there is a one-param version that forbids
gaps & overlaps, but I let you permit them by passing extra parameters,
so the signature is:

Perhaps a third way would be to allow and preserve the gaps.

A while back I wrote an extension called disjoint_date_range for storing sets of dates where it was assumed that most dates would be contiguous. The basic idea was that The core datatype was an array of ranges of dates, and with every modification you'd unnest them all to their discrete elements and use a window function to identify "runs" of dates and recompose them into a canonical set. It was an efficient way of representing "Every day last year except for June 2nd and August 4th, when we closed business for special events."

For arrays of ranges the principle is the same but it'd get a bit more tricky, you'd have to order by low bound, use window functions to detect adjacency/overlap to identify your runs, and the generate the canonical minimum set of ranges in your array.

Re: range_agg

From

Paul Jungwirth

Date:

06 May 2019, 18:17:46

On 5/4/19 3:11 PM, Corey Huinker wrote:
>     One question is how to aggregate ranges that would leave gaps and/or
>     overlaps. So in my extension there is a one-param version that forbids
>     gaps & overlaps, but I let you permit them by passing extra parameters,
>     so the signature is:
> 
> 
> Perhaps a third way would be to allow and preserve the gaps.

Thanks for the feedback! I think this is what I'm doing already 
(returning an array of ranges), but let me know if I'm misunderstanding. 
My extension has these signatures:

     range_agg(anyrange) returning anyrange
     range_agg(anyrange, bool) returning anyarray
     range_agg(anyrange, bool, bool) returning anyarray.

The first variant raises an error if there are gaps or overlaps and 
always returns a single range, but the other two return an array of ranges.

I was planning to use the same signatures for my patch to pg, unless 
someone thinks they should be different. But I'm starting to wonder if 
they shouldn't *all* return arrays. I have two concrete use-cases for 
these functions and they both require the array-returning versions. Is 
it helpful to have a version that always returns a single range? Or 
should I make them all consistent?

Thanks,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Paul Jungwirth

Date:

06 May 2019, 18:19:27

On 5/3/19 6:41 PM, David Fetter wrote:
> This suggests two different ways to extend ranges over aggregation:
> one which is a union of (in general) disjoint intervals, two others
> are a union of intervals, each of which has a weight.
> . . .
> I think the cases above, or at least the first two of them, should be
> available. They could be called range_agg, weighted_range_agg, and
> covering_range_agg.

Thanks David! I think these two new functions make sense. Before I 
implement them too I wonder if anyone else has uses for them?

Thanks,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

David Fetter

Date:

06 May 2019, 22:07:04

On Mon, May 06, 2019 at 11:19:27AM -0700, Paul Jungwirth wrote:
> On 5/3/19 6:41 PM, David Fetter wrote:
> > This suggests two different ways to extend ranges over aggregation:
> > one which is a union of (in general) disjoint intervals, two others
> > are a union of intervals, each of which has a weight.
> > . . .
> > I think the cases above, or at least the first two of them, should be
> > available. They could be called range_agg, weighted_range_agg, and
> > covering_range_agg.
> 
> Thanks David! I think these two new functions make sense. Before I implement
> them too I wonder if anyone else has uses for them?

I suspect that if you build it, the will come, "they" being anyone who
has to schedule coverage, check usage of a resource over time, etc. Is
this something you want help with at some level? Coding, testing,
promoting...

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Paul Jungwirth

Date:

06 May 2019, 23:21:45

> I suspect that if you build it, the will come, "they" being anyone who
> has to schedule coverage, check usage of a resource over time, etc. Is
> this something you want help with at some level? Coding, testing,
> promoting...

You might be right. :-) Most of this is done already, since it was 
largely copy/paste from my extension plus figuring out how to register 
built-in functions with the .dat files. I need to write some docs and do 
some cleanup and I'll have a CF entry. And I'll probably go ahead and 
add your two suggestions too.... Things I'd love help with:

- Getting more opinions about the functions' interface, either from you 
or others, especially:
   - In the extension I have a boolean param to let you accept gaps or 
raise an error, and another for overlaps. But what about 
accepting/raising/returning null? How should the parameters expose that? 
Maybe leave them as bools but accept true/false/null for 
permit/raise/nullify respectively? That seems like a questionable UI, 
but I'm not sure what would be better. Maybe someone with better taste 
can weigh in. :-) I tried to find existing built-in functions that gave 
a enumeration of options like that but couldn't find an existing example.
   - Also: what do you think of the question I asked in my reply to 
Corey? Is it better to have *all* range_agg functions return an array of 
ranges, or it is nicer to have a variant that always returns a single range?
- Getting it reviewed.
- Advice about sequencing it with respect to my temporal foreign keys 
patch, where I'm planning to call range_agg to check an FK. E.g. should 
my FK patch be a diff on top of the range_agg code? I assume they should 
have separate CF entries though?

Oh and here's something specific:

- I gave oids to my new functions starting with 8000, because I thought 
I saw some discussion about that recently, and the final committer will 
correct the oids to the current n+1? But I can't find that discussion 
anymore, so if that's the wrong approach let me know.

Thanks!

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Paul A Jungwirth

Date:

09 May 2019, 04:54:11

On Mon, May 6, 2019 at 4:21 PM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
> I need to write some docs and do
> some cleanup and I'll have a CF entry.

Here is an initial patch. I'd love to have some feedback! :-)

One challenge was handling polymorphism, since I want to have this:

    anyrange[] range_agg(anyrange, bool, bool)

But there is no anyrange[]. I asked about this back when I wrote my
extension too:

https://www.postgresql.org/message-id/CA%2BrenyVOjb4xQZGjdCnA54-1nzVSY%2B47-h4nkM-EP5J%3D1z%3Db9w%40mail.gmail.com

Like then, I handled it by overloading the function with separate
signatures for each built-in range type:

    int4range[] range_agg(int4range, bool, bool);
    int8range[] range_agg(int8range, bool, bool);
    numrange[] range_agg(numrange, bool, bool);
    tsrange[] range_agg(tsrange, bool, bool);
    tstzrange[] range_agg(tstzrange, bool, bool);
    daterange[] range_agg(daterange, bool, bool);

(And users can still define a range_agg for their own custom range
types using similar instructions to those in my extension's README.)

The problem was the opr_sanity regression test, which rejects
functions that share the same C-function implementation (roughly). I
took a few stabs at changing my code to pass that test, e.g. defining
separate wrapper functions for everything like this:

    Datum
    int4range_agg_transfn(PG_FUNCTION_ARGS) {
          return range_agg_transfn(fcinfo);
    }

But that felt like it was getting ridiculous (8 range types *
transfn+finalfn * 1-bool and 2-bool variants), so in the end I just
added my functions to the "permitted" output in opr_sanity. Let me
know if I should avoid that though.

Also I would still appreciate some bikeshedding over the functions' UI
since I don't feel great about it.

If the overall approach seems okay, I'm still open to adding David's
suggestions for weighted_range_agg and covering_range_agg.

Thanks!
Paul

Attachment

range_agg_v1.patch

Re: range_agg

From

Paul A Jungwirth

Date:

16 June 2019, 22:59:29

On Wed, May 8, 2019 at 9:54 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> Here is an initial patch. I'd love to have some feedback! :-)

Here is a v2 rebased off current master. No substantive changes, but
it does fix one trivial git conflict.

After talking with David in Ottawa and hearing a good use-case from
one other person for his proposed weighted_range_agg and
covering_range_agg, I think *will* add those to this patch, but if
anyone wants to offer feedback on my approach so far, I'd appreciate
that too.

Yours,
Paul

Attachment

range_agg_v0002.patch

Re: range_agg

From

Jeff Davis

Date:

01 July 2019, 22:38:30

On Fri, 2019-05-03 at 15:56 -0700, Paul Jungwirth wrote:
> Hello,
> 
> I wrote an extension to add a range_agg function with similar
> behavior 
> to existing *_agg functions, and I'm wondering if folks would like
> to 
> have it in core? Here is the repo: 
> https://github.com/pjungwir/range_agg

This seems like a very useful extension, thank you.

For getting into core though, it should be a more complete set of
related operations. The patch is implicitly introducing the concept of
a "multirange" (in this case, an array of ranges), but it's not making
the concept whole.

What else should return a multirange? This patch implements the union-
agg of ranges, but we also might want an intersection-agg of ranges
(that is, the set of ranges that are subranges of every input). Given
that there are other options here, the name "range_agg" is too generic
to mean union-agg specifically.

What can we do with a multirange? A lot of range operators still make
sense, like "contains" or "overlaps"; but "adjacent" doesn't quite
work. What about new operations like inverting a multirange to get the
gaps?

Do we want to continue with the array-of-ranges implementation of a
multirange, or do we want a first-class multirange concept that might
eliminate the boilerplate around defining all of these operations?

If we have a more complete set of operators here, the flags for
handling overlapping ranges and gaps will be unnecessary.

Regards,
    Jeff Davis

Re: range_agg

From

Alvaro Herrera

Date:

04 July 2019, 18:33:58

I noticed that this patch has a // comment about it segfaulting.  Did
you ever figure that out?  Is the resulting code the one you intend as
final?

Did you make any inroads regarding Jeff Davis' suggestion about
implementing "multiranges"?  I wonder if that's going to end up being a
Pandora's box.

Stylistically, the code does not match pgindent's choices very closely.
I can return a diff to a pgindented version of your v0002 for your
perusal, if it would be useful for you to learn its style.  (A committer
would eventually run pgindent himself[1], but it's good to have
submissions be at least close to the final form.)  BTW, I suggest "git
format-patch -vN" to prepare files for submission.


[1] No female committers yet ... is this 2019?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Pavel Stehule

Date:

05 July 2019, 05:58:21

út 2. 7. 2019 v 0:38 odesílatel Jeff Davis <pgsql@j-davis.com> napsal:

On Fri, 2019-05-03 at 15:56 -0700, Paul Jungwirth wrote:
> Hello,
>
> I wrote an extension to add a range_agg function with similar
> behavior
> to existing *_agg functions, and I'm wondering if folks would like
> to
> have it in core? Here is the repo:
> https://github.com/pjungwir/range_agg

This seems like a very useful extension, thank you.

For getting into core though, it should be a more complete set of
related operations. The patch is implicitly introducing the concept of
a "multirange" (in this case, an array of ranges), but it's not making
the concept whole.

What else should return a multirange? This patch implements the union-
agg of ranges, but we also might want an intersection-agg of ranges
(that is, the set of ranges that are subranges of every input). Given
that there are other options here, the name "range_agg" is too generic
to mean union-agg specifically.

What can we do with a multirange? A lot of range operators still make
sense, like "contains" or "overlaps"; but "adjacent" doesn't quite
work. What about new operations like inverting a multirange to get the
gaps?

Do we want to continue with the array-of-ranges implementation of a
multirange, or do we want a first-class multirange concept that might
eliminate the boilerplate around defining all of these operations?

If we have a more complete set of operators here, the flags for
handling overlapping ranges and gaps will be unnecessary.

I think so scope of this patch is correct. Introduction of set of ranges data type based on a array or not, should be different topic.

The question is naming - should be this agg function named "range_agg", and multi range agg "multirange_agg"? Personally, I have not a problem with range_agg, and I have not a problem so it is based on union operation. It is true so only result of union can be implemented as range simply. When I though about multi range result, then there are really large set of possible algorithms how to do some operations over two, three values. So personally, I am satisfied with scope of simple range_agg functions, because I see a benefits, and I don't think so this implementation block any more complex designs in the future. There is really big questions how to implement multi range, and now I think so special data type will be better than possible unordered arrays.

Regards

Pavel

Regards,
Jeff Davis

Re: range_agg

From

Pavel Stehule

Date:

05 July 2019, 11:30:52

čt 4. 7. 2019 v 20:34 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:

I noticed that this patch has a // comment about it segfaulting. Did
you ever figure that out? Is the resulting code the one you intend as
final?

Did you make any inroads regarding Jeff Davis' suggestion about
implementing "multiranges"? I wonder if that's going to end up being a
Pandora's box.

Introduction of new datatype can be better, because we can ensure so data are correctly ordered

Stylistically, the code does not match pgindent's choices very closely.
I can return a diff to a pgindented version of your v0002 for your
perusal, if it would be useful for you to learn its style. (A committer
would eventually run pgindent himself[1], but it's good to have
submissions be at least close to the final form.) BTW, I suggest "git
format-patch -vN" to prepare files for submission.

The first issue is unstable regress tests - there is a problem with opr_sanity

SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid < p2.oid AND
p1.prosrc = p2.prosrc AND
p1.prolang = 12 AND p2.prolang = 12 AND
(p1.prokind != 'a' OR p2.prokind != 'a') AND
(p1.prolang != p2.prolang OR
p1.prokind != p2.prokind OR
p1.prosecdef != p2.prosecdef OR
p1.proleakproof != p2.proleakproof OR
p1.proisstrict != p2.proisstrict OR
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs)
ORDER BY p1.oid, p2.oid; -- requires explicit ORDER BY now

+ rangeTypeId = get_fn_expr_argtype(fcinfo->flinfo, 1);
+ if (!type_is_range(rangeTypeId))
+ {
+ ereport(ERROR, (errmsg("range_agg must be called with a range")));
+ }

???

+ r1Str = "lastRange";
+ r2Str = "currentRange";
+ // TODO: Why is this segfaulting?:
+ // r1Str = DatumGetCString(DirectFunctionCall1(range_out, RangeTypePGetDatum(lastRange)));
+ // r2Str = DatumGetCString(DirectFunctionCall1(range_out, RangeTypePGetDatum(currentRange)));
+ ereport(ERROR, (errmsg("range_agg: gap detected between %s and %s", r1Str, r2Str)));
+ }

???

The patch doesn't respect Postgres formatting

+# Making 2- and 3-param range_agg polymorphic is difficult
+# because it would take an anyrange and return an anyrange[],
+# which doesn't exist.
+# As a workaround we define separate functions for each built-in range type.
+# This is what causes the mess in src/test/regress/expected/opr_sanity.out.
+{ oid => '8003', descr => 'aggregate transition function',

This is strange. Does it means so range_agg will not work with custom range types?

I am not sure about implementation. I think so accumulating all ranges, sorting and processing on the end can be memory and CPU expensive.

Regards

Pavel

[1] No female committers yet ... is this 2019?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Jeff Davis

Date:

05 July 2019, 16:48:29

On Fri, 2019-07-05 at 07:58 +0200, Pavel Stehule wrote:
> The question is naming - should be this agg function named
> "range_agg", and multi range agg "multirange_agg"? Personally, I have
> not a problem with range_agg, and I have not a problem so it is based
> on union operation. It is true so only result of union can be
> implemented as range simply. When I though about multi range result,
> then there are really large set of possible algorithms how to do some
> operations over two, three values.

Hi Pavel,

Can you explain in more detail? Would an intersection-based aggregate
be useful? If so, and we implement it in the future, what would we call
it?

Regards,
    Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2019, 16:58:02

On Mon, Jul 1, 2019 at 3:38 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> For getting into core though, it should be a more complete set of
> related operations. The patch is implicitly introducing the concept of
> a "multirange" (in this case, an array of ranges), but it's not making
> the concept whole.
>
> What else should return a multirange? This patch implements the union-
> agg of ranges, but we also might want an intersection-agg of ranges
> (that is, the set of ranges that are subranges of every input). Given
> that there are other options here, the name "range_agg" is too generic
> to mean union-agg specifically.

Thanks for the review!

I like the idea of adding a multirange type that works like range
types, although I'm not sure I want to build it. :-)

My own motivations for the range_agg patch are for temporal databases,
where I have two use-cases: checking temporal foreign keys [1] and
doing a Snodgrass "coalesce" operation [2]. The FKs use-case is more
important. For coalesce I just immediately UNNEST the array, so a
multirange would sort of "get in the way". It's no big deal if I can
cast a multirange to an array, although for something that would run
on every INSERT/UPDATE/DELETE I'd like to understand what the cast
would cost us in performance. But coalesce isn't part of SQL:2011 and
should be optional behavior or just something in an extension. The FKs
use case matters to me a lot more, and I think a multirange would be
fine for that. Also a multirange would solve the generics problem
Pavel asked about. (I'll say more about that in a reply to him.)

I'm not convinced that an intersection aggregate function for
multiranges would be used by anyone, but I don't mind building one.
Every other *_agg function has an "additive" sense, not a
"subtractive" sense. json{,b}_object_agg are the closest since you
*could* imagine intersection semantics for those, but they are unions.
So in terms of *naming* I think using "range_agg" for union semantics
is natural and would fit expectations. (I'm not the first to name this
function range_agg btw: [3]).

But there is clearly more than one worthwhile way to aggregate ranges:

- David suggested weighted_range_agg and covering_range_agg. At PGCon
2019 someone else said he has had to build something that was
essentially weighted_range_agg. I can see it used for
scheduling/capacity/max-utilization problems.
- You suggested an intersection range_agg.
- At [4] there is a missing_ranges function that only gives the *gaps*
between the input ranges.

Nonetheless I still think I would call the union behavior simply
range_agg, and then use weighted_range_agg, covering_range_agg,
intersection_range_agg, and missing_range_agg for the rest (assuming
we built them all). I'm not going to insist on any of that, but it's
what feels most user-friendly to me.

> What can we do with a multirange? A lot of range operators still make
> sense, like "contains" or "overlaps"; but "adjacent" doesn't quite
> work. What about new operations like inverting a multirange to get the
> gaps?

I can think of a lot of cool operators for `multirange \bigoplus
multirange` or `multirange \bigoplus range` (commuting of course). And
I've certainly wanted `range + range` and `range - range` in the past,
which would both return a multirange.

> If we have a more complete set of operators here, the flags for
> handling overlapping ranges and gaps will be unnecessary.

Both of my use cases should permit overlaps & gaps (range_agg(r, true,
true)), so I'm actually pretty okay with dropping the flags entirely
and just giving a one-param function that behavior. Or defining a
strict_range_agg that offers more control. But also I don't understand
how richer *operators* make the flags for the *aggregate* unnecessary.

So I really like the idea of multiranges, but I'm reluctant to take it
on myself, especially since this patch is just a utility function for
my other temporal patches. But I don't want my rush to leave a blemish
in our standard library either. But I think what really persuades me
to add multiranges is making a range_agg that more easily supports
user-defined range types. So how about I start on it and see how it
goes? I expect I can follow the existing code for range types pretty
closely, so maybe it won't be too hard.

Another option would be to rename my function range_array_agg (or
something) so that we are leaving space for a multirange function in
the future. I don't love this idea myself but it would could a Plan B.
What do you think of that?

Regards,
Paul

[1] With range_agg I can make the temporal FK check use similar SQL to
the non-temporal FK check:

    SELECT 1
    FROM [ONLY] <pktable> x
    WHERE pkatt1 = $1 [AND ...]
    FOR KEY SHARE OF x

vs

    SELECT 1
    FROM (
        SELECT range_agg(r, true, true) AS r
        FROM  (
            SELECT pkperiodatt AS r
            FROM [ONLY] pktable x
            WHERE pkatt1 = $1 [AND ...]
            FOR KEY SHARE OF x
        ) x1
    ) x2
    WHERE $n <@ x2.r[1]

(The temporal version could be simplified a bit if FOR KEY SHARE ever
supports aggregate functions.)

[2] page number 159ff in https://www2.cs.arizona.edu/~rts/tdbbook.pdf
(section 6.5.1, starts on page 183 of the PDF)

[3]
https://git.proteus-tech.com/open-source/django-postgres/blob/fa91cf9b43ce942e84b1a9be22f445f3515ca360/postgres/sql/range_agg.sql

[4] https://schinckel.net/2014/11/18/aggregating-ranges-in-postgres/

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2019, 17:00:15

On Thu, Jul 4, 2019 at 11:34 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> I noticed that this patch has a // comment about it segfaulting.  Did
> you ever figure that out?  Is the resulting code the one you intend as
> final?

Thanks for the review! I haven't revisited it but I'll see if I can
track it down. I consider this a WIP patch, not something final. (I
don't think Postgres likes C++-style comments, so anything that is //
marks something I consider needs more work.)

> Stylistically, the code does not match pgindent's choices very closely.
> I can return a diff to a pgindented version of your v0002 for your
> perusal, if it would be useful for you to learn its style.

Sorry about that, and thank you for making it easier for me to learn
how to do it the right way. :-)

Paul

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2019, 17:22:14

On Fri, Jul 5, 2019 at 4:31 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> The first issue is unstable regress tests - there is a problem with opr_sanity

I would prefer to avoid needing to add anything to opr_sanity really.
A multirange would let me achieve that I think. But otherwise I'll add
the ordering. Thanks!

> +   rangeTypeId = get_fn_expr_argtype(fcinfo->flinfo, 1);
> +   if (!type_is_range(rangeTypeId))
> +   {
> +       ereport(ERROR, (errmsg("range_agg must be called with a range")));
> +   }

I think this was left over from my attempts to support user-defined
ranges. I believe I can remove it now.

> +# Making 2- and 3-param range_agg polymorphic is difficult
> +# because it would take an anyrange and return an anyrange[],
> +# which doesn't exist.
> +# As a workaround we define separate functions for each built-in range type.
> +# This is what causes the mess in src/test/regress/expected/opr_sanity.out.
> +{ oid => '8003', descr => 'aggregate transition function',
>
> This is strange. Does it means so range_agg will not work with custom range types?

You would have to define your own range_agg with your own custom type
in the signature. I gave instructions about that in my range_agg
extension (https://github.com/pjungwir/range_agg), but it's not as
easy with range_agg in core because I don't think you can define a
custom function that is backed by a built-in C function. At least I
couldn't get that to work.

The biggest argument for a multirange to me is that we could have
anymultirange, with similar rules as anyarray. That way I could have
`range_agg(r anyrange) RETURNS anymultirange`. [1] is a conversation
where I asked about doing this before multiranges were suggested. Also
I'm aware of your own recent work on polymorphic types at [2] but I
haven't read it closely enough to see if it would help me here. Do you
think it applies?

> I am not sure about implementation. I think so accumulating all ranges, sorting and processing on the end can be
memoryand CPU expensive.

I did some research and couldn't find any algorithm that was better
than O(n log n), but if you're aware of any I'd like to know about it.
Assuming we can't improve on the complexity bounds, I think a sort +
iteration is desirable because it keeps things simple and benefits
from past & future work on the sorting code. I care more about
optimizing time here than RAM, especially since we'll use this in FK
checks, where the inputs will rarely be more than a few and probably
never in the millions.

We especially want to avoid an O(n^2) algorithm, which would be easy
to stumble into if we naively accumulated the result as we go. For
example if we had multiranges you could imagine an implementation that
just did `result + r` for all inputs r. But that would have the same
n^2 pattern as looping over strcat.

We could use a tree to keep things sorted and accumulate as we go, but
the implementation would be more complicated and I think slower.

Thanks for the review!

Paul

[1] https://www.postgresql.org/message-id/CA%2BrenyVOjb4xQZGjdCnA54-1nzVSY%2B47-h4nkM-EP5J%3D1z%3Db9w%40mail.gmail.com

[2] https://www.postgresql.org/message-id/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2019, 17:38:26

On Mon, Jul 1, 2019 at 3:38 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> The patch is implicitly introducing the concept of
> a "multirange" (in this case, an array of ranges),

I meant to say before: this patch always returns a sorted array, and I
think a multirange should always act as if sorted when we stringify it
or cast it to an array. If you disagree let me know. :-)

You could imagine that when returning arrays we rely on the caller to
do the sorting (range_agg(r ORDER BY r)) and otherwise give wrong
results. But hopefully everyone agrees that would not be nice. :-) So
even the array-returning version should always return a sorted array I
think. (I'm not sure anything else is really coherent or at least easy
to describe.)

Paul

Re: range_agg

From

David Fetter

Date:

05 July 2019, 17:45:32

On Fri, Jul 05, 2019 at 09:58:02AM -0700, Paul A Jungwirth wrote:
> On Mon, Jul 1, 2019 at 3:38 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > For getting into core though, it should be a more complete set of
> > related operations. The patch is implicitly introducing the concept of
> > a "multirange" (in this case, an array of ranges), but it's not making
> > the concept whole.
> >
> > What else should return a multirange? This patch implements the union-
> > agg of ranges, but we also might want an intersection-agg of ranges
> > (that is, the set of ranges that are subranges of every input). Given
> > that there are other options here, the name "range_agg" is too generic
> > to mean union-agg specifically.
> 
> Thanks for the review!
> 
> I like the idea of adding a multirange type that works like range
> types, although I'm not sure I want to build it. :-)
> 
> My own motivations for the range_agg patch are for temporal databases,
> where I have two use-cases: checking temporal foreign keys [1] and
> doing a Snodgrass "coalesce" operation [2]. The FKs use-case is more
> important. For coalesce I just immediately UNNEST the array, so a
> multirange would sort of "get in the way". It's no big deal if I can
> cast a multirange to an array, although for something that would run
> on every INSERT/UPDATE/DELETE I'd like to understand what the cast
> would cost us in performance. But coalesce isn't part of SQL:2011 and
> should be optional behavior or just something in an extension. The FKs
> use case matters to me a lot more, and I think a multirange would be
> fine for that. Also a multirange would solve the generics problem
> Pavel asked about. (I'll say more about that in a reply to him.)
> 
> I'm not convinced that an intersection aggregate function for
> multiranges would be used by anyone, but I don't mind building one.
> Every other *_agg function has an "additive" sense, not a
> "subtractive" sense. json{,b}_object_agg are the closest since you
> *could* imagine intersection semantics for those, but they are unions.
> So in terms of *naming* I think using "range_agg" for union semantics
> is natural and would fit expectations. (I'm not the first to name this
> function range_agg btw: [3]).
> 
> But there is clearly more than one worthwhile way to aggregate ranges:
> 
> - David suggested weighted_range_agg and covering_range_agg. At PGCon

If I understand the cases correctly, the combination of covering_range
and multi_range types covers all cases. To recap, covering_range_agg
assigns a weight, possibly 0, to each non-overlapping sub-range.  A
cast from covering_range to multi_range would meld adjacent ranges
with non-zero weights into single ranges in O(N) (N sub-ranges) time
and some teensy amount of memory for tracking progress of adjacent
ranges of non-zero weight.  Gaps would just be multi_range consisting
of the sub-ranges of the covering_range with weight 0, and wouldn't
require any tracking.

What have I missed?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2019, 17:57:53

On Fri, Jul 5, 2019 at 10:45 AM David Fetter <david@fetter.org> wrote:
> If I understand the cases correctly, the combination of covering_range
> and multi_range types covers all cases. To recap, covering_range_agg
> assigns a weight, possibly 0, to each non-overlapping sub-range.  A
> cast from covering_range to multi_range would meld adjacent ranges
> with non-zero weights into single ranges in O(N) (N sub-ranges) time
> and some teensy amount of memory for tracking progress of adjacent
> ranges of non-zero weight.  Gaps would just be multi_range consisting
> of the sub-ranges of the covering_range with weight 0, and wouldn't
> require any tracking.

I take it that a multirange contains of *disjoint* ranges, so instead
of {[1,2), [2,3), [6,7)} you'd have {[1,3), [6,7)}. Jeff does that
match your expectation?

I just realized that since weighted_range_agg and covering_range_agg
return tuples of (anyrange, integer) (maybe other numeric types too?),
their elements are *not ranges*, so they couldn't return a multirange.
They would have to return an array of those tuples.

I agree that if you had a pre-sorted list of weighted ranges (with or
without zero-weight elements), you could convert it to a multirange in
O(n) very easily.

Paul

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2019, 17:59:26

On Fri, Jul 5, 2019 at 10:57 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> I take it that a multirange contains of *disjoint* ranges,

*consists* of. :-)

Re: range_agg

From

Pavel Stehule

Date:

06 July 2019, 06:09:40

pá 5. 7. 2019 v 18:48 odesílatel Jeff Davis <pgsql@j-davis.com> napsal:

On Fri, 2019-07-05 at 07:58 +0200, Pavel Stehule wrote:
> The question is naming - should be this agg function named
> "range_agg", and multi range agg "multirange_agg"? Personally, I have
> not a problem with range_agg, and I have not a problem so it is based
> on union operation. It is true so only result of union can be
> implemented as range simply. When I though about multi range result,
> then there are really large set of possible algorithms how to do some
> operations over two, three values.

Hi Pavel,

Can you explain in more detail? Would an intersection-based aggregate
be useful? If so, and we implement it in the future, what would we call
it?

Intersection can be interesting - you can use it for planning - "is there a intersection of free time ranges?"

About naming - what range_union_agg(), range_intersect_agg() ?

Regards

Pavel

Regards,
Jeff Davis

Re: range_agg

From

Pavel Stehule

Date:

06 July 2019, 06:33:55

pá 5. 7. 2019 v 19:22 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Fri, Jul 5, 2019 at 4:31 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> The first issue is unstable regress tests - there is a problem with opr_sanity

I would prefer to avoid needing to add anything to opr_sanity really.
A multirange would let me achieve that I think. But otherwise I'll add
the ordering. Thanks!

> + rangeTypeId = get_fn_expr_argtype(fcinfo->flinfo, 1);
> + if (!type_is_range(rangeTypeId))
> + {
> + ereport(ERROR, (errmsg("range_agg must be called with a range")));
> + }

I think this was left over from my attempts to support user-defined
ranges. I believe I can remove it now.

> +# Making 2- and 3-param range_agg polymorphic is difficult
> +# because it would take an anyrange and return an anyrange[],
> +# which doesn't exist.
> +# As a workaround we define separate functions for each built-in range type.
> +# This is what causes the mess in src/test/regress/expected/opr_sanity.out.
> +{ oid => '8003', descr => 'aggregate transition function',
>
> This is strange. Does it means so range_agg will not work with custom range types?

You would have to define your own range_agg with your own custom type
in the signature. I gave instructions about that in my range_agg
extension (https://github.com/pjungwir/range_agg), but it's not as
easy with range_agg in core because I don't think you can define a
custom function that is backed by a built-in C function. At least I
couldn't get that to work.

I understand so anybody can implement own function, but this is fundamental problem in implementation.

In this case I prefer to start implement anyrangearray type first. It is not hard work.

The biggest argument for a multirange to me is that we could have
anymultirange, with similar rules as anyarray. That way I could have
`range_agg(r anyrange) RETURNS anymultirange`. [1] is a conversation
where I asked about doing this before multiranges were suggested. Also
I'm aware of your own recent work on polymorphic types at [2] but I
haven't read it closely enough to see if it would help me here. Do you
think it applies?

I don't see any difference between anymultirange or anyrangearray - it is just name

ANSI SQL has a special kind for this purpose - "set". It is maybe better, because PostgreSQL's arrays revoke a idea of ordering, but it is hard to do some generic order for ranges.

Functionality is important - if you can do some special functionality, that cannot be implemented as "array of ranges", then the new type is necessary. If all functionality can be covered by array of ranges, then there is not necessity of new type.

I don't talk about polymorphic types - probably it needs one - "anymultirange" or "anyrangearray".

If some operation can be done smarter with new type, then I am ok for new type, if not, then we should to use array of ranges.

> I am not sure about implementation. I think so accumulating all ranges, sorting and processing on the end can be memory and CPU expensive.

I did some research and couldn't find any algorithm that was better
than O(n log n), but if you're aware of any I'd like to know about it.
Assuming we can't improve on the complexity bounds, I think a sort +
iteration is desirable because it keeps things simple and benefits
from past & future work on the sorting code. I care more about
optimizing time here than RAM, especially since we'll use this in FK
checks, where the inputs will rarely be more than a few and probably
never in the millions.

We especially want to avoid an O(n^2) algorithm, which would be easy
to stumble into if we naively accumulated the result as we go. For
example if we had multiranges you could imagine an implementation that
just did `result + r` for all inputs r. But that would have the same
n^2 pattern as looping over strcat.

We could use a tree to keep things sorted and accumulate as we go, but
the implementation would be more complicated and I think slower.

I don't think - working with large arrays is slow, due often cache miss. I understand so it depends on data, but in this area, I can imagine significant memory reduction based on running processing.

array builder doesn't respect work_men, and I think so any different way is safer.

Regards

Pavel

Thanks for the review!

Paul

[1] https://www.postgresql.org/message-id/CA%2BrenyVOjb4xQZGjdCnA54-1nzVSY%2B47-h4nkM-EP5J%3D1z%3Db9w%40mail.gmail.com

[2] https://www.postgresql.org/message-id/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com

Re: range_agg

From

Jeff Davis

Date:

06 July 2019, 19:13:08

On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> user-defined range types. So how about I start on it and see how it
> goes? I expect I can follow the existing code for range types pretty
> closely, so maybe it won't be too hard.

That would be great to at least take a look. If it starts to look like
a bad idea, then we can re-evaluate and see if it's better to just
return arrays.

The "weighted" version of the aggregate might be interesting... what
would it return exactly? An array of (range, weight) pairs, or an array
of ranges and an array of weights, or a multirange and an array of
weights?

> Another option would be to rename my function range_array_agg (or
> something) so that we are leaving space for a multirange function in
> the future. I don't love this idea myself but it would could a Plan
> B.
> What do you think of that?

Not excited about that either.

Regards,
    Jeff Davis

Re: range_agg

From

Jeff Davis

Date:

06 July 2019, 19:26:06

On Fri, 2019-07-05 at 10:57 -0700, Paul A Jungwirth wrote:
> I take it that a multirange contains of *disjoint* ranges, so instead
> of {[1,2), [2,3), [6,7)} you'd have {[1,3), [6,7)}. Jeff does that
> match your expectation?

Yes.

> I just realized that since weighted_range_agg and covering_range_agg
> return tuples of (anyrange, integer) (maybe other numeric types
> too?),
> their elements are *not ranges*, so they couldn't return a
> multirange.
> They would have to return an array of those tuples.

I think you are right. I was originally thinking a multirange and an
array of weights would work, but the multirange would coalesce adjacent
ranges because it would have no way to know they have different
weights.

Regards,
    Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

08 July 2019, 16:46:44

On Sat, Jul 6, 2019 at 12:13 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> > user-defined range types. So how about I start on it and see how it
> > goes? I expect I can follow the existing code for range types pretty
> > closely, so maybe it won't be too hard.
>
> That would be great to at least take a look. If it starts to look like
> a bad idea, then we can re-evaluate and see if it's better to just
> return arrays.

I made some progress over the weekend. I don't have a patch yet but I
thought I'd ask for opinions on the approach I'm taking:

- A multirange type is an extra thing you get when you define a range
(just like how you get a tstzrange[]). Therefore....
- I don't need separate commands to add/drop multirange types. You get
one when you define a range type, and if you drop a range type it gets
dropped automatically.
- I'm adding a new typtype for multiranges. ('m' in pg_type).
- I'm just adding a mltrngtypeid column to pg_range. I don't think I
need a new pg_multirange table.
- You can have a multirange[].
- Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
- I'll add an anymultirange pseudotype. When it's the return type of a
function that has an "anyrange" parameter, it will use the same base
element type. (So basically anymultirange : anyrange :: anyarray ::
anyelement.)
- You can cast from a multirange to an array. The individual ranges
are always sorted in the result array.
- You can cast from an array to a multirange but it will error if
there are overlaps (or not?). The array's ranges don't have to be
sorted but they will be after a "round trip".
- Interesting functions:
  - multirange_length
  - range_agg (range_union_agg if you like)
  - range_intersection_agg
- You can subscript a multirange like you do an array (? This could be
a function instead.)
- operators:
  - union (++) and intersection (*):
    - We already have + for range union but it raises an error if
there is a gap, so ++ is the same but with no errors.
    - r ++ r = mr (commutative, associative)
    - mr ++ r = mr
    - r ++ mr = mr
    - r * r = r (unchanged)
    - mr * r = r
    - r * mr = r
    - mr - r = mr
    - r - mr = mr
    - mr - mr = mr
  - comparison
    - mr = mr
    - mr @> x
    - mr @> r
    - mr @> mr
    - x <@ mr
    - r <@ mr
    - mr <@ mr
    - mr << mr (strictly left of)
    - mr >> mr (strictly right of)
    - mr &< mr (does not extend to the right of)
    - mr &> mr (does not extend to the left of)
  - inverse operator?:
    - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
    - not sure we want this or what the symbol should be. I don't like
-mr as an inverse because then mr - mr != mr ++ -mr.

Anything in there you think should be different?

Thanks,
Paul

Re: range_agg

From

Pavel Stehule

Date:

09 July 2019, 05:08:49

po 8. 7. 2019 v 18:47 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Sat, Jul 6, 2019 at 12:13 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> > user-defined range types. So how about I start on it and see how it
> > goes? I expect I can follow the existing code for range types pretty
> > closely, so maybe it won't be too hard.
>
> That would be great to at least take a look. If it starts to look like
> a bad idea, then we can re-evaluate and see if it's better to just
> return arrays.

I made some progress over the weekend. I don't have a patch yet but I
thought I'd ask for opinions on the approach I'm taking:

- A multirange type is an extra thing you get when you define a range
(just like how you get a tstzrange[]). Therefore....

I am not against a multirange type, but I miss a explanation why you introduce new kind of types and don't use just array of ranges.

Introduction of new kind of types is not like introduction new type.

Regards

Pavel

- I don't need separate commands to add/drop multirange types. You get
one when you define a range type, and if you drop a range type it gets
dropped automatically.
- I'm adding a new typtype for multiranges. ('m' in pg_type).
- I'm just adding a mltrngtypeid column to pg_range. I don't think I
need a new pg_multirange table.
- You can have a multirange[].
- Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
- I'll add an anymultirange pseudotype. When it's the return type of a
function that has an "anyrange" parameter, it will use the same base
element type. (So basically anymultirange : anyrange :: anyarray ::
anyelement.)
- You can cast from a multirange to an array. The individual ranges
are always sorted in the result array.
- You can cast from an array to a multirange but it will error if
there are overlaps (or not?). The array's ranges don't have to be
sorted but they will be after a "round trip".
- Interesting functions:
- multirange_length
- range_agg (range_union_agg if you like)
- range_intersection_agg
- You can subscript a multirange like you do an array (? This could be
a function instead.)
- operators:
- union (++) and intersection (*):
- We already have + for range union but it raises an error if
there is a gap, so ++ is the same but with no errors.
- r ++ r = mr (commutative, associative)
- mr ++ r = mr
- r ++ mr = mr
- r * r = r (unchanged)
- mr * r = r
- r * mr = r
- mr - r = mr
- r - mr = mr
- mr - mr = mr
- comparison
- mr = mr
- mr @> x
- mr @> r
- mr @> mr
- x <@ mr
- r <@ mr
- mr <@ mr
- mr << mr (strictly left of)
- mr >> mr (strictly right of)
- mr &< mr (does not extend to the right of)
- mr &> mr (does not extend to the left of)
- inverse operator?:
- the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
- not sure we want this or what the symbol should be. I don't like
-mr as an inverse because then mr - mr != mr ++ -mr.

Anything in there you think should be different?

Thanks,
Paul

Re: range_agg

From

David Fetter

Date:

09 July 2019, 15:51:12

On Mon, Jul 08, 2019 at 09:46:44AM -0700, Paul A Jungwirth wrote:
> On Sat, Jul 6, 2019 at 12:13 PM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> > > user-defined range types. So how about I start on it and see how it
> > > goes? I expect I can follow the existing code for range types pretty
> > > closely, so maybe it won't be too hard.
> >
> > That would be great to at least take a look. If it starts to look like
> > a bad idea, then we can re-evaluate and see if it's better to just
> > return arrays.
> 
> I made some progress over the weekend. I don't have a patch yet but I
> thought I'd ask for opinions on the approach I'm taking:
> 
> - A multirange type is an extra thing you get when you define a range
> (just like how you get a tstzrange[]). Therefore....
> - I don't need separate commands to add/drop multirange types. You get
> one when you define a range type, and if you drop a range type it gets
> dropped automatically.

Yay for fewer manual steps!

> - I'm adding a new typtype for multiranges. ('m' in pg_type).
> - I'm just adding a mltrngtypeid column to pg_range. I don't think I
> need a new pg_multirange table.

Makes sense, as they'd no longer be separate concepts.

> - You can have a multirange[].

I can see how that would fall out of this, but I'm a little puzzled as
to what people might use it for. Aggregates, maybe?

> - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
> - I'll add an anymultirange pseudotype. When it's the return type of a
> function that has an "anyrange" parameter, it will use the same base
> element type. (So basically anymultirange : anyrange :: anyarray ::
> anyelement.)

Neat!

> - You can cast from a multirange to an array. The individual ranges
> are always sorted in the result array.

Is this so people can pick individual ranges out of the multirange,
or...? Speaking of casts, it's possible that a multirange is also a
range. Would it make sense to have a cast from multirange to range?

> - You can cast from an array to a multirange but it will error if
> there are overlaps (or not?).

An alternative would be to canonicalize into non-overlapping ranges.
There's some precedent for this in casts to JSONB. Maybe a function
that isn't a cast should handle such things.

> The array's ranges don't have to be sorted but they will be after a
> "round trip".

Makes sense.

> - Interesting functions:
>   - multirange_length

Is that the sum of the lengths of the ranges?  Are we guaranteeing a
measure in addition to ordering on ranges now?

>   - range_agg (range_union_agg if you like)
>   - range_intersection_agg
> - You can subscript a multirange like you do an array (? This could be
> a function instead.)

How would this play with the generic subscripting patch in flight?

> - operators:
>   - union (++) and intersection (*):
>     - We already have + for range union but it raises an error if
> there is a gap, so ++ is the same but with no errors.
>     - r ++ r = mr (commutative, associative)
>     - mr ++ r = mr
>     - r ++ mr = mr
>     - r * r = r (unchanged)
>     - mr * r = r
>     - r * mr = r

Shouldn't the two above both yield multirange ? For example, if I
understand correctly,

{"[1,3)","[5,7)"} * [2,6) should be {"[2,3)","[5,6)"}

>     - mr - r = mr
>     - r - mr = mr
>     - mr - mr = mr
>   - comparison
>     - mr = mr
>     - mr @> x

x is in the domain of the (multi)range?

>     - mr @> r
>     - mr @> mr
>     - x <@ mr
>     - r <@ mr
>     - mr <@ mr
>     - mr << mr (strictly left of)
>     - mr >> mr (strictly right of)
>     - mr &< mr (does not extend to the right of)
>     - mr &> mr (does not extend to the left of)
>   - inverse operator?:
>     - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.

Is that the same as ["(∞, ∞)"] - {"[1,2)"}? I seem to recall that the
usual convention (at least in math) is to use intervals that are
generally represented as open on the infinity side, but that might not
fit how we do things.

>     - not sure we want this or what the symbol should be. I don't like
> -mr as an inverse because then mr - mr != mr ++ -mr.

!mr , perhaps?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Paul A Jungwirth

Date:

09 July 2019, 16:31:07

On Mon, Jul 8, 2019 at 10:09 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 8. 7. 2019 v 18:47 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:
> I am not against a multirange type, but I miss a explanation why you introduce new kind of types and don't use just
arrayof ranges. 

Hi Pavel, I'm sorry, and thanks for your feedback! I had a response to
your earlier email but it was stuck in my Drafts folder.

I do think a multirange would have enough new functionality to be
worth doing. I was pretty reluctant to take it on at first but the
idea is growing on me, and it does seem to offer a more sensible
interface. A lot of the value would come from range and multirange
operators, which we can't do with just arrays (I think). Having a
range get merged correctly when you add it would be very helpful. Also
it would be nice to have a data type that enforces a valid structure,
since not all range arrays are valid multiranges. My reply yesterday
to Jeff expands on this with all the functions/operators/etc we could
offer.

Your other email also asked:
> I don't think - working with large arrays is slow, due often cache miss.
>I understand so it depends on data, but in this area, I can imagine significant memory reduction based on running
processing.
>
> array builder doesn't respect work_men, and I think so any different way is safer.

I'm still thinking about this one. I tried to work out how I'd
implement a tree-based sorted list of ranges so that I can quickly
insert/remove ranges. It is very complicated, and I started to feel
like I was just re-implementing GiST but in memory. I did find the
interval_set class from Boost's boost_icl library which could offer
some guidance. But for now I want to press forward with a
sort-then-iterate implementation and consider a different
implementation later. If you have any guidance I would appreciate it!
I especially want something that is O(n log n) to insert n ranges.
Other suggestions here are very welcome! :-)

Regards,
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

09 July 2019, 16:40:59

On Tue, Jul 9, 2019 at 8:51 AM David Fetter <david@fetter.org> wrote:
> > - A multirange type is an extra thing you get when you define a range
> > (just like how you get a tstzrange[]). Therefore....
> > - I don't need separate commands to add/drop multirange types. You get
> > one when you define a range type, and if you drop a range type it gets
> > dropped automatically.
>
> Yay for fewer manual steps!

Thanks for taking a look and sharing your thoughts!

> > - You can have a multirange[].
>
> I can see how that would fall out of this, but I'm a little puzzled as
> to what people might use it for. Aggregates, maybe?

I don't know either, but I thought it was standard to define a T[] for
every T. Anyway it doesn't seem difficult.

> > - You can cast from a multirange to an array. The individual ranges
> > are always sorted in the result array.
>
> Is this so people can pick individual ranges out of the multirange,
> or...?

Yes. I want this for foreign keys actually, where I construct a
multirange and ask for just its first range.

> Speaking of casts, it's possible that a multirange is also a
> range. Would it make sense to have a cast from multirange to range?

Hmm, that seems strange to me. You don't cast from an array to one of
its elements. If we have subscripting, why use casting to get the
first element?

> > - You can cast from an array to a multirange but it will error if
> > there are overlaps (or not?).
>
> An alternative would be to canonicalize into non-overlapping ranges.
> There's some precedent for this in casts to JSONB. Maybe a function
> that isn't a cast should handle such things.

I agree it'd be nice to have both.

> > - Interesting functions:
> >   - multirange_length
>
> Is that the sum of the lengths of the ranges?  Are we guaranteeing a
> measure in addition to ordering on ranges now?

Just the number of disjoint ranges in the multirange.

> > - You can subscript a multirange like you do an array (? This could be
> > a function instead.)
>
> How would this play with the generic subscripting patch in flight?

I'm not aware of that patch but I guess I better check it out. :-)

> > - operators:
> >     - mr * r = r
> >     - r * mr = r
>
> Shouldn't the two above both yield multirange ? For example, if I
> understand correctly,

You're right! Thanks for the correction.

> >   - comparison
> >     - mr = mr
> >     - mr @> x
>
> x is in the domain of the (multi)range?

Yes. It's the scalar base type the range type is based on. I had in
mind the math/ML convention of `x` for scalar and `X` for
vector/matrix.

> >   - inverse operator?:
> >     - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
>
> Is that the same as ["(∞, ∞)"] - {"[1,2)"}?

Yes.

> I seem to recall that the
> usual convention (at least in math) is to use intervals that are
> generally represented as open on the infinity side, but that might not
> fit how we do things.

I think it does, unless I'm misunderstanding?

> >     - not sure we want this or what the symbol should be. I don't like
> > -mr as an inverse because then mr - mr != mr ++ -mr.
>
> !mr , perhaps?

I like that suggestion. Honestly I'm not sure we even want an inverse,
but it's so important theoretically we should at least consider
whether it is appropriate here. Or maybe "inverse" is the wrong word
for this, or there is a different meaning it should have.

Thanks,
Paul

Re: range_agg

From

Jeff Davis

Date:

09 July 2019, 18:24:56

On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote:
> 
> I am not against a multirange type, but I miss a explanation why you
> introduce new kind of types and don't use just array of ranges.
> 
> Introduction of new kind of types is not like introduction new type.

The biggest benefit, in my opinion, is that it means you can define
functions/operators that take an "anyrange" and return an
"anymultirange". That way you don't have to define different functions
for int4 ranges, date ranges, etc.

It starts to get even more complex when you want to add opclasses, etc.

Ranges and arrays are effectively generic types that need a type
parameter to become a concrete type. Ideally, we'd have first-class
support for generic types, but I think that's a different topic ;-)

Regards,
    Jeff Davis

Re: range_agg

From

Alvaro Herrera

Date:

09 July 2019, 19:01:18

On 2019-Jul-08, Paul A Jungwirth wrote:

> - You can subscript a multirange like you do an array (? This could be
> a function instead.)

Note that we already have a patch in the pipe to make subscripting an
extensible operation, which would fit pretty well here, I think.

Also, I suppose you would need unnest(multirange) to yield the set of
ranges.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Jeff Davis

Date:

09 July 2019, 19:02:38

On Mon, 2019-07-08 at 09:46 -0700, Paul A Jungwirth wrote:
> - A multirange type is an extra thing you get when you define a range
> (just like how you get a tstzrange[]). Therefore....

Agreed.

> - I'm adding a new typtype for multiranges. ('m' in pg_type).

Sounds reasonable.

> - I'm just adding a mltrngtypeid column to pg_range. I don't think I
> need a new pg_multirange table.
> - You can have a multirange[].
> - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'

It would be cool to have a better text representation. We could go
simple like:

   '[1,3) [5,6)'

Or maybe someone has another idea how to represent a multirange to be
more visually descriptive?

> - I'll add an anymultirange pseudotype. When it's the return type of
> a
> function that has an "anyrange" parameter, it will use the same base
> element type. (So basically anymultirange : anyrange :: anyarray ::
> anyelement.)

I like it.

>   - range_agg (range_union_agg if you like)
>   - range_intersection_agg

I'm fine with those names.

> - You can subscript a multirange like you do an array (? This could
> be
> a function instead.)

I wouldn't try to hard to make them subscriptable. I'm not opposed to
it, but it's easy enough to cast to an array and then subscript.

> - operators:
>   - union (++) and intersection (*):
>     - We already have + for range union but it raises an error if
> there is a gap, so ++ is the same but with no errors.
>     - r ++ r = mr (commutative, associative)
>     - mr ++ r = mr
>     - r ++ mr = mr

I like it.

>   - inverse operator?:
>     - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
>     - not sure we want this or what the symbol should be. I don't
> like
> -mr as an inverse because then mr - mr != mr ++ -mr.

I think "complement" might be a better name than "inverse".

m1 - m2 = m1 * complement(m2)

What about "~"?



There will be some changes to parse_coerce.c, just like in range types.
I took a brief look here and it looks pretty reasonable; hopefully
there aren't any hidden surprises.

Regards,
    Jeff Davis

Re: range_agg

From

Paul Jungwirth

Date:

09 July 2019, 19:05:16

On 7/9/19 12:01 PM, Alvaro Herrera wrote:
> On 2019-Jul-08, Paul A Jungwirth wrote:
> 
>> - You can subscript a multirange like you do an array (? This could be
>> a function instead.)
> 
> Note that we already have a patch in the pipe to make subscripting an
> extensible operation, which would fit pretty well here, I think.

I'll take a look at that!

> Also, I suppose you would need unnest(multirange) to yield the set of
> ranges.

I think that would be really nice, although it isn't critical I think if 
you can do something like UNNEST(multirange::tstzrange[]).

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Pavel Stehule

Date:

09 July 2019, 19:10:51

út 9. 7. 2019 v 20:25 odesílatel Jeff Davis <pgsql@j-davis.com> napsal:

On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote:
>
> I am not against a multirange type, but I miss a explanation why you
> introduce new kind of types and don't use just array of ranges.
>
> Introduction of new kind of types is not like introduction new type.

The biggest benefit, in my opinion, is that it means you can define
functions/operators that take an "anyrange" and return an
"anymultirange". That way you don't have to define different functions
for int4 ranges, date ranges, etc.

I am not sure how strong is this argument.

I think so introduction of anyrangearray polymorphic type and enhancing some type deduction can do same work.

It starts to get even more complex when you want to add opclasses, etc.

Ranges and arrays are effectively generic types that need a type
parameter to become a concrete type. Ideally, we'd have first-class
support for generic types, but I think that's a different topic ;-)

I afraid so with generic multiragetype there lot of array infrastructure will be duplicated

Regards

Pavel

Regards,
Jeff Davis

Re: range_agg

From

Pavel Stehule

Date:

09 July 2019, 19:23:48

út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

út 9. 7. 2019 v 20:25 odesílatel Jeff Davis <pgsql@j-davis.com> napsal:
On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote:
>
> I am not against a multirange type, but I miss a explanation why you
> introduce new kind of types and don't use just array of ranges.
>
> Introduction of new kind of types is not like introduction new type.

The biggest benefit, in my opinion, is that it means you can define
functions/operators that take an "anyrange" and return an
"anymultirange". That way you don't have to define different functions
for int4 ranges, date ranges, etc.

I am not sure how strong is this argument.

I think so introduction of anyrangearray polymorphic type and enhancing some type deduction can do same work.

It starts to get even more complex when you want to add opclasses, etc.

Ranges and arrays are effectively generic types that need a type
parameter to become a concrete type. Ideally, we'd have first-class
support for generic types, but I think that's a different topic ;-)

I afraid so with generic multiragetype there lot of array infrastructure will be duplicated

on second hand - it is true so classic array concat is not optimal for set of ranges, so some functionality should be redefined every time.

I don't know what is possible, but for me - multiranges is special kind (subset) of arrays and can be implement as subset of arrays. I remember other possible kind of arrays - "sets" without duplicates. It is similar case, I think.

Maybe introduction of multirages as new generic type is bad direction, and can be better and more enhanceable in future to introduce some like special kinds of arrays. So for example - unnest can be used directly for arrays and multiranges too - because there will be common base.

Regards

Pavel

Regards

Pavel

Regards,
Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

10 July 2019, 04:18:44

On Tue, Jul 9, 2019 at 12:02 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
>
> It would be cool to have a better text representation. We could go
> simple like:
>
>    '[1,3) [5,6)'

Will that work with all ranges, even user-defined ones? With a
tstzrange[] there is a lot of quoting:

=# select array[tstzrange(now(), now() + interval '1 hour')];
                                   array
---------------------------------------------------------------------------
 {"[\"2019-07-09 12:40:20.794054-07\",\"2019-07-09 13:40:20.794054-07\")"}

I'm more inclined to follow the array syntax both because it will be
familiar & consistent to users (no need to remember any differences)
and because it's already built so we can use it and know it won't have
gotchas.

> I think "complement" might be a better name than "inverse".
>
> m1 - m2 = m1 * complement(m2)
>
> What about "~"?

That seems like the right term and a good symbol.

Paul

Re: range_agg

From

Paul A Jungwirth

Date:

10 July 2019, 04:26:33

On Tue, Jul 9, 2019 at 12:24 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
>> I afraid so with generic multiragetype there lot of array infrastructure will be duplicated
>
> on second hand - it is true so classic array concat is not optimal for set of ranges, so some functionality should be
redefinedevery time. 
>
> I don't know what is possible, but for me - multiranges is special kind (subset) of arrays and can be implement as
subsetof arrays. I remember other possible kind of arrays - "sets" without duplicates. It is similar case, I think. 
>
> Maybe introduction of multirages as new generic type is bad direction, and can be better and more enhanceable in
futureto introduce some like special kinds of arrays. So for example - unnest can be used directly for arrays and
multirangestoo - because there will be common base. 

Well I'm afraid of that too a bit, although I also agree it may be an
opportunity to share some common behavior and implementation. For
example in the discussion about string syntax, I think keeping it the
same as arrays is nicer for people and lets us share more between the
two types.

That said I don't think a multirange is a subtype of arrays (speaking
as a traditional object-oriented subtype), just something that shares
a lot of the same behavior. I'm inclined to maximize the overlap where
feasible though, e.g. string syntax, UNNEST, indexing, function naming
(`range_length`), etc. Something like Rust traits (or Java interfaces)
seems a closer mental model, not that we have to formalize that
somehow, particularly up front.

Yours,
Paul

Re: range_agg

From

Pavel Stehule

Date:

10 July 2019, 04:50:23

st 10. 7. 2019 v 6:26 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Tue, Jul 9, 2019 at 12:24 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
>> I afraid so with generic multiragetype there lot of array infrastructure will be duplicated
>
> on second hand - it is true so classic array concat is not optimal for set of ranges, so some functionality should be redefined every time.
>
> I don't know what is possible, but for me - multiranges is special kind (subset) of arrays and can be implement as subset of arrays. I remember other possible kind of arrays - "sets" without duplicates. It is similar case, I think.
>
> Maybe introduction of multirages as new generic type is bad direction, and can be better and more enhanceable in future to introduce some like special kinds of arrays. So for example - unnest can be used directly for arrays and multiranges too - because there will be common base.

Well I'm afraid of that too a bit, although I also agree it may be an
opportunity to share some common behavior and implementation. For
example in the discussion about string syntax, I think keeping it the
same as arrays is nicer for people and lets us share more between the
two types.

That said I don't think a multirange is a subtype of arrays (speaking
as a traditional object-oriented subtype), just something that shares
a lot of the same behavior. I'm inclined to maximize the overlap where
feasible though, e.g. string syntax, UNNEST, indexing, function naming
(`range_length`), etc. Something like Rust traits (or Java interfaces)
seems a closer mental model, not that we have to formalize that
somehow, particularly up front.

A introduction of new generic type can have some other impacts - there can be necessary special support for PL languages.

I understand so it is hard to decide - because we miss some more generic base "sets".

Probably we cannot to think more about it now, and we have to wait to some patches. Later we can see how much code is duplicated and if it is a problem or not.

Regards

Pavel

Yours,
Paul

Re: range_agg

From

David Fetter

Date:

10 July 2019, 06:24:00

On Tue, Jul 09, 2019 at 09:40:59AM -0700, Paul A Jungwirth wrote:
> On Tue, Jul 9, 2019 at 8:51 AM David Fetter <david@fetter.org> wrote:
> > > - A multirange type is an extra thing you get when you define a range
> > > (just like how you get a tstzrange[]). Therefore....
> > > - I don't need separate commands to add/drop multirange types. You get
> > > one when you define a range type, and if you drop a range type it gets
> > > dropped automatically.
> >
> > Yay for fewer manual steps!
> 
> Thanks for taking a look and sharing your thoughts!
> 
> > > - You can have a multirange[].
> >
> > I can see how that would fall out of this, but I'm a little puzzled as
> > to what people might use it for. Aggregates, maybe?
> 
> I don't know either, but I thought it was standard to define a T[] for
> every T. Anyway it doesn't seem difficult.
> 
> > > - You can cast from a multirange to an array. The individual ranges
> > > are always sorted in the result array.
> >
> > Is this so people can pick individual ranges out of the multirange,
> > or...?
> 
> Yes. I want this for foreign keys actually, where I construct a
> multirange and ask for just its first range.

I'm sure I'll understand this better once I get my head around
temporal foreign keys.

> > Speaking of casts, it's possible that a multirange is also a
> > range. Would it make sense to have a cast from multirange to range?
> 
> Hmm, that seems strange to me. You don't cast from an array to one of
> its elements. If we have subscripting, why use casting to get the
> first element?

Excellent point.

> > > - Interesting functions:
> > >   - multirange_length
> >
> > Is that the sum of the lengths of the ranges?  Are we guaranteeing a
> > measure in addition to ordering on ranges now?
> 
> Just the number of disjoint ranges in the multirange.

Thanks for clarifying.

> > > - You can subscript a multirange like you do an array (? This could be
> > > a function instead.)
> >
> > How would this play with the generic subscripting patch in flight?
> 
> I'm not aware of that patch but I guess I better check it out. :-)

Looks like I'm the second to mention it. Worth a review?

> > >   - inverse operator?:
> > >     - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
> >
> > Is that the same as ["(∞, ∞)"] - {"[1,2)"}?
> 
> Yes.
> 
> > I seem to recall that the usual convention (at least in math) is
> > to use intervals that are generally represented as open on the
> > infinity side, but that might not fit how we do things.
> 
> I think it does, unless I'm misunderstanding?

Oh, I was just wondering about the square bracket on the left side of
[null, 1).  It's not super important.

> > >     - not sure we want this or what the symbol should be. I don't like
> > > -mr as an inverse because then mr - mr != mr ++ -mr.
> >
> > !mr , perhaps?
> 
> I like that suggestion. Honestly I'm not sure we even want an inverse,
> but it's so important theoretically we should at least consider
> whether it is appropriate here. Or maybe "inverse" is the wrong word
> for this, or there is a different meaning it should have.

Jeff's suggestion of ~ for complement is better.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Paul Jungwirth

Date:

10 July 2019, 14:55:11

On 7/9/19 11:24 PM, David Fetter wrote:
>>> I seem to recall that the usual convention (at least in math) is
>>> to use intervals that are generally represented as open on the
>>> infinity side, but that might not fit how we do things.
>>
>> I think it does, unless I'm misunderstanding?
> 
> Oh, I was just wondering about the square bracket on the left side of
> [null, 1).  It's not super important.

Ah, I understand now. Just a typo on my part. Thanks for catching it, 
and sorry for the confusion!

>>> !mr , perhaps?
>>
>> I like that suggestion. Honestly I'm not sure we even want an inverse,
>> but it's so important theoretically we should at least consider
>> whether it is appropriate here. Or maybe "inverse" is the wrong word
>> for this, or there is a different meaning it should have.
> 
> Jeff's suggestion of ~ for complement is better.

Okay, thanks. I like it better too.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Alvaro Herrera

Date:

23 July 2019, 22:32:50

Hi Paul,

Just checking if you've had a chance to make progress on this.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Paul A Jungwirth

Date:

24 July 2019, 05:13:07

On Tue, Jul 23, 2019 at 3:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Just checking if you've had a chance to make progress on this.

Not a lot. :-) But I should have more time for it the next few weeks
than I did the last few. I do have some code for creating concrete
multirange types (used when you create a concrete range type) and
filling in a TypeCacheEntry based on the range type oid---which I know
is all very modest progress. I've been working on a multirange_in
function and mostly just learning about Postgres varlena and TOASTed
objects by reading the code for range_in & array_in.

Here is something from my multirangetypes.h:

/*
 * Multiranges are varlena objects, so must meet the varlena convention that
 * the first int32 of the object contains the total object size in bytes.
 * Be sure to use VARSIZE() and SET_VARSIZE() to access it, though!
 */
typedef struct
{
    int32       vl_len_;            /* varlena header (do not touch
directly!) */
    Oid         multirangetypid;    /* multirange type's own OID */
    /*
     * Following the OID are the range objects themselves.
     * Note that ranges are varlena too,
     * depending on whether they have lower/upper bounds
     * and because even their base types can be varlena.
     * So we can't really index into this list.
     */
} MultirangeType;

I'm working on parsing a multirange much like we parse an array,
although it's a lot simpler because it's a single dimension and there
are no nulls.

I know that's not much to go on, but let me know if any of it worries you. :-)

Paul

Re: range_agg

From

Thomas Munro

Date:

01 August 2019, 08:34:19

On Wed, Jul 24, 2019 at 5:13 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Tue, Jul 23, 2019 at 3:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Just checking if you've had a chance to make progress on this.
>
> Not a lot. :-) But I should have more time for it the next few weeks
> than I did the last few.  ...

Hi Paul,

I didn't follow this thread, but as the CF is coming to a close, I'm
interpreting the above to mean that this is being worked on and there
is a good chance of a new patch in time for September.  Therefore I
have moved this entry to that 'fest.

Thanks,

-- 
Thomas Munro
https://enterprisedb.com

Re: range_agg

From

Paul A Jungwirth

Date:

17 August 2019, 17:47:08

On Mon, Jul 8, 2019 at 9:46 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> - A multirange type is an extra thing you get when you define a range
> (just like how you get a tstzrange[]). Therefore....

I've been able to make a little more progress on multiranges the last
few days, but it reminded me of an open question I've had for awhile:
typmods! I see places in the range code that gesture toward supporting
typmods, but none of the existing range types permit them. For
example:

postgres=# select '5'::numeric(4,2);
 numeric
---------
    5.00
(1 row)

postgres=# select '[1,4)'::numrange(4,2);
ERROR:  type modifier is not allowed for type "numrange"
LINE 1: select '[1,4)'::numrange(4,2);

So I'm wondering how seriously I should take this for multiranges? I
guess if a range type did support typmods, it would just delegate to
the underlying element type for their meaning, and so a multirange
should delegate it too? Is there any historical discussion around
typemods on range types?

Thanks!
Paul

Re: range_agg

From

Jeff Davis

Date:

21 August 2019, 05:33:47

On Sat, 2019-08-17 at 10:47 -0700, Paul A Jungwirth wrote:
> So I'm wondering how seriously I should take this for multiranges? I
> guess if a range type did support typmods, it would just delegate to
> the underlying element type for their meaning, and so a multirange
> should delegate it too? Is there any historical discussion around
> typemods on range types?

I did find a few references:

https://www.postgresql.org/message-id/1288029716.8645.4.camel%40jdavis-ux.asterdata.local

https://www.postgresql.org/message-id/20110111191334.GB11603%40fetter.org
https://www.postgresql.org/message-id/1296974485.27157.136.camel@jdavis

I'd be interested in ways that we can use a typmod-like concept to
improve the type system. Unfortunately, typmod is just not
sophisticated enough to do very much because it's lost through function
calls. Improving that would be a separate and challenging project.

So, I wouldn't spend a lot of time on typmod for multiranges.

Regards,
    Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

22 August 2019, 04:54:53

On Tue, Aug 20, 2019 at 10:33 PM Jeff Davis <pgsql@j-davis.com> wrote:
> > Is there any historical discussion around
> > typemods on range types?
>
> I did find a few references:

Thanks for looking those up! It's very interesting to see some of the
original discussion around range types.

Btw this is true of so much, isn't it?:

> It's more a property of the
> column than the type.

Sometimes I think about having a maybe<T> type instead of null/not
null. SQL nulls are already very "monadic" I think but nullability
doesn't travel. I feel like someone ought to write a paper about that,
but I don't know of any. This is tantalizingly close (and a fun read)
but really about something else:
https://www.researchgate.net/publication/266657590_Incomplete_data_what_went_wrong_and_how_to_fix_it
Since you're getting into Rust maybe you can update the wiki page
mentioned in those threads about refactoring the type system. :-)
Anyway sorry for the digression. . . .

> So, I wouldn't spend a lot of time on typmod for multiranges.

Okay, thanks! There is plenty else to do. I think I'm already
supporting it as much as range types do.

Btw I have working multirange_{send,recv,in,out} now, and I
automatically create a multirange type and its array type when someone
creates a new range type. I have a decent start on passing tests and
no compiler warnings. I also have a start on anymultirange and
anyrangearray. (I think I need the latter to support a range-like
constructor function, so you can say `int4multirange(int4range(1,4),
int4range(8,10))`.) I want to get the any* types done and improve the
test coverage, and then I'll probably be ready to share a patch.

Here are a couple other questions:

- Does anyone have advice for the typanalyze function? I feel pretty
out of my depth there (although I haven't looked into typanalyze stuff
very deeply yet). I can probably get some inspiration from
range_typanalyze and array_typanalyze, but those are both quite
detailed (their statistics functions that is).

- What should a multirange do if you give it an empty range? I'm
thinking it should just ignore it, but then `'{}'::int4multirange =
'{empty}'::int4multirange`. Does that seem okay? (It does to me
actually, if we think of `empty` as the additive identity. Similarly
mr + empty = mr.

- What should a multirange do if you give it a null, like
`int4multirange(int4range(1,4), null)`. I'm thinking it should be
null, just like mr + null = null. Right?

Thanks!
Paul

Re: range_agg

From

Jeff Davis

Date:

26 August 2019, 18:34:27

On Wed, 2019-08-21 at 21:54 -0700, Paul A Jungwirth wrote:
> Sometimes I think about having a maybe<T> type instead of null/not
> null. SQL nulls are already very "monadic" I think but nullability
> doesn't travel.

Yeah, that would be a great direction, but there is some additional
complexity that we'd need to deal with that a "normal" compiler does
not:

  * handling both existing global types (like int) as well as on-the-
fly types like Maybe<Either<Int,Bool>>
  * types need to do more things, like have serialized representations,
interface with indexing strategies, and present the optimizer with
choices that may influence which indexes can be used or not
  * at some point needs to work with normal SQL types and NULL
  * there are a lot of times we care not just whether a type is
sortable, but we actually care about the way it's sorted (e.g.
localization). typeclasses+newtype would probably be unacceptable for
trying to match SQL behavior here.

I'm all in favor of pursuing this, but it's not going to bear fruit
very soon.

> Btw I have working multirange_{send,recv,in,out} now, and I
> automatically create a multirange type and its array type when
> someone
> creates a new range type. I have a decent start on passing tests and
> no compiler warnings. I also have a start on anymultirange and
> anyrangearray. (I think I need the latter to support a range-like
> constructor function, so you can say `int4multirange(int4range(1,4),
> int4range(8,10))`.) I want to get the any* types done and improve the
> test coverage, and then I'll probably be ready to share a patch.

Awesome!

> Here are a couple other questions:
> 
> - Does anyone have advice for the typanalyze function? I feel pretty
> out of my depth there (although I haven't looked into typanalyze
> stuff
> very deeply yet). I can probably get some inspiration from
> range_typanalyze and array_typanalyze, but those are both quite
> detailed (their statistics functions that is).

I think Alexander Korotkov did a lot of the heavy lifting here, perhaps
he has a comment? I'd keep it simple for now if you can, and we can try
to improve it later.

> - What should a multirange do if you give it an empty range? I'm
> thinking it should just ignore it, but then `'{}'::int4multirange =
> '{empty}'::int4multirange`. Does that seem okay? (It does to me
> actually, if we think of `empty` as the additive identity. Similarly
> mr + empty = mr.

I agree. Multiranges are more than just an array of ranges, so they
coalesce into some canonical form.

> - What should a multirange do if you give it a null, like
> `int4multirange(int4range(1,4), null)`. I'm thinking it should be
> null, just like mr + null = null. Right?

Yes. NULL is for the overall multirange datum (that is, a multirange
column can be NULL), but I don't think individual parts of a datatype
make much sense as NULL. So, I agree that mr + null = null. (Note that
arrays and records can have NULL parts, but I don't see a reason we
should follow those examples for multiranges.)

Regards,
    Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

01 September 2019, 13:26:11

> > Btw I have working multirange_{send,recv,in,out} now. . . .

Just about all the other operators are done too, but I wonder what
symbols people like for union and minus? Range uses + for union. I
have working code and tests that adds this:

r + mr = mr
mr + r = mr
mr + mr = mr

But I would like to use a different symbol instead, like ++, so I can
have all four:

r ++ r = mr
r ++ mr = mr
mr ++ r = mr
mr ++ mr = mr

(The existing r + r operator throws an error if the inputs have a gap.)

The trouble is that ++ isn't allowed. (Neither is --.) From
https://www.postgresql.org/docs/11/sql-createoperator.html :

> A multicharacter operator name cannot end in + or -, unless the name also contains at least one of these characters:
>     ~ ! @ # % ^ & | ` ?

So are there any other suggestions? I'm open to arguments that I
should just use +, but I think having a way to add two simple ranges
and get a multirange would be nice too, so my preference is to find a
new operator. It should work with minus and intersection (* for
ranges) too. Some proposals:

+* and -* and ** (* as in regex "zero or many" reminds me of how a
multirange holds zero or many ranges. ** is confusing though because
it's like exponentiation.)

@+ and @- and @* (I dunno why but I kind of like it. We already have @> and <@.)

<+> and <-> and <*> (I was hoping for (+) etc for the math connection
to circled operators, but this is close. Maybe this would be stronger
if multirange_{in,out} used <> delims instead of {}, although I also
like how {} is consistent with arrays.)

Anyone else have any thoughts?

Thanks,
Paul

Re: range_agg

From

Jeff Davis

Date:

05 September 2019, 17:15:43

On Sun, 2019-09-01 at 06:26 -0700, Paul A Jungwirth wrote:
> @+ and @- and @* (I dunno why but I kind of like it. We already have
> @> and <@.)

I think I like this proposal best; it reminds me of perl. Though some
might say that's an argument against it.

Regards,
    Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

05 September 2019, 17:45:55

On Thu, Sep 5, 2019 at 10:15 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Sun, 2019-09-01 at 06:26 -0700, Paul A Jungwirth wrote:
> > @+ and @- and @* (I dunno why but I kind of like it. We already have
> > @> and <@.)
>
> I think I like this proposal best; it reminds me of perl. Though some
> might say that's an argument against it.

Thanks Jeff, it's my favorite too. :-) Strangely it feels the hardest
to justify. Right now I have + and - and * implemented but I'll change
them to @+ and @- and @* so that I can support `range R range =
multirange` too.

Btw is there any reason to send a "preview" patch with my current
progress, since we're starting a new commit fest? Here is what I have
left to do:

- Change those three operators.
- Write range_intersect_agg. (range_agg is done but needs some tests
before I commit it.)
- Write documentation.
- Add multiranges to resolve_generic_type, and figure out how to test
that (see the other thread about two latent range-related bugs there).
- Rebase on current master. (It should be just a few weeks behind right now.)
- Run pgindent to make sure I'm conforming to whitespace/style guidelines.
- Split it up into a few separate patch files.

Right now I'm planning to do all that before sending a patch. I'm
happy to send something something in-progress too, but I don't want to
waste any reviewers' time. If folks want an early peak though let me
know. (You can also find my messy progress at
https://github.com/pjungwir/postgresql/tree/multirange)

Also here are some other items that won't be in my next patch, but
should probably be done (maybe by someone else but I'm happy to figure
it out too) before this is really committed:

- typanalyze
- selectivity
- gist support
- spgist support

If anyone would like to help with those, let me know. :-)

Yours,
Paul

Re: range_agg

From

Jeff Davis

Date:

05 September 2019, 18:52:37

On Thu, 2019-09-05 at 10:45 -0700, Paul A Jungwirth wrote:
> Right now I'm planning to do all that before sending a patch. I'm
> happy to send something something in-progress too, but I don't want
> to
> waste any reviewers' time. If folks want an early peak though let me
> know. (You can also find my messy progress at
> https://github.com/pjungwir/postgresql/tree/multirange)

Sounds good. The rule I use is: "will the feedback I get be helpful, or
just tell me about obvious problems I already know about".

Regards,
    Jeff Davis

Re: range_agg

From

Paul A Jungwirth

Date:

22 September 2019, 04:50:55

On Thu, Sep 5, 2019 at 11:52 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2019-09-05 at 10:45 -0700, Paul A Jungwirth wrote:
> > Right now I'm planning to do all that before sending a patch. I'm
> > happy to send something something in-progress too, but I don't want
> > to
> > waste any reviewers' time. If folks want an early peak though let me
> > know. (You can also find my messy progress at
> > https://github.com/pjungwir/postgresql/tree/multirange)
>
> Sounds good. The rule I use is: "will the feedback I get be helpful, or
> just tell me about obvious problems I already know about".

Here are some patches to add multiranges. I tried to split things up a
bit but most things landed in parts 1 & 2.

Things I haven't done (but would be interested in doing or getting help with):

- gist opclass
- spgist opclass
- typanalyze
- selectivity
- anyrangearray
- anymultirangearray?
- UNNEST for multirange and/or a way to convert it to an array
- indexing/subscripting (see patch for standardized subscripting)

Attachment

Re: range_agg

From

Alvaro Herrera

Date:

26 September 2019, 21:13:49

Hello Paul, I've started to review this patch.  Here's a few minor
things I ran across -- mostly compiler warnings (is my compiler too
ancient?).  You don't have to agree with every fix -- feel free to use
different fixes if you have them.  Also, feel free to squash them onto
whatever commit you like (I think they all belong onto 0001 except the
last which seems to be for 0002).

Did you not push your latest version to your github repo?  I pulled from
there and branch 'multirange' does not seem to match what you posted.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: range_agg

From

Paul A Jungwirth

Date:

06 November 2019, 23:02:35

On Thu, Sep 26, 2019 at 2:13 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Hello Paul, I've started to review this patch.  Here's a few minor
> things I ran across -- mostly compiler warnings (is my compiler too
> ancient?).  You don't have to agree with every fix -- feel free to use
> different fixes if you have them.  Also, feel free to squash them onto
> whatever commit you like (I think they all belong onto 0001 except the
> last which seems to be for 0002).

Hi Alvaro, sorry, I missed your note from September. I really
appreciate your review and will take a look at your suggested changes!
I just opened this thread to post a rebased set patches (especially
because of the `const` additions to range functions). Maybe it's not
that helpful since they don't include your changes yet but here they
are anyway. I'll post some more with your changes shortly.

> Did you not push your latest version to your github repo?  I pulled from
> there and branch 'multirange' does not seem to match what you posted.

Hmm, I'll take a look. Before I made the v3 files I switched to
multirange-patch so I could squash things and use git to generate one
patch file per commit. So `multirange` isn't rebased as currently as
`multirange-patch`. If you don't mind a force-push I can update
`multirange` to be the same.

Yours,
Paul

Attachment

Re: range_agg

From

Paul A Jungwirth

Date:

07 November 2019, 02:35:28

On Wed, Nov 6, 2019 at 3:02 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Thu, Sep 26, 2019 at 2:13 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Hello Paul, I've started to review this patch.  Here's a few minor
> > things I ran across -- mostly compiler warnings (is my compiler too
> > ancient?).
> I just opened this thread to post a rebased set patches (especially
> because of the `const` additions to range functions). Maybe it's not
> that helpful since they don't include your changes yet but here they
> are anyway. I'll post some more with your changes shortly.

Here is another batch of patches incorporating your improvements. It
seems like almost all the warnings were about moving variable
declarations above any other statements. For some reason I don't get
warnings about that on my end (compiling on OS X):

platter:postgres paul$ gcc --version
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.12)
Target: x86_64-apple-darwin18.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

For configure I'm saying this:

./configure CFLAGS=-ggdb 5-Og -g3 -fno-omit-frame-pointer
--enable-tap-tests --enable-cassert --enable-debug
--prefix=/Users/paul/local

Any suggestions to get better warnings? On my other patch I got
feedback about the very same kind. I could just compile on Linux but
it's nice to work on this away from my desk on the laptop. Maybe
installing a real gcc is the way to go.

Thanks,
Paul

Attachment

Re: range_agg

From

Pavel Stehule

Date:

19 November 2019, 09:16:47

čt 7. 11. 2019 v 3:36 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Wed, Nov 6, 2019 at 3:02 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Thu, Sep 26, 2019 at 2:13 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Hello Paul, I've started to review this patch. Here's a few minor
> > things I ran across -- mostly compiler warnings (is my compiler too
> > ancient?).
> I just opened this thread to post a rebased set patches (especially
> because of the `const` additions to range functions). Maybe it's not
> that helpful since they don't include your changes yet but here they
> are anyway. I'll post some more with your changes shortly.

Here is another batch of patches incorporating your improvements. It
seems like almost all the warnings were about moving variable
declarations above any other statements. For some reason I don't get
warnings about that on my end (compiling on OS X):

platter:postgres paul$ gcc --version
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.12)
Target: x86_64-apple-darwin18.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

For configure I'm saying this:

./configure CFLAGS=-ggdb 5-Og -g3 -fno-omit-frame-pointer
--enable-tap-tests --enable-cassert --enable-debug
--prefix=/Users/paul/local

Any suggestions to get better warnings? On my other patch I got
feedback about the very same kind. I could just compile on Linux but
it's nice to work on this away from my desk on the laptop. Maybe
installing a real gcc is the way to go.

I tested last patches. I found some issues

1. you should not to try patch catversion.

2. there is warning

parse_coerce.c: In function ‘enforce_generic_type_consistency’:
parse_coerce.c:1975:11: warning: ‘range_typelem’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1975 | else if (range_typelem != elem_typeid)

3. there are problems with pg_upgrade. Regress tests fails

command: "/home/pavel/src/postgresql.master/tmp_install/usr/local/pgsql/bin/pg_restore" --host /home/pavel/src/postgresql.master/src/b
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "regression"
pg_restore: connecting to new database "regression"
pg_restore: connecting to database "regression" as user "pavel"
pg_restore: creating DATABASE PROPERTIES "regression"
pg_restore: connecting to new database "regression"
pg_restore: connecting to database "regression" as user "pavel"
pg_restore: creating pg_largeobject "pg_largeobject"
pg_restore: creating SCHEMA "fkpart3"
pg_restore: creating SCHEMA "fkpart4"
pg_restore: creating SCHEMA "fkpart5"
pg_restore: creating SCHEMA "fkpart6"
pg_restore: creating SCHEMA "mvtest_mvschema"
pg_restore: creating SCHEMA "regress_indexing"
pg_restore: creating SCHEMA "regress_rls_schema"
pg_restore: creating SCHEMA "regress_schema_2"
pg_restore: creating SCHEMA "testxmlschema"
pg_restore: creating TRANSFORM "TRANSFORM FOR integer LANGUAGE "sql""
pg_restore: creating TYPE "public.aggtype"
pg_restore: creating TYPE "public.arrayrange"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 1653; 1247 17044 TYPE arrayrange pavel
pg_restore: error: could not execute query: ERROR: pg_type array OID value not set when in binary upgrade mode
Command was:.
-- For binary upgrade, must preserve pg_type oid
SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('17044'::pg_catalog.oid);

-- For binary upgrade, must preserve pg_type array oid
SELECT pg_catalog.binary_upgrade_set_next_array_pg_type_oid('17045'::pg_catalog.oid);

CREATE TYPE "public"."arrayrange" AS RANGE (
subtype = integer[]
);

4. there is a problem with doc

echo "<!ENTITY version \"13devel\">"; \
echo "<!ENTITY majorversion \"13\">"; \
} > version.sgml
'/usr/bin/perl' ./mk_feature_tables.pl YES ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-supported.sgml
'/usr/bin/perl' ./mk_feature_tables.pl NO ../../../src/backend/catalog/sql_feature_packages.txt ../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
'/usr/bin/perl' ./generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
'/usr/bin/perl' ./generate-keywords-table.pl . > keywords-table.sgml
/usr/bin/xmllint --path . --noout --valid postgres.sgml
extend.sgml:281: parser error : Opening and ending tag mismatch: para line 270 and type
type of the ranges in an </type>anymultirange</type>.
^
extend.sgml:281: parser error : Opening and ending tag mismatch: sect2 line 270 and type
type of the ranges in an </type>anymultirange</type>.
^
extend.sgml:282: parser error : Opening and ending tag mismatch: sect1 line 270 and para
</para>
^
extend.sgml:324: parser error : Opening and ending tag mismatch: chapter line 270 and sect2
</sect2>
^

I am not sure how much is correct to use <literallayout class="monospaced"> in doc. It is used for ranges, and multiranges, but no in other places

All other looks well

Pavel

Thanks,
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

20 November 2019, 05:49:06

On Tue, Nov 19, 2019 at 1:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
> I tested last patches. I found some issues

Thank you for the review!

> 1. you should not to try patch catversion.

I've seen discussion on pgsql-hackers going both ways, but I'll leave
it out of future patches. :-)

> 2. there is warning
>
> parse_coerce.c: In function ‘enforce_generic_type_consistency’:
> parse_coerce.c:1975:11: warning: ‘range_typelem’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>  1975 |   else if (range_typelem != elem_typeid)

Fixed locally, will include in my next patch.

> 3. there are problems with pg_upgrade. Regress tests fails
> . . .
> pg_restore: while PROCESSING TOC:
> pg_restore: from TOC entry 1653; 1247 17044 TYPE arrayrange pavel
> pg_restore: error: could not execute query: ERROR:  pg_type array OID value not set when in binary upgrade mode

I see what's going on here. (Sorry if this verbose explanation is
obvious; it's as much for me as for anyone.) With pg_upgrade the
values of pg_type.oid and pg_type.typarray must be the same before &
after. For built-in types there's no problem, because those are fixed
by pg_type.dat. But for user-defined types we have to take extra steps
to make sure they don't change. CREATE TYPE always uses two oids: one
for the type and one for the type's array type. But now when you
create a range type we use *four*: the range type, the array of that
range, the multirange type, and the array of that multirange.
Currently when you run pg_dump in "binary mode" (i.e. as part of
pg_upgrade) it includes calls to special functions to set the next oid
to use for pg_type.oid and pg_type.typarray. Then CREATE TYPE also has
special "binary mode" code to check those variables and use those oids
(e.g. AssignTypeArrayOid). After using them once it sets them back to
InvalidOid so it doesn't keep using them. So I guess I need to add
code to pg_dump so that it also outputs calls to two new special
functions that similarly set the oid to use for the next multirange
and multirange[]. For v12->v13 it will chose high-enough oids like we
do already for arrays of domains. (For other upgrades it will use the
existing value.) And then I can change the CREATE TYPE code to check
those pre-set values when obtaining the next oid. Does that sound like
the right approach here?

> 4. there is a problem with doc
>
> extend.sgml:281: parser error : Opening and ending tag mismatch: para line 270 and type
>      type of the ranges in an </type>anymultirange</type>.

Hmm, yikes, I'll fix that!

> I am not sure how much is correct to use <literallayout class="monospaced"> in doc. It is used for ranges, and
multiranges,but no in other places 

I could use some advice here. Many operators seem best presented in
groups of four, where only their parameter types change, for example:

int8range < int8range
int8range < int8multirange
int8multirange < int8range
int8multirange < int8multirange

All I really want is to show those separated by line breaks. I
couldn't find any other examples of that happening inside a table cell
though (i.e. inside <row><entry></entry></row>). What is the best way
to do that?

Thanks,
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

20 November 2019, 19:32:10

On Tue, Nov 19, 2019 at 9:49 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On Tue, Nov 19, 2019 at 1:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > Hi
> > I tested last patches. I found some issues
>
> Thank you for the review!

Here is an updated patch series fixing the problems from that last
review. I would still like some direction about the doc formatting:

> > I am not sure how much is correct to use <literallayout class="monospaced"> in doc. It is used for ranges, and
multiranges,but no in other places
 
>
> I could use some advice here. Many operators seem best presented in
> groups of four, where only their parameter types change, for example:
>
> int8range < int8range
> int8range < int8multirange
> int8multirange < int8range
> int8multirange < int8multirange
>
> All I really want is to show those separated by line breaks. I
> couldn't find any other examples of that happening inside a table cell
> though (i.e. inside <row><entry></entry></row>). What is the best way
> to do that?

Thanks,
Paul

Attachment

Re: range_agg

From

Pavel Stehule

Date:

21 November 2019, 09:06:01

st 20. 11. 2019 v 20:32 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Tue, Nov 19, 2019 at 9:49 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On Tue, Nov 19, 2019 at 1:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > Hi
> > I tested last patches. I found some issues
>
> Thank you for the review!

Here is an updated patch series fixing the problems from that last
review. I would still like some direction about the doc formatting:

yes, these bugs are fixed

there are not compilation's issues

tests passed

doc is ok

I have notes

1. the chapter should be renamed to "Range Functions and Operators" to "Range and Multirange Functions and Operators"

But now the doc is not well readable - there is not clean, what functions are for range type, what for multirange and what for both

2. I don't like introduction "safe" operators - now the basic operators are doubled, and nobody without documentation will use @* operators.

It is not intuitive. I think is better to map this functionality to basic operators +- * and implement it just for pairs (Multirange, Multirange) and (Multirange, Range) if it is possible

It's same relation line Numeric X integer. There should not be introduced new operators. If somebody need it for ranges, then he can use cast to multirange, and can continue.

The "safe" operators can be implement on user space - but should not be default solution.

3. There are not prepared casts -

postgres=# select int8range(10,15)::int8multirange;
ERROR: cannot cast type int8range to int8multirange
LINE 1: select int8range(10,15)::int8multirange;
^

There should be some a) fully generic solution, or b) possibility to build implicit cast when any multirange type is created.

Regards

Pavel

> > I am not sure how much is correct to use <literallayout class="monospaced"> in doc. It is used for ranges, and multiranges, but no in other places
>
> I could use some advice here. Many operators seem best presented in
> groups of four, where only their parameter types change, for example:
>
> int8range < int8range
> int8range < int8multirange
> int8multirange < int8range
> int8multirange < int8multirange
>
> All I really want is to show those separated by line breaks. I
> couldn't find any other examples of that happening inside a table cell
> though (i.e. inside <row><entry></entry></row>). What is the best way
> to do that?

Personally I think it should be cleaned. Mainly if there is not visible differences. But range related doc it uses, so it is consistent with it. And then this is not big issue.

Thanks,
Paul

Re: range_agg

From

Paul Jungwirth

Date:

21 November 2019, 20:15:28

On 11/21/19 1:06 AM, Pavel Stehule wrote:
> 2. I don't like introduction "safe" operators - now the basic operators 
> are doubled, and nobody without documentation will use @* operators.
> 
> It is not intuitive. I think is better to map this functionality to 
> basic operators +- * and implement it just for pairs (Multirange, 
> Multirange) and (Multirange, Range) if it is possible
> 
> It's same relation line Numeric X integer. There should not be 
> introduced new operators. If somebody need it for ranges, then he can 
> use cast to multirange, and can continue.
 > [snip]
> 3. There are not prepared casts -
> 
> postgres=# select int8range(10,15)::int8multirange;
> ERROR:  cannot cast type int8range to int8multirange
> LINE 1: select int8range(10,15)::int8multirange;
>                                 ^
> There should be some a) fully generic solution, or b) possibility to 
> build implicit cast when any multirange type is created.

Okay, I like the idea of just having `range + range` and `multirange + 
multirange`, then letting you cast between ranges and multiranges. The 
analogy to int/numeric seems strong. I guess if you cast a multirange 
with more than one element to a range it will raise an error. That will 
let me clean up the docs a lot too.

Thanks!

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Pavel Stehule

Date:

22 November 2019, 05:21:04

čt 21. 11. 2019 v 21:15 odesílatel Paul Jungwirth <pj@illuminatedcomputing.com> napsal:

On 11/21/19 1:06 AM, Pavel Stehule wrote:
> 2. I don't like introduction "safe" operators - now the basic operators
> are doubled, and nobody without documentation will use @* operators.
>
> It is not intuitive. I think is better to map this functionality to
> basic operators +- * and implement it just for pairs (Multirange,
> Multirange) and (Multirange, Range) if it is possible
>
> It's same relation line Numeric X integer. There should not be
> introduced new operators. If somebody need it for ranges, then he can
> use cast to multirange, and can continue.
> [snip]
> 3. There are not prepared casts -
>
> postgres=# select int8range(10,15)::int8multirange;
> ERROR: cannot cast type int8range to int8multirange
> LINE 1: select int8range(10,15)::int8multirange;
> ^
> There should be some a) fully generic solution, or b) possibility to
> build implicit cast when any multirange type is created.

Okay, I like the idea of just having `range + range` and `multirange +
multirange`, then letting you cast between ranges and multiranges. The
analogy to int/numeric seems strong. I guess if you cast a multirange
with more than one element to a range it will raise an error. That will
let me clean up the docs a lot too.

I though about it, and I think so cast from multirange to range is useless, minimally it should be explicit.

On second hand - from range to multirange should be implicit.

The original patch did

1. MR @x MR = MR

2. R @x R = MR

3. MR @x R = MR

I think so @1 & @3 has sense, but without introduction of special operator. @2 is bad and can be solved by cast one or second operand.

Pavel

Thanks!

--
Paul ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Paul A Jungwirth

Date:

22 November 2019, 16:25:41

On Thu, Nov 21, 2019 at 9:21 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I though about it, and I think so cast from multirange to range is useless, minimally it should be explicit.

I agree: definitely not implicit. If I think of a good reason for it
I'll add it, but otherwise I'll leave it out.

> On second hand - from range to multirange should be implicit.

Okay.

> The original patch did
>
> 1. MR @x MR = MR
> 2. R @x R = MR
> 3. MR @x R = MR
>
> I think so @1 & @3 has sense, but without introduction of special operator. @2 is bad and can be solved by cast one
orsecond operand.

Yes. I like how #2 follows the int/numeric analogy: if you want a
numeric result from `int / int` you can say `int::numeric / int`.

So my understanding is that conventionally cast functions are named
after the destination type, e.g. int8multirange(int8range) would be
the function to cast an int8range to an int8multirange. And
int8range(int8multirange) would go the other way (if we do that). We
already use these names for the "constructor" functions, but I think
that is actually okay. For the multirange->range cast, the parameter
type & number are different, so there is no real conflict. For the
range->multirange cast, the parameter type is the same, and the
constructor function is variadic---but I think that's fine, because
the semantics are the same: build a multirange whose only element is
the given range:

regression=# select int8multirange(int8range(1,2));
 int8multirange
----------------
 {[1,2)}
(1 row)

Even the NULL handling is already what we want:

regression=# select int8multirange(null);
 int8multirange
----------------
 NULL
(1 row)

So I think it's fine, but I'm curious whether you see any problems
there? (I guess if there is a problem it's no big deal to name the
function something else....)

Thanks,
Paul

Re: range_agg

From

Pavel Stehule

Date:

22 November 2019, 16:57:12

pá 22. 11. 2019 v 17:25 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Thu, Nov 21, 2019 at 9:21 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I though about it, and I think so cast from multirange to range is useless, minimally it should be explicit.

I agree: definitely not implicit. If I think of a good reason for it
I'll add it, but otherwise I'll leave it out.

> On second hand - from range to multirange should be implicit.

Okay.

> The original patch did
>
> 1. MR @x MR = MR
> 2. R @x R = MR
> 3. MR @x R = MR
>
> I think so @1 & @3 has sense, but without introduction of special operator. @2 is bad and can be solved by cast one or second operand.

Yes. I like how #2 follows the int/numeric analogy: if you want a
numeric result from `int / int` you can say `int::numeric / int`.

So my understanding is that conventionally cast functions are named
after the destination type, e.g. int8multirange(int8range) would be
the function to cast an int8range to an int8multirange. And
int8range(int8multirange) would go the other way (if we do that). We
already use these names for the "constructor" functions, but I think
that is actually okay. For the multirange->range cast, the parameter
type & number are different, so there is no real conflict. For the
range->multirange cast, the parameter type is the same, and the
constructor function is variadic---but I think that's fine, because
the semantics are the same: build a multirange whose only element is
the given range:

regression=# select int8multirange(int8range(1,2));
int8multirange
----------------
{[1,2)}
(1 row)

Even the NULL handling is already what we want:

regression=# select int8multirange(null);
int8multirange
----------------
NULL
(1 row)

So I think it's fine, but I'm curious whether you see any problems
there? (I guess if there is a problem it's no big deal to name the
function something else....)

It looks well now. I am not sure about benefit of cast from MR to R if MR has more than one values. But it can be there for completeness.

I think in this moment is not important to implement all functionality - for start is good to implement basic functionality that can be good. It can be enhanced step by step in next versions.

Pavel

Thanks,
Paul

Re: range_agg

From

Alvaro Herrera

Date:

30 November 2019, 02:10:26

Hi Paul

I'm starting to look at this again.  Here's a few proposed changes to
the current code as I read along.

I noticed that 0001 does not compile on its own.  It works as soon as I
add 0002.  What this is telling me is that the current patch split is
not serving any goals; I think it's okay to merge them all into a single
commit.  If you want to split it in smaller pieces, it needs to be stuff
that can be committed separately.

One possible candidate for that is the new makeUniqueTypeName function
you propose.  I added this comment to explain what it does:

 /*
- * makeUniqueTypeName: Prepend underscores as needed until we make a name that
- * doesn't collide with anything. Tries the original typeName if requested.
+ * makeUniqueTypeName
+ *     Generate a unique name for a prospective new type
+ *
+ * Given a typeName of length namelen, produce a new name into dest (an output
+ * buffer allocated by caller, which must of length NAMEDATALEN) by prepending
+ * underscores, until a non-conflicting name results.
+ *
+ * If tryOriginalName, first try with zero underscores.
  *
  * Returns the number of underscores added.
  */

This seems a little too strange; why not have the function allocate its
output buffer instead, and return it?  In order to support the case of
it failing to find an appropriate name, have it return NULL, for caller
to throw the "could not form ... name" error.

The attached 0001 simplifies makeMultirangeConstructors; I think it was
too baroque just because of it trying to imitate makeRangeConstructors;
it seems easier to just repeat the ProcedureCreate call than trying to
be clever.  (makeRangeConstructors's comment is lying about the number
of constructors it creates after commit df73584431e7, BTW.  But note
that the cruft in it to support doing it twice is not as much as in the
new code).

The other patches should be self-explanatory (I already submitted 0002
previously.)

I'll keep going over the rest of it.  Thanks!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: range_agg

From

Alvaro Herrera

Date:

20 December 2019, 17:43:21

I took the liberty of rebasing this series on top of recent branch
master.  The first four are mostly Paul's originals, except for conflict
fixes; the rest are changes I'm proposing as I go along figuring out the
whole thing.  (I would post just my proposed changes, if it weren't for
the rebasing; apologies for the messiness.)

I am not convinced that adding TYPTYPE_MULTIRANGE is really necessary.
Why can't we just treat those types as TYPTYPE_RANGE and distinguish
them using TYPCATEGORY_MULTIRANGE?  That's what we do for arrays.  I'll
try to do that next.

I think the algorithm for coming up with the multirange name is
suboptimal.  It works fine with the name is short enough that we can add
a few extra letters, but otherwise the result look pretty silly.  I
think we can still improve on that.  I propose to make
makeUniqueTypeName accept a suffix, and truncate the letters that appear
*before* the suffix rather than truncating after it's been appended.

There's a number of ereport() calls that should become elog(); and a
bunch of others that should probably acquire errcode() and be
reformatted per our style.


Regarding Pavel's documentation markup issue,

> I am not sure how much is correct to use <literallayout class="monospaced">
> in doc. It is used for ranges, and multiranges, but no in other places

I looked at the generated PDF and the table looks pretty bad; the words
in those entries overlap the words in the cell to their right.  But that
also happens with entries that do not use <literallayout class="x">!
See [1] for an example of the existing docs being badly formatted.  The
docbook documentation [2] seems to suggest that what Paul used is the
appropriate way to do this.

Maybe a way is to make each entry have more than one row -- so the
example would appear below the other three fields in its own row, and
would be able to use the whole width of the table.

[1] https://twitter.com/alvherre/status/1205563468595781633
[2] https://tdg.docbook.org/tdg/5.1/literallayout.html 

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: range_agg

From

Pavel Stehule

Date:

20 December 2019, 18:19:07

pá 20. 12. 2019 v 18:43 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com> napsal:

I took the liberty of rebasing this series on top of recent branch
master. The first four are mostly Paul's originals, except for conflict
fixes; the rest are changes I'm proposing as I go along figuring out the
whole thing. (I would post just my proposed changes, if it weren't for
the rebasing; apologies for the messiness.)

I am not convinced that adding TYPTYPE_MULTIRANGE is really necessary.
Why can't we just treat those types as TYPTYPE_RANGE and distinguish
them using TYPCATEGORY_MULTIRANGE? That's what we do for arrays. I'll
try to do that next.

I think the algorithm for coming up with the multirange name is
suboptimal. It works fine with the name is short enough that we can add
a few extra letters, but otherwise the result look pretty silly. I
think we can still improve on that. I propose to make
makeUniqueTypeName accept a suffix, and truncate the letters that appear
*before* the suffix rather than truncating after it's been appended.

There's a number of ereport() calls that should become elog(); and a
bunch of others that should probably acquire errcode() and be
reformatted per our style.

Regarding Pavel's documentation markup issue,

> I am not sure how much is correct to use <literallayout class="monospaced">
> in doc. It is used for ranges, and multiranges, but no in other places

I looked at the generated PDF and the table looks pretty bad; the words
in those entries overlap the words in the cell to their right. But that
also happens with entries that do not use <literallayout class="x">!
See [1] for an example of the existing docs being badly formatted. The
docbook documentation [2] seems to suggest that what Paul used is the
appropriate way to do this.

Maybe a way is to make each entry have more than one row -- so the
example would appear below the other three fields in its own row, and
would be able to use the whole width of the table.

I had a talk with Paul about possible simplification of designed operators. Last message from Paul was - he is working on new version.

Regards

Pavel

[1] https://twitter.com/alvherre/status/1205563468595781633
[2] https://tdg.docbook.org/tdg/5.1/literallayout.html

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Alvaro Herrera

Date:

20 December 2019, 19:13:06

On 2019-Dec-20, Alvaro Herrera wrote:

> I am not convinced that adding TYPTYPE_MULTIRANGE is really necessary.
> Why can't we just treat those types as TYPTYPE_RANGE and distinguish
> them using TYPCATEGORY_MULTIRANGE?  That's what we do for arrays.  I'll
> try to do that next.

I think this can be simplified if we make the the multirange's
pg_type.typelem carry the base range's OID (the link in the other
direction already appears as pg_range.mltrngtypid, though I'd recommend
renaming that to pg_range.rngmultitypid to maintain the "rng" prefix
convention).  Then we can distinguish a multirange from a plain range
easily, both of which have typtype as TYPTYPE_RANGE, because typelem !=
0 in a multi.  That knowledge can be encapsulated easily in
type_is_multirange and pg_dump's getTypes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Paul A Jungwirth

Date:

20 December 2019, 19:20:28

On Fri, Dec 20, 2019 at 10:19 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I had a talk with Paul about possible simplification of designed operators. Last message from Paul was - he is
workingon new version.
 

Thanks Alvaro & Pavel for helping move this forward. I've added the
casts but they aren't used automatically for things like
`range_union(r, mr)` or `mr + r`, even though they are implicit.
That's because the casts are for concrete types (e.g. int4range ->
int4multirange) but the functions & operators are for polymorphic
types (anymultirange + anymultirange). So I'd like to get some
feedback about the best way to proceed.

Is it permitted to add casts with polymorphic inputs & outputs? Is
that something that we would actually want to do? I'd probably need
both the polymorphic and concrete casts so that you could still say
`int4range(1,2)::int4multirange`.

Should I change the coerce code to look for casts among concrete types
when the function has polymorphic types? I'm pretty scared to do
something like that though, both because of the complexity and lest I
cause unintended effects.

Should I just give up on implicit casts and require you to specify
one? That makes it a little more annoying to mix range & multirange
types, but personally I'm okay with that. This is my preferred
approach.

I have some time over the holidays to work on the other changes Alvaro
has suggested.

Thanks,
Paul

Re: range_agg

From

Alvaro Herrera

Date:

20 December 2019, 19:29:01

On 2019-Dec-20, Paul A Jungwirth wrote:

> Is it permitted to add casts with polymorphic inputs & outputs? Is
> that something that we would actually want to do? I'd probably need
> both the polymorphic and concrete casts so that you could still say
> `int4range(1,2)::int4multirange`.

I'm embarrased to admit that I don't grok the type system well enough
(yet) to answer this question.

> Should I change the coerce code to look for casts among concrete types
> when the function has polymorphic types? I'm pretty scared to do
> something like that though, both because of the complexity and lest I
> cause unintended effects.

Yeah, I suggest to stay away from that.  I think this multirange thing
is groundbreaking enough that we don't need to cause additional
potential breakage.

> Should I just give up on implicit casts and require you to specify
> one? That makes it a little more annoying to mix range & multirange
> types, but personally I'm okay with that. This is my preferred
> approach.

+1

> I have some time over the holidays to work on the other changes Alvaro
> has suggested.

I hope not to have made things worse by posting a rebase.  Anyway,
that's the reason I posted my other changes separately.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Tom Lane

Date:

20 December 2019, 21:15:24

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Dec-20, Paul A Jungwirth wrote:
>> Is it permitted to add casts with polymorphic inputs & outputs? Is
>> that something that we would actually want to do? I'd probably need
>> both the polymorphic and concrete casts so that you could still say
>> `int4range(1,2)::int4multirange`.

> I'm embarrased to admit that I don't grok the type system well enough
> (yet) to answer this question.

I would say no; if you want behavior like that you'd have to add code for
it into the coercion machinery, much like the casts around, say, types
record and record[] versus named composites and arrays of same.  Expecting
the generic cast machinery to do the right thing would be foolhardy.

In any case, even if it did do the right thing, you'd still need
some additional polymorphic type to express the behavior you wanted,
no?  So it's not clear there'd be any net savings of effort.

            regards, tom lane

Re: range_agg

From

Paul A Jungwirth

Date:

04 January 2020, 05:29:33

On Fri, Dec 20, 2019 at 11:29 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> > Should I just give up on implicit casts and require you to specify
> > one? That makes it a little more annoying to mix range & multirange
> > types, but personally I'm okay with that. This is my preferred
> > approach.
>
> +1

Here is a patch adding the casts, rebasing on the latest master, and
incorporating Alvaro's changes. Per his earlier suggestion I combined
it all into one patch file, which also makes it easier to apply
rebases & updates.

My work on adding casts also removes the @+ / @- / @* operators and
adds + / - / * operators where both parameters are multiranges. I
retained other operators with mixed range/multirange parameters, both
because there are already range operators with mixed range/scalar
parameters (e.g. <@), and because it seemed like the objection to @+ /
@- / @* was not mixed parameters per se, but rather their
unguessability. Since the other operators are the same as the existing
range operators, they don't share that problem.

This still leaves the question of how best to format the docs for
these operators. I like being able to combine all the <@ variations
(e.g.) into one table row, but if that is too ugly I could give them
separate rows instead. Giving them all their own row consumes a lot of
vertical space though, and to me that makes the docs more tedious to
read & browse, so it's harder to grasp all the available range-related
operations at a glance.

I'm skeptical of changing pg_type.typtype from 'm' to 'r'. A
multirange isn't a range, so why should we give it the same type? Also
won't this break any queries that are using that column to find range
types? What is the motivation to use the same typtype for both ranges
and multiranges? (There is plenty I don't understand here, e.g. why we
have both typtype and typcategory, so maybe there is a good reason I'm
missing.)

I experimented with setting pg_type.typelem to the multirange's range
type, but it seemed to break a lot of things, and reading the code I
saw some places that treat a non-zero typelem as synonymous with being
an array. So I'm reluctant to make this change also, especially when
it is just as easy to query pg_range to get a multirange's range type.
Also range types themselves don't set typelem to their base type, and
it seems like we'd want to treat ranges and multiranges the same way
here.

Alvaro also suggested renaming pg_range.mltrngtypid to
pg_range.rngmultitypid, so it shares the same "rng" prefix as the
other columns in this table. Having a different prefix does stand out.
I've included that change in this patch too.

Yours,
Paul

Attachment

v8-multirange.patch

Re: range_agg

From

Pavel Stehule

Date:

10 January 2020, 09:38:21

so 4. 1. 2020 v 6:29 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Fri, Dec 20, 2019 at 11:29 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> > Should I just give up on implicit casts and require you to specify
> > one? That makes it a little more annoying to mix range & multirange
> > types, but personally I'm okay with that. This is my preferred
> > approach.
>
> +1

Here is a patch adding the casts, rebasing on the latest master, and
incorporating Alvaro's changes. Per his earlier suggestion I combined
it all into one patch file, which also makes it easier to apply
rebases & updates.

This patch was applied cleanly and all tests passed

My work on adding casts also removes the @+ / @- / @* operators and
adds + / - / * operators where both parameters are multiranges. I
retained other operators with mixed range/multirange parameters, both
because there are already range operators with mixed range/scalar
parameters (e.g. <@), and because it seemed like the objection to @+ /
@- / @* was not mixed parameters per se, but rather their
unguessability. Since the other operators are the same as the existing
range operators, they don't share that problem.

looks well

This still leaves the question of how best to format the docs for
these operators. I like being able to combine all the <@ variations
(e.g.) into one table row, but if that is too ugly I could give them
separate rows instead. Giving them all their own row consumes a lot of
vertical space though, and to me that makes the docs more tedious to
read & browse, so it's harder to grasp all the available range-related
operations at a glance.

I have similar opinion - maybe is better do documentation for range and multirange separately. Sometimes there are still removed operators @+

I'm skeptical of changing pg_type.typtype from 'm' to 'r'. A
multirange isn't a range, so why should we give it the same type? Also
won't this break any queries that are using that column to find range
types? What is the motivation to use the same typtype for both ranges
and multiranges? (There is plenty I don't understand here, e.g. why we
have both typtype and typcategory, so maybe there is a good reason I'm
missing.)

If you can share TYPTYPE_RANGE in code for multiranges, then it should be 'r'. If not, then it needs own value.

I experimented with setting pg_type.typelem to the multirange's range
type, but it seemed to break a lot of things, and reading the code I
saw some places that treat a non-zero typelem as synonymous with being
an array. So I'm reluctant to make this change also, especially when
it is just as easy to query pg_range to get a multirange's range type.

ok, it is unhappy, but it is true. This note should be somewhere in code, please

Also range types themselves don't set typelem to their base type, and
it seems like we'd want to treat ranges and multiranges the same way
here.

Alvaro also suggested renaming pg_range.mltrngtypid to
pg_range.rngmultitypid, so it shares the same "rng" prefix as the
other columns in this table. Having a different prefix does stand out.
I've included that change in this patch too.

Personally I have not any comments to implemented functionality.

Yours,
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

17 January 2020, 20:07:54

On Fri, Jan 10, 2020 at 1:38 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> This still leaves the question of how best to format the docs for
>> these operators. I like being able to combine all the <@ variations
>> (e.g.) into one table row, but if that is too ugly I could give them
>> separate rows instead. Giving them all their own row consumes a lot of
>> vertical space though, and to me that makes the docs more tedious to
>> read & browse, so it's harder to grasp all the available range-related
>> operations at a glance.
>
>
> I have similar opinion - maybe is better do documentation for range and multirange separately. Sometimes there are
stillremoved operators @+
 

I like keeping the range/multirange operators together since they are
so similar for both types, but if others disagree I'd be grateful for
more feedback.

You're right that I left in a few references to the old @+ style
operators in the examples; I've fixed those.

> If you can share TYPTYPE_RANGE in code for multiranges, then it should be 'r'. If not, then it needs own value.

Okay. I think a new 'm' value is warranted because they are not interchangeable.

>> I experimented with setting pg_type.typelem to the multirange's range
>> type, but it seemed to break a lot of things, and reading the code I
>> saw some places that treat a non-zero typelem as synonymous with being
>> an array. So I'm reluctant to make this change also, especially when
>> it is just as easy to query pg_range to get a multirange's range type.
>
>
> ok, it is unhappy, but it is true. This note should be somewhere in code, please

I've added a comment about this. I put it at the top of DefineRange
but let me know if that's the wrong place.

The attached file is also rebased on currrent master.

Thanks!
Paul

Attachment

v9-multirange.patch

Re: range_agg

From

Pavel Stehule

Date:

18 January 2020, 15:19:24

pá 17. 1. 2020 v 21:08 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Fri, Jan 10, 2020 at 1:38 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> This still leaves the question of how best to format the docs for
>> these operators. I like being able to combine all the <@ variations
>> (e.g.) into one table row, but if that is too ugly I could give them
>> separate rows instead. Giving them all their own row consumes a lot of
>> vertical space though, and to me that makes the docs more tedious to
>> read & browse, so it's harder to grasp all the available range-related
>> operations at a glance.
>
>
> I have similar opinion - maybe is better do documentation for range and multirange separately. Sometimes there are still removed operators @+

I like keeping the range/multirange operators together since they are
so similar for both types, but if others disagree I'd be grateful for
more feedback.

You're right that I left in a few references to the old @+ style
operators in the examples; I've fixed those.

> If you can share TYPTYPE_RANGE in code for multiranges, then it should be 'r'. If not, then it needs own value.

Okay. I think a new 'm' value is warranted because they are not interchangeable.

>> I experimented with setting pg_type.typelem to the multirange's range
>> type, but it seemed to break a lot of things, and reading the code I
>> saw some places that treat a non-zero typelem as synonymous with being
>> an array. So I'm reluctant to make this change also, especially when
>> it is just as easy to query pg_range to get a multirange's range type.
>
>
> ok, it is unhappy, but it is true. This note should be somewhere in code, please

I've added a comment about this. I put it at the top of DefineRange
but let me know if that's the wrong place.

The attached file is also rebased on currrent master.

Can be nice to have a polymorphic function

multirange(anymultirange, anyrange) returns anymultirange. This functions should to do multirange from $2 to type $1

It can enhance to using polymorphic types and simplify casting.

Usage

CREATE OR REPLACE FUNCTION diff(anymultirange, anyrange)

RETURNS anymultirange AS $$

SELECT $1 - multirange($1, $2)

$$ LANGUAGE sql;

when I tried to write this function in plpgsql I got

create or replace function multirange(anymultirange, anyrange) returns anymultirange as $$
begin
execute format('select $2::%I', pg_typeof($1)) into $1;
return $1;
end;
$$ language plpgsql immutable strict;

ERROR: unrecognized typtype: 109
CONTEXT: compilation of PL/pgSQL function "multirange" near line 1

So probably some support in PL is missing

But all others looks very well

Regards

Pavel

Thanks!
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

18 January 2020, 16:07:08

On Sat, Jan 18, 2020 at 7:20 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Can be nice to have a polymorphic function
>
> multirange(anymultirange, anyrange) returns anymultirange. This functions should to do multirange from $2 to type $1
>
> It can enhance to using polymorphic types and simplify casting.

Thanks for taking another look! I actually have that already but it is
named anymultirange:

regression=# select anymultirange(int4range(1,2));
 anymultirange
---------------
 {[1,2)}
(1 row)

Will that work for you?

I think I only wrote that to satisfy some requirement of having an
anymultirange type, but I agree it could be useful. (I even used it in
the regress tests.) Maybe it's worth documenting too?

> when I tried to write this function in plpgsql I got
>
> create or replace function multirange(anymultirange, anyrange) returns anymultirange as $$
> begin
>   execute format('select $2::%I', pg_typeof($1)) into $1;
>   return $1;
> end;
> $$ language plpgsql immutable strict;
>
> ERROR:  unrecognized typtype: 109
> CONTEXT:  compilation of PL/pgSQL function "multirange" near line 1

Hmm, I'll add a test for that and see if I can find the problem.

Thanks!
Paul

Re: range_agg

From

Pavel Stehule

Date:

18 January 2020, 16:35:54

so 18. 1. 2020 v 17:07 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Sat, Jan 18, 2020 at 7:20 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Can be nice to have a polymorphic function
>
> multirange(anymultirange, anyrange) returns anymultirange. This functions should to do multirange from $2 to type $1
>
> It can enhance to using polymorphic types and simplify casting.

Thanks for taking another look! I actually have that already but it is
named anymultirange:

regression=# select anymultirange(int4range(1,2));
anymultirange
---------------
{[1,2)}
(1 row)

Will that work for you?

It's better than I though

I think I only wrote that to satisfy some requirement of having an
anymultirange type, but I agree it could be useful. (I even used it in
the regress tests.) Maybe it's worth documenting too?

yes

> when I tried to write this function in plpgsql I got
>
> create or replace function multirange(anymultirange, anyrange) returns anymultirange as $$
> begin
> execute format('select $2::%I', pg_typeof($1)) into $1;
> return $1;
> end;
> $$ language plpgsql immutable strict;
>
> ERROR: unrecognized typtype: 109
> CONTEXT: compilation of PL/pgSQL function "multirange" near line 1

Hmm, I'll add a test for that and see if I can find the problem.

Thanks!
Paul

Re: range_agg

From

Pavel Stehule

Date:

19 January 2020, 08:09:56

so 18. 1. 2020 v 17:35 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

so 18. 1. 2020 v 17:07 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:
On Sat, Jan 18, 2020 at 7:20 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Can be nice to have a polymorphic function
>
> multirange(anymultirange, anyrange) returns anymultirange. This functions should to do multirange from $2 to type $1
>
> It can enhance to using polymorphic types and simplify casting.

Thanks for taking another look! I actually have that already but it is
named anymultirange:

regression=# select anymultirange(int4range(1,2));
anymultirange
---------------
{[1,2)}
(1 row)

Will that work for you?

It's better than I though

I think I only wrote that to satisfy some requirement of having an
anymultirange type, but I agree it could be useful. (I even used it in
the regress tests.) Maybe it's worth documenting too?

Now, I think so name "anymultirange" is not good. Maybe better name is just "multirange"

yes

> when I tried to write this function in plpgsql I got
>
> create or replace function multirange(anymultirange, anyrange) returns anymultirange as $$
> begin
> execute format('select $2::%I', pg_typeof($1)) into $1;
> return $1;
> end;
> $$ language plpgsql immutable strict;
>
> ERROR: unrecognized typtype: 109
> CONTEXT: compilation of PL/pgSQL function "multirange" near line 1

Hmm, I'll add a test for that and see if I can find the problem.

ok

Thanks!
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

19 January 2020, 23:34:25

On Sun, Jan 19, 2020 at 12:10 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Now, I think so name "anymultirange" is not good. Maybe better name is just "multirange"

Are you sure? This function exists to be a cast to an anymultirange,
and I thought the convention was to name cast functions after their
destination type. I can change it, but in my opinion anymultirange
follows the Postgres conventions better. But just let me know and I'll
do "multirange" instead!

Yours,
Paul

Re: range_agg

From

Tom Lane

Date:

20 January 2020, 00:38:13

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> On Sun, Jan 19, 2020 at 12:10 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Now, I think so name "anymultirange" is not good. Maybe better name is just "multirange"

> Are you sure? This function exists to be a cast to an anymultirange,
> and I thought the convention was to name cast functions after their
> destination type.

True for casts involving concrete types, mainly because we'd like
the identity "value::typename == typename(value)" to hold without
too much worry about whether the latter is a plain function call
or a special case.  Not sure whether it makes as much sense for
polymorphics, since casting to a polymorphic type is pretty silly:
we do seem to allow you to do that, but it's a no-op.

I'm a little troubled by the notion that what you're talking about
here is not a no-op (if it were, you wouldn't need a function).
That seems like there's something fundamentally not quite right
either with the design or with how you're thinking about it.

As a comparison point, we sometimes describe subscripting as
being a polymorphic operation like

    subscript(anyarray, integer) returns anyelement

It would be completely unhelpful to call that anyelement().
I feel like you might be making a similar mistake here.

Alternatively, consider this: a cast from some concrete multirange type
to anymultirange is a no-op, while any other sort of cast probably ought
to be casting to some particular concrete multirange type.  That would
line up with the existing operations for plain ranges.

            regards, tom lane

Re: range_agg

From

Pavel Stehule

Date:

20 January 2020, 04:49:38

po 20. 1. 2020 v 1:38 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> On Sun, Jan 19, 2020 at 12:10 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Now, I think so name "anymultirange" is not good. Maybe better name is just "multirange"

> Are you sure? This function exists to be a cast to an anymultirange,
> and I thought the convention was to name cast functions after their
> destination type.

True for casts involving concrete types, mainly because we'd like
the identity "value::typename == typename(value)" to hold without
too much worry about whether the latter is a plain function call
or a special case. Not sure whether it makes as much sense for
polymorphics, since casting to a polymorphic type is pretty silly:
we do seem to allow you to do that, but it's a no-op.

I'm a little troubled by the notion that what you're talking about
here is not a no-op (if it were, you wouldn't need a function).
That seems like there's something fundamentally not quite right
either with the design or with how you're thinking about it.

I thinking about completeness of operations

I can to write

CREATE OR REPLACE FUNCTION fx(anyarray, anyelement)

RETURNS anyarray AS $$

SELECT $1 || ARRAY[$2]

$$ LANGUAGE sql;

I need to some functionality for moving a value to different category (it is more generic than casting to specific type (that can hold category)

CREATE OR REPLACE FUNCTION fx(anymultirange, anyrange)

RETURNS anyrage AS $$

SELECT $1 + multirange($1)

$$ LANGUAGE sql;

is just a analogy.

Regards

Pavel

As a comparison point, we sometimes describe subscripting as
being a polymorphic operation like

subscript(anyarray, integer) returns anyelement

It would be completely unhelpful to call that anyelement().
I feel like you might be making a similar mistake here.

Alternatively, consider this: a cast from some concrete multirange type
to anymultirange is a no-op, while any other sort of cast probably ought
to be casting to some particular concrete multirange type. That would
line up with the existing operations for plain ranges.

regards, tom lane

Re: range_agg

From

Paul A Jungwirth

Date:

20 January 2020, 05:57:57

On Sun, Jan 19, 2020 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> True for casts involving concrete types, mainly because we'd like
> the identity "value::typename == typename(value)" to hold without
> too much worry about whether the latter is a plain function call
> or a special case.  Not sure whether it makes as much sense for
> polymorphics, since casting to a polymorphic type is pretty silly:
> we do seem to allow you to do that, but it's a no-op.
>
> ...
>
> Alternatively, consider this: a cast from some concrete multirange type
> to anymultirange is a no-op, while any other sort of cast probably ought
> to be casting to some particular concrete multirange type.  That would
> line up with the existing operations for plain ranges.

I agree you wouldn't actually cast by saying x::anymultirange, and the
casts we define are already concrete, so instead you'd say
x::int4multirange. But I think having a polymorphic function to
convert from an anyrange to an anymultirange is useful so you can
write generic functions. I can see how calling it "anymultirange" may
be preferring the implementor perspective over the user perspective
though, and how simply "multirange" would be more empathetic. I don't
mind taking that approach.

Yours,
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

21 January 2020, 23:54:52

On Sun, Jan 19, 2020 at 9:57 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Sun, Jan 19, 2020 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > True for casts involving concrete types, mainly because we'd like
> > the identity "value::typename == typename(value)" to hold without
> > too much worry about whether the latter is a plain function call
> > or a special case.  Not sure whether it makes as much sense for
> > polymorphics, since casting to a polymorphic type is pretty silly:
> > we do seem to allow you to do that, but it's a no-op.
> >
> > ...
> >
> > Alternatively, consider this: a cast from some concrete multirange type
> > to anymultirange is a no-op, while any other sort of cast probably ought
> > to be casting to some particular concrete multirange type.  That would
> > line up with the existing operations for plain ranges.
>
> I agree you wouldn't actually cast by saying x::anymultirange, and the
> casts we define are already concrete, so instead you'd say
> x::int4multirange. But I think having a polymorphic function to
> convert from an anyrange to an anymultirange is useful so you can
> write generic functions. I can see how calling it "anymultirange" may
> be preferring the implementor perspective over the user perspective
> though, and how simply "multirange" would be more empathetic. I don't
> mind taking that approach.

Here is a patch with anymultirange(anyrange) renamed to
multirange(anyrange). I also rebased on the latest master, added
documentation about the multirange(anyrange) function, and slightly
adjusted the formatting of the range functions table.

Thanks,
Paul

Attachment

v10-multirange.patch

Re: range_agg

From

Pavel Stehule

Date:

24 January 2020, 07:08:56

st 22. 1. 2020 v 0:55 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Sun, Jan 19, 2020 at 9:57 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Sun, Jan 19, 2020 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > True for casts involving concrete types, mainly because we'd like
> > the identity "value::typename == typename(value)" to hold without
> > too much worry about whether the latter is a plain function call
> > or a special case. Not sure whether it makes as much sense for
> > polymorphics, since casting to a polymorphic type is pretty silly:
> > we do seem to allow you to do that, but it's a no-op.
> >
> > ...
> >
> > Alternatively, consider this: a cast from some concrete multirange type
> > to anymultirange is a no-op, while any other sort of cast probably ought
> > to be casting to some particular concrete multirange type. That would
> > line up with the existing operations for plain ranges.
>
> I agree you wouldn't actually cast by saying x::anymultirange, and the
> casts we define are already concrete, so instead you'd say
> x::int4multirange. But I think having a polymorphic function to
> convert from an anyrange to an anymultirange is useful so you can
> write generic functions. I can see how calling it "anymultirange" may
> be preferring the implementor perspective over the user perspective
> though, and how simply "multirange" would be more empathetic. I don't
> mind taking that approach.

Here is a patch with anymultirange(anyrange) renamed to
multirange(anyrange). I also rebased on the latest master, added
documentation about the multirange(anyrange) function, and slightly
adjusted the formatting of the range functions table.

I think so this patch is ready for commiter.

All tests passed, the doc is good enough (the chapter name "Range functions and Operators" should be renamed to "Range/multirange functions and Operators"

The code formatting and comments looks well

Thank you for your work

Regards

Pavel

Thanks,
Paul

Re: range_agg

From

Alvaro Herrera

Date:

04 March 2020, 02:28:49

I started to review this again.  I'm down to figuring out whether the
typecache changes make sense; in doing so I realized that the syscaches
weren't perfectly defined (I think leftovers from when there was a
pg_multirange catalog, earlier in development), so I fixed that.

0001 is mostly Paul's v10 patch, rebased to current master; no
conflicts, I had to make a couple of small other adjustments to catch up
with current times.

The other patches are fairly obvious; changes in 0004 are described in
its commit message.

I'll continue to try to think through the typecache aspects of this
patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

so 7. 3. 2020 v 22:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

I wrote:
> Actually ... have you given any thought to just deciding that ranges and
> multiranges are the same type? That is, any range can now potentially
> contain multiple segments? That would eliminate a whole lot of the
> tedious infrastructure hacking involved in this patch, and let you focus
> on the actually-useful functionality.

Also, this would allow us to remove at least one ugly misfeature:

regression=# select '[1,2]'::int4range + '[3,10)'::int4range;
?column?
----------
[1,10)
(1 row)

regression=# select '[1,2]'::int4range + '[4,10)'::int4range;
ERROR: result of range union would not be contiguous

If the result of range_union can be a multirange as easily as not,
we would no longer have to throw an error here.

I think this behave is correct. Sometimes you should to get only one range - and this check is a protection against not continuous range.

if you expect multirange, then do

select '[1,2]'::int4range::multirange + '[4,10)'::int4range;

Regards

Pavel

regards, tom lane

Re: range_agg

From

David Fetter

Date:

07 March 2020, 22:13:13

On Sat, Mar 07, 2020 at 04:06:32PM -0500, Tom Lane wrote:
> I wrote:
> > However, what I'm on about right at the moment is that I don't think
> > there should be any delta in that test at all.  As far as I can see,
> > the design idea here is that multiranges will be automatically created
> > over range types, and the user doesn't need to do that.  To my mind,
> > that means that they're an implementation detail and should not show up as
> > separately-owned objects, any more than an autogenerated array type does.
> 
> Actually ... have you given any thought to just deciding that ranges and
> multiranges are the same type?  That is, any range can now potentially
> contain multiple segments?  That would eliminate a whole lot of the
> tedious infrastructure hacking involved in this patch, and let you focus
> on the actually-useful functionality.

If we're changing range types rather than constructing a new
multi-range layer atop them, I think it would be helpful to have some
way to figure out quickly whether this new range type was contiguous.
One way to do that would be to include a "range cardinality" in the
data structure which be the number of left ends in it.

One of the things I'd pictured doing with multiranges was along the
lines of a "full coverage" constraint like "During a shift, there can
be no interval that's not covered," which would correspond to a "range
cardinality" of 1.

I confess I'm getting a little twitchy about the idea of eliding the
cases of "one" and "many", though.

> Assuming that that's ok, it seems like we could consider the traditional
> range functions like lower() and upper() to report on the first or last
> range bound in a multirange --- essentially, they ignore any "holes"
> that exist inside the range.  And the new functions for multiranges
> act much like array slicing, in that they give you back pieces of a range
> that aren't actually of a distinct type.

So new functions along the lines of lowers(), uppers(), opennesses(),
etc.? I guess this could be extended as needs emerge.

There's another use case not yet covered here that could make this
even more complex, we should probably plan for it: multi-ranges with
weights.

For example,

SELECT weighted_range_union(r)
FROM (VALUES('[0,1)'::float8range), ('[0,3)'), '('[2,5)')) AS t(r)

would yield something along the lines of:

(([0,1),1), ([1,3),2), ([3,5),1))

and wedging that into the range type seems messy. Each range would
then have a cardinality, and each range within would have a weight,
all of which would be an increasingly heavy burden on the common case
where there's just a single range.

Enhancing a separate multirange type to have weights seems like a
cleaner path forward.

Given that, I'm -1 on mushing multi-ranges into a special case of
ranges, or /vice versa/.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Tom Lane

Date:

07 March 2020, 23:45:44

David Fetter <david@fetter.org> writes:
> There's another use case not yet covered here that could make this
> even more complex, we should probably plan for it: multi-ranges with
> weights.

I'm inclined to reject that as completely out of scope.  The core
argument for unifying multiranges with ranges, if you ask me, is
to make the data type closed under union.  Weights are from some
other universe.

            regards, tom lane

Re: range_agg

From

David Fetter

Date:

08 March 2020, 02:45:05

On Sat, Mar 07, 2020 at 06:45:44PM -0500, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > There's another use case not yet covered here that could make this
> > even more complex, we should probably plan for it: multi-ranges
> > with weights.
> 
> I'm inclined to reject that as completely out of scope.  The core
> argument for unifying multiranges with ranges, if you ask me, is to
> make the data type closed under union.  Weights are from some other
> universe.

I don't think they are. SQL databases are super useful because they do
bags in addition to sets, so set union isn't the only, or maybe even
the most important, operation over which ranges ought to be closed.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Isaac Morland

Date:

08 March 2020, 05:27:08

On Sat, 7 Mar 2020 at 16:27, Pavel Stehule <pavel.stehule@gmail.com> wrote:

so 7. 3. 2020 v 22:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
I wrote:
> Actually ... have you given any thought to just deciding that ranges and
> multiranges are the same type? That is, any range can now potentially
> contain multiple segments? That would eliminate a whole lot of the
> tedious infrastructure hacking involved in this patch, and let you focus
> on the actually-useful functionality.

I think this behave is correct. Sometimes you should to get only one range - and this check is a protection against not continuous range.

if you expect multirange, then do

select '[1,2]'::int4range::multirange + '[4,10)'::int4range;

Definitely agreed that range and multirange (or whatever it's called) should be different. In the work I do I have a number of uses for ranges, but not (yet) for multiranges. I want to be able to declare a column as range and be sure that it is just a single range, and then call lower() and upper() on it and be sure to get just one value in each case; and if I accidentally try to take the union of ranges where the union isn’t another range, I want to get an error rather than calculate some weird (in my context) multirange.

On a related note, I was thinking about this and I don’t think I like range_agg as a name at all. I know we have array_agg and string_agg but surely shouldn’t this be called union_agg, and shouldn’t there also be an intersect_agg? I mean, taking the union isn’t the only possible aggregate on ranges or multiranges.

Re: range_agg

From

Tom Lane

Date:

08 March 2020, 16:56:02

Isaac Morland <isaac.morland@gmail.com> writes:
>> so 7. 3. 2020 v 22:20 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>>> Actually ... have you given any thought to just deciding that ranges and
>>> multiranges are the same type?  That is, any range can now potentially
>>> contain multiple segments?

> Definitely agreed that range and multirange (or whatever it's called)
> should be different. In the work I do I have a number of uses for ranges,
> but not (yet) for multiranges. I want to be able to declare a column as
> range and be sure that it is just a single range, and then call lower() and
> upper() on it and be sure to get just one value in each case; and if I
> accidentally try to take the union of ranges where the union isn’t another
> range, I want to get an error rather than calculate some weird (in my
> context) multirange.

I do not find that argument convincing at all.  Surely you could put
that constraint on your column using "CHECK (numranges(VALUE) <= 1)"
or some such notation.

Also, you're attacking a straw man with respect to lower() and upper();
I did not suggest changing them to return arrays, but rather interpreting
them as returning the lowest or highest endpoint, which I think would be
transparent in most cases.  (There would obviously need to be some other
functions that could dissect a multirange more completely.)

The real problem with the proposal as it stands, I think, is exactly
that range union has failure conditions and you have to use some other
operator if you want to get a successful result always.  That's an
enormously ugly kluge, and if we'd done it right the first time nobody
would have objected.

Bottom line is that I don't think that we should add a pile of new moving
parts to the type system just because people are afraid of change;
arguably, that's *more* change (and more risk of bugs), not less.
Unifying the types would, for example, get rid of the pesky question
of what promoting a range to multirange should look like exactly,
because it'd be a no-op.

            regards, tom lane

Re: range_agg

From

"David G. Johnston"

Date:

08 March 2020, 17:28:24

On Fri, Dec 20, 2019 at 10:43 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I took the liberty of rebasing this series on top of recent branch
master.

In the tests there is:

+select '{[a,a],[b,b]}'::textmultirange;

+ textmultirange
+----------------
+ {[a,a],[b,b]}
+(1 row)
+
+-- without canonicalization, we can't join these:
+select '{[a,a], [b,b]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,a],[b,b]}
+(1 row)
+

Aside from the comment they are identical so I'm confused as to why both tests exist - though I suspect it has to do with the fact that the expected result would be {[a,b]} since text is discrete.

Also, the current patch set seems a bit undecided on whether it wants to be truly a multi-range or a range that can report non-contiguous components. Specifically,

+select '{[a,d), [b,f]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,f]}
+(1 row)

There is a an argument that a multi-range should output {[a,d),[b,f]}. IMO its arguable that a multi-range container should not try and reduce the number of contained ranges at all. If that is indeed a desire, which seems like it is, that feature alone goes a long way to support wanting to just merge the desired functionality into the existing range type, where the final output has the minimum number of contiguous ranges possible, rather than having a separate multirange type.

David J.

Re: range_agg

From

Robert Haas

Date:

09 March 2020, 17:53:18

On Sat, Mar 7, 2020 at 4:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's possible that this is a bad idea.  It bears a lot of similarity,
> I guess, to the way that Postgres doesn't consider arrays of different
> dimensionality to be distinct types.  That has some advantages but it
> surely also has downsides.  I think on the whole the advantages win,
> and I feel like that might also be the case here.

Personally, I'm pretty unhappy with the fact that the array system
conflates arrays with different numbers of dimensions. Like, you end
up having to write array_upper(X, 1) instead of just array_upper(X),
and then you're still left wondering whether whatever you wrote is
going to blow up if somebody sneaks a multidimensional array in there,
or for that matter, an array with a non-standard lower bound. There's
lots of little things like that, where the decision to decorate the
array type with these extra frammishes makes it harder to use for
everybody even though most people don't use (or even want) those
features.

So count me as +1 for keeping range and multirange separate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: range_agg

From

Alvaro Herrera

Date:

09 March 2020, 19:59:03

I wonder what's the point of multirange arrays.  Is there a reason we
create those?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Tom Lane

Date:

09 March 2020, 21:37:34

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I wonder what's the point of multirange arrays.  Is there a reason we
> create those?

That's what we thought about arrays of composites to start with,
too.

            regards, tom lane

Re: range_agg

From

Jeff Davis

Date:

10 March 2020, 01:34:04

On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote:
> Actually ... have you given any thought to just deciding that ranges
> and
> multiranges are the same type? 

It has come up in a number of conversations, but I'm not sure if it was
discussed on this list.

> I think on the whole the advantages win,
> and I feel like that might also be the case here.

Some things to think about:

1. Ranges are common -- at least implicitly -- in a lot of
applications/systems. It's pretty easy to represent extrernal data as
ranges in postgres, and also to represent postgres ranges in external
systems. But I can see multiranges causing friction around a lot of
common tasks, like displaying in a UI. If you only expect ranges, you
can add a CHECK constraint, so this is annoying but not necessarily a
deal-breaker.

2. There are existing client libraries[1] that support range types and
transform them to types within the host language. Obviously, those
would need to be updated to expect multiple ranges.

3. It seems like we would want some kind of base "range" type. When you
try to break a multirange down into constituent ranges, what type would
those pieces be? (Aside: how do you get the constituent ranges?)

I'm thinking more about casting to see if there's a possible compromise
there.

Regards,
    Jeff Davis

[1] 
https://sfackler.github.io/rust-postgres-range/doc/v0.8.2/postgres_range/

Re: range_agg

From

David Fetter

Date:

10 March 2020, 16:08:04

On Mon, Mar 09, 2020 at 06:34:04PM -0700, Jeff Davis wrote:
> On Sat, 2020-03-07 at 16:06 -0500, Tom Lane wrote:
> > Actually ... have you given any thought to just deciding that ranges
> > and
> > multiranges are the same type? 
> 
> It has come up in a number of conversations, but I'm not sure if it was
> discussed on this list.
> 
> > I think on the whole the advantages win,
> > and I feel like that might also be the case here.
> 
> Some things to think about:
> 
> 1. Ranges are common -- at least implicitly -- in a lot of
> applications/systems. It's pretty easy to represent extrernal data as
> ranges in postgres, and also to represent postgres ranges in external
> systems. But I can see multiranges causing friction around a lot of
> common tasks, like displaying in a UI. If you only expect ranges, you
> can add a CHECK constraint, so this is annoying but not necessarily a
> deal-breaker.

It could become well and truly burdensome in a UI or an API. The
difference between one, as ranges are now, and many, as multi-ranges
would be if we shoehorn them into the range type, are pretty annoying
to deal with.

> 2. There are existing client libraries[1] that support range types and
> transform them to types within the host language. Obviously, those
> would need to be updated to expect multiple ranges.

The type systems that would support such types might get unhappy with
us if we started messing with some of the properties like
contiguousness.

> 3. It seems like we would want some kind of base "range" type. When you
> try to break a multirange down into constituent ranges, what type would
> those pieces be? (Aside: how do you get the constituent ranges?)
> 
> I'm thinking more about casting to see if there's a possible compromise
> there.

I think the right compromise is to recognize that the closure of a set
(ranges) over an operation (set union) may well be a different set
(multi-ranges). Other operations have already been proposed, complete
with concrete use cases that could really make PostgreSQL stand out.

That we don't have an obvious choice of "most correct" operation over
which to close ranges makes it even bigger a potential foot-gun
when we choose one arbitrarily and declare it to be the canonical one.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Paul A Jungwirth

Date:

11 March 2020, 23:04:40

Thanks everyone for offering some thoughts on this!

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> have you given any thought to just deciding that ranges and
> multiranges are the same type?

I can see how it might be nice to have just one type to think about.
Still I think keeping them separate makes sense. Other folks have
brought up several reasons already. Just to chime in:

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Isaac Morland <isaac.morland@gmail.com> writes:
> > Definitely agreed that range and multirange (or whatever it's called)
> > should be different. In the work I do I have a number of uses for ranges,
> > but not (yet) for multiranges. I want to be able to declare a column as
> > range and be sure that it is just a single range, and then call lower() and
> > upper() on it and be sure to get just one value in each case; and if I
> > accidentally try to take the union of ranges where the union isn’t another
> > range, I want to get an error rather than calculate some weird (in my
> > context) multirange.
>
> I do not find that argument convincing at all.  Surely you could put
> that constraint on your column using "CHECK (numranges(VALUE) <= 1)"
> or some such notation.

A check constraint works for columns, but there are other contexts
where you'd like to restrict things to just a contiguous range, e.g.
user-defined functions and intermediate results in queries. Basic
ranges seem a lot simpler to think about, so I can appreciate how
letting any range be a multirange adds a heavy cognitive burden. I
think a lot of people will share Isaac's opinion here.

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, this would allow us to remove at least one ugly misfeature:
>
> regression=# select '[1,2]'::int4range + '[3,10)'::int4range;
>  ?column?
>  ----------
>   [1,10)
>   (1 row)
>
>   regression=# select '[1,2]'::int4range + '[4,10)'::int4range;
>   ERROR:  result of range union would not be contiguous

Because of backwards compatibility we can't really change +/-/* not to
raise (right?), so if we joined ranges and multiranges we'd need to
add operators with a different name. I was calling those @+/@-/@*
before, but that was considered too unintuitive and undiscoverable.
Having two types lets us use the nicer operator names.

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> it seems like we could consider the traditional
> range functions like lower() and upper() to report on the first or last
> range bound in a multirange

I tried to keep functions/operators similar, so already lower(mr) =
lower(r) and upper(mr) = upper(r).
I think *conceptually* it's good to make ranges & multiranges as
interchangable as possible, but that doesn't mean they have to be the
same type.

Adding multiranges-as-ranges also raises questions about their string
format. If a multirange is {[1,2), [4,5)} would you only print the
curly braces when there is more than one element?

I don't *think* allowing non-contiguous ranges would break how we use
them in GiST indexes or exclusion constraints, but maybe someone can
think of some problem I can't. It's one place to be wary anyway. At
the very least it would make those things slower I expect.

On a few other issues people have raised recently:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I wonder what's the point of multirange arrays. Is there a reason we
> create those?

We have arrays of everything else, so why not have them for
multiranges? We don't have to identify specific use cases here,
although I can see how you'd want to call array_agg/UNNEST on some
multiranges, e.g. (Actually I really want to add an UNNEST that
*takes* a multirange, but that could be a follow-on commit.) If
nothing else I think omitting arrays of multiranges would be a strange
irregularity in the type system.

David G. Johnston <david.g.johnston@gmail.com> wrote:
> In the tests there is:
>
> +select '{[a,a],[b,b]}'::textmultirange;
> + textmultirange
> +----------------
> + {[a,a],[b,b]}
> +(1 row)
> +
> +-- without canonicalization, we can't join these:
> +select '{[a,a], [b,b]}'::textmultirange;
> + textmultirange
> +----------------
> + {[a,a],[b,b]}
> +(1 row)
> +
>
> Aside from the comment they are identical so I'm confused as to why both tests exist - though I suspect it has to do
withthe fact that the expected result would be {[a,b]} since text is discrete. 

Those tests are for basic string parsing (multirange_in), so one is
testing {A,B} and the other {A, B} (with a space after the comma).
(There are some tests right above those that also have blank spaces,
but they only output a single element in the multirange result.)

David G. Johnston <david.g.johnston@gmail.com> wrote:
> Also, the current patch set seems a bit undecided on whether it wants to be truly a multi-range or a range that can
reportnon-contiguous components.  Specifically, 
>
> +select '{[a,d), [b,f]}'::textmultirange;
> + textmultirange
> +----------------
> + {[a,f]}
> +(1 row)

Without a canonicalization function, we can't know that [a,a] touches
[b,b], but we *can* know that [a,d) touches [b,f). Or even:

regression=# select '{[a,b), [b,b]}'::textmultirange;
 textmultirange
 ----------------
  {[a,b]}
  (1 row)

So I'm joining ranges whenever we know they touch. I think this is
consistent with existing range operators, e.g.:

regression=# select '[a,a]'::textrange -|- '[b,b]';
 ?column?
----------
 f

regression=# select '[a,b)'::textrange -|- '[b,b]';
 ?column?
----------
 t

David G. Johnston <david.g.johnston@gmail.com> wrote:
> There is a an argument that a multi-range should output {[a,d),[b,f]}.  IMO its arguable that a multi-range container
shouldnot try and reduce the number of contained ranges at all. 

Automatically combining touching ranges seems very desirable to me,
and one of the motivations to building a multirange type instead of
just using an array of ranges. Mathematically {[1,2), [2,3)} is
equivalent to {[1,3)}, and merging the touching elements keeps things
easier to read/understand and faster. Ranges themselves have a
canonical form too. Not canonicalizing raises a lot of questions, like
when are two "equivalent" ranges equal? And when you compose
operators/functions, do you keep all the internally-introduced splits,
or somehow preserve only splits that were present in your top-level
inputs? If you really want a list of possibly-touching ranges, I would
use an array for that. Why even have a range_agg (instead of just
array_agg'ing the ranges) if you're not going to merge the inputs?

Isaac Morland <isaac.morland@gmail.com> writes:
> On a related note, I was thinking about this and I don’t think I like
> range_agg as a name at all. I know we have array_agg and string_agg but surely
> shouldn’t this be called union_agg, and shouldn’t there also be an
> intersect_agg? I mean, taking the union isn’t the only possible aggregate on
> ranges or multiranges.

The patch does include a range_intersect_agg already. Since there are
so many set-like things in SQL, I don't think the unqualified
union_agg/intersect_agg are appropriate to give to ranges alone. And
the existing ${type}_agg functions are all "additive": json_agg,
json_object_agg, array_agg, string_agg, xmlagg. So I think range_agg
is the least-surprising name for this behavior. I'm not even the first
person to call it that, as you can see from [1].

David Fetter <david@fetter.org> writes:
> One way to do that would be to include a "range cardinality" in the
> data structure which be the number of left ends in it.

I agree that is probably useful enough to add to this patch. I'll work on it.

David Fetter <david@fetter.org> writes:
> There's another use case not yet covered here that could make this
> even more complex, we should probably plan for it: multi-ranges with
> weights.

Several people have asked me about this. I think it would need to be a
separate type though, e.g. weighted_multirange. Personally I wouldn't
mind working on it eventually, but I don't think it needs to be part
of this initial patch. Possibly it could even be an extension. In lieu
of a real type you also have an array-of-ranges, which is what I
originally proposed range_agg to return.

Finally, I think I mentioned this a long time ago, but I'm still not
sure if this patch needs work around these things:

- gist opclass
- spgist opclass
- typanalyze
- selectivity

I'd love for a real Postgres expert to tell me "No, we can add that
later" or "Yes, you have to add that now." Even better if they can
offer some help, because I'm not sure I understand those areas well
enough to do it myself.

Thanks all,
Paul

[1]
https://git.proteus-tech.com/open-source/django-postgres/blob/fa91cf9b43ce942e84b1a9be22f445f3515ca360/postgres/sql/range_agg.sql

Re: range_agg

From

Alvaro Herrera

Date:

12 March 2020, 12:38:16

Hello Paul, thanks for the thorough response to all these points.

Regarding the merge of multiranges with ranges, I had also thought of
that at some point and was leaning towards doing that, but after the
latest responses I think the arguments against it are sensible; and now
there's a clear majority for keeping them separate.

I'll be posting an updated version of the patch later today.

I was a bit scared bit this part:

On 2020-Mar-11, Paul A Jungwirth wrote:

> Finally, I think I mentioned this a long time ago, but I'm still not
> sure if this patch needs work around these things:
> 
> - gist opclass
> - spgist opclass
> - typanalyze
> - selectivity
> 
> I'd love for a real Postgres expert to tell me "No, we can add that
> later" or "Yes, you have to add that now."

While I think that the gist and spgist opclass are in the "very nice to
have but still optional" category, the other two items seem mandatory
(but I'm not 100% certain about that, TBH).  I'm not sure we have time
to get those ready during this commitfest.

... thinking about gist+spgist, I think they could be written
identically to those for ranges, using the lowest (first) lower bound
and the higher (last) upper bound.

... thinking about selectivity, I think the way to write that is to
first compute the selectivity for the range across the first lower bound
and the last upper bound, and then subtract that for the "negative"
space between the contained ranges.

I have no immediate thoughts about typanalyze.  I suppose it should be
somehow based on the implementation for ranges ... maybe a first-cut is
to construct fake ranges covering the whole multirange (as above) and
just use the ranges implementation (compute_range_stats).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Paul A Jungwirth

Date:

13 March 2020, 16:32:20

On Thu, Mar 12, 2020 at 5:38 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> ... thinking about gist+spgist, I think they could be written
> identically to those for ranges, using the lowest (first) lower bound
> and the higher (last) upper bound.
>
> ... thinking about selectivity, I think the way to write that is to
> first compute the selectivity for the range across the first lower bound
> and the last upper bound, and then subtract that for the "negative"
> space between the contained ranges.
>
> I have no immediate thoughts about typanalyze.  I suppose it should be
> somehow based on the implementation for ranges ... maybe a first-cut is
> to construct fake ranges covering the whole multirange (as above) and
> just use the ranges implementation (compute_range_stats).

Thanks, this is pretty much what I was thinking too, but I'm really
glad to have someone who knows better confirm it. I can get started on
these right away, and I'll let folks know if I need any help. When I
looked at this last fall there was a lot I didn't understand. More or
less using the existing ranges implementation should be a big help
though.

Paul

Re: range_agg

From

Paul A Jungwirth

Date:

13 March 2020, 16:33:56

On Wed, Mar 11, 2020 at 4:39 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On Sat, Mar 7, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > [ v11 patches ]
> > The cfbot isn't too happy with this; it's getting differently-ordered
> > results than you apparently did for the list of owned objects in
> > dependency.out's DROP OWNED BY test.  Not sure why that should be ---
> > it seems like af6550d34 should have ensured that there's only one
> > possible ordering.
>
> Oh, my last email left out the most important part. :-) Is this
> failure online somewhere so I can take a look at it and fix it?

Looks like I sent this just to Tom before. This is something I need to
fix, right?

Regards,
Paul

Re: range_agg

From

Tom Lane

Date:

13 March 2020, 17:06:29

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> On Wed, Mar 11, 2020 at 4:39 PM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
>> Oh, my last email left out the most important part. :-) Is this
>> failure online somewhere so I can take a look at it and fix it?

Look for your patch(es) at

http://commitfest.cputube.org

Right now it's not even applying, presumably because Alvaro already
pushed some pieces, so you need to rebase.  But when it was applying,
one or both of the test builds was failing.

            regards, tom lane

Re: range_agg

From

Alvaro Herrera

Date:

13 March 2020, 21:39:06

On 2020-Mar-13, Tom Lane wrote:

> Right now it's not even applying, presumably because Alvaro already
> pushed some pieces, so you need to rebase.  But when it was applying,
> one or both of the test builds was failing.

Here's the rebased version.

I just realized I didn't include the API change I proposed in
https://postgr.es/m/20200306200343.GA625@alvherre.pgsql ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: range_agg

From

Paul A Jungwirth

Date:

14 March 2020, 17:34:44

On Fri, Mar 13, 2020 at 2:39 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Here's the rebased version.
>
> I just realized I didn't include the API change I proposed in
> https://postgr.es/m/20200306200343.GA625@alvherre.pgsql ...

Thanks for your help with this Alvaro!

I was just adding your changes to my own branch and I noticed your
v12-0001 has different parameter names here:

diff --git a/src/backend/utils/adt/multirangetypes.c
b/src/backend/utils/adt/multirangetypes.c
index f9dd0378cc..0c9afd5448 100644
--- a/src/backend/utils/adt/multirangetypes.c
+++ b/src/backend/utils/adt/multirangetypes.c
@@ -376,11 +375,11 @@ multirange_typanalyze(PG_FUNCTION_ARGS)
  * pointer to a type cache entry.
  */
 static MultirangeIOData *
-get_multirange_io_data(FunctionCallInfo fcinfo, Oid mltrngtypid,
IOFuncSelector func)
+get_multirange_io_data(FunctionCallInfo fcinfo, Oid rngtypid,
IOFuncSelector func)
 {
     MultirangeIOData *cache = (MultirangeIOData *) fcinfo->flinfo->fn_extra;

-    if (cache == NULL || cache->typcache->type_id != mltrngtypid)
+    if (cache == NULL || cache->typcache->type_id != rngtypid)
     {
         int16       typlen;
         bool        typbyval;
@@ -389,9 +388,9 @@ get_multirange_io_data(FunctionCallInfo fcinfo,
Oid mltrngtypid, IOFuncSelector

         cache = (MultirangeIOData *)
MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,

sizeof(MultirangeIOData));
-        cache->typcache = lookup_type_cache(mltrngtypid,
TYPECACHE_MULTIRANGE_INFO);
+        cache->typcache = lookup_type_cache(rngtypid,
TYPECACHE_MULTIRANGE_INFO);
         if (cache->typcache->rngtype == NULL)
-            elog(ERROR, "type %u is not a multirange type", mltrngtypid);
+            elog(ERROR, "type %u is not a multirange type", rngtypid);

         /* get_type_io_data does more than we need, but is convenient */
         get_type_io_data(cache->typcache->rngtype->type_id,

I'm pretty sure mltrngtypid is the correct name here. Right? Let me
know if I'm missing something. :-)

Yours,
Paul

Re: range_agg

From

Paul A Jungwirth

Date:

14 March 2020, 18:13:54

On Fri, Mar 13, 2020 at 10:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> > On Wed, Mar 11, 2020 at 4:39 PM Paul A Jungwirth
> > <pj@illuminatedcomputing.com> wrote:
> >> Oh, my last email left out the most important part. :-) Is this
> >> failure online somewhere so I can take a look at it and fix it?
>
> Look for your patch(es) at
>
> http://commitfest.cputube.org
>
> Right now it's not even applying, presumably because Alvaro already
> pushed some pieces, so you need to rebase.  But when it was applying,
> one or both of the test builds was failing.

Here are all Alvaro's changes rolled into one patch, along with the
get_multirange_io_data parameter renamed to mltrngtypid, and this
small fix to the dependency regression test:
diff --git a/src/test/regress/expected/dependency.out
b/src/test/regress/expected/dependency.out
index 778699a961..8232795148 100644
--- a/src/test/regress/expected/dependency.out
+++ b/src/test/regress/expected/dependency.out
@@ -140,8 +140,8 @@ owner of sequence deptest_a_seq
 owner of table deptest
 owner of function deptest_func()
 owner of type deptest_enum
-owner of type deptest_range
 owner of type deptest_multirange
+owner of type deptest_range
 owner of table deptest2
 owner of sequence ss1
 owner of type deptest_t

I think that should fix the cfbot failure.

Yours,
Paul

Attachment

v13-multiranges.patch

Re: range_agg

From

Paul A Jungwirth

Date:

17 March 2020, 04:52:12

On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> I think that should fix the cfbot failure.

I saw this patch was failing to apply again. There was some
refactoring to how polymorphic types are determined. I added my
changes for anymultirange to that new approach, and things should be
passing again.

Yours,
Paul

Attachment

v14-multirange.patch

Re: range_agg

From

Alvaro Herrera

Date:

19 March 2020, 20:41:56

On 2020-Mar-16, Paul A Jungwirth wrote:

> On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> > I think that should fix the cfbot failure.
> 
> I saw this patch was failing to apply again. There was some
> refactoring to how polymorphic types are determined. I added my
> changes for anymultirange to that new approach, and things should be
> passing again.

There's been another flurry of commits in the polymorphic types area.
Can you please rebase again?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Paul A Jungwirth

Date:

19 March 2020, 20:43:48

On Thu, Mar 19, 2020 at 1:42 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Mar-16, Paul A Jungwirth wrote:
>
> > On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth
> > <pj@illuminatedcomputing.com> wrote:
> > > I think that should fix the cfbot failure.
> >
> > I saw this patch was failing to apply again. There was some
> > refactoring to how polymorphic types are determined. I added my
> > changes for anymultirange to that new approach, and things should be
> > passing again.
>
> There's been another flurry of commits in the polymorphic types area.
> Can you please rebase again?

I noticed that too. :-) I'm about halfway through a rebase right now.
I can probably finish it up tonight.

Paul

Re: range_agg

From

Paul A Jungwirth

Date:

20 March 2020, 05:24:53

On Thu, Mar 19, 2020 at 1:43 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Thu, Mar 19, 2020 at 1:42 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > There's been another flurry of commits in the polymorphic types area.
> > Can you please rebase again?
>
> I noticed that too. :-) I'm about halfway through a rebase right now.
> I can probably finish it up tonight.

Here is that patch. I should probably add an anycompatiblemultirange
type now too? I'll get started on that tomorrow.

Regards,
Paul

Attachment

v15-multirange.patch

Re: range_agg

From

Alvaro Herrera

Date:

20 March 2020, 14:48:12

On 2020-Mar-14, Paul A Jungwirth wrote:

> On Fri, Mar 13, 2020 at 2:39 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Here's the rebased version.
> >
> > I just realized I didn't include the API change I proposed in
> > https://postgr.es/m/20200306200343.GA625@alvherre.pgsql ...
> 
> Thanks for your help with this Alvaro!
> 
> I was just adding your changes to my own branch and I noticed your
> v12-0001 has different parameter names here:
> 
>  static MultirangeIOData *
> -get_multirange_io_data(FunctionCallInfo fcinfo, Oid mltrngtypid,
> IOFuncSelector func)
> +get_multirange_io_data(FunctionCallInfo fcinfo, Oid rngtypid,
> IOFuncSelector func)

> I'm pretty sure mltrngtypid is the correct name here. Right? Let me
> know if I'm missing something. :-)

Heh.  The intention here was to abbreviate to "typid", but if you want
to keep the longer name, it's OK too.  I don't think that name is
particularly critical, since it should be obvious that it must be a
multirange type.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Alvaro Herrera

Date:

23 March 2020, 23:32:27

On 2020-Mar-19, Paul A Jungwirth wrote:

> On Thu, Mar 19, 2020 at 1:43 PM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> > On Thu, Mar 19, 2020 at 1:42 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > There's been another flurry of commits in the polymorphic types area.
> > > Can you please rebase again?
> >
> > I noticed that too. :-) I'm about halfway through a rebase right now.
> > I can probably finish it up tonight.
> 
> Here is that patch. I should probably add an anycompatiblemultirange
> type now too? I'll get started on that tomorrow.

Thanks for the new version.  Here's a few minor adjustments while I
continue to read through it.

Thinking about the on-disk representation, can we do better than putting
the contained ranges in long-varlena format, including padding; also we
include the type OID with each element.  Sounds wasteful.  A more
compact representation might be to allow short varlenas and doing away
with the alignment padding, put the the type OID just once.  This is
important because we cannot change it later.

I'm also wondering if multirange_in() is the right strategy.  Would it
be sensible to give each range to range_parse or range_parse_bounde, so
that it determines where each range starts and ends?  Then that function
doesn't have to worry about each quote and escape, duplicating range
parsing code.  (This will probably require changing signature of the
rangetypes.c function, and exporting it; for example have
range_parse_bound allow bound_str to be NULL and in that case don't mess
with the StringInfo and just return the end position of the parsed
bound.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: range_agg

From

Paul Jungwirth

Date:

24 March 2020, 04:23:31

Thanks Alvaro!

On Mon, Mar 23, 2020 at 4:33 PM Alvaro Herrera 
<alvherre@2ndquadrant.com> wrote:
 >
 > Thinking about the on-disk representation, can we do better than putting
 > the contained ranges in long-varlena format, including padding; also we
 > include the type OID with each element.  Sounds wasteful.  A more
 > compact representation might be to allow short varlenas and doing away
 > with the alignment padding, put the the type OID just once.  This is
 > important because we cannot change it later.

Can you give me some guidance on this? I don't know how to make the 
on-disk format different from the in-memory format. (And for the 
in-memory format, I think it's important to have actual RangeTypes 
inside the multirange.) Is there something in the documentation, or a 
README in the repo, or even another type I can follow?

 > I'm also wondering if multirange_in() is the right strategy.  Would 
it> be sensible to give each range to range_parse or range_parse_bounde, so
 > that it determines where each range starts and ends?  Then that function
 > doesn't have to worry about each quote and escape, duplicating range
 > parsing code.  (This will probably require changing signature of the
 > rangetypes.c function, and exporting it; for example have
 > range_parse_bound allow bound_str to be NULL and in that case don't mess
 > with the StringInfo and just return the end position of the parsed
 > bound.)

Yeah, I really wanted to do it that way originally too. As you say it 
would require passing back more information from the range-parsing code. 
I can take a stab at making the necessary changes. I'm a bit more 
confident now than I was then in changing the range code we have already.

Regards,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: range_agg

From

Alvaro Herrera

Date:

04 April 2020, 23:10:18

On 2020-Mar-23, Paul Jungwirth wrote:

> On Mon, Mar 23, 2020 at 4:33 PM Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> >
> > Thinking about the on-disk representation, can we do better than putting
> > the contained ranges in long-varlena format, including padding; also we
> > include the type OID with each element.  Sounds wasteful.  A more
> > compact representation might be to allow short varlenas and doing away
> > with the alignment padding, put the the type OID just once.  This is
> > important because we cannot change it later.
> 
> Can you give me some guidance on this? I don't know how to make the on-disk
> format different from the in-memory format. (And for the in-memory format, I
> think it's important to have actual RangeTypes inside the multirange.) Is
> there something in the documentation, or a README in the repo, or even
> another type I can follow?

Sorry I didn't reply earlier, but I didn't know the answer then and I
still don't know the answer now.

Anyway, I rebased this to verify that the code hasn't broken, and it
hasn't -- the tests still pass.  There was a minor conflict in
pg_operator.dat which I fixed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

v16-multirange.patch

Re: range_agg

From

Alvaro Herrera

Date:

05 April 2020, 22:13:13

v17 is a rebase fixing a minor parse_coerce.c edit; v16 lasted little
:-(  I chose to change the wording of the conflicting comment in
enforce_generic_type_consistency():

 * 3) Similarly, if return type is ANYRANGE or ANYMULTIRANGE, and any
 *      argument is ANYRANGE or ANYMULTIRANGE, use that argument's
 *      actual type, range type or multirange type as the function's return
 *      type.

This wording is less precise, in that it doesn't say exactly which of
the three types is the actual result for each of the possible four cases
(r->r, r->m, m->m, m->r) but I think it should be straightforward.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Alvaro Herrera

Date:

05 April 2020, 22:14:06

On 2020-Apr-05, Alvaro Herrera wrote:

> v17 is a rebase fixing a minor parse_coerce.c edit; v16 lasted little
> :-(  I chose to change the wording of the conflicting comment in
> enforce_generic_type_consistency():

Hm, attached.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

v17-multirange.patch

Re: range_agg

From

Paul A Jungwirth

Date:

11 April 2020, 03:44:30

On Sat, Apr 4, 2020 at 4:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Sorry I didn't reply earlier, but I didn't know the answer then and I
> still don't know the answer now.

Okay, thanks Alvaro! I'll see if I can figure it out myself. I assume
it is actually possible, right? I've seen references to on-disk format
vs in-memory format before, but I've never encountered anything in the
code supporting a difference.

> Anyway, I rebased this to verify that the code hasn't broken, and it
> hasn't -- the tests still pass.  There was a minor conflict in
> pg_operator.dat which I fixed.

Thanks, and thanks for your v17 also. Here is a patch building on that
and adding support for anycompatiblemultirange.

Regards,
Paul

Attachment

v18-multirange.patch

Re: range_agg

From

Paul A Jungwirth

Date:

11 April 2020, 16:36:37

On Fri, Apr 10, 2020 at 8:44 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> Thanks, and thanks for your v17 also. Here is a patch building on that
> and adding support for anycompatiblemultirange.

Here is a v19 that just moved the multirange tests to a new parallel
group to avoid a max-20-tests error. Sorry about that!

Btw I'm working on typanalyze + selectivity, and it seems like the
test suite doesn't run those things? At least I can't seem to get it
to call the existing range typanalyze functions. Those would run from
the vacuum process, not an ordinary backend, right? Is there a
separate test suite for that I'm overlooking? I'm sure I can figure it
out, but since I'm uploading a new patch file I thought I'd ask. . . .

Thanks,
Paul

Attachment

v19-multirange.patch

Re: range_agg

From

Paul A Jungwirth

Date:

11 April 2020, 16:42:44

On Sat, Apr 11, 2020 at 9:36 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> Btw I'm working on typanalyze + selectivity, and it seems like the
> test suite doesn't run those things?

Nevermind, I just had to add `analyze numrange_test` to
src/test/regress/sql/rangetypes.sql. :-) Do you want a separate patch
for that? Or maybe it should go in sql/vacuum.sql?

Regards,
Paul

Re: range_agg

From

Alvaro Herrera

Date:

11 April 2020, 16:56:47

On 2020-Apr-11, Paul A Jungwirth wrote:

> On Sat, Apr 11, 2020 at 9:36 AM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> > Btw I'm working on typanalyze + selectivity, and it seems like the
> > test suite doesn't run those things?
> 
> Nevermind, I just had to add `analyze numrange_test` to
> src/test/regress/sql/rangetypes.sql. :-) Do you want a separate patch
> for that? Or maybe it should go in sql/vacuum.sql?

Dunno, it seems fine in rangetypes.sql.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: range_agg

From

Justin Pryzby

Date:

05 July 2020, 17:20:51

On Sat, Apr 11, 2020 at 09:36:37AM -0700, Paul A Jungwirth wrote:
> On Fri, Apr 10, 2020 at 8:44 PM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:
> > Thanks, and thanks for your v17 also. Here is a patch building on that
> > and adding support for anycompatiblemultirange.
> 
> Here is a v19 that just moved the multirange tests to a new parallel
> group to avoid a max-20-tests error. Sorry about that!

This needs to be rebased ; set cfbot to "waiting".

-- 
Justin

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2020, 19:11:15

On Sun, Jul 5, 2020 at 10:20 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> This needs to be rebased ; set cfbot to "waiting".

Here is a patch that is rebased onto current master. It also includes
the analyze/selectivity additions.

I haven't made much progress storing on-disk multiranges without the
range type oids. Peter Geoghegan suggested I look at how we handle
arrays in heap_deform_tuple, but I don't see anything there to help me
(probably I misunderstood him though). Just knowing that arrays are
something we do this for is enough to hunt for clues, but if anyone
can point me more directly to code that will help me do it for
multiranges, I'd be appreciative.

Yours,
Paul

Attachment

v20-multirange.patch

Re: range_agg

From

Paul A Jungwirth

Date:

05 July 2020, 19:25:49

On Sun, Jul 5, 2020 at 12:11 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> Just knowing that arrays are
> something we do this for is enough to hunt for clues, but if anyone
> can point me more directly to code that will help me do it for
> multiranges, I'd be appreciative.

It looks like expandeddatum.h is where I should be looking. . . .

Paul

Re: range_agg

From

Paul A Jungwirth

Date:

16 August 2020, 19:55:21

On Sun, Jul 5, 2020 at 12:11 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> I haven't made much progress storing on-disk multiranges without the
> range type oids.

Here is a patch using the TOAST EXTENDED infrastructure to store
multiranges on disk with a new "short" range type struct that omits
the range type oids, but then loads ordinary range structs for its
in-memory operations. One nice thing that fell out from that work is
that I can build an ordinary RangeType ** list at the same time.
(Since RangeTypes are varlena and their bounds may be varlena, you
already needed to get a list like that for nearly any operation.)

This is rebased on the current master, including some changes to doc
tables and pg_upgrade handling of type oids.

Yours,
Paul

Attachment

v21-multirange.patch

Re: range_agg

From

Paul A Jungwirth

Date:

24 September 2020, 00:04:57

On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> This is rebased on the current master, including some changes to doc
> tables and pg_upgrade handling of type oids.

Here is a rebased version of this patch, including a bunch of cleanup
from Alvaro. (Thanks Alvaro!)

Paul

Attachment

v22-multirange.patch

Re: range_agg

From

Pavel Stehule

Date:

09 October 2020, 06:51:36

čt 24. 9. 2020 v 2:05 odesílatel Paul A Jungwirth <pj@illuminatedcomputing.com> napsal:

On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> This is rebased on the current master, including some changes to doc
> tables and pg_upgrade handling of type oids.

Here is a rebased version of this patch, including a bunch of cleanup
from Alvaro. (Thanks Alvaro!)

I tested this patch and It looks well, I have not any objections

1. there are not new warnings

2. make check-world passed

3. build doc without problems

4. doc is enough, regress tests too

5. there was not objection against this feature in discussion, and I think it is interesting and useful feature - good additional to arrays

Regards

Pavel

Paul

Re: range_agg

From

Alexander Korotkov

Date:

27 November 2020, 08:35:37

Hi!

On Thu, Sep 24, 2020 at 3:05 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> > This is rebased on the current master, including some changes to doc
> > tables and pg_upgrade handling of type oids.
>
> Here is a rebased version of this patch, including a bunch of cleanup
> from Alvaro. (Thanks Alvaro!)

I'd like to review this patch.  Could you please rebase it once again?  Thanks.

------
Regards,
Alexander Korotkov

Re: range_agg

From

Paul A Jungwirth

Date:

29 November 2020, 17:11:05

On Fri, Nov 27, 2020 at 12:35 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> I'd like to review this patch.  Could you please rebase it once again?  Thanks.

Thanks! Here is a rebased version. It also includes one more cleanup
commit from Alvaro since the last one.

Yours,
Paul

Attachment

v23-multirange.patch

Re: range_agg

From

Alexander Korotkov

Date:

29 November 2020, 19:43:14

On Sun, Nov 29, 2020 at 8:11 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Fri, Nov 27, 2020 at 12:35 AM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> > I'd like to review this patch.  Could you please rebase it once again?  Thanks.
>
> Thanks! Here is a rebased version. It also includes one more cleanup
> commit from Alvaro since the last one.

Thank you.  Could you please, update doc/src/sgml/catalogs.sgml,
because pg_type and pg_range catalogs are updated.

------
Regards,
Alexander Korotkov

Re: range_agg

From

Paul A Jungwirth

Date:

29 November 2020, 20:53:40

On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> Thank you.  Could you please, update doc/src/sgml/catalogs.sgml,
> because pg_type and pg_range catalogs are updated.

Attached! :-)

Paul

Attachment

v24-multirange.patch

Re: range_agg

From

Alexander Korotkov

Date:

30 November 2020, 19:35:06

On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> > Thank you.  Could you please, update doc/src/sgml/catalogs.sgml,
> > because pg_type and pg_range catalogs are updated.
>
> Attached! :-)

You're quick, thank you.  Please, also take a look at cfbot failure
https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942
I've tried to reproduce it, but didn't manage yet.

------
Regards,
Alexander Korotkov

Re: range_agg

From

Alexander Korotkov

Date:

30 November 2020, 20:39:24

On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> >
> > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
> > <aekorotkov@gmail.com> wrote:
> > > Thank you.  Could you please, update doc/src/sgml/catalogs.sgml,
> > > because pg_type and pg_range catalogs are updated.
> >
> > Attached! :-)
>
> You're quick, thank you.  Please, also take a look at cfbot failure
> https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942
> I've tried to reproduce it, but didn't manage yet.

Got it.  type_sanity test fails on any platform, you just need to
repeat "make check" till it fails.

The failed query checked consistency of range types, but it didn't
take into account ranges of domains and ranges of records, which are
exercised by multirangetypes test running in parallel.  We could teach
this query about such kinds of ranges, but I think that would be
overkill, because we're not going to introduce such builtin ranges
yet.  So, I'm going to just move multirangetypes test into another
group of parallel tests.

------
Regards,
Alexander Korotkov

Re: range_agg

From

Alexander Korotkov

Date:

07 December 2020, 23:45:57

On Mon, Nov 30, 2020 at 11:39 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth
> > <pj@illuminatedcomputing.com> wrote:
> > >
> > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
> > > <aekorotkov@gmail.com> wrote:
> > > > Thank you.  Could you please, update doc/src/sgml/catalogs.sgml,
> > > > because pg_type and pg_range catalogs are updated.
> > >
> > > Attached! :-)
> >
> > You're quick, thank you.  Please, also take a look at cfbot failure
> > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942
> > I've tried to reproduce it, but didn't manage yet.
>
> Got it.  type_sanity test fails on any platform, you just need to
> repeat "make check" till it fails.
>
> The failed query checked consistency of range types, but it didn't
> take into account ranges of domains and ranges of records, which are
> exercised by multirangetypes test running in parallel.  We could teach
> this query about such kinds of ranges, but I think that would be
> overkill, because we're not going to introduce such builtin ranges
> yet.  So, I'm going to just move multirangetypes test into another
> group of parallel tests.

I also found a problem in multirange types naming logic.  Consider the
following example.

create type a_multirange AS (x float, y float);
create type a as range(subtype=text, collation="C");
create table tbl (x __a_multirange);
drop type a_multirange;

If you dump this database, the dump couldn't be restored.  The
multirange type is named __a_multirange, because the type named
a_multirange already exists.  However, it might appear that
a_multirange type is already deleted.  When the dump is restored, a
multirange type is named a_multirange, and the corresponding table
fails to be created.  The same thing doesn't happen with arrays,
because arrays are not referenced in dumps by their internal names.

I think we probably should add an option to specify multirange type
names while creating a range type.  Then dump can contain exact type
names used in the database, and restore wouldn't have a names
collision.

Another thing that worries me is the multirange serialization format.

typedef struct
{
    int32       vl_len_;        /* varlena header */
    char        flags;          /* range flags */
    char        _padding;       /* Bounds must be aligned */
    /* Following the header are zero to two bound values. */
} ShortRangeType;

Comment says this structure doesn't contain a varlena header, while
structure obviously has it.

In general, I wonder if we can make the binary format of multiranges
more efficient.  It seems that every function involving multiranges
from multirange_deserialize().  I think we can make functions like
multirange_contains_elem() much more efficient.  Multirange is
basically an array of ranges.  So we can pack it as follows.
1. Typeid and rangecount
2. Tightly packed array of flags (1-byte for each range)
3. Array of indexes of boundaries (4-byte for each range).  Or even
better we can combine offsets and lengths to be compression-friendly
like jsonb JEntry's do.
4. Boundary values
Using this format, we can implement multirange_contains_elem(),
multirange_contains_range() without deserialization and using binary
search.  That would be much more efficient.  What do you think?

------
Regards,
Alexander Korotkov

Re: range_agg

From

Alvaro Herrera

Date:

08 December 2020, 00:00:10

On 2020-Dec-08, Alexander Korotkov wrote:

> I also found a problem in multirange types naming logic.  Consider the
> following example.
> 
> create type a_multirange AS (x float, y float);
> create type a as range(subtype=text, collation="C");
> create table tbl (x __a_multirange);
> drop type a_multirange;
> 
> If you dump this database, the dump couldn't be restored.  The
> multirange type is named __a_multirange, because the type named
> a_multirange already exists.  However, it might appear that
> a_multirange type is already deleted.  When the dump is restored, a
> multirange type is named a_multirange, and the corresponding table
> fails to be created.  The same thing doesn't happen with arrays,
> because arrays are not referenced in dumps by their internal names.
> 
> I think we probably should add an option to specify multirange type
> names while creating a range type.  Then dump can contain exact type
> names used in the database, and restore wouldn't have a names
> collision.

Hmm, good point.  I agree that a dump must preserve the name, since once
created it is user-visible.  I had not noticed this problem, but it's
obvious in retrospect.

> In general, I wonder if we can make the binary format of multiranges
> more efficient.  It seems that every function involving multiranges
> from multirange_deserialize().  I think we can make functions like
> multirange_contains_elem() much more efficient.  Multirange is
> basically an array of ranges.  So we can pack it as follows.
> 1. Typeid and rangecount
> 2. Tightly packed array of flags (1-byte for each range)
> 3. Array of indexes of boundaries (4-byte for each range).  Or even
> better we can combine offsets and lengths to be compression-friendly
> like jsonb JEntry's do.
> 4. Boundary values
> Using this format, we can implement multirange_contains_elem(),
> multirange_contains_range() without deserialization and using binary
> search.  That would be much more efficient.  What do you think?

I also agree.  I spent some time staring at the I/O code a couple of
months back but was unable to focus on it for long enough.  I don't know
JEntry's format, but I do remember that the storage format for JSONB was
widely discussed back then; it seems wise to apply similar logic or at
least similar reasoning.

Re: range_agg

From

Alexander Korotkov

Date:

08 December 2020, 00:20:10

On Tue, Dec 8, 2020 at 3:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2020-Dec-08, Alexander Korotkov wrote:
>
> > I also found a problem in multirange types naming logic.  Consider the
> > following example.
> >
> > create type a_multirange AS (x float, y float);
> > create type a as range(subtype=text, collation="C");
> > create table tbl (x __a_multirange);
> > drop type a_multirange;
> >
> > If you dump this database, the dump couldn't be restored.  The
> > multirange type is named __a_multirange, because the type named
> > a_multirange already exists.  However, it might appear that
> > a_multirange type is already deleted.  When the dump is restored, a
> > multirange type is named a_multirange, and the corresponding table
> > fails to be created.  The same thing doesn't happen with arrays,
> > because arrays are not referenced in dumps by their internal names.
> >
> > I think we probably should add an option to specify multirange type
> > names while creating a range type.  Then dump can contain exact type
> > names used in the database, and restore wouldn't have a names
> > collision.
>
> Hmm, good point.  I agree that a dump must preserve the name, since once
> created it is user-visible.  I had not noticed this problem, but it's
> obvious in retrospect.
>
> > In general, I wonder if we can make the binary format of multiranges
> > more efficient.  It seems that every function involving multiranges
> > from multirange_deserialize().  I think we can make functions like
> > multirange_contains_elem() much more efficient.  Multirange is
> > basically an array of ranges.  So we can pack it as follows.
> > 1. Typeid and rangecount
> > 2. Tightly packed array of flags (1-byte for each range)
> > 3. Array of indexes of boundaries (4-byte for each range).  Or even
> > better we can combine offsets and lengths to be compression-friendly
> > like jsonb JEntry's do.
> > 4. Boundary values
> > Using this format, we can implement multirange_contains_elem(),
> > multirange_contains_range() without deserialization and using binary
> > search.  That would be much more efficient.  What do you think?
>
> I also agree.  I spent some time staring at the I/O code a couple of
> months back but was unable to focus on it for long enough.  I don't know
> JEntry's format, but I do remember that the storage format for JSONB was
> widely discussed back then; it seems wise to apply similar logic or at
> least similar reasoning.

Thank you for your feedback!

I'd like to publish my revision of the patch.  So Paul could start
from it.  The changes I made are minor
1. Add missing types to typedefs.list
2. Run pg_indent run over the changed files and some other formatting changes
3. Reorder the regression tests to evade the error spotted by
commitfest.cputube.org

I'm switching this patch to WOA.

------
Regards,
Alexander Korotkov

Attachment

v25-multirange.patch

Re: range_agg

From

Alexander Korotkov

Date:

15 December 2020, 23:21:47

On Tue, Dec 8, 2020 at 3:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I'd like to publish my revision of the patch.  So Paul could start
> from it.  The changes I made are minor
> 1. Add missing types to typedefs.list
> 2. Run pg_indent run over the changed files and some other formatting changes
> 3. Reorder the regression tests to evade the error spotted by
> commitfest.cputube.org
>
> I'm switching this patch to WOA.

I decided to work on this patch myself.  The next revision is attached.

The changes are as follows.

1. CREATE TYPE ... AS RANGE command now accepts new argument
multirange_type_name.  If multirange_type_name isn't specified, then
multirange type name is selected automatically.  pg_dump always
specifies multirange_type_name (if dumping at least pg14).  Thanks to
that dumps are always restorable.
2. Multiranges now have a new binary format.  After the MultirangeType
struct, an array of offsets comes, then an array of flags and finally
bounds themselves.  Offsets points to the bounds of particular range
within multirange.  Thanks to that particular range could be accessed
by number without deserialization of the whole multirange.  Offsets
are stored in compression-friendly format similar to jsonb (actually
only every 4th of those "offsets" is really offsets, others are
lengths).
3. Most of simple functions working with multirages now don't
deserialize the whole multirange.  Instead they fetch bounds of
particular ranges, and that doesn't even require any additional memory
allocation.
4. I've removed ExpandedObject support from the patch.  I don't see
much point in it assuming all the functions are returning serialized
multirage anyway.  We can add ExpandedObject support in future if
needed.
5. multirange_contains_element(), multirange_contains_range(),
multirange_overlaps_range() now use binary search.  Thanks to binary
format, which doesn't require full deserialization, these functions
now work with O(log N) complexity.

Comments and documentation still need revision according to these
changes.  I'm going to continue with this.

------
Regards,
Alexander Korotkov

Attachment

v26-multirange.patch

Re: range_agg

From

Alexander Korotkov

Date:

16 December 2020, 04:14:41

On Wed, Dec 16, 2020 at 2:21 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I decided to work on this patch myself.  The next revision is attached.
>
> The changes are as follows.
>
> 1. CREATE TYPE ... AS RANGE command now accepts new argument
> multirange_type_name.  If multirange_type_name isn't specified, then
> multirange type name is selected automatically.  pg_dump always
> specifies multirange_type_name (if dumping at least pg14).  Thanks to
> that dumps are always restorable.
> 2. Multiranges now have a new binary format.  After the MultirangeType
> struct, an array of offsets comes, then an array of flags and finally
> bounds themselves.  Offsets points to the bounds of particular range
> within multirange.  Thanks to that particular range could be accessed
> by number without deserialization of the whole multirange.  Offsets
> are stored in compression-friendly format similar to jsonb (actually
> only every 4th of those "offsets" is really offsets, others are
> lengths).
> 3. Most of simple functions working with multirages now don't
> deserialize the whole multirange.  Instead they fetch bounds of
> particular ranges, and that doesn't even require any additional memory
> allocation.
> 4. I've removed ExpandedObject support from the patch.  I don't see
> much point in it assuming all the functions are returning serialized
> multirage anyway.  We can add ExpandedObject support in future if
> needed.
> 5. multirange_contains_element(), multirange_contains_range(),
> multirange_overlaps_range() now use binary search.  Thanks to binary
> format, which doesn't require full deserialization, these functions
> now work with O(log N) complexity.
>
> Comments and documentation still need revision according to these
> changes.  I'm going to continue with this.

The next 27th revision is attached.  It contains minor documentation
and code changes, in particular it should address
commitfest.cputube.org complaints.

------
Regards,
Alexander Korotkov

Attachment

v27-multirange.patch

Re: range_agg

From

Alexander Korotkov

Date:

17 December 2020, 19:10:56

On Wed, Dec 16, 2020 at 7:14 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Dec 16, 2020 at 2:21 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > I decided to work on this patch myself.  The next revision is attached.
> >
> > The changes are as follows.
> >
> > 1. CREATE TYPE ... AS RANGE command now accepts new argument
> > multirange_type_name.  If multirange_type_name isn't specified, then
> > multirange type name is selected automatically.  pg_dump always
> > specifies multirange_type_name (if dumping at least pg14).  Thanks to
> > that dumps are always restorable.
> > 2. Multiranges now have a new binary format.  After the MultirangeType
> > struct, an array of offsets comes, then an array of flags and finally
> > bounds themselves.  Offsets points to the bounds of particular range
> > within multirange.  Thanks to that particular range could be accessed
> > by number without deserialization of the whole multirange.  Offsets
> > are stored in compression-friendly format similar to jsonb (actually
> > only every 4th of those "offsets" is really offsets, others are
> > lengths).
> > 3. Most of simple functions working with multirages now don't
> > deserialize the whole multirange.  Instead they fetch bounds of
> > particular ranges, and that doesn't even require any additional memory
> > allocation.
> > 4. I've removed ExpandedObject support from the patch.  I don't see
> > much point in it assuming all the functions are returning serialized
> > multirage anyway.  We can add ExpandedObject support in future if
> > needed.
> > 5. multirange_contains_element(), multirange_contains_range(),
> > multirange_overlaps_range() now use binary search.  Thanks to binary
> > format, which doesn't require full deserialization, these functions
> > now work with O(log N) complexity.
> >
> > Comments and documentation still need revision according to these
> > changes.  I'm going to continue with this.
>
> The next 27th revision is attached.  It contains minor documentation
> and code changes, in particular it should address
> commitfest.cputube.org complaints.

The next 28th revision is attached.  It comes with minor code
improvements, comments and commit message.

Also, given now we have a manual multirange type naming mechanism,
I've removed logic for prepending automatically generated names with
underscores to evade collision.  Instead, user is advised to name
multirange manually (as discussed in [1]).

I think this patch is very close to committable.  I'm going to spend
some more time further polishing it and commit (if I don't find a
major issue or face objections).

Links
1. https://www.postgresql.org/message-id/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com

------
Regards,
Alexander Korotkov

Attachment

v28-multirange.patch

Re: range_agg

From

Alexander Korotkov

Date:

27 December 2020, 09:50:07

On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
>
> I think this patch is very close to committable.  I'm going to spend
> some more time further polishing it and commit (if I don't find a
> major issue or face objections).

The main patch is committed.  I've prepared a set of improvements.
0001 Fixes bug in bsearch comparison functions
0002 Implements missing @> (range,multirange) operator and its commutator
0003 Does refactors signatures of *_internal() multirange functions
0004 Adds cross-type (range, multirange) operators handling to
existing range GiST opclass
0005 Adds support for GiST multirange indexing by approximation of
multirange as the union range with no gaps

The patchset is quite trivial.  I'm going to push it if there are no objections.

The SP-GiST handling is more tricky and requires substantial work.

------
Regards,
Alexander Korotkov

Attachment

Re: range_agg

From

Zhihong Yu

Date:

27 December 2020, 17:53:13

Hi,

This is not an ideal way to index multirages, but something we can easily have.

typo: multiranges

Cheers

On Sun, Dec 27, 2020 at 1:50 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
>
> I think this patch is very close to committable. I'm going to spend
> some more time further polishing it and commit (if I don't find a
> major issue or face objections).

The main patch is committed. I've prepared a set of improvements.
0001 Fixes bug in bsearch comparison functions
0002 Implements missing @> (range,multirange) operator and its commutator
0003 Does refactors signatures of *_internal() multirange functions
0004 Adds cross-type (range, multirange) operators handling to
existing range GiST opclass
0005 Adds support for GiST multirange indexing by approximation of
multirange as the union range with no gaps

The patchset is quite trivial. I'm going to push it if there are no objections.

The SP-GiST handling is more tricky and requires substantial work.

------
Regards,
Alexander Korotkov

Re: range_agg

From

David Fetter

Date:

27 December 2020, 18:07:51

On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> Hi,
> 
> This is not an ideal way to index multirages, but something we can
> easily have.

What sort of indexing improvements do you have in mind?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: range_agg

From

Alexander Korotkov

Date:

27 December 2020, 19:31:27

On Sun, Dec 27, 2020 at 8:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> This is not an ideal way to index multirages, but something we can easily have.
>
> typo: multiranges

Thanks for catching.  I will revise the commit message before committing.

------
Regards,
Alexander Korotkov

Re: range_agg

From

Alexander Korotkov

Date:

27 December 2020, 19:38:38

On Sun, Dec 27, 2020 at 9:07 PM David Fetter <david@fetter.org> wrote:
> On Sun, Dec 27, 2020 at 09:53:13AM -0800, Zhihong Yu wrote:
> > This is not an ideal way to index multirages, but something we can
> > easily have.
>
> What sort of indexing improvements do you have in mind?

Approximation of multirange as a range can cause false positives.
It's good if gaps are small, but what if they aren't.

Ideally, we should split multirange to the ranges and index them
separately.  So, we would need a GIN-like index.  The problem is that
the GIN entry tree is a B-tree, which is not very useful for searching
for ranges.  If we could replace the GIN entry tree with GiST or
SP-GiST, that should be good.  We could index multirage parts
separately and big gaps wouldn't be a problem.  Similar work was
already prototyped (it was prototyped under the name "vodka", but I'm
not a big fan of this name).  FWIW, such a new access method would
need a lot of work to bring it to commit.  I don't think it would be
reasonable, before multiranges get popular.

Regarding the GiST opclass, it seems the best we can do in GiST.

------
Regards,
Alexander Korotkov