Re: range_agg - Mailing list pgsql-hackers
From | Paul A Jungwirth |
---|---|
Subject | Re: range_agg |
Date | |
Msg-id | CA+renyX7jsWnyry6SOn-WSgdgQLtyFJ7SLWBVs+oatwQsMzvog@mail.gmail.com Whole thread Raw |
In response to | Re: range_agg (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: range_agg
Re: range_agg |
List | pgsql-hackers |
On Mon, Jul 1, 2019 at 3:38 PM Jeff Davis <pgsql@j-davis.com> wrote: > > For getting into core though, it should be a more complete set of > related operations. The patch is implicitly introducing the concept of > a "multirange" (in this case, an array of ranges), but it's not making > the concept whole. > > What else should return a multirange? This patch implements the union- > agg of ranges, but we also might want an intersection-agg of ranges > (that is, the set of ranges that are subranges of every input). Given > that there are other options here, the name "range_agg" is too generic > to mean union-agg specifically. Thanks for the review! I like the idea of adding a multirange type that works like range types, although I'm not sure I want to build it. :-) My own motivations for the range_agg patch are for temporal databases, where I have two use-cases: checking temporal foreign keys [1] and doing a Snodgrass "coalesce" operation [2]. The FKs use-case is more important. For coalesce I just immediately UNNEST the array, so a multirange would sort of "get in the way". It's no big deal if I can cast a multirange to an array, although for something that would run on every INSERT/UPDATE/DELETE I'd like to understand what the cast would cost us in performance. But coalesce isn't part of SQL:2011 and should be optional behavior or just something in an extension. The FKs use case matters to me a lot more, and I think a multirange would be fine for that. Also a multirange would solve the generics problem Pavel asked about. (I'll say more about that in a reply to him.) I'm not convinced that an intersection aggregate function for multiranges would be used by anyone, but I don't mind building one. Every other *_agg function has an "additive" sense, not a "subtractive" sense. json{,b}_object_agg are the closest since you *could* imagine intersection semantics for those, but they are unions. So in terms of *naming* I think using "range_agg" for union semantics is natural and would fit expectations. (I'm not the first to name this function range_agg btw: [3]). But there is clearly more than one worthwhile way to aggregate ranges: - David suggested weighted_range_agg and covering_range_agg. At PGCon 2019 someone else said he has had to build something that was essentially weighted_range_agg. I can see it used for scheduling/capacity/max-utilization problems. - You suggested an intersection range_agg. - At [4] there is a missing_ranges function that only gives the *gaps* between the input ranges. Nonetheless I still think I would call the union behavior simply range_agg, and then use weighted_range_agg, covering_range_agg, intersection_range_agg, and missing_range_agg for the rest (assuming we built them all). I'm not going to insist on any of that, but it's what feels most user-friendly to me. > What can we do with a multirange? A lot of range operators still make > sense, like "contains" or "overlaps"; but "adjacent" doesn't quite > work. What about new operations like inverting a multirange to get the > gaps? I can think of a lot of cool operators for `multirange \bigoplus multirange` or `multirange \bigoplus range` (commuting of course). And I've certainly wanted `range + range` and `range - range` in the past, which would both return a multirange. > If we have a more complete set of operators here, the flags for > handling overlapping ranges and gaps will be unnecessary. Both of my use cases should permit overlaps & gaps (range_agg(r, true, true)), so I'm actually pretty okay with dropping the flags entirely and just giving a one-param function that behavior. Or defining a strict_range_agg that offers more control. But also I don't understand how richer *operators* make the flags for the *aggregate* unnecessary. So I really like the idea of multiranges, but I'm reluctant to take it on myself, especially since this patch is just a utility function for my other temporal patches. But I don't want my rush to leave a blemish in our standard library either. But I think what really persuades me to add multiranges is making a range_agg that more easily supports user-defined range types. So how about I start on it and see how it goes? I expect I can follow the existing code for range types pretty closely, so maybe it won't be too hard. Another option would be to rename my function range_array_agg (or something) so that we are leaving space for a multirange function in the future. I don't love this idea myself but it would could a Plan B. What do you think of that? Regards, Paul [1] With range_agg I can make the temporal FK check use similar SQL to the non-temporal FK check: SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...] FOR KEY SHARE OF x vs SELECT 1 FROM ( SELECT range_agg(r, true, true) AS r FROM ( SELECT pkperiodatt AS r FROM [ONLY] pktable x WHERE pkatt1 = $1 [AND ...] FOR KEY SHARE OF x ) x1 ) x2 WHERE $n <@ x2.r[1] (The temporal version could be simplified a bit if FOR KEY SHARE ever supports aggregate functions.) [2] page number 159ff in https://www2.cs.arizona.edu/~rts/tdbbook.pdf (section 6.5.1, starts on page 183 of the PDF) [3] https://git.proteus-tech.com/open-source/django-postgres/blob/fa91cf9b43ce942e84b1a9be22f445f3515ca360/postgres/sql/range_agg.sql [4] https://schinckel.net/2014/11/18/aggregating-ranges-in-postgres/
pgsql-hackers by date: