Thanks for looking at this again!
On 3/4/20 1:33 PM, Alvaro Herrera wrote:
> I came across an interesting thing, namely multirange_canonicalize()'s
> use of qsort_arg with a callback of range_compare(). range_compare()
> calls range_deserialize() (non-trivial parsing) for each input range;
> multirange_canonicalize() later does a few extra deserialize calls of
> its own. Call me a premature optimization guy if you will, but I think
> it makes sense to have a different struct (let's call it
> "InMemoryRange") which stores the parsed representation of each range;
> then we can deserialize all ranges up front, and use that as many times
> as needed, without having to deserialize each range every time.
I don't know, this sounds like a drastic change. I agree that
multirange_deserialize and range_deserialize do a lot of copying (not
really any parsing though, and they both assume their inputs are already
de-TOASTED). But they are used very extensively, so if you wanted to
remove them you'd have to rewrite a lot.
I interpreted the intention of range_deserialize to be a way to keep the
range struct fairly "private" and give a standard interface to
extracting its attributes. Its motive seems akin to deconstruct_array.
So I wrote multirange_deserialize to follow that principle. Both
functions also handle memory alignment issues for you. With
multirange_deserialize, there isn't actually much structure (just the
list of ranges), so perhaps you could more easily omit it and give
callers direct access into the multirange contents. That still seems
risky though, and less well encapsulated.
My preference would be to see if these functions are really a
performance problem first, and only redo the in-memory structures if
they are. Also that seems like something you could do as a separate
project. (I wouldn't mind working on it myself, although I'd prefer to
do actual temporal database features first.) There are no
backwards-compatibility concerns to changing the in-memory structure,
right? (Even if there are, it's too late to avoid them for ranges.)
> While I'm at this, why not name the new file simply multiranges.c
> instead of multirangetypes.c?
As someone who doesn't do a lot of Postgres hacking, I tried to follow
the approach in rangetypes.c as closely as I could, especially for
naming things. So I named the file multirangetypes.c because there was
already rangetypes.c. But also I can see how the "types" emphasizes that
ranges and multiranges are not concrete types themselves, but more like
abstract data types or generics (like arrays).
Yours,
--
Paul ~{:-)
pj@illuminatedcomputing.com