Re: range_agg - Mailing list pgsql-hackers

From Paul Jungwirth
Subject Re: range_agg
Date
Msg-id 0b0de089-4987-0ae9-615d-a07b4f8ee5aa@illuminatedcomputing.com
Whole thread Raw
In response to Re: range_agg  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Thanks for looking at this again!

On 3/4/20 1:33 PM, Alvaro Herrera wrote:
> I came across an interesting thing, namely multirange_canonicalize()'s
> use of qsort_arg with a callback of range_compare().  range_compare()
> calls range_deserialize() (non-trivial parsing) for each input range;
> multirange_canonicalize() later does a few extra deserialize calls of
> its own.  Call me a premature optimization guy if you will, but I think
> it makes sense to have a different struct (let's call it
> "InMemoryRange") which stores the parsed representation of each range;
> then we can deserialize all ranges up front, and use that as many times
> as needed, without having to deserialize each range every time.

I don't know, this sounds like a drastic change. I agree that 
multirange_deserialize and range_deserialize do a lot of copying (not 
really any parsing though, and they both assume their inputs are already 
de-TOASTED). But they are used very extensively, so if you wanted to 
remove them you'd have to rewrite a lot.

I interpreted the intention of range_deserialize to be a way to keep the 
range struct fairly "private" and give a standard interface to 
extracting its attributes. Its motive seems akin to deconstruct_array. 
So I wrote multirange_deserialize to follow that principle. Both 
functions also handle memory alignment issues for you. With 
multirange_deserialize, there isn't actually much structure (just the 
list of ranges), so perhaps you could more easily omit it and give 
callers direct access into the multirange contents. That still seems 
risky though, and less well encapsulated.

My preference would be to see if these functions are really a 
performance problem first, and only redo the in-memory structures if 
they are. Also that seems like something you could do as a separate 
project. (I wouldn't mind working on it myself, although I'd prefer to 
do actual temporal database features first.) There are no 
backwards-compatibility concerns to changing the in-memory structure, 
right? (Even if there are, it's too late to avoid them for ranges.)

> While I'm at this, why not name the new file simply multiranges.c
> instead of multirangetypes.c?

As someone who doesn't do a lot of Postgres hacking, I tried to follow 
the approach in rangetypes.c as closely as I could, especially for 
naming things. So I named the file multirangetypes.c because there was 
already rangetypes.c. But also I can see how the "types" emphasizes that 
ranges and multiranges are not concrete types themselves, but more like 
abstract data types or generics (like arrays).

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Ought to use heap_multi_insert() for pg_attribute/dependinsertions?
Next
From: Paul Jungwirth
Date:
Subject: Re: useless RangeIOData->typiofunc