Re: range_agg - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: range_agg |
Date | |
Msg-id | CAPpHfdtRY10Acg4LNCnj6uu0RAF6QZWzmPHd8qcc1aorub-1AQ@mail.gmail.com |
In response to | Re: range_agg (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses | Re: range_agg |
List | pgsql-hackers |
On Tue, Dec 8, 2020 at 3:00 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2020-Dec-08, Alexander Korotkov wrote:
>
> > I also found a problem in multirange types naming logic.  Consider the
> > following example.
> >
> > create type a_multirange AS (x float, y float);
> > create type a as range(subtype=text, collation="C");
> > create table tbl (x __a_multirange);
> > drop type a_multirange;
> >
> > If you dump this database, the dump couldn't be restored.  The
> > multirange type is named __a_multirange, because the type named
> > a_multirange already exists.  However, it might appear that
> > a_multirange type is already deleted.  When the dump is restored, a
> > multirange type is named a_multirange, and the corresponding table
> > fails to be created.  The same thing doesn't happen with arrays,
> > because arrays are not referenced in dumps by their internal names.
> >
> > I think we probably should add an option to specify multirange type
> > names while creating a range type.  Then dump can contain exact type
> > names used in the database, and restore wouldn't have a names
> > collision.
>
> Hmm, good point.  I agree that a dump must preserve the name, since once
> created it is user-visible.  I had not noticed this problem, but it's
> obvious in retrospect.
>
> > In general, I wonder if we can make the binary format of multiranges
> > more efficient.  It seems that every function involving multiranges
> > starts from multirange_deserialize().  I think we can make functions like
> > multirange_contains_elem() much more efficient.  Multirange is
> > basically an array of ranges.  So we can pack it as follows.
> > 1. Typeid and rangecount
> > 2. Tightly packed array of flags (1-byte for each range)
> > 3. Array of indexes of boundaries (4-byte for each range).  Or even
> > better we can combine offsets and lengths to be compression-friendly
> > like jsonb JEntry's do.
> > 4. Boundary values
> > Using this format, we can implement multirange_contains_elem(),
> > multirange_contains_range() without deserialization and using binary
> > search.  That would be much more efficient.  What do you think?
>
> I also agree.  I spent some time staring at the I/O code a couple of
> months back but was unable to focus on it for long enough.  I don't know
> JEntry's format, but I do remember that the storage format for JSONB was
> widely discussed back then; it seems wise to apply similar logic or at
> least similar reasoning.

Thank you for your feedback!

I'd like to publish my revision of the patch, so that Paul can start
from it.  The changes I made are minor:
1. Add missing types to typedefs.list
2. Run pgindent over the changed files, plus some other formatting
changes
3. Reorder the regression tests to avoid the error spotted by
commitfest.cputube.org

I'm switching this patch to WOA.

------
Regards,
Alexander Korotkov
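
For illustration only, here is a minimal C sketch of how a lookup in the style of multirange_contains_elem() could binary-search a packed layout like the one proposed in the quoted text, without deserializing into an array of range objects.  It is not the patch's code.  The struct, the flag bits and the function name (SketchMultirange, SKETCH_LB_INF, SKETCH_UB_INF, sketch_contains_elem) are all hypothetical.  It also assumes fixed-width int32 bounds, so the per-range offset array from item 3 is not needed here, and it assumes ranges are sorted, non-overlapping and half-open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, one flags byte per range (not PostgreSQL's). */
#define SKETCH_LB_INF 0x01		/* lower bound is -infinity */
#define SKETCH_UB_INF 0x02		/* upper bound is +infinity */

/*
 * Hypothetical flat layout: a range count, a tightly packed flags array,
 * and a bounds array holding (lower, upper) for each range.  A real
 * varlena format would also carry the type OID and, for variable-length
 * subtypes, offsets into the boundary data.
 */
typedef struct SketchMultirange
{
	uint32_t	range_count;
	const uint8_t *flags;		/* range_count entries */
	const int32_t *bounds;		/* 2 * range_count entries */
} SketchMultirange;

/*
 * Binary search over the packed arrays.  Assumes ranges are sorted,
 * non-overlapping, and half-open: [lower, upper).
 */
static bool
sketch_contains_elem(const SketchMultirange *mr, int32_t elem)
{
	uint32_t	lo = 0;
	uint32_t	hi = mr->range_count;

	while (lo < hi)
	{
		uint32_t	mid = lo + (hi - lo) / 2;
		uint8_t		f = mr->flags[mid];
		int32_t		lower = mr->bounds[2 * mid];
		int32_t		upper = mr->bounds[2 * mid + 1];

		if (!(f & SKETCH_LB_INF) && elem < lower)
			hi = mid;			/* elem lies before this range */
		else if (!(f & SKETCH_UB_INF) && elem >= upper)
			lo = mid + 1;		/* elem lies after this range */
		else
			return true;		/* elem falls inside this range */
	}
	return false;
}
```

The point of the sketch is only that the flags and bounds can be read in place, so a lookup costs O(log n) comparisons with no per-call allocation.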