Re: range_agg - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: range_agg |
Date | |
Msg-id | CAPpHfdvKrecVTSMmWxNAMveReAwP2taxb-vLOCF--Ne=KwMiow@mail.gmail.com Whole thread Raw |
In response to | Re: range_agg (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: range_agg
|
List | pgsql-hackers |
On Mon, Nov 30, 2020 at 11:39 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov > <aekorotkov@gmail.com> wrote: > > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth > > <pj@illuminatedcomputing.com> wrote: > > > > > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov > > > <aekorotkov@gmail.com> wrote: > > > > Thank you. Could you please, update doc/src/sgml/catalogs.sgml, > > > > because pg_type and pg_range catalogs are updated. > > > > > > Attached! :-) > > > > You're quick, thank you. Please, also take a look at cfbot failure > > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942 > > I've tried to reproduce it, but didn't manage yet. > > Got it. type_sanity test fails on any platform, you just need to > repeat "make check" till it fails. > > The failed query checked consistency of range types, but it didn't > take into account ranges of domains and ranges of records, which are > exercised by multirangetypes test running in parallel. We could teach > this query about such kinds of ranges, but I think that would be > overkill, because we're not going to introduce such builtin ranges > yet. So, I'm going to just move multirangetypes test into another > group of parallel tests. I also found a problem in multirange types naming logic. Consider the following example. create type a_multirange AS (x float, y float); create type a as range(subtype=text, collation="C"); create table tbl (x __a_multirange); drop type a_multirange; If you dump this database, the dump couldn't be restored. The multirange type is named __a_multirange, because the type named a_multirange already exists. However, it might appear that a_multirange type is already deleted. When the dump is restored, a multirange type is named a_multirange, and the corresponding table fails to be created. The same thing doesn't happen with arrays, because arrays are not referenced in dumps by their internal names. I think we probably should add an option to specify multirange type names while creating a range type. Then dump can contain exact type names used in the database, and restore wouldn't have a names collision. Another thing that worries me is the multirange serialization format. typedef struct { int32 vl_len_; /* varlena header */ char flags; /* range flags */ char _padding; /* Bounds must be aligned */ /* Following the header are zero to two bound values. */ } ShortRangeType; Comment says this structure doesn't contain a varlena header, while structure obviously has it. In general, I wonder if we can make the binary format of multiranges more efficient. It seems that every function involving multiranges from multirange_deserialize(). I think we can make functions like multirange_contains_elem() much more efficient. Multirange is basically an array of ranges. So we can pack it as follows. 1. Typeid and rangecount 2. Tightly packed array of flags (1-byte for each range) 3. Array of indexes of boundaries (4-byte for each range). Or even better we can combine offsets and lengths to be compression-friendly like jsonb JEntry's do. 4. Boundary values Using this format, we can implement multirange_contains_elem(), multirange_contains_range() without deserialization and using binary search. That would be much more efficient. What do you think? ------ Regards, Alexander Korotkov
pgsql-hackers by date: