Re: range_agg - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: range_agg
Date
Msg-id CAPpHfdvKrecVTSMmWxNAMveReAwP2taxb-vLOCF--Ne=KwMiow@mail.gmail.com
Whole thread Raw
In response to Re: range_agg  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: range_agg
List pgsql-hackers
On Mon, Nov 30, 2020 at 11:39 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth
> > <pj@illuminatedcomputing.com> wrote:
> > >
> > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
> > > <aekorotkov@gmail.com> wrote:
> > > > Thank you.  Could you please, update doc/src/sgml/catalogs.sgml,
> > > > because pg_type and pg_range catalogs are updated.
> > >
> > > Attached! :-)
> >
> > You're quick, thank you.  Please, also take a look at cfbot failure
> > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942
> > I've tried to reproduce it, but didn't manage yet.
>
> Got it.  type_sanity test fails on any platform, you just need to
> repeat "make check" till it fails.
>
> The failed query checked consistency of range types, but it didn't
> take into account ranges of domains and ranges of records, which are
> exercised by multirangetypes test running in parallel.  We could teach
> this query about such kinds of ranges, but I think that would be
> overkill, because we're not going to introduce such builtin ranges
> yet.  So, I'm going to just move multirangetypes test into another
> group of parallel tests.

I also found a problem in multirange types naming logic.  Consider the
following example.

create type a_multirange AS (x float, y float);
create type a as range(subtype=text, collation="C");
create table tbl (x __a_multirange);
drop type a_multirange;

If you dump this database, the dump couldn't be restored.  The
multirange type is named __a_multirange, because the type named
a_multirange already exists.  However, it might appear that
a_multirange type is already deleted.  When the dump is restored, a
multirange type is named a_multirange, and the corresponding table
fails to be created.  The same thing doesn't happen with arrays,
because arrays are not referenced in dumps by their internal names.

I think we probably should add an option to specify multirange type
names while creating a range type.  Then dump can contain exact type
names used in the database, and restore wouldn't have a names
collision.

Another thing that worries me is the multirange serialization format.

typedef struct
{
    int32       vl_len_;        /* varlena header */
    char        flags;          /* range flags */
    char        _padding;       /* Bounds must be aligned */
    /* Following the header are zero to two bound values. */
} ShortRangeType;

Comment says this structure doesn't contain a varlena header, while
structure obviously has it.

In general, I wonder if we can make the binary format of multiranges
more efficient.  It seems that every function involving multiranges
from multirange_deserialize().  I think we can make functions like
multirange_contains_elem() much more efficient.  Multirange is
basically an array of ranges.  So we can pack it as follows.
1. Typeid and rangecount
2. Tightly packed array of flags (1-byte for each range)
3. Array of indexes of boundaries (4-byte for each range).  Or even
better we can combine offsets and lengths to be compression-friendly
like jsonb JEntry's do.
4. Boundary values
Using this format, we can implement multirange_contains_elem(),
multirange_contains_range() without deserialization and using binary
search.  That would be much more efficient.  What do you think?

------
Regards,
Alexander Korotkov



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: Consider parallel for lateral subqueries with limit
Next
From: Alvaro Herrera
Date:
Subject: Re: range_agg