Thread: pgsql: Multirange datatypes

pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
Multirange datatypes

Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.

Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.

Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).

Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.

Catversion is bumped.

Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion:
https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6df7a9698bb036610c1e8c6d375e1be38cb26d5f

Modified Files
--------------
doc/src/sgml/catalogs.sgml                       |   15 +-
doc/src/sgml/datatype.sgml                       |   23 +
doc/src/sgml/extend.sgml                         |   59 +-
doc/src/sgml/func.sgml                           |  656 +++++-
doc/src/sgml/rangetypes.sgml                     |   67 +-
doc/src/sgml/ref/create_type.sgml                |   21 +
src/backend/catalog/pg_range.c                   |   10 +-
src/backend/catalog/pg_type.c                    |  118 +-
src/backend/commands/typecmds.c                  |  357 ++-
src/backend/executor/functions.c                 |    3 +-
src/backend/parser/parse_coerce.c                |  350 ++-
src/backend/utils/adt/Makefile                   |    2 +
src/backend/utils/adt/multirangetypes.c          | 2679 ++++++++++++++++++++++
src/backend/utils/adt/multirangetypes_selfuncs.c | 1320 +++++++++++
src/backend/utils/adt/pg_upgrade_support.c       |   22 +
src/backend/utils/adt/pseudotypes.c              |   38 +
src/backend/utils/adt/rangetypes.c               |  177 +-
src/backend/utils/adt/rangetypes_typanalyze.c    |   78 +-
src/backend/utils/cache/lsyscache.c              |   62 +-
src/backend/utils/cache/syscache.c               |   12 +
src/backend/utils/cache/typcache.c               |   98 +-
src/backend/utils/fmgr/funcapi.c                 |  180 +-
src/bin/pg_dump/pg_dump.c                        |  166 +-
src/bin/pg_dump/pg_dump.h                        |    1 +
src/bin/pg_dump/t/002_pg_dump.pl                 |    2 +
src/include/access/tupmacs.h                     |    4 +-
src/include/catalog/binary_upgrade.h             |    2 +
src/include/catalog/catversion.h                 |    2 +-
src/include/catalog/pg_aggregate.dat             |   11 +
src/include/catalog/pg_amop.dat                  |   22 +
src/include/catalog/pg_amproc.dat                |   12 +-
src/include/catalog/pg_cast.dat                  |   13 +
src/include/catalog/pg_opclass.dat               |    4 +
src/include/catalog/pg_operator.dat              |  169 ++
src/include/catalog/pg_opfamily.dat              |    4 +
src/include/catalog/pg_proc.dat                  |  282 +++
src/include/catalog/pg_range.dat                 |   15 +-
src/include/catalog/pg_range.h                   |    9 +-
src/include/catalog/pg_type.dat                  |   45 +
src/include/catalog/pg_type.h                    |   10 +-
src/include/commands/typecmds.h                  |    2 +
src/include/utils/lsyscache.h                    |    3 +
src/include/utils/multirangetypes.h              |  116 +
src/include/utils/rangetypes.h                   |   12 +
src/include/utils/selfuncs.h                     |    3 +
src/include/utils/syscache.h                     |    1 +
src/include/utils/typcache.h                     |   38 +-
src/pl/plpgsql/src/pl_comp.c                     |    6 +
src/test/regress/expected/dependency.out         |    1 +
src/test/regress/expected/hash_func.out          |   13 +
src/test/regress/expected/multirangetypes.out    | 2466 ++++++++++++++++++++
src/test/regress/expected/opr_sanity.out         |   32 +-
src/test/regress/expected/plpgsql.out            |    4 +-
src/test/regress/expected/polymorphism.out       |  125 +-
src/test/regress/expected/rangefuncs.out         |    2 +-
src/test/regress/expected/rangetypes.out         |   42 +-
src/test/regress/expected/sanity_check.out       |    2 +
src/test/regress/expected/type_sanity.out        |   46 +-
src/test/regress/parallel_schedule               |    6 +-
src/test/regress/serial_schedule                 |    1 +
src/test/regress/sql/hash_func.sql               |   10 +
src/test/regress/sql/multirangetypes.sql         |  677 ++++++
src/test/regress/sql/opr_sanity.sql              |   24 +-
src/test/regress/sql/polymorphism.sql            |   52 +
src/test/regress/sql/rangetypes.sql              |   14 +
src/test/regress/sql/type_sanity.sql             |   16 +-
src/tools/pgindent/typedefs.list                 |    4 +
67 files changed, 10568 insertions(+), 270 deletions(-)


Re: pgsql: Multirange datatypes

From
Tom Lane
Date:
Alexander Korotkov <akorotkov@postgresql.org> writes:
> Multirange datatypes

crake's not happy with this.  Looks like you did not think through
what pg_dump has to do for back-branch servers.

            regards, tom lane



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 8:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <akorotkov@postgresql.org> writes:
> > Multirange datatypes
>
> crake's not happy with this.  Looks like you did not think through
> what pg_dump has to do for back-branch servers.

Thank you.  Should be fixed by 8344d72ccc.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 7:21 AM Alexander Korotkov
<akorotkov@postgresql.org> wrote:
> Implementation of multiranges comes with a space-efficient internal
> representation format, which evades extra paddings and duplicated storage of
> oids.  Altogether this format allows fetching a particular range by its index
> in O(n).

BTW, I mean O(1) here. O(n) would be hardly an achievement.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Erik Rijkers
Date:
On 2020-12-20 05:21, Alexander Korotkov wrote:
> Multirange datatypes
> 
> Multiranges are basically sorted arrays of non-overlapping ranges with
> set-theoretic operations defined over them.
> 

With  --enable-cassert  gcc 10.1.0 gives me:

typecmds.c: In function ‘DefineRange’:
typecmds.c:1367:16: warning: variable ‘mltrngaddress’ set but not used 
[-Wunused-but-set-variable]
  1367 |  ObjectAddress mltrngaddress;
       |                ^~~~~~~~~~~~~

(Build on debian.)


Thanks,

Eik Rijkers




Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 9:06 AM Erik Rijkers <er@xs4all.nl> wrote:
> On 2020-12-20 05:21, Alexander Korotkov wrote:
> > Multirange datatypes
> >
> > Multiranges are basically sorted arrays of non-overlapping ranges with
> > set-theoretic operations defined over them.
> >
>
> With  --enable-cassert  gcc 10.1.0 gives me:
>
> typecmds.c: In function ‘DefineRange’:
> typecmds.c:1367:16: warning: variable ‘mltrngaddress’ set but not used
> [-Wunused-but-set-variable]
>   1367 |  ObjectAddress mltrngaddress;
>        |                ^~~~~~~~~~~~~

Thank you.  But are you sure that happened with --enable-cassert?
I've reproduced this only without --enable-cassert.  And that seems
reasonable, because there is an assert, which uses this variable.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 4:18 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Sun, Dec 20, 2020 at 9:06 AM Erik Rijkers <er@xs4all.nl> wrote:
> > On 2020-12-20 05:21, Alexander Korotkov wrote:
> > > Multirange datatypes
> > >
> > > Multiranges are basically sorted arrays of non-overlapping ranges with
> > > set-theoretic operations defined over them.
> > >
> >
> > With  --enable-cassert  gcc 10.1.0 gives me:
> >
> > typecmds.c: In function ‘DefineRange’:
> > typecmds.c:1367:16: warning: variable ‘mltrngaddress’ set but not used
> > [-Wunused-but-set-variable]
> >   1367 |  ObjectAddress mltrngaddress;
> >        |                ^~~~~~~~~~~~~
>
> Thank you.  But are you sure that happened with --enable-cassert?
> I've reproduced this only without --enable-cassert.  And that seems
> reasonable, because there is an assert, which uses this variable.

Warning arising without --enable-cassert should be fixed by
11072e8693.  Let me know if the problem persists for you.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Erik Rijkers
Date:
On 2020-12-20 14:18, Alexander Korotkov wrote:
> On Sun, Dec 20, 2020 at 9:06 AM Erik Rijkers <er@xs4all.nl> wrote:
>> On 2020-12-20 05:21, Alexander Korotkov wrote:
>> > Multirange datatypes
>> >
>> > Multiranges are basically sorted arrays of non-overlapping ranges with
>> > set-theoretic operations defined over them.
>> >
>> 
>> With  --enable-cassert  gcc 10.1.0 gives me:
>> 
>> typecmds.c: In function ‘DefineRange’:
>> typecmds.c:1367:16: warning: variable ‘mltrngaddress’ set but not used
>> [-Wunused-but-set-variable]
>>   1367 |  ObjectAddress mltrngaddress;
>>        |                ^~~~~~~~~~~~~
> 
> Thank you.  But are you sure that happened with --enable-cassert?
> I've reproduced this only without --enable-cassert.  And that seems
> reasonable, because there is an assert, which uses this variable.

Sorry, you're right, of course. I misread my output.

It occurs only without --enable-cassert

> 
> ------
> Regards,
> Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 8:16 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Sun, Dec 20, 2020 at 8:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alexander Korotkov <akorotkov@postgresql.org> writes:
> > > Multirange datatypes
> >
> > crake's not happy with this.  Looks like you did not think through
> > what pg_dump has to do for back-branch servers.
>
> Thank you.  Should be fixed by 8344d72ccc.

There also appears to be some floating bug appearing here and there.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-12-20%2007%3A40%3A25
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-12-20%2004%3A25%3A18
I'm investigating it.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
Hi!

On Sun, Dec 20, 2020 at 4:37 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Sun, Dec 20, 2020 at 8:16 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Sun, Dec 20, 2020 at 8:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Alexander Korotkov <akorotkov@postgresql.org> writes:
> > > > Multirange datatypes
> > >
> > > crake's not happy with this.  Looks like you did not think through
> > > what pg_dump has to do for back-branch servers.
> >
> > Thank you.  Should be fixed by 8344d72ccc.
>
> There also appears to be some floating bug appearing here and there.
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-12-20%2007%3A40%3A25
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-12-20%2004%3A25%3A18
> I'm investigating it.

It seems that my commit is reliably crashing gharial.  Could I get
access to gharial to investigate this issue?

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 5:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Sun, Dec 20, 2020 at 4:37 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Sun, Dec 20, 2020 at 8:16 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Sun, Dec 20, 2020 at 8:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > Alexander Korotkov <akorotkov@postgresql.org> writes:
> > > > > Multirange datatypes
> > > >
> > > > crake's not happy with this.  Looks like you did not think through
> > > > what pg_dump has to do for back-branch servers.
> > >
> > > Thank you.  Should be fixed by 8344d72ccc.
> >
> > There also appears to be some floating bug appearing here and there.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-12-20%2007%3A40%3A25
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-12-20%2004%3A25%3A18
> > I'm investigating it.
>
> It seems that my commit is reliably crashing gharial.  Could I get
> access to gharial to investigate this issue?

No more need for this.  I've managed to reproduce this locally.  It
fails during selectivity estimation.  Still investigating...

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Sandeep Thakkar
Date:
Hi

On Mon, Dec 21, 2020 at 1:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Dec 20, 2020 at 5:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Sun, Dec 20, 2020 at 4:37 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Sun, Dec 20, 2020 at 8:16 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Sun, Dec 20, 2020 at 8:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > Alexander Korotkov <akorotkov@postgresql.org> writes:
> > > > > Multirange datatypes
> > > >
> > > > crake's not happy with this.  Looks like you did not think through
> > > > what pg_dump has to do for back-branch servers.
> > >
> > > Thank you.  Should be fixed by 8344d72ccc.
> >
> > There also appears to be some floating bug appearing here and there.
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-12-20%2007%3A40%3A25
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-12-20%2004%3A25%3A18
> > I'm investigating it.
>
> It seems that my commit is reliably crashing gharial.  Could I get
> access to gharial to investigate this issue?

No more need for this.  I've managed to reproduce this locally.  It
fails during selectivity estimation.  Still investigating...

Sure, if you need anything from me then do let me know.
 
------
Regards,
Alexander Korotkov




--
Sandeep Thakkar


Re: pgsql: Multirange datatypes

From
Michael Paquier
Date:
Hi Alexander,

On Sun, Dec 20, 2020 at 04:21:16AM +0000, Alexander Korotkov wrote:
> Multirange datatypes
>
> Multiranges are basically sorted arrays of non-overlapping ranges with
> set-theoretic operations defined over them.

This commit is creating a compilation warning on Windows:
multirangetypes.c(1033): warning C4715: 'multirange_constructor0' :
not all control paths return a value

See for example woodlouse:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=woodlouse&dt=2020-12-21%2001%3A42%3A28&stg=make

Looking at the code, it is obvious that the compiler cannot understand
that this should always return.  Please find attached a suggestion of
patch.

Thanks,
--
Michael

Attachment

Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
Hi, Michael!

On Mon, Dec 21, 2020 at 10:34 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Sun, Dec 20, 2020 at 04:21:16AM +0000, Alexander Korotkov wrote:
> > Multirange datatypes
> >
> > Multiranges are basically sorted arrays of non-overlapping ranges with
> > set-theoretic operations defined over them.
>
> This commit is creating a compilation warning on Windows:
> multirangetypes.c(1033): warning C4715: 'multirange_constructor0' :
> not all control paths return a value
>
> See for example woodlouse:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=woodlouse&dt=2020-12-21%2001%3A42%3A28&stg=make
>
> Looking at the code, it is obvious that the compiler cannot understand
> that this should always return.  Please find attached a suggestion of
> patch.

Thank you for catching this!  Pushed.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Sun, Dec 20, 2020 at 11:02 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Sun, Dec 20, 2020 at 5:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Sun, Dec 20, 2020 at 4:37 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > On Sun, Dec 20, 2020 at 8:16 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > There also appears to be some floating bug appearing here and there.
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2020-12-20%2007%3A40%3A25
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2020-12-20%2004%3A25%3A18
> > > I'm investigating it.
> >
> > It seems that my commit is reliably crashing gharial.  Could I get
> > access to gharial to investigate this issue?
>
> No more need for this.  I've managed to reproduce this locally.  It
> fails during selectivity estimation.  Still investigating...

Should be fixed by 4e1ee79e31.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Michael Paquier
Date:
On Mon, Dec 21, 2020 at 02:31:44PM +0300, Alexander Korotkov wrote:
> Thank you for catching this!  Pushed.

Thanks for the commit, Alexander.
--
Michael

Attachment

Re: pgsql: Multirange datatypes

From
Tom Lane
Date:
Alexander Korotkov <akorotkov@postgresql.org> writes:
> Multirange datatypes

Multiple buildfarm members are issuing similar warnings about
multirangetypes.c:

 calliphoridae | 2020-12-29 23:10:11 |
/home/andres/build/buildfarm-calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/multirangetypes.c:218:37:
warning:suggest braces around empty body in an 'else' statement [-Wempty-body] 
 calliphoridae | 2020-12-29 23:10:11 |
/home/andres/build/buildfarm-calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/multirangetypes.c:236:37:
warning:suggest braces around empty body in an 'else' statement [-Wempty-body] 

These are evidently unhappy with code like

                if ...
                   ...
                }
                else
                     /* include it in range_str */ ;

So apparently preferred style is more like

                else
                {
                     /* include it in range_str */
                }

I don't particularly care for the wording of this comment, either:
it looks like it's describing an action that's about to be taken,
only there's no action there.  Could it say something more like
"we already included it in range_str", or "we'll include it in
range_str below the switch", or whatever's appropriate?

            regards, tom lane



Re: pgsql: Multirange datatypes

From
Alexander Korotkov
Date:
On Wed, Dec 30, 2020 at 4:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <akorotkov@postgresql.org> writes:
> > Multirange datatypes
>
> Multiple buildfarm members are issuing similar warnings about
> multirangetypes.c:
>
>  calliphoridae | 2020-12-29 23:10:11 |
/home/andres/build/buildfarm-calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/multirangetypes.c:218:37:
warning:suggest braces around empty body in an 'else' statement [-Wempty-body]
 
>  calliphoridae | 2020-12-29 23:10:11 |
/home/andres/build/buildfarm-calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/multirangetypes.c:236:37:
warning:suggest braces around empty body in an 'else' statement [-Wempty-body]
 
>
> These are evidently unhappy with code like
>
>                 if ...
>                    ...
>                 }
>                 else
>                      /* include it in range_str */ ;
>
> So apparently preferred style is more like
>
>                 else
>                 {
>                      /* include it in range_str */
>                 }
>
> I don't particularly care for the wording of this comment, either:
> it looks like it's describing an action that's about to be taken,
> only there's no action there.  Could it say something more like
> "we already included it in range_str", or "we'll include it in
> range_str below the switch", or whatever's appropriate?

Thank you for noticing.  I've slightly refactored this function in 16d531a30a.

------
Regards,
Alexander Korotkov



Re: pgsql: Multirange datatypes

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Wed, Dec 30, 2020 at 4:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Multiple buildfarm members are issuing similar warnings about
>> multirangetypes.c:
>> calliphoridae | 2020-12-29 23:10:11 |
/home/andres/build/buildfarm-calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/multirangetypes.c:218:37:
warning:suggest braces around empty body in an 'else' statement [-Wempty-body] 
>> calliphoridae | 2020-12-29 23:10:11 |
/home/andres/build/buildfarm-calliphoridae/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/multirangetypes.c:236:37:
warning:suggest braces around empty body in an 'else' statement [-Wempty-body] 

> Thank you for noticing.  I've slightly refactored this function in 16d531a30a.

Looks good, thanks!

            regards, tom lane