Re: RFC: compression dictionaries for JSONB - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: RFC: compression dictionaries for JSONB
Date
Msg-id CAEze2Wjti+=fcu-DFb9aLATJaqCjzKmtuu94XGA_Zc0Sksk+HA@mail.gmail.com
In response to Re: RFC: compression dictionaries for JSONB  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: RFC: compression dictionaries for JSONB
List pgsql-hackers
On Fri, 8 Oct 2021 at 21:21, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Oct-08, Matthias van de Meent wrote:
>
> > That's a good point, but if we're extending this syntax to allow the
> > ability of including other types, then I'd instead extend the syntax
> > that of below, so that the type of the dictionary entries is required
> > in the syntax:
> >
> > CREATE TYPE name AS DICTIONARY OF jsonb [ ( ...entries ) ] [ WITH (
> > ...options ) ];
>
> I don't think this gives you any guarantees of the sort you seem to
> expect.  See CREATE AGGREGATE as a precedent where there are some
> options in the parenthesized options list you cannot omit.

Bikeshedding on syntax:
I guess? I don't really like 'required options' patterns. If you're
required to use/specify an option, then it's not optional, and should
thus not be included in the group of 'options'.

> > > The pg_type entry would have to provide some support procedure that
> > > makes use of the dictionary in some way.  This seems better than tying
> > > the SQL object to a specific type.
> >
> > Agreed, but this might mean that much more effort would be required to
> > get such a useful quality-of-life feature committed.
>
> I don't understand what you mean by that.  I'm not saying that the patch
> has to provide support for any additional datatypes.  Its only
> obligation would be to provide a new column in pg_type which is zero for
> all rows except jsonb, and in that row it is the OID of a
> jsonb_dictionary() function that's called from all the right places and
> receives all the right arguments.

This seems feasible, but I still have limited knowledge of the
intricacies of the type system, and as such I don't see how this part
would function:

I was expecting something more along the lines of how array types seem
to work: type _A is an array type containing elements of type A. Its
element type is recorded in pg_type.typelem. No special functions are
defined on base types to enable their respective array types; that
part is handled by the array infrastructure. The same goes for domain
types, which record their underlying type in pg_type.typbasetype.

Now that I think about it, we should still provide the information on
_how_ to find the type functions for the dictionaried type: Arrays and
domains are generic, but dictionaries will require deep understanding
of the underlying type.

So, yes, you are correct, there should be one more function, which
would supply the necessary pg_type functions that CREATE TYPE
DICTIONARY can then register in the pg_type entry of the dictionary
type. The alternative would initially be hardcoding this for the base
types that have dictionary support, which definitely would be possible
for a first iteration, but wouldn't be great.
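
A toy model of that arrangement, as a minimal sketch in Python rather
than anything resembling the real catalog code. The column name
typdictsupport, the function jsonb_dictionary_support, the function
OID 9001 and the jsonb_dict_in/jsonb_dict_out names are all
hypothetical, invented here for illustration; none of them exist in
PostgreSQL or in any posted patch.

```python
from dataclasses import dataclass
from typing import Callable, Dict

INVALID_OID = 0  # zero means "no such object", as in the PostgreSQL catalogs


@dataclass
class PgTypeRow:
    """Heavily simplified stand-in for a pg_type row."""
    oid: int
    typname: str
    typinput: str
    typoutput: str
    # HYPOTHETICAL column: OID of the dictionary support function,
    # zero for every type that has no dictionary support.
    typdictsupport: int = INVALID_OID


def jsonb_dictionary_support(dict_name: str) -> Dict[str, str]:
    """HYPOTHETICAL support function: returns the type functions that
    the new dictionary type should register in its own pg_type row."""
    return {"typinput": "jsonb_dict_in", "typoutput": "jsonb_dict_out"}


# Toy pg_proc: function OID -> callable. 9001 is a made-up OID.
PROCS: Dict[int, Callable[[str], Dict[str, str]]] = {
    9001: jsonb_dictionary_support,
}

# Toy pg_type: only jsonb carries a nonzero typdictsupport.
CATALOG: Dict[str, PgTypeRow] = {
    "int4": PgTypeRow(23, "int4", "int4in", "int4out"),
    "jsonb": PgTypeRow(3802, "jsonb", "jsonb_in", "jsonb_out",
                       typdictsupport=9001),
}


def create_dictionary_type(name: str, base: str) -> PgTypeRow:
    """Model of CREATE TYPE name AS DICTIONARY OF base."""
    base_row = CATALOG.get(base)
    if base_row is None:
        raise LookupError(f'type "{base}" does not exist')
    if base_row.typdictsupport == INVALID_OID:
        raise ValueError(f'type "{base}" does not support dictionaries')
    # Ask the base type's support function which functions to register.
    funcs = PROCS[base_row.typdictsupport](name)
    row = PgTypeRow(oid=100000 + len(CATALOG), typname=name,
                    typinput=funcs["typinput"],
                    typoutput=funcs["typoutput"])
    CATALOG[name] = row
    return row
```

In this model, create_dictionary_type("my_dict", "jsonb") registers a
new row using the functions supplied by the support function, while
create_dictionary_type("my_dict", "int4") fails because int4 has no
support function. The hardcoded alternative would replace the
typdictsupport lookup with a fixed table of supported base types.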


Kind regards,

Matthias


