Re: RFC: compression dictionaries for JSONB - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: RFC: compression dictionaries for JSONB
Msg-id CAEze2WiUEhP6gSMTumZZR7P=7-FttT8jHaadEa2KM3wxdLiv7A@mail.gmail.com
In response to Re: RFC: compression dictionaries for JSONB  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: RFC: compression dictionaries for JSONB
List pgsql-hackers
On Wed, 13 Oct 2021 at 11:48, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Matthias,
>
> > Assuming this above is option 1. If I understand correctly, this
> > option was 'adapt the data type so that it understands how to handle a
> > shared dictionary, decreasing storage requirements'.
> > [...]
> > Assuming this was the 2nd option. If I understand correctly, this
> > option is effectively 'adapt or wrap TOAST to understand and handle
> > dictionaries for dictionary encoding common values'.
>
> Yes, exactly.
>
> > I think that an 'universal dictionary encoder' would be useful, but
> > that a data type might also have good reason to implement their
> > replacement methods by themselves for better overall performance (such
> > as maintaining partial detoast support in dictionaried items, or
> > overall lower memory footprint, or ...). As such, I'd really
> > appreciate it if Option 1 is not ruled out by any implementation of
> > Option 2.
>
> I agree, having the benefits of two approaches in one feature would be
> great. However, I'm having some difficulties imagining how the
> implementation would look like in light of the pros and cons stated
> above. I could use some help here.
>
> One approach I can think of is introducing a new entity, let's call it
> "dictionary compression method". The idea is similar to access methods
> and tableam's. There is a set of callbacks the dictionary compression
> method should implement, some are mandatory, some can be set to NULL.

You might also want to look into the 'Pluggable compression support'
[0] and 'Custom compression methods' [1] threads for inspiration, as
what you describe seems very similar to what was originally proposed
there. (†)

One important difference from the proposals in [0] and [1] is that the
compression proposed here is configured at the type level, while the
compression proposed in both 'Pluggable compression support' and
'Custom compression methods' is configured at the column / table /
server level.

> Users can specify the compression method for the dictionary:
>
> ```
> CREATE TYPE name AS DICTIONARY OF JSONB (
>   compression_method = 'jsonb_best_compression'
>   -- compression_methods = 'jsonb_fastest_partial_decompression'
>   -- if not specified, some default compression method is used
> );
> ```
>
> JSONB is maybe not the best example of the type for which people may
> need multiple compression methods in practice. But I can imagine how
> overwriting a compression method for, let's say, arrays in an
> extension could be beneficial depending on the application.
>
> This approach will make an API well-defined and, more importantly,
> extendable. In the future, we could add additional (optional) methods
> for particular scenarios, like partial decompression.
>
> Does it sound like a reasonable approach?

Yes, I think that's doable.
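
To make the callback idea a bit more concrete, here is a minimal
sketch of what such a method table could look like, with two mandatory
callbacks and one optional callback. Every name in it
(DictCompressionMethod, decompress_slice, toy_identity_method, ...) is
hypothetical and does not exist in PostgreSQL; the "compression" is a
plain copy, and the fixed 256-byte scratch buffer is a toy limit. It
only illustrates how a caller could fall back to full decompression
when the optional callback is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; none of these names exist in PostgreSQL. */
typedef struct DictCompressionMethod
{
    const char *name;
    /* Mandatory: encode src into dst, return bytes written (0 on failure). */
    size_t (*compress)(const char *src, size_t len, char *dst, size_t cap);
    /* Mandatory: decode src into dst, return bytes written (0 on failure). */
    size_t (*decompress)(const char *src, size_t len, char *dst, size_t cap);
    /* Optional, may be NULL: decode only `count` bytes starting at `offset`. */
    size_t (*decompress_slice)(const char *src, size_t len,
                               size_t offset, size_t count,
                               char *dst, size_t cap);
} DictCompressionMethod;

/* A method is usable only if all mandatory callbacks are set. */
static bool
dict_method_is_valid(const DictCompressionMethod *m)
{
    return m != NULL && m->name != NULL &&
           m->compress != NULL && m->decompress != NULL;
}

/* Toy callback: a plain copy stands in for real (de)compression. */
static size_t
identity_copy(const char *src, size_t len, char *dst, size_t cap)
{
    if (len > cap)
        return 0;
    memcpy(dst, src, len);
    return len;
}

/* Toy method that leaves the optional slice callback unset. */
static const DictCompressionMethod toy_identity_method = {
    "toy_identity", identity_copy, identity_copy, NULL
};

/*
 * Generic slice access: use the optional callback when the method
 * provides one, otherwise decompress everything and copy the range.
 */
static size_t
dict_decompress_slice(const DictCompressionMethod *m,
                      const char *src, size_t len,
                      size_t offset, size_t count,
                      char *dst, size_t cap)
{
    char    full[256];      /* toy upper bound on the decoded size */
    size_t  n;

    assert(dict_method_is_valid(m));

    if (m->decompress_slice != NULL)
        return m->decompress_slice(src, len, offset, count, dst, cap);

    n = m->decompress(src, len, full, sizeof(full));
    if (offset >= n)
        return 0;
    if (count > n - offset)
        count = n - offset;     /* clamp to the available bytes */
    if (count > cap)
        return 0;
    memcpy(dst, full + offset, count);
    return count;
}
```

A method that cares about partial detoasting would fill in
decompress_slice itself; everything else gets the slower but generic
fallback for free.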


Kind regards,

Matthias

(†): 'Custom compression methods' was eventually committed in an
entirely different form by way of commit bbe0a81db, which made LZ4 a
TOAST compression option that can be configured at the column /
system level. That compression method is hard-coded, so that code
provides no infrastructure (or at least no API) for custom compression
methods.

[0] https://www.postgresql.org/message-id/flat/20130614230142.GC19641%40awork2.anarazel.de
[1] https://www.postgresql.org/message-id/flat/20170907194236.4cefce96@wp.localdomain
