Re: RFC: compression dictionaries for JSONB - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: RFC: compression dictionaries for JSONB
Date
Msg-id CAJ7c6TM7z=cBbD8F76E3CnjxTgOoQGFmzPovs0hFjTPn4BO3+A@mail.gmail.com
In response to Re: RFC: compression dictionaries for JSONB  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: RFC: compression dictionaries for JSONB  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hi Matthias,

> Assuming this above is option 1. If I understand correctly, this
> option was 'adapt the data type so that it understands how to handle a
> shared dictionary, decreasing storage requirements'.
> [...]
> Assuming this was the 2nd option. If I understand correctly, this
> option is effectively 'adapt or wrap TOAST to understand and handle
> dictionaries for dictionary encoding common values'.

Yes, exactly.

> I think that an 'universal dictionary encoder' would be useful, but
> that a data type might also have good reason to implement their
> replacement methods by themselves for better overall performance (such
> as maintaining partial detoast support in dictionaried items, or
> overall lower memory footprint, or ...). As such, I'd really
> appreciate it if Option 1 is not ruled out by any implementation of
> Option 2.

I agree, having the benefits of both approaches in one feature would be
great. However, I'm having some difficulty imagining what the
implementation would look like in light of the pros and cons stated
above. I could use some help here.

One approach I can think of is introducing a new entity, let's call it
"dictionary compression method". The idea is similar to access methods
and table AMs: there is a set of callbacks the dictionary compression
method should implement, some of which are mandatory and some of which
can be set to NULL. Users can specify the compression method for the
dictionary:

```
CREATE TYPE name AS DICTIONARY OF JSONB (
  compression_method = 'jsonb_best_compression'
  -- compression_method = 'jsonb_fastest_partial_decompression'
  -- if not specified, some default compression method is used
);
```

JSONB is maybe not the best example of a type for which people may
need multiple compression methods in practice. But I can imagine how
overriding the compression method for, let's say, arrays in an
extension could be beneficial depending on the application.

This approach would make the API well-defined and, more importantly,
extensible. In the future, we could add additional (optional) methods
for particular scenarios, like partial decompression.
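
To make the callback idea a bit more concrete, here is a rough
standalone sketch in C. Everything in it is hypothetical: the struct
DictCompressionRoutine, the helpers dict_routine_is_valid() and
dict_fetch_slice(), and the "identity" method are names made up for
illustration. None of this exists in PostgreSQL and it is not a proposed
patch. It only shows mandatory callbacks, one optional callback that may
be NULL, and a fallback used when the optional one is missing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback table, loosely modeled on the table AM idea. */
typedef struct DictCompressionRoutine
{
    const char *name;

    /* Mandatory: return a malloc'd buffer, store its length in *dstlen. */
    char *(*compress)(const char *src, size_t srclen, size_t *dstlen);
    char *(*decompress)(const char *src, size_t srclen, size_t *dstlen);

    /* Optional, may be NULL: decompress only [offset, offset + len). */
    char *(*decompress_slice)(const char *src, size_t srclen,
                              size_t offset, size_t len, size_t *dstlen);
} DictCompressionRoutine;

/* A routine is usable only if all mandatory callbacks are set. */
static int
dict_routine_is_valid(const DictCompressionRoutine *r)
{
    return r != NULL && r->name != NULL &&
           r->compress != NULL && r->decompress != NULL;
}

/*
 * Fetch a slice of the decompressed value. Uses the optional callback
 * when the method provides one, otherwise decompresses everything and
 * copies the requested range (clamped to the decompressed length).
 */
static char *
dict_fetch_slice(const DictCompressionRoutine *r,
                 const char *src, size_t srclen,
                 size_t offset, size_t len, size_t *dstlen)
{
    char   *full;
    char   *out;
    size_t  fulllen = 0;

    assert(dict_routine_is_valid(r));

    if (r->decompress_slice != NULL)
        return r->decompress_slice(src, srclen, offset, len, dstlen);

    full = r->decompress(src, srclen, &fulllen);
    if (full == NULL)
        return NULL;

    if (offset > fulllen)
        offset = fulllen;
    if (len > fulllen - offset)
        len = fulllen - offset;

    out = malloc(len + 1);
    if (out != NULL)
    {
        memcpy(out, full + offset, len);
        out[len] = '\0';
        *dstlen = len;
    }
    free(full);
    return out;
}

/* Placeholder "method" that only copies its input; no real compression. */
static char *
identity_copy(const char *src, size_t srclen, size_t *dstlen)
{
    char *out = malloc(srclen + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, src, srclen);
    out[srclen] = '\0';
    *dstlen = srclen;
    return out;
}

/* Implements only the mandatory callbacks; decompress_slice is NULL. */
static const DictCompressionRoutine identity_routine = {
    "identity", identity_copy, identity_copy, NULL
};
```

With something along these lines, a method tuned for partial
decompression would simply fill in decompress_slice, while other methods
leave it NULL and get the slower generic path.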

Does it sound like a reasonable approach?

-- 
Best regards,
Aleksander Alekseev


