Re: RFC: compression dictionaries for JSONB - Mailing list pgsql-hackers
| From | Matthias van de Meent |
|---|---|
| Subject | Re: RFC: compression dictionaries for JSONB |
| Date | |
| Msg-id | CAEze2WiUEhP6gSMTumZZR7P=7-FttT8jHaadEa2KM3wxdLiv7A@mail.gmail.com |
| In response to | Re: RFC: compression dictionaries for JSONB (Aleksander Alekseev <aleksander@timescale.com>) |
| Responses | Re: RFC: compression dictionaries for JSONB |
| List | pgsql-hackers |
On Wed, 13 Oct 2021 at 11:48, Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi Matthias,
>
> > Assuming this above is option 1. If I understand correctly, this
> > option was 'adapt the data type so that it understands how to handle a
> > shared dictionary, decreasing storage requirements'.
> > [...]
> > Assuming this was the 2nd option. If I understand correctly, this
> > option is effectively 'adapt or wrap TOAST to understand and handle
> > dictionaries for dictionary encoding common values'.
>
> Yes, exactly.
>
> > I think that a 'universal dictionary encoder' would be useful, but
> > that a data type might also have good reason to implement its
> > replacement methods itself for better overall performance (such as
> > maintaining partial detoast support in dictionaried items, or an
> > overall lower memory footprint, or ...). As such, I'd really
> > appreciate it if Option 1 is not ruled out by any implementation of
> > Option 2.
>
> I agree, having the benefits of the two approaches in one feature would
> be great. However, I'm having some difficulty imagining what the
> implementation would look like in light of the pros and cons stated
> above. I could use some help here.
>
> One approach I can think of is introducing a new entity, let's call it
> a "dictionary compression method". The idea is similar to access
> methods and table AMs. There is a set of callbacks the dictionary
> compression method should implement; some are mandatory, some can be
> set to NULL.

You might also want to look into the 'pluggable compression support' [0]
and 'Custom compression methods' [1] threads for inspiration, as this
seems very similar to what was originally proposed there. (†)

One important difference from the approaches discussed in [0] and [1] is
that the compression proposed here is at the type level, while the
compression proposed in both 'Pluggable compression support' and 'Custom
compression methods' is at the column / table / server level.

> Users can specify the compression method for the dictionary:
>
> ```
> CREATE TYPE name AS DICTIONARY OF JSONB (
>     compression_method = 'jsonb_best_compression'
>     -- compression_method = 'jsonb_fastest_partial_decompression'
>     -- if not specified, some default compression method is used
> );
> ```
>
> JSONB is maybe not the best example of a type for which people may need
> multiple compression methods in practice. But I can imagine how
> overriding a compression method for, let's say, arrays in an extension
> could be beneficial depending on the application.
>
> This approach would make the API well-defined and, more importantly,
> extendable. In the future, we could add additional (optional) methods
> for particular scenarios, like partial decompression.
>
> Does this sound like a reasonable approach?

Yes, I think that's doable.

Kind regards,

Matthias

(†): 'Custom compression methods' eventually got committed in an
entirely different state, by way of commit bbe0a81db, where LZ4 is now a
TOAST compression option that can be configured at the column / system
level. This is a hard-coded compression method, so no infrastructure (or
at least, no API) is available for custom compression methods in that
code.

[0] https://www.postgresql.org/message-id/flat/20130614230142.GC19641%40awork2.anarazel.de
[1] https://www.postgresql.org/message-id/flat/20170907194236.4cefce96@wp.localdomain
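For context on footnote (†): the column- and system-level knobs added by commit bbe0a81db are available in PostgreSQL 14 and later, roughly as sketched below; the table and column names are illustrative only. As the footnote notes, this built-in mechanism offers no extension point for the type-level dictionary compression methods discussed in this thread.

```
-- TOAST compression selection as committed in bbe0a81db (PostgreSQL 14+,
-- LZ4 requires a server built with LZ4 support).
-- The table and column names here are illustrative only.
CREATE TABLE docs (payload jsonb COMPRESSION lz4);

-- Change the method for an existing column; only newly stored values
-- are compressed with the new method.
ALTER TABLE docs ALTER COLUMN payload SET COMPRESSION pglz;

-- System-wide default for columns that do not specify a method.
SET default_toast_compression = 'lz4';
```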