Home > mailing lists

[PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers

From	Aleksander Alekseev
Subject	[PATCH] Compression dictionaries for JSONB
Date	April 22, 2022 08:30:01
Msg-id	CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com Whole thread
Responses	Re: [PATCH] Compression dictionaries for JSONB Re: [PATCH] Compression dictionaries for JSONB
List	pgsql-hackers

Tree view

Hi hackers,

This is a follow-up thread to `RFC: compression dictionaries for JSONB` [1]. I would like to share my current progress in order to get early feedback. The patch is currently in a draft state but implements the basic functionality. I did my best to account for all the great feedback I previously got from Alvaro and Matthias.

Usage example:

```
CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');

SELECT '{"aaa":"bbb"}' :: mydict;
mydict
----------------
{"aaa": "bbb"}

SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa';
?column?
----------
"bbb"
```

Here `mydict` works as a transparent replacement for `jsonb`. However, its internal representation differs. The provided dictionary entries ('aaa', 'bbb') are stored in the new catalog table:

```
SELECT * FROM pg_dict;
oid | dicttypid | dictentry
-------+-----------+-----------
39476 | 39475 | aaa
39477 | 39475 | bbb

(2 rows)

```

When `mydict` sees 'aaa' in the document, it replaces it with the corresponding code, in this case - 39476. For more details regarding the compression algorithm and choosen compromises please see the comments in the patch.

In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as TYPTYPE_BASE with only difference: corresponding `<type>_in` (pg_type.typinput) and `<another-type>_<type>` (pg_cast.castfunc) procedures receive the dictionary Oid as a `typmod` argument. This way the procedures can distinguish `mydict1` from `mydict2` and use the proper compression dictionary.

The approach with alternative `typmod` role is arguably a bit hacky, but it was the less invasive way to implement the feature I've found. I'm open to alternative suggestions.

Current limitations (todo):
- ALTER TYPE is not implemented
- Tests and documentation are missing
- Autocomplete is missing

Future work (out of scope of this patch):
- Support types other than JSONB: TEXT, XML, etc
- Automatically updated dictionaries, e.g. during VACUUM
- Alternative compression algorithms. Note that this will not require any further changes in the catalog, only the values we write to pg_type and pg_cast will differ.

Open questions:
- Dictionary entries are currently stored as NameData, the same type that is used for enums. Are we OK with the accompanying limitations? Any alternative suggestions?

- All in all, am I moving the right direction?

Your feedback is very much welcomed!

[1]: https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com

--
Best regards,
Aleksander Alekseev

Attachment

v1-0001-CREATE-TYPE-foo-AS-DICTIONARY-OF-JSONB.patch

pgsql-hackers by date:

From: Magnus Hagander
Date: 22 April 2022, 07:49:34
Subject: Re: Estimating HugePages Requirements?

From: Aleksander Alekseev
Date: 22 April 2022, 08:33:56
Subject: Re: RFC: compression dictionaries for JSONB

[PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers

Attachment

Previous

Next