Hi Nikhil,
> Thank you for highlighting the previous discussion—I reviewed [1]
> closely. While both methods involve dictionary-based compression, the
> approach I'm proposing differs significantly.
>
> The previous method explicitly extracted string values from JSONB and
> assigned unique OIDs to each entry, resulting in distinct dictionary
> entries for every unique value. In contrast, this approach directly
> leverages Zstandard's dictionary training API. We provide raw data
> samples to Zstd, which generates a dictionary of a specified size.
> This dictionary is then stored in a catalog table and used to compress
> subsequent inserts for the specific attribute it was trained on.
>
> [...]

I'm afraid you didn't read closely enough. As Tom pointed out, the
title of the thread is misleading. On top of that, there are several
separate threads. I did my best to cross-reference them, but
apparently I didn't do it well enough.

Initially I proposed adding the ZSON extension [1][2] to the PostgreSQL
core. However, the idea evolved into TOAST improvements that don't
require the user to use special types. You may also find the related
"Pluggable TOASTer" discussion [3] interesting. The idea there was rather
different, but the discussion about extending TOAST pointers so that in
the future we can use something other than ZSTD is relevant.

You will find a recent summary of the agreements reached somewhere
around this message [4]; take a look at the thread a bit above and
below it.

I believe this effort is important. However, if you want to succeed, you
can't simply discard everything that was discussed in this area over the
past several years. No one will look at your patch if it
doesn't account for all the previous discussions. I'm sorry, I know
it's disappointing. That being said, you should have done better
research before submitting the code. You could have just asked whether
anyone had worked on something like this before and saved a lot of time.

Personally, I would suggest starting with one small step toward
compression dictionaries, focusing in particular on the extensibility of
TOAST pointers. You are going to need to store dictionary ids there
and allow other compression algorithms to be used in the future. This will
require something like a varint/utf8-like bitmask. See the
previous discussions.
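
To illustrate the kind of varint-like encoding meant here, below is a
minimal sketch in Python. It is for illustration only: the function names,
the LEB128-style layout (7 payload bits per byte, high bit as a
continuation flag) and the idea of packing a compression method id next to
a dictionary id are all assumptions, not the format agreed on in the
previous threads and not actual PostgreSQL code.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 bits per byte, low bits first.

    The high bit of each byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def pack_pointer_header(method_id: int, dict_id: int) -> bytes:
    """Hypothetical header: compression method id, then dictionary id."""
    return encode_varint(method_id) + encode_varint(dict_id)


def unpack_pointer_header(buf: bytes) -> Tuple[int, int]:
    """Inverse of pack_pointer_header; returns (method_id, dict_id)."""
    method_id, pos = decode_varint(buf, 0)
    dict_id, _ = decode_varint(buf, pos)
    return method_id, dict_id
```

With such a layout, ids below 128 take one byte each, and larger ids grow
by one byte per additional 7 bits, so new compression methods and more
dictionaries can be added without changing the size of existing pointers.
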
[1]: https://github.com/afiskon/zson
[2]: https://postgr.es/m/CAJ7c6TP3fCC9TNKJBQAcEf4c%3DL7XQZ7QvuUayLgjhNQMD_5M_A%40mail.gmail.com
[3]: https://postgr.es/m/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru
[4]: https://postgr.es/m/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com
--
Best regards,
Aleksander Alekseev