Re: [PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers
From | Aleksander Alekseev |
---|---|
Subject | Re: [PATCH] Compression dictionaries for JSONB |
Date | |
Msg-id | CAJ7c6TP5TtM6V2YnbMSYJ-MKDci8Ay--ObzJWBfnNvf1xQ=BNg@mail.gmail.com |
In response to | Re: [PATCH] Compression dictionaries for JSONB (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [PATCH] Compression dictionaries for JSONB
Re: [PATCH] Compression dictionaries for JSONB |
List | pgsql-hackers |
Hi,

> I don't think the approaches in either of these threads is
> promising. They add a lot of complexity, require implementation effort
> for each type, manual work by the administrator for column, etc.

I would like to point out that compression dictionaries don't require per-type work.

The current implementation is artificially limited to JSONB because it's a PoC. I was hoping to get more feedback from the community before proceeding further. Internally it uses type-agnostic compression and doesn't care whether it compresses JSON(B), XML, TEXT, BYTEA or arrays. This choice was made explicitly in order to support types other than JSONB.

> One of the major justifications for work in this area is the cross-row
> redundancy for types like jsonb. I think there's ways to improve that
> across types, instead of requiring per-type work.

To be fair, there are advantages in using type-aware compression. The compression algorithm can be more efficient than a general one, and in theory one can implement lazy decompression, e.g. one that decompresses only the accessed fields of a JSONB document.

I agree though that for PostgreSQL in particular this is not necessarily the right path, especially considering the accompanying complexity.

If the user cares about disk space consumption, why store JSONB in a relational DBMS in the first place? We already have a great solution for compacting the data; it was invented in the 70s and is called normalization.

Since PostgreSQL is not a specialized document-oriented DBMS, I think we had better focus our (far from infinite) resources on something more people would benefit from: AIO/DIO [1] or perhaps getting rid of freezing [2], to name a few examples.

> [...]
> step back and decide which one we dislike the less, and how to fix that
> one into a shape that we no longer dislike.

For the sake of completeness, doing neither type-aware TOASTing nor compression dictionaries and leaving this area to the extension authors (e.g. ZSON) is also a possible choice, for the same reasons named above.

However, having built-in type-agnostic dictionary compression is IMO too attractive an idea to ignore completely, especially considering that the implementation proved to be fairly simple and there has not even been a need to rebase the patch since November :)

I know that there were concerns [3] regarding the typmod hack. I don't like it either and am 100% open to suggestions here. This is merely a current implementation detail used in a PoC, not a fundamental design decision.

[1]: https://postgr.es/m/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de
[2]: https://postgr.es/m/CAJ7c6TOk1mx4KfF0AHkvXi%2BpkdjFqwTwvRE-JmdczZMAYnRQ0w%40mail.gmail.com
[3]: https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#v16_Patch_Triage

--
Best regards,
Aleksander Alekseev
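
To illustrate what "type-agnostic dictionary compression" means in general, here is a minimal sketch using the preset-dictionary feature of Python's standard zlib module. It is not taken from the patch and says nothing about how the patch is implemented; the function names and the sample dictionary are hypothetical. The point is only that the compressor sees bytes, so the same dictionary mechanism works whether the bytes happen to be JSON, XML or plain text.

```python
import zlib


def compress_with_dict(payload: bytes, dictionary: bytes) -> bytes:
    """Compress arbitrary bytes, priming the compressor with a shared dictionary."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress bytes produced by compress_with_dict using the same dictionary."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    # Hypothetical dictionary: byte strings that repeat across many rows.
    shared = b'{"customer_id": , "delivery_address": , "order_status": "shipped"}'
    row = b'{"customer_id": 42, "delivery_address": "1 Main St", "order_status": "shipped"}'

    with_dict = compress_with_dict(row, shared)
    without_dict = zlib.compress(row, 9)

    # The round trip must restore the original bytes exactly.
    assert decompress_with_dict(with_dict, shared) == row
    print(len(row), len(without_dict), len(with_dict))
```

The dictionary carries the cross-row redundancy that a per-value compressor cannot see on its own, which is why short values with common keys benefit the most.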