Re: [PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: [PATCH] Compression dictionaries for JSONB
Date
Msg-id CAJ7c6TP5TtM6V2YnbMSYJ-MKDci8Ay--ObzJWBfnNvf1xQ=BNg@mail.gmail.com
In response to Re: [PATCH] Compression dictionaries for JSONB  (Andres Freund <andres@anarazel.de>)
Responses Re: [PATCH] Compression dictionaries for JSONB
Re: [PATCH] Compression dictionaries for JSONB
List pgsql-hackers
Hi,

> I don't think the approaches in either of these threads is
> promising. They add a lot of complexity, require implementation effort
> for each type, manual work by the administrator for column, etc.

I would like to point out that compression dictionaries don't require
per-type work.

The current implementation is artificially limited to JSONB because
it's a PoC. I was hoping to get more feedback from the community before
proceeding further. Internally it uses type-agnostic compression and
doesn't care whether it compresses JSON(B), XML, TEXT, BYTEA or
arrays. This choice was made explicitly in order to support types
other than JSONB.
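
To illustrate what "type-agnostic" means here, a minimal Python sketch
follows. It is not the patch's code: zlib's preset-dictionary feature
stands in for the dictionary mechanism, and SHARED_DICT, compress_value
and decompress_value are hypothetical names. The compressor only ever
sees bytes, so the same two functions serve JSON, XML or plain text.

```python
import zlib

# Hypothetical shared dictionary: byte strings that recur across many rows.
SHARED_DICT = b'{"user_id": "created_at": "status": "active"<status></status>'


def compress_value(raw: bytes, dictionary: bytes = SHARED_DICT) -> bytes:
    """Compress raw bytes with a preset dictionary; the value's type is irrelevant."""
    comp = zlib.compressobj(zdict=dictionary)
    return comp.compress(raw) + comp.flush()


def decompress_value(blob: bytes, dictionary: bytes = SHARED_DICT) -> bytes:
    """Reverse compress_value; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Two values of different "types" go through the identical code path.
json_row = b'{"user_id": 42, "created_at": "2023-02-06", "status": "active"}'
xml_row = b"<status>active</status>"

for row in (json_row, xml_row):
    assert decompress_value(compress_value(row)) == row
```

Nothing in the sketch inspects the structure of the value, which is why
no per-type work is needed in this model.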

> One of the major justifications for work in this area is the cross-row
> redundancy for types like jsonb. I think there's ways to improve that
> across types, instead of requiring per-type work.

To be fair, there are advantages to using type-aware compression. The
compression algorithm can be more efficient than a general one and, in
theory, one can implement lazy decompression, e.g. one that
decompresses only the accessed fields of a JSONB document.
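
As a rough illustration of the lazy-decompression idea, here is a
Python sketch. It assumes a made-up layout in which every top-level
field is compressed separately. This is not how JSONB is stored, and
LazyDocument is a hypothetical name. The point is only that the code
must understand the document's structure to skip the fields it does not
need.

```python
import json
import zlib


class LazyDocument:
    """Stores each top-level field compressed separately (illustrative only)."""

    def __init__(self, document: dict):
        self._fields = {
            key: zlib.compress(json.dumps(value).encode("utf-8"))
            for key, value in document.items()
        }
        self.decompressed = 0  # how many field reads triggered decompression

    def get(self, key: str):
        """Decompress and decode only the requested field."""
        self.decompressed += 1
        return json.loads(zlib.decompress(self._fields[key]))


doc = LazyDocument({"id": 7, "tags": ["a", "b"], "body": "x" * 1000})
assert doc.get("id") == 7      # the large "body" field is never touched
assert doc.decompressed == 1
```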

I agree though that particularly for PostgreSQL this is not
necessarily the right path, especially considering the accompanying
complexity.

If the user cares about disk space consumption, why store JSONB
in a relational DBMS in the first place? We already have a great
solution for compacting the data: it was invented in the 70s and is
called normalization.

Since PostgreSQL is not a specialized document-oriented DBMS, I think
we had better focus our (far from infinite) resources on something more
people would benefit from: AIO/DIO [1] or perhaps getting rid of
freezing [2], to name a few examples.

> [...]
> step back and decide which one we dislike the less, and how to fix that
> one into a shape that we no longer dislike.

For the sake of completeness, doing neither type-aware TOASTing nor
compression dictionaries and leaving this area to extension
authors (e.g. ZSON) is also a possible choice, for the same reasons
named above. However, built-in type-agnostic dictionary compression
is IMO too attractive an idea to ignore completely, especially
considering that the implementation proved to be fairly simple and
the patch has not even needed a rebase since November :)

I know that there were concerns [3] regarding the typmod hack. I don't
like it either and am 100% open to suggestions here. This is merely a
current implementation detail used in the PoC, not a fundamental design
decision.

[1]: https://postgr.es/m/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de
[2]: https://postgr.es/m/CAJ7c6TOk1mx4KfF0AHkvXi%2BpkdjFqwTwvRE-JmdczZMAYnRQ0w%40mail.gmail.com
[3]: https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#v16_Patch_Triage

-- 
Best regards,
Aleksander Alekseev


