Re: [PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: [PATCH] Compression dictionaries for JSONB
Date
Msg-id CAEze2Wg+HM180NudppyAHH3t6-ttg6FT06M6s4BaJsZnAkg6zg@mail.gmail.com
In response to Re: [PATCH] Compression dictionaries for JSONB  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: [PATCH] Compression dictionaries for JSONB
List pgsql-hackers
Hi Aleksander,

On Fri, 17 Jun 2022 at 17:04, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>> These are just my initial thoughts I would like to share though. I may
>> change my mind after diving deeper into a "pluggable TOASTer" patch.
>
> I familiarized myself with the "pluggable TOASTer" thread and joined
> the discussion [1].
>
> I'm afraid so far I failed to understand your suggestion to base
> "compression dictionaries" patch on "pluggable TOASTer", considering
> the fair amount of push-back it got from the community, not to mention
> a somewhat raw state of the patchset. It's true that Teodor and I are
> trying to address similar problems. This however doesn't mean that
> there should be a dependency between these patches.

The reason I think this is better implemented as a pluggable toaster
is that casts are necessarily opaque and require O(sizeofdata)
copies or processing. The toaster infrastructure proposed in
[0] seems to improve on the O(sizeofdata) requirement for TOAST, but
that improvement will not work with casts.
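
To illustrate the cost argument above, here is a toy Python model, not
PostgreSQL code. It assumes a value stored as independently compressed
chunks; the chunk size and the names store, cast_then_slice and
toaster_slice are hypothetical and chosen only for this sketch. An
opaque cast has to materialise the whole value before anything can be
read from it, while a chunk-aware accessor only decodes the chunks that
overlap the requested range.

```python
import zlib
from typing import List, Tuple

CHUNK = 2000  # hypothetical chunk size, for illustration only


def store(value: bytes) -> List[bytes]:
    """Split a value into fixed-size chunks and compress each one separately."""
    return [zlib.compress(value[i:i + CHUNK]) for i in range(0, len(value), CHUNK)]


def cast_then_slice(chunks: List[bytes], offset: int, length: int) -> Tuple[bytes, int]:
    """Opaque-cast model: rebuild the full value, then slice it.

    Returns the slice and the number of bytes that had to be decoded.
    """
    full = b"".join(zlib.decompress(c) for c in chunks)
    return full[offset:offset + length], len(full)


def toaster_slice(chunks: List[bytes], offset: int, length: int) -> Tuple[bytes, int]:
    """Chunk-aware model: decode only the chunks overlapping the range.

    Returns the slice and the number of bytes that had to be decoded.
    """
    if length <= 0:
        return b"", 0
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    part = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * CHUNK
    return part[start:start + length], len(part)
```

In this model the cast path always decodes the full value, whereas the
chunk-aware path decodes an amount proportional to the requested range.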

> Also, I completely agree with Tomas [2]:
>
>> My main point is that we should not be making too many radical
>> changes at once - it makes it much harder to actually get anything done.
>
> IMO the patches don't depend on each other but rather complement each
> other. The user can switch between different TOAST methods, and the
> compression dictionaries can work on top of different TOAST methods.

I don't think that is possible (or at least, not as performant). To
treat type X' as type X and use it as a storage medium instead, you
must either have the whole binary representation of X or have access
to the internals of type X. I find it difficult to believe that casts
can be done without a full detoast (or otherwise without deep
knowledge of the internal structure of the data type, such as 'type A
is binary compatible with type X'). As such, I think the 'compression
dictionaries' feature competes with the 'pluggable toaster' feature
if one is used on top of the other. That is, the dictionary would
still be created as in the proposed patches (though preferably
without the 64-byte NAMEDATALEN limit), but it would be used through
"TOASTER my_dict_enabled_toaster".
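
As a rough picture of what a dictionary-enabled storage layer does with
a value, here is a minimal Python sketch of dictionary substitution. It
is an assumption-laden toy and not the encoding used by the proposed
patches: the escape byte, the code layout and the names encode and
decode are all hypothetical. Entries are plain byte strings, so nothing
in this model caps them at a 64-byte name length.

```python
from typing import Sequence

ESC = 0x00          # hypothetical marker byte introducing a dictionary code
LITERAL_ESC = 0xFF  # code meaning "a literal ESC byte", not a dictionary entry


def encode(data: bytes, entries: Sequence[bytes]) -> bytes:
    """Replace occurrences of dictionary entries with two-byte codes."""
    if len(entries) >= LITERAL_ESC:
        raise ValueError("too many dictionary entries for one-byte codes")
    # Try longer entries first so greedy matching prefers the larger saving.
    order = sorted(range(len(entries)), key=lambda i: len(entries[i]), reverse=True)
    out = bytearray()
    pos = 0
    while pos < len(data):
        for i in order:
            entry = entries[i]
            if entry and data.startswith(entry, pos):
                out += bytes((ESC, i))
                pos += len(entry)
                break
        else:
            byte = data[pos]
            # A literal ESC byte in the input must itself be escaped.
            out += bytes((ESC, LITERAL_ESC)) if byte == ESC else bytes((byte,))
            pos += 1
    return bytes(out)


def decode(blob: bytes, entries: Sequence[bytes]) -> bytes:
    """Expand two-byte codes back into dictionary entries."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        byte = blob[pos]
        if byte != ESC:
            out.append(byte)
            pos += 1
            continue
        code = blob[pos + 1]
        out += bytes((ESC,)) if code == LITERAL_ESC else entries[code]
        pos += 2
    return bytes(out)
```

Note that decode walks the encoded bytes sequentially, which is the kind
of whole-value processing a cast-based design cannot avoid.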

Additionally, I don't think we've ever accepted two different
implementations of the same concept, at least not without first having
good arguments for why each implementation has obvious benefits over
the other and why the two are incompatible.

> Although there is also a high-level idea (according to the
> presentations) to share common data between different TOASTed values,
> similarly to what compression dictionaries do, by looking at the
> current feedback and considering the overall complexity and the amount
> of open questions (e.g. interaction with different TableAMs, etc), I
> seriously doubt that this particular part of "pluggable TOASTer" will
> end-up in the core.

Yes, and that's why I think this is where this dictionary
infrastructure could provide value, as an alternative or extension to
the proposed jsonb toaster in the 'pluggable toaster' thread.

Kind regards,

Matthias van de Meent


