Re: [PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH] Compression dictionaries for JSONB
Date
Msg-id 20230204133123.mv6rkxloxnkfakww@alap3.anarazel.de
In response to Re: [PATCH] Compression dictionaries for JSONB  (Pavel Borisov <pashkin.elfe@gmail.com>)
Responses Re: [PATCH] Compression dictionaries for JSONB
List pgsql-hackers
Hi,

On 2023-02-03 14:39:31 +0400, Pavel Borisov wrote:
> On Fri, 3 Feb 2023 at 14:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > This patch came up at the developer meeting in Brussels yesterday.
> > https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#v16_Patch_Triage
> >
> > First, as far as I can tell, there is a large overlap between this patch
> > and "Pluggable toaster" patch.  The approaches are completely different,
> > but they seem to be trying to fix the same problem: the fact that the
> > default TOAST stuff isn't good enough for JSONB.  I think before asking
> > developers of both patches to rebase over and over, we should take a
> > step back and decide which one we dislike the less, and how to fix that
> > one into a shape that we no longer dislike.
> >
> > (Don't get me wrong.  I'm all for having better JSONB compression.
> > However, for one thing, both patches require action from the user to set
> > up a compression mechanism by hand.  Perhaps it would be even better if
> > the system determines that a JSONB column uses a different compression
> > implementation, without the user doing anything explicitly; or maybe we
> > want to give the user *some* agency for specific columns if they want,
> > but not force them into it for every single jsonb column.)
> >
> > Now, I don't think either of these patches can get to a committable
> > shape in time for v16 -- even assuming we had an agreed design, which
> > AFAICS we don't.  But I encourage people to continue discussion and try
> > to find consensus.
> >
> Hi, Alvaro!
>
> I'd like to give my +1 in favor of implementing a pluggable toaster
> interface first. Then we can work on custom toast engines for
> different scenarios, not limited to JSON(b).

I don't think the approach in either of these threads is
promising. They add a lot of complexity, require implementation effort
for each type, manual work by the administrator for each column, etc.


One of the major justifications for work in this area is the cross-row
redundancy for types like jsonb. I think there are ways to improve that
across types, instead of requiring per-type work. We could e.g. use
compression dictionaries to achieve much higher compression
rates. Training of the dictionary could even happen automatically by
analyze, if we wanted to.  It's unlikely to get you everything a very
sophisticated per-type compression is going to give you, but it's going
to be a lot better than today, and it's going to work across types.
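
To make the cross-row redundancy point concrete, here is a minimal sketch
in Python using zlib's preset-dictionary support. It is not PostgreSQL
code and not what either patch does. The row contents, the "training"
step (simply concatenating a sample of rows, loosely analogous to what
analyze could collect), and the helper names compress/decompress are all
assumptions made for illustration.

```python
import json
import zlib

# Hypothetical rows of a jsonb-like column: identical keys, varying values.
rows = [
    json.dumps(
        {"user_id": i, "status": "active", "country": "DE", "plan": "free"}
    ).encode()
    for i in range(1000)
]

# Stand-in for dictionary training: concatenate a small sample of rows.
# A real trainer would select frequent substrings far more carefully.
shared_dict = b"".join(rows[:20])


def compress(datum, zdict=None):
    """Compress one datum on its own, optionally with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj()
    else:
        comp = zlib.compressobj(zdict=zdict)
    return comp.compress(datum) + comp.flush()


def decompress(blob, zdict=None):
    """Inverse of compress(); the same dictionary must be supplied."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Each row is compressed independently, as a per-datum scheme would do.
plain_total = sum(len(compress(r)) for r in rows)
dict_total = sum(len(compress(r, shared_dict)) for r in rows)

print("raw bytes:              ", sum(len(r) for r in rows))
print("per-row, no dictionary: ", plain_total)
print("per-row, with dictionary:", dict_total)
```

Each datum is still compressed on its own, so single-row access keeps
working; the gain comes only from the shared dictionary, which knows
nothing about the type beyond the sampled bytes.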


> For example, I find it useful to decrease WAL overhead on the
> replication of TOAST updates. It is quite a pain now that we need to
> rewrite all toast chunks at any TOAST update. Also, it could be good
> for implementing undo access methods etc., etc. Now, these kinds of
> activities in extensions face the fact that core has only one TOAST
> which is quite inefficient in many scenarios.
>
> So overall I value the extensibility part of this activity as the most
> important one and will be happy to see it completed first.

I think the complexity will just make improving toast in-core harder,
without much benefit.


Regards,

Andres


