Re: [PATCH] Compression dictionaries for JSONB - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [PATCH] Compression dictionaries for JSONB |
Date | |
Msg-id | 20230204133123.mv6rkxloxnkfakww@alap3.anarazel.de |
In response to | Re: [PATCH] Compression dictionaries for JSONB (Pavel Borisov <pashkin.elfe@gmail.com>) |
Responses | Re: [PATCH] Compression dictionaries for JSONB |
List | pgsql-hackers |
Hi,

On 2023-02-03 14:39:31 +0400, Pavel Borisov wrote:
> On Fri, 3 Feb 2023 at 14:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > This patch came up at the developer meeting in Brussels yesterday.
> > https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2023_Developer_Meeting#v16_Patch_Triage
> >
> > First, as far as I can tell, there is a large overlap between this patch
> > and "Pluggable toaster" patch. The approaches are completely different,
> > but they seem to be trying to fix the same problem: the fact that the
> > default TOAST stuff isn't good enough for JSONB. I think before asking
> > developers of both patches to rebase over and over, we should take a
> > step back and decide which one we dislike the less, and how to fix that
> > one into a shape that we no longer dislike.
> >
> > (Don't get me wrong. I'm all for having better JSONB compression.
> > However, for one thing, both patches require action from the user to set
> > up a compression mechanism by hand. Perhaps it would be even better if
> > the system determines that a JSONB column uses a different compression
> > implementation, without the user doing anything explicitly; or maybe we
> > want to give the user *some* agency for specific columns if they want,
> > but not force them into it for every single jsonb column.)
> >
> > Now, I don't think either of these patches can get to a committable
> > shape in time for v16 -- even assuming we had an agreed design, which
> > AFAICS we don't. But I encourage people to continue discussion and try
> > to find consensus.
> >
> Hi, Alvaro!
>
> I'd like to give my +1 in favor of implementing a pluggable toaster
> interface first. Then we can work on custom toast engines for
> different scenarios, not limited to JSON(b).

I don't think the approaches in either of these threads are promising. They
add a lot of complexity and require implementation effort for each type,
manual work by the administrator for each column, etc.

One of the major justifications for work in this area is the cross-row
redundancy for types like jsonb. I think there are ways to improve that across
types, instead of requiring per-type work. We could e.g. use compression
dictionaries to achieve much higher compression rates. Training of the
dictionary could even happen automatically by ANALYZE, if we wanted to. It's
unlikely to get you everything a very sophisticated per-type compression is
going to give you, but it's going to be a lot better than today, and it's
going to work across types.

> For example, I find it useful to decrease WAL overhead on the
> replication of TOAST updates. It is quite a pain now that we need to
> rewrite all toast chunks at any TOAST update. Also, it could be good
> for implementing undo access methods etc., etc. Now, these kinds of
> activities in extensions face the fact that core has only one TOAST
> which is quite inefficient in many scenarios.
>
> So overall I value the extensibility part of this activity as the most
> important one and will be happy to see it completed first.

I think the complexity will just make improving toast in-core harder, without
much benefit.

Regards,

Andres
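
To illustrate the cross-row redundancy argument above, here is a minimal Python sketch of compressing a small value against a shared dictionary. It is not taken from either patch and is not how PostgreSQL stores data. It assumes zlib's preset-dictionary support (`zdict`) as a stand-in for whatever algorithm would actually be used. The row shape, `make_row`, and the "training" step (plain concatenation of sample rows in `build_dictionary`) are hypothetical simplifications.

```python
import json
import zlib


def build_dictionary(samples, max_size=32 * 1024):
    """Naive stand-in for dictionary training: concatenate sample values.

    With the default window size, zlib can use at most 32 KiB of preset
    dictionary, so only the tail of the concatenation is kept.
    """
    return b"".join(samples)[-max_size:]


def compress_value(value, zdict):
    # Compress one value, letting it reference strings in the shared dictionary.
    comp = zlib.compressobj(level=6, zdict=zdict)
    return comp.compress(value) + comp.flush()


def decompress_value(blob, zdict):
    # The same dictionary must be supplied to decompress.
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


def make_row(i):
    # Hypothetical JSON document; real rows would come from a table column.
    doc = {"user_id": i, "status": "active", "country": "DE", "plan": "premium"}
    return json.dumps(doc).encode()


# "Train" on a sample of existing rows, as an ANALYZE-like step might.
training_rows = [make_row(i) for i in range(50)]
zdict = build_dictionary(training_rows)

# Compress a row that was not part of the sample, with and without the dictionary.
new_row = make_row(1000)
plain = zlib.compress(new_row, 6)
with_dict = compress_value(new_row, zdict)

print(len(new_row), len(plain), len(with_dict))
```

Each row on its own contains little repetition, so per-value compression gains almost nothing. The keys and common values repeat across rows, and the shared dictionary lets the compressor exploit that. Nothing in the sketch is specific to JSON, which is the cross-type property argued for above.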