Re: pglz performance - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: pglz performance |
| Date | |
| Msg-id | 20190802170039.o4pabnzm4xy3z7uj@development |
| In response to | Re: pglz performance (Andres Freund <andres@anarazel.de>) |
| Responses | Re: pglz performance |
| List | pgsql-hackers |
On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>> We have some kind of "roadmap" of "extensible pglz". We plan to
>> provide an implementation in November's CF.
>
>I don't understand why it's a good idea to improve the compression side
>of pglz. There are plenty of other people who have spent a lot of time
>developing better compression algorithms.
>

Isn't it beneficial for existing systems, which will be stuck with pglz
even if we end up adding other algorithms?

>
>> Currently, pglz starts with an empty cache map: there are no prior 4k
>> bytes before the start. We can add an imaginary prefix to any data with
>> common substrings: this will enhance the compression ratio. It is hard
>> to decide on a training data set for this "common prefix". So we want
>> to produce an extension with an aggregate function which produces some
>> "adapted common prefix" from the user's data. Then we can "reserve" a few
>> negative bytes for "decompression commands". Such a command can
>> instruct the database on which common prefix to use. But a system
>> command can also say "invoke decompression from extension".
>>
>> Thus, the user will be able to train database compression on his data and
>> substitute pglz compression with a custom compression method
>> seamlessly.
>>
>> This will make hard-chosen compression unneeded, but seems overly
>> hacky. But there will be no need to have lz4, zstd, brotli, lzma and
>> others in core. Why not provide e.g. "time series compression"? Or
>> "DNA compression"? Whatever gun the user wants for his foot.
>
>I think this is way too complicated, and will not provide particularly
>much benefit for the majority of users.
>

I agree with this. I do see value in the feature, but probably not as a
drop-in replacement for the default compression algorithm. I'd compare
it to the "custom compression methods" patch that was submitted some
time ago.

>In fact, I'll argue that we should flat out reject any such patch until
>we have at least one decent default compression algorithm in core.
>You're trying to work around a poor compression algorithm with a
>complicated dictionary improvement that requires user interaction, will
>only work in a relatively small subset of cases, and will very often
>increase compression times.
>

I wouldn't be so strict, I guess. But I do agree that an algorithm which
requires additional steps (training, ...) is unlikely to be a good
candidate for the instance's default compression algorithm.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
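The "imaginary prefix" idea quoted above is analogous to the preset-dictionary support in zlib. The following is a minimal Python sketch of that analogy only; the dictionary and sample values are made up for illustration and are not part of the proposal or of pglz itself:

```python
import zlib

# Hypothetical "common prefix" -- in the proposal this would be produced
# by an aggregate function trained on the user's existing data.
shared_dict = b'{"event": "page_view", "user_id": , "ts": "2019-08-02T"}'

# A short value of the kind pglz struggles with: too little prior data
# to find matches in.
sample = b'{"event": "page_view", "user_id": 4217, "ts": "2019-08-02T17:00:39Z"}'

# Plain compression: the compressor starts with an empty window,
# much like pglz starting with an empty cache map.
plain = zlib.compress(sample)

# Compression primed with the shared dictionary, i.e. an "imaginary
# prefix" of common substrings placed before the data.
comp = zlib.compressobj(zdict=shared_dict)
primed = comp.compress(sample) + comp.flush()

# Decompression must be primed with the same dictionary.
decomp = zlib.decompressobj(zdict=shared_dict)
assert decomp.decompress(primed) == sample

print(f"raw={len(sample)}  plain={len(plain)}  with dict={len(primed)}")
```

For short values that share most of their bytes with the dictionary, the primed output is typically noticeably smaller than plain compression, which is the effect the proposed "adapted common prefix" aims for in pglz.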