Re: pglz performance - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: pglz performance
Date
Msg-id 20190802170039.o4pabnzm4xy3z7uj@development
In response to Re: pglz performance  (Andres Freund <andres@anarazel.de>)
Responses Re: pglz performance
List pgsql-hackers
On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>> We have some kind of "roadmap" of "extensible pglz". We plan to
>> provide an implementation at the November CF.
>
>I don't understand why it's a good idea to improve the compression side
>of pglz. There are plenty of other people who have spent a lot of time
>developing better compression algorithms.
>

Isn't it beneficial for existing systems, which will be stuck with pglz
even if we end up adding other algorithms?

>
>> Currently, pglz starts with an empty cache map: there are no prior 4k
>> bytes before the start. We can add an imaginary prefix to any data with
>> common substrings: this will enhance the compression ratio.  It is hard
>> to decide on a training data set for this "common prefix". So we want
>> to produce an extension with an aggregate function which produces some
>> "adapted common prefix" from the user's data.  Then we can "reserve" a few
>> negative bytes for "decompression commands". Such a command can
>> instruct the database on which common prefix to use.  But a system
>> command can also say "invoke decompression from extension".
>>
>> Thus, the user will be able to train database compression on his data
>> and seamlessly substitute pglz compression with a custom compression
>> method.
>>
>> This will make hard-chosen compression unneeded, but seems overly
>> hacky. But there will be no need to have lz4, zstd, brotli, lzma and
>> others in core. Why not provide e.g. "time series compression"? Or
>> "DNA compression"? Whatever gun the user wants for his foot.
>
>I think this is way too complicated, and will provide not particularly
>much benefit for the majority of users.
>

I agree with this. I do see value in the feature, but probably not as a
drop-in replacement for the default compression algorithm. I'd compare
it to the "custom compression methods" patch that was submitted some
time ago.
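
The "common prefix" idea above is essentially a preset compression
dictionary. pglz has no user-facing API for this, but zlib exposes the
same concept via its zdict parameter, so a rough sketch of the ratio
gain (with made-up sample data standing in for the user's trained
prefix) might look like:

```python
# Sketch of dictionary-primed compression, assuming zlib as a stand-in
# for pglz (which has no such API). The dictionary plays the role of the
# "adapted common prefix" learned from the user's data.
import zlib

# Hypothetical training output: substrings common to the user's rows.
dictionary = b'{"timestamp": "", "level": "INFO", "message": ""}'
payload = b'{"timestamp": "2019-08-02", "level": "INFO", "message": "started"}'

plain = zlib.compressobj()
primed = zlib.compressobj(zdict=dictionary)

plain_out = plain.compress(payload) + plain.flush()
primed_out = primed.compress(payload) + primed.flush()

# The primed stream can encode shared substrings as back-references
# into the dictionary, so it comes out smaller for dictionary-like data.
print(len(plain_out), len(primed_out))

# Decompression must be primed with the same dictionary -- which is
# exactly why the proposal needs "decompression commands" to record
# which prefix a datum was compressed against.
d = zlib.decompressobj(zdict=dictionary)
assert d.decompress(primed_out) == payload
```

The last two lines illustrate the operational cost Andres points at:
the dictionary becomes part of the data's interpretation, so it has to
be stored and versioned alongside whatever it compressed.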

>In fact, I'll argue that we should flat out reject any such patch until
>we have at least one decent default compression algorithm in core.
>You're trying to work around a poor compression algorithm with
>complicated dictionary improvements that require user interaction, will
>only work in a relatively small subset of cases, and will very often
>increase compression times.
>

I wouldn't be so strict, I guess. But I do agree that an algorithm which
requires additional steps (training, ...) is unlikely to be a good
candidate for the default instance compression algorithm.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


