Re: pglz performance - Mailing list pgsql-hackers
From: Andrey Borodin
Subject: Re: pglz performance
Date:
Msg-id: DBB2A9E5-29FD-40BF-AC60-BD990FBF142F@yandex-team.ru
In response to: Re: pglz performance (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: pglz performance
List: pgsql-hackers
Thanks for looking into this!

> On 2 Aug 2019, at 19:43, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:
>>
>> It took me some time to understand that your memcpy optimization is correct ;)

Seems that the comments are not explanatory enough... will try to fix.

>> I have tested different ways of optimizing this fragment of code, but failed to outperform your implementation!

JFYI, we also tried optimizations with memcpy of a const size (optimized into assembly instead of a call), unrolling the literal loop, and some others. None of these worked better.

>> But ... below are results for lz4:
>>
>> Decompressor score (sum of all times):
>> NOTICE: Decompressor lz4_decompress result 3.660066
>> Compressor score (sum of all times):
>> NOTICE: Compressor lz4_compress result 10.288594
>>
>> There is a 2x advantage in decompression speed and a 10x advantage in compression speed.
>> So maybe instead of "hacking" the pglz algorithm we should switch to lz4?
>>
> I think we should just bite the bullet and add an initdb option to pick the compression algorithm. That's been discussed repeatedly, but we never ended up actually doing it. See for example [1].
>
> If there's anyone willing to put some effort into getting this feature over the line, I'm willing to do reviews & commit. It's a seemingly small change with rather insane potential impact.
>
> But even if we end up doing that, it still makes sense to optimize the hell out of pglz, because existing systems will still use it (pg_upgrade can't switch from one compression algorithm to another).

We have a kind of "roadmap" for "extensible pglz". We plan to provide an implementation at the November CF.

Currently, pglz starts with an empty cache map: there are no prior 4k bytes before the start of the data. We can add an imaginary prefix to any data that has common substrings: this will improve the compression ratio. It is hard to decide on a training data set for this "common prefix".
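[Editor's note: the "imaginary prefix" idea can be illustrated with zlib's preset-dictionary API. This is only a sketch of the concept — zlib stands in for pglz's cache map, and the prefix and sample data below are invented for the example:]

```python
import zlib

# Hypothetical "common prefix": substrings expected to recur in the
# user's data (here, JSON keys). In the pglz proposal this would be
# the imaginary history seeded before the real input.
prefix = b'{"user_id": , "timestamp": , "event": ""}'

data = b'{"user_id": 42, "timestamp": 1700000000, "event": "page_view"}'

# Compress with and without the preset dictionary.
plain = zlib.compress(data)

c = zlib.compressobj(zdict=prefix)
seeded = c.compress(data) + c.flush()

# The decompressor must be seeded with the same prefix, which is why
# the proposal needs a way to record which prefix a datum used.
d = zlib.decompressobj(zdict=prefix)
assert d.decompress(seeded) + d.flush() == data

print(len(plain), len(seeded))  # the seeded stream is smaller here
```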
So we want to produce an extension with an aggregate function which derives some "adapted common prefix" from the user's data. Then we can "reserve" a few negative bytes for "decompression commands". Such a command can instruct the database which common prefix to use. A system command could also say "invoke decompression from an extension". Thus, the user would be able to train database compression on his data and substitute pglz with a custom compression method seamlessly.

This would make a hard-coded choice of compression algorithm unnecessary, though it seems overly hacky. On the other hand, there would be no need to have lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time series compression"? Or "DNA compression"? Whatever gun the user wants for his foot.

Best regards, Andrey Borodin.
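[Editor's note: the memcpy optimization discussed upthread turns on a subtlety of LZ-style decompression: a back-reference may overlap the bytes currently being written, so copying the whole match with one memcpy is not always equivalent. A toy sketch of the order-dependent copy, in Python rather than C, with invented names:]

```python
def copy_match(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes found `offset` bytes back in `out`.

    When offset < length the source and destination overlap, so the
    copy must run front-to-back, re-reading bytes written by earlier
    iterations. A single memcpy over the whole range would not see
    those freshly written bytes, which is why optimized decompressors
    fall back to byte-wise or carefully chunked copies in this case.
    """
    pos = len(out) - offset
    for i in range(length):
        out.append(out[pos + i])

# RLE-like expansion: a short seed grows into a repeated pattern
# because the match overlaps its own output.
buf = bytearray(b"ab")
copy_match(buf, offset=2, length=6)
print(bytes(buf))  # b'abababab'
```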