Re: pglz performance - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: pglz performance
Date
Msg-id 20190802144345.d62jtiyyx6r2y73f@development
In response to Re: pglz performance  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: pglz performance
List pgsql-hackers
On Fri, Aug 02, 2019 at 04:45:43PM +0300, Konstantin Knizhnik wrote:
>
>
>On 27.06.2019 21:33, Andrey Borodin wrote:
>>
>>>On 13 May 2019, at 12:14, Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>>Decompression can matter a lot for mostly-read workloads and
>>>compression can become a bottleneck for heavy-insert loads, so
>>>improving compression or decompression should be two separate
>>>problems, not two problems linked.  Any improvement in one or the
>>>other, or even both, is nice to have.
>>Here's a patch hacked by Vladimir for compression.
>>
>>Key differences (as far as I can see; maybe Vladimir will post a more complete list of optimizations):
>>1. Use functions instead of macro-functions: not surprisingly, they are easier to optimize and impose fewer constraints on the compiler.
>>2. More compact hash table: use indexes instead of pointers.
>>3. More robust segment comparison: like memcmp, but returns the index of the first differing byte.
>>
>>In a weighted mix of different data (same as for compression), the overall speedup is 1.43x on my machine.
>>
>>The current implementation is integrated into the test_pglz suite for benchmarking purposes[0].
>>
>>Best regards, Andrey Borodin.
>>
>>[0] https://github.com/x4m/test_pglz
>
>It took me some time to understand that your memcpy optimization is
>correct ;)
>I have tested different ways of optimizing this fragment of code, but
>failed to outperform your implementation!
>Results on my computer are similar to yours:
>
>Decompressor score (summ of all times):
>NOTICE:  Decompressor pglz_decompress_hacked result 6.627355
>NOTICE:  Decompressor pglz_decompress_hacked_unrolled result 7.497114
>NOTICE:  Decompressor pglz_decompress_hacked8 result 7.412944
>NOTICE:  Decompressor pglz_decompress_hacked16 result 7.792978
>NOTICE:  Decompressor pglz_decompress_vanilla result 10.652603
>
>Compressor score (summ of all times):
>NOTICE:  Compressor pglz_compress_vanilla result 116.970005
>NOTICE:  Compressor pglz_compress_hacked result 89.706105
>
>
>But ...  below are results for lz4:
>
>Decompressor score (summ of all times):
>NOTICE:  Decompressor lz4_decompress result 3.660066
>Compressor score (summ of all times):
>NOTICE:  Compressor lz4_compress result 10.288594
>
>That is roughly a 2x advantage in decompression speed and a 10x
>advantage in compression speed.
>So maybe, instead of "hacking" the pglz algorithm, we should simply
>switch to lz4?
>
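
For readers skimming the quoted list: item 3 (a memcmp-like comparison
that returns the index of the first differing byte) can be sketched as
below. This is only an illustration, not the code from the patch;
common_prefix_len is a hypothetical name, and the byte-at-a-time loop
is an assumption about the simplest possible form.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): return the number of leading
 * bytes that a and b have in common, looking at no more than max_len bytes.
 * Unlike memcmp, the result says where the two segments diverge, which is
 * the match length an LZ-style compressor needs.
 */
static size_t
common_prefix_len(const unsigned char *a, const unsigned char *b,
                  size_t max_len)
{
    size_t      i = 0;

    while (i < max_len && a[i] == b[i])
        i++;

    return i;                   /* equals max_len if no difference found */
}
```

A compressor would call this on the current input position and a
candidate from the history, and use the result directly as the match
length.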

I think we should just bite the bullet and add an initdb option to pick
the compression algorithm. That's been discussed repeatedly, but we never
ended up actually doing that. See for example [1].

If there's anyone willing to put some effort into getting this feature
over the line, I'm willing to do reviews & commit. It's a seemingly
small change with rather insane potential impact.

But even if we end up doing that, it still makes sense to optimize the
hell out of pglz, because existing systems will still use that
(pg_upgrade can't switch from one compression algorithm to another).
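
To make the kind of pglz optimization mentioned in the quoted list a bit
more concrete, here is a rough sketch of item 2 (a hash table that links
entries by index instead of by pointer). It is not taken from the patch:
the type names, the table sizes, the hash function and the lack of entry
recycling are all assumptions made to keep the example short.

```c
#include <stdint.h>
#include <string.h>

#define HIST_SIZE 4096          /* assumed number of history entries */
#define HASH_SIZE 1024          /* assumed number of buckets (2^10) */
#define NO_ENTRY  0             /* index 0 is reserved to mean "none" */

/* One history entry: a 16-bit link replaces a full-width pointer. */
typedef struct
{
    uint16_t    next;           /* next entry in the same bucket */
    uint32_t    pos;            /* offset of the string in the input */
} HistEntry;

typedef struct
{
    uint16_t    bucket[HASH_SIZE];      /* head index per bucket */
    HistEntry   entry[HIST_SIZE + 1];   /* slot 0 is never used */
    uint16_t    used;                   /* number of entries in use */
} History;

static void
hist_init(History *h)
{
    memset(h, 0, sizeof(*h));   /* all links become NO_ENTRY */
}

/* Hash the 4 bytes at p into 0 .. HASH_SIZE-1 (multiplicative hash). */
static uint32_t
hash4(const unsigned char *p)
{
    uint32_t    v = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                    ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);

    return (v * 2654435761u) >> 22;     /* keep the top 10 bits */
}

/* Record input position pos; returns 0 when the table is full. */
static int
hist_add(History *h, const unsigned char *buf, uint32_t pos)
{
    uint32_t    b;
    uint16_t    idx;

    if (h->used >= HIST_SIZE)
        return 0;               /* sketch only: no recycling of entries */

    idx = ++h->used;
    b = hash4(buf + pos);
    h->entry[idx].pos = pos;
    h->entry[idx].next = h->bucket[b];  /* push onto the bucket chain */
    h->bucket[b] = idx;
    return 1;
}

/* First candidate for the 4 bytes at p, or NO_ENTRY. */
static uint16_t
hist_first(const History *h, const unsigned char *p)
{
    return h->bucket[hash4(p)];
}
```

Candidates are walked by following entry[idx].next until NO_ENTRY. The
point of the layout is only that 16-bit indexes take less space than
pointers, so more of the table fits in cache; whether that matches the
patch's actual structure would need checking against test_pglz.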

regards

[1] https://www.postgresql.org/message-id/flat/55341569.1090107%402ndquadrant.com

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


