Re: alternative compression algorithms? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: alternative compression algorithms?
Date
Msg-id 5541816A.1000303@2ndquadrant.com
In response to Re: alternative compression algorithms?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: alternative compression algorithms?
List pgsql-hackers

On 04/30/15 02:42, Robert Haas wrote:
> On Wed, Apr 29, 2015 at 6:55 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> I'm not convinced that not compressing the data is a good idea - I suspect it
>> would only move the time to TOAST and increase memory pressure (in general and
>> in shared buffers). But I think that using a more efficient compression
>> algorithm would help a lot.
>>
>> For example, when profiling the multivariate stats patch (with multiple
>> quite large histograms), pglz_decompress is #1 in the profile, occupying
>> more than 30% of the time. After replacing it with lz4, the data are a bit
>> larger, but it drops to ~0.25% in the profile and the planning time drops
>> proportionally.
>
> That seems to imply a >100x improvement in decompression speed.  Really???

Sorry, that was a somewhat misleading overstatement. The profiles (same
dataset, same workload) look like this:


pglz_decompress
---------------
  44.51%  postgres      [.] pglz_decompress
  13.60%  postgres      [.] update_match_bitmap_histogram
   8.40%  postgres      [.] float8_cmp_internal
   7.43%  postgres      [.] float8lt
   6.49%  postgres      [.] deserialize_mv_histogram
   6.23%  postgres      [.] FunctionCall2Coll
   4.06%  postgres      [.] DatumGetFloat8
   3.48%  libc-2.18.so  [.] __isnan
   1.26%  postgres      [.] clauselist_mv_selectivity
   1.09%  libc-2.18.so  [.] __memcpy_sse2_unaligned

lz4
---
  18.05%  postgres         [.] update_match_bitmap_histogram
  11.67%  postgres         [.] float8_cmp_internal
  10.53%  postgres         [.] float8lt
   8.67%  postgres         [.] FunctionCall2Coll
   8.52%  postgres         [.] deserialize_mv_histogram
   5.52%  postgres         [.] DatumGetFloat8
   4.90%  libc-2.18.so     [.] __isnan
   3.92%  liblz4.so.1.6.0  [.] 0x0000000000002603
   2.08%  liblz4.so.1.6.0  [.] 0x0000000000002847
   1.81%  postgres         [.] clauselist_mv_selectivity
   1.47%  libc-2.18.so     [.] __memcpy_sse2_unaligned
   1.33%  liblz4.so.1.6.0  [.] 0x000000000000260f
   1.16%  liblz4.so.1.6.0  [.] 0x00000000000025e3
   (and then a long tail of other lz4 calls)
 

The difference used to be more significant, but I've since made a lot of
improvements to the update_match_bitmap method, so the lz4 calls now
account for a larger share of the profile.

The whole script (which computes a lot of estimates) takes 1:50 with pglz
and only 1:25 with lz4. That's a ~25-30% improvement.
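
For anyone who wants to check the arithmetic, here is a rough Python sketch
using only the numbers quoted in this message. It rests on assumptions that
are not established here: that the profile percentages translate directly
into shares of the wall-clock time, and that the four listed liblz4 symbols
are all decompression. Because the long tail of lz4 calls is left out, the
lz4 figure is a lower bound and the resulting ratio an upper bound. The
helper name to_seconds is made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '1:50' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Wall-clock times of the whole script, as reported in the message.
pglz_total = to_seconds("1:50")
lz4_total = to_seconds("1:25")

# Two ways to express the same gain: fraction of time saved, and speedup.
time_saved = (pglz_total - lz4_total) / pglz_total
speedup = pglz_total / lz4_total

# Profile shares taken from the listings above.
pglz_share = 0.4451
# Only the four listed liblz4 entries; the long tail is NOT included,
# so this underestimates the real lz4 share.
lz4_share_listed = 0.0392 + 0.0208 + 0.0133 + 0.0116

# ASSUMPTION: profile share * wall-clock time ~ time spent decompressing.
pglz_decompress_s = pglz_share * pglz_total
lz4_decompress_s = lz4_share_listed * lz4_total

# Upper bound on the decompression speedup (lz4 time is a lower bound).
ratio_upper_bound = pglz_decompress_s / lz4_decompress_s

print(f"time saved:          {time_saved:.1%}")
print(f"overall speedup:     {speedup:.2f}x")
print(f"pglz decompression: ~{pglz_decompress_s:.0f} s")
print(f"lz4 decompression:  >{lz4_decompress_s:.1f} s (listed symbols only)")
print(f"decompression ratio: <{ratio_upper_bound:.1f}x")
```

Under those assumptions the decompression speedup comes out as a single-digit
factor, nowhere near the >100x suggested by the earlier 30% vs. 0.25%
comparison.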

The results are slightly unreliable because they were collected in a Xen VM,
where the overhead is non-negligible (but the same in both cases). I wouldn't
be surprised if the difference were more significant without the VM.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


