Re: Optimizing pglz compressor - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: Optimizing pglz compressor
Date
Msg-id CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4YebMOA-wA@mail.gmail.com
In response to Re: Optimizing pglz compressor  (Joachim Wieland <joe@mcknight.de>)
List pgsql-hackers
On Wed, Mar 6, 2013 at 6:32 AM, Joachim Wieland <joe@mcknight.de> wrote:
> On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> With these tweaks, I was able to make pglz-based delta encoding perform
>> roughly as well as Amit's patch.
>
> Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ?

This one is for the archives, as I found it surprising: the relative
performance of these algorithms can differ enormously depending on
architecture.  Here's a table reproduced from:
http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

"""
testdata/alice29.txt                     :
ZLIB:    [b 1M] bytes 152089 ->  54404 35.8%  comp   0.8 MB/s  uncomp   8.1 MB/s
LZO:     [b 1M] bytes 152089 ->  82721 54.4%  comp  14.5 MB/s  uncomp  43.0 MB/s
CSNAPPY: [b 1M] bytes 152089 ->  90965 59.8%  comp   2.1 MB/s  uncomp   4.4 MB/s
SNAPPY:  [b 4M] bytes 152089 ->  90965 59.8%  comp   1.8 MB/s  uncomp   2.8 MB/s
testdata/asyoulik.txt                    :
ZLIB:    [b 1M] bytes 125179 ->  48897 39.1%  comp   0.8 MB/s  uncomp   7.7 MB/s
LZO:     [b 1M] bytes 125179 ->  73224 58.5%  comp  15.3 MB/s  uncomp  42.4 MB/s
CSNAPPY: [b 1M] bytes 125179 ->  80207 64.1%  comp   2.0 MB/s  uncomp   4.2 MB/s
SNAPPY:  [b 4M] bytes 125179 ->  80207 64.1%  comp   1.7 MB/s  uncomp   2.7 MB/s

LZO was ~8x faster compressing and ~16x faster decompressing. Only on
incompressible data was Snappy faster:

testdata/house.jpg                       :
ZLIB:    [b 1M] bytes 126958 -> 126513 99.6%  comp   1.2 MB/s  uncomp   9.6 MB/s
LZO:     [b 1M] bytes 126958 -> 127173 100.2%  comp   4.2 MB/s  uncomp  74.9 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803 99.9%  comp  24.6 MB/s  uncomp 381.2 MB/s
SNAPPY:  [b 4M] bytes 126958 -> 126803 99.9%  comp  22.8 MB/s  uncomp 354.4 MB/s
"""
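
For anyone wanting to check the "~8x" and "~16x" figures in the quoted text, here is a small sketch in C that recomputes the ratios from the alice29.txt rows. The struct and function names are made up for illustration; the throughput numbers are copied from the table, and the comparison baseline is assumed to be the SNAPPY row, since that is the one that gives roughly those ratios.

```c
#include <stdio.h>

/* Throughput in MB/s, copied from the quoted alice29.txt rows. */
struct rate {
    const char *name;
    double comp;    /* compression speed */
    double uncomp;  /* decompression speed */
};

static const struct rate alice29[] = {
    {"ZLIB",     0.8,  8.1},
    {"LZO",     14.5, 43.0},
    {"CSNAPPY",  2.1,  4.4},
    {"SNAPPY",   1.8,  2.8},
};

/* How many times faster `fast` is than `slow`. */
static double speedup(double fast, double slow)
{
    return fast / slow;
}

/* Print LZO's speedup over every other row (hypothetical helper). */
static void print_lzo_speedups(void)
{
    const struct rate *lzo = &alice29[1];
    size_t n = sizeof alice29 / sizeof alice29[0];

    for (size_t i = 0; i < n; i++) {
        if (&alice29[i] == lzo)
            continue;
        printf("LZO vs %-8s comp %.1fx  uncomp %.1fx\n",
               alice29[i].name,
               speedup(lzo->comp, alice29[i].comp),
               speedup(lzo->uncomp, alice29[i].uncomp));
    }
}
```

Calling print_lzo_speedups() from a main() prints one line per competitor. Against the SNAPPY row the ratios come out a little over 8x and a little over 15x; against CSNAPPY they are smaller.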

So that's one more gotcha to worry about, since I surmise most numbers
are being taken on x86.  Apparently this has something to do with
alignment of accesses.  Some of it may be fixable by tweaking the
implementation rather than the compression encoding, although I am no
expert in the matter.
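
To make the alignment point concrete, here is a minimal sketch in C of the kind of implementation-level tweak I have in mind. It is not taken from pglz, Snappy, or LZO; load_u32 and match_length are hypothetical names. The idea is that a compressor comparing four bytes at a time at arbitrary offsets can go through memcpy instead of casting the pointer to uint32_t *, which leaves it to the compiler to pick a load sequence that is valid on the target, whether or not the hardware tolerates unaligned accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read 4 bytes from any address without assuming
 * alignment.  memcpy is well defined for unaligned sources; the compiler
 * is free to turn it into a single load where the target allows that.
 */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/*
 * Count how many leading bytes of a and b are equal, looking at no more
 * than max_len bytes.  Compares 4 bytes at a time while it can, then
 * finishes byte by byte.
 */
static size_t match_length(const unsigned char *a, const unsigned char *b,
                           size_t max_len)
{
    size_t n = 0;

    while (n + 4 <= max_len && load_u32(a + n) == load_u32(b + n))
        n += 4;
    while (n < max_len && a[n] == b[n])
        n++;
    return n;
}
```

Whether this recovers the lost speed on a given architecture is something that would have to be measured; the sketch only shows that the access pattern can be changed without touching the compressed format.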

-- 
fdr


