Re: libpq compression - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: libpq compression
Msg-id 73447F47-E9A3-420B-8903-9F6A4513E229@phlo.org
In response to Re: libpq compression  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Jun 25, 2012, at 04:04, Robert Haas wrote:
> If, for
> example, someone can demonstrate that an awesomebsdlz compresses 10x
> as fast as OpenSSL...  that'd be pretty compelling.

That, actually, is demonstrably the case for at least Google's Snappy (and
LZO, but that's not an option since its license is GPL). They state in
their documentation that
  In our tests, Snappy usually is faster than algorithms in the same class
  (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
  compression ratios.

The only widely supported compression method for SSL seems to be DEFLATE,
which is also what gzip/zlib uses. I benchmarked LZO against gzip/zlib
a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
an order of magnitude.

The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
The snappy documentation states
  Typical compression ratios (based on the benchmark suite) are about
  1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
  JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
  in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.

Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
LZO - I tested LZO because I still had the command-line compressor lzop
lying around on my machine, whereas I'd have needed to download and compile
snappy first.

$ dd if=/dev/random of=data bs=1m count=128
$ time gzip -1 < data > data.gz
real    0m6.189s
user    0m5.947s
sys    0m0.224s
$ time lzop < data > data.lzo
real    0m2.697s
user    0m0.295s
sys    0m0.224s
$ ls -lh data*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo

$ dd if=/dev/zero of=zeros bs=1m count=128
$ time gzip -1 < zeros > zeros.gz
real    0m1.083s
user    0m1.019s
sys    0m0.052s
$ time lzop < zeros > zeros.lzo
real    0m0.186s
user    0m0.123s
sys    0m0.053s
$ ls -lh zeros*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
-rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
-rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 GHz Core2 Duo MacBook Pro, LZO compresses at about
350 MB/s if the data is purely random, and at about 800 MB/s if the data
compresses extremely well. (Numbers are based on user time, since that
reflects the CPU time used and ignores the I/O overhead, which is
substantial.)

IMHO, the only compelling argument (and a very compelling one) for using
SSL compression was that it would require very little code on our side.
We've since discovered that it's not actually that simple, at least if we
want to support compression without authentication or encryption and don't
want to restrict ourselves to OpenSSL forever. So unless we give up at
least one of those requirements, the arguments for using SSL compression
are rather thin, I think.

best regards,
Florian Pflug


