Re: libpq compression - Mailing list pgsql-hackers
From | Florian Pflug
---|---
Subject | Re: libpq compression
Date |
Msg-id | 73447F47-E9A3-420B-8903-9F6A4513E229@phlo.org
In response to | Re: libpq compression (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: libpq compression
List | pgsql-hackers
On Jun 25, 2012, at 04:04, Robert Haas wrote:
> If, for example, someone can demonstrate that an awesomebsdlz compresses
> 10x as fast as OpenSSL... that'd be pretty compelling.

That, actually, is demonstrably the case for at least Google's snappy (and for LZO, but that's not an option since its license is GPL). The snappy documentation states that

  In our tests, Snappy usually is faster than algorithms in the same class
  (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
  compression ratios.

The only widely supported compression method for SSL seems to be DEFLATE, which is also what gzip/zlib uses. I benchmarked LZO against gzip/zlib a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by an order of magnitude. The compression ratio achieved by DEFLATE/gzip/zlib is much better, though. The snappy documentation states

  Typical compression ratios (based on the benchmark suite) are about
  1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
  JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
  in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.

Here are a few numbers for LZO vs. gzip. Snappy should be comparable to LZO - I tested LZO because I still had the command-line compressor lzop lying around on my machine, whereas I'd have needed to download and compile snappy first.

$ dd if=/dev/random of=data bs=1m count=128

$ time gzip -1 < data > data.gz
real    0m6.189s
user    0m5.947s
sys     0m0.224s

$ time lzop < data > data.lzo
real    0m2.697s
user    0m0.295s
sys     0m0.224s

$ ls -lh data*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:43 data
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.gz
-rw-r--r--  1 fgp  staff   128M Jun 25 14:44 data.lzo

$ dd if=/dev/zero of=zeros bs=1m count=128

$ time gzip -1 < zeros > zeros.gz
real    0m1.083s
user    0m1.019s
sys     0m0.052s

$ time lzop < zeros > zeros.lzo
real    0m0.186s
user    0m0.123s
sys     0m0.053s

$ ls -lh zeros*
-rw-r--r--  1 fgp  staff   128M Jun 25 14:47 zeros
-rw-r--r--  1 fgp  staff   572K Jun 25 14:47 zeros.gz
-rw-r--r--  1 fgp  staff   598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 GHz Core 2 Duo MacBook Pro, LZO compresses at about 350 MB/s if the data is purely random, and at about 800 MB/s if the data compresses extremely well. (Numbers are based on user time, since that reflects the CPU time used and ignores the IO overhead, which is substantial.)

IMHO, the only compelling argument (and a very compelling one) for using SSL compression was that it requires very little code on our side. We've since discovered that it's not actually that simple, at least if we want to support compression without authentication or encryption, and don't want to restrict ourselves to using OpenSSL forever. So unless we give up at least one of those requirements, the arguments for using SSL compression are rather thin, I think.

best regards,
Florian Pflug
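
For reference, here is a minimal sketch of what the snappy route looks like at the C level, using the snappy-c.h bindings that ship with libsnappy. The payload string, buffer handling, and build command are made up for illustration and are not taken from any actual libpq patch.

/*
 * Minimal sketch: round-tripping a buffer through snappy's C API
 * (snappy-c.h). Payload and buffer names are illustrative only.
 *
 * Build with something like: cc sketch.c -lsnappy
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <snappy-c.h>

int main(void)
{
    const char *payload = "SELECT * FROM some_table WHERE id = 42";
    size_t      payload_len = strlen(payload);

    /* snappy reports the worst-case compressed size up front. */
    size_t  compressed_len = snappy_max_compressed_length(payload_len);
    char   *compressed = malloc(compressed_len);

    if (compressed == NULL)
        return 1;

    if (snappy_compress(payload, payload_len,
                        compressed, &compressed_len) != SNAPPY_OK)
    {
        free(compressed);
        return 1;
    }

    printf("compressed %zu bytes to %zu bytes\n",
           payload_len, compressed_len);

    /* Decompression: recover the original size, then uncompress. */
    size_t uncompressed_len;
    if (snappy_uncompressed_length(compressed, compressed_len,
                                   &uncompressed_len) == SNAPPY_OK)
    {
        char *uncompressed = malloc(uncompressed_len);

        if (uncompressed != NULL &&
            snappy_uncompress(compressed, compressed_len,
                              uncompressed, &uncompressed_len) == SNAPPY_OK)
            printf("round trip ok, %zu bytes\n", uncompressed_len);

        free(uncompressed);
    }

    free(compressed);
    return 0;
}

Note that, unlike zlib, snappy's plain C API is block-based rather than streaming, so a protocol-level integration would presumably compress message-sized chunks rather than a continuous stream.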