Re: Faster compression, again - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Faster compression, again
Date
Msg-id CAHyXU0xaouYYRLNSTkpAXPReO3imT3YDWd7L39TgN6dKfF8t8w@mail.gmail.com
In response to Faster compression, again  (Daniel Farina <daniel@heroku.com>)
Responses Re: Faster compression, again  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On Wed, Mar 14, 2012 at 1:06 PM, Daniel Farina <daniel@heroku.com> wrote:
> For 9.3 at a minimum.
>
> The topic of LZO became mired in doubts about:
>
> * Potential Patents
> * The author's intention for the implementation to be GPL
>
> Since then, Google released "Snappy," also an LZ77-class
> implementation, and it has been ported to C (recently, and with some
> quirks, like no LICENSE file...yet, although it is linked from the
> original Snappy project).  The original Snappy (C++) has a BSD license
> and a patent grant (which shields you from Google, at least).  Do we
> want to investigate a very-fast compression algorithm inclusion again
> in the 9.3 cycle?
>
> I've been using the similar implementation "LZO" for WAL archiving and
> it is a significant savings (not as much as pg_lesslog, but also less
> invasive).  It is also fast enough that even if one were not to uproot
> TOAST's compression that it would probably be very close to a complete
> win for protocol traffic, whereas SSL's standardized zlib can
> definitely be a drag in some cases.
>
> This idea resurfaces often, but the reason why I wrote in about it is
> because I have a table which I categorized as "small" but was, in
> fact, 1.5MB, which made transferring it somewhat slow over a remote
> link.  zlib compression takes it down to about 550K and lzo (similar,
> but not identical) 880K.  If we're curious how it affects replication
> traffic, I could probably gather statistics on LZO-compressed WAL
> traffic, of which we have a pretty huge amount captured.

there are plenty of non-gpl lz-based libraries out there (for example:
http://www.fastlz.org/) and always have been.  they are all much
faster than zlib.  the main issue is patents...you have to be careful
even though all the lz77/78 patents seem to have expired or apply to
specifics not relevant to general use.

see here for the last round of talks on this:
http://archives.postgresql.org/pgsql-performance/2009-08/msg00052.php

lzo is nearing its 20th birthday, so even if you are paranoid about
patents (admittedly, there is good reason to be), the window is
closing fast for patent issues that aren't (a) expired or (b) covered
by prior art, whether on lzo or on the various copycat implementations,
at least in the US.

snappy looks amazing.

merlin