Re: Optimizing pglz compressor - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Optimizing pglz compressor
Date
Msg-id CAMkU=1ypB4oBxEhVMjs_dSSgHAfdjgMiq0LEQH8TRx1QVEOQfw@mail.gmail.com
In response to Re: Optimizing pglz compressor  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Optimizing pglz compressor  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Wed, Mar 6, 2013 at 8:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
> On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland <joe@mcknight.de> wrote:
> > On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
> > <hlinnakangas@vmware.com> wrote:
> >> With these tweaks, I was able to make pglz-based delta encoding perform
> >> roughly as well as Amit's patch.
> >
> > Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ?
>
> This has been a subject of much recent discussion. It compares very
> poorly, but installing a new compressor tends to be problematic due to
> patent concerns (something which I disagree with but it's there).  All
> that said, Heikki's proposed changes seem to be low risk and quite
> fast.

Imo the licensing part is by far the smaller one. The interesting part
is making a compatible change to the way toast compression works that
supports multiple compression schemes. Afaics nobody has done that work.
After that the choice of to-be-integrated compression schemes needs to
be discussed, sure.


Another thing to consider would be some way of recording an exemplar value for each column, which would be used to seed whatever compression algorithm is used.  I think there is often a lot of redundancy that does not appear within any given value, but does appear when viewing all the values of a given column.  Finding some way to take advantage of that could give a big improvement in compression ratio.
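
To illustrate the idea, here is a minimal sketch using zlib's preset-dictionary support from Python, not pglz (pglz has no such facility as far as this thread indicates). The function names and the sample JSON values are hypothetical, made up for illustration; the exemplar stands in for whatever per-column value would be recorded.

```python
import zlib

def compress_seeded(value: bytes, exemplar: bytes) -> bytes:
    """Compress value with the compressor's history pre-seeded from exemplar."""
    # zdict primes the deflate window, so byte runs shared with the
    # exemplar can be emitted as back-references instead of literals.
    comp = zlib.compressobj(level=9, zdict=exemplar)
    return comp.compress(value) + comp.flush()

def decompress_seeded(blob: bytes, exemplar: bytes) -> bytes:
    """Decompress a blob produced by compress_seeded with the same exemplar."""
    # The decompressor must be given exactly the same dictionary.
    decomp = zlib.decompressobj(zdict=exemplar)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical per-column exemplar and a new value from the same column.
EXEMPLAR = b'{"user_id": 12345, "status": "active", "country": "US", "plan": "premium"}'
VALUE = b'{"user_id": 67890, "status": "inactive", "country": "DE", "plan": "premium"}'

if __name__ == "__main__":
    plain = zlib.compress(VALUE, 9)
    seeded = compress_seeded(VALUE, EXEMPLAR)
    print(len(VALUE), len(plain), len(seeded))
```

The catch, which any real design would have to handle, is that the exemplar becomes part of the on-disk format: it can never change or be lost while any value compressed against it still exists.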

Cheers,

Jeff
