Thread: Re: [GENERAL] lztext and compression ratios...

Subject: Re: [GENERAL] lztext and compression ratios...
From: JanWieck@t-online.de (Jan Wieck)

Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> >     One thing to keep in mind is that the LZ algorithm you're
> >     thinking of must be distributable under the terms of the BSD
> >     license.  If it's copyrighted or patented by a third party
> >     that does not agree to those terms, it's out of the question
> >     and will never appear in the Postgres source tree.  The LZ
> >     algorithm used in GIF in particular is one of those
> >     show-stoppers.
>
> As long as you brought it up: how sure are you that the method you've
> used is not subject to any patents?  The mere fact that you learned
> it from someone who didn't patent it does not guarantee anything ---
> someone else could have invented it independently and filed for a
> patent.

    Now that you ask: I'm not sure. Could be.

> If you can show that this method uses no ideas not found in zlib,
> then I'll feel reassured --- a good deal of legal research went into
> zlib to make sure it didn't fall foul of any patents, and zlib has
> now been around long enough that it'd be tough for anyone to get a
> new patent on one of its techniques.  But if SLZ has any additional
> ideas in it, then we could be in trouble.  There are an awful lot of
> compression patents.

    I don't know enough about the algorithms used in zlib to make
    that comparison myself.  Is there someone out there who could
    verify it if I describe in enough detail what our compression
    code does?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Subject: Re: [GENERAL] lztext and compression ratios...
From: Tom Lane

JanWieck@t-online.de (Jan Wieck) writes:
>> As long as you brought it up: how sure are you that the method you've
>> used is not subject to any patents?

>     Now that you ask: I'm not sure. Could be.

>> If you can show that this method uses no ideas not found in zlib,
>> then I'll feel reassured

>     I don't know enough about the algorithms used in zlib to make
>     that comparison myself.  Is there someone out there who could
>     verify it if I describe in enough detail what our compression
>     code does?

After a quick look at the code, I don't think there is anything
problematic about the data representation or the decompression
algorithm.  The compression algorithm is another story, and it's
not real well commented :-(.  The important issues are how you
search for matches in the past text and how you decide which match
is the best one to use.  Please update the code comments to describe
that, and I'll take another look.
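
For comparison, what zlib does here is a hash-chain search: hash the
next few bytes, keep for each hash bucket a chain of earlier input
positions, and walk that chain looking for the longest match, with a
cap on the effort spent per position.  A minimal sketch of that
technique in C (names and parameters are made up for illustration,
not taken from zlib or from the code in question):

    #define HASH_BITS  12
    #define HASH_SIZE  (1 << HASH_BITS)
    #define MIN_MATCH  3                  /* shortest match worth coding */
    #define MAX_MATCH  258
    #define MAX_CHAIN  64                 /* cap the effort per position */

    static int  head[HASH_SIZE];          /* newest position per bucket;
                                           * initialize all slots to -1 */
    static int *prev_pos;                 /* older position with the same
                                           * hash; sized to the input */

    /* hash the next MIN_MATCH bytes into a bucket index */
    static unsigned
    hash3(const unsigned char *p)
    {
        return ((p[0] << 8) ^ (p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
    }

    /* remember position pos so later searches can find it */
    static void
    remember(const unsigned char *in, int pos)
    {
        unsigned    h = hash3(in + pos);

        prev_pos[pos] = head[h];
        head[h] = pos;
    }

    /*
     * Find the longest match for in[pos..] among earlier positions.
     * "Best" here simply means "longest seen before the chain limit
     * runs out".  Caller must ensure pos + MIN_MATCH <= len.
     */
    static int
    find_match(const unsigned char *in, int pos, int len, int *mpos)
    {
        int         best = 0;
        int         chain = MAX_CHAIN;
        int         cand = head[hash3(in + pos)];

        while (cand >= 0 && chain-- > 0)
        {
            int         n = 0;

            while (n < MAX_MATCH && pos + n < len &&
                   in[cand + n] == in[pos + n])
                n++;
            if (n > best)
            {
                best = n;
                *mpos = cand;
            }
            cand = prev_pos[cand];        /* walk toward older matches */
        }
        return (best >= MIN_MATCH) ? best : 0;
    }

If your method can be described in those terms, that would go a long
way toward the "no ideas not found in zlib" argument.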

            regards, tom lane

Subject: Re: [GENERAL] lztext and compression ratios...
From: JanWieck@t-online.de (Jan Wieck)

Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> >> As long as you brought it up: how sure are you that the method you've
> >> used is not subject to any patents?
>
> >     Now that you ask: I'm not sure. Could be.
>
> >> If you can show that this method uses no ideas not found in zlib,
> >> then I'll feel reassured
>
> >     I don't know enough about the algorithms used in zlib to make
> >     that comparison myself.  Is there someone out there who could
> >     verify it if I describe in enough detail what our compression
> >     code does?
>
> After a quick look at the code, I don't think there is anything
> problematic about the data representation or the decompression
> algorithm.  The compression algorithm is another story, and it's
> not real well commented :-(.  The important issues are how you
> search for matches in the past text and how you decide which match
> is the best one to use.  Please update the code comments to describe
> that, and I'll take another look.

    Done. You'll find a new section in the top comments.

    While writing it I noticed that the algorithm is really
    expensive for big items.  The history lookup table allocated is
    eight times the size of the input (on 32-bit architectures).
    So to compress a 1MB item, it'll allocate 8MB for the history.
    It hit me while I was hunting a bug in the toaster earlier
    today: updating a toasted item of 5MB to a new value of 10MB
    blew the backend up to 290MB of virtual memory - oh boy.  I
    definitely need to make that smarter.
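
    Roughly, the arithmetic behind that blow-up (the entry layout
    below is an assumption for illustration, not the actual
    structure in the source):

        #include <stdio.h>

        int
        main(void)
        {
            /*
             * Assume one history entry per input byte, each holding
             * two 32-bit pointers (say, a chain link and an input
             * position).  That layout is assumed here only to
             * reproduce the 8-bytes-per-input-byte figure.
             */
            size_t      entry_size = 2 * 4;             /* 8 bytes, 32 bit */
            size_t      input_size = 10 * 1024 * 1024;  /* the 10MB value */

            printf("history table: %zuMB\n",
                   (input_size * entry_size) >> 20);    /* prints 80MB */
            return 0;
        }

    That's 80MB for the history alone, before counting the copies
    of the old and new values the backend keeps around while doing
    the update.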

    When I wrote it, I never thought about items that big.  It was
    before we had the idea of TOAST.

    This all might open another discussion, which I'll start in a
    separate thread.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #