Re: [SQL] Re: lztext and compression ratios... - Mailing list pgsql-general

From JanWieck@t-online.de (Jan Wieck)
Subject Re: [SQL] Re: lztext and compression ratios...
Date
Msg-id 200007062109.XAA19800@hot.jw.home
Whole thread Raw
In response to Re: lztext and compression ratios...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> >> As long as you brought it up: how sure are you that the method you've
> >> used is not subject to any patents?
>
> >     Now that you ask for it: I'm not sure. Could be.
>
> >> If you can show that this method uses no ideas not found in zlib,
> >> then I'll feel reassured
>
> >     To  do  so  I  don't know enough about the algorithms used in
> >     zlib.  Is there someone out here who could verify that  if  I
> >     detailed enough describe what our compression code does?
>
> After a quick look at the code, I don't think there is anything
> problematic about the data representation or the decompression
> algorithm.  The compression algorithm is another story, and it's
> not real well commented :-(.  The important issues are how you
> search for matches in the past text and how you decide which match
> is the best one to use.  Please update the code comments to describe
> that, and I'll take another look.

    Done. You'll find a new section in the top comments.

    While  writing  it  I  noticed  that  the algorithm is really
    expensive for big items. The history lookup  table  allocated
    is  8  times (on 32 bit architectures) the size of the input.
    So if you want to have 1MB compressed, it'll allocate 8MB for
    the  history.  It  hit  me  when  I  was hunting a bug in the
    toaster earlier today. Doing an update to a toasted  item  of
    5MB, resulting in a new value of 10MB, the backend blew up to
    290MB of virtual memory - oh boy. I definitely need  to  make
    that smarter.

    When  I wrote it I never thought about items that big. It was
    before we had the idea of TOAST.

    This all might  open  another  discussion  I'll  start  in  a
    separate thread.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



pgsql-general by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] PostgreSQL & the BSD License
Next
From: Bruce Momjian
Date:
Subject: Re: PostgreSQL 7.1