LZ compressing data type - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject LZ compressing data type
Msg-id m11oDim-0003kGC@orion.SAPserv.Hamburg.dsh.de
Responses Re: [HACKERS] LZ compressing data type  (Michael Simms <grim@argh.demon.co.uk>)
List pgsql-hackers
Hi,

    I just committed some changes that require an initdb.

    New are the simple LZ compressor we discussed, placed into
    /utils/adt/pg_lzcompress.c, and a new lztext data type based
    on it. You'll find a fairly detailed description of the
    compression algorithm in the comments at the top of
    pg_lzcompress.c.

    Not very surprisingly to me, it turns out that the compressor
    does a very good job on rule action strings. I used the 48
    rules that can be found in pg_rewrite after the regression
    test. The original string sizes range from 820 to 4615 bytes,
    and the compression rates range from 35% to 76% with an
    average of 60%. The 4615 byte rule action has been compressed
    to an octet_length of 1126.
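
    As a quick check of these figures, here is a minimal sketch
    of the arithmetic. It assumes "compression rate" means the
    share of the original size that was saved; the function name
    is made up for illustration and is not part of the commit.

```python
def compression_rate(original_size: int, compressed_size: int) -> float:
    """Percentage of the original size saved by compression (assumed definition)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (1.0 - compressed_size / original_size)


# The largest rule action from the post: 4615 bytes compressed to 1126.
largest_rule_rate = compression_rate(4615, 1126)
print(f"{largest_rule_rate:.1f}%")  # prints "75.6%"
```

    Rounded to a whole percent this gives 76%, which matches the
    upper end of the range quoted above.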

    For the lztext type, conversion functions to/from text and
    the length() and octet_length() functions are available.
    length() returns the same value as length() on the
    equivalent text would, while octet_length() returns the
    compressed size without VARHDRSZ.
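
    The difference between the two functions can be sketched
    roughly as follows. This is an illustration only, not the C
    implementation: the class and its attributes are
    hypothetical, zlib stands in for the LZ compressor, and
    VARHDRSZ is assumed to be 4 bytes (inferred from the 204
    versus 200 figure for char(200) in this post).

```python
import zlib

VARHDRSZ = 4  # assumed header size, inferred from 204 = 200 + 4 in the post


class LzTextSketch:
    """Hypothetical stand-in for an lztext value, not the real C struct."""

    def __init__(self, text: str) -> None:
        raw = text.encode("ascii")           # single-byte text only
        self.rawsize = len(raw)              # remembered uncompressed length
        self.payload = zlib.compress(raw)    # zlib stands in for the LZ compressor
        self.varsize = VARHDRSZ + len(self.payload)  # header plus compressed data

    def length(self) -> int:
        # Same answer that length() on the equivalent text would give.
        return self.rawsize

    def octet_length(self) -> int:
        # Compressed size, not counting the VARHDRSZ header bytes.
        return self.varsize - VARHDRSZ

    def to_text(self) -> str:
        # Conversion back to plain text restores the original string.
        return zlib.decompress(self.payload).decode("ascii")


value = LzTextSketch("abc " * 100)
print(value.length(), value.octet_length())
```

    For repetitive input like this, length() reports 400 while
    octet_length() reports a much smaller number; the exact
    value depends on the compressor used.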

    The type does not support MULTIBYTE or CYR_ENCODE so far. It
    shouldn't be too hard to add that, and afterwards we might
    add another lzbpchar type too. The latter is really
    interesting, because an empty char(200) (thus containing 200
    spaces) could result in an octet_length of 12 instead of 204.
    That's a compression rate of 94.1%! It actually wouldn't,
    because the compressor's default is to start only if the
    input is at least 256 bytes, but there is a mechanism so an
    lzbpchar type could force this behaviour.
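
    The size threshold and the way a caller could override it
    might look something like the sketch below. The function and
    parameter names are hypothetical and zlib is used as a
    stand-in compressor, so the compressed sizes will not match
    the 12 bytes quoted above; only the 256 byte default and the
    12 versus 204 figures come from this post.

```python
import zlib
from typing import Tuple

DEFAULT_MIN_INPUT_SIZE = 256  # default threshold named in the post


def maybe_compress(data: bytes,
                   min_input_size: int = DEFAULT_MIN_INPUT_SIZE) -> Tuple[bool, bytes]:
    """Return (was_compressed, payload); small inputs are stored as-is."""
    if len(data) < min_input_size:
        return False, data               # below the threshold: do not even try
    packed = zlib.compress(data)         # zlib stands in for the LZ compressor
    if len(packed) >= len(data):
        return False, data               # compression did not help, keep the raw bytes
    return True, packed


blank_char_200 = b" " * 200              # an empty char(200): 200 spaces

default_result = maybe_compress(blank_char_200)                    # stays uncompressed
forced_result = maybe_compress(blank_char_200, min_input_size=0)   # caller forces an attempt

# The rate quoted in the post: 12 bytes instead of 204.
quoted_rate = 100.0 * (1.0 - 12 / 204)
print(f"{quoted_rate:.1f}%")  # prints "94.1%"
```

    With the default threshold the 200 spaces are left alone;
    passing a lower threshold is one simple way a type such as
    lzbpchar could force compression of short values.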


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
