Thread: LZ compressing data type

LZ compressing data type

From
wieck@debis.com (Jan Wieck)
Date:
Hi,

    I just committed some changes that require an initdb.

    New  are  the  discussed,  simple  LZ compressor, placed into
    /utils/adt/pg_compress.c, and a new lztext data type based on
    it.   You'll  find  a  fairly  detailed  description  of  the
    compression  algorithm  in  the  comments  at  the   top   of
    pg_lzcompress.c.

    Not very surprisingly to me it turns out, that the compressor
    does a very good job on rule action strings. I  used  the  48
    rules  that  can  be found in pg_rewrite after the regression
    test. The original string sizes range from 820  to  4615  and
    the compression rates from 35-76% with an average of 60%. The
    4615  size  rule  action  has  been   coded   into   a   1126
    octet_length.

    For  the  lztext type, there are conversion functions to/from
    text and the length() and octet_length() functions available.
    Length()  returns  the  same  as  length on text would. While
    octet_length returns the compressed size without VARHDRSZ.

    The type does not support MULTIBYTE or CYR_ENCODE up to  now.
    It  shouldn't  be too hard to add it and after that, we might
    add  another  lzbpchar  type  too.  The  latter   is   really
    interesting,  because an empty char(200) (thus containing 200
    spaces) could result in an octet_length of 12 instead of  204
    -  that's  a compression rate of 94.1%! It actually wouldn't,
    because the compressors default is to start only if the input
    is at least 256 bytes, but there is a mechanism so a lzbpchar
    type could force this behaviour.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] LZ compressing data type

From
Michael Simms
Date:
> 
> Hi,
> 
>     I just committed some changes that require an initdb.
> 
>     New  are  the  discussed,  simple  LZ compressor, placed into
>     /utils/adt/pg_compress.c, and a new lztext data type based on
>     it.   You'll  find  a  fairly  detailed  description  of  the
>     compression  algorithm  in  the  comments  at  the   top   of
>     pg_lzcompress.c.

One question.

You say this is an LZ algorythm. Is this the same as LZW, as in the
same kind that gifs use. If so, are we sure that this algorythm is not
covered by the Unisys patent? Their patent is on the algorythm for
compression not on gifs themselves. 

If it is, you are liable for a $5,000 fee to use it throughout your
site, or a per-licence fee if you are distributing (thay are worked out
on a per case basis but typical licences are $5 per unit sold from what
I am told).

I came across this problem with a gif manipulation program that *I
WROTE FROM SCRATCH* and had to switch to using the libungif
compression routines.

Just thought Id mention this, in case it has been overlooked.
                    ~Michael


Re: [HACKERS] LZ compressing data type

From
wieck@debis.com (Jan Wieck)
Date:
>
> >
> > Hi,
> >
> >     I just committed some changes that require an initdb.
> >
> >     New  are  the  discussed,  simple  LZ compressor, placed into
> >     /utils/adt/pg_compress.c, and a new lztext data type based on
> >     it.   You'll  find  a  fairly  detailed  description  of  the
> >     compression  algorithm  in  the  comments  at  the   top   of
> >     pg_lzcompress.c.
>
> One question.
>
> You say this is an LZ algorythm. Is this the same as LZW, as in the
> same kind that gifs use. If so, are we sure that this algorythm is not
> covered by the Unisys patent? Their patent is on the algorythm for
> compression not on gifs themselves.
>
> If it is, you are liable for a $5,000 fee to use it throughout your
> site, or a per-licence fee if you are distributing (thay are worked out
> on a per case basis but typical licences are $5 per unit sold from what
> I am told).

    It't  an  SLZ  algorithm, not LZW. There are FLZ and LZ77 out
    too (and I don't know how many other subtypes). At least,  LZ
    is  a  family  of  compression algorithms where LZW is just a
    member of them.

    I've written the entire code  from  scatch,  inspired  by  an
    article  from  Adisak  Pochanayon dated back in 1993. If they
    have a license on this algorithm, they have one on code  that
    can   be   coded  from  scatch  in  20  hours  after  reading
    information that's  claimed  to  be  free  on  the  internet,
    congrats.

    This  is  M$ practice, I never thought that there's more than
    one company out doing this kind of business.

    But thanks for the info.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] LZ compressing data type

From
wieck@debis.com (Jan Wieck)
Date:
> > If it is, you are liable for a $5,000 fee to use it throughout your
> > site, or a per-licence fee if you are distributing (thay are worked out
> > on a per case basis but typical licences are $5 per unit sold from what
> > I am told).
>
>     It't  an  SLZ  algorithm, not LZW. There are FLZ and LZ77 out
>     too (and I don't know how many other subtypes). At least,  LZ
>     is  a  family  of  compression algorithms where LZW is just a
>     member of them.
>
> [...]
>
>     But thanks for the info.

    From the very beginning of US patent No. 4,558,302 (the thing
    behind Unisys's LZW):

        The compressor searches the input stream to determine the
        longest  match  to  a  stored  string. Each stored string
        comprises a prefix  string  and  an  extension  character
        where  the  extension  character is the last character in
        the string and the prefix string comprises  all  but  the
        extension  character.  Each  string  has  a  code  signal
        associated therewith and a string is stored in the string
        table  by,  at  least implicitly, storing the code signal
        for the string, the code signal for the string prefix and
        the extension character.

    The  format  my  code stores is different to this definition.
    It only stores offset/length pairs or literal bytes, signaled
    by  a  bit  in  a  control unit. So there is no prefix string
    and/or extension character at all.

    If someone might state that storing TAG's and  LITERAL  chars
    is  still equivalent to PREFIX and EXTension character, I say
    that my  code  possibly  output's  sequences  of  immediately
    following  TAG's, that aren't neccessarily PREFIX'es. One TAG
    could code a sequence of characters, not  previously  occured
    in  any other TAG, without any intermediate LITERAL character
    occured.

    And my  code  forgets  AUTOMATICALLY  about  too  far  behind
    occurences of matching strings for speed AND compression rate
    improvement.

    Thus I think my implementation is not related to this patent.

    When  reading  patent  #4,558,302  I  really  remembered  the
    question, if anyone could ever have  a  patent  on  rectangle
    front  lights for cars.  The definition is so general, that I
    cannot  think  that  gzip,  bzip  or   any   other   existing
    compression  tool doesn't use it's METHOD.  If such a GENERAL
    patent is possible at all, car  manufacturers  all  over  the
    world  would  have  to  pay millions of dollars license fee's
    just to sell cars with rectangular lights.

    I don't see any relationship between my code and what  Unisys
    has a patent for. If they see, I might be blind - or they see
    a horizon from inside a black hole. If  they  need  to,  I'll
    enlighten them.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] LZ compressing data type

From
Philip Warner
Date:
At 03:23 18/11/99 +0100, Jan Wieck wrote:
>
>    From the very beginning of US patent No. 4,558,302 (the thing
>    behind Unisys's LZW):
>

The other thing to remember is that there are a very large number of
countries in which this patent does not apply because it was not even
applied for until after the alogorithm was made public. In the first
publication of the LZW algorithm, there was no patent notice, but I think
the US patent had been applied for.

For US-based distribution sites, however, you may find the threat of legal
action from a large company is enough to make it less desirable to
distribute. 

FWIW, I think Unisys patented the LZW algorithm only.

It's probably a bit late to ask, but how difficult would it be to
generalize your code to use the compressor/encryption library of choice?
zlib and pgp spring to mind. This would kill three birds with one stone.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] LZ compressing data type

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> FWIW, I think Unisys patented the LZW algorithm only.

More specifically, Unisys patented Terry Welch's variant of the LZ78
class of algorithms.  Jan's code falls in the substantially different
LZ77 class.  There could be problematic patents out there, but Unisys'
is certainly not one.

(If you don't know the difference between LZ77 and LZ78, see the
comp.compression FAQ ... but don't raise alarms about compression
patents if you don't know even that much about the field, eh?)
        regards, tom lane


Re: [HACKERS] LZ compressing data type

From
Philip Warner
Date:
At 23:28 17/11/99 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> FWIW, I think Unisys patented the LZW algorithm only.
>
>More specifically, Unisys patented Terry Welch's variant of the LZ78
>class of algorithms.  

Hence the 'W' in LZW, no?

>Jan's code falls in the substantially different
>LZ77 class.  There could be problematic patents out there, but Unisys'
>is certainly not one.
>

Is this a legal opinion, or a personal one?


>(If you don't know the difference between LZ77 and LZ78, see the
>comp.compression FAQ ... but don't raise alarms about compression
>patents if you don't know even that much about the field, eh?)

1. I did not raise the alarm, I just responded to it.

2. Since Unisys have moved to block code that does not even use LZW
compression, but happens to be compatible with most GIF decompressors, I
think your comment is misinformed, eh? (See the GD graphics library)

The fear is not whether one wins court cases, but whether you can afford to
fight them.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] LZ compressing data type

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 23:28 17/11/99 -0500, Tom Lane wrote:
>> Jan's code falls in the substantially different
>> LZ77 class.  There could be problematic patents out there, but Unisys'
>> is certainly not one.

> Is this a legal opinion, or a personal one?

I'm not a lawyer, but I believe I qualify as an expert witness when
it comes to compression questions.  It is not a matter of opinion
whether Jan's code is LZ77 or LZ78, nor is it a matter of opinion
which class Unisys has claims on a small piece of.

> The fear is not whether one wins court cases, but whether you can afford to
> fight them.

There are patents related to databases.  Shall we therefore shut down
Postgres development and run screaming for the hills?  If we let
ourselves be intimidated by irrelevant patents, the Microsofts and
Unisyses will win without a fight.
        regards, tom lane


Re: [HACKERS] LZ compressing data type

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> There are patents related to databases.  Shall we therefore shut down
> Postgres development and run screaming for the hills?  If we let
> ourselves be intimidated by irrelevant patents, the Microsofts and
> Unisyses will win without a fight.

    It's  definitely  a LZ77 coder, and this technique is used in
    many other compressors too (lha, arj, zlib, gzip, ...). So  I
    don't think we have to fear anything.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #