Thread: LZ compressing data type
Hi, I just committed some changes that require an initdb. New are the discussed, simple LZ compressor, placed into /utils/adt/pg_compress.c, and a new lztext data type based on it. You'll find a fairly detailed description of the compression algorithm in the comments at the top of pg_lzcompress.c. Not very surprisingly to me it turns out, that the compressor does a very good job on rule action strings. I used the 48 rules that can be found in pg_rewrite after the regression test. The original string sizes range from 820 to 4615 and the compression rates from 35-76% with an average of 60%. The 4615 size rule action has been coded into a 1126 octet_length. For the lztext type, there are conversion functions to/from text and the length() and octet_length() functions available. Length() returns the same as length on text would. While octet_length returns the compressed size without VARHDRSZ. The type does not support MULTIBYTE or CYR_ENCODE up to now. It shouldn't be too hard to add it and after that, we might add another lzbpchar type too. The latter is really interesting, because an empty char(200) (thus containing 200 spaces) could result in an octet_length of 12 instead of 204 - that's a compression rate of 94.1%! It actually wouldn't, because the compressors default is to start only if the input is at least 256 bytes, but there is a mechanism so a lzbpchar type could force this behaviour. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> > Hi, > > I just committed some changes that require an initdb. > > New are the discussed, simple LZ compressor, placed into > /utils/adt/pg_compress.c, and a new lztext data type based on > it. You'll find a fairly detailed description of the > compression algorithm in the comments at the top of > pg_lzcompress.c. One question. You say this is an LZ algorythm. Is this the same as LZW, as in the same kind that gifs use. If so, are we sure that this algorythm is not covered by the Unisys patent? Their patent is on the algorythm for compression not on gifs themselves. If it is, you are liable for a $5,000 fee to use it throughout your site, or a per-licence fee if you are distributing (thay are worked out on a per case basis but typical licences are $5 per unit sold from what I am told). I came across this problem with a gif manipulation program that *I WROTE FROM SCRATCH* and had to switch to using the libungif compression routines. Just thought Id mention this, in case it has been overlooked. ~Michael
> > > > > Hi, > > > > I just committed some changes that require an initdb. > > > > New are the discussed, simple LZ compressor, placed into > > /utils/adt/pg_compress.c, and a new lztext data type based on > > it. You'll find a fairly detailed description of the > > compression algorithm in the comments at the top of > > pg_lzcompress.c. > > One question. > > You say this is an LZ algorythm. Is this the same as LZW, as in the > same kind that gifs use. If so, are we sure that this algorythm is not > covered by the Unisys patent? Their patent is on the algorythm for > compression not on gifs themselves. > > If it is, you are liable for a $5,000 fee to use it throughout your > site, or a per-licence fee if you are distributing (thay are worked out > on a per case basis but typical licences are $5 per unit sold from what > I am told). It't an SLZ algorithm, not LZW. There are FLZ and LZ77 out too (and I don't know how many other subtypes). At least, LZ is a family of compression algorithms where LZW is just a member of them. I've written the entire code from scatch, inspired by an article from Adisak Pochanayon dated back in 1993. If they have a license on this algorithm, they have one on code that can be coded from scatch in 20 hours after reading information that's claimed to be free on the internet, congrats. This is M$ practice, I never thought that there's more than one company out doing this kind of business. But thanks for the info. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> > If it is, you are liable for a $5,000 fee to use it throughout your > > site, or a per-licence fee if you are distributing (thay are worked out > > on a per case basis but typical licences are $5 per unit sold from what > > I am told). > > It't an SLZ algorithm, not LZW. There are FLZ and LZ77 out > too (and I don't know how many other subtypes). At least, LZ > is a family of compression algorithms where LZW is just a > member of them. > > [...] > > But thanks for the info. From the very beginning of US patent No. 4,558,302 (the thing behind Unisys's LZW): The compressor searches the input stream to determine the longest match to a stored string. Each stored string comprises a prefix string and an extension character where the extension character is the last character in the string and the prefix string comprises all but the extension character. Each string has a code signal associated therewith and a string is stored in the string table by, at least implicitly, storing the code signal for the string, the code signal for the string prefix and the extension character. The format my code stores is different to this definition. It only stores offset/length pairs or literal bytes, signaled by a bit in a control unit. So there is no prefix string and/or extension character at all. If someone might state that storing TAG's and LITERAL chars is still equivalent to PREFIX and EXTension character, I say that my code possibly output's sequences of immediately following TAG's, that aren't neccessarily PREFIX'es. One TAG could code a sequence of characters, not previously occured in any other TAG, without any intermediate LITERAL character occured. And my code forgets AUTOMATICALLY about too far behind occurences of matching strings for speed AND compression rate improvement. Thus I think my implementation is not related to this patent. When reading patent #4,558,302 I really remembered the question, if anyone could ever have a patent on rectangle front lights for cars. The definition is so general, that I cannot think that gzip, bzip or any other existing compression tool doesn't use it's METHOD. If such a GENERAL patent is possible at all, car manufacturers all over the world would have to pay millions of dollars license fee's just to sell cars with rectangular lights. I don't see any relationship between my code and what Unisys has a patent for. If they see, I might be blind - or they see a horizon from inside a black hole. If they need to, I'll enlighten them. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
At 03:23 18/11/99 +0100, Jan Wieck wrote: > > From the very beginning of US patent No. 4,558,302 (the thing > behind Unisys's LZW): > The other thing to remember is that there are a very large number of countries in which this patent does not apply because it was not even applied for until after the alogorithm was made public. In the first publication of the LZW algorithm, there was no patent notice, but I think the US patent had been applied for. For US-based distribution sites, however, you may find the threat of legal action from a large company is enough to make it less desirable to distribute. FWIW, I think Unisys patented the LZW algorithm only. It's probably a bit late to ask, but how difficult would it be to generalize your code to use the compressor/encryption library of choice? zlib and pgp spring to mind. This would kill three birds with one stone. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > FWIW, I think Unisys patented the LZW algorithm only. More specifically, Unisys patented Terry Welch's variant of the LZ78 class of algorithms. Jan's code falls in the substantially different LZ77 class. There could be problematic patents out there, but Unisys' is certainly not one. (If you don't know the difference between LZ77 and LZ78, see the comp.compression FAQ ... but don't raise alarms about compression patents if you don't know even that much about the field, eh?) regards, tom lane
At 23:28 17/11/99 -0500, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> FWIW, I think Unisys patented the LZW algorithm only. > >More specifically, Unisys patented Terry Welch's variant of the LZ78 >class of algorithms. Hence the 'W' in LZW, no? >Jan's code falls in the substantially different >LZ77 class. There could be problematic patents out there, but Unisys' >is certainly not one. > Is this a legal opinion, or a personal one? >(If you don't know the difference between LZ77 and LZ78, see the >comp.compression FAQ ... but don't raise alarms about compression >patents if you don't know even that much about the field, eh?) 1. I did not raise the alarm, I just responded to it. 2. Since Unisys have moved to block code that does not even use LZW compression, but happens to be compatible with most GIF decompressors, I think your comment is misinformed, eh? (See the GD graphics library) The fear is not whether one wins court cases, but whether you can afford to fight them. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 23:28 17/11/99 -0500, Tom Lane wrote: >> Jan's code falls in the substantially different >> LZ77 class. There could be problematic patents out there, but Unisys' >> is certainly not one. > Is this a legal opinion, or a personal one? I'm not a lawyer, but I believe I qualify as an expert witness when it comes to compression questions. It is not a matter of opinion whether Jan's code is LZ77 or LZ78, nor is it a matter of opinion which class Unisys has claims on a small piece of. > The fear is not whether one wins court cases, but whether you can afford to > fight them. There are patents related to databases. Shall we therefore shut down Postgres development and run screaming for the hills? If we let ourselves be intimidated by irrelevant patents, the Microsofts and Unisyses will win without a fight. regards, tom lane
Tom Lane wrote: > There are patents related to databases. Shall we therefore shut down > Postgres development and run screaming for the hills? If we let > ourselves be intimidated by irrelevant patents, the Microsofts and > Unisyses will win without a fight. It's definitely a LZ77 coder, and this technique is used in many other compressors too (lha, arj, zlib, gzip, ...). So I don't think we have to fear anything. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #