Thread: Re: [GENERAL] lztext and compression ratios...
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > > One thing to keep in mind is that the LZ algorithm you're > > thinking of must be distributable under the terms of the BSD > > license. If it's copyrighted or patented by any third party, > > not agreeing to these terms, it's out of discussion and will > > never appear in the Postgres source tree. Especially the LZ > > algorithm used in GIF is one of these show-stoppers. > > As long as you brought it up: how sure are you that the method you've > used is not subject to any patents? The mere fact that you learned > it from someone who didn't patent it does not guarantee anything --- > someone else could have invented it independently and filed for a > patent. Now that you ask for it: I'm not sure. Could be. > If you can show that this method uses no ideas not found in zlib, > then I'll feel reassured --- a good deal of legal research went into > zlib to make sure it didn't fall foul of any patents, and zlib has > now been around long enough that it'd be tough for anyone to get a > new patent on one of its techniques. But if SLZ has any additional > ideas in it, then we could be in trouble. There are an awful lot of > compression patents. To do so I don't know enough about the algorithms used in zlib. Is there someone out here who could verify that if I detailed enough describe what our compression code does? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
JanWieck@t-online.de (Jan Wieck) writes: >> As long as you brought it up: how sure are you that the method you've >> used is not subject to any patents? > Now that you ask for it: I'm not sure. Could be. >> If you can show that this method uses no ideas not found in zlib, >> then I'll feel reassured > To do so I don't know enough about the algorithms used in > zlib. Is there someone out here who could verify that if I > detailed enough describe what our compression code does? After a quick look at the code, I don't think there is anything problematic about the data representation or the decompression algorithm. The compression algorithm is another story, and it's not real well commented :-(. The important issues are how you search for matches in the past text and how you decide which match is the best one to use. Please update the code comments to describe that, and I'll take another look. regards, tom lane
Tom Lane wrote: > JanWieck@t-online.de (Jan Wieck) writes: > >> As long as you brought it up: how sure are you that the method you've > >> used is not subject to any patents? > > > Now that you ask for it: I'm not sure. Could be. > > >> If you can show that this method uses no ideas not found in zlib, > >> then I'll feel reassured > > > To do so I don't know enough about the algorithms used in > > zlib. Is there someone out here who could verify that if I > > detailed enough describe what our compression code does? > > After a quick look at the code, I don't think there is anything > problematic about the data representation or the decompression > algorithm. The compression algorithm is another story, and it's > not real well commented :-(. The important issues are how you > search for matches in the past text and how you decide which match > is the best one to use. Please update the code comments to describe > that, and I'll take another look. Done. You'll find a new section in the top comments. While writing it I noticed that the algorithm is really expensive for big items. The history lookup table allocated is 8 times (on 32 bit architectures) the size of the input. So if you want to have 1MB compressed, it'll allocate 8MB for the history. It hit me when I was hunting a bug in the toaster earlier today. Doing an update to a toasted item of 5MB, resulting in a new value of 10MB, the backend blew up to 290MB of virtual memory - oh boy. I definitely need to make that smarter. When I wrote it I never thought about items that big. It was before we had the idea of TOAST. This all might open another discussion I'll start in a separate thread. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #