Re: lztext and compression ratios... - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: lztext and compression ratios...
Date
Msg-id 396D7238.E8C5A793@tm.ee
In response to Re: Re: [SQL] Re: [GENERAL] lztext and compression ratios...  (JanWieck@t-online.de (Jan Wieck))
List pgsql-hackers
Tom Lane wrote:
> 
> JanWieck@t-online.de (Jan Wieck) writes:
> >     Some quick numbers though:
> >     I  simply  stripped  down pg_lzcompress.c to call compress2()
> >     and uncompress() instead of doing  anything  itself  (what  a
> >     nice,  small  source  file  :-).
> 
> I went at it in a different way: pulled out pg_lzcompress into a
> standalone source program that could also call zlib.  These numbers
> represent the pure compression or decompression time for memory-to-
> memory processing, no other overhead at all.  Each run was iterated
> 1000 times to make it long enough to time accurately (so you can
> read the times as "milliseconds per operation", though they're
> really seconds).

We could just make this part extensible as well, like the rest of 
postgres, so we would have a directory tree like 

/compressors
    /nullcompressor
    /lzcompress
    /zlib
    /lzo
    /bzip2
    /my_new_supercompressor
    /classic_huffman_for_uppercase_american_english

and select the desired compressor at compile time or, even better, on a 
field-by-field basis at runtime, so that a field that stores mainly 
tar.gz files made at compression level 9 will use nullcompressor, and 
others will use whatever is best for them.
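
A minimal sketch of what per-field selection could look like, written in
Python only for brevity (the real thing would be C inside the backend).
The registry, the field names and the store()/load() helpers are all
hypothetical and do not correspond to anything in the postgres source;
only the zlib and bz2 calls are real library functions.

```python
import bz2
import zlib
from typing import Callable, Dict, NamedTuple


class Compressor(NamedTuple):
    """A pair of functions that must be inverses of each other."""
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Registry keyed by name, mirroring the proposed /compressors directory.
COMPRESSORS: Dict[str, Compressor] = {
    # Stores the data unchanged; meant for values that are already compressed.
    "nullcompressor": Compressor(lambda data: data, lambda data: data),
    "zlib": Compressor(zlib.compress, zlib.decompress),
    "bzip2": Compressor(bz2.compress, bz2.decompress),
}

# Hypothetical per-field configuration (assumption: chosen at runtime).
FIELD_COMPRESSOR: Dict[str, str] = {
    "archive_blob": "nullcompressor",  # holds tar.gz files, so skip compression
    "description": "zlib",
}

DEFAULT_COMPRESSOR = "zlib"  # assumed fallback for unconfigured fields


def store(field: str, value: bytes) -> bytes:
    """Compress a value with the compressor configured for its field."""
    name = FIELD_COMPRESSOR.get(field, DEFAULT_COMPRESSOR)
    return COMPRESSORS[name].compress(value)


def load(field: str, stored: bytes) -> bytes:
    """Reverse store() using the same per-field lookup."""
    name = FIELD_COMPRESSOR.get(field, DEFAULT_COMPRESSOR)
    return COMPRESSORS[name].decompress(stored)
```

Adding my_new_supercompressor would then mean adding one more entry to
the registry. A real implementation would also have to record which
compressor was used alongside each stored value, so that changing a
field's setting later does not make old values unreadable; the sketch
leaves that out.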

> 
> >     Fix  it's history allocation for huge values and have someone
> >     (PgSQL Inc.?)  patenting the compression algorithm, so  we're
> >     safe at some point in the future.
> 
> That would be a really *bad* idea.  What will people say if we say
> "Postgres contains patented algorithms, but we'll let you use them
> for free" ?  They'll say "no thanks, I remember Unisys' repeatedly
> broken promises about the GIF patent" and stay away in droves.
> There is a *lot* of bad blood in the air about compression patents
> of any sort.  We mustn't risk tainting Postgres' reputation with
> that mess.
> (In any case, one would hope you couldn't get a patent on this
> method, though I suppose it never pays to overestimate the competence
> of the USPTO...)

And AFAIK (IANAL ;) you can only patent previously _unpublished_ work;
prior publication, even by the patent applicant, rules it out.

> 
> >     If there's a patent problem
> >     in it, we are already running the risk to get sued, the  PGLZ
> >     code got shipped with 7.0, used in lztext.
> 
> But it hasn't been documented or advertised.  If we take it out
> again in 7.1, I think our exposure to potential lawsuits from it is
> negligible.  Not that I think there is any big risk there anyway,
> but we ought to consider the possibility.
> 
> My feeling is that going with zlib is probably the right choice.
> The technical case for using a homebrew compressor instead isn't
> very compelling,

Speed seems to be a good reason, if we can keep it up.

> and the advantages of using a standardized,
> known-patent-free library are not to be ignored.

OTOH, there are possibly patents on other parts of postgres, 
like indexing, storage methods, the mere fact that something is 
stored in another relation, using 'Z' as a protocol character, etc. 
etc. So using a patent-free compression library does not help much.

So if PgSQL Inc. has lots of lawyers with nothing to do, they could do
some patent research and scare all developers away with their findings
;)

-------------
Hannu

