Re: [HACKERS] compression in LO and other fields - Mailing list pgsql-hackers

From Karel Zak - Zakkr
Subject Re: [HACKERS] compression in LO and other fields
Date
Msg-id Pine.LNX.3.96.991112101708.14930B-100000@ara.zf.jcu.cz
In response to Re: [HACKERS] compression in LO and other fields  (wieck@debis.com (Jan Wieck))
Responses Re: [HACKERS] compression in LO and other fields  (wieck@debis.com (Jan Wieck))
List pgsql-hackers
On Fri, 12 Nov 1999, Jan Wieck wrote:

>     Just in case someone wants to implement a complete compressed
>     data type (including comparison functions, operators and a
>     default operator class for indexing).
> 
>     I already made some tests with a type I called 'lztext'
>     locally. Only the input/output functions exist so far and,
>     as the name might suggest, it would be an alternative to
>     'text'. It uses a simple but fast, byte-oriented LZ
>     backward-pointing method. No Huffman coding or variable
>     offset/size tagging. The first byte of a chunk tells bitwise
>     whether each of the following 8 items is a raw byte to copy
>     or a 12-bit offset / 4-bit size copy instruction. That means
>     a maximum back offset of 4096 and a maximum match size of
>     17 bytes.
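If I read the description right, the decompression side would look
roughly like the sketch below. This is my own guess, not your code -
I'm assuming which nibble of the first copy byte holds the size, and
a minimum match of 2, so that size values 0..15 map to 2..17 bytes:

/*
 * Hypothetical decompressor for the format described above: a control
 * byte announces 8 items; a set bit means a 2-byte back reference
 * (12-bit offset, 4-bit size), a clear bit means one literal byte.
 * The nibble layout and the minimum match length of 2 are guesses.
 */
static void
lz_decompress(const unsigned char *src, int srclen,
              unsigned char *dst, int rawlen)
{
    const unsigned char *sp = src;
    const unsigned char *srcend = src + srclen;
    unsigned char *dp = dst;
    unsigned char *dstend = dst + rawlen;

    while (sp < srcend && dp < dstend)
    {
        unsigned char ctrl = *sp++;
        int bit;

        for (bit = 0; bit < 8 && sp < srcend && dp < dstend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* copy item: 12-bit back offset, 4-bit size */
                int off = ((sp[0] & 0x0f) << 8) | sp[1];
                int len = (sp[0] >> 4) + 2;
                const unsigned char *cp = dp - off;

                sp += 2;
                /* byte-wise copy, the two regions may overlap */
                while (len-- > 0 && dp < dstend)
                    *dp++ = *cp++;
            }
            else
            {
                /* literal byte, copied as-is */
                *dp++ = *sp++;
            }
        }
    }
}

The nice property you describe below shows up here: the copy loop only
reads bytes already written to the output buffer, so no code table has
to travel with the data.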

Is it your original implementation, or do you use some existing compression
code? I tried bzip2, but the output from that algorithm is totally binary,
and I don't know how to use that in PgSQL when all the backend (in/out)
routines use char * (yes, I'm a newbie at PgSQL hacking :-).

> 
>     What made it my preferred method was the fact that
>     decompression is done entirely from the already decompressed
>     portion of the data, so it does not need any code tables or
>     the like at that time.
> 
>     It is really FASTEST on decompression, which I assume would
>     be the most often used operation on huge data types. With
>     some care, comparison could be done on the fly while
>     decompressing two values, so that the entire comparison can
>     be aborted at the occurrence of the first difference.
> 
>     The compression rates aren't that gigantic. I've got 30-50%

Isn't it a problem that your implementation compresses all the data at once?
Typically compression works on a stream and compresses only a small buffer
in each cycle.
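For example, with libbz2 the stream interface works buffer by buffer,
roughly like this (a minimal sketch with error handling and output
draining left out - nothing from the backend, just to show the idea):

#include <bzlib.h>
#include <string.h>

/*
 * Rough sketch of stream-style compression with libbz2; a real stream
 * would refill next_in and drain next_out inside the loop.
 */
static int
bz2_compress_buffer(char *in, unsigned inlen, char *out, unsigned outlen)
{
    bz_stream strm;
    int       rc;

    memset(&strm, 0, sizeof(strm));
    if (BZ2_bzCompressInit(&strm, 9, 0, 0) != BZ_OK)
        return -1;

    strm.next_in = in;
    strm.avail_in = inlen;
    strm.next_out = out;
    strm.avail_out = outlen;

    do
    {
        rc = BZ2_bzCompress(&strm, BZ_FINISH);
    } while (rc == BZ_FINISH_OK && strm.avail_out > 0);

    BZ2_bzCompressEnd(&strm);
    if (rc != BZ_STREAM_END)
        return -1;              /* e.g. output buffer too small */
    return (int) (outlen - strm.avail_out);
}

The stream interface doesn't solve the char * problem, of course - the
output is binary either way.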

>     for rule plan strings (size limit on views!!!). And the
>     method used only allows buffer back references of at most
>     4K offsets, so the rate will not grow for larger data
>     chunks. That's a heavy tradeoff between compression rate on
>     one side and guaranteed freedom from memory leakage plus
>     speed on the other, I know, but I prefer not to force it;
>     instead I usually use a bigger hammer (the tuple size limit
>     is still our original problem - and another IBM 72GB disk
>     doing 22-37 MB/s will make any compressing data type
>     obsolete then).
> 
>     Sorry for the compression-specific jargon here. Well, is
>     anyone interested in the code?

Yes, for me - I'm finishing the to_char()/to_date() Oracle-compatible routines
(Thomas, are you still quiet?) and this is a new appeal for me :-)
                    Karel


