lztext and compression ratios... - Mailing list pgsql-general

From Jeffery Collins
Subject lztext and compression ratios...
Date
Msg-id 39636960.FEF2400C@onyx-technologies.com
List pgsql-general
I have been looking at using the lztext type and I have some
questions/observations.   Most of my experience comes from attempting to
compress text records in a different database (CTREE), but I think the
experience is transferable.

My typical table consists of variable length text records.  The average
record length is around 1K bytes.  I would like to compress my records
to save space and improve I/O performance (smaller records mean more
records fit into the file system cache, which means less I/O - or so the
theory goes).  I am not too concerned about CPU as we are using a 4-way
Sun Enterprise class server.  So compression seems like a good idea to me.

My experience with attempting to compress such a relatively small
(around 1K) text string is that the compression ratio is not very
good.  This is because the string is not long enough for the LZ
compression algorithm to establish really good compression patterns, and
because the de-compression table has to be built into each
record.  To get around these problems in the past, I have "taught" the
compression algorithm the patterns ahead of time and stored the
de-compression patterns in an external table.  Using
this technique, I have achieved *much* better compression ratios.
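
A minimal sketch of the "trained" idea, assuming Python's zlib and its
preset-dictionary support (the zdict argument) as a stand-in.  This is
neither lztext nor the CTREE setup described above; the sample records,
the hand-built dictionary, and the helper names are all hypothetical.

```python
import zlib

# Hypothetical sample records; real data would come from the table.
RECORDS = [
    b"name=Alice Smith;city=Springfield;state=IL;status=active;plan=basic",
    b"name=Bob Jones;city=Shelbyville;state=IL;status=inactive;plan=premium",
    b"name=Carol White;city=Springfield;state=IL;status=active;plan=premium",
]

# Assumption: "training" is reduced to a hand-built dictionary of substrings
# expected to recur across records. It is stored once, outside the records.
TRAINED_DICT = (
    b"plan=premium;plan=basic;status=inactive;status=active;"
    b"state=IL;city=Springfield;name="
)


def compress_plain(data: bytes) -> bytes:
    """Compress one record on its own, with no shared dictionary."""
    return zlib.compress(data, 9)


def compress_trained(data: bytes, zdict: bytes = TRAINED_DICT) -> bytes:
    """Compress one record against the externally stored dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_trained(blob: bytes, zdict: bytes = TRAINED_DICT) -> bytes:
    """Decompress a record; the same dictionary must be supplied again."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    for record in RECORDS:
        plain = compress_plain(record)
        trained = compress_trained(record)
        print(len(record), len(plain), len(trained))
```

Each printed line shows the raw, untrained, and trained sizes for one
record.  The dictionary has to be kept unchanged for as long as any
record compressed against it exists, since decompression needs it.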

So my questions/comments are:

    - What are the typical compression ratios on relatively small (i.e.
around 1K) strings seen with lztext?
    - Does anyone see a need/use for a generalized string compression
type that can be "trained" external to the individual records?
    - Am I crazy in even attempting to compress strings of this relatively
small size?  My largest table currently contains about 2 million entries of
roughly 1K size strings, or about 2 GB of data.  If I could compress this
to about 33% of its original size (not unreasonable with a trained LZ
compression), I would save a lot of disk space (not really important)
and a lot of file system cache space (very important) and be able to fit
the entire table into memory (very, very important).
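
The sizing above can be checked with a quick back-of-the-envelope
calculation.  This sketch assumes exactly 2 million rows of 1024 bytes
each and ignores any per-row or per-page storage overhead; the constant
and function names are made up for illustration.

```python
ROWS = 2_000_000        # approximate row count from the post
AVG_BYTES = 1024        # assumed average record size ("roughly 1K")
TARGET_RATIO = 0.33     # compressed size as a fraction of the original


def table_size_gib(rows: int, avg_bytes: int, ratio: float = 1.0) -> float:
    """Return the estimated payload size in GiB, ignoring storage overhead."""
    return rows * avg_bytes * ratio / 2**30


raw_gib = table_size_gib(ROWS, AVG_BYTES)
compressed_gib = table_size_gib(ROWS, AVG_BYTES, TARGET_RATIO)

if __name__ == "__main__":
    print(f"raw: {raw_gib:.2f} GiB, compressed: {compressed_gib:.2f} GiB")
```

Under these assumptions the payload drops from a little under 2 GiB to
well under 1 GiB, which is the difference that matters for fitting the
table in memory.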

Thank you,
Jeff


