Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From Stephen R. van den Berg
Subject Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Msg-id 20090105231137.GB1251@cuci.nl
In response to Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
>"Robert Haas" <robertmhaas@gmail.com> writes:
>> The whole thing got started because Alex Hunsaker pointed out that his
>> database got a lot bigger because we disabled compression on
>> columns > 1MB.  It seems like the obvious thing to do is turn it back
>> on again.

>After poking around in those threads a bit, I think that the current
>threshold of 1MB was something I just made up on the fly (I did note
>that it needed tuning...).  Perhaps something like 10MB would be a
>better default.  Another possibility is to have different minimum
>compression rates for "small" and "large" datums.

As far as I can imagine, the following use cases apply:
a. Column size <= 2048 bytes without substring access.
b. Column size <= 2048 bytes with substring access.
c. Column size  > 2048 bytes, compressible, without substring access (text).
d. Column size  > 2048 bytes, incompressible, with substring access (multimedia).

Can anyone think of another use case I missed here?

To cover those cases, the following solutions seem feasible:
Sa. Disable compression for this column (manually, by the DBA).
Sb. Check if the compression saves more than 20%, store uncompressed otherwise.
Sc. Check if the compression saves more than 20%, store uncompressed otherwise.
Sd. Check if the compression saves more than 20%, store uncompressed otherwise.

For Sb, Sc and Sd we should probably only check the first 256KB or so to
determine the expected savings.
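
A minimal sketch of the check described for Sb, Sc and Sd, in Python. It is
only an illustration: zlib stands in for whatever compressor the backend
would actually use (pglz, QuickLZ, ...), and the names SAMPLE_BYTES,
MIN_SAVINGS, expected_savings, should_compress and store are hypothetical,
not existing PostgreSQL identifiers.

```python
import zlib
from typing import Tuple

# Assumed tunables, taken from the figures mentioned above.
SAMPLE_BYTES = 256 * 1024   # "the first 256KB or so"
MIN_SAVINGS = 0.20          # "saves more than 20%"


def expected_savings(datum: bytes, sample_bytes: int = SAMPLE_BYTES) -> float:
    """Estimate the fraction of space saved, compressing only a prefix."""
    sample = datum[:sample_bytes]
    if not sample:
        return 0.0
    compressed = zlib.compress(sample)  # stand-in for the real compressor
    return 1.0 - len(compressed) / len(sample)


def should_compress(datum: bytes, min_savings: float = MIN_SAVINGS) -> bool:
    """True if the sampled prefix saves more than min_savings."""
    return expected_savings(datum) > min_savings


def store(datum: bytes) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload); store uncompressed otherwise."""
    if should_compress(datum):
        return True, zlib.compress(datum)
    return False, datum
```

With this sketch, a large repetitive text value would be stored compressed,
while already-compressed multimedia (approximated by random bytes) would be
stored as-is after looking at only the first 256KB. The obvious weakness is
a datum whose prefix is not representative of the rest.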
-- 
Sincerely,          Stephen R. van den Berg.

"Well, if we're going to make a party of it, let's nibble Nobby's nuts!"

