pg_lzcompress strategy parameters - Mailing list pgsql-hackers

From Tom Lane
Subject pg_lzcompress strategy parameters
Msg-id 8566.1186265970@sss.pgh.pa.us
Responses Re: pg_lzcompress strategy parameters  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: pg_lzcompress strategy parameters  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Greg complained here
http://archives.postgresql.org/pgsql-patches/2007-07/msg00342.php
that the default strategy parameters used by the TOAST compressor
might need some adjustment.  After thinking about it a little I wonder
whether they're not even more broken than that.  The present behavior
is:

1. Never compress for inputs < min_input_size (256 bytes by default).
2. Compress inputs >= force_input_size (6K by default), as long as
   compression produces a result at least 1 byte smaller than the input.
3. For inputs between min_input_size and force_input_size, compress only
   if compression of at least min_comp_rate percent is achieved
   (20% by default).

This whole structure seems a bit broken, independently of whether the
particular parameter values are good.  If the compressor is given an
input of 1000000 bytes and manages to compress it to 999999 bytes,
we'll store it compressed, and pay for decompression cycles on every
access, even though the I/O savings are nonexistent.  That's not sane.

I'm inclined to think that the concept of force_input_size is wrong.
Instead I suggest that we have a min_comp_rate (minimum percentage
savings) and a min_savings (minimum absolute savings), and compress
if either one is met.  For instance, with min_comp_rate = 10% and
min_savings = 1MB, inputs below 10MB would need at least 10% savings
to be compressed, while inputs above 10MB would need at least 1MB
saved.

Or maybe it should just be a min_comp_rate and nothing else.
Compressing a 1GB field to 999MB is probably not very sane either.

This is all independent of what the specific parameter settings should
be, but I concur with Greg that those could do with a fresh look.

Thoughts?
        regards, tom lane

