Re: pg_lzcompress strategy parameters - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: pg_lzcompress strategy parameters
Date
Msg-id 87bqdlkheg.fsf@oxford.xeocode.com
In response to pg_lzcompress strategy parameters  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_lzcompress strategy parameters  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> This whole structure seems a bit broken, independently of whether the
> particular parameter values are good.  If the compressor is given an
> input of 1000000 bytes and manages to compress it to 999999 bytes,
> we'll store it compressed, and pay for decompression cycles on every
> access, even though the I/O savings are nonexistent.  That's not sane.

Especially given that uncompressed toasted data is quite a bit more flexible
in that it can handle substr() efficiently.

Thinking about it, if the datum is stored inline then even a single byte saved is
at least theoretically helpful. If it's stored in a toast table then saving
anything less than 2k has pretty slim odds of being helpful at all, even if the
percentage gain is large.
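
To make that concrete, here's a back-of-the-envelope sketch (plain standalone C,
not anything in the tree; the 2000/2048 thresholds and the function itself are
made up purely for illustration) of the kind of size-dependent test I'm talking
about:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative thresholds only; the real numbers would come from the
     * toast tuple target and chunk size in the server headers. */
    #define INLINE_LIMIT   2000     /* datum small enough to stay in the heap tuple */
    #define EXTERNAL_SAVE  2048     /* roughly one toast chunk worth of I/O */

    /*
     * Hypothetical check: a datum that stays inline benefits from any saving,
     * while a datum headed for the toast table only benefits if the saving is
     * on the order of a whole chunk of I/O.
     */
    static bool
    compression_worthwhile(int32_t raw_size, int32_t compressed_size)
    {
        int32_t saved = raw_size - compressed_size;

        if (saved <= 0)
            return false;                   /* didn't shrink at all */
        if (raw_size <= INLINE_LIMIT)
            return true;                    /* inline: every byte saved is real */
        return saved >= EXTERNAL_SAVE;      /* external: need ~2k of savings */
    }

    int
    main(void)
    {
        printf("%d\n", compression_worthwhile(1000000, 999999));   /* 0 */
        printf("%d\n", compression_worthwhile(1500, 1499));        /* 1 */
        printf("%d\n", compression_worthwhile(1000000, 900000));   /* 1 */
        return 0;
    }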

I don't know what the right answer is yet, but it looks to me like there need
to be two strategies: one for inline toasted tuples and one for externally
toasted tuples.

Unfortunately that's not the way the toaster is structured. First it goes
through and compresses all the fields, starting with the largest, and then it
starts pushing fields out to external storage, again starting with the largest
remaining. It doesn't really know whether something is going to be stored
externally at the time it's compressing it.
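
For what it's worth, here's a toy model of that control flow (none of it is the
real tuptoaster.c code; the types, helpers, and the assumed 2:1 compression
ratio are invented for illustration). It also shows the effect I grumble about
below, where one oversized field drags every other field through the compressor:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Toy model of the two passes described above; a "tuple" is just an
     * array of field sizes. */
    typedef struct
    {
        size_t size;            /* current size of the field in the tuple */
        bool   compressed;
        bool   external;
    } ToyField;

    static size_t
    tuple_size(const ToyField *f, int nfields)
    {
        size_t total = 0;
        for (int i = 0; i < nfields; i++)
            if (!f[i].external)
                total += f[i].size;
        return total;
    }

    /* Largest non-external field, optionally restricted to uncompressed
     * ones; returns -1 when there is no candidate. */
    static int
    largest_field(const ToyField *f, int nfields, bool uncompressed_only)
    {
        int best = -1;
        for (int i = 0; i < nfields; i++)
        {
            if (f[i].external || (uncompressed_only && f[i].compressed))
                continue;
            if (best < 0 || f[i].size > f[best].size)
                best = i;
        }
        return best;
    }

    static void
    toast_tuple_sketch(ToyField *f, int nfields, size_t target)
    {
        int i;

        /* Pass 1: compress fields, largest first, until the tuple fits.
         * Nothing has been marked external yet, so the compressor can't
         * know which fields will end up in the toast table. */
        while (tuple_size(f, nfields) > target &&
               (i = largest_field(f, nfields, true)) >= 0)
        {
            f[i].size /= 2;         /* pretend everything compresses 2:1 */
            f[i].compressed = true;
        }

        /* Pass 2: if it still doesn't fit, push the largest remaining
         * fields out to external storage. */
        while (tuple_size(f, nfields) > target &&
               (i = largest_field(f, nfields, false)) >= 0)
            f[i].external = true;
    }

    int
    main(void)
    {
        ToyField f[3] = {{9000, false, false}, {300, false, false}, {40, false, false}};

        toast_tuple_sketch(f, 3, 2000);
        for (int i = 0; i < 3; i++)
            printf("field %d: size=%zu compressed=%d external=%d\n",
                   i, f[i].size, f[i].compressed, f[i].external);
        /* All three fields end up compressed even though only the first
         * one goes external, which is the effect complained about below. */
        return 0;
    }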

It seems to me that having a fairly high minimum savings percentage, say 25%,
would get pretty close to the intended behaviour. Small data which happens to be
highly compressible would only have to save 8-32 bytes to be stored compressed,
while data over 8k would have to save at least 2k.
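
Expressed as code, assuming the rule works like pg_lzcompress's existing
minimum-compression-rate idea, i.e. a percentage of the input that has to be
saved before we keep the compressed copy (the 25 and the exact condition below
are just my suggestion, not current behaviour):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Keep the compressed copy only if it is at least min_saving_pct
     * percent smaller than the raw input. */
    static bool
    keep_compressed(int32_t raw_size, int32_t compressed_size, int min_saving_pct)
    {
        return compressed_size < raw_size - raw_size * min_saving_pct / 100;
    }

    int
    main(void)
    {
        /* 32-byte datum: must shrink below 24 bytes, i.e. save at least 8 bytes. */
        printf("%d\n", keep_compressed(32, 23, 25));      /* 1: saved 9 bytes */
        /* 8k datum: must shrink below 6k, i.e. save at least 2k. */
        printf("%d\n", keep_compressed(8192, 7000, 25));  /* 0: saved only ~1.2k */
        printf("%d\n", keep_compressed(8192, 6000, 25));  /* 1: saved ~2.1k */
        return 0;
    }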

(Incidentally, this means what I said earlier about uselessly trying to
compress objects below 256 bytes is even grosser than I realized. If you have a
single large object which even after compression will be over the toast target,
it forces *every* varlena to be considered for compression, even though most of
them can't usefully be compressed. Considering a varlena smaller than 256 bytes
for compression only costs a useless palloc, so it's not the end of the world,
but still. It does seem kind of strange that a tuple which otherwise wouldn't be
toasted at all suddenly gets all its fields compressed if you add one more
field which ends up being stored externally.)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


