Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date
Msg-id 200901060407.n06478e02611@momjian.us
In response to Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  ("Robert Haas" <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote:
> > After poking around in those threads a bit, I think that the current
> > threshold of 1MB was something I just made up on the fly (I did note
> > that it needed tuning...).  Perhaps something like 10MB would be a
> > better default.  Another possibility is to have different minimum
> > compression rates for "small" and "large" datums.
> 
> After reading these discussions, I guess I still don't understand why
> we would treat small and large datums differently.  It seems to me
> that you had it about right here:
> 
> http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
> 
> # Or maybe it should just be a min_comp_rate and nothing else.
> # Compressing a 1GB field to 999MB is probably not very sane either.
> 
> I agree with that.  force_input_size doesn't seem like a good idea
> because compression can be useless on big datums just as it can be on
> little ones - the obvious case being media file formats that are
> already internally compressed.  Even if you can squeeze a little more
> out, you're using a lot of CPU time for a very small gain in storage
> and/or I/O.  Furthermore, on a large object, saving even 1MB is not
> very significant if the datum is 1GB in size - so, again, a percentage
> seems like the right thing.
> 
> On the other hand, even after reading these threads, I still don't see
> any need to disable compression for large datums.  I can't think of
> any reason why I would want to try compressing a 900kB object but not
> 1MB one.  It makes sense to me to not compress if the object doesn't
> compress well, or if some initial segment of the object doesn't
> compress well (say, if we can't squeeze 10% out of the first 64kB),
> but size by itself doesn't seem significant.
> 
> To put that another way, if small objects and large objects are to be
> treated differently, which one will we try harder to compress and why?
>  Greg Stark makes an argument that we should try harder when it might
> avoid the need for a toast table:
> 
> http://archives.postgresql.org/pgsql-hackers/2007-08/msg00087.php
> 
> ...which has some merit, though clearly it would be a lot better if we
> could do it when, and only when, it was actually going to work.  Also,
> not compressing very small datums (< 256 bytes) also seems smart,
> since that could end up producing a lot of extra compression attempts,
> most of which will end up saving little or no space.
> 
> Apart from those two cases I don't see any clear motivation for
> discriminating on size.

Agreed.  I have seen a lot of discussion on this topic and the majority
seems to feel that a size limit on compression doesn't make sense in the
general case.  It is true that substring operations get slower as
compressed TOAST values get longer, but compression gives better
performance for full-field retrieval of longer values.  I don't think we
should be optimizing TOAST for substrings --- users who know they are
going to be using substrings can specify the storage type for the column
directly.  Having any kind of maximum makes it hard for administrators to
know exactly what is happening in TOAST.
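For example, a column that will mostly be read through substring() can be
kept uncompressed in TOAST by setting its storage type directly (an
illustrative statement only; the table and column names are made up):

    -- keep the value out-of-line but uncompressed, so substring() can
    -- fetch only the TOAST chunks it needs instead of decompressing the
    -- whole datum
    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;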

I think the upper limit should be removed, with a note in the substring()
documentation about using non-compressed TOAST storage.  The only way I
think an upper compression limit makes sense is if the backend can't
decompress the value to return it to the user, but then you have to
wonder how the value got into the database in the first place.
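As a side note, anyone wanting to gauge how much compression is actually
saving on an existing column can compare stored and raw sizes (again just
an illustrative query; names are made up):

    -- pg_column_size() reports the on-disk (possibly compressed) size,
    -- octet_length() the uncompressed length of the value
    SELECT octet_length(body)   AS raw_bytes,
           pg_column_size(body) AS stored_bytes
    FROM docs
    ORDER BY raw_bytes DESC
    LIMIT 10;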

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

