Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date
Msg-id 603c8f070901051839m395e091fu28bc292e49e75919@mail.gmail.com
In response to Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Bruce Momjian <bruce@momjian.us>)
Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> I suggest that before we make any knee-jerk responses, we need to go
> back and reread the prior discussion.
> http://archives.postgresql.org/pgsql-patches/2008-02/msg00053.php
> and that message links to several older threads that were complaining
> about the 8.3 behavior.  In particular the notion of an upper limit
> on what we should attempt to compress was discussed in this thread:
> http://archives.postgresql.org/pgsql-general/2007-08/msg01129.php

Thanks for the pointers.

> After poking around in those threads a bit, I think that the current
> threshold of 1MB was something I just made up on the fly (I did note
> that it needed tuning...).  Perhaps something like 10MB would be a
> better default.  Another possibility is to have different minimum
> compression rates for "small" and "large" datums.

After reading these discussions, I guess I still don't understand why
we would treat small and large datums differently.  It seems to me
that you had it about right here:

http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php

# Or maybe it should just be a min_comp_rate and nothing else.
# Compressing a 1GB field to 999MB is probably not very sane either.

I agree with that.  force_input_size doesn't seem like a good idea
because compression can be useless on big datums just as it can be on
little ones - the obvious case being media file formats that are
already internally compressed.  Even if you can squeeze a little more
out, you're using a lot of CPU time for a very small gain in storage
and/or I/O.  Furthermore, on a large object, saving even 1MB is not
very significant if the datum is 1GB in size - so, again, a percentage
seems like the right thing.
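
Just to make that concrete, a percentage-only rule amounts to something
like the sketch below; the function and parameter names are made up for
illustration and aren't the actual pglz interface.

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Sketch of a percentage-only acceptance test.  The names here are
     * invented for illustration; this is not the real pglz code.
     */
    static bool
    compression_is_worthwhile(int64_t rawsize, int64_t compressed_size,
                              int min_comp_rate)
    {
        /*
         * Keep the compressed copy only if it saves at least min_comp_rate
         * percent, no matter whether the datum is 2kB or 1GB.
         */
        return compressed_size <= rawsize - (rawsize * min_comp_rate) / 100;
    }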

On the other hand, even after reading these threads, I still don't see
any need to disable compression for large datums.  I can't think of
any reason why I would want to try compressing a 900kB object but not
a 1MB one.  It makes sense to me to not compress if the object doesn't
compress well, or if some initial segment of the object doesn't
compress well (say, if we can't squeeze 10% out of the first 64kB),
but size by itself doesn't seem significant.
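
To illustrate the "initial segment" idea, I'm imagining something along
these lines, where trial_compress() and the 64kB / 10% figures are only
placeholders, not a worked-out patch:

    #include <stdbool.h>
    #include <stddef.h>

    #define PROBE_SIZE    (64 * 1024)   /* how much of the datum to test */
    #define PROBE_SAVINGS 10            /* percent we must save on the probe */

    /* Placeholder for whatever compressor we end up calling; it returns
     * the compressed length of the probe. */
    extern size_t trial_compress(const char *data, size_t len);

    static bool
    worth_compressing_fully(const char *data, size_t len)
    {
        size_t probe_len = (len < PROBE_SIZE) ? len : PROBE_SIZE;
        size_t compressed = trial_compress(data, probe_len);

        /*
         * If we can't squeeze PROBE_SAVINGS percent out of the first chunk,
         * don't bother compressing the rest of the datum.
         */
        return compressed <= probe_len - (probe_len * PROBE_SAVINGS) / 100;
    }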

To put that another way, if small objects and large objects are to be
treated differently, which one will we try harder to compress and why?
Greg Stark makes an argument that we should try harder when it might
avoid the need for a toast table:

http://archives.postgresql.org/pgsql-hackers/2007-08/msg00087.php

...which has some merit, though clearly it would be a lot better if we
could do it when, and only when, it was actually going to work.  Also,
not compressing very small datums (< 256 bytes) seems smart,
since that could end up producing a lot of extra compression attempts,
most of which will end up saving little or no space.
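
If we did want to fold both of those exceptions in, the size-based part
of the decision could still stay very simple; in this sketch,
toast_threshold and the factor of two are arbitrary stand-ins rather
than real values:

    #include <stdbool.h>
    #include <stddef.h>

    #define MIN_COMPRESS_SIZE 256   /* skip tiny datums entirely */

    /*
     * Sketch of the only size-based checks I think we need.  toast_threshold
     * stands in for the real tuple-size cutoff, and the factor of two is
     * arbitrary, just to make the "try harder" window concrete.
     */
    static bool
    should_attempt_compression(size_t rawsize, size_t toast_threshold,
                               bool *try_harder)
    {
        if (rawsize < MIN_COMPRESS_SIZE)
            return false;           /* not worth the cycles */

        /* Greg's case: work harder when compression might avoid toasting. */
        *try_harder = (rawsize > toast_threshold &&
                       rawsize < 2 * toast_threshold);

        return true;
    }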

Apart from those two cases I don't see any clear motivation for
discriminating on size.

...Robert

