Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) |
| Date | |
| Msg-id | 603c8f070901051839m395e091fu28bc292e49e75919@mail.gmail.com |
| In response to | Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows); Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) |
| List | pgsql-hackers |
> I suggest that before we make any knee-jerk responses, we need to go
> back and reread the prior discussion.
> http://archives.postgresql.org/pgsql-patches/2008-02/msg00053.php
> and that message links to several older threads that were complaining
> about the 8.3 behavior. In particular the notion of an upper limit
> on what we should attempt to compress was discussed in this thread:
> http://archives.postgresql.org/pgsql-general/2007-08/msg01129.php

Thanks for the pointers.

> After poking around in those threads a bit, I think that the current
> threshold of 1MB was something I just made up on the fly (I did note
> that it needed tuning...). Perhaps something like 10MB would be a
> better default. Another possibility is to have different minimum
> compression rates for "small" and "large" datums.

After reading these discussions, I guess I still don't understand why we would treat small and large datums differently. It seems to me that you had it about right here:

http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php

    # Or maybe it should just be a min_comp_rate and nothing else.
    # Compressing a 1GB field to 999MB is probably not very sane either.

I agree with that. force_input_size doesn't seem like a good idea because compression can be useless on big datums just as it can be on little ones - the obvious case being media file formats that are already internally compressed. Even if you can squeeze a little more out, you're using a lot of CPU time for a very small gain in storage and/or I/O. Furthermore, on a large object, saving even 1MB is not very significant if the datum is 1GB in size - so, again, a percentage seems like the right thing.

On the other hand, even after reading these threads, I still don't see any need to disable compression for large datums. I can't think of any reason why I would want to try compressing a 900kB object but not a 1MB one. It makes sense to me not to compress if the object doesn't compress well, or if some initial segment of the object doesn't compress well (say, if we can't squeeze 10% out of the first 64kB), but size by itself doesn't seem significant. To put that another way, if small objects and large objects are to be treated differently, which one will we try harder to compress, and why?

Greg Stark makes an argument that we should try harder when it might avoid the need for a toast table:

http://archives.postgresql.org/pgsql-hackers/2007-08/msg00087.php

...which has some merit, though clearly it would be a lot better if we could do it when, and only when, it was actually going to work.

Also, not compressing very small datums (< 256 bytes) seems smart, since that could otherwise produce a lot of extra compression attempts, most of which will end up saving little or no space.

Apart from those two cases I don't see any clear motivation for discriminating on size.

...Robert
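For concreteness, a minimal C sketch of the kind of heuristic described in the message, assuming a hypothetical `trial_compress()` helper and made-up constants rather than PostgreSQL's actual pglz code: skip very small datums, trial-compress only an initial segment, and require a minimum percentage saved regardless of the datum's absolute size.

```c
/*
 * Hypothetical sketch, not PostgreSQL source: decide whether to attempt
 * full compression of a datum based only on a minimum compression rate
 * and a trial run on an initial segment.
 *
 * trial_compress() stands in for whatever compressor is in use; it is
 * assumed to return the compressed size of the input, or a negative
 * value if compression failed.
 */
#include <stdbool.h>
#include <stddef.h>

#define MIN_INPUT_SIZE      256         /* don't bother below this */
#define TRIAL_SEGMENT_SIZE  (64 * 1024) /* sample the first 64kB */
#define MIN_COMP_RATE       10          /* require >= 10% savings */

/* assumed helper, not a real PostgreSQL function */
extern long trial_compress(const char *data, size_t len);

static bool
worth_compressing(const char *data, size_t len)
{
    size_t  sample_len;
    long    compressed_len;

    /* Very small datums: the attempt costs more than it can save. */
    if (len < MIN_INPUT_SIZE)
        return false;

    /* Trial-compress only an initial segment of large datums. */
    sample_len = (len > TRIAL_SEGMENT_SIZE) ? TRIAL_SEGMENT_SIZE : len;

    compressed_len = trial_compress(data, sample_len);
    if (compressed_len < 0)
        return false;

    /*
     * Require a minimum percentage saved, independent of absolute size:
     * squeezing 1MB out of a 1GB datum is not worth the CPU time.
     */
    return (size_t) compressed_len <=
           sample_len - (sample_len * MIN_COMP_RATE) / 100;
}
```

Note that this is only the decision logic; how the trial segment's compressibility predicts the whole datum's, and how a force_input_size knob would interact with it, is exactly what the thread is debating.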