Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: [HACKERS] Custom compression methods
Msg-id: 5cb1995e-a202-2739-bf46-1b6950e5bdb6@2ndquadrant.com
In response to: Re: [HACKERS] Custom compression methods (Andres Freund <andres@anarazel.de>)
Responses: Re: [HACKERS] Custom compression methods
List: pgsql-hackers
On 12/01/2017 10:52 PM, Andres Freund wrote:
> On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
>> Honestly, if we can give everybody a 4% space reduction by
>> switching to lz4, I think that's totally worth doing -- but let's
>> not make people choose it, let's make it the default going forward,
>> and keep pglz support around so we don't break pg_upgrade
>> compatibility (and so people can continue to choose it if for some
>> reason it works better in their use case). That kind of improvement
>> is nothing special in a specific workload, but TOAST is a pretty
>> general-purpose mechanism. I have become, through a few bitter
>> experiences, a strong believer in the value of trying to reduce our
>> on-disk footprint, and knocking 4% off the size of every TOAST
>> table in the world does not sound worthless to me -- even though
>> context-aware compression can doubtless do a lot better.
>
> +1. It's also a lot faster, and I've seen way, way too many
> workloads with 50%+ time spent in pglz.

TBH the 4% figure is something I mostly made up (I'm fake news!). On the
mailing list archive (which I believe is pretty compressible) I observed
something like a 2.5% size reduction with lz4 compared to pglz, at least
with the compression levels I've used ...

Other algorithms (e.g. zstd) got significantly better compression (~25%)
compared to pglz, but in exchange for longer compression times. I'm sure
we could lower the compression level to make it faster, but that will of
course hurt the compression ratio.

I don't think switching to a different compression algorithm is a way
forward - it was proposed and explored repeatedly in the past, and every
time it failed for a number of reasons, most of which are still valid.

Firstly, it's going to be quite hard (or perhaps impossible) to find an
algorithm that is "universally better" than pglz. Some algorithms do
work better for text documents, some for binary blobs, etc. I don't
think there's a win-win option.

Sure, there are workloads where pglz performs poorly (I've seen such
cases too), but IMHO that's more an argument for the custom compression
method approach. pglz gives you good default compression in most cases,
and you can change it for columns where it matters, and where a
different space/time trade-off makes sense.

Secondly, all the previous attempts ran into legal issues, i.e.
licensing and/or patents. Maybe the situation has changed since then (no
idea, I haven't looked into it), but in the past the "pluggable"
approach was proposed as a way to address this.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
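PS: For anyone who wants to repeat this kind of comparison on their own
data, here is a minimal standalone sketch (my own toy, not part of the
patch) that compresses a file with lz4 and zstd and reports size and
timing. It assumes liblz4 and libzstd are installed (build with
something like "cc -O2 lzbench.c -llz4 -lzstd"); pglz is left out
because it lives inside the backend sources (src/common/pg_lzcompress.c)
and isn't shipped as a standalone library. The file name and the zstd
level are arbitrary choices, not anything the patch prescribes.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include <lz4.h>
#include <zstd.h>

/* monotonic wall-clock time in seconds */
static double
now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int
main(int argc, char **argv)
{
	FILE	   *f;
	long		len;
	char	   *src,
			   *dst;
	double		t0,
				t1;
	int			lz4_len;
	size_t		zstd_len;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s FILE\n", argv[0]);
		return 1;
	}

	/* slurp the whole file into memory */
	f = fopen(argv[1], "rb");
	if (f == NULL)
	{
		perror("fopen");
		return 1;
	}
	fseek(f, 0, SEEK_END);
	len = ftell(f);
	fseek(f, 0, SEEK_SET);
	src = malloc(len);
	if (fread(src, 1, len, f) != (size_t) len)
	{
		perror("fread");
		return 1;
	}
	fclose(f);

	/* lz4 with its default (fast) settings */
	dst = malloc(LZ4_compressBound((int) len));
	t0 = now_sec();
	lz4_len = LZ4_compress_default(src, dst, (int) len,
								   LZ4_compressBound((int) len));
	t1 = now_sec();
	printf("lz4 : %d bytes (%.1f%% of input)  %.3f s\n",
		   lz4_len, 100.0 * lz4_len / len, t1 - t0);
	free(dst);

	/* zstd at level 3 (its default); higher levels trade speed for ratio */
	dst = malloc(ZSTD_compressBound(len));
	t0 = now_sec();
	zstd_len = ZSTD_compress(dst, ZSTD_compressBound(len), src, len, 3);
	t1 = now_sec();
	if (ZSTD_isError(zstd_len))
	{
		fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(zstd_len));
		return 1;
	}
	printf("zstd: %zu bytes (%.1f%% of input)  %.3f s\n",
		   zstd_len, 100.0 * zstd_len / len, t1 - t0);

	free(dst);
	free(src);
	return 0;
}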