Re: pluggable compression support - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: pluggable compression support
Msg-id 51BD909B.5070602@2ndQuadrant.com
In response to Re: pluggable compression support  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 06/16/2013 03:50 AM, Robert Haas wrote:
> On Sat, Jun 15, 2013 at 8:11 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> Claiming that the algorithm will be one of only two (current and
>> "whatever algorithm we come up with ") suggests that it is
>> only one bit, which is undoubtedly too little for having a "pluggable"
>> compression API :)
> See http://www.postgresql.org/message-id/20130607143053.GJ29964@alap2.anarazel.de
>
>>> But those identifiers should be *small* (since they are added to all
>>> Datums)
>> if there will be any alignment at all between the datums, then
>> one byte will be lost in the noise ("remember: nobody will need
>> more than 256 compression algorithms")
>> OTOH, if you plan to put these format markers in the compressed
>> stream and change the compression algorithm while reading it, I am lost.
> The above-linked email addresses this point as well: there are bits
> available in the toast pointer.  But there aren't MANY bits without
> increasing the storage footprint, so trying to do something that's
> more general than we really need is going to cost us in terms of
> on-disk footprint.  Is that really worth it?  And if so, why?  I don't
> find the idea of a trade-off between compression/decompression speed
> and compression ratio to be very exciting.  As Andres says, bzip2 is
> impractically slow for ... almost everything.  If there's a good
> BSD-licensed algorithm available, let's just use it and be done.  Our
> current algorithm has lasted us a very long time; 
My scepticism about the current algorithm comes from a brief test
(which may have been flawed) that showed almost no compression
for plain XML fields.

It may very well be that I was doing something stupid and got
wrong results, though, as the functions for asking about TOAST
internals, like "is this field compressed" or "what is the compressed
length of this field", are well hidden (if available at all) in our
documentation.
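A minimal sketch, under stated assumptions, of how a brief test can show "no compression" even when the compressor works: a wrapper may store a value raw when the input is short or the savings are small. zlib stands in for pglz here, and the two thresholds (32-byte minimum input, at least 25% savings) are modeled on pglz's default strategy; `maybe_compress` is a hypothetical name, not a PostgreSQL function.

```python
import zlib
from typing import Tuple

# Hypothetical "compress only if worthwhile" rule. zlib is a stand-in for
# pglz; the thresholds mirror pglz's default strategy but this is not pglz.
MIN_INPUT_SIZE = 32    # assumption: shorter inputs are stored raw
MIN_SAVINGS_PCT = 25   # assumption: keep compressed form only if >= 25% smaller


def maybe_compress(raw: bytes) -> Tuple[bytes, bool]:
    """Return (stored_bytes, was_compressed)."""
    if len(raw) < MIN_INPUT_SIZE:
        return raw, False
    packed = zlib.compress(raw)
    # Reject the compressed form unless it saves at least MIN_SAVINGS_PCT.
    if len(packed) * 100 > len(raw) * (100 - MIN_SAVINGS_PCT):
        return raw, False
    return packed, True


if __name__ == "__main__":
    xml = b"<row><id>1</id><name>alpha</name></row>" * 100
    for value in (xml, b"<a/>", bytes(range(256))):
        stored, compressed = maybe_compress(value)
        print(f"raw={len(value)} stored={len(stored)} compressed={compressed}")
```

In PostgreSQL itself, comparing `pg_column_size(col)` (bytes actually stored, after any compression) with `octet_length(col::text)` (uncompressed length) for the same row is one way to check whether a value was compressed. Also note that, by default with 8 kB pages, TOAST only attempts compression once a row exceeds roughly 2 kB, so short XML values may never be compressed at all.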

> I see no reason to
> think we'll want to change this again for another 10 years, and by
> that time, we may have redesigned the storage format altogether,
> making the limited extensibility of our current TOAST pointer format
> moot.
Agreed.

I just hoped that "pluggable compression support" would
be something that enables people not directly interested in
hacking the core to experiment with compression, and thereby
possibly come up with something that changes your "not
useful in the next 10 years" prediction :)

Seeing that the scope of this patch is actually much narrower,
I have no objection to doing it as Andres proposed.
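A hypothetical sketch of the footprint trade-off Robert describes: a method ID squeezed into spare bits of an existing fixed-width field costs no extra storage, but only a few methods fit. The layout below (a 32-bit word holding a 30-bit size and a 2-bit method ID) and the names `pack`/`unpack` are assumptions for illustration, not the layout from Andres's proposal or PostgreSQL's actual TOAST pointer.

```python
from typing import Tuple

# Hypothetical layout: low 30 bits = uncompressed size, top 2 bits = method ID.
SIZE_BITS = 30
SIZE_MASK = (1 << SIZE_BITS) - 1
METHOD_BITS = 32 - SIZE_BITS      # only 2 bits left over
MAX_METHODS = 1 << METHOD_BITS    # so at most 4 distinct methods


def pack(size: int, method: int) -> int:
    """Combine a size and a method ID into one 32-bit word."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("size does not fit in 30 bits")
    if not 0 <= method < MAX_METHODS:
        raise ValueError("method ID does not fit in the spare bits")
    return (method << SIZE_BITS) | size


def unpack(word: int) -> Tuple[int, int]:
    """Split a packed word back into (size, method)."""
    return word & SIZE_MASK, word >> SIZE_BITS


if __name__ == "__main__":
    word = pack(123456, 3)
    print(hex(word), unpack(word))
```

With 30 bits left for the size, values just under 1 GB (the varlena limit) still fit; a full extra byte per datum would allow 256 IDs, but at the cost of on-disk space.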

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
