Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 3cc48e8f-8d07-b5f9-ff13-dcfd3e0ec169@2ndquadrant.com
In response to Re: [HACKERS] Custom compression methods  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers

On 12/13/2017 05:55 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
> 
>> On 12/13/2017 01:54 AM, Robert Haas wrote:
> 
>>> 3. Compression is only applied to large-ish values.  If you are just
>>> making the data type representation more compact, you probably want to
>>> apply the new representation to all values.  If you are compressing in
>>> the sense that the original data gets smaller but harder to interpret,
>>> then you probably only want to apply the technique where the value is
>>> already pretty wide, and maybe respect the user's configured storage
>>> attributes.  TOAST knows about some of that kind of stuff.
>>
>> Good point. One such parameter that I really miss is compression level.
>> I can imagine tuning it through CREATE COMPRESSION METHOD, but it does
>> not seem quite possible with compression happening in a datatype.
> 
> Hmm, actually isn't that the sort of thing that you would tweak using a
> column-level option instead of a compression method?
>   ALTER TABLE ALTER COLUMN SET (compression_level=123)
> The only thing we need for this is to make tuptoaster.c aware of the
> need to check for a parameter.
> 

Wouldn't that require some universal compression level, shared by all
supported compression algorithms? I don't think there is such a thing.

Defining one should not be extremely difficult, although I'm sure there
will be some cumbersome cases. For example, what if algorithm "a"
supports compression levels 0-10, and algorithm "b" only supports 0-3?

You could define 11 "universal" compression levels and map the four
levels of "b" onto them (somehow), but then everyone has to understand
how that "universal" mapping is defined.

Another issue is that there are algorithms without a compression level
at all (e.g. pglz does not have one, AFAICS), or with a somewhat
different definition of it (lz4 does not have levels, and instead has
"acceleration", which may be an arbitrary positive integer, so not
really compatible with a "universal" compression level).
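
For illustration, this is roughly what the two knobs look like at the
C level (signatures quoted from memory, so take them with a grain of
salt):

    /* zstd: higher compressionLevel = smaller output, slower */
    size_t ZSTD_compress(void *dst, size_t dstCapacity,
                         const void *src, size_t srcSize,
                         int compressionLevel);

    /* lz4: higher acceleration = faster / larger output, and the
     * value is an open-ended positive integer, not a fixed range */
    int LZ4_compress_fast(const char *src, char *dst,
                          int srcSize, int dstCapacity,
                          int acceleration);

Squeezing both of those into a single compression_level integer does
not seem particularly meaningful to me.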

So to me the

    ALTER TABLE ALTER COLUMN SET (compression_level=123)

seems more like an unnecessary hurdle ...
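
If the level (or acceleration, or whatever an algorithm calls it) has
to be configurable, attaching it to the compression method itself seems
more natural to me. Something like this (made-up syntax, just to show
where the option would live, not what the patch implements):

    CREATE COMPRESSION METHOD lz4_fast HANDLER lz4_handler
        WITH (acceleration = 8);

    ALTER TABLE t ALTER COLUMN payload SET COMPRESSION lz4_fast;

That way each method validates its own options, and tuptoaster.c does
not need to understand what they mean.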

>>> I don't think TOAST needs to be entirely transparent for the
>>> datatypes.  We've already dipped our toe in the water by allowing some
>>> operations on "short" varlenas, and there's really nothing to prevent
>>> a given datatype from going further.  The OID problem you mentioned
>>> would presumably be solved by hard-coding the OIDs for any built-in,
>>> privileged compression methods.
>>
>> Stupid question, but what do you mean by "short" varlenas?
> 
> Those are varlenas with 1-byte header rather than the standard 4-byte
> header.
> 

OK, that's what I thought. But that is still pretty transparent to the
data types, no?
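
To spell out what I mean by "transparent" - a typical varlena function
only does something like this (a sketch with a made-up function name;
the macros are the existing ones from postgres.h / fmgr.h):

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(my_type_length);

    Datum
    my_type_length(PG_FUNCTION_ARGS)
    {
        /* the _PP variant leaves a possibly-short header in place ... */
        text       *t = PG_GETARG_TEXT_PP(0);

        /* ... and the _ANY_ macros cope with both header forms */
        PG_RETURN_INT32(VARSIZE_ANY_EXHDR(t));
    }

so whether the on-disk header was the 1-byte or the 4-byte form is
mostly hidden from the datatype code.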

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

