Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Custom compression methods
Msg-id 58471b21-2f8c-9fa9-63ca-4c37883b8307@2ndquadrant.com
In response to Re: [HACKERS] Custom compression methods  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
Responses Re: [HACKERS] Custom compression methods  (Евгений Шишкин <itparanoia@gmail.com>)
List pgsql-hackers

On 11/20/2017 10:44 AM, Ildus Kurbangaliev wrote:
> On Mon, 20 Nov 2017 00:23:23 +0100
> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> 
>> On 11/15/2017 02:13 PM, Robert Haas wrote:
>>> On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev
>>> <i.kurbangaliev@postgrespro.ru> wrote:  
>>>> So in the next version of the patch I can just unlink the options
>>>> from compression methods and dropping compression method will not
>>>> affect already compressed tuples. They still could be
>>>> decompressed.  
>>>
>>> I guess I don't understand how that can work.  I mean, if somebody
>>> removes a compression method - i.e. uninstalls the library - and you
>>> don't have a way to make sure there are no tuples that can only be
>>> uncompressed by that library - then you've broken the database.
>>> Ideally, there should be a way to add a new compression method via
>>> an extension ... and then get rid of it and all dependencies
>>> thereupon. 
>>
>> I share your confusion. Once you do DROP COMPRESSION METHOD, there
>> must be no remaining data compressed with it. But that's what the
>> patch is doing already - it enforces this using dependencies, as
>> usual.
>>
>> Ildus, can you explain what you meant? How could the data still be
>> decompressed after DROP COMPRESSION METHOD, and possibly after
>> removing the .so library?
> 
> The removal of the .so library will break all compressed tuples. I
> don't see a way to avoid it. I meant that DROP COMPRESSION METHOD could
> remove the record from the 'pg_compression' table, but the compressed
> tuple actually needs only the record from 'pg_compression_opt' where
> its options are located. And there is a dependency between the extension
> and the options, so you can't just remove the extension without CASCADE;
> postgres will complain.
> 

I don't think we need to do anything smart here - it should behave just
like dropping a data type, for example. That is, error out if there are
columns using the compression method (without CASCADE), and drop all the
columns (with CASCADE).
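To make the proposed semantics concrete, here is a minimal toy model (not the patch's code, and not PostgreSQL's pg_depend machinery). It assumes a hypothetical catalog that only tracks which columns use which compression method. Without CASCADE the drop is refused while dependent columns exist; with CASCADE those columns are dropped along with the method, the same way DROP TYPE behaves.

```python
class DependencyError(Exception):
    """Raised when a drop is refused because dependent objects exist."""


class Catalog:
    """Hypothetical toy catalog: compression methods and the columns using them."""

    def __init__(self):
        self.methods = set()
        self.columns = {}  # column name -> compression method name

    def create_method(self, name):
        self.methods.add(name)

    def add_column(self, column, method):
        # A column can only reference an existing method (records a dependency).
        if method not in self.methods:
            raise KeyError(f"compression method {method!r} does not exist")
        self.columns[column] = method

    def drop_method(self, name, cascade=False):
        # Find every column that depends on the method being dropped.
        dependents = sorted(c for c, m in self.columns.items() if m == name)
        if dependents and not cascade:
            # RESTRICT behaviour: error out, leave everything untouched.
            raise DependencyError(
                f"cannot drop compression method {name!r}: "
                f"columns {dependents} depend on it")
        # CASCADE behaviour: drop the dependent columns, then the method.
        for column in dependents:
            del self.columns[column]
        self.methods.discard(name)
        return dependents
```

For example, after `create_method("mymethod")` and `add_column("t1.payload", "mymethod")` (both names hypothetical), `drop_method("mymethod")` raises, while `drop_method("mymethod", cascade=True)` removes `t1.payload` as well. Nothing is left behind that could reference a method whose library may already be gone.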

Leaving the pg_compression_opt entries around is not a solution. Not only
is it confusing (I'm not aware of any extension doing something like
that), it's also likely to fail anyway, because the user is likely to
remove the .so file (perhaps not directly, but e.g. by removing the RPM
package providing it).

> Still, it's a problem if the user runs, for example, `SELECT
> <compressed_column> INTO * FROM *`, because postgres will copy the
> compressed tuples as-is, and there will not be any dependency between
> the destination and the options.
> 

This seems like a rather fatal design flaw, though. I'd say we need to
force recompression of the data, in such cases. Otherwise all the
dependency tracking is rather pointless.
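A rough sketch of what "force recompression" could mean, under stated assumptions: each compressed datum carries the id of the options entry it was compressed with (a hypothetical stand-in for a pg_compression_opt row), and each destination column depends on exactly one options id. When a datum arrives whose options the destination does not depend on, it is decompressed and recompressed with the destination's own options, so no stored value ever references options outside the dependency graph. `zlib` at two levels merely stands in for whatever library an extension would provide; the registry and names are illustrative only.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressedDatum:
    options_id: int  # hypothetical id of the options entry used to compress
    payload: bytes


# Hypothetical registry: options id -> (compress, decompress) pair.
# zlib at different levels stands in for extension-provided methods.
CODECS = {
    1: (lambda b: zlib.compress(b, 1), zlib.decompress),
    2: (lambda b: zlib.compress(b, 9), zlib.decompress),
}


@dataclass
class Column:
    options_id: int  # the options this column has a recorded dependency on


def store(value: CompressedDatum, dest: Column) -> CompressedDatum:
    """Store a possibly foreign compressed value into dest.

    If the value was compressed with options that dest does not depend on,
    decompress it and recompress it with dest's own options.
    """
    if value.options_id == dest.options_id:
        return value  # dependency already recorded; copy as-is
    _, decompress = CODECS[value.options_id]
    compress, _ = CODECS[dest.options_id]
    raw = decompress(value.payload)
    return CompressedDatum(dest.options_id, compress(raw))
```

The design choice here is to pay the decompression/recompression cost at copy time rather than to try to record an extra dependency for each copied datum, which is what keeps the dependency tracking meaningful.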

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
