Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Ildus Kurbangaliev
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 20180423213928.7c704810@gmail.com
In response to Re: [HACKERS] Custom compression methods  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
On Mon, 23 Apr 2018 19:34:38 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

> Sorry, I am really looking at this patch from a different angle,
> and that is why I have some doubts about the general idea.
> Postgres allows defining custom types, access methods, ...
> But do you know of any production system using special data types
> or custom indexes that are not included in the standard Postgres
> distribution or in popular extensions (like postgis)?
> 
> IMHO end-users do not have the skills and time to create their own
> compression algorithms. And without knowledge of the specifics of a
> particular data set, it is very hard to implement something more
> efficient than a universal compression library.
> But if you think that this is not the right place and time to discuss
> it, I do not insist.
> 
> But in any case, I think it will be useful to provide some more
> examples of custom compression API usage.
> From my point of view the most useful would be integration with zstd.
> But if it is possible to find an example of a data-specific
> compression algorithm which shows better results than universal
> compression, it would be even more impressive.
> 
> 

Ok, let me clear up the purpose of this patch. I understand that you
want to compress everything with it, but for now the idea is just to
provide the basic functionality for compressing TOASTed values with
external compression algorithms. It's unlikely that compression
algorithms like zstd, snappy and others will land in the Postgres core,
but with this patch it's really easy to write an extension and start
compressing values with it right away. And the end-user does not have
to be an expert in compression algorithms to write such an extension.
One of these algorithms could still end up in core if its license
allows it.

I'm not trying to cover all the places in Postgres that would benefit
from compression; this patch is only the first step. It's quite big
already, and every new feature that increases its size lowers the
chances of it being reviewed and committed.

The API is very simple for now and covers what every compression
method can do: take a block of data and return a compressed form of it.
It can be extended with streaming and other features in the future.
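
To make that contract concrete, here is a rough sketch of what the two
per-datum routines of such an extension could look like when wrapping
liblz4. The function names and signatures are made up for illustration
and are not the exact callbacks the patch defines; [2] shows a complete
working extension.

/*
 * Illustrative sketch only: lz4_cm_compress/lz4_cm_decompress and
 * their signatures are assumptions for this example, not the patch's
 * actual API.
 */
#include "postgres.h"
#include <lz4.h>

/* Compress a block of data, returning a palloc'd buffer and its size. */
static char *
lz4_cm_compress(const char *src, int32 srclen, int32 *dstlen)
{
    int32   bound = LZ4_compressBound(srclen);
    char   *dst = palloc(bound);
    int32   len = LZ4_compress_default(src, dst, srclen, bound);

    if (len <= 0)
        elog(ERROR, "lz4 compression failed");

    *dstlen = len;
    return dst;
}

/* Decompress a block; rawlen is the original uncompressed size. */
static char *
lz4_cm_decompress(const char *src, int32 srclen, int32 rawlen)
{
    char   *dst = palloc(rawlen);

    if (LZ4_decompress_safe(src, dst, srclen, rawlen) != rawlen)
        elog(ERROR, "lz4 decompression failed");

    return dst;
}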

Maybe the reason for your confusion is that there is no GUC that
replaces pglz with some custom compression method so that all new
attributes use it; I will think about adding one. There was also a
discussion about specifying the compression method per type, and it was
decided that it is better to do that later in a separate patch.

An example of a specialized compression algorithm is the time series
compression described in [1]. [2] contains an example of an extension
that adds lz4 compression using this patch.

[1] http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
[2] https://github.com/zilder/pg_lz4


-- 
----
Regards,
Ildus Kurbangaliev

