Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Ildus Kurbangaliev
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id 20171102124101.5a28ecab@wp.localdomain
In response to Re: [HACKERS] Custom compression methods  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: [HACKERS] Custom compression methods  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Wed, 1 Nov 2017 17:05:58 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 9/12/17 10:55, Ildus Kurbangaliev wrote:
> >> The patch also includes custom compression method for tsvector
> >> which is used in tests.
> >>
> >> [1]
> >> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com  
> > Attached rebased version of the patch. Added support of pg_dump, the
> > code was simplified, and a separate cache for compression options
> > was added.  
> 
> I would like to see some more examples of how this would be used, so
> we can see how it should all fit together.
> 
> So far, it's not clear to me that we need a compression method as a
> standalone top-level object.  It would make sense, perhaps, to have a
> compression function attached to a type, so a type can provide a
> compression function that is suitable for its specific storage.

In this patch, compression methods apply to the MAIN and EXTENDED
storage strategies, as in the current implementation in Postgres. The
only difference is that instead of just pglz you can specify any other
compression method.

The idea is not to change compression for particular types, but to give
users and extension developers the opportunity to change how the data in
a given attribute is compressed, because they know more about that data
than the database itself does.

> 
> The proposal here is very general: You can use any of the eligible
> compression methods for any attribute.  That seems very complicated to
> manage.  Any attribute could be compressed using either a choice of
> general compression methods or a type-specific compression method, or
> perhaps another type-specific compression method.  That's a lot.  Is
> this about packing certain types better, or trying out different
> compression algorithms, or about changing the TOAST thresholds, and
> so on?

It is about the extensibility of Postgres. For example, if you need to
store a lot of time series data, you can create an extension that stores
arrays of timestamps in a more optimized way, using delta encoding or
something else. I'm not sure that such specialized things should be in
core.
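
To make the delta-encoding idea concrete, here is a rough standalone
sketch. It is not code from the patch, and the function names are
hypothetical. It assumes timestamps are 64-bit microsecond counts, as
in Postgres' integer Timestamp representation, and it ignores overflow
for extreme values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only (hypothetical names, not part of the patch).
 * Delta-encode n timestamps in place: element 0 is kept as is, every
 * later element is replaced by its difference from its predecessor.
 */
void
ts_delta_encode(int64_t *ts, size_t n)
{
    size_t i;

    assert(ts != NULL || n == 0);

    /* Walk backwards so each original predecessor is still intact. */
    for (i = n; i > 1; i--)
        ts[i - 1] -= ts[i - 2];
}

/* Inverse transform: a running sum restores the original values. */
void
ts_delta_decode(int64_t *ts, size_t n)
{
    size_t i;

    assert(ts != NULL || n == 0);

    for (i = 1; i < n; i++)
        ts[i] += ts[i - 1];
}
```

For regularly sampled data the deltas are small and repetitive, so a
variable-length integer encoding or a general-purpose compressor
applied afterwards has much less to store than with the raw values.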

In the case of an array of timestamps it could look like this:

CREATE EXTENSION timeseries; -- some extension that provides a compression method

The extension installs a compression method:

CREATE OR REPLACE FUNCTION timestamps_compression_handler(INTERNAL)
RETURNS COMPRESSION_HANDLER AS 'MODULE_PATHNAME',
'timestamps_compression_handler' LANGUAGE C STRICT;

CREATE COMPRESSION METHOD cm1 HANDLER timestamps_compression_handler;

And the user can specify it in a table definition:

CREATE TABLE t1 (
    time_series_data timestamp[] COMPRESSED cm1
);

I think tying a single method to a type is not a good idea. For one
attribute you could be happy with the builtin pglz; for another you
might need better compression, and so on.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

