[HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Ildus Kurbangaliev
Subject [HACKERS] Custom compression methods
Date
Msg-id 20170907194236.4cefce96@wp.localdomain
Whole thread Raw
Responses Re: [HACKERS] Custom compression methods
Re: [HACKERS] Custom compression methods
List pgsql-hackers
Hello hackers!

I've attached a patch that implements custom compression
methods. This patch is based on Nikita Glukhov's code (which he hasn't
publish in mailing lists) for jsonb compression. This is early but
working version of the patch, and there are still few fixes and features
that should be implemented (like pg_dump support and support of
compression options for types), and it requires more testing. But I'd
like to get some feedback at the current stage first.

There's been a proposal [1] of Alexander Korotkov and some discussion
about custom compression methods before. This is an implementation of
per-datum compression. Syntax is similar to the one in proposal but not
the same.

Syntax:

CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>;
DROP COMPRESSION METHOD <cmname>;

Compression handler is a function that returns a structure containing
compression routines:

- configure - function called when the compression method applied to
  an attribute
- drop - called when the compression method is removed from an attribute
- compress - compress function
- decompress - decompress function

User can create compressed columns with the commands below:

CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>);
ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>; 
ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

Also there is syntax of binding compression methods to types:

ALTER TYPE <type> SET COMPRESSED <cmname>;
ALTER TYPE <type> SET NOT COMPRESSED;

There are two new tables in the catalog, pg_compression and
pg_compression_opt. pg_compression is used as storage of compression
methods, and pg_compression_opt is used to store specific compression
options for particular column.

When user binds a compression method to some column a new record in
pg_compression_opt is created and all further attribute values will
contain compression options Oid while old values will remain unchanged.
And when we alter a compression method for
the attribute it won't change previous record in pg_compression_opt.
Instead it'll create a new one and new values will be stored
with new Oid. That way there is no need of recompression of the old
tuples. And also tuples containing compressed datums can be copied to
other tables so records in pg_compression_opt shouldn't be removed. In
the current patch they can be removed with DROP COMPRESSION METHOD
CASCADE, but after that decompression won't be possible on compressed
tuples. Maybe CASCADE should keep compression options.

I haven't changed the base logic of working with compressed datums. It
means that custom compressed datums behave exactly the same as current
LZ compressed datums, and the logic differs only in toast_compress_datum
and toast_decompress_datum.

This patch doesn't break backward compability and should work seamlessly
with older version of database. I used one of two free bits in
`va_rawsize` from `varattrib_4b->va_compressed` as flag of custom
compressed datums. Also I renamed it to `va_info` since it contains not
only rawsize now.

The patch also includes custom compression method for tsvector which is
used in tests.

[1]
https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: [HACKERS] Secondary index access optimizations
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] Re: issue: record or row variable cannot be part ofmultiple-item INTO list