Thread: [HACKERS] Custom compression methods
Hello hackers! I've attached a patch that implements custom compression methods. This patch is based on Nikita Glukhov's code (which he hasn't published on the mailing lists) for jsonb compression. This is an early but working version of the patch, and there are still a few fixes and features that should be implemented (like pg_dump support and support for compression options for types), and it requires more testing. But I'd like to get some feedback at the current stage first. There's been a proposal [1] from Alexander Korotkov and some discussion about custom compression methods before. This is an implementation of per-datum compression. The syntax is similar to the one in the proposal but not the same. Syntax: CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>; DROP COMPRESSION METHOD <cmname>; A compression handler is a function that returns a structure containing compression routines: - configure - called when the compression method is applied to an attribute - drop - called when the compression method is removed from an attribute - compress - compress function - decompress - decompress function Users can create compressed columns with the commands below: CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>); ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>; ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED; There is also syntax for binding compression methods to types: ALTER TYPE <type> SET COMPRESSED <cmname>; ALTER TYPE <type> SET NOT COMPRESSED; There are two new tables in the catalog, pg_compression and pg_compression_opt. pg_compression is used as storage for compression methods, and pg_compression_opt is used to store the specific compression options for a particular column. When a user binds a compression method to some column, a new record is created in pg_compression_opt, and all new attribute values will carry the Oid of those compression options while old values remain unchanged. And when we alter the compression method for the attribute, it won't change the previous record in pg_compression_opt. Instead it'll create a new one, and new values will be stored with the new Oid. That way there is no need to recompress the old tuples. Also, tuples containing compressed datums can be copied to other tables, so records in pg_compression_opt shouldn't be removed. In the current patch they can be removed with DROP COMPRESSION METHOD CASCADE, but after that decompression won't be possible on the compressed tuples. Maybe CASCADE should keep the compression options. I haven't changed the base logic of working with compressed datums. It means that custom compressed datums behave exactly the same as the current pglz-compressed datums, and the logic differs only in toast_compress_datum and toast_decompress_datum. This patch doesn't break backward compatibility and should work seamlessly with older versions of the database. I used one of the two free bits in `va_rawsize` from `varattrib_4b->va_compressed` as a flag for custom compressed datums. I also renamed it to `va_info` since it contains not only the rawsize now. The patch also includes a custom compression method for tsvector which is used in the tests. [1] https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
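[Editor's note] To make the handler contract above concrete, here is a minimal C sketch of what an extension's handler might look like. The struct name CompressionMethodRoutine shows up later in the thread, but the exact fields and calling conventions used here are assumptions for illustration only, not the patch's actual API:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/pg_list.h"

PG_MODULE_MAGIC;

/* Assumed routine table with the four callbacks described above. */
typedef struct CompressionMethodRoutine
{
    void        (*configure) (List *options);  /* method applied to a column */
    void        (*drop) (void);                 /* method removed from a column */
    struct varlena *(*compress) (const struct varlena *value, List *options);
    struct varlena *(*decompress) (const struct varlena *value);
} CompressionMethodRoutine;

static struct varlena *
my_compress(const struct varlena *value, List *options)
{
    /* assumed convention: returning NULL means "store the value uncompressed" */
    return NULL;
}

static struct varlena *
my_decompress(const struct varlena *value)
{
    elog(ERROR, "decompression not implemented in this sketch");
    return NULL;                /* keep the compiler quiet */
}

PG_FUNCTION_INFO_V1(my_compression_handler);

Datum
my_compression_handler(PG_FUNCTION_ARGS)
{
    CompressionMethodRoutine *routine = palloc0(sizeof(CompressionMethodRoutine));

    routine->configure = NULL;  /* nothing to set up per column */
    routine->drop = NULL;
    routine->compress = my_compress;
    routine->decompress = my_decompress;

    PG_RETURN_POINTER(routine);
}

The handler would then be exposed with CREATE FUNCTION ... RETURNS COMPRESSION_HANDLER ... LANGUAGE C and hooked up with CREATE COMPRESSION METHOD <cmname> HANDLER my_compression_handler, matching the syntax above.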
On Thu, 7 Sep 2017 19:42:36 +0300 Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > Hello hackers! > > I've attached a patch that implements custom compression > methods. This patch is based on Nikita Glukhov's code (which he hasn't > publish in mailing lists) for jsonb compression. This is early but > working version of the patch, and there are still few fixes and > features that should be implemented (like pg_dump support and support > of compression options for types), and it requires more testing. But > I'd like to get some feedback at the current stage first. > > There's been a proposal [1] of Alexander Korotkov and some discussion > about custom compression methods before. This is an implementation of > per-datum compression. Syntax is similar to the one in proposal but > not the same. > > Syntax: > > CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>; > DROP COMPRESSION METHOD <cmname>; > > Compression handler is a function that returns a structure containing > compression routines: > > - configure - function called when the compression method applied to > an attribute > - drop - called when the compression method is removed from an > attribute > - compress - compress function > - decompress - decompress function > > User can create compressed columns with the commands below: > > CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>); > ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>; > ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED; > > Also there is syntax of binding compression methods to types: > > ALTER TYPE <type> SET COMPRESSED <cmname>; > ALTER TYPE <type> SET NOT COMPRESSED; > > There are two new tables in the catalog, pg_compression and > pg_compression_opt. pg_compression is used as storage of compression > methods, and pg_compression_opt is used to store specific compression > options for particular column. > > When user binds a compression method to some column a new record in > pg_compression_opt is created and all further attribute values will > contain compression options Oid while old values will remain > unchanged. And when we alter a compression method for > the attribute it won't change previous record in pg_compression_opt. > Instead it'll create a new one and new values will be stored > with new Oid. That way there is no need of recompression of the old > tuples. And also tuples containing compressed datums can be copied to > other tables so records in pg_compression_opt shouldn't be removed. In > the current patch they can be removed with DROP COMPRESSION METHOD > CASCADE, but after that decompression won't be possible on compressed > tuples. Maybe CASCADE should keep compression options. > > I haven't changed the base logic of working with compressed datums. It > means that custom compressed datums behave exactly the same as current > LZ compressed datums, and the logic differs only in > toast_compress_datum and toast_decompress_datum. > > This patch doesn't break backward compability and should work > seamlessly with older version of database. I used one of two free > bits in `va_rawsize` from `varattrib_4b->va_compressed` as flag of > custom compressed datums. Also I renamed it to `va_info` since it > contains not only rawsize now. > > The patch also includes custom compression method for tsvector which > is used in tests. > > [1] > https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com Attached rebased version of the patch. 
Added support for pg_dump; the code was simplified, and a separate cache for compression options was added. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 9/12/17 10:55, Ildus Kurbangaliev wrote: >> The patch also includes custom compression method for tsvector which >> is used in tests. >> >> [1] >> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com > Attached rebased version of the patch. Added support of pg_dump, the > code was simplified, and a separate cache for compression options was > added. I would like to see some more examples of how this would be used, so we can see how it should all fit together. So far, it's not clear to me that we need a compression method as a standalone top-level object. It would make sense, perhaps, to have a compression function attached to a type, so a type can provide a compression function that is suitable for its specific storage. The proposal here is very general: You can use any of the eligible compression methods for any attribute. That seems very complicated to manage. Any attribute could be compressed using either a choice of general compression methods or a type-specific compression method, or perhaps another type-specific compression method. That's a lot. Is this about packing certain types better, or trying out different compression algorithms, or about changing the TOAST thresholds, and so on? Ideally, we would like something that just works, with minimal configuration and nudging. Let's see a list of problems to be solved and then we can discuss what the right set of primitives might be to address them. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, 1 Nov 2017 17:05:58 -0400 Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 9/12/17 10:55, Ildus Kurbangaliev wrote: > >> The patch also includes custom compression method for tsvector > >> which is used in tests. > >> > >> [1] > >> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com > > Attached rebased version of the patch. Added support of pg_dump, the > > code was simplified, and a separate cache for compression options > > was added. > > I would like to see some more examples of how this would be used, so > we can see how it should all fit together. > > So far, it's not clear to me that we need a compression method as a > standalone top-level object. It would make sense, perhaps, to have a > compression function attached to a type, so a type can provide a > compression function that is suitable for its specific storage. In this patch compression methods are applicable to the MAIN and EXTENDED storage strategies, as in the current implementation in Postgres. Just instead of only the built-in pglz you can specify any other compression method. The idea is not to change compression for particular types, but to give users and extension developers the opportunity to change how the data in an attribute is compressed, because they know more about it than the database itself. > > The proposal here is very general: You can use any of the eligible > compression methods for any attribute. That seems very complicated to > manage. Any attribute could be compressed using either a choice of > general compression methods or a type-specific compression method, or > perhaps another type-specific compression method. That's a lot. Is > this about packing certain types better, or trying out different > compression algorithms, or about changing the TOAST thresholds, and > so on? It is about the extensibility of Postgres: for example, if you need to store a lot of time series data, you can create an extension that stores arrays of timestamps in a more optimized way, using delta encoding or something else. I'm not sure that such specialized things should be in core. In the case of an array of timestamps it could look like this: CREATE EXTENSION timeseries; -- some extension that provides a compression method The extension installs a compression method: CREATE OR REPLACE FUNCTION timestamps_compression_handler(INTERNAL) RETURNS COMPRESSION_HANDLER AS 'MODULE_PATHNAME', 'timestamps_compression_handler' LANGUAGE C STRICT; CREATE COMPRESSION METHOD cm1 HANDLER timestamps_compression_handler; And the user can specify it in his table: CREATE TABLE t1 (time_series_data timestamp[] COMPRESSED cm1); I think generalizing a method to a type is not a good idea. For some attributes you could be happy with the built-in pglz, for others you might need better compression, and so on. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
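[Editor's note] As a rough illustration of the kind of type-aware encoding such a timeseries extension could implement (this is not part of the patch, and a real compress callback would still need to wrap the result in the varlena framing the handler expects), delta-encoding the int64 values of a timestamp array turns large absolute values into small deltas that pack far better:

#include "postgres.h"

/*
 * Illustrative only: delta-encode/decode int64 timestamps.  Consecutive
 * timestamps are usually close together, so the deltas are small values
 * that a varint or byte-oriented packer can store much more compactly.
 */
static void
delta_encode(const int64 *src, int64 *dst, int n)
{
    int64   prev = 0;
    int     i;

    for (i = 0; i < n; i++)
    {
        dst[i] = src[i] - prev;     /* difference from the previous value */
        prev = src[i];
    }
}

static void
delta_decode(const int64 *src, int64 *dst, int n)
{
    int64   prev = 0;
    int     i;

    for (i = 0; i < n; i++)
    {
        prev += src[i];             /* running sum restores the original */
        dst[i] = prev;
    }
}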
On Tue, 12 Sep 2017 17:55:05 +0300 Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > > Attached rebased version of the patch. Added support of pg_dump, the > code was simplified, and a separate cache for compression options was > added. > Attached is version 3 of the patch. Rebased to the current master, removed the ALTER TYPE .. SET COMPRESSED syntax, and fixed a bug in the compression options cache. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 2 November 2017 at 17:41, Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > In this patch compression methods are applicable to the MAIN and EXTENDED > storage strategies, as in the current implementation in Postgres. Just instead > of only the built-in pglz you can specify any other compression method. We've had this discussion before. Please read the "pluggable compression support" thread. See you in a few days ;) sorry, it's kinda long. https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@alap2.anarazel.de IIRC there were some concerns about what happened with pg_upgrade, with consuming precious toast bits, and a few other things. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Nov 2, 2017 at 6:02 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 2 November 2017 at 17:41, Ildus Kurbangaliev > <i.kurbangaliev@postgrespro.ru> wrote: > >> In this patch compression methods are applicable to the MAIN and EXTENDED >> storage strategies, as in the current implementation in Postgres. Just instead >> of only the built-in pglz you can specify any other compression method. > > We've had this discussion before. > > Please read the "pluggable compression support" thread. See you in a > few days ;) sorry, it's kinda long. > > https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@alap2.anarazel.de > The proposed patch provides "pluggable" compression and lets users decide on their own which algorithm to use. The Postgres core isn't responsible for any patent problems. > IIRC there were some concerns about what happened with pg_upgrade, > with consuming precious toast bits, and a few other things. Yes, pg_upgrade may be a problem. > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov <obartunov@gmail.com> wrote: >> IIRC there were some concerns about what happened with pg_upgrade, >> with consuming precious toast bits, and a few other things. > > yes, pg_upgrade may be a problem. A basic problem here is that, as proposed, DROP COMPRESSION METHOD may break your database irretrievably. If there's no data compressed using the compression method you dropped, everything is cool - otherwise everything is broken and there's no way to recover. The only obvious alternative is to disallow DROP altogether (or make it not really DROP). Both of those alternatives sound fairly unpleasant to me, but I'm not exactly sure what to recommend in terms of how to make it better. Ideally anything we expose as an SQL command should have a DROP command that undoes whatever CREATE did and leaves the database in an intact state, but that seems hard to achieve in this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
> If there's no data compressed > using the compression method you dropped, everything is cool - > otherwise everything is broken and there's no way to recover. > The only obvious alternative is to disallow DROP altogether (or make it > not really DROP). Wouldn't whatever was using the compression method have something marking which method was used? If so, couldn't we just scan to see if there is any data using it, and if so disallow the drop, or possibly offer an option to allow the drop and rewrite the table either uncompressed or with the default compression method? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes: > A basic problem here is that, as proposed, DROP COMPRESSION METHOD may > break your database irretrievably. If there's no data compressed > using the compression method you dropped, everything is cool - > otherwise everything is broken and there's no way to recover. The > only obvious alternative is to disallow DROP altogether (or make it > not really DROP). > Both of those alternatives sound fairly unpleasant to me, but I'm not > exactly sure what to recommend in terms of how to make it better. > Ideally anything we expose as an SQL command should have a DROP > command that undoes whatever CREATE did and leaves the database in an > intact state, but that seems hard to achieve in this case. If the use of a compression method is tied to specific data types and/or columns, then each of those could have a dependency on the compression method, forcing a type or column drop if you did DROP COMPRESSION METHOD. That would leave no reachable data using the removed compression method. So that part doesn't seem unworkable on its face. IIRC, the bigger concerns in the last discussion had to do with replication, ie, can downstream servers make sense of the data. Maybe that's not any worse than the issues you get with non-core index AMs, but I'm not sure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
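[Editor's note] For illustration, with the dependency-based behavior Tom Lane describes, a session might look like the following. The COMPRESSED syntax comes from the patch, the handler name is made up, and the error text is hypothetical, merely modeled on how other DROP commands report dependent objects:

CREATE COMPRESSION METHOD cm_lz4 HANDLER lz4_compression_handler;
CREATE TABLE t (payload text COMPRESSED cm_lz4);

DROP COMPRESSION METHOD cm_lz4;
-- ERROR:  cannot drop compression method cm_lz4 because other objects depend on it
-- DETAIL:  column payload of table t depends on compression method cm_lz4
-- HINT:  Use DROP ... CASCADE to drop the dependent objects too.

DROP COMPRESSION METHOD cm_lz4 CASCADE;
-- drops column t.payload as well, so no unreadable compressed data is left behind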
On Thu, 2 Nov 2017 23:02:34 +0800 Craig Ringer <craig@2ndquadrant.com> wrote: > On 2 November 2017 at 17:41, Ildus Kurbangaliev > <i.kurbangaliev@postgrespro.ru> wrote: > > > In this patch compression methods are applicable to the MAIN and EXTENDED > > storage strategies, as in the current implementation in Postgres. Just > > instead of only the built-in pglz you can specify any other compression method. > > We've had this discussion before. > > Please read the "pluggable compression support" thread. See you in a > few days ;) sorry, it's kinda long. > > https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.GA12425@alap2.anarazel.de > > IIRC there were some concerns about what happened with pg_upgrade, > with consuming precious toast bits, and a few other things. > Thank you for the link, I didn't see that thread when I looked over the mailing lists. I read it briefly, and I can address a few things related to my patch. Most of the concerns were about legal issues. Actually that was the reason I did not include any new compression algorithms in my patch. Unlike that patch, mine only provides the syntax and is just a way to let users plug in their own compression algorithms and deal with any legal issues themselves. I use only one unused bit in the header (there's still one free ;), which is enough to determine whether the data is custom compressed or not. I did find out that pg_upgrade doesn't work properly with my patch; I will send a fix for it soon. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, 2 Nov 2017 15:28:36 +0300 Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > On Tue, 12 Sep 2017 17:55:05 +0300 > Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > > > > > Attached rebased version of the patch. Added support of pg_dump, the > > code was simplified, and a separate cache for compression options > > was added. > > > > Attached version 3 of the patch. Rebased to the current master, > removed ALTER TYPE .. SET COMPRESSED syntax, fixed bug in compression > options cache. > Attached is version 4 of the patch. Fixed pg_upgrade and a few other bugs. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
On Sun, 5 Nov 2017 17:34:23 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov <obartunov@gmail.com> > wrote: > >> IIRC there were some concerns about what happened with pg_upgrade, > >> with consuming precious toast bits, and a few other things. > > > > yes, pg_upgrade may be a problem. > > A basic problem here is that, as proposed, DROP COMPRESSION METHOD may > break your database irretrievably. If there's no data compressed > using the compression method you dropped, everything is cool - > otherwise everything is broken and there's no way to recover. The > only obvious alternative is to disallow DROP altogether (or make it > not really DROP). In the patch I use a separate table for compression options (because each attribute can have additional options for compression). So basically a compressed attribute is linked to its compression options, not to the compression method, and the method can be safely dropped. So in the next version of the patch I can just unlink the options from compression methods, and dropping a compression method will not affect already compressed tuples. They can still be decompressed. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > So in the next version of the patch I can just unlink the options from > compression methods and dropping compression method will not affect > already compressed tuples. They still could be decompressed. I guess I don't understand how that can work. I mean, if somebody removes a compression method - i.e. uninstalls the library - and you don't have a way to make sure there are no tuples that can only be uncompressed by that library - then you've broken the database. Ideally, there should be a way to add a new compression method via an extension ... and then get rid of it and all dependencies thereupon. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Ildus, On 14.11.2017 16:23, Ildus Kurbangaliev wrote: > On Thu, 2 Nov 2017 15:28:36 +0300 Ildus Kurbangaliev > <i.kurbangaliev@postgrespro.ru> wrote: > >> On Tue, 12 Sep 2017 17:55:05 +0300 Ildus Kurbangaliev >> <i.kurbangaliev@postgrespro.ru> wrote: >> >>> >>> Attached rebased version of the patch. Added support of pg_dump, >>> the code was simplified, and a separate cache for compression >>> options was added. >>> >> >> Attached version 3 of the patch. Rebased to the current master, >> removed ALTER TYPE .. SET COMPRESSED syntax, fixed bug in >> compression options cache. >> > > Attached version 4 of the patch. Fixed pg_upgrade and few other > bugs. > I've started to review your code. And even though it's fine overall, I have a few questions and comments (aside from the DROP COMPRESSION METHOD discussion). 1. I'm not sure about the proposed syntax for the ALTER TABLE command: >> ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH >> (<options>); ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED; ISTM it is more common for Postgres to use syntax like SET/DROP for column options (SET/DROP NOT NULL, DEFAULT etc). My suggestion would be: ALTER TABLE t ALTER COLUMN a SET COMPRESSED USING <compression_method> WITH (<options>); ALTER TABLE t ALTER COLUMN a DROP COMPRESSED; (keyword USING here is similar to "CREATE INDEX ... USING <method>" syntax) 2. The way you changed DefineRelation() implies that the caller is responsible for the creation of compression options. Probably it would be better to create them within DefineRelation(). 3. A few minor issues which look like obsolete code: Function freeRelOptions() is defined but never used. Function getBaseTypeTuple() has been extracted from getBaseTypeAndTypmod() but never used separately. In toast_flatten_tuple_to_datum() there is an untoasted_value variable which is only used for a meaningless assignment. (Should I send a patch for these kinds of issues?) -- Ildar Musin i.musin@postgrespro.ru
Hi, On 11/14/2017 02:23 PM, Ildus Kurbangaliev wrote: > > ... > > Attached version 4 of the patch. Fixed pg_upgrade and few other bugs. > I did a review of this today, and I think there are some things that need improvement / fixing. Firstly, some basic comments from just eye-balling the diff, then some bugs I discovered after writing an extension adding lz4. 1) formatRelOptions/freeRelOptions are no longer needed (I see Ildar already pointed that out) 2) There's unnecessary whitespace (extra newlines) in a couple of places, which is needlessly increasing the size of the patch. Small difference, but annoying. 3) tuptoaster.c Why do you change 'info' from int32 to uint32? Seems unnecessary. Adding a new 'att' variable in toast_insert_or_update is confusing, as there already is 'att' in the very next loop. Technically it's correct, but I'd bet it'll lead to some WTF?! moments later. I propose to just use TupleDescAttr(tupleDesc,i) in the two places where it matters, around line 808. There are no comments for init_compression_options_htab and get_compression_options_info, so that needs to be fixed. Moreover, the names are confusing because what we really get is not just 'options' but the compression routines too. 4) gen_db_file_maps probably shouldn't do the fprints, right? 5) not sure why you modify src/tools/pgindent/exclude_file_patterns 6) I'm rather confused by AttributeCompression vs. ColumnCompression. I mean, attribute==column, right? Of course, one is for data from the parser, the other one is for internal info. But can we make the naming clearer? 7) The docs in general are somewhat unsatisfactory, TBH. For example the ColumnCompression has no comments, unlike everything else in parsenodes. Similarly for the SGML docs - I suggest expanding them to resemble the FDW docs (https://www.postgresql.org/docs/10/static/fdwhandler.html) which also follow the handler/routines pattern. 8) One of the unclear things is why we even need a 'drop' routine. It seems that if it's defined, DropAttributeCompression does something. But what should it do? I suppose dropping the options should be done using dependencies (just like we drop columns in this case). BTW why does DropAttributeCompression mess with att->attisdropped in this way? That seems a bit odd. 9) configure routines that only check if (options != NIL) and then error out (like tsvector_configure) seem a bit unnecessary. Just allow it to be NULL in CompressionMethodRoutine, and throw an error if options is not NIL for such a compression method. 10) toast_compress_datum still does this: if (!ac && (valsize < PGLZ_strategy_default->min_input_size || valsize > PGLZ_strategy_default->max_input_size)) which seems rather pglz-specific (the naming is a hint). Why shouldn't this be specific to the compression method, exposed either as min/max constants, or wrapped in another routine - size_is_valid() or something like that? 11) The comments in toast_compress_datum probably need updating, as they still reference pglz specifically. I guess the new compression methods do matter too. 12) get_compression_options_info organizes the compression info into a hash table by OID. The hash table implementation assumes the hash key is at the beginning of the entry, but AttributeCompression is defined like this: typedef struct { CompressionMethodRoutine *routine; List *options; Oid cmoptoid; } AttributeCompression; Which means get_compression_options_info is busted, will never look up anything, and the hash table will grow by adding more and more entries into the same bucket. 
Of course, this has extremely negative impact on performance (pretty much arbitrarily bad, depending on how many entries you've already added to the hash table). Moving the OID to the beginning of the struct fixes the issue. 13) When writing the experimental extension, I was extremely confused about the regular varlena headers, custom compression headers, etc. In the end I stole the code from tsvector.c and whacked it a bit until it worked, but I wouldn't dare to claim I understand how it works. This needs to be documented somewhere. For example postgres.h has a bunch of paragraphs about varlena headers, so perhaps it should be there? I see the patch tweaks some of the constants, but does not update the comment at all. Perhaps it would be useful to provide some additional macros making access to custom-compressed varlena values easier. Or perhaps the VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already support that? This part is not very clear to me. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
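[Editor's note] For reference, the fix for item 12 is to make the Oid the first field, since PostgreSQL's dynahash locates entries by a key stored at the start of each entry. A sketch of the corrected layout and a matching hash_create() call follows; the function body is illustrative, not the patch's actual code:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/hsearch.h"

typedef struct CompressionMethodRoutine CompressionMethodRoutine;  /* from the patch */

/* The hash key must come first so hash_search() can find entries by Oid. */
typedef struct
{
    Oid         cmoptoid;       /* hash key */
    CompressionMethodRoutine *routine;
    List       *options;
} AttributeCompression;

static HTAB *compression_options_htab = NULL;

/* Sketch of the cache setup; sizes and flags follow the usual dynahash pattern. */
static void
init_compression_options_htab(void)
{
    HASHCTL     ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(AttributeCompression);

    compression_options_htab = hash_create("compression options cache",
                                           64, &ctl,
                                           HASH_ELEM | HASH_BLOBS);
}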
On 11/15/2017 02:13 PM, Robert Haas wrote: > On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev > <i.kurbangaliev@postgrespro.ru> wrote: >> So in the next version of the patch I can just unlink the options from >> compression methods and dropping compression method will not affect >> already compressed tuples. They still could be decompressed. > > I guess I don't understand how that can work. I mean, if somebody > removes a compression method - i.e. uninstalls the library - and you > don't have a way to make sure there are no tuples that can only be > uncompressed by that library - then you've broken the database. > Ideally, there should be a way to add a new compression method via an > extension ... and then get rid of it and all dependencies thereupon. > I share your confusion. Once you do DROP COMPRESSION METHOD, there must be no remaining data compressed with it. But that's what the patch is doing already - it enforces this using dependencies, as usual. Ildus, can you explain what you meant? How could the data still be decompressed after DROP COMPRESSION METHOD, and possibly after removing the .so library? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 20 Nov 2017 00:23:23 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 11/15/2017 02:13 PM, Robert Haas wrote: > > On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev > > <i.kurbangaliev@postgrespro.ru> wrote: > >> So in the next version of the patch I can just unlink the options > >> from compression methods and dropping compression method will not > >> affect already compressed tuples. They still could be > >> decompressed. > > > > I guess I don't understand how that can work. I mean, if somebody > > removes a compression method - i.e. uninstalls the library - and you > > don't have a way to make sure there are no tuples that can only be > > uncompressed by that library - then you've broken the database. > > Ideally, there should be a way to add a new compression method via > > an extension ... and then get rid of it and all dependencies > > thereupon. > > I share your confusion. Once you do DROP COMPRESSION METHOD, there > must be no remaining data compressed with it. But that's what the > patch is doing already - it enforces this using dependencies, as > usual. > > Ildus, can you explain what you meant? How could the data still be > decompressed after DROP COMPRESSION METHOD, and possibly after > removing the .so library? The removal of the .so library will break all compressed tuples; I don't see a way to avoid that. I meant that DROP COMPRESSION METHOD could remove the record from the 'pg_compression' table, but the compressed tuple actually needs only a record from 'pg_compression_opt', where its options are located. And there is a dependency between an extension and the options, so you can't just remove the extension without CASCADE; postgres will complain. Still, it's a problem if the user used for example `SELECT <compressed_column> INTO * FROM *`, because postgres will copy the compressed tuples and there will not be any dependencies between the destination and the options. Also, thank you for the review. I will look into it today. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 11/20/2017 10:44 AM, Ildus Kurbangaliev wrote: > On Mon, 20 Nov 2017 00:23:23 +0100 > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> On 11/15/2017 02:13 PM, Robert Haas wrote: >>> On Wed, Nov 15, 2017 at 4:09 AM, Ildus Kurbangaliev >>> <i.kurbangaliev@postgrespro.ru> wrote: >>>> So in the next version of the patch I can just unlink the options >>>> from compression methods and dropping compression method will not >>>> affect already compressed tuples. They still could be >>>> decompressed. >>> >>> I guess I don't understand how that can work. I mean, if somebody >>> removes a compression method - i.e. uninstalls the library - and you >>> don't have a way to make sure there are no tuples that can only be >>> uncompressed by that library - then you've broken the database. >>> Ideally, there should be a way to add a new compression method via >>> an extension ... and then get rid of it and all dependencies >>> thereupon. >> >> I share your confusion. Once you do DROP COMPRESSION METHOD, there >> must be no remaining data compressed with it. But that's what the >> patch is doing already - it enforces this using dependencies, as >> usual. >> >> Ildus, can you explain what you meant? How could the data still be >> decompressed after DROP COMPRESSION METHOD, and possibly after >> removing the .so library? > > The removal of the .so library will break all compressed tuples. I > don't see a way to avoid it. I meant that DROP COMPRESSION METHOD could > remove the record from 'pg_compression' table, but actually the > compressed tuple needs only a record from 'pg_compression_opt' where > its options are located. And there is dependency between an extension > and the options so you can't just remove the extension without CASCADE, > postgres will complain. > I don't think we need to do anything smart here - it should behave just like dropping a data type, for example. That is, error out if there are columns using the compression method (without CASCADE), and drop all the columns (with CASCADE). Leaving the pg_compression_opt entries around is not a solution. Not only is it confusing, it also doesn't keep the data readable, because the user is likely to remove the .so file (perhaps not directly, but e.g. by removing the rpm package providing it). > Still it's a problem if the user used for example `SELECT > <compressed_column> INTO * FROM *` because postgres will copy compressed > tuples, and there will not be any dependencies between destination and > the options. > This seems like a rather fatal design flaw, though. I'd say we need to force recompression of the data in such cases. Otherwise all the dependency tracking is rather pointless. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Nov 20, 2017, at 18:18, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I don't think we need to do anything smart here - it should behave just like dropping a data type, for example. That is, error out if there are columns using the compression method (without CASCADE), and drop all the columns (with CASCADE). What if, instead of dropping the column, we leave the data uncompressed?
On 11/20/2017 04:21 PM, Евгений Шишкин wrote: > > >> On Nov 20, 2017, at 18:18, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> >> I don't think we need to do anything smart here - it should behave just >> like dropping a data type, for example. That is, error out if there are >> columns using the compression method (without CASCADE), and drop all the >> columns (with CASCADE). > > What about instead of dropping column we leave data uncompressed? > That requires you to go through the data and rewrite the whole table. And I'm not aware of a DROP command doing that; instead they just drop the dependent objects (e.g. DROP TYPE, ...). So per PLOS the DROP COMPRESSION METHOD command should do that too. But I'm wondering if ALTER COLUMN ... SET NOT COMPRESSED should do that (currently it only disables compression for new data). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Nov 20, 2017, at 18:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> >> What about instead of dropping column we leave data uncompressed? >> > > That requires you to go through the data and rewrite the whole table. > And I'm not aware of a DROP command doing that, instead they just drop > the dependent objects (e.g. DROP TYPE, ...). So per PLOS the DROP > COMPRESSION METHOD command should do that too. Well, there is not much you can do with DROP TYPE. But I'd argue that compression is different. We do not drop data in the case of DROP STATISTICS or DROP INDEX. At least there should be a way to easily alter the compression method then.
On Mon, 20 Nov 2017 16:29:11 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 11/20/2017 04:21 PM, Евгений Шишкин wrote: > > > > > >> On Nov 20, 2017, at 18:18, Tomas Vondra > >> <tomas.vondra@2ndquadrant.com> wrote: > >> > >> > >> I don't think we need to do anything smart here - it should behave > >> just like dropping a data type, for example. That is, error out if > >> there are columns using the compression method (without CASCADE), > >> and drop all the columns (with CASCADE). > > > > What about instead of dropping column we leave data uncompressed? > > > > That requires you to go through the data and rewrite the whole table. > And I'm not aware of a DROP command doing that, instead they just drop > the dependent objects (e.g. DROP TYPE, ...). So per PLOS the DROP > COMPRESSION METHOD command should do that too. > > But I'm wondering if ALTER COLUMN ... SET NOT COMPRESSED should do > that (currently it only disables compression for new data). If the table is big, decompression could take an eternity. That's why I decided only to disable it; the data can still be decompressed using the compression options. My idea was to keep the compression options forever, since there will not be many of them in one database. Still, that requires that the extension is not removed. I will try to find a way to recompress the data first when it moves to another table. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 11/20/2017 04:43 PM, Евгений Шишкин wrote: > > >> On Nov 20, 2017, at 18:29, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >>> >>> What about instead of dropping column we leave data uncompressed? >>> >> >> That requires you to go through the data and rewrite the whole table. >> And I'm not aware of a DROP command doing that, instead they just drop >> the dependent objects (e.g. DROP TYPE, ...). So per PLOS the DROP >> COMPRESSION METHOD command should do that too. > > Well, there is no much you can do with DROP TYPE. But i'd argue that compression > is different. We do not drop data in case of DROP STATISTICS or DROP INDEX. > But those DROP commands do not 'invalidate' data in the heap, so there's no reason to drop the columns. > At least there should be a way to easily alter compression method then. > +1 regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 20 Nov 2017 00:04:53 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I did a review of this today, and I think there are some things that > need improvement / fixing. > > Firstly, some basic comments from just eye-balling the diff, then some > bugs I discovered after writing an extension adding lz4. > > 1) formatRelOptions/freeRelOptions are no longer needed (I see Ildar > already pointed that out) I removed freeRelOptions, but formatRelOptions is used in another place. > > 2) There's unnecessary whitespace (extra newlines) in a couple of > places, which is needlessly increasing the size of the patch. Small > difference, but annoying. Cleaned up. > > 3) tuptoaster.c > > Why do you change 'info' from int32 to uint32? Seems unnecessary. That's because I use the highest bit, which makes the number negative for int32. I use right shifting to get that bit, and a right shift on a negative value gives a negative value too. > > Adding new 'att' variable in toast_insert_or_update is confusing, as > there already is 'att' in the very next loop. Technically it's > correct, but I'd bet it'll lead to some WTF?! moments later. I > propose to just use TupleDescAttr(tupleDesc,i) on the two places > where it matters, around line 808. > > There are no comments for init_compression_options_htab and > get_compression_options_info, so that needs to be fixed. Moreover, the > names are confusing because what we really get is not just 'options' > but the compression routines too. Removed the extra 'att', and added comments. > > 4) gen_db_file_maps probably shouldn't do the fprints, right? > > 5) not sure why you modify src/tools/pgindent/exclude_file_patterns My bad, removed these lines. > > 6) I'm rather confused by AttributeCompression vs. ColumnCompression. > I mean, attribute==column, right? Of course, one is for data from > parser, the other one is for internal info. But can we make the > naming clearer? For now I have renamed AttributeCompression to CompressionOptions; not sure that's a good name, but at least it's less confusing. > > 7) The docs in general are somewhat unsatisfactory, TBH. For example > the ColumnCompression has no comments, unlike everything else in > parsenodes. Similarly for the SGML docs - I suggest to expand them to > resemble FDW docs > (https://www.postgresql.org/docs/10/static/fdwhandler.html) which > also follows the handler/routines pattern. I've added more comments. I think I'll add more documentation if the committers approve the current syntax. > > 8) One of the unclear things is why we even need a 'drop' routine. It > seems that if it's defined DropAttributeCompression does something. > But what should it do? I suppose dropping the options should be done > using dependencies (just like we drop columns in this case). > > BTW why does DropAttributeCompression mess with att->attisdropped in > this way? That seems a bit odd. The 'drop' routine could be useful. An extension could do something related to the attribute, like removing extra tables or something else. The compression options will not be removed after unlinking the compression method from a column, because there may still be compressed data stored in that column. The 'attisdropped' part has been removed. > > 9) configure routines that only check if (options != NIL) and then > error out (like tsvector_configure) seem a bit unnecessary. Just > allow it to be NULL in CompressionMethodRoutine, and throw an error > if options is not NIL for such compression method. Good idea, done. 
> > 10) toast_compress_datum still does this: > > if (!ac && (valsize < PGLZ_strategy_default->min_input_size || > valsize > PGLZ_strategy_default->max_input_size)) > > which seems rather pglz-specific (the naming is a hint). Why shouldn't > this be specific to compression, exposed either as min/max constants, > or wrapped in another routine - size_is_valid() or something like > that? I agree, moved it to the next block, which is related to pglz. > > 11) The comments in toast_compress_datum probably need updating, as it > still references to pglz specifically. I guess the new compression > methods do matter too. Done. > > 12) get_compression_options_info organizes the compression info into a > hash table by OID. The hash table implementation assumes the hash key > is at the beginning of the entry, but AttributeCompression is defined > like this: > > typedef struct > { > CompressionMethodRoutine *routine; > List *options; > Oid cmoptoid; > } AttributeCompression; > > Which means get_compression_options_info is busted, will never lookup > anything, and the hash table will grow by adding more and more entries > into the same bucket. Of course, this has extremely negative impact on > performance (pretty much arbitrarily bad, depending on how many > entries you've already added to the hash table). > > Moving the OID to the beginning of the struct fixes the issue. Yeah, I fixed it before, but somehow managed not to include it in the patch. > > 13) When writing the experimental extension, I was extremely confused > about the regular varlena headers, custom compression headers, etc. In > the end I stole the code from tsvector.c and whacked it a bit until it > worked, but I wouldn't dare to claim I understand how it works. > > This needs to be documented somewhere. For example postgres.h has a > bunch of paragraphs about varlena headers, so perhaps it should be > there? I see the patch tweaks some of the constants, but does not > update the comment at all. This is a good point; I'm not sure what this documentation should look like. I've just assumed that people should have a deep understanding of varlenas if they're going to compress them. But right now it's easy to make a mistake there. Maybe I should add some functions that help construct varlenas with different headers. I like the way jsonb is constructed: it uses StringInfo and there are a few helper functions (reserveFromBuffer, appendToBuffer and others). Maybe they shouldn't be static. > > Perhaps it would be useful to provide some additional macros making > access to custom-compressed varlena values easier. Or perhaps the > VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already support that? > This part is not very clear to me. These macros will work; custom compressed varlenas behave like the old compressed varlenas. > > Still it's a problem if the user used for example `SELECT > > <compressed_column> INTO * FROM *` because postgres will copy > > compressed tuples, and there will not be any dependencies between > > destination and the options. > > > > This seems like a rather fatal design flaw, though. I'd say we need to > force recompression of the data, in such cases. Otherwise all the > dependency tracking is rather pointless. Fixed this problem too. I've added recompression for datums that use custom compression. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
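[Editor's note] As a small illustration of the jsonb-style construction Ildus mentions, a compress callback can build its output with the public StringInfo API and fix up the varlena length at the end. This sketch uses the ordinary 4-byte varlena header; the patch's custom-compressed header would go into the reserved space instead, and the helper name here is made up:

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Sketch only: build a varlena result by reserving room for the header,
 * appending the payload, and setting the final length.
 */
static struct varlena *
build_varlena_result(const char *payload, int payload_len)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfoSpaces(&buf, VARHDRSZ);     /* placeholder for the header */
    appendBinaryStringInfo(&buf, payload, payload_len);

    SET_VARSIZE(buf.data, buf.len);             /* buf.data is now a valid varlena */
    return (struct varlena *) buf.data;
}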
Hi, On 11/21/2017 03:47 PM, Ildus Kurbangaliev wrote: > On Mon, 20 Nov 2017 00:04:53 +0100 > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > ... > >> 6) I'm rather confused by AttributeCompression vs. >> ColumnCompression. I mean, attribute==column, right? Of course, one >> is for data from parser, the other one is for internal info. But >> can we make the naming clearer? > > For now I have renamed AttributeCompression to CompressionOptions, > not sure that's a good name but at least it gives less confusion. > I propose to use either CompressionMethodOptions (and CompressionMethodRoutine) or CompressionOptions (and CompressionRoutine) >> >> 7) The docs in general are somewhat unsatisfactory, TBH. For example >> the ColumnCompression has no comments, unlike everything else in >> parsenodes. Similarly for the SGML docs - I suggest to expand them to >> resemble FDW docs >> (https://www.postgresql.org/docs/10/static/fdwhandler.html) which >> also follows the handler/routines pattern. > > I've added more comments. I think I'll add more documentation if the > committers will approve current syntax. > OK. Haven't reviewed this yet. >> >> 8) One of the unclear things if why we even need 'drop' routing. It >> seems that if it's defined DropAttributeCompression does something. >> But what should it do? I suppose dropping the options should be done >> using dependencies (just like we drop columns in this case). >> >> BTW why does DropAttributeCompression mess with att->attisdropped in >> this way? That seems a bit odd. > > 'drop' routine could be useful. An extension could do something > related with the attribute, like remove extra tables or something > else. The compression options will not be removed after unlinking > compression method from a column because there is still be stored > compressed data in that column. > OK. So something like a "global" dictionary used for the column, or something like that? Sure, seems useful and I've been thinking about that, but I think we badly need some extension using that, even if in a very simple way. Firstly, we need a "how to" example, secondly we need some way to test it. >> >> 13) When writing the experimental extension, I was extremely >> confused about the regular varlena headers, custom compression >> headers, etc. In the end I stole the code from tsvector.c and >> whacked it a bit until it worked, but I wouldn't dare to claim I >> understand how it works. >> >> This needs to be documented somewhere. For example postgres.h has >> a bunch of paragraphs about varlena headers, so perhaps it should >> be there? I see the patch tweaks some of the constants, but does >> not update the comment at all. > > This point is good, I'm not sure how this documentation should look > like. I've just assumed that people should have deep undestanding of > varlenas if they're going to compress them. But now it's easy to > make mistake there. Maybe I should add some functions that help to > construct varlena, with different headers. I like the way is how > jsonb is constructed. It uses StringInfo and there are few helper > functions (reserveFromBuffer, appendToBuffer and others). Maybe they > should be not static. > Not sure. My main problem was not understanding how this affects the varlena header, etc. And I had no idea where to look. >> >> Perhaps it would be useful to provide some additional macros >> making access to custom-compressed varlena values easier. Or >> perhaps the VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already >> support that? 
This part is not very clear to me. > > These macros will work, custom compressed varlenas behave like old > compressed varlenas. > OK. But then I don't understand why tsvector.c does things like VARSIZE(data) - VARHDRSZ_CUSTOM_COMPRESSED - arrsize VARRAWSIZE_4B_C(data) - arrsize instead of VARSIZE_ANY_EXHDR(data) - arrsize VARSIZE_ANY(data) - arrsize Seems somewhat confusing. >>> Still it's a problem if the user used for example `SELECT >>> <compressed_column> INTO * FROM *` because postgres will copy >>> compressed tuples, and there will not be any dependencies >>> between destination and the options. >>> >> >> This seems like a rather fatal design flaw, though. I'd say we need >> to force recompression of the data, in such cases. Otherwise all >> the dependency tracking is rather pointless. > > Fixed this problem too. I've added recompression for datum that use > custom compression. > Hmmm, it still doesn't work for me. See this: test=# create extension pg_lz4 ; CREATE EXTENSION test=# create table t_lz4 (v text compressed lz4); CREATE TABLE test=# create table t_pglz (v text); CREATE TABLE test=# insert into t_lz4 select repeat(md5(1::text),300); INSERT 0 1 test=# insert into t_pglz select * from t_lz4; INSERT 0 1 test=# drop extension pg_lz4 cascade; NOTICE: drop cascades to 2 other objects DETAIL: drop cascades to compression options for lz4 drop cascades to table t_lz4 column v DROP EXTENSION test=# \c test You are now connected to database "test" as user "user". test=# insert into t_lz4 select repeat(md5(1::text),300);^C test=# select * from t_pglz ; ERROR: cache lookup failed for compression options 16419 That suggests no recompression happened. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 21 Nov 2017 18:47:49 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > I propose to use either > > CompressionMethodOptions (and CompressionMethodRoutine) > > or > > CompressionOptions (and CompressionRoutine) Sounds good, thanks. > > OK. But then I don't understand why tsvector.c does things like > > VARSIZE(data) - VARHDRSZ_CUSTOM_COMPRESSED - arrsize > VARRAWSIZE_4B_C(data) - arrsize > > instead of > > VARSIZE_ANY_EXHDR(data) - arrsize > VARSIZE_ANY(data) - arrsize > > Seems somewhat confusing. > VARRAWSIZE_4B_C returns the original size of the data, before compression (from va_rawsize in current postgres, and from va_info in my patch), not the size of the already compressed data, so you can't use VARSIZE_ANY here. VARSIZE_ANY_EXHDR in current postgres returns VARSIZE - VARHDRSZ regardless of whether the varlena is compressed or not, so I just kept this behavior for custom compressed varlenas too. If you look into tuptoaster.c you will also see lines like 'VARSIZE(attr) - TOAST_COMPRESS_HDRSZ'. So I think that if VARSIZE_ANY_EXHDR were to subtract different header sizes, it should subtract them for regular compressed varlenas too. > > > > Hmmm, it still doesn't work for me. See this: > > test=# create extension pg_lz4 ; > CREATE EXTENSION > test=# create table t_lz4 (v text compressed lz4); > CREATE TABLE > test=# create table t_pglz (v text); > CREATE TABLE > test=# insert into t_lz4 select repeat(md5(1::text),300); > INSERT 0 1 > test=# insert into t_pglz select * from t_lz4; > INSERT 0 1 > test=# drop extension pg_lz4 cascade; > NOTICE: drop cascades to 2 other objects > DETAIL: drop cascades to compression options for lz4 > drop cascades to table t_lz4 column v > DROP EXTENSION > test=# \c test > You are now connected to database "test" as user "user". > test=# insert into t_lz4 select repeat(md5(1::text),300);^C > test=# select * from t_pglz ; > ERROR: cache lookup failed for compression options 16419 > > That suggests no recompression happened. I will check that. Is your extension published somewhere?
On 11/21/2017 09:28 PM, Ildus K wrote: >> Hmmm, it still doesn't work for me. See this: >> >> test=# create extension pg_lz4 ; >> CREATE EXTENSION >> test=# create table t_lz4 (v text compressed lz4); >> CREATE TABLE >> test=# create table t_pglz (v text); >> CREATE TABLE >> test=# insert into t_lz4 select repeat(md5(1::text),300); >> INSERT 0 1 >> test=# insert into t_pglz select * from t_lz4; >> INSERT 0 1 >> test=# drop extension pg_lz4 cascade; >> NOTICE: drop cascades to 2 other objects >> DETAIL: drop cascades to compression options for lz4 >> drop cascades to table t_lz4 column v >> DROP EXTENSION >> test=# \c test >> You are now connected to database "test" as user "user". >> test=# insert into t_lz4 select repeat(md5(1::text),300);^C >> test=# select * from t_pglz ; >> ERROR: cache lookup failed for compression options 16419 >> >> That suggests no recompression happened. > > I will check that. Is your extension published somewhere? > No, it was just an experiment, so I've only attached it to the initial review. Attached is an updated version, with a fix or two. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, 21 Nov 2017 18:47:49 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > Hmmm, it still doesn't work for me. See this: > > test=# create extension pg_lz4 ; > CREATE EXTENSION > test=# create table t_lz4 (v text compressed lz4); > CREATE TABLE > test=# create table t_pglz (v text); > CREATE TABLE > test=# insert into t_lz4 select repeat(md5(1::text),300); > INSERT 0 1 > test=# insert into t_pglz select * from t_lz4; > INSERT 0 1 > test=# drop extension pg_lz4 cascade; > NOTICE: drop cascades to 2 other objects > DETAIL: drop cascades to compression options for lz4 > drop cascades to table t_lz4 column v > DROP EXTENSION > test=# \c test > You are now connected to database "test" as user "user". > test=# insert into t_lz4 select repeat(md5(1::text),300);^C > test=# select * from t_pglz ; > ERROR: cache lookup failed for compression options 16419 > > That suggests no recompression happened. Should be fixed in the attached patch. I've changed your extension a little bit according to the changes in the new patch (also attached). I also renamed a few functions, added more comments, and simplified the code related to DefineRelation (thanks to Ildar Musin's suggestion). -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hi, On 11/23/2017 10:38 AM, Ildus Kurbangaliev wrote: > On Tue, 21 Nov 2017 18:47:49 +0100 > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >>> >> >> Hmmm, it still doesn't work for me. See this: >> >> test=# create extension pg_lz4 ; >> CREATE EXTENSION >> test=# create table t_lz4 (v text compressed lz4); >> CREATE TABLE >> test=# create table t_pglz (v text); >> CREATE TABLE >> test=# insert into t_lz4 select repeat(md5(1::text),300); >> INSERT 0 1 >> test=# insert into t_pglz select * from t_lz4; >> INSERT 0 1 >> test=# drop extension pg_lz4 cascade; >> NOTICE: drop cascades to 2 other objects >> DETAIL: drop cascades to compression options for lz4 >> drop cascades to table t_lz4 column v >> DROP EXTENSION >> test=# \c test >> You are now connected to database "test" as user "user". >> test=# insert into t_lz4 select repeat(md5(1::text),300);^C >> test=# select * from t_pglz ; >> ERROR: cache lookup failed for compression options 16419 >> >> That suggests no recompression happened. > > Should be fixed in the attached patch. I've changed your extension a > little bit according changes in the new patch (also in attachments). > Hmm, this seems to have fixed it, but only in one direction. Consider this: create table t_pglz (v text); create table t_lz4 (v text compressed lz4); insert into t_pglz select repeat(md5(i::text),300) from generate_series(1,100000) s(i); insert into t_lz4 select repeat(md5(i::text),300) from generate_series(1,100000) s(i); \d+ Schema | Name | Type | Owner | Size | Description --------+--------+-------+-------+-------+------------- public | t_lz4 | table | user | 12 MB | public | t_pglz | table | user | 18 MB | (2 rows) truncate t_pglz; insert into t_pglz select * from t_lz4; \d+ Schema | Name | Type | Owner | Size | Description --------+--------+-------+-------+-------+------------- public | t_lz4 | table | user | 12 MB | public | t_pglz | table | user | 18 MB | (2 rows) which is fine. But in the other direction, this happens truncate t_lz4; insert into t_lz4 select * from t_pglz; \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+--------+-------+-------+-------+------------- public | t_lz4 | table | user | 18 MB | public | t_pglz | table | user | 18 MB | (2 rows) which means the data is still pglz-compressed. That's rather strange, I guess, and it should compress the data using the compression method set for the target table instead. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 23 Nov 2017 21:54:32 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Hmm, this seems to have fixed it, but only in one direction. Consider > this: > > create table t_pglz (v text); > create table t_lz4 (v text compressed lz4); > > insert into t_pglz select repeat(md5(i::text),300) > from generate_series(1,100000) s(i); > > insert into t_lz4 select repeat(md5(i::text),300) > from generate_series(1,100000) s(i); > > \d+ > > Schema | Name | Type | Owner | Size | Description > --------+--------+-------+-------+-------+------------- > public | t_lz4 | table | user | 12 MB | > public | t_pglz | table | user | 18 MB | > (2 rows) > > truncate t_pglz; > insert into t_pglz select * from t_lz4; > > \d+ > > Schema | Name | Type | Owner | Size | Description > --------+--------+-------+-------+-------+------------- > public | t_lz4 | table | user | 12 MB | > public | t_pglz | table | user | 18 MB | > (2 rows) > > which is fine. But in the other direction, this happens > > truncate t_lz4; > insert into t_lz4 select * from t_pglz; > > \d+ > List of relations > Schema | Name | Type | Owner | Size | Description > --------+--------+-------+-------+-------+------------- > public | t_lz4 | table | user | 18 MB | > public | t_pglz | table | user | 18 MB | > (2 rows) > > which means the data is still pglz-compressed. That's rather strange, > I guess, and it should compress the data using the compression method > set for the target table instead. That's actually an interesting issue. It happens because if a tuple fits into a page, then postgres just moves it as is. I've just added recompression if it has custom compressed datums to keep dependencies right. But look: create table t1(a text); create table t2(a text); alter table t2 alter column a set storage external; insert into t1 select repeat(md5(i::text),300) from generate_series(1,100000) s(i); \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+------+-------+-------+------------+------------- public | t1 | table | ildus | 18 MB | public | t2 | table | ildus | 8192 bytes | (2 rows) insert into t2 select * from t1; \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+------+-------+-------+-------+------------- public | t1 | table | ildus | 18 MB | public | t2 | table | ildus | 18 MB | (2 rows) That means compressed datums are now in the column with storage specified as external. I'm not sure whether that's a bug or a feature. Let's insert them the usual way: delete from t2; insert into t2 select repeat(md5(i::text),300) from generate_series(1,100000) s(i); \d+ List of relations Schema | Name | Type | Owner | Size | Description --------+------+-------+-------+---------+------------- public | t1 | table | ildus | 18 MB | public | t2 | table | ildus | 1011 MB | Maybe there should be a more general solution, like a comparison of attribute properties? -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Hi, On 11/24/2017 10:38 AM, Ildus Kurbangaliev wrote: > ... > That means compressed datums now in the column with storage > specified as external. I'm not sure that's a bug or a feature. > Interesting. Never realized it behaves like this. Not sure if it's intentional or not (i.e. bug vs. feature). > Lets insert them usual way: > > delete from t2; > insert into t2 select repeat(md5(i::text),300) from > generate_series(1,100000) s(i); > \d+ > > List of relations > Schema | Name | Type | Owner | Size | Description > --------+------+-------+-------+---------+------------- > public | t1 | table | ildus | 18 MB | > public | t2 | table | ildus | 1011 MB | > > Maybe there should be more common solution like comparison of > attribute properties? > Maybe, not sure what the right solution is. I just know that if we allow inserting data into arbitrary tables without recompression, we may end up with data that can't be decompressed. I agree that the behavior with extended storage is somewhat similar, but the important distinction is that while that is surprising the data is still accessible. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, I ran into another issue - after inserting some data into a table with a tsvector column (without any compression defined), I can no longer read the data. This is what I get in the console: db=# select max(md5(body_tsvector::text)) from messages; ERROR: cache lookup failed for compression options 6432 and the stack trace looks like this: Breakpoint 1, get_cached_compression_options (cmoptoid=6432) at tuptoaster.c:2563 2563 elog(ERROR, "cache lookup failed for compression options %u", cmoptoid); (gdb) bt #0 get_cached_compression_options (cmoptoid=6432) at tuptoaster.c:2563 #1 0x00000000004bf3da in toast_decompress_datum (attr=0x2b44148) at tuptoaster.c:2390 #2 0x00000000004c0c1e in heap_tuple_untoast_attr (attr=0x2b44148) at tuptoaster.c:225 #3 0x000000000083f976 in pg_detoast_datum (datum=<optimized out>) at fmgr.c:1829 #4 0x00000000008072de in tsvectorout (fcinfo=0x2b41e00) at tsvector.c:315 #5 0x00000000005fae00 in ExecInterpExpr (state=0x2b414b8, econtext=0x2b25ab0, isnull=<optimized out>) at execExprInterp.c:1131 #6 0x000000000060bdf4 in ExecEvalExprSwitchContext (isNull=0x7fffffe9bd37 "", econtext=0x2b25ab0, state=0x2b414b8) at ../../../src/include/executor/executor.h:299 It seems the VARATT_IS_CUSTOM_COMPRESSED incorrectly identifies the value as custom-compressed for some reason. Not sure why, but the tsvector column is populated by a trigger that simply does NEW.body_tsvector := to_tsvector('english', strip_replies(NEW.body_plain)); If needed, the complete tool is here: https://bitbucket.org/tvondra/archie regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 25 Nov 2017 06:40:00 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Hi, > > I ran into another issue - after inserting some data into a table > with a tsvector column (without any compression defined), I can no > longer read the data. > > This is what I get in the console: > > db=# select max(md5(body_tsvector::text)) from messages; > ERROR: cache lookup failed for compression options 6432 > > and the stack trace looks like this: > > Breakpoint 1, get_cached_compression_options (cmoptoid=6432) at > tuptoaster.c:2563 > 2563 elog(ERROR, "cache lookup failed for > compression options %u", cmoptoid); > (gdb) bt > #0 get_cached_compression_options (cmoptoid=6432) at > tuptoaster.c:2563 #1 0x00000000004bf3da in toast_decompress_datum > (attr=0x2b44148) at tuptoaster.c:2390 > #2 0x00000000004c0c1e in heap_tuple_untoast_attr (attr=0x2b44148) at > tuptoaster.c:225 > #3 0x000000000083f976 in pg_detoast_datum (datum=<optimized out>) at > fmgr.c:1829 > #4 0x00000000008072de in tsvectorout (fcinfo=0x2b41e00) at > tsvector.c:315 #5 0x00000000005fae00 in ExecInterpExpr > (state=0x2b414b8, econtext=0x2b25ab0, isnull=<optimized out>) at > execExprInterp.c:1131 #6 0x000000000060bdf4 in > ExecEvalExprSwitchContext (isNull=0x7fffffe9bd37 "", > econtext=0x2b25ab0, state=0x2b414b8) > at ../../../src/include/executor/executor.h:299 > > It seems the VARATT_IS_CUSTOM_COMPRESSED incorrectly identifies the > value as custom-compressed for some reason. > > Not sure why, but the tsvector column is populated by a trigger that > simply does > > NEW.body_tsvector > := to_tsvector('english', strip_replies(NEW.body_plain)); > > If needed, the complete tool is here: > > https://bitbucket.org/tvondra/archie > Hi. This looks like a serious bug, but I couldn't reproduce it yet. Did you upgrade some old database or this bug happened after insertion of all data to new database? I tried using your 'archie' tool to download mailing lists and insert them to database, but couldn't catch any errors. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Hi, On 11/27/2017 04:52 PM, Ildus Kurbangaliev wrote: > ... > > Hi. This looks like a serious bug, but I couldn't reproduce it yet. > Did you upgrade some old database or this bug happened after > insertion of all data to new database? I tried using your 'archie' > tool to download mailing lists and insert them to database, but > couldn't catch any errors. > I can trigger it pretty reliably with these steps: git checkout f65d21b258085bdc8ef2cc282ab1ff12da9c595c patch -p1 < ~/custom_compression_methods_v6.patch ./configure --enable-debug --enable-cassert \ CFLAGS="-fno-omit-frame-pointer -O0 -DRANDOMIZE_ALLOCATED_MEMORY" \ --prefix=/home/postgres/pg-compress make -s clean && make -s -j4 install cd contrib/ make -s clean && make -s -j4 install export PATH=/home/postgres/pg-compress/bin:$PATH pg_ctl -D /mnt/raid/pg-compress init pg_ctl -D /mnt/raid/pg-compress -l compress.log start createdb archie cd ~/archie/sql/ psql archie < create.sql ~/archie/bin/load.py --workers 4 --db archie */* > load.log 2>&1 I guess the trick might be -DRANDOMIZE_ALLOCATED_MEMORY (I first tried without it, and it seemed working fine). If that's the case, I bet there is a palloc that should have been palloc0, or something like that. If you still can't reproduce that, I may give you access to this machine so that you can debug it there. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 27 Nov 2017 18:20:12 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I guess the trick might be -DRANDOMIZE_ALLOCATED_MEMORY (I first tried > without it, and it seemed working fine). If that's the case, I bet > there is a palloc that should have been palloc0, or something like > that. Thanks, that was it. I've been able to reproduce this bug. The attached patch should fix this bug and I've also added recompression when tuples moved to the relation with the compressed attribute. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
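The palloc/palloc0 point in the exchange above is worth illustrating. The sketch below shows the class of bug that -DRANDOMIZE_ALLOCATED_MEMORY reliably exposes; the struct and function names are hypothetical and not taken from the patch.

#include "postgres.h"

/* Hypothetical cache entry; illustration of the bug class only. */
typedef struct CachedCompressionOptions
{
	Oid			cmoptoid;
	void	   *routines;		/* handler routines, looked up lazily */
	bool		valid;
} CachedCompressionOptions;

static CachedCompressionOptions *
make_cache_entry(Oid cmoptoid)
{
	/*
	 * palloc() leaves the chunk uninitialized; with
	 * -DRANDOMIZE_ALLOCATED_MEMORY the contents are deliberately scribbled
	 * over, so a later "if (entry->valid)" or pointer check misbehaves
	 * reliably instead of only occasionally. palloc0() zeroes the chunk
	 * and avoids this class of bug.
	 */
	CachedCompressionOptions *entry = palloc0(sizeof(CachedCompressionOptions));

	entry->cmoptoid = cmoptoid;
	return entry;
}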
On 11/28/2017 02:29 PM, Ildus Kurbangaliev wrote: > On Mon, 27 Nov 2017 18:20:12 +0100 > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> I guess the trick might be -DRANDOMIZE_ALLOCATED_MEMORY (I first >> tried without it, and it seemed working fine). If that's the case, >> I bet there is a palloc that should have been palloc0, or something >> like that. > > Thanks, that was it. I've been able to reproduce this bug. The > attached patch should fix this bug and I've also added recompression > when tuples moved to the relation with the compressed attribute. > I've done many tests with fulltext search on the mail archive, using different compression algorithm, and this time it worked fine. So I can confirm v7 fixes the issue. Let me elaborate a bit about the benchmarking I did. I realize the patch is meant to provide only an "API" for custom compression methods, and so benchmarking of existing general-purpose algorithms (replacing the built-in pglz) may seem a bit irrelevant. But I'll draw some conclusions from that, so please bear with me. Or just skip the next section. ------------------ benchmark / start ------------------ I was curious how much better we could do than the built-in compression, so I've whipped together a bunch of extensions for a few common general-purpose compression algorithms (lz4, gz, bz2, zstd, brotli and snappy), loaded the community mailing list archives using "archie" [1] and ran a bunch of real-world full-text queries on it. I've used "default" (or "medium") compression levels for all algorithms. For the loads, the results look like this: seconds size ------------------------- pglz 1631 9786 zstd 1844 7102 lz4 1582 9537 bz2 2382 7670 gz 1703 7067 snappy 1587 12288 brotli 10973 6180 According to those results the algorithms seem quite comparable, with the exception of snappy and brotli. Snappy supposedly aims for fast compression and not compression ratio, but it's about as fast as the other algorithms and compression ratio is almost 2x worse. Brotli is much slower, although it gets better compression ratio. For the queries, I ran about 33k of real-world queries (executed on the community mailing lists in the past). Firstly, a simple -- unsorted SELECT COUNT(id) FROM messages WHERE body_tsvector @@ $1::tsquery and then -- sorted SELECT id FROM messages WHERE body_tsvector @@ $1::tsquery ORDER BY ts_rank(body_tsvector, $1::tsquery) DESC LIMIT 100; Attached are 4 different charts, plotting pglz on x-axis and the other algorithms on y-axis (so below diagonal => new algorithm is faster, above diagonal => pglz is faster). I did this on two different machines, one with only 8GB of RAM (so the dataset does not fit) and one much larger (so everything fits into RAM). I'm actually surprised how well the built-in pglz compression fares, both on compression ratio and (de)compression speed. There is a bit of noise for the fastest queries, when the alternative algorithms perform better in non-trivial number of cases. I suspect those cases may be due to not implementing anything like PGLZ_strategy_default->min_comp_rate (requiring 25% size reduction), but I'm not sure about it. For more expensive queries, pglz pretty much wins. Of course, increasing compression level might change the results a bit, but it will also make the data loads slower. 
------------------ benchmark / end ------------------ While the results may look differently for other datasets, my conclusion is that it's unlikely we'll find another general-purpose algorithm beating pglz (enough for people to switch to it, as they'll need to worry about testing, deployment of extensions etc). That doesn't necessarily mean supporting custom compression algorithms is pointless, of course, but I think people will be much more interested in exploiting known features of the data (instead of treating the values as opaque arrays of bytes). For example, I see the patch implements a special compression method for tsvector values (used in the tests), exploiting from knowledge of internal structure. I haven't tested if that is an improvement (either in compression/decompression speed or compression ratio), though. I can imagine other interesting use cases - for example values in JSONB columns often use the same "schema" (keys, nesting, ...), so can I imagine building a "dictionary" of JSON keys for the whole column ... Ildus, is this a use case you've been aiming for, or were you aiming to use the new API in a different way? I wonder if the patch can be improved to handle this use case better. For example, it requires knowledge the actual data type, instead of treating it as opaque varlena / byte array. I see tsvector compression does that by checking typeid in the handler. But that fails for example with this example db=# create domain x as tsvector; CREATE DOMAIN db=# create table t (a x compressed ts1); ERROR: unexpected type 28198672 for tsvector compression handler which means it's a few brick shy to properly support domains. But I wonder if this should be instead specified in CREATE COMPRESSION METHOD instead. I mean, something like CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler TYPE tsvector; When type is no specified, it applies to all varlena values. Otherwise only to that type. Also, why not to allow setting the compression as the default method for a data type, e.g. CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler TYPE tsvector DEFAULT; would automatically add 'COMPRESSED ts1' to all tsvector columns in new CREATE TABLE commands. BTW do you expect the tsvector compression to be generally useful, or is it meant to be used only by the tests? If generally useful, perhaps it should be created in pg_compression by default. If only for tests, maybe it should be implemented in an extension in contrib (thus also serving as example how to implement new methods). I haven't thought about the JSONB use case very much, but I suppose that could be done using the configure/drop methods. I mean, allocating the dictionary somewhere (e.g. in a table created by an extension?). The configure method gets the Form_pg_attribute record, so that should be enough I guess. But the patch is not testing those two methods at all, which seems like something that needs to be addresses before commit. I don't expect a full-fledged JSONB compression extension, but something simple that actually exercises those methods in a meaningful way. Similarly for the compression options - we need to test that the WITH part is handled correctly (tsvector does not provide configure method). Which reminds me I'm confused by pg_compression_opt. 
Consider this: CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler; CREATE TABLE t (a tsvector COMPRESSED ts1); db=# select * from pg_compression_opt ; cmoptoid | cmname | cmhandler | cmoptions ----------+--------+------------------------------+----------- 28198689 | ts1 | tsvector_compression_handler | (1 row) DROP TABLE t; db=# select * from pg_compression_opt ; cmoptoid | cmname | cmhandler | cmoptions ----------+--------+------------------------------+----------- 28198689 | ts1 | tsvector_compression_handler | (1 row) db=# DROP COMPRESSION METHOD ts1; ERROR: cannot drop compression method ts1 because other objects depend on it DETAIL: compression options for ts1 depends on compression method ts1 HINT: Use DROP ... CASCADE to drop the dependent objects too. I believe the pg_compression_opt is actually linked to pg_attribute, in which case it should include (attrelid,attnum), and should be dropped when the table is dropped. I suppose it was done this way to work around the lack of recompression (i.e. the compressed value might have ended in other table), but that is no longer true. A few more comments: 1) The patch makes optionListToArray (in foreigncmds.c) non-static, but it's not used anywhere. So this seems like change that is no longer necessary. 2) I see we add two un-reserved keywords in gram.y - COMPRESSION and COMPRESSED. Perhaps COMPRESSION would be enough? I mean, we could do CREATE TABLE t (c TEXT COMPRESSION cm1); ALTER ... SET COMPRESSION name ... ALTER ... SET COMPRESSION none; Although I agree the "SET COMPRESSION none" is a bit strange. 3) heap_prepare_insert uses this chunk of code + else if (HeapTupleHasExternal(tup) + || RelationGetDescr(relation)->tdflags & TD_ATTR_CUSTOM_COMPRESSED + || HeapTupleHasCustomCompressed(tup) + || tup->t_len > TOAST_TUPLE_THRESHOLD) Shouldn't that be rather + else if (HeapTupleHasExternal(tup) + || (RelationGetDescr(relation)->tdflags & TD_ATTR_CUSTOM_COMPRESSED + && HeapTupleHasCustomCompressed(tup)) + || tup->t_len > TOAST_TUPLE_THRESHOLD) regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Nov 30, 2017 at 8:30 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 11/28/2017 02:29 PM, Ildus Kurbangaliev wrote: >> On Mon, 27 Nov 2017 18:20:12 +0100 >> Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >>> I guess the trick might be -DRANDOMIZE_ALLOCATED_MEMORY (I first >>> tried without it, and it seemed working fine). If that's the case, >>> I bet there is a palloc that should have been palloc0, or something >>> like that. >> >> Thanks, that was it. I've been able to reproduce this bug. The >> attached patch should fix this bug and I've also added recompression >> when tuples moved to the relation with the compressed attribute. >> > > I've done many tests with fulltext search on the mail archive, using > different compression algorithm, and this time it worked fine. So I can > confirm v7 fixes the issue. Moved to next CF. -- Michael
On Thu, 30 Nov 2017 00:30:37 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > While the results may look differently for other datasets, my > conclusion is that it's unlikely we'll find another general-purpose > algorithm beating pglz (enough for people to switch to it, as they'll > need to worry about testing, deployment of extensions etc). > > That doesn't necessarily mean supporting custom compression algorithms > is pointless, of course, but I think people will be much more > interested in exploiting known features of the data (instead of > treating the values as opaque arrays of bytes). > > For example, I see the patch implements a special compression method > for tsvector values (used in the tests), exploiting from knowledge of > internal structure. I haven't tested if that is an improvement (either > in compression/decompression speed or compression ratio), though. > > I can imagine other interesting use cases - for example values in > JSONB columns often use the same "schema" (keys, nesting, ...), so > can I imagine building a "dictionary" of JSON keys for the whole > column ... > > Ildus, is this a use case you've been aiming for, or were you aiming > to use the new API in a different way? Thank you for such good overview. I agree that pglz is pretty good as general compression method, and there's no point to change it, at least now. I see few useful use cases for compression methods, it's special compression methods for int[], timestamp[] for time series and yes, dictionaries for jsonb, for which I have even already created an extension (https://github.com/postgrespro/jsonbd). It's working and giving promising results. > > I wonder if the patch can be improved to handle this use case better. > For example, it requires knowledge the actual data type, instead of > treating it as opaque varlena / byte array. I see tsvector compression > does that by checking typeid in the handler. > > But that fails for example with this example > > db=# create domain x as tsvector; > CREATE DOMAIN > db=# create table t (a x compressed ts1); > ERROR: unexpected type 28198672 for tsvector compression handler > > which means it's a few brick shy to properly support domains. But I > wonder if this should be instead specified in CREATE COMPRESSION > METHOD instead. I mean, something like > > CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler > TYPE tsvector; > > When type is no specified, it applies to all varlena values. Otherwise > only to that type. Also, why not to allow setting the compression as > the default method for a data type, e.g. > > CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler > TYPE tsvector DEFAULT; > > would automatically add 'COMPRESSED ts1' to all tsvector columns in > new CREATE TABLE commands. Initial version of the patch contains ALTER syntax that change compression method for whole types, but I have decided to remove that functionality for now because the patch is already quite complex and it could be added later as separate patch. Syntax was: ALTER TYPE <type> SET COMPRESSION <cm>; Specifying the supported type for the compression method is a good idea. Maybe the following syntax would be better? CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER tsvector_compression_handler; > > BTW do you expect the tsvector compression to be generally useful, or > is it meant to be used only by the tests? If generally useful, > perhaps it should be created in pg_compression by default. 
If only > for tests, maybe it should be implemented in an extension in contrib > (thus also serving as example how to implement new methods). > > I haven't thought about the JSONB use case very much, but I suppose > that could be done using the configure/drop methods. I mean, > allocating the dictionary somewhere (e.g. in a table created by an > extension?). The configure method gets the Form_pg_attribute record, > so that should be enough I guess. > > But the patch is not testing those two methods at all, which seems > like something that needs to be addresses before commit. I don't > expect a full-fledged JSONB compression extension, but something > simple that actually exercises those methods in a meaningful way. I will move tsvector_compression_handler to a separate extension in the next version. I added it more as an example, but it could also be used to achieve better compression for tsvectors. Tests on the maillists database ('archie' tables): usual compression: maillists=# select body_tsvector, subject_tsvector into t1 from messages; SELECT 1114213 maillists=# select pg_size_pretty(pg_total_relation_size('t1')); pg_size_pretty ---------------- 1637 MB (1 row) tsvector_compression_handler: maillists=# select pg_size_pretty(pg_total_relation_size('t2')); pg_size_pretty ---------------- 1521 MB (1 row) lz4: maillists=# select pg_size_pretty(pg_total_relation_size('t3')); pg_size_pretty ---------------- 1487 MB (1 row) I'm not attached to tsvector_compression_handler; if there is some example that can use all the features, then tsvector_compression_handler could be replaced with it. My extension for jsonb dictionaries is big enough and I'm not ready to try to include it in the patch. I do see the use of the 'drop' method, since there should be a way for an extension to clean up its resources, but I don't see a simple enough usage for it in the tests. Maybe just dummy methods for 'drop' and 'configure' will be enough for testing purposes. > > Similarly for the compression options - we need to test that the WITH > part is handled correctly (tsvector does not provide configure > method). I could add some options to tsvector_compression_handler, like options that change pglz_compress parameters. > > Which reminds me I'm confused by pg_compression_opt. Consider this: > > CREATE COMPRESSION METHOD ts1 HANDLER > tsvector_compression_handler; CREATE TABLE t (a tsvector COMPRESSED > ts1); > > db=# select * from pg_compression_opt ; > cmoptoid | cmname | cmhandler | cmoptions > ----------+--------+------------------------------+----------- > 28198689 | ts1 | tsvector_compression_handler | > (1 row) > > DROP TABLE t; > > db=# select * from pg_compression_opt ; > cmoptoid | cmname | cmhandler | cmoptions > ----------+--------+------------------------------+----------- > 28198689 | ts1 | tsvector_compression_handler | > (1 row) > > db=# DROP COMPRESSION METHOD ts1; > ERROR: cannot drop compression method ts1 because other objects > depend on it > DETAIL: compression options for ts1 depends on compression method > ts1 > HINT: Use DROP ... CASCADE to drop the dependent objects too. > > I believe the pg_compression_opt is actually linked to pg_attribute, > in which case it should include (attrelid,attnum), and should be > dropped when the table is dropped. > > I suppose it was done this way to work around the lack of > recompression (i.e. the compressed value might have ended in other > table), but that is no longer true. 
Good point. Since there is recompression now, the options can be safely removed when the table is dropped. It will complicate pg_upgrade but I think this is solvable. > > A few more comments: > > 1) The patch makes optionListToArray (in foreigncmds.c) non-static, > but it's not used anywhere. So this seems like change that is no > longer necessary. I use this function in CreateCompressionOptions. > > 2) I see we add two un-reserved keywords in gram.y - COMPRESSION and > COMPRESSED. Perhaps COMPRESSION would be enough? I mean, we could do > > CREATE TABLE t (c TEXT COMPRESSION cm1); > ALTER ... SET COMPRESSION name ... > ALTER ... SET COMPRESSION none; > > Although I agree the "SET COMPRESSION none" is a bit strange. I agree; I've already changed the syntax for the next version of the patch. It's COMPRESSION instead of COMPRESSED and DROP COMPRESSION instead of SET NOT COMPRESSED. That's one word fewer in the grammar, and it looks nicer. > > 3) heap_prepare_insert uses this chunk of code > > + else if (HeapTupleHasExternal(tup) > + || RelationGetDescr(relation)->tdflags & > TD_ATTR_CUSTOM_COMPRESSED > + || HeapTupleHasCustomCompressed(tup) > + || tup->t_len > TOAST_TUPLE_THRESHOLD) > > Shouldn't that be rather > > + else if (HeapTupleHasExternal(tup) > + || (RelationGetDescr(relation)->tdflags & > TD_ATTR_CUSTOM_COMPRESSED > + && HeapTupleHasCustomCompressed(tup)) > + || tup->t_len > TOAST_TUPLE_THRESHOLD) These conditions are used for opposite directions. HeapTupleHasCustomCompressed(tup) is true if the tuple has compressed datums inside and we need to decompress them first, and the TD_ATTR_CUSTOM_COMPRESSED flag means that the relation we're putting the data into has columns with custom compression, so we may need to compress datums in the current tuple. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
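As one concrete (and purely illustrative) shape for the WITH options mentioned in the message above: a pglz-based handler could map them onto a custom PGLZ_Strategy passed to pglz_compress(). The field values below are made up for illustration and are not taken from the patch.

#include "postgres.h"
#include "common/pg_lzcompress.h"

/* Illustrative strategy; compare PGLZ_strategy_default in pg_lzcompress.c. */
static const PGLZ_Strategy aggressive_pglz_strategy = {
	32,							/* min_input_size */
	INT_MAX,					/* max_input_size */
	0,							/* min_comp_rate: accept any size reduction */
	1024,						/* first_success_by */
	128,						/* match_size_good */
	10							/* match_size_drop */
};

/*
 * A compress routine could then call
 *     pglz_compress(VARDATA_ANY(value), valsize, dest, &aggressive_pglz_strategy);
 * with the strategy fields filled in from the column's WITH (...) options.
 */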
On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote: > On Thu, 30 Nov 2017 00:30:37 +0100 > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > ... > >> I can imagine other interesting use cases - for example values in >> JSONB columns often use the same "schema" (keys, nesting, ...), so >> can I imagine building a "dictionary" of JSON keys for the whole >> column ... >> >> Ildus, is this a use case you've been aiming for, or were you aiming >> to use the new API in a different way? > > Thank you for such good overview. I agree that pglz is pretty good as > general compression method, and there's no point to change it, at > least now. > > I see few useful use cases for compression methods, it's special > compression methods for int[], timestamp[] for time series and yes, > dictionaries for jsonb, for which I have even already created an > extension (https://github.com/postgrespro/jsonbd). It's working and > giving promising results. > I understand the reluctance to put everything into core, particularly for complex patches that evolve quickly. Also, not having to put everything into core is kinda why we have extensions. But perhaps some of the simpler cases would be good candidates for core, making it possible to test the feature? >> >> I wonder if the patch can be improved to handle this use case better. >> For example, it requires knowledge the actual data type, instead of >> treating it as opaque varlena / byte array. I see tsvector compression >> does that by checking typeid in the handler. >> >> But that fails for example with this example >> >> db=# create domain x as tsvector; >> CREATE DOMAIN >> db=# create table t (a x compressed ts1); >> ERROR: unexpected type 28198672 for tsvector compression handler >> >> which means it's a few brick shy to properly support domains. But I >> wonder if this should be instead specified in CREATE COMPRESSION >> METHOD instead. I mean, something like >> >> CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler >> TYPE tsvector; >> >> When type is no specified, it applies to all varlena values. Otherwise >> only to that type. Also, why not to allow setting the compression as >> the default method for a data type, e.g. >> >> CREATE COMPRESSION METHOD ts1 HANDLER tsvector_compression_handler >> TYPE tsvector DEFAULT; >> >> would automatically add 'COMPRESSED ts1' to all tsvector columns in >> new CREATE TABLE commands. > > Initial version of the patch contains ALTER syntax that change > compression method for whole types, but I have decided to remove > that functionality for now because the patch is already quite complex > and it could be added later as separate patch. > > Syntax was: > ALTER TYPE <type> SET COMPRESSION <cm>; > > Specifying the supported type for the compression method is a good idea. > Maybe the following syntax would be better? > > CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER > tsvector_compression_handler; > Understood. Good to know you've considered it, and I agree it doesn't need to be there from the start (which makes the patch simpler). >> >> BTW do you expect the tsvector compression to be generally useful, or >> is it meant to be used only by the tests? If generally useful, >> perhaps it should be created in pg_compression by default. If only >> for tests, maybe it should be implemented in an extension in contrib >> (thus also serving as example how to implement new methods). >> >> I haven't thought about the JSONB use case very much, but I suppose >> that could be done using the configure/drop methods. 
I mean, >> allocating the dictionary somewhere (e.g. in a table created by an >> extension?). The configure method gets the Form_pg_attribute record, >> so that should be enough I guess. >> >> But the patch is not testing those two methods at all, which seems >> like something that needs to be addresses before commit. I don't >> expect a full-fledged JSONB compression extension, but something >> simple that actually exercises those methods in a meaningful way. > > I will move to tsvector_compression_handler to separate extension in > the next version. I added it more like as example, but also it could be > used to achieve a better compression for tsvectors. Tests on maillists > database ('archie' tables): > > usual compression: > > maillists=# select body_tsvector, subject_tsvector into t1 from > messages; SELECT 1114213 > maillists=# select pg_size_pretty(pg_total_relation_size('t1')); > pg_size_pretty > ---------------- > 1637 MB > (1 row) > > tsvector_compression_handler: > maillists=# select pg_size_pretty(pg_total_relation_size('t2')); > pg_size_pretty > ---------------- > 1521 MB > (1 row) > > lz4: > maillists=# select pg_size_pretty(pg_total_relation_size('t3')); > pg_size_pretty > ---------------- > 1487 MB > (1 row) > > I don't stick to tsvector_compression_handler, I think if there > will some example that can use all the features then > tsvector_compression_handler could be replaced with it. > OK. I think it's a nice use case (and nice gains on the compression ratio), demonstrating the datatype-aware compression. The question is why shouldn't this be built into the datatypes directly? That would certainly be possible for tsvector, although it wouldn't be as transparent (the datatype code would have to support it explicitly). I'm a bit torn on this. The custom compression method patch makes the compression mostly transparent for the datatype code (by adding an extra "compression" header). But it's coupled to the datatype quite strongly as it requires knowledge of the data type internals. It's a bit less coupled for "generic" datatypes (e.g. arrays of other types), where it may add important information (e.g. that the array represents a chunk of timeseries data, which the array code can't possibly know). > > My extension for jsonb dictionaries is big enough and I'm not ready > to try to include it to the patch. I just see the use of 'drop' > method, since there should be way for extension to clean its > resources, but I don't see some simple enough usage for it in tests. > Maybe just dummy methods for 'drop' and 'configure' will be enough > for testing purposes. > OK. >> >> Similarly for the compression options - we need to test that the WITH >> part is handled correctly (tsvector does not provide configure >> method). > > I could add some options to tsvector_compression_handler, like options > that change pglz_compress parameters. > +1 for doing that >> >> Which reminds me I'm confused by pg_compression_opt. 
Consider this: >> >> CREATE COMPRESSION METHOD ts1 HANDLER >> tsvector_compression_handler; CREATE TABLE t (a tsvector COMPRESSED >> ts1); >> >> db=# select * from pg_compression_opt ; >> cmoptoid | cmname | cmhandler | cmoptions >> ----------+--------+------------------------------+----------- >> 28198689 | ts1 | tsvector_compression_handler | >> (1 row) >> >> DROP TABLE t; >> >> db=# select * from pg_compression_opt ; >> cmoptoid | cmname | cmhandler | cmoptions >> ----------+--------+------------------------------+----------- >> 28198689 | ts1 | tsvector_compression_handler | >> (1 row) >> >> db=# DROP COMPRESSION METHOD ts1; >> ERROR: cannot drop compression method ts1 because other objects >> depend on it >> DETAIL: compression options for ts1 depends on compression method >> ts1 >> HINT: Use DROP ... CASCADE to drop the dependent objects too. >> >> I believe the pg_compression_opt is actually linked to pg_attribute, >> in which case it should include (attrelid,attnum), and should be >> dropped when the table is dropped. >> >> I suppose it was done this way to work around the lack of >> recompression (i.e. the compressed value might have ended in other >> table), but that is no longer true. > > Good point, since there is recompression now, the options could be > safely removed in case of dropping table. It will complicate pg_upgrade > but I think this is solvable. > +1 to do that. I've never dealt with pg_upgrade, but I suppose this shouldn't be more complicated than for custom data types, right? >> >> A few more comments: >> >> 1) The patch makes optionListToArray (in foreigncmds.c) non-static, >> but it's not used anywhere. So this seems like change that is no >> longer necessary. > > I use this function in CreateCompressionOptions. > Ah, my mistake. I only did 'git grep' which however does not search in new files (not added to git). But it seems a bit strange to have the function in foreigncmds.c, though, now that we use it outside of FDWs. >> >> 2) I see we add two un-reserved keywords in gram.y - COMPRESSION and >> COMPRESSED. Perhaps COMPRESSION would be enough? I mean, we could do >> >> CREATE TABLE t (c TEXT COMPRESSION cm1); >> ALTER ... SET COMPRESSION name ... >> ALTER ... SET COMPRESSION none; >> >> Although I agree the "SET COMPRESSION none" is a bit strange. > > I agree, I've already changed syntax for the next version of the patch. > It's COMPRESSION instead of COMPRESSED and DROP COMPRESSION instead of > SET NOT COMPRESSED. Minus one word from grammar and it looks nicer. > I'm not sure DROP COMPRESSION is a good idea. It implies that the data will be uncompressed, but I assume it merely switches to the built-in compression (pglz), right? Although "SET COMPRESSION none" has the same issue ... BTW, when you do DROP COMPRESSION (or whatever syntax we end up using), will that remove the dependencies on the compression method? I haven't tried, so maybe it does. >> >> 3) heap_prepare_insert uses this chunk of code >> >> + else if (HeapTupleHasExternal(tup) >> + || RelationGetDescr(relation)->tdflags & >> TD_ATTR_CUSTOM_COMPRESSED >> + || HeapTupleHasCustomCompressed(tup) >> + || tup->t_len > TOAST_TUPLE_THRESHOLD) >> >> Shouldn't that be rather >> >> + else if (HeapTupleHasExternal(tup) >> + || (RelationGetDescr(relation)->tdflags & >> TD_ATTR_CUSTOM_COMPRESSED >> + && HeapTupleHasCustomCompressed(tup)) >> + || tup->t_len > TOAST_TUPLE_THRESHOLD) > > These conditions used for opposite directions. 
> HeapTupleHasCustomCompressed(tup) is true if tuple has compressed > datums inside and we need to decompress them first, and > TD_ATTR_CUSTOM_COMPRESSED flag means that relation we're putting the > data have columns with custom compression and maybe we need to compress > datums in current tuple. > Ah, right, now it makes sense. Thanks for explaining. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/30/2017 09:51 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote: > >>> CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER >>> tsvector_compression_handler; >> >> Understood. Good to know you've considered it, and I agree it doesn't >> need to be there from the start (which makes the patch simpler). > > Just passing by, but wouldn't this fit in the ACCESS METHOD group of > commands? So this could be simplified down to > CREATE ACCESS METHOD ts1 TYPE COMPRESSION > we have that for indexes and there are patches flying for heap storage, > sequences, etc. I think that's simpler than trying to invent all new > commands here. Then (in a future patch) you can use ALTER TYPE to > define compression for that type, or even add a column-level option to > reference a specific compression method. > I think that would conflate two very different concepts. In my mind, access methods define how rows are stored. Compression methods are an orthogonal concept, e.g. you can compress a value (using a custom compression algorithm) and store it in an index (using whatever access method it's using). So not only access methods operate on rows (while compression operates on varlena values), but you can combine those two things together. I don't see how you could do that if both are defined as "access methods" ... Furthermore, the "TYPE" in CREATE COMPRESSION method was meant to restrict the compression algorithm to a particular data type (so, if it relies on tsvector, you can't apply it to text columns). Which is very different from "TYPE COMPRESSION" in CREATE ACCESS METHOD. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 30, 2017 at 2:47 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > OK. I think it's a nice use case (and nice gains on the compression > ratio), demonstrating the datatype-aware compression. The question is > why shouldn't this be built into the datatypes directly? Tomas, thanks for running benchmarks of this. I was surprised to see how little improvement there was from other modern compression methods, although lz4 did appear to be a modest win on both size and speed. But I share your intuition that a lot of the interesting work is in datatype-specific compression algorithms. I have noticed in a number of papers that I've read that teaching other parts of the system to operate directly on the compressed data, especially for column stores, is a critical performance optimization; of course, that only makes sense if the compression is datatype-specific. I don't know exactly what that means for the design of this patch, though. As a general point, no matter which way you go, you have to somehow deal with on-disk compatibility. If you want to build in compression to the datatype itself, you need to find at least one bit someplace to mark the fact that you applied built-in compression. If you want to build it in as a separate facility, you need to denote the compression used someplace else. I haven't looked at how this patch does it, but the proposal in the past has been to add a value to vartag_external. One nice thing about the latter method is that it can be used for any data type generically, regardless of how much bit-space is available in the data type representation itself. It's realistically hard to think of a data-type that has no bit space available anywhere but is still subject to data-type specific compression; bytea definitionally has no bit space but is also can't benefit from special-purpose compression, whereas even something like text could be handled by starting the varlena with a NUL byte to indicate compressed data following. However, you'd have to come up with a different trick for each data type. Piggybacking on the TOAST machinery avoids that. It also implies that we only try to compress values that are "big", which is probably be desirable if we're talking about a kind of compression that makes comprehending the value slower. Not all types of compression do, cf. commit 145343534c153d1e6c3cff1fa1855787684d9a38, and for those that don't it probably makes more sense to just build it into the data type. All of that is a somewhat separate question from whether we should have CREATE / DROP COMPRESSION, though (or Alvaro's proposal of using the ACCESS METHOD stuff instead). Even if we agree that piggybacking on TOAST is a good way to implement pluggable compression methods, it doesn't follow that the compression method is something that should be attached to the datatype from the outside; it could be built into it in a deep way. For example, "packed" varlenas (1-byte header) are a form of compression, and the default functions for detoasting always produced unpacked values, but the operators for the text data type know how to operate on the packed representation. That's sort of a trivial example, but it might well be that there are other cases where we can do something similar. Maybe jsonb, for example, can compress data in such a way that some of the jsonb functions can operate directly on the compressed representation -- perhaps the number of keys is easily visible, for example, or maybe more. 
In this view of the world, each data type should get to define its own compression method (or methods) but they are hard-wired into the datatype and you can't add more later, or if you do, you lose the advantages of the hard-wired stuff. BTW, another related concept that comes up a lot in discussions of this area is that we could do a lot better compression of columns if we had some place to store a per-column dictionary. I don't really know how to make that work. We could have a catalog someplace that stores an opaque blob for each column configured to use a compression method, and let the compression method store whatever it likes in there. That's probably fine if you are compressing the whole table at once and the blob is static thereafter. But if you want to update that blob as you see new column values there seem to be almost insurmountable problems. To be clear, I'm not trying to load this patch down with a requirement to solve every problem in the universe. On the other hand, I think it would be easy to beat a patch like this into shape in a fairly mechanical way and then commit-and-forget. That might be leaving a lot of money on the table; I'm glad you are thinking about the bigger picture and hope that my thoughts here somehow contribute. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
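Robert's aside about marking compressed text with a leading NUL byte can be sketched very simply; this is only an illustration of that idea, not anything the patch does.

#include "postgres.h"

/*
 * A valid text value never contains a NUL byte, so a leading '\0' could act
 * as an in-band marker that a type-specific compressed encoding follows.
 * Illustrative only.
 */
static bool
text_uses_special_compression(struct varlena *v)
{
	return VARSIZE_ANY_EXHDR(v) > 0 && VARDATA_ANY(v)[0] == '\0';
}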
On 12/01/2017 03:23 PM, Robert Haas wrote: > On Thu, Nov 30, 2017 at 2:47 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> OK. I think it's a nice use case (and nice gains on the compression >> ratio), demonstrating the datatype-aware compression. The question is >> why shouldn't this be built into the datatypes directly? > > Tomas, thanks for running benchmarks of this. I was surprised to see > how little improvement there was from other modern compression > methods, although lz4 did appear to be a modest win on both size and > speed. But I share your intuition that a lot of the interesting work > is in datatype-specific compression algorithms. I have noticed in a > number of papers that I've read that teaching other parts of the > system to operate directly on the compressed data, especially for > column stores, is a critical performance optimization; of course, that > only makes sense if the compression is datatype-specific. I don't > know exactly what that means for the design of this patch, though. > It has very little impact on this patch, as it has nothing to do with columnar storage. That is, each value is compressed independently. Column stores exploit the fact that they get a vector of values, compressed in some data-aware way. E.g. some form of RLE or dictionary compression, which allows them to evaluate expressions on the compressed vector. But that's irrelevant here, we only get row-by-row execution. Note: The idea to build dictionary for the whole jsonb column (which this patch should allow) does not make it "columnar compression" in the "column store" way. The executor will still get the decompressed value. > As a general point, no matter which way you go, you have to somehow > deal with on-disk compatibility. If you want to build in compression > to the datatype itself, you need to find at least one bit someplace to > mark the fact that you applied built-in compression. If you want to > build it in as a separate facility, you need to denote the compression > used someplace else. I haven't looked at how this patch does it, but > the proposal in the past has been to add a value to vartag_external. AFAICS the patch does that by setting a bit in the varlena header, and then adding OID of the compression method after the varlena header. So you get (verlena header + OID + data). This has good and bad consequences. Good: It's transparent for the datatype, so it does not have to worry about the custom compression at all (and it may change arbitrarily). Bad: It's transparent for the datatype, so it can't operate directly on the compressed representation. I don't think this is an argument against the patch, though. If the datatype can support intelligent compression (and execution without decompression), it has to be done in the datatype anyway. > One nice thing about the latter method is that it can be used for any > data type generically, regardless of how much bit-space is available > in the data type representation itself. It's realistically hard to > think of a data-type that has no bit space available anywhere but is > still subject to data-type specific compression; bytea definitionally > has no bit space but is also can't benefit from special-purpose > compression, whereas even something like text could be handled by > starting the varlena with a NUL byte to indicate compressed data > following. However, you'd have to come up with a different trick for > each data type. Piggybacking on the TOAST machinery avoids that. 
It > also implies that we only try to compress values that are "big", which > is probably be desirable if we're talking about a kind of compression > that makes comprehending the value slower. Not all types of > compression do, cf. commit 145343534c153d1e6c3cff1fa1855787684d9a38, > and for those that don't it probably makes more sense to just build it > into the data type. > > All of that is a somewhat separate question from whether we should > have CREATE / DROP COMPRESSION, though (or Alvaro's proposal of using > the ACCESS METHOD stuff instead). Even if we agree that piggybacking > on TOAST is a good way to implement pluggable compression methods, it > doesn't follow that the compression method is something that should be > attached to the datatype from the outside; it could be built into it > in a deep way. For example, "packed" varlenas (1-byte header) are a > form of compression, and the default functions for detoasting always > produced unpacked values, but the operators for the text data type > know how to operate on the packed representation. That's sort of a > trivial example, but it might well be that there are other cases where > we can do something similar. Maybe jsonb, for example, can compress > data in such a way that some of the jsonb functions can operate > directly on the compressed representation -- perhaps the number of > keys is easily visible, for example, or maybe more. In this view of > the world, each data type should get to define its own compression > method (or methods) but they are hard-wired into the datatype and you > can't add more later, or if you do, you lose the advantages of the > hard-wired stuff. > I agree with these thoughts in general, but I'm not quite sure what is your conclusion regarding the patch. The patch allows us to define custom compression methods that are entirely transparent for the datatype machinery, i.e. allow compression even for data types that did not consider compression at all. That seems valuable to me. Of course, if the same compression logic can be built into the datatype itself, it may allow additional benefits (like execution on compressed data directly). I don't see these two approaches as conflicting. > > BTW, another related concept that comes up a lot in discussions of > this area is that we could do a lot better compression of columns if > we had some place to store a per-column dictionary. I don't really > know how to make that work. We could have a catalog someplace that > stores an opaque blob for each column configured to use a compression > method, and let the compression method store whatever it likes in > there. That's probably fine if you are compressing the whole table at > once and the blob is static thereafter. But if you want to update > that blob as you see new column values there seem to be almost > insurmountable problems. > Well, that's kinda the idea behind the configure/drop methods in the compression handler, and Ildus already did implement such dictionary compression for the jsonb data type, see: https://github.com/postgrespro/jsonbd Essentially that stores the dictionary in a table, managed by a bunch of background workers. > > To be clear, I'm not trying to load this patch down with a requirement > to solve every problem in the universe. On the other hand, I think it > would be easy to beat a patch like this into shape in a fairly > mechanical way and then commit-and-forget. 
That might be leaving a > lot of money on the table; I'm glad you are thinking about the bigger > picture and hope that my thoughts here somehow contribute. > Thanks ;-) regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
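For readers following the layout discussion above, here is a sketch of the (varlena header + OID + data) arrangement described in the preceding message; the struct and field names are illustrative, not the patch's actual definitions.

#include "postgres.h"

typedef struct CustomCompressedData	/* illustrative, not from the patch */
{
	int32		vl_len_;	/* ordinary 4-byte varlena header; one spare bit
							 * marks the datum as custom compressed */
	int32		info;		/* raw (uncompressed) size plus flag bits */
	Oid			cmoptoid;	/* compression options row describing how to
							 * decompress the payload */
	char		data[FLEXIBLE_ARRAY_MEMBER];	/* compressed bytes */
} CustomCompressedData;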
On Fri, Dec 1, 2017 at 10:18 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > It has very little impact on this patch, as it has nothing to do with > columnar storage. That is, each value is compressed independently. I understand that this patch is not about columnar storage, but I think the idea that we may want to operate on the compressed data directly is not only applicable to that case. > I agree with these thoughts in general, but I'm not quite sure what is > your conclusion regarding the patch. I have not reached one. Sometimes I like to discuss problems before deciding what I think. :-) It does seem to me that the patch may be aiming at a relatively narrow target in a fairly large problem space, but I don't know whether to label that as short-sightedness or prudent incrementalism. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra wrote: > On 11/30/2017 09:51 PM, Alvaro Herrera wrote: > > Just passing by, but wouldn't this fit in the ACCESS METHOD group of > > commands? So this could be simplified down to > > CREATE ACCESS METHOD ts1 TYPE COMPRESSION > > we have that for indexes and there are patches flying for heap storage, > > sequences, etc. > > I think that would conflate two very different concepts. In my mind, > access methods define how rows are stored. In mine, they define how things are accessed (i.e. more general than what you're thinking). We *currently* use them to store rows [in indexes], but there is no reason why we couldn't expand that. So we group access methods in "types"; the current type we have is for indexes, and methods in that type define how are indexes accessed. This new type would indicate how would values be compressed. I disagree that there is no parallel there. I'm trying to avoid pointless proliferation of narrowly defined DDL commands. > Furthermore, the "TYPE" in CREATE COMPRESSION method was meant to > restrict the compression algorithm to a particular data type (so, if it > relies on tsvector, you can't apply it to text columns). Yes, of course. I'm saying that the "datatype" property of a compression access method would be declared somewhere else, not in the TYPE clause of the CREATE ACCESS METHOD command. Perhaps it makes sense to declare that a certain compression access method is good only for a certain data type, and then you can put that in the options clause, "CREATE ACCESS METHOD hyperz TYPE COMPRESSION WITH (type = tsvector)". But many compression access methods would be general in nature and so could be used for many datatypes (say, snappy). To me it makes sense to say "let's create this method which is for data compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) followed by either "let's use this new compression method for the type tsvector" (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's use this new compression method for the column tc" (ALTER TABLE ALTER COLUMN tc SET COMPRESSION hyperz). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Ildus Kurbangaliev wrote: > If the table is big, decompression could take an eternity. That's why i > decided to only to disable it and the data could be decompressed using > compression options. > > My idea was to keep compression options forever, since there will not > be much of them in one database. Still that requires that extension is > not removed. > > I will try to find a way how to recompress data first in case it moves > to another table. I think what you should do is add a dependency between a column that compresses using a method, and that method. So the method cannot be dropped and leave compressed data behind. Since the method is part of the extension, the extension cannot be dropped either. If you ALTER the column so that it uses another compression method, then the table is rewritten and the dependency is removed; once you do that for all the columns that use the compression method, the compression method can be dropped. Maybe our dependency code needs to be extended in order to support this. I think the current logic would drop the column if you were to do "DROP COMPRESSION .. CASCADE", but I'm not sure we'd see that as a feature. I'd rather have DROP COMPRESSION always fail instead until no columns use it. Let's hear other's opinions on this bit though. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/01/2017 08:48 PM, Alvaro Herrera wrote: > Ildus Kurbangaliev wrote: > >> If the table is big, decompression could take an eternity. That's why i >> decided to only to disable it and the data could be decompressed using >> compression options. >> >> My idea was to keep compression options forever, since there will not >> be much of them in one database. Still that requires that extension is >> not removed. >> >> I will try to find a way how to recompress data first in case it moves >> to another table. > > I think what you should do is add a dependency between a column that > compresses using a method, and that method. So the method cannot be > dropped and leave compressed data behind. Since the method is part of > the extension, the extension cannot be dropped either. If you ALTER > the column so that it uses another compression method, then the table is > rewritten and the dependency is removed; once you do that for all the > columns that use the compression method, the compression method can be > dropped. > +1 to do the rewrite, just like for other similar ALTER TABLE commands > > Maybe our dependency code needs to be extended in order to support this. > I think the current logic would drop the column if you were to do "DROP > COMPRESSION .. CASCADE", but I'm not sure we'd see that as a feature. > I'd rather have DROP COMPRESSION always fail instead until no columns > use it. Let's hear other's opinions on this bit though. > Why should this behave differently compared to data types? Seems quite against POLA, if you ask me ... If you want to remove the compression, you can do the SET NOT COMPRESSED (or whatever syntax we end up using), and then DROP COMPRESSION METHOD. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
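In other words, with the syntax floated in this thread, the intended sequence would be roughly the following (a sketch; "cm1" is a placeholder name, and it assumes the SET NOT COMPRESSED form triggers the rewrite discussed above):

    -- detach the compression from the column, rewriting/decompressing it
    ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

    -- once no column depends on it any more, the method can go away
    DROP COMPRESSION METHOD cm1;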
On 12/01/2017 08:20 PM, Robert Haas wrote: > On Fri, Dec 1, 2017 at 10:18 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> It has very little impact on this patch, as it has nothing to do with >> columnar storage. That is, each value is compressed independently. > > I understand that this patch is not about columnar storage, but I > think the idea that we may want to operate on the compressed data > directly is not only applicable to that case. > Yeah. To clarify, my point was that column stores benefit from compressing many values at once, and then operating on this compressed vector. That is not what this patch is doing (or can do), of course. But I certainly do agree that if the compression can be integrated into the data type, allowing processing on compressed representation, then that will beat whatever this patch is doing, of course ... >> >> I agree with these thoughts in general, but I'm not quite sure >> what is your conclusion regarding the patch. > > I have not reached one. Sometimes I like to discuss problems before > deciding what I think. :-) > That's lame! Let's make decisions without discussion ;-) > > It does seem to me that the patch may be aiming at a relatively narrow > target in a fairly large problem space, but I don't know whether to > label that as short-sightedness or prudent incrementalism. > I don't know either. I don't think people will start switching their text columns to lz4 just because they can, or because they get 4% space reduction compared to pglz. But the ability to build per-column dictionaries seems quite powerful, I guess. And I don't think that can be easily built directly into JSONB, because we don't have a way to provide information about the column (i.e. how would you fetch the correct dictionary?). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 1, 2017 at 2:38 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > In mine, they define how things are accessed (i.e. more general than > what you're thinking). We *currently* use them to store rows [in > indexes], but there is no reason why we couldn't expand that. > > So we group access methods in "types"; the current type we have is for > indexes, and methods in that type define how are indexes accessed. This > new type would indicate how would values be compressed. I disagree that > there is no parallel there. +1. > I'm trying to avoid pointless proliferation of narrowly defined DDL > commands. I also think that's an important goal. > Yes, of course. I'm saying that the "datatype" property of a > compression access method would be declared somewhere else, not in the > TYPE clause of the CREATE ACCESS METHOD command. Perhaps it makes sense > to declare that a certain compression access method is good only for a > certain data type, and then you can put that in the options clause, > "CREATE ACCESS METHOD hyperz TYPE COMPRESSION WITH (type = tsvector)". > But many compression access methods would be general in nature and so > could be used for many datatypes (say, snappy). > > To me it makes sense to say "let's create this method which is for data > compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) followed by > either "let's use this new compression method for the type tsvector" > (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's use this new > compression method for the column tc" (ALTER TABLE ALTER COLUMN tc SET > COMPRESSION hyperz). +1 to this, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Dec 1, 2017 at 4:06 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >>> I agree with these thoughts in general, but I'm not quite sure >>> what is your conclusion regarding the patch. >> >> I have not reached one. Sometimes I like to discuss problems before >> deciding what I think. :-) > > That's lame! Let's make decisions without discussion ;-) Oh, right. What was I thinking? >> It does seem to me that the patch may be aiming at a relatively narrow >> target in a fairly large problem space, but I don't know whether to >> label that as short-sightedness or prudent incrementalism. > > I don't know either. I don't think people will start switching their > text columns to lz4 just because they can, or because they get 4% space > reduction compared to pglz. Honestly, if we can give everybody a 4% space reduction by switching to lz4, I think that's totally worth doing -- but let's not make people choose it, let's make it the default going forward, and keep pglz support around so we don't break pg_upgrade compatibility (and so people can continue to choose it if for some reason it works better in their use case). That kind of improvement is nothing special in a specific workload, but TOAST is a pretty general-purpose mechanism. I have become, through a few bitter experiences, a strong believer in the value of trying to reduce our on-disk footprint, and knocking 4% off the size of every TOAST table in the world does not sound worthless to me -- even though context-aware compression can doubtless do a lot better. > But the ability to build per-column dictionaries seems quite powerful, I > guess. And I don't think that can be easily built directly into JSONB, > because we don't have a way to provide information about the column > (i.e. how would you fetch the correct dictionary?). That's definitely a problem, but I think we should mull it over a bit more before giving up. I have a few thoughts, but the part of my life that doesn't happen on the PostgreSQL mailing list precludes expounding on them right this minute. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/01/2017 08:38 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> On 11/30/2017 09:51 PM, Alvaro Herrera wrote: > >>> Just passing by, but wouldn't this fit in the ACCESS METHOD group of >>> commands? So this could be simplified down to >>> CREATE ACCESS METHOD ts1 TYPE COMPRESSION >>> we have that for indexes and there are patches flying for heap storage, >>> sequences, etc. >> >> I think that would conflate two very different concepts. In my mind, >> access methods define how rows are stored. > > In mine, they define how things are accessed (i.e. more general than > what you're thinking). We *currently* use them to store rows [in > indexes], but there is no reason why we couldn't expand that. > Not sure I follow. My argument was not as much about whether the rows are stored as rows or in some other (columnar) format, but that access methods deal with "tuples" (i.e. row in the "logical" way). I assume that even if we end up implementing other access method types, they will still be tuple-based. OTOH compression methods (at least as introduced by this patch) operate on individual values, and have very little to do with access to the value (in a sense it's a transparent thing). > > So we group access methods in "types"; the current type we have is for > indexes, and methods in that type define how are indexes accessed. This > new type would indicate how would values be compressed. I disagree that > there is no parallel there. > > I'm trying to avoid pointless proliferation of narrowly defined DDL > commands. > Of course, the opposite case is using the same DDL for very different concepts (although I understand you don't see it that way). But in fairness, I don't really care if we call this COMPRESSION METHOD or ACCESS METHOD or DARTH VADER ... >> Furthermore, the "TYPE" in CREATE COMPRESSION method was meant to >> restrict the compression algorithm to a particular data type (so, if it >> relies on tsvector, you can't apply it to text columns). > > Yes, of course. I'm saying that the "datatype" property of a > compression access method would be declared somewhere else, not in the > TYPE clause of the CREATE ACCESS METHOD command. Perhaps it makes sense > to declare that a certain compression access method is good only for a > certain data type, and then you can put that in the options clause, > "CREATE ACCESS METHOD hyperz TYPE COMPRESSION WITH (type = tsvector)". > But many compression access methods would be general in nature and so > could be used for many datatypes (say, snappy). > > To me it makes sense to say "let's create this method which is for data > compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) followed by > either "let's use this new compression method for the type tsvector" > (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's use this new > compression method for the column tc" (ALTER TABLE ALTER COLUMN tc SET > COMPRESSION hyperz). > The WITH syntax does not seem particularly pretty to me, TBH. I'd be much happier with "TYPE tsvector" and leaving WITH for the options specific to each compression method. FWIW I think syntax is the least critical part of this patch. It's ~1% of the patch, and the gram.y additions are rather trivial. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-12-01 16:14:58 -0500, Robert Haas wrote: > Honestly, if we can give everybody a 4% space reduction by switching > to lz4, I think that's totally worth doing -- but let's not make > people choose it, let's make it the default going forward, and keep > pglz support around so we don't break pg_upgrade compatibility (and so > people can continue to choose it if for some reason it works better in > their use case). That kind of improvement is nothing special in a > specific workload, but TOAST is a pretty general-purpose mechanism. I > have become, through a few bitter experiences, a strong believer in > the value of trying to reduce our on-disk footprint, and knocking 4% > off the size of every TOAST table in the world does not sound > worthless to me -- even though context-aware compression can doubtless > do a lot better. +1. It's also a lot faster, and I've seen way way to many workloads with 50%+ time spent in pglz. Greetings, Andres Freund
Tomas Vondra wrote: > On 12/01/2017 08:48 PM, Alvaro Herrera wrote: > > Maybe our dependency code needs to be extended in order to support this. > > I think the current logic would drop the column if you were to do "DROP > > COMPRESSION .. CASCADE", but I'm not sure we'd see that as a feature. > > I'd rather have DROP COMPRESSION always fail instead until no columns > > use it. Let's hear other's opinions on this bit though. > > Why should this behave differently compared to data types? Seems quite > against POLA, if you ask me ... OK, DROP TYPE sounds good enough precedent, so +1 on that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 1 Dec 2017 16:38:42 -0300 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > To me it makes sense to say "let's create this method which is for > data compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) > followed by either "let's use this new compression method for the > type tsvector" (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's > use this new compression method for the column tc" (ALTER TABLE ALTER > COLUMN tc SET COMPRESSION hyperz). > Hi, I think if CREATE ACCESS METHOD can be used for compression, then it could be nicer than CREATE COMPRESSION METHOD. I just don't know whether compression fits as an access method or not. Anyway, it's easy to change the syntax and I don't mind doing it, if that is necessary for the patch to be committed. -- ---- Regards, Ildus Kurbangaliev
On 12/01/2017 10:52 PM, Andres Freund wrote: > On 2017-12-01 16:14:58 -0500, Robert Haas wrote: >> Honestly, if we can give everybody a 4% space reduction by >> switching to lz4, I think that's totally worth doing -- but let's >> not make people choose it, let's make it the default going forward, >> and keep pglz support around so we don't break pg_upgrade >> compatibility (and so people can continue to choose it if for some >> reason it works better in their use case). That kind of improvement >> is nothing special in a specific workload, but TOAST is a pretty >> general-purpose mechanism. I have become, through a few bitter >> experiences, a strong believer in the value of trying to reduce our >> on-disk footprint, and knocking 4% off the size of every TOAST >> table in the world does not sound worthless to me -- even though >> context-aware compression can doubtless do a lot better. > > +1. It's also a lot faster, and I've seen way way to many workloads > with 50%+ time spent in pglz. > TBH the 4% figure is something I mostly made up (I'm fake news!). On the mailing list archive (which I believe is pretty compressible) I observed something like 2.5% size reduction with lz4 compared to pglz, at least with the compression levels I've used ... Other algorithms (e.g. zstd) got significantly better compression (25%) compared to pglz, but in exchange for longer compression. I'm sure we could lower compression level to make it faster, but that will of course hurt the compression ratio. I don't think switching to a different compression algorithm is a way forward - it was proposed and explored repeatedly in the past, and every time it failed for a number of reasons, most of which are still valid. Firstly, it's going to be quite hard (or perhaps impossible) to find an algorithm that is "universally better" than pglz. Some algorithms do work better for text documents, some for binary blobs, etc. I don't think there's a win-win option. Sure, there are workloads where pglz performs poorly (I've seen such cases too), but IMHO that's more an argument for the custom compression method approach. pglz gives you good default compression in most cases, and you can change it for columns where it matters, and where a different space/time trade-off makes sense. Secondly, all the previous attempts ran into some legal issues, i.e. licensing and/or patents. Maybe the situation changed since then (no idea, haven't looked into that), but in the past the "pluggable" approach was proposed as a way to address this. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Dec 2, 2017, at 6:04 PM, Tomas Vondra wrote:
> On 12/01/2017 10:52 PM, Andres Freund wrote:
>> On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
>>> Honestly, if we can give everybody a 4% space reduction by
>>> switching to lz4, I think that's totally worth doing -- but let's
>>> not make people choose it, let's make it the default going forward,
>>> and keep pglz support around so we don't break pg_upgrade
>>> compatibility (and so people can continue to choose it if for some
>>> reason it works better in their use case). That kind of improvement
>>> is nothing special in a specific workload, but TOAST is a pretty
>>> general-purpose mechanism. I have become, through a few bitter
>>> experiences, a strong believer in the value of trying to reduce our
>>> on-disk footprint, and knocking 4% off the size of every TOAST
>>> table in the world does not sound worthless to me -- even though
>>> context-aware compression can doubtless do a lot better.
>>
>> +1. It's also a lot faster, and I've seen way way to many workloads
>> with 50%+ time spent in pglz.
>>
>
> TBH the 4% figure is something I mostly made up (I'm fake news!). On the
> mailing list archive (which I believe is pretty compressible) I observed
> something like 2.5% size reduction with lz4 compared to pglz, at least
> with the compression levels I've used ...
>
> Other algorithms (e.g. zstd) got significantly better compression (25%)
> compared to pglz, but in exchange for longer compression. I'm sure we
> could lower compression level to make it faster, but that will of course
> hurt the compression ratio.
>
> I don't think switching to a different compression algorithm is a way
> forward - it was proposed and explored repeatedly in the past, and every
> time it failed for a number of reasons, most of which are still valid.
>
>
> Firstly, it's going to be quite hard (or perhaps impossible) to find an
> algorithm that is "universally better" than pglz. Some algorithms do
> work better for text documents, some for binary blobs, etc. I don't
> think there's a win-win option.
>
> Sure, there are workloads where pglz performs poorly (I've seen such
> cases too), but IMHO that's more an argument for the custom compression
> method approach. pglz gives you good default compression in most cases,
> and you can change it for columns where it matters, and where a
> different space/time trade-off makes sense.
>
>
> Secondly, all the previous attempts ran into some legal issues, i.e.
> licensing and/or patents. Maybe the situation changed since then (no
> idea, haven't looked into that), but in the past the "pluggable"
> approach was proposed as a way to address this.
>
>
Maybe it will be interesting for you to see the following results of applying page-level compression (CFS in PgPro-EE) to pgbench data:
Configuration            Size (Gb)   Time (sec)
vanilla postgres         15.31       92
zlib (default level)     2.37        284
zlib (best speed)        2.43        191
postgres internal lz     3.89        214
lz4                      4.12        95
snappy (google)          5.18        99
lzfse (apple)            2.80        1099
zstd (facebook)          1.69        125
All algorithms (except zlib) were used with the best-speed option: using a higher compression level usually has a fairly small impact on the compression ratio (<30%), but can increase compression time several times over.
Certainly pgbench is not the best candidate for testing compression algorithms: it generates a lot of artificial and redundant data.
But we also measured it on real customer data, and zstd still seems to be the best compression method: it provides good compression with the smallest CPU overhead.
Hi, On 2017-12-02 16:04:52 +0100, Tomas Vondra wrote: > Firstly, it's going to be quite hard (or perhaps impossible) to find an > algorithm that is "universally better" than pglz. Some algorithms do > work better for text documents, some for binary blobs, etc. I don't > think there's a win-win option. lz4 is pretty much there. > Secondly, all the previous attempts ran into some legal issues, i.e. > licensing and/or patents. Maybe the situation changed since then (no > idea, haven't looked into that), but in the past the "pluggable" > approach was proposed as a way to address this. Those were pretty bogus. I think we're not doing our users a favor if they've to download some external projects, then fiddle with things, just to not choose a compression algorithm that's been known bad for at least 5+ years. If we've a decent algorithm in-core *and* then allow extensibility, that's one thing, but keeping the bad and tell forks "please take our users with this code we give you" is ... Greetings, Andres Freund
On 12/02/2017 09:24 PM, konstantin knizhnik wrote: > > On Dec 2, 2017, at 6:04 PM, Tomas Vondra wrote: > >> On 12/01/2017 10:52 PM, Andres Freund wrote: >> ... >> >> Other algorithms (e.g. zstd) got significantly better compression (25%) >> compared to pglz, but in exchange for longer compression. I'm sure we >> could lower compression level to make it faster, but that will of course >> hurt the compression ratio. >> >> I don't think switching to a different compression algorithm is a way >> forward - it was proposed and explored repeatedly in the past, and every >> time it failed for a number of reasons, most of which are still valid. >> >> >> Firstly, it's going to be quite hard (or perhaps impossible) to >> find an algorithm that is "universally better" than pglz. Some >> algorithms do work better for text documents, some for binary >> blobs, etc. I don't think there's a win-win option. >> >> Sure, there are workloads where pglz performs poorly (I've seen >> such cases too), but IMHO that's more an argument for the custom >> compression method approach. pglz gives you good default >> compression in most cases, and you can change it for columns where >> it matters, and where a different space/time trade-off makes >> sense. >> >> >> Secondly, all the previous attempts ran into some legal issues, i.e. >> licensing and/or patents. Maybe the situation changed since then (no >> idea, haven't looked into that), but in the past the "pluggable" >> approach was proposed as a way to address this. >> >> > > May be it will be interesting for you to see the following results > of applying page-level compression (CFS in PgPro-EE) to pgbench > data: > I don't follow. If I understand what CFS does correctly (and I'm mostly guessing here, because I haven't seen the code published anywhere, and I assume it's proprietary), it essentially compresses whole 8kB blocks. I don't know it reorganizes the data into columnar format first, in some way (to make it more "columnar" which is more compressible), which would make somewhat similar to page-level compression in Oracle. But it's clearly a very different approach from what the patch aims to improve (compressing individual varlena values). > > All algorithms (except zlib) were used with best-speed option: using > better compression level usually has not so large impact on > compression ratio (<30%), but can significantly increase time > (several times). Certainly pgbench isnot the best candidate for > testing compression algorithms: it generates a lot of artificial and > redundant data. But we measured it also on real customers data and > still zstd seems to be the best compression methods: provides good > compression with smallest CPU overhead. > I think this really depends on the dataset, and drawing conclusions based on a single test is somewhat crazy. Especially when it's synthetic pgbench data with lots of inherent redundancy - sequential IDs, ... My takeaway from the results is rather that page-level compression may be very beneficial in some cases, although I wonder how much of that can be gained by simply using compressed filesystem (thus making it transparent to PostgreSQL). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/02/2017 09:38 PM, Andres Freund wrote: > Hi, > > On 2017-12-02 16:04:52 +0100, Tomas Vondra wrote: >> Firstly, it's going to be quite hard (or perhaps impossible) to find an >> algorithm that is "universally better" than pglz. Some algorithms do >> work better for text documents, some for binary blobs, etc. I don't >> think there's a win-win option. > > lz4 is pretty much there. > That's a matter of opinion, I guess. It's a solid compression algorithm, that's for sure ... >> Secondly, all the previous attempts ran into some legal issues, i.e. >> licensing and/or patents. Maybe the situation changed since then (no >> idea, haven't looked into that), but in the past the "pluggable" >> approach was proposed as a way to address this. > > Those were pretty bogus. IANAL so I don't dare to judge on bogusness of such claims. I assume if we made it optional (e.g. configure/initdb option, it'd be much less of an issue). Of course, that has disadvantages too (because when you compile/init with one algorithm, and then find something else would work better for your data, you have to start from scratch). > > I think we're not doing our users a favor if they've to download > some external projects, then fiddle with things, just to not choose > a compression algorithm that's been known bad for at least 5+ years. > If we've a decent algorithm in-core *and* then allow extensibility, > that's one thing, but keeping the bad and tell forks "please take > our users with this code we give you" is ... > I don't understand what exactly is your issue with external projects, TBH. I think extensibility is one of the great strengths of Postgres. It's not all rainbows and unicorns, of course, and it has costs too. FWIW I don't think pglz is a "known bad" algorithm. Perhaps there are cases where other algorithms (e.g. lz4) are running circles around it, particularly when it comes to decompression speed, but I wouldn't say it's "known bad". Not sure which forks you're talking about ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Dec 2, 2017 at 6:04 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 12/01/2017 10:52 PM, Andres Freund wrote: >> On 2017-12-01 16:14:58 -0500, Robert Haas wrote: >>> Honestly, if we can give everybody a 4% space reduction by >>> switching to lz4, I think that's totally worth doing -- but let's >>> not make people choose it, let's make it the default going forward, >>> and keep pglz support around so we don't break pg_upgrade >>> compatibility (and so people can continue to choose it if for some >>> reason it works better in their use case). That kind of improvement >>> is nothing special in a specific workload, but TOAST is a pretty >>> general-purpose mechanism. I have become, through a few bitter >>> experiences, a strong believer in the value of trying to reduce our >>> on-disk footprint, and knocking 4% off the size of every TOAST >>> table in the world does not sound worthless to me -- even though >>> context-aware compression can doubtless do a lot better. >> >> +1. It's also a lot faster, and I've seen way way to many workloads >> with 50%+ time spent in pglz. >> > > TBH the 4% figure is something I mostly made up (I'm fake news!). On the > mailing list archive (which I believe is pretty compressible) I observed > something like 2.5% size reduction with lz4 compared to pglz, at least > with the compression levels I've used ... Nikita Glukhove tested compression on real json data: Delicious bookmarks (js): json 1322MB jsonb 1369MB jsonbc 931MB 1.5x jsonb+lz4d 404MB 3.4x Citus customer reviews (jr): json 1391MB jsonb 1574MB jsonbc 622MB 2.5x jsonb+lz4d 601MB 2.5x I also attached a plot with wired tiger size (zstd compression) in Mongodb. Nikita has more numbers about compression. > > Other algorithms (e.g. zstd) got significantly better compression (25%) > compared to pglz, but in exchange for longer compression. I'm sure we > could lower compression level to make it faster, but that will of course > hurt the compression ratio. > > I don't think switching to a different compression algorithm is a way > forward - it was proposed and explored repeatedly in the past, and every > time it failed for a number of reasons, most of which are still valid. > > > Firstly, it's going to be quite hard (or perhaps impossible) to find an > algorithm that is "universally better" than pglz. Some algorithms do > work better for text documents, some for binary blobs, etc. I don't > think there's a win-win option. > > Sure, there are workloads where pglz performs poorly (I've seen such > cases too), but IMHO that's more an argument for the custom compression > method approach. pglz gives you good default compression in most cases, > and you can change it for columns where it matters, and where a > different space/time trade-off makes sense. > > > Secondly, all the previous attempts ran into some legal issues, i.e. > licensing and/or patents. Maybe the situation changed since then (no > idea, haven't looked into that), but in the past the "pluggable" > approach was proposed as a way to address this. I don't think so. Pluggable means that now we have more data types, which don't fit to the old compression scheme of TOAST and we need better flexibility. I see in future we could avoid decompression of the whole toast just to get on key from document, so we first slice data and compress each slice separately. > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
On Fri, 1 Dec 2017 21:47:43 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > +1 to do the rewrite, just like for other similar ALTER TABLE commands Ok. What about the following syntax: ALTER COLUMN DROP COMPRESSION - removes compression from the column with a rewrite and removes the related compression options, so the user can drop the compression method. ALTER COLUMN SET COMPRESSION NONE for the cases when the users want to just disable compression for future tuples. After that they can keep the compressed tuples, or in the case of a large table they can decompress tuples partially using e.g. UPDATE, and then use ALTER COLUMN DROP COMPRESSION, which will then be much faster. ALTER COLUMN SET COMPRESSION <cm> WITH <cmoptions> will change compression for new tuples but will not touch old ones. If the users want recompression they can use the DROP/SET COMPRESSION combination. I don't think that SET COMPRESSION with a rewrite of the whole table will be useful enough on any reasonably big table, and at the same time big tables are where the user needs compression the most. I understand that ALTER with the rewrite sounds logical and is much easier to implement (and it doesn't require Oids in tuples), but it could be unusable. -- ---- Regards, Ildus Kurbangaliev
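Spelled out with a placeholder method name, the proposal above would read something like this (a sketch; "lz4cm" and the options placeholder are purely illustrative):

    -- remove compression entirely: rewrites the table and drops the
    -- related compression options
    ALTER TABLE t ALTER COLUMN a DROP COMPRESSION;

    -- stop compressing new tuples; old compressed tuples stay as they are
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION NONE;

    -- switch new tuples to another method without touching old ones
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION lz4cm WITH (...);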
On Wed, Dec 6, 2017 at 10:07 AM, Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > On Fri, 1 Dec 2017 21:47:43 +0100 > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> +1 to do the rewrite, just like for other similar ALTER TABLE commands > > Ok. What about the following syntax: > > ALTER COLUMN DROP COMPRESSION - removes compression from the column > with the rewrite and removes related compression options, so the user > can drop compression method. > > ALTER COLUMN SET COMPRESSION NONE for the cases when > the users want to just disable compression for future tuples. After > that they can keep compressed tuples, or in the case when they have a > large table they can decompress tuples partially using e.g. UPDATE, > and then use ALTER COLUMN DROP COMPRESSION which will be much faster > then. > > ALTER COLUMN SET COMPRESSION <cm> WITH <cmoptions> will change > compression for new tuples but will not touch old ones. If the users > want the recompression they can use DROP/SET COMPRESSION combination. > > I don't think that SET COMPRESSION with the rewrite of the whole table > will be useful enough on any somewhat big tables and same time big > tables is where the user needs compression the most. > > I understand that ALTER with the rewrite sounds logical and much easier > to implement (and it doesn't require Oids in tuples), but it could be > unusable. The problem with this is that old compression methods can still be floating around in the table even after you have done SET COMPRESSION to something else. The table still needs to have a dependency on the old compression method, because otherwise you might think it's safe to drop the old one when it really is not. Furthermore, if you do a pg_upgrade, you've got to preserve that dependency, which means it would have to show up in a pg_dump --binary-upgrade someplace. It's not obvious how any of that would work with this syntax. Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ... meaning that x1 is the default for new tuples but x2, x3, etc. are still allowed if present. If you issue a command that only adds things to the list, no table rewrite happens, but if you remove anything, then it does. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 8 Dec 2017 15:12:42 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > > Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ... > meaning that x1 is the default for new tuples but x2, x3, etc. are > still allowed if present. If you issue a command that only adds > things to the list, no table rewrite happens, but if you remove > anything, then it does. > I like this idea, but maybe it should be something like ALTER COLUMN SET COMPRESSION x1 [ PRESERVE x2, x3 ]? 'PRESERVE' is already used in the grammar, and this form shows more clearly which method is current and which ones should be kept. -- ---- Regards, Ildus Kurbangaliev
On Mon, Dec 11, 2017 at 7:55 AM, Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > On Fri, 8 Dec 2017 15:12:42 -0500 > Robert Haas <robertmhaas@gmail.com> wrote: >> Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ... >> meaning that x1 is the default for new tuples but x2, x3, etc. are >> still allowed if present. If you issue a command that only adds >> things to the list, no table rewrite happens, but if you remove >> anything, then it does. > > I like this idea, but maybe it should be something like ALTER COLUMN > SET COMPRESSION x1 [ PRESERVE x2, x3 ]? 'PRESERVE' is already used in > syntax and this syntax will show better which one is current and which > ones should be kept. Sure, that works. And I think pglz should exist in the catalog as a predefined, undroppable compression algorithm. So the default for each column initially is: SET COMPRESSION pglz And if you want to rewrite the table with your awesome custom thing, you can do SET COMPRESSION awesome But if you want to just use the awesome custom thing for new rows, you can do SET COMPRESSION awesome PRESERVE pglz Then we can get all the dependencies right, pg_upgrade works, users have total control of rewrite behavior, and everything is great. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
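Written out as full commands, that would be (a sketch; "awesome" is of course a placeholder):

    -- the initial, implicit default for every column
    ALTER TABLE tbl ALTER COLUMN c SET COMPRESSION pglz;

    -- rewrite the table, recompressing existing values with the new method
    ALTER TABLE tbl ALTER COLUMN c SET COMPRESSION awesome;

    -- compress only new rows with the new method; existing pglz-compressed
    -- values stay readable and the pglz dependency is kept
    ALTER TABLE tbl ALTER COLUMN c SET COMPRESSION awesome PRESERVE pglz;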
On Mon, Dec 11, 2017 at 8:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 11, 2017 at 7:55 AM, Ildus Kurbangaliev
<i.kurbangaliev@postgrespro.ru> wrote:
> On Fri, 8 Dec 2017 15:12:42 -0500
> Robert Haas <robertmhaas@gmail.com> wrote:
>> Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ...
>> meaning that x1 is the default for new tuples but x2, x3, etc. are
>> still allowed if present. If you issue a command that only adds
>> things to the list, no table rewrite happens, but if you remove
>> anything, then it does.
>
> I like this idea, but maybe it should be something like ALTER COLUMN
> SET COMPRESSION x1 [ PRESERVE x2, x3 ]? 'PRESERVE' is already used in
> syntax and this syntax will show better which one is current and which
> ones should be kept.
Sure, that works. And I think pglz should exist in the catalog as a
predefined, undroppable compression algorithm. So the default for
each column initially is:
SET COMPRESSION pglz
And if you want to rewrite the table with your awesome custom thing, you can do
SET COMPRESSION awesome
But if you want to just use the awesome custom thing for new rows, you can do
SET COMPRESSION awesome PRESERVE pglz
Then we can get all the dependencies right, pg_upgrade works, users
have total control of rewrite behavior, and everything is great. :-)
Looks good.
Thus, in your example, if the user would like to further change the awesome compression to evenbetter compression, she should write:
SET COMPRESSION evenbetter PRESERVE pglz, awesome; -- full list of previous compression methods
I wonder what we should do if the user specifies only part of the previous compression methods? For instance, pglz is specified but awesome is missing.
SET COMPRESSION evenbetter PRESERVE pglz; -- awesome is missing
I think we should trigger an error in this case, because the query is written in a form that assumes no table rewrite, but we're unable to do this without a table rewrite.
I also think that we need some way to change the compression method for multiple columns in a single table rewrite, because it would be way more efficient than rewriting the table for each of the columns. So as an alternative to
ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome; -- first table rewrite
ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome; -- second table rewrite
we could also provide
ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome PRESERVE pglz; -- no rewrite
ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome PRESERVE pglz; -- no rewrite
VACUUM FULL tbl RESET COMPRESSION PRESERVE c1, c2; -- rewrite with recompression of c1 and c2 and removing dependencies
?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Dec 11, 2017 at 12:41 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Thus, in your example if user would like to further change awesome > compression for evenbetter compression, she should write. > > SET COMPRESSION evenbetter PRESERVE pglz, awesome; -- full list of previous > compression methods Right. > I wonder what should we do if user specifies only part of previous > compression methods? For instance, pglz is specified but awesome is > missing. > > SET COMPRESSION evenbetter PRESERVE pglz; -- awesome is missing > > I think we should trigger an error in this case. Because query is specified > in form that is assuming to work without table rewrite, but we're unable to > do this without table rewrite. I think that should just rewrite the table in that case. PRESERVE should specify the things that are allowed to be preserved -- its mere presence should not be read to preclude a rewrite. And it's completely reasonable for someone to want to do this, if they are thinking about de-installing awesome. > I also think that we need some way to change compression method for multiple > columns in a single table rewrite. Because it would be way more efficient > than rewriting table for each of columns. So as an alternative of > > ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome; -- first table > rewrite > ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome; -- second table > rewrite > > we could also provide > > ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome PRESERVE pglz; -- no > rewrite > ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome PRESERVE pglz; -- no > rewrite > VACUUM FULL tbl RESET COMPRESSION PRESERVE c1, c2; -- rewrite with > recompression of c1 and c2 and removing depedencies > > ? Hmm. ALTER TABLE allows multi comma-separated subcommands, so I don't think we need to drag VACUUM into this. The user can just say: ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome, ALTER COLUMN c2 SET COMPRESSION awesome; If this is properly integrated into tablecmds.c, that should cause a single rewrite affecting both columns. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 11, 2017 at 8:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 11, 2017 at 12:41 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Thus, in your example if user would like to further change awesome
> compression for evenbetter compression, she should write.
>
> SET COMPRESSION evenbetter PRESERVE pglz, awesome; -- full list of previous
> compression methods
Right.
> I wonder what should we do if user specifies only part of previous
> compression methods? For instance, pglz is specified but awesome is
> missing.
>
> SET COMPRESSION evenbetter PRESERVE pglz; -- awesome is missing
>
> I think we should trigger an error in this case. Because query is specified
> in form that is assuming to work without table rewrite, but we're unable to
> do this without table rewrite.
I think that should just rewrite the table in that case. PRESERVE
should specify the things that are allowed to be preserved -- its mere
presence should not be read to preclude a rewrite. And it's
completely reasonable for someone to want to do this, if they are
thinking about de-installing awesome.
OK, but a NOTICE that a presumably unexpected table rewrite takes place could still be useful.
Also we probably should add some view that exposes the compression methods that are currently preserved for columns, so that the user can correctly construct a SET COMPRESSION query that doesn't rewrite the table, without digging into internals (like directly querying pg_depend).
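For instance, assuming the patch records a dependency from the (table, column) pair to a row in the pg_compression catalog, such a view might be backed by a query along these lines (a rough sketch only, not tied to the actual dependency layout in the patch):

    SELECT d.objid::regclass AS table_name,
           a.attname         AS column_name,
           d.refobjid        AS compression_method_oid
      FROM pg_depend d
      JOIN pg_attribute a
        ON a.attrelid = d.objid AND a.attnum = d.objsubid
     WHERE d.classid    = 'pg_class'::regclass
       AND d.refclassid = 'pg_compression'::regclass;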
> I also think that we need some way to change compression method for multiple
> columns in a single table rewrite. Because it would be way more efficient
> than rewriting table for each of columns. So as an alternative of
>
> ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome; -- first table
> rewrite
> ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome; -- second table
> rewrite
>
> we could also provide
>
> ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome PRESERVE pglz; -- no
> rewrite
> ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome PRESERVE pglz; -- no
> rewrite
> VACUUM FULL tbl RESET COMPRESSION PRESERVE c1, c2; -- rewrite with
> recompression of c1 and c2 and removing depedencies
>
> ?
Hmm. ALTER TABLE allows multi comma-separated subcommands, so I don't
think we need to drag VACUUM into this. The user can just say:
ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome, ALTER COLUMN
c2 SET COMPRESSION awesome;
If this is properly integrated into tablecmds.c, that should cause a
single rewrite affecting both columns.
OK. Sorry, I didn't notice we can use multiple subcommands for ALTER TABLE in this case...
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi, I see there's an ongoing discussion about the syntax and ALTER TABLE behavior when changing a compression method for a column. So the patch seems to be on the way to be ready in the January CF, I guess. But let me play the devil's advocate for a while and question the usefulness of this approach to compression. Some of the questions were mentioned in the thread before, but I don't think they got the attention they deserve. FWIW I don't know the answers, but I think it's important to ask them. Also, apologies if this post looks to be against the patch - that's part of the "devil's advocate" thing. The main question I'm asking myself is what use cases the patch addresses, and whether there is a better way to do that. I see about three main use-cases: 1) Replacing the algorithm used to compress all varlena types (in a way that makes it transparent for the data type code). 2) Custom datatype-aware compression (e.g. the tsvector). 3) Custom datatype-aware compression with additional column-specific metadata (e.g. the jsonb with external dictionary). Now, let's discuss those use cases one by one, and see if there are simpler (or better in some way) solutions ... Replacing the algorithm used to compress all varlena values (in a way that makes it transparent for the data type code). ---------------------------------------------------------------------- While pglz served us well over time, it was repeatedly mentioned that in some cases it becomes the bottleneck. So supporting other state of the art compression algorithms seems like a good idea, and this patch is one way to do that. But perhaps we should simply make it an initdb option (in which case the whole cluster would simply use e.g. lz4 instead of pglz)? That seems like a much simpler approach - it would only require some ./configure options to add --with-lz4 (and other compression libraries), an initdb option to pick compression algorithm, and probably noting the choice in cluster controldata. No dependencies tracking, no ALTER TABLE issues, etc. Of course, it would not allow using different compression algorithms for different columns (although it might perhaps allow different compression level, to some extent). Conclusion: If we want to offer a simple cluster-wide pglz alternative, perhaps this patch is not the right way to do that. Custom datatype-aware compression (e.g. the tsvector) ---------------------------------------------------------------------- Exploiting knowledge of the internal data type structure is a promising way to improve compression ratio and/or performance. The obvious question of course is why shouldn't this be done by the data type code directly, which would also allow additional benefits like operating directly on the compressed values. Another thing is that if the datatype representation changes in some way, the compression method has to change too. So it's tightly coupled to the datatype anyway. This does not really require any new infrastructure, all the pieces are already there. In some cases that may not be quite possible - the datatype may not be flexible enough to support alternative (compressed) representation, e.g. because there are no bits available for "compressed" flag, etc. Conclusion: IMHO if we want to exploit the knowledge of the data type internal structure, perhaps doing that in the datatype code directly would be a better choice. Custom datatype-aware compression with additional column-specific metadata (e.g. the jsonb with external dictionary). 
---------------------------------------------------------------------- Exploiting redundancy in multiple values in the same column (instead of compressing them independently) is another attractive way to help the compression. It is inherently datatype-aware, but currently can't be implemented directly in datatype code as there's no concept of column-specific storage (e.g. to store dictionary shared by all values in a particular column). I believe any patch addressing this use case would have to introduce such column-specific storage, and any solution doing that would probably need to introduce the same catalogs, etc. The obvious disadvantage of course is that we need to decompress the varlena value before doing pretty much anything with it, because the datatype is not aware of the compression. So I wonder if the patch should instead provide infrastructure for doing that in the datatype code directly. The other question is if the patch should introduce some infrastructure for handling the column context (e.g. column dictionary). Right now, whoever implements the compression has to implement this bit too. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > But let me play the devil's advocate for a while and question the > usefulness of this approach to compression. Some of the questions were > mentioned in the thread before, but I don't think they got the > attention they deserve. Hi. I will try to explain why this approach could be better than others. > > > Replacing the algorithm used to compress all varlena values (in a way > that makes it transparent for the data type code). > ---------------------------------------------------------------------- > > While pglz served us well over time, it was repeatedly mentioned that > in some cases it becomes the bottleneck. So supporting other state of > the art compression algorithms seems like a good idea, and this patch > is one way to do that. > > But perhaps we should simply make it an initdb option (in which case > the whole cluster would simply use e.g. lz4 instead of pglz)? > > That seems like a much simpler approach - it would only require some > ./configure options to add --with-lz4 (and other compression > libraries), an initdb option to pick compression algorithm, and > probably noting the choice in cluster controldata. Replacing pglz for all varlena values wasn't the goal of the patch, but it's possible to do with it and I think that's good. And as Robert mentioned pglz could appear as builtin undroppable compresssion method so the others could be added too. And in the future it can open the ways to specify compression for specific database or cluster. > > Custom datatype-aware compression (e.g. the tsvector) > ---------------------------------------------------------------------- > > Exploiting knowledge of the internal data type structure is a > promising way to improve compression ratio and/or performance. > > The obvious question of course is why shouldn't this be done by the > data type code directly, which would also allow additional benefits > like operating directly on the compressed values. > > Another thing is that if the datatype representation changes in some > way, the compression method has to change too. So it's tightly coupled > to the datatype anyway. > > This does not really require any new infrastructure, all the pieces > are already there. > > In some cases that may not be quite possible - the datatype may not be > flexible enough to support alternative (compressed) representation, > e.g. because there are no bits available for "compressed" flag, etc. > > Conclusion: IMHO if we want to exploit the knowledge of the data type > internal structure, perhaps doing that in the datatype code directly > would be a better choice. It could be, but let's imagine there will be internal compression for tsvector. It means that tsvector has two formats now and minus one bit somewhere in the header. After a while we found a better compression but we can't add it because there is already one and it's not good to have three different formats for one type. Or, the compression methods were implemented and we decided to use dictionaries for tsvector (if the user going to store limited number of words). But it will mean that tsvector will go two compression stages (for its internal and for dictionaries). > > > Custom datatype-aware compression with additional column-specific > metadata (e.g. the jsonb with external dictionary). 
> ---------------------------------------------------------------------- > > Exploiting redundancy in multiple values in the same column (instead > of compressing them independently) is another attractive way to help > the compression. It is inherently datatype-aware, but currently can't > be implemented directly in datatype code as there's no concept of > column-specific storage (e.g. to store dictionary shared by all values > in a particular column). > > I believe any patch addressing this use case would have to introduce > such column-specific storage, and any solution doing that would > probably need to introduce the same catalogs, etc. > > The obvious disadvantage of course is that we need to decompress the > varlena value before doing pretty much anything with it, because the > datatype is not aware of the compression. > > So I wonder if the patch should instead provide infrastructure for > doing that in the datatype code directly. > > The other question is if the patch should introduce some > infrastructure for handling the column context (e.g. column > dictionary). Right now, whoever implements the compression has to > implement this bit too. Column specific storage sounds optional to me. For example compressing timestamp[] using some delta compression will not require it. -- ---- Regards, Ildus Kurbangaliev
Hi!
Let me add my two cents too.
On Tue, Dec 12, 2017 at 2:41 PM, Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Replacing the algorithm used to compress all varlena values (in a way
> that makes it transparent for the data type code).
> ----------------------------------------------------------------------
>
> While pglz served us well over time, it was repeatedly mentioned that
> in some cases it becomes the bottleneck. So supporting other state of
> the art compression algorithms seems like a good idea, and this patch
> is one way to do that.
>
> But perhaps we should simply make it an initdb option (in which case
> the whole cluster would simply use e.g. lz4 instead of pglz)?
>
> That seems like a much simpler approach - it would only require some
> ./configure options to add --with-lz4 (and other compression
> libraries), an initdb option to pick compression algorithm, and
> probably noting the choice in cluster controldata.
Replacing pglz for all varlena values wasn't the goal of the patch, but
it's possible to do with it and I think that's good. And as Robert
mentioned pglz could appear as builtin undroppable compresssion method
so the others could be added too. And in the future it can open the
ways to specify compression for specific database or cluster.
Yes, using custom compression methods to replace the generic non-type-specific compression method is really not the primary goal of this patch. However, I would consider that a useful side effect. And even in this case I see some advantages of custom compression methods over an initdb option.
1) In order to support alternative compression methods in initdb, we have to provide builtin support for them. Then we immediately run into the dependencies/incompatible-licenses problem. Also, we tie the appearance of new compression methods to our release cycle. In real life, flexibility means a lot. Giving users the freedom to experiment with various compression libraries without having to recompile PostgreSQL core is great.
2) It's not necessarily true that users would be satisfied with applying a single compression method to the whole database cluster. Various columns may have different data distributions with different workloads. The optimal compression type for one column is not necessarily optimal for another column.
3) The possibility to change the compression method on the fly without re-initdb is very good too.
> Custom datatype-aware compression (e.g. the tsvector)
> ----------------------------------------------------------------------
>
> Exploiting knowledge of the internal data type structure is a
> promising way to improve compression ratio and/or performance.
>
> The obvious question of course is why shouldn't this be done by the
> data type code directly, which would also allow additional benefits
> like operating directly on the compressed values.
>
> Another thing is that if the datatype representation changes in some
> way, the compression method has to change too. So it's tightly coupled
> to the datatype anyway.
>
> This does not really require any new infrastructure, all the pieces
> are already there.
>
> In some cases that may not be quite possible - the datatype may not be
> flexible enough to support alternative (compressed) representation,
> e.g. because there are no bits available for "compressed" flag, etc.
>
> Conclusion: IMHO if we want to exploit the knowledge of the data type
> internal structure, perhaps doing that in the datatype code directly
> would be a better choice.
It could be, but let's imagine there will be internal compression for
tsvector. It means that tsvector now has two formats and loses one bit
somewhere in the header. After a while we find a better compression, but
we can't add it because there is already one, and it's not good to have
three different formats for one type. Or, the compression methods were
implemented and we decided to use dictionaries for tsvector (if the user
is going to store a limited number of words). But it would mean that
tsvector goes through two compression stages (one for its internal
format and one for the dictionaries).
I would like to add that even for a single datatype various compression methods may have different tradeoffs. For instance, one compression method can have a better compression ratio, but another one has faster decompression. And it's OK for the user to choose different compression methods for different columns.
Making extensions depend on datatype internal representation doesn't seem evil to me. We already have a bunch of extensions depending on much deeper guts of PostgreSQL. On a major release of PostgreSQL, extensions must adapt to the changes; that is the rule. And note, the datatype internal representation changes relatively rarely in comparison with other internals, because it's related to the on-disk format and the ability to pg_upgrade.
> Custom datatype-aware compression with additional column-specific
> metadata (e.g. the jsonb with external dictionary).
> ----------------------------------------------------------------------
>
> Exploiting redundancy in multiple values in the same column (instead
> of compressing them independently) is another attractive way to help
> the compression. It is inherently datatype-aware, but currently can't
> be implemented directly in datatype code as there's no concept of
> column-specific storage (e.g. to store dictionary shared by all values
> in a particular column).
>
> I believe any patch addressing this use case would have to introduce
> such column-specific storage, and any solution doing that would
> probably need to introduce the same catalogs, etc.
>
> The obvious disadvantage of course is that we need to decompress the
> varlena value before doing pretty much anything with it, because the
> datatype is not aware of the compression.
>
> So I wonder if the patch should instead provide infrastructure for
> doing that in the datatype code directly.
>
> The other question is if the patch should introduce some
> infrastructure for handling the column context (e.g. column
> dictionary). Right now, whoever implements the compression has to
> implement this bit too.
Column specific storage sounds optional to me. For example compressing
timestamp[] using some delta compression will not require it.
It could also be useful to have a custom compression method with a fixed (not dynamically built) dictionary. See [1] for an example of what other databases do. We may specify the fixed dictionary directly in the compression method options, I see no problem with that. We may also compress that way not only jsonb or other special data types, but also natural language texts. Using fixed dictionaries for natural language we can effectively compress short texts, where lz and other generic compression methods don't have enough information to effectively train a per-value dictionary.
For sure, further work to improve the infrastructure is required, including per-column storage for the dictionary and tighter integration between compression method and datatype. However, we typically deal with such complex tasks in a step-by-step approach. And I'm not convinced that custom compression methods are a bad first step in this direction. To me they look clear and already very useful in this shape.
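If a fixed dictionary were passed through the options mechanism discussed in this thread, usage could look something like this (entirely hypothetical; the dict_text method and its dictionary option are placeholders, nothing in the current patch implements them):

ALTER TABLE messages ALTER COLUMN body
    SET COMPRESSION dict_text WITH (dictionary = 'the,and,that,with') PRESERVE pglz;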
1. https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Dec 12, 2017 at 6:07 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Hi! > > Let me add my two cents too. > > On Tue, Dec 12, 2017 at 2:41 PM, Ildus Kurbangaliev > <i.kurbangaliev@postgrespro.ru> wrote: >> >> On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >> > Replacing the algorithm used to compress all varlena values (in a way >> > that makes it transparent for the data type code). >> > ---------------------------------------------------------------------- >> > >> > While pglz served us well over time, it was repeatedly mentioned that >> > in some cases it becomes the bottleneck. So supporting other state of >> > the art compression algorithms seems like a good idea, and this patch >> > is one way to do that. >> > >> > But perhaps we should simply make it an initdb option (in which case >> > the whole cluster would simply use e.g. lz4 instead of pglz)? >> > >> > That seems like a much simpler approach - it would only require some >> > ./configure options to add --with-lz4 (and other compression >> > libraries), an initdb option to pick compression algorithm, and >> > probably noting the choice in cluster controldata. >> >> Replacing pglz for all varlena values wasn't the goal of the patch, but >> it's possible to do with it and I think that's good. And as Robert >> mentioned pglz could appear as builtin undroppable compresssion method >> so the others could be added too. And in the future it can open the >> ways to specify compression for specific database or cluster. > > > Yes, usage of custom compression methods to replace generic non > type-specific compression method is really not the primary goal of this > patch. However, I would consider that as useful side effect. However, even > in this case I see some advantages of custom compression methods over initdb > option. > > 1) In order to support alternative compression methods in initdb, we have to > provide builtin support for them. Then we immediately run into > dependencies/incompatible-licenses problem. Also, we tie appearance of new > compression methods to our release cycle. In real life, flexibility means a > lot. Giving users freedom to experiment with various compression libraries > without having to recompile PostgreSQL core is great. > 2) It's not necessary that users would be satisfied with applying single > compression method to the whole database cluster. Various columns may have > different data distributions with different workloads. Optimal compression > type for one column is not necessary optimal for another column. > 3) Possibility to change compression method on the fly without re-initdb is > very good too. I consider custom compression as the way to custom TOAST. For example, to optimal access parts of very long document we need to compress slices of document. Currently, long jsonb document get compressed and then sliced and that killed all benefits of binary jsonb. Also, we are thinking about "lazy" access to the parts of jsonb from pl's, which is currently awfully unefficient. > >> > Custom datatype-aware compression (e.g. the tsvector) >> > ---------------------------------------------------------------------- >> > >> > Exploiting knowledge of the internal data type structure is a >> > promising way to improve compression ratio and/or performance. >> > >> > The obvious question of course is why shouldn't this be done by the >> > data type code directly, which would also allow additional benefits >> > like operating directly on the compressed values. 
>> > >> > Another thing is that if the datatype representation changes in some >> > way, the compression method has to change too. So it's tightly coupled >> > to the datatype anyway. >> > >> > This does not really require any new infrastructure, all the pieces >> > are already there. >> > >> > In some cases that may not be quite possible - the datatype may not be >> > flexible enough to support alternative (compressed) representation, >> > e.g. because there are no bits available for "compressed" flag, etc. >> > >> > Conclusion: IMHO if we want to exploit the knowledge of the data type >> > internal structure, perhaps doing that in the datatype code directly >> > would be a better choice. >> >> It could be, but let's imagine there will be internal compression for >> tsvector. It means that tsvector has two formats now and minus one bit >> somewhere in the header. After a while we found a better compression >> but we can't add it because there is already one and it's not good to >> have three different formats for one type. Or, the compression methods >> were implemented and we decided to use dictionaries for tsvector (if >> the user going to store limited number of words). But it will mean that >> tsvector will go two compression stages (for its internal and for >> dictionaries). > > > I would like to add that even for single datatype various compression > methods may have different tradeoffs. For instance, one compression method > can have better compression ratio, but another one have faster > decompression. And it's OK for user to choose different compression methods > for different columns. > > Depending extensions on datatype internal representation doesn't seem evil > for me. We already have bunch of extension depending on much more deeper > guts of PostgreSQL. On major release of PostgreSQL, extensions must adopt > the changes, that is the rule. And note, the datatype internal > representation alters relatively rare in comparison with other internals, > because it's related to on-disk format and ability to pg_upgrade. > >> > Custom datatype-aware compression with additional column-specific >> > metadata (e.g. the jsonb with external dictionary). >> > ---------------------------------------------------------------------- >> > >> > Exploiting redundancy in multiple values in the same column (instead >> > of compressing them independently) is another attractive way to help >> > the compression. It is inherently datatype-aware, but currently can't >> > be implemented directly in datatype code as there's no concept of >> > column-specific storage (e.g. to store dictionary shared by all values >> > in a particular column). >> > >> > I believe any patch addressing this use case would have to introduce >> > such column-specific storage, and any solution doing that would >> > probably need to introduce the same catalogs, etc. >> > >> > The obvious disadvantage of course is that we need to decompress the >> > varlena value before doing pretty much anything with it, because the >> > datatype is not aware of the compression. >> > >> > So I wonder if the patch should instead provide infrastructure for >> > doing that in the datatype code directly. >> > >> > The other question is if the patch should introduce some >> > infrastructure for handling the column context (e.g. column >> > dictionary). Right now, whoever implements the compression has to >> > implement this bit too. >> >> Column specific storage sounds optional to me. 
For example compressing >> timestamp[] using some delta compression will not require it. > > > It's also could be useful to have custom compression method with fixed (not > dynamically complemented) dictionary. See [1] for example what other > databases do. We may specify fixed dictionary directly in the compression > method options, I see no problems. We may also compress that way not only > jsonb or other special data types, but also natural language texts. Using > fixed dictionaries for natural language we can effectively compress short > texts, when lz and other generic compression methods don't have enough of > information to effectively train per-value dictionary. > > For sure, further work to improve infrastructure is required including > per-column storage for dictionary and tighter integration between > compression method and datatype. However, we are typically deal with such > complex tasks in step-by-step approach. And I'm not convinced that custom > compression methods are bad for the first step in this direction. For me > they look clear and already very useful in this shape. +1 > > 1. > https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
On Mon, Dec 11, 2017 at 1:06 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > OK, but NOTICE that presumably unexpected table rewrite takes place could be > still useful. I'm not going to complain too much about that, but I think that's mostly a failure of expectation rather than a real problem. If the documentation says what the user should expect, and they expect something else, tough luck for them. > Also we probably should add some view that will expose compression methods > whose are currently preserved for columns. So that user can correctly > construct SET COMPRESSION query that doesn't rewrites table without digging > into internals (like directly querying pg_depend). Yes. I wonder if \d or \d+ can show it somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 11, 2017 at 2:53 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > But let me play the devil's advocate for a while and question the > usefulness of this approach to compression. Some of the questions were > mentioned in the thread before, but I don't think they got the attention > they deserve. Sure, thanks for chiming in. I think it is good to make sure we are discussing this stuff. > But perhaps we should simply make it an initdb option (in which case the > whole cluster would simply use e.g. lz4 instead of pglz)? > > That seems like a much simpler approach - it would only require some > ./configure options to add --with-lz4 (and other compression libraries), > an initdb option to pick compression algorithm, and probably noting the > choice in cluster controldata. > > No dependencies tracking, no ALTER TABLE issues, etc. > > Of course, it would not allow using different compression algorithms for > different columns (although it might perhaps allow different compression > level, to some extent). > > Conclusion: If we want to offer a simple cluster-wide pglz alternative, > perhaps this patch is not the right way to do that. I actually disagree with your conclusion here. I mean, if you do it that way, then it has the same problem as checksums: changing compression algorithms requires a full dump-and-reload of the database, which makes it more or less a non-starter for large databases. On the other hand, with the infrastructure provided by this patch, we can have a default_compression_method GUC that will be set to 'pglz' initially. If the user changes it to 'lz4', or we ship a new release where the new default is 'lz4', then new tables created will use that new setting, but the existing stuff keeps working. If you want to upgrade your existing tables to use lz4 rather than pglz, you can change the compression option for those columns to COMPRESS lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS lz4 to force a rewrite of an individual table. That's really powerful, and I think users will like it a lot. In short, your approach, while perhaps a little simpler to code, seems like it is fraught with operational problems which this design avoids. > Custom datatype-aware compression (e.g. the tsvector) > ---------------------------------------------------------------------- > > Exploiting knowledge of the internal data type structure is a promising > way to improve compression ratio and/or performance. > > The obvious question of course is why shouldn't this be done by the data > type code directly, which would also allow additional benefits like > operating directly on the compressed values. > > Another thing is that if the datatype representation changes in some > way, the compression method has to change too. So it's tightly coupled > to the datatype anyway. > > This does not really require any new infrastructure, all the pieces are > already there. > > In some cases that may not be quite possible - the datatype may not be > flexible enough to support alternative (compressed) representation, e.g. > because there are no bits available for "compressed" flag, etc. > > Conclusion: IMHO if we want to exploit the knowledge of the data type > internal structure, perhaps doing that in the datatype code directly > would be a better choice. I definitely think there's a place for compression built right into the data type. I'm still happy about commit 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more needs to be done there. 
But that type of improvement and what is proposed here are basically orthogonal. Having either one is good; having both is better. I think there may also be a place for declaring that a particular data type has a "privileged" type of TOAST compression; if you use that kind of compression for that data type, the data type will do smart things, and if not, it will have to decompress in more cases. But I think this infrastructure makes that kind of thing easier, not harder. > Custom datatype-aware compression with additional column-specific > metadata (e.g. the jsonb with external dictionary). > ---------------------------------------------------------------------- > > Exploiting redundancy in multiple values in the same column (instead of > compressing them independently) is another attractive way to help the > compression. It is inherently datatype-aware, but currently can't be > implemented directly in datatype code as there's no concept of > column-specific storage (e.g. to store dictionary shared by all values > in a particular column). > > I believe any patch addressing this use case would have to introduce > such column-specific storage, and any solution doing that would probably > need to introduce the same catalogs, etc. > > The obvious disadvantage of course is that we need to decompress the > varlena value before doing pretty much anything with it, because the > datatype is not aware of the compression. > > So I wonder if the patch should instead provide infrastructure for doing > that in the datatype code directly. > > The other question is if the patch should introduce some infrastructure > for handling the column context (e.g. column dictionary). Right now, > whoever implements the compression has to implement this bit too. I agree that having a place to store a per-column compression dictionary would be awesome, but I think that could be added later on top of this infrastructure. For example, suppose we stored each per-column compression dictionary in a separate file and provided some infrastructure for WAL-logging changes to the file on a logical basis and checkpointing those updates. Then we wouldn't be tied to the MVCC/transactional issues which storing the blobs in a table would have, which seems like a big win. Of course, it also creates a lot of little tiny files inside a directory that already tends to have too many files, but maybe with some more work we can figure out a way around that problem. Here again, it seems to me that the proposed design is going more in the right direction than the wrong direction: if some day we have per-column dictionaries, they will need to be tied to specific compression methods on specific columns. If we already have that concept, extending it to do something new is easier than if we have to create it from scratch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
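A sketch of that incremental path (the default_compression_method GUC is only proposed above, and the exact COMPRESS/SET COMPRESSION spelling is still being settled in this thread, so treat this as illustrative):

-- newly created tables pick up the new default
SET default_compression_method = 'lz4';
-- existing column: new datums use lz4, old pglz datums stay readable, no rewrite
ALTER TABLE big_table ALTER COLUMN doc SET COMPRESSION lz4 PRESERVE pglz;
-- existing column: force a rewrite so only lz4-compressed datums remain
ALTER TABLE big_table ALTER COLUMN doc SET COMPRESSION lz4;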
On 12/12/2017 10:33 PM, Robert Haas wrote: > On Mon, Dec 11, 2017 at 2:53 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> But let me play the devil's advocate for a while and question the >> usefulness of this approach to compression. Some of the questions were >> mentioned in the thread before, but I don't think they got the attention >> they deserve. > > Sure, thanks for chiming in. I think it is good to make sure we are > discussing this stuff. > >> But perhaps we should simply make it an initdb option (in which case the >> whole cluster would simply use e.g. lz4 instead of pglz)? >> >> That seems like a much simpler approach - it would only require some >> ./configure options to add --with-lz4 (and other compression libraries), >> an initdb option to pick compression algorithm, and probably noting the >> choice in cluster controldata. >> >> No dependencies tracking, no ALTER TABLE issues, etc. >> >> Of course, it would not allow using different compression algorithms for >> different columns (although it might perhaps allow different compression >> level, to some extent). >> >> Conclusion: If we want to offer a simple cluster-wide pglz alternative, >> perhaps this patch is not the right way to do that. > > I actually disagree with your conclusion here. I mean, if you do it > that way, then it has the same problem as checksums: changing > compression algorithms requires a full dump-and-reload of the > database, which makes it more or less a non-starter for large > databases. On the other hand, with the infrastructure provided by > this patch, we can have a default_compression_method GUC that will be > set to 'pglz' initially. If the user changes it to 'lz4', or we ship > a new release where the new default is 'lz4', then new tables created > will use that new setting, but the existing stuff keeps working. If > you want to upgrade your existing tables to use lz4 rather than pglz, > you can change the compression option for those columns to COMPRESS > lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS > lz4 to force a rewrite of an individual table. That's really > powerful, and I think users will like it a lot. > > In short, your approach, while perhaps a little simpler to code, seems > like it is fraught with operational problems which this design avoids. > I agree the checksum-like limitations are annoying and make it impossible to change the compression algorithm after the cluster is initialized (although I recall a discussion about addressing that). So yeah, if such flexibility is considered valuable/important, then the patch is a better solution. >> Custom datatype-aware compression (e.g. the tsvector) >> ---------------------------------------------------------------------- >> >> Exploiting knowledge of the internal data type structure is a promising >> way to improve compression ratio and/or performance. >> >> The obvious question of course is why shouldn't this be done by the data >> type code directly, which would also allow additional benefits like >> operating directly on the compressed values. >> >> Another thing is that if the datatype representation changes in some >> way, the compression method has to change too. So it's tightly coupled >> to the datatype anyway. >> >> This does not really require any new infrastructure, all the pieces are >> already there. >> >> In some cases that may not be quite possible - the datatype may not be >> flexible enough to support alternative (compressed) representation, e.g. 
>> because there are no bits available for "compressed" flag, etc. >> >> Conclusion: IMHO if we want to exploit the knowledge of the data type >> internal structure, perhaps doing that in the datatype code directly >> would be a better choice. > > I definitely think there's a place for compression built right into > the data type. I'm still happy about commit > 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more > needs to be done there. But that type of improvement and what is > proposed here are basically orthogonal. Having either one is good; > having both is better. > Why orthogonal? For example, why couldn't (or shouldn't) the tsvector compression be done by tsvector code itself? Why should we be doing that at the varlena level (so that the tsvector code does not even know about it)? For example we could make the datatype EXTERNAL and do the compression on our own, using a custom algorithm. Of course, that would require datatype-specific implementation, but tsvector_compress does that too. It seems to me the main reason is that tsvector actually does not allow us to do that, as there's no good way to distinguish the different internal format (e.g. by storing a flag or format version in some sort of header, etc.). > I think there may also be a place for declaring that a particular data > type has a "privileged" type of TOAST compression; if you use that > kind of compression for that data type, the data type will do smart > things, and if not, it will have to decompress in more cases. But I > think this infrastructure makes that kind of thing easier, not harder. > I don't quite understand how that would be done. Isn't TOAST meant to be entirely transparent for the datatypes? I can imagine custom TOAST compression (which is pretty much what the patch does, after all), but I don't see how the datatype could do anything smart about it, because it has no idea which particular compression was used. And considering the OIDs of the compression methods do change, I'm not sure that's fixable. >> Custom datatype-aware compression with additional column-specific >> metadata (e.g. the jsonb with external dictionary). >> ---------------------------------------------------------------------- >> >> Exploiting redundancy in multiple values in the same column (instead of >> compressing them independently) is another attractive way to help the >> compression. It is inherently datatype-aware, but currently can't be >> implemented directly in datatype code as there's no concept of >> column-specific storage (e.g. to store dictionary shared by all values >> in a particular column). >> >> I believe any patch addressing this use case would have to introduce >> such column-specific storage, and any solution doing that would probably >> need to introduce the same catalogs, etc. >> >> The obvious disadvantage of course is that we need to decompress the >> varlena value before doing pretty much anything with it, because the >> datatype is not aware of the compression. >> >> So I wonder if the patch should instead provide infrastructure for doing >> that in the datatype code directly. >> >> The other question is if the patch should introduce some infrastructure >> for handling the column context (e.g. column dictionary). Right now, >> whoever implements the compression has to implement this bit too. > > I agree that having a place to store a per-column compression > dictionary would be awesome, but I think that could be added later on > top of this infrastructure. 
For example, suppose we stored each > per-column compression dictionary in a separate file and provided some > infrastructure for WAL-logging changes to the file on a logical basis > and checkpointing those updates. Then we wouldn't be tied to the > MVCC/transactional issues which storing the blobs in a table would > have, which seems like a big win. Of course, it also creates a lot of > little tiny files inside a directory that already tends to have too > many files, but maybe with some more work we can figure out a way > around that problem. Here again, it seems to me that the proposed > design is going more in the right direction than the wrong direction: > if some day we have per-column dictionaries, they will need to be tied > to specific compression methods on specific columns. If we already > have that concept, extending it to do something new is easier than if > we have to create it from scratch. > Well, it wasn't my goal to suddenly widen the scope of the patch and require it adds all these pieces. My intent was more to point to pieces that need to be filled in the future. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/12/2017 04:33 PM, Robert Haas wrote:
> you want to upgrade your existing tables to use lz4 rather than pglz,
> you can change the compression option for those columns to COMPRESS
> lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS

This is a thread I've only been following peripherally, so forgive a
question that's probably covered somewhere upthread: how will this be
done? Surely not with compression-type bits in each tuple? By
remembering a txid where the compression was changed, and the former
algorithm for older txids?

-Chap
On Tue, Dec 12, 2017 at 5:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> I definitely think there's a place for compression built right into >> the data type. I'm still happy about commit >> 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more >> needs to be done there. But that type of improvement and what is >> proposed here are basically orthogonal. Having either one is good; >> having both is better. >> > Why orthogonal? I mean, they are different things. Data types are already free to invent more compact representations, and that does not preclude applying pglz to the result. > For example, why couldn't (or shouldn't) the tsvector compression be > done by tsvector code itself? Why should we be doing that at the varlena > level (so that the tsvector code does not even know about it)? We could do that, but then: 1. The compression algorithm would be hard-coded into the system rather than changeable. Pluggability has some value. 2. If several data types can benefit from a similar approach, it has to be separately implemented for each one. 3. Compression is only applied to large-ish values. If you are just making the data type representation more compact, you probably want to apply the new representation to all values. If you are compressing in the sense that the original data gets smaller but harder to interpret, then you probably only want to apply the technique where the value is already pretty wide, and maybe respect the user's configured storage attributes. TOAST knows about some of that kind of stuff. > It seems to me the main reason is that tsvector actually does not allow > us to do that, as there's no good way to distinguish the different > internal format (e.g. by storing a flag or format version in some sort > of header, etc.). That is also a potential problem, although I suspect it is possible to work around it somehow for most data types. It might be annoying, though. >> I think there may also be a place for declaring that a particular data >> type has a "privileged" type of TOAST compression; if you use that >> kind of compression for that data type, the data type will do smart >> things, and if not, it will have to decompress in more cases. But I >> think this infrastructure makes that kind of thing easier, not harder. > > I don't quite understand how that would be done. Isn't TOAST meant to be > entirely transparent for the datatypes? I can imagine custom TOAST > compression (which is pretty much what the patch does, after all), but I > don't see how the datatype could do anything smart about it, because it > has no idea which particular compression was used. And considering the > OIDs of the compression methods do change, I'm not sure that's fixable. I don't think TOAST needs to be entirely transparent for the datatypes. We've already dipped our toe in the water by allowing some operations on "short" varlenas, and there's really nothing to prevent a given datatype from going further. The OID problem you mentioned would presumably be solved by hard-coding the OIDs for any built-in, privileged compression methods. > Well, it wasn't my goal to suddenly widen the scope of the patch and > require it adds all these pieces. My intent was more to point to pieces > that need to be filled in the future. Sure, that's fine. I'm not worked up about this, just explaining why it seems reasonably well-designed to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/13/2017 01:54 AM, Robert Haas wrote: > On Tue, Dec 12, 2017 at 5:07 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >>> I definitely think there's a place for compression built right into >>> the data type. I'm still happy about commit >>> 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more >>> needs to be done there. But that type of improvement and what is >>> proposed here are basically orthogonal. Having either one is good; >>> having both is better. >>> >> Why orthogonal? > > I mean, they are different things. Data types are already free to > invent more compact representations, and that does not preclude > applying pglz to the result. > >> For example, why couldn't (or shouldn't) the tsvector compression be >> done by tsvector code itself? Why should we be doing that at the varlena >> level (so that the tsvector code does not even know about it)? > > We could do that, but then: > > 1. The compression algorithm would be hard-coded into the system > rather than changeable. Pluggability has some value. > Sure. I agree extensibility of pretty much all parts is a significant asset of the project. > 2. If several data types can benefit from a similar approach, it has > to be separately implemented for each one. > I don't think the current solution improves that, though. If you want to exploit internal features of individual data types, it pretty much requires code customized to every such data type. For example you can't take the tsvector compression and just slap it on tsquery, because it relies on knowledge of internal tsvector structure. So you need separate implementations anyway. > 3. Compression is only applied to large-ish values. If you are just > making the data type representation more compact, you probably want to > apply the new representation to all values. If you are compressing in > the sense that the original data gets smaller but harder to interpret, > then you probably only want to apply the technique where the value is > already pretty wide, and maybe respect the user's configured storage > attributes. TOAST knows about some of that kind of stuff. > Good point. One such parameter that I really miss is compression level. I can imagine tuning it through CREATE COMPRESSION METHOD, but it does not seem quite possible with compression happening in a datatype. >> It seems to me the main reason is that tsvector actually does not allow >> us to do that, as there's no good way to distinguish the different >> internal format (e.g. by storing a flag or format version in some sort >> of header, etc.). > > That is also a potential problem, although I suspect it is possible to > work around it somehow for most data types. It might be annoying, > though. > >>> I think there may also be a place for declaring that a particular data >>> type has a "privileged" type of TOAST compression; if you use that >>> kind of compression for that data type, the data type will do smart >>> things, and if not, it will have to decompress in more cases. But I >>> think this infrastructure makes that kind of thing easier, not harder. >> >> I don't quite understand how that would be done. Isn't TOAST meant to be >> entirely transparent for the datatypes? I can imagine custom TOAST >> compression (which is pretty much what the patch does, after all), but I >> don't see how the datatype could do anything smart about it, because it >> has no idea which particular compression was used. And considering the >> OIDs of the compression methods do change, I'm not sure that's fixable. 
> > I don't think TOAST needs to be entirely transparent for the > datatypes. We've already dipped our toe in the water by allowing some > operations on "short" varlenas, and there's really nothing to prevent > a given datatype from going further. The OID problem you mentioned > would presumably be solved by hard-coding the OIDs for any built-in, > privileged compression methods. > Stupid question, but what do you mean by "short" varlenas? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 12 Dec 2017 15:52:01 -0500
Robert Haas <robertmhaas@gmail.com> wrote:

> Yes. I wonder if \d or \d+ can show it somehow.

Yes, in the current version of the patch, \d+ shows the current
compression. It can be extended to show a list of current compression
methods.

Since we agreed on the ALTER syntax, I want to clear things up about
CREATE. Should it be CREATE ACCESS METHOD .. TYPE COMPRESSION or CREATE
COMPRESSION METHOD? I like the access method approach, and it simplifies
the code, but I'm just not sure whether a compression method is an
access method or not.

Current implementation
----------------------

To avoid extra patches I also want to clear things up about the current
implementation. Right now there are two tables, "pg_compression" and
"pg_compression_opt". When a compression method is linked to a column it
creates a record in pg_compression_opt. This record's Oid is stored in
the varlena. These Oids are kept in the first column so I can move them
in pg_upgrade, but in all other aspects they behave like usual Oids.
They are also easy to restore.

Compression options are linked to a specific column. When a tuple is
moved between relations it will be decompressed.

Also, in the current implementation SET COMPRESSION has a WITH clause
which is used to provide extra options to the compression method.

What could be changed
---------------------

As Alvaro mentioned, a COMPRESSION METHOD is practically an access
method, so it could be created as CREATE ACCESS METHOD .. TYPE
COMPRESSION. This approach simplifies the patch and the "pg_compression"
table could be removed. So a compression method would be created with
something like:

CREATE ACCESS METHOD .. TYPE COMPRESSION HANDLER
awesome_compression_handler;

The syntax of SET COMPRESSION changes to SET COMPRESSION .. PRESERVE,
which is useful to control rewrites and for pg_upgrade to make
dependencies between moved compression options and compression methods
from the pg_am table.

Default compression is always pglz, and if users want to change it they
run:

ALTER COLUMN <col> SET COMPRESSION awesome PRESERVE pglz;

Without PRESERVE it will rewrite the whole relation using the new
compression. The rewrite also removes all unlisted compression options
so their compression methods can be safely dropped.

The "pg_compression_opt" table could be renamed to "pg_compression", and
compression options would be stored there.

I'd like to keep extra compression options; for example, pglz can be
configured with them. The syntax would be slightly changed:

SET COMPRESSION pglz WITH (min_comp_rate=25) PRESERVE awesome;

Setting the same compression method with different options will create a
new compression options record for future tuples but will not rewrite
the table.

--
----
Regards,
Ildus Kurbangaliev
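To put the pieces above into one flow, a sketch under the proposed behaviour ("awesome", its handler, and the level option are placeholder names, not part of the patch):

CREATE ACCESS METHOD awesome TYPE COMPRESSION HANDLER awesome_compression_handler;
-- new datums use awesome, old pglz datums stay readable, no rewrite
ALTER TABLE t ALTER COLUMN msg SET COMPRESSION awesome PRESERVE pglz;
-- same method with different options: new options record, still no rewrite
ALTER TABLE t ALTER COLUMN msg SET COMPRESSION awesome WITH (level=5) PRESERVE pglz;
-- without PRESERVE: the relation is rewritten and unlisted options are dropped
ALTER TABLE t ALTER COLUMN msg SET COMPRESSION awesome;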
Tomas Vondra wrote: > On 12/13/2017 01:54 AM, Robert Haas wrote: > > 3. Compression is only applied to large-ish values. If you are just > > making the data type representation more compact, you probably want to > > apply the new representation to all values. If you are compressing in > > the sense that the original data gets smaller but harder to interpret, > > then you probably only want to apply the technique where the value is > > already pretty wide, and maybe respect the user's configured storage > > attributes. TOAST knows about some of that kind of stuff. > > Good point. One such parameter that I really miss is compression level. > I can imagine tuning it through CREATE COMPRESSION METHOD, but it does > not seem quite possible with compression happening in a datatype. Hmm, actually isn't that the sort of thing that you would tweak using a column-level option instead of a compression method? ALTER TABLE ALTER COLUMN SET (compression_level=123) The only thing we need for this is to make tuptoaster.c aware of the need to check for a parameter. > > I don't think TOAST needs to be entirely transparent for the > > datatypes. We've already dipped our toe in the water by allowing some > > operations on "short" varlenas, and there's really nothing to prevent > > a given datatype from going further. The OID problem you mentioned > > would presumably be solved by hard-coding the OIDs for any built-in, > > privileged compression methods. > > Stupid question, but what do you mean by "short" varlenas? Those are varlenas with 1-byte header rather than the standard 4-byte header. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/13/2017 05:55 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> On 12/13/2017 01:54 AM, Robert Haas wrote: > >>> 3. Compression is only applied to large-ish values. If you are just >>> making the data type representation more compact, you probably want to >>> apply the new representation to all values. If you are compressing in >>> the sense that the original data gets smaller but harder to interpret, >>> then you probably only want to apply the technique where the value is >>> already pretty wide, and maybe respect the user's configured storage >>> attributes. TOAST knows about some of that kind of stuff. >> >> Good point. One such parameter that I really miss is compression level. >> I can imagine tuning it through CREATE COMPRESSION METHOD, but it does >> not seem quite possible with compression happening in a datatype. > > Hmm, actually isn't that the sort of thing that you would tweak using a > column-level option instead of a compression method? > ALTER TABLE ALTER COLUMN SET (compression_level=123) > The only thing we need for this is to make tuptoaster.c aware of the > need to check for a parameter. > Wouldn't that require some universal compression level, shared by all supported compression algorithms? I don't think there is such thing. Defining it should not be extremely difficult, although I'm sure there will be some cumbersome cases. For example what if an algorithm "a" supports compression levels 0-10, and algorithm "b" only supports 0-3? You may define 11 "universal" compression levels, and map the four levels for "b" to that (how). But then everyone has to understand how that "universal" mapping is defined. Another issue is that there are algorithms without a compression level (e.g. pglz does not have one, AFAICS), or with somewhat definition (lz4 does not have levels, and instead has "acceleration" which may be an arbitrary positive integer, so not really compatible with "universal" compression level). So to me the ALTER TABLE ALTER COLUMN SET (compression_level=123) seems more like an unnecessary hurdle ... >>> I don't think TOAST needs to be entirely transparent for the >>> datatypes. We've already dipped our toe in the water by allowing some >>> operations on "short" varlenas, and there's really nothing to prevent >>> a given datatype from going further. The OID problem you mentioned >>> would presumably be solved by hard-coding the OIDs for any built-in, >>> privileged compression methods. >> >> Stupid question, but what do you mean by "short" varlenas? > > Those are varlenas with 1-byte header rather than the standard 4-byte > header. > OK, that's what I thought. But that is still pretty transparent to the data types, no? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Dec 13, 2017 at 5:10 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> 2. If several data types can benefit from a similar approach, it has >> to be separately implemented for each one. > > I don't think the current solution improves that, though. If you want to > exploit internal features of individual data types, it pretty much > requires code customized to every such data type. > > For example you can't take the tsvector compression and just slap it on > tsquery, because it relies on knowledge of internal tsvector structure. > So you need separate implementations anyway. I don't think that's necessarily true. Certainly, it's true that *if* tsvector compression depends on knowledge of internal tsvector structure, *then* that you can't use the implementation for anything else (this, by the way, means that there needs to be some way for a compression method to reject being applied to a column of a data type it doesn't like). However, it seems possible to imagine compression algorithms that can work for a variety of data types, too. There might be a compression algorithm that is theoretically a general-purpose algorithm but has features which are particularly well-suited to, say, JSON or XML data, because it looks for word boundaries to decide on what strings to insert into the compression dictionary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Dec 13, 2017 at 1:34 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Wouldn't that require some universal compression level, shared by all > supported compression algorithms? I don't think there is such thing. > > Defining it should not be extremely difficult, although I'm sure there > will be some cumbersome cases. For example what if an algorithm "a" > supports compression levels 0-10, and algorithm "b" only supports 0-3? > > You may define 11 "universal" compression levels, and map the four > levels for "b" to that (how). But then everyone has to understand how > that "universal" mapping is defined. What we could do is use the "namespace" feature of reloptions to distinguish options for the column itself from options for the compression algorithm. Currently namespaces are used only to allow you to configure toast.whatever = somevalue, but we could let you say pglz.something = somevalue or lz4.whatever = somevalue. Or maybe, to avoid confusion -- what happens if somebody invents a compression method called toast? -- we should do it as compress.lz4.whatever = somevalue. I think this takes us a bit far afield from the core purpose of this patch and should be a separate patch at a later time, but I think it would be cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
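Purely as a sketch of that namespacing idea (none of this is implemented; the compress namespace and the option names are hypothetical):

ALTER TABLE t ALTER COLUMN doc SET (compress.lz4.acceleration = 4);
ALTER TABLE t ALTER COLUMN doc SET (compress.zstd.level = 9);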
On Wed, Dec 13, 2017 at 7:18 AM, Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > Since we agreed on ALTER syntax, i want to clear things about CREATE. > Should it be CREATE ACCESS METHOD .. TYPE СOMPRESSION or CREATE > COMPRESSION METHOD? I like the access method approach, and it > simplifies the code, but I'm just not sure a compression is an access > method or not. +1 for ACCESS METHOD. > Current implementation > ---------------------- > > To avoid extra patches I also want to clear things about current > implementation. Right now there are two tables, "pg_compression" and > "pg_compression_opt". When compression method is linked to a column it > creates a record in pg_compression_opt. This record's Oid is stored in > the varlena. These Oids kept in first column so I can move them in > pg_upgrade but in all other aspects they behave like usual Oids. Also > it's easy to restore them. pg_compression_opt -> pg_attr_compression, maybe. > Compression options linked to a specific column. When tuple is > moved between relations it will be decompressed. Can we do this only if the compression method isn't OK for the new column? For example, if the old column is COMPRESS foo PRESERVE bar and the new column is COMPRESS bar PRESERVE foo, we don't need to force decompression in any case. > Also in current implementation SET COMPRESSION contains WITH syntax > which is used to provide extra options to compression method. Hmm, that's an alternative to use reloptions. Maybe that's fine. > What could be changed > --------------------- > > As Alvaro mentioned COMPRESSION METHOD is practically an access method, > so it could be created as CREATE ACCESS METHOD .. TYPE COMPRESSION. > This approach simplifies the patch and "pg_compression" table could be > removed. So compression method is created with something like: > > CREATE ACCESS METHOD .. TYPE COMPRESSION HANDLER > awesome_compression_handler; > > Syntax of SET COMPRESSION changes to SET COMPRESSION .. PRESERVE which > is useful to control rewrites and for pg_upgrade to make dependencies > between moved compression options and compression methods from pg_am > table. > > Default compression is always pglz and if users want to change they run: > > ALTER COLUMN <col> SET COMPRESSION awesome PRESERVE pglz; > > Without PRESERVE it will rewrite the whole relation using new > compression. Also the rewrite removes all unlisted compression options > so their compresssion methods could be safely dropped. That all sounds good. > "pg_compression_opt" table could be renamed to "pg_compression", and > compression options will be stored there. See notes above. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/14/2017 04:21 PM, Robert Haas wrote: > On Wed, Dec 13, 2017 at 5:10 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >>> 2. If several data types can benefit from a similar approach, it has >>> to be separately implemented for each one. >> >> I don't think the current solution improves that, though. If you >> want to exploit internal features of individual data types, it >> pretty much requires code customized to every such data type. >> >> For example you can't take the tsvector compression and just slap >> it on tsquery, because it relies on knowledge of internal tsvector >> structure. So you need separate implementations anyway. > > I don't think that's necessarily true. Certainly, it's true that > *if* tsvector compression depends on knowledge of internal tsvector > structure, *then* that you can't use the implementation for anything > else (this, by the way, means that there needs to be some way for a > compression method to reject being applied to a column of a data > type it doesn't like). I believe such dependency (on implementation details) is pretty much the main benefit of datatype-aware compression methods. If you don't rely on such assumption, then I'd say it's a general-purpose compression method. > However, it seems possible to imagine compression algorithms that can > work for a variety of data types, too. There might be a compression > algorithm that is theoretically a general-purpose algorithm but has > features which are particularly well-suited to, say, JSON or XML > data, because it looks for word boundaries to decide on what strings > to insert into the compression dictionary. > Can you give an example of such algorithm? Because I haven't seen such example, and I find arguments based on hypothetical compression methods somewhat suspicious. FWIW I'm not against considering such compression methods, but OTOH it may not be such a great primary use case to drive the overall design. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 14, 2017 at 12:23 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Can you give an example of such algorithm? Because I haven't seen such > example, and I find arguments based on hypothetical compression methods > somewhat suspicious. > > FWIW I'm not against considering such compression methods, but OTOH it > may not be such a great primary use case to drive the overall design. Well it isn't, really. I am honestly not sure what we're arguing about at this point. I think you've agreed that (1) opening avenues for extensibility is useful, (2) substitution a general-purpose compression algorithm could be useful, and (3) having datatype compression that is enabled through TOAST rather than built into the datatype might sometimes be desirable. That's more than adequate justification for this proposal, whether half-general compression methods exist or not. I am prepared to concede that there may be no useful examples of such a thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 14 Dec 2017 10:29:10 -0500 Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Dec 13, 2017 at 7:18 AM, Ildus Kurbangaliev > <i.kurbangaliev@postgrespro.ru> wrote: > > Since we agreed on ALTER syntax, i want to clear things about > > CREATE. Should it be CREATE ACCESS METHOD .. TYPE СOMPRESSION or > > CREATE COMPRESSION METHOD? I like the access method approach, and it > > simplifies the code, but I'm just not sure a compression is an > > access method or not. > > +1 for ACCESS METHOD. An access method then. > > > Current implementation > > ---------------------- > > > > To avoid extra patches I also want to clear things about current > > implementation. Right now there are two tables, "pg_compression" and > > "pg_compression_opt". When compression method is linked to a column > > it creates a record in pg_compression_opt. This record's Oid is > > stored in the varlena. These Oids kept in first column so I can > > move them in pg_upgrade but in all other aspects they behave like > > usual Oids. Also it's easy to restore them. > > pg_compression_opt -> pg_attr_compression, maybe. > > > Compression options linked to a specific column. When tuple is > > moved between relations it will be decompressed. > > Can we do this only if the compression method isn't OK for the new > column? For example, if the old column is COMPRESS foo PRESERVE bar > and the new column is COMPRESS bar PRESERVE foo, we don't need to > force decompression in any case. Thanks, sounds right, i will add it to the patch. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 12/17/2017 04:32 AM, Robert Haas wrote: > On Thu, Dec 14, 2017 at 12:23 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> Can you give an example of such algorithm? Because I haven't seen such >> example, and I find arguments based on hypothetical compression methods >> somewhat suspicious. >> >> FWIW I'm not against considering such compression methods, but OTOH it >> may not be such a great primary use case to drive the overall design. > > Well it isn't, really. I am honestly not sure what we're arguing > about at this point. I think you've agreed that (1) opening avenues > for extensibility is useful, (2) substitution a general-purpose > compression algorithm could be useful, and (3) having datatype > compression that is enabled through TOAST rather than built into the > datatype might sometimes be desirable. That's more than adequate > justification for this proposal, whether half-general compression > methods exist or not. I am prepared to concede that there may be no > useful examples of such a thing. > I don't think we're arguing - we're discussing if a proposed patch is the right design solving relevant use cases. I personally am not quite convinced about that, for the reason I tried to explain in my previous messages. I see it as a poor alternative to compression built into the data type. I do like the idea of compression with external dictionary, however. But don't forget that it's not me in this thread - it's my evil twin, moonlighting as Mr. Devil's lawyer ;-) -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 18, 2017 at 10:43 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I personally am not quite convinced about that, for the reason I tried > to explain in my previous messages. I see it as a poor alternative to > compression built into the data type. I do like the idea of compression > with external dictionary, however. I think that compression built into the datatype and what is proposed here are both useful and everybody's free to work on either one as the prefer, so I don't see that as a reason not to accept this patch. And I think this patch can be a stepping stone toward compression with an external dictionary, so that seems like an affirmative reason to accept this patch. > But don't forget that it's not me in this thread - it's my evil twin, > moonlighting as Mr. Devil's lawyer ;-) Well, I don't mind you objecting to the patch under any persona, but so far I'm not finding your reasons convincing... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attached a new version of the patch. Main changes:

* compression as an access method
* pglz as the default compression access method
* PRESERVE syntax for table rewrite control
* pg_upgrade fixes
* support for partitioned tables
* more tests

Regards,
Ildus Kurbangaliev
Attachment
Hello Ildus, 15/01/2018 00:49, Ildus Kurbangaliev пишет: > Attached a new version of the patch. Main changes: > > * compression as an access method > * pglz as default compression access method. > * PRESERVE syntax for tables rewrite control. > * pg_upgrade fixes > * support partitioned tables. > * more tests. > You need to rebase to the latest master, there are some conflicts. I've applied it to the three days old master to try it. As I can see the documentation is not yet complete. For example, there is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and section "Compression Access Method Functions" in compression-am.sgml hasn't been finished. I've implemented an extension [1] to understand the way developer would go to work with new infrastructure. And for me it seems clear. (Except that it took me some effort to wrap my mind around varlena macros but it is probably a different topic). I noticed that you haven't cover 'cmdrop' in the regression tests and I saw the previous discussion about it. Have you considered using event triggers to handle the drop of column compression instead of 'cmdrop' function? This way you would kill two birds with one stone: it still provides sufficient infrastructure to catch those events (and it something postgres already has for different kinds of ddl commands) and it would be easier to test. Thanks! [1] https://github.com/zilder/pg_lz4 -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
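For context, wiring a column to an extension-provided method like the one in [1] would presumably look roughly like this under the current patch (the CREATE EXTENSION name is an assumption; the column-level COMPRESSION and SET COMPRESSION .. PRESERVE spellings follow the examples elsewhere in this thread):

create extension pg_lz4;  -- assumed extension name for [1]
create table xxx (id serial, msg text compression pg_lz4);
alter table xxx alter column msg set compression pg_lz4 preserve pglz;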
On Mon, 22 Jan 2018 23:26:31 +0300 Ildar Musin <i.musin@postgrespro.ru> wrote: Thanks for review! Attached new version of the patch. Fixed few bugs, added more documentation and rebased to current master. > You need to rebase to the latest master, there are some conflicts. > I've applied it to the three days old master to try it. Done. > > As I can see the documentation is not yet complete. For example, there > is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and > section "Compression Access Method Functions" in compression-am.sgml > hasn't been finished. Not sure about ddl.sgml, it contains more common things, but since postgres contains only pglz by default there is not much to show. > > I've implemented an extension [1] to understand the way developer > would go to work with new infrastructure. And for me it seems clear. > (Except that it took me some effort to wrap my mind around varlena > macros but it is probably a different topic). > > I noticed that you haven't cover 'cmdrop' in the regression tests and > I saw the previous discussion about it. Have you considered using > event triggers to handle the drop of column compression instead of > 'cmdrop' function? This way you would kill two birds with one stone: > it still provides sufficient infrastructure to catch those events > (and it something postgres already has for different kinds of ddl > commands) and it would be easier to test. I have added support for event triggers for ALTER SET COMPRESSION in current version. Event trigger on ALTER can be used to replace cmdrop function but it will be far from trivial. There is not easy way to understand that's attribute compression is really dropping in the command. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hello Ildus,

On 23.01.2018 16:04, Ildus Kurbangaliev wrote:
> On Mon, 22 Jan 2018 23:26:31 +0300
> Ildar Musin <i.musin@postgrespro.ru> wrote:
>
> Thanks for review! Attached new version of the patch. Fixed few bugs,
> added more documentation and rebased to current master.
>
>> You need to rebase to the latest master, there are some conflicts.
>> I've applied it to the three days old master to try it.
>
> Done.
>
>> As I can see the documentation is not yet complete. For example, there
>> is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and
>> section "Compression Access Method Functions" in compression-am.sgml
>> hasn't been finished.
>
> Not sure about ddl.sgml, it contains more common things, but since
> postgres contains only pglz by default there is not much to show.
>
>> I've implemented an extension [1] to understand the way developer
>> would go to work with new infrastructure. And for me it seems clear.
>> (Except that it took me some effort to wrap my mind around varlena
>> macros but it is probably a different topic).
>>
>> I noticed that you haven't cover 'cmdrop' in the regression tests and
>> I saw the previous discussion about it. Have you considered using
>> event triggers to handle the drop of column compression instead of
>> 'cmdrop' function? This way you would kill two birds with one stone:
>> it still provides sufficient infrastructure to catch those events
>> (and it something postgres already has for different kinds of ddl
>> commands) and it would be easier to test.
>
> I have added support for event triggers for ALTER SET COMPRESSION in
> current version. Event trigger on ALTER can be used to replace cmdrop
> function but it will be far from trivial. There is not easy way to
> understand that's attribute compression is really dropping in the
> command.

I've encountered unexpected behavior in the command 'CREATE TABLE ... (LIKE ...)'. It seems that it copies the compression settings of the table attributes no matter which INCLUDING options are specified. E.g.

create table xxx(id serial, msg text compression pg_lz4);
alter table xxx alter column msg set storage external;
\d+ xxx
                    Table "public.xxx"
 Column |  Type   | ... | Storage  | Compression |
--------+---------+ ... +----------+-------------+
 id     | integer | ... | plain    |             |
 msg    | text    | ... | external | pg_lz4      |

Now copy the table structure with "INCLUDING ALL":

create table yyy (like xxx including all);
\d+ yyy
                    Table "public.yyy"
 Column |  Type   | ... | Storage  | Compression |
--------+---------+ ... +----------+-------------+
 id     | integer | ... | plain    |             |
 msg    | text    | ... | external | pg_lz4      |

And now copy without "INCLUDING ALL":

create table zzz (like xxx);
\d+ zzz
                    Table "public.zzz"
 Column |  Type   | ... | Storage  | Compression |
--------+---------+ ... +----------+-------------+
 id     | integer | ... | plain    |             |
 msg    | text    | ... | extended | pg_lz4      |

As you can see, the compression option is copied anyway. I suggest adding a new INCLUDING COMPRESSION option to let the user explicitly specify whether or not they want to copy the compression settings.

I found a few phrases in the documentation that can be improved. But the documentation should be checked by a native speaker.

In compression-am.sgml:
"an compression access method" -> "a compression access method"
"compression method method" -> "compression method"
"compability" -> "compatibility"
Probably "local-backend cached state" would be better to replace with "per backend cached state"?
"Useful to store the parsed view of the compression options" -> "It could be useful for example to cache compression options" "and stores result of" -> "and stores the result of" "Called when CompressionAmOptions is creating." -> "Called when <structname>CompressionAmOptions</structname> is being initialized" "Note that in any system cache invalidation related with pg_attr_compression relation the options will be cleaned" -> "Note that any <literal>pg_attr_compression</literal> relation invalidation will cause all the cached <literal>acstate</literal> options cleared." "Function used to ..." -> "Function is used to ..." I think it would be nice to mention custom compression methods in storage.sgml. At this moment it only mentions built-in pglz compression. -- Ildar Musin i.musin@postgrespro.ru
On Thu, 25 Jan 2018 16:03:20 +0300 Ildar Musin <i.musin@postgrespro.ru> wrote: Thanks for review! > > As you see, compression option is copied anyway. I suggest adding new > INCLUDING COMPRESSION option to enable user to explicitly specify > whether they want or not to copy compression settings. Good catch, i missed INCLUDE options for LIKE command. Added INCLUDING COMPRESSION as you suggested. > > > I found a few phrases in documentation that can be improved. But the > documentation should be checked by a native speaker. > > In compression-am.sgml: > "an compression access method" -> "a compression access method" > "compression method method" -> "compression method" > "compability" -> "compatibility" > Probably "local-backend cached state" would be better to replace with > "per backend cached state"? > "Useful to store the parsed view of the compression options" -> "It > could be useful for example to cache compression options" > "and stores result of" -> "and stores the result of" > "Called when CompressionAmOptions is creating." -> "Called when > <structname>CompressionAmOptions</structname> is being initialized" > > "Note that in any system cache invalidation related with > pg_attr_compression relation the options will be cleaned" -> "Note > that any <literal>pg_attr_compression</literal> relation invalidation > will cause all the cached <literal>acstate</literal> options cleared." > "Function used to ..." -> "Function is used to ..." > > I think it would be nice to mention custom compression methods in > storage.sgml. At this moment it only mentions built-in pglz > compression. > I agree, the documentation would require a native speaker. Fixed the lines you mentioned. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hello Ildus, I continue reviewing your patch. Here are some thoughts. 1. When I set column storage to EXTERNAL then I cannot set compression. Seems reasonable: create table test(id serial, msg text); alter table test alter column msg set storage external; alter table test alter column msg set compression pg_lz4; ERROR: storage for "msg" should be MAIN or EXTENDED But if I reorder commands then it's ok: create table test(id serial, msg text); alter table test alter column msg set compression pg_lz4; alter table test alter column msg set storage external; \d+ test Table "public.test" Column | Type | ... | Storage | Compression --------+---------+ ... +----------+------------- id | integer | ... | plain | msg | text | ... | external | pg_lz4 So we could either allow user to set compression settings even when storage is EXTERNAL but with warning or prohibit user to set compression and external storage at the same time. The same thing is with setting storage PLAIN. 2. I think TOAST_COMPRESS_SET_RAWSIZE macro could be rewritten like following to prevent overwriting of higher bits of 'info': ((toast_compress_header *) (ptr))->info = \ ((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK | (len); It maybe does not matter at the moment since it is only used once, but it could save some efforts for other developers in future. In TOAST_COMPRESS_SET_CUSTOM() instead of changing individual bits you may do something like this: #define TOAST_COMPRESS_SET_CUSTOM(ptr) \ do { \ ((toast_compress_header *) (ptr))->info = \ ((toast_compress_header *) (ptr))->info & RAWSIZEMASK | ((uint32) 0x02 << 30) \ } while (0) Also it would be nice if bit flags were explained and maybe replaced by a macro. 3. In AlteredTableInfo, BulkInsertStateData and some functions (eg toast_insert_or_update) there is a hash table used to keep preserved compression methods list per attribute. I think a simple array of List* would be sufficient in this case. 4. In optionListToArray() you can use list_qsort() to sort options list instead of converting it manually into array and then back to a list. 5. Redundunt #includes: In heap.c: #include "access/reloptions.h" In tsvector.c: #include "catalog/pg_type.h" #include "common/pg_lzcompress.h" In relcache.c: #include "utils/datum.h" 6. Just a minor thing: no reason to change formatting in copy.c - heap_insert(resultRelInfo->ri_RelationDesc, tuple, mycid, - hi_options, bistate); + heap_insert(resultRelInfo->ri_RelationDesc, tuple, + mycid, hi_options, bistate); 7. Also in utility.c the extra new line was added which isn't relevant for this patch. 8. In parse_utilcmd.h the 'extern' keyword was removed from transformRuleStmt declaration which doesn't make sense in this patch. 9. Comments. Again, they should be read by a native speaker. So just a few suggestions: toast_prepare_varlena() - comment needed invalidate_amoptions_cache() - comment format doesn't match other functions in the file In htup_details.h: /* tuple contain custom compressed * varlenas */ should be "contains" -- Ildar Musin i.musin@postgrespro.ru
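A side note on item 2 above: as written, the TOAST_COMPRESS_SET_CUSTOM sketch is missing the statement semicolon inside its do/while block, and both suggestions rely on & binding tighter than |. The same suggestion restated with explicit parentheses and the assumed bit layout spelled out (RAWSIZEMASK and the 0x02 flag value are taken from the review above, not verified against the patch):

/*
 * Restatement of the macros suggested above, with explicit grouping.
 * Assumed layout of 'info': low 30 bits = raw size, high 2 bits = flags,
 * where flag value 0x02 marks a custom-compressed datum.
 */
#define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
	do { \
		((toast_compress_header *) (ptr))->info = \
			(((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK) | (len); \
	} while (0)

#define TOAST_COMPRESS_SET_CUSTOM(ptr) \
	do { \
		((toast_compress_header *) (ptr))->info = \
			(((toast_compress_header *) (ptr))->info & RAWSIZEMASK) | \
			((uint32) 0x02 << 30); \
	} while (0)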
On Fri, 26 Jan 2018 19:07:28 +0300 Ildar Musin <i.musin@postgrespro.ru> wrote: > Hello Ildus, > > I continue reviewing your patch. Here are some thoughts. Thanks! Attached new version of the patch. > > 1. When I set column storage to EXTERNAL then I cannot set > compression. Seems reasonable: > create table test(id serial, msg text); > alter table test alter column msg set storage external; > alter table test alter column msg set compression pg_lz4; > ERROR: storage for "msg" should be MAIN or EXTENDED Changed the behaviour, now it's ok to change storages in any directions for toastable types. Also added protection from untoastable types. > > > 2. I think TOAST_COMPRESS_SET_RAWSIZE macro could be rewritten like > following to prevent overwriting of higher bits of 'info': > > ((toast_compress_header *) (ptr))->info = \ > ((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK | > (len); > > It maybe does not matter at the moment since it is only used once, but > it could save some efforts for other developers in future. > In TOAST_COMPRESS_SET_CUSTOM() instead of changing individual bits you > may do something like this: > > #define TOAST_COMPRESS_SET_CUSTOM(ptr) \ > do { \ > ((toast_compress_header *) (ptr))->info = \ > ((toast_compress_header *) (ptr))->info & RAWSIZEMASK > | ((uint32) 0x02 << 30) \ > } while (0) > > Also it would be nice if bit flags were explained and maybe replaced > by a macro. I noticed that there is no need of TOAST_COMPRESS_SET_CUSTOM at all, so I just removed it, TOAST_COMPRESS_SET_RAWSIZE will set needed flags. > > > 3. In AlteredTableInfo, BulkInsertStateData and some functions (eg > toast_insert_or_update) there is a hash table used to keep preserved > compression methods list per attribute. I think a simple array of > List* would be sufficient in this case. Not sure about that, it will just complicate things without sufficient improvements. Also it would require the passing the length of the array and require more memory for tables with large number of attributes. But, I've made default size of the hash table smaller, since unlikely the user will change compression of many attributes at once. > > > 4. In optionListToArray() you can use list_qsort() to sort options > list instead of converting it manually into array and then back to a > list. Good, didn't know about this function. > > > 5. Redundunt #includes: > > In heap.c: > #include "access/reloptions.h" > In tsvector.c: > #include "catalog/pg_type.h" > #include "common/pg_lzcompress.h" > In relcache.c: > #include "utils/datum.h" > > 6. Just a minor thing: no reason to change formatting in copy.c > - heap_insert(resultRelInfo->ri_RelationDesc, tuple, mycid, > - hi_options, bistate); > + heap_insert(resultRelInfo->ri_RelationDesc, tuple, > + mycid, hi_options, bistate); > > 7. Also in utility.c the extra new line was added which isn't > relevant for this patch. > > 8. In parse_utilcmd.h the 'extern' keyword was removed from > transformRuleStmt declaration which doesn't make sense in this patch. > > 9. Comments. Again, they should be read by a native speaker. So just > a few suggestions: > toast_prepare_varlena() - comment needed > invalidate_amoptions_cache() - comment format doesn't match other > functions in the file > > In htup_details.h: > /* tuple contain custom compressed > * varlenas */ > should be "contains" > 5-9, all done. Thank you for noticing. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hello Ildus, On 29.01.2018 14:44, Ildus Kurbangaliev wrote: > > Thanks! Attached new version of the patch. > Patch applies cleanly, builds without any warnings, documentation builds ok, all tests pass. A remark for the committers. The patch is quite big, so I really wish more reviewers looked into it for more comprehensive review. Also a native english speaker should check the documentation and comments. Another thing is that tests don't cover cmdrop method because the built-in pglz compression doesn't use it (I know there is an jsonbd extension [1] based on this patch and which should benefit from cmdrop method, but it doesn't test it either yet). I think I did what I could and so passing this patch to committers for the review. Changed status to "Ready for committer". [1] https://github.com/postgrespro/jsonbd -- Ildar Musin i.musin@postgrespro.ru
On Mon, 29 Jan 2018 17:29:29 +0300
Ildar Musin <i.musin@postgrespro.ru> wrote:
>
> Patch applies cleanly, builds without any warnings, documentation
> builds ok, all tests pass.
>
> A remark for the committers. The patch is quite big, so I really wish
> more reviewers looked into it for more comprehensive review. Also a
> native english speaker should check the documentation and comments.
> Another thing is that tests don't cover cmdrop method because the
> built-in pglz compression doesn't use it (I know there is an jsonbd
> extension [1] based on this patch and which should benefit from
> cmdrop method, but it doesn't test it either yet).
>
> I think I did what I could and so passing this patch to committers
> for the review. Changed status to "Ready for committer".
>
> [1] https://github.com/postgrespro/jsonbd

Thank you! About cmdrop, I checked manually that it's called, but I'm going to check it thoroughly in my extension.

-- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Mon, 29 Jan 2018 17:29:29 +0300 Ildar Musin <i.musin@postgrespro.ru> wrote: > Hello Ildus, > > On 29.01.2018 14:44, Ildus Kurbangaliev wrote: > > > > Thanks! Attached new version of the patch. > > > > Patch applies cleanly, builds without any warnings, documentation > builds ok, all tests pass. > > A remark for the committers. The patch is quite big, so I really wish > more reviewers looked into it for more comprehensive review. Also a > native english speaker should check the documentation and comments. > Another thing is that tests don't cover cmdrop method because the > built-in pglz compression doesn't use it (I know there is an jsonbd > extension [1] based on this patch and which should benefit from > cmdrop method, but it doesn't test it either yet). > > I think I did what I could and so passing this patch to committers > for the review. Changed status to "Ready for committer". > > > [1] https://github.com/postgrespro/jsonbd > Attached rebased version of the patch so it can be applied to current master. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Hi, Attached new version of the patch, rebased to current master, and fixed conflicting catalog Oids. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
On Mon, 26 Feb 2018 15:25:56 +0300
Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
> Hi,
> Attached new version of the patch, rebased to current master, and
> fixed conflicting catalog Oids.

Attached a rebased version of the patch; fixed conflicts in pg_proc and the TAP tests.

-- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Attached rebased version of the patch. Fixed conflicts in pg_class.h. -- ---- Regards, Ildus Kurbangaliev
Attachment
On Mon, 26 Mar 2018 20:38:25 +0300
Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
> Attached rebased version of the patch. Fixed conflicts in pg_class.h.

New rebased version due to conflicts in master. Also fixed a few errors and removed the cmdrop method since it couldn't be tested.

-- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
> On Mon, 26 Mar 2018 20:38:25 +0300
> Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
>
>> Attached rebased version of the patch. Fixed conflicts in pg_class.h.
>
> New rebased version due to conflicts in master. Also fixed few errors
> and removed cmdrop method since it couldnt be tested.

It seems to be useful (and not so difficult) to use custom compression methods also for WAL compression: replace the direct calls of pglz_compress in xloginsert.c.

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
>> On Mon, 26 Mar 2018 20:38:25 +0300
>> Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
>>> Attached rebased version of the patch. Fixed conflicts in pg_class.h.
>> New rebased version due to conflicts in master. Also fixed few errors
>> and removed cmdrop method since it couldnt be tested.
> I seems to be useful (and not so difficult) to use custom compression
> methods also for WAL compression: replace direct calls of pglz_compress
> in xloginsert.c
I'm going to object at this point, and I have the following arguments for that:

1) WAL compression is much more critical for durability than datatype
compression. Imagine that a compression algorithm contains a bug which
causes the decompress method to segfault. In the case of datatype
compression, that would cause a crash on access to some particular value,
but the rest of the database would keep working, giving you a chance to
localize the issue and investigate it. In the case of WAL compression,
recovery would cause a server crash. That seems to be a much more serious
disaster: you wouldn't be able to bring your database up and running, and
the same would happen on the standby.

2) The idea of custom compression methods is that some columns may have a
specific data distribution which could be handled better with a particular
compression method and particular parameters. In WAL compression you're
dealing with the whole WAL stream, containing all the values from the
database cluster. Moreover, if custom compression methods are defined for
columns, then the WAL stream already contains values compressed in the most
efficient way. However, it might turn out that some compression method is
better for WAL in the general case (there are benchmarks showing our pglz
is not very good in comparison to the alternatives). But in that case I
would prefer to just switch our WAL to a different compression method one
day. Thankfully we don't preserve WAL compatibility between major releases.

3) This patch provides custom compression methods recorded in the catalog.
During recovery you don't have access to the system catalog, because it's
not recovered yet, and can't fetch compression method metadata from there.
A possible approach is to have a GUC which stores the shared module and
function names for WAL compression, but that seems like quite a different
mechanism from the one present in this patch.

Taking all of the above into account, I think we should give up on custom
WAL compression methods, or at least consider them unrelated to this patch.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sun, 22 Apr 2018 16:21:31 +0300 Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik < > k.knizhnik@postgrespro.ru> wrote: > > > On 30.03.2018 19:50, Ildus Kurbangaliev wrote: > > > >> On Mon, 26 Mar 2018 20:38:25 +0300 > >> Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > >> > >> Attached rebased version of the patch. Fixed conflicts in > >> pg_class.h. > >>> > >>> New rebased version due to conflicts in master. Also fixed few > >>> errors > >> and removed cmdrop method since it couldnt be tested. > >> > >> I seems to be useful (and not so difficult) to use custom > >> compression > > methods also for WAL compression: replace direct calls of > > pglz_compress in xloginsert.c > > > I'm going to object this at point, and I've following arguments for > that: > > 1) WAL compression is much more critical for durability than datatype > compression. Imagine, compression algorithm contains a bug which > cause decompress method to issue a segfault. In the case of datatype > compression, that would cause crash on access to some value which > causes segfault; but in the rest database will be working giving you > a chance to localize the issue and investigate that. In the case of > WAL compression, recovery would cause a server crash. That seems > to be much more serious disaster. You wouldn't be able to make > your database up and running and the same happens on the standby. > > 2) Idea of custom compression method is that some columns may > have specific data distribution, which could be handled better with > particular compression method and particular parameters. In the > WAL compression you're dealing with the whole WAL stream containing > all the values from database cluster. Moreover, if custom compression > method are defined for columns, then in WAL stream you've values > already compressed in the most efficient way. However, it might > appear that some compression method is better for WAL in general > case (there are benchmarks showing our pglz is not very good in > comparison to the alternatives). But in this case I would prefer to > just switch our WAL to different compression method one day. > Thankfully we don't preserve WAL compatibility between major releases. > > 3) This patch provides custom compression methods recorded in > the catalog. During recovery you don't have access to the system > catalog, because it's not recovered yet, and can't fetch compression > method metadata from there. The possible thing is to have GUC, > which stores shared module and function names for WAL compression. > But that seems like quite different mechanism from the one present > in this patch. > > Taking into account all of above, I think we would give up with custom > WAL compression method. Or, at least, consider it unrelated to this > patch. I agree with these points. I also think this should be done in another patch. It's not so hard to implement but would make sense if there will be few more builtin compression methods suitable for wal compression. Some static array could contain function pointers for direct calls. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
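To make the "static array could contain function pointers" idea above a bit more concrete, here is a rough, purely hypothetical sketch — none of these names exist in core or in the patch. The point is only the dispatch shape: built-in WAL compressors sit in a fixed table indexed by a small id stored with the WAL record, so recovery never needs catalog access. Only a trivial copy method is shown; real entries would wrap pglz, zlib, etc.

#include <stdint.h>
#include <string.h>

/* Hypothetical registry of built-in WAL compression methods. */
typedef struct WalCompressor
{
    const char *name;
    int32_t     (*compress) (const char *src, int32_t srclen,
                             char *dst, int32_t dstcap);
    int32_t     (*decompress) (const char *src, int32_t srclen,
                               char *dst, int32_t rawsize);
} WalCompressor;

/* A trivial "none" method that just copies, to show the table shape. */
static int32_t
copy_compress(const char *src, int32_t srclen, char *dst, int32_t dstcap)
{
    if (srclen > dstcap)
        return -1;
    memcpy(dst, src, srclen);
    return srclen;
}

static int32_t
copy_decompress(const char *src, int32_t srclen, char *dst, int32_t rawsize)
{
    memcpy(dst, src, srclen < rawsize ? srclen : rawsize);
    return srclen;
}

static const WalCompressor wal_compressors[] = {
    {"none", copy_compress, copy_decompress},   /* id 0 */
    /* {"pglz", ...}, {"zlib", ...} would follow as ids 1, 2, ... */
};

/* During recovery: dispatch directly by the id recorded in the WAL record. */
static int32_t
wal_decompress(uint8_t method_id, const char *src, int32_t srclen,
               char *dst, int32_t rawsize)
{
    return wal_compressors[method_id].decompress(src, srclen, dst, rawsize);
}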
On 22.04.2018 16:21, Alexander Korotkov wrote:
> On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
>>> On Mon, 26 Mar 2018 20:38:25 +0300
>>> Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
>>>> Attached rebased version of the patch. Fixed conflicts in pg_class.h.
>>> New rebased version due to conflicts in master. Also fixed few errors
>>> and removed cmdrop method since it couldnt be tested.
>> I seems to be useful (and not so difficult) to use custom compression
>> methods also for WAL compression: replace direct calls of pglz_compress
>> in xloginsert.c
>
> I'm going to object this at point, and I've following arguments for that:
>
> 1) WAL compression is much more critical for durability than datatype
> compression. Imagine, compression algorithm contains a bug which
> cause decompress method to issue a segfault. In the case of datatype
> compression, that would cause crash on access to some value which
> causes segfault; but in the rest database will be working giving you
> a chance to localize the issue and investigate that. In the case of
> WAL compression, recovery would cause a server crash. That seems
> to be much more serious disaster. You wouldn't be able to make
> your database up and running and the same happens on the standby.

Well, I do not think that somebody will try to implement its own compression algorithm...
From my point of view the main value of this patch is that it allows replacing the pglz algorithm with a more efficient one, for example zstd.
On some data sets zstd provides a more than 10 times better compression ratio and at the same time is faster than pglz.
I do not think that the risk of data corruption caused by WAL compression with some alternative compression algorithm (zlib, zstd, ...) is higher than in the case of using the built-in Postgres compression.
> 2) Idea of custom compression method is that some columns may
> have specific data distribution, which could be handled better with
> particular compression method and particular parameters. In the
> WAL compression you're dealing with the whole WAL stream containing
> all the values from database cluster. Moreover, if custom compression
> method are defined for columns, then in WAL stream you've values
> already compressed in the most efficient way. However, it might
> appear that some compression method is better for WAL in general
> case (there are benchmarks showing our pglz is not very good in
> comparison to the alternatives). But in this case I would prefer to just
> switch our WAL to different compression method one day. Thankfully
> we don't preserve WAL compatibility between major releases.
Frankly speaking, I do not believe that somebody will use custom compression in this way, i.e. implement their own compression methods for a specific data type.
Maybe just for json/jsonb, but only in the case when the custom compression API allows storing a compression dictionary separately (which, as far as I understand, is not currently supported).
When I worked for SciDB (a database for scientists which has to deal mostly with multidimensional arrays of data), our first intention was to implement custom compression methods for particular data types and data distributions. For example, there are very fast, simple and efficient algorithms for encoding sequences of monotonically increasing integers, ...
But after several experiments we rejected this idea and switched to using generic compression methods, mostly because we did not want the compressor to know much about page layout, data type representation, ... In Postgres, from my point of view, we have a similar situation. Assume that we have a column of serial type. It is a good candidate for compression, isn't it?
But this approach deals only with particular attribute values. It cannot take any advantage of the fact that this particular column is monotonically increasing. That can be done only with page-level compression, but that is a different story.
So the current approach works only for blob-like types: text, json, ... But those usually have quite a complex internal structure, and for them universal compression algorithms tend to be more efficient than any hand-written specific implementation. Also, algorithms like zstd are able to efficiently recognize and compress many common data distributions, like monotonic sequences, duplicates, repeated series, ...
I do not think that assigning the default compression method through a GUC is such a bad idea.

> 3) This patch provides custom compression methods recorded in
> the catalog. During recovery you don't have access to the system
> catalog, because it's not recovered yet, and can't fetch compression
> method metadata from there. The possible thing is to have GUC,
> which stores shared module and function names for WAL compression.
> But that seems like quite different mechanism from the one present
> in this patch.
>
> Taking into account all of above, I think we would give up with custom
> WAL compression method. Or, at least, consider it unrelated to this
> patch.
Sorry for repeating the same thing, but from my point of view the main advantage of this patch is that it allows replacing pglz with more efficient compression algorithms.
I do not see much sense in specifying a custom compression method for particular columns.
It would be more useful, from my point of view, to include in this patch an implementation of the compression API not only for pglz, but also for zlib, zstd and maybe some other popular compression libraries which have proved their efficiency.
Postgres already has a zlib dependency (unless explicitly excluded with --without-zlib), so a zlib implementation can be included in the Postgres build.
Other implementations can be left as modules which users can build themselves. It is certainly less convenient than using pre-existing stuff, but much more convenient than making users write this code themselves.
There is yet another aspect which is not covered by this patch: streaming compression.
Streaming compression is needed if we want to compress libpq traffic. It can be very efficient for the COPY command and for replication. Also, libpq compression can improve the speed of queries returning large results (for example containing JSON columns) over a slow network.
I have proposed such a patch for libpq, which uses either the zlib or the zstd streaming API. The Postgres built-in compression implementation doesn't have a streaming API at all, so it cannot be used here. Certainly, support for streaming may significantly complicate the compression API, so I am not sure that it actually needs to be included in this patch.
But I would be pleased if Ildus could consider this idea.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Apr 23, 2018 at 12:40 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> On 22.04.2018 16:21, Alexander Korotkov wrote:
>> On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik
>> <k.knizhnik@postgrespro.ru> wrote:
>>> On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
>>>> On Mon, 26 Mar 2018 20:38:25 +0300
>>>> Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote:
>>>>> Attached rebased version of the patch. Fixed conflicts in pg_class.h.
>>>> New rebased version due to conflicts in master. Also fixed few errors
>>>> and removed cmdrop method since it couldnt be tested.
>>> I seems to be useful (and not so difficult) to use custom compression
>>> methods also for WAL compression: replace direct calls of pglz_compress
>>> in xloginsert.c
>>
>> I'm going to object this at point, and I've following arguments for that:
>>
>> 1) WAL compression is much more critical for durability than datatype
>> compression. Imagine, compression algorithm contains a bug which
>> cause decompress method to issue a segfault. In the case of datatype
>> compression, that would cause crash on access to some value which
>> causes segfault; but in the rest database will be working giving you
>> a chance to localize the issue and investigate that. In the case of
>> WAL compression, recovery would cause a server crash. That seems
>> to be much more serious disaster. You wouldn't be able to make
>> your database up and running and the same happens on the standby.
>
> Well, I do not think that somebody will try to implement its own
> compression algorithm...
But that is the main goal of this patch: let somebody implement their own compression
algorithm which best fits a particular dataset.
From my point of view the main value of this patch is that it allows to replace pglz algorithm with more efficient one, for example zstd.
At some data sets zstd provides more than 10 times better compression ratio and at the same time is faster then pglz.
Not exactly. If we want to replace pglz with a more efficient algorithm, then we should
just replace pglz with a better algorithm. Pluggable compression methods are
definitely not worth it for just replacing pglz with zstd.
I do not think that risk of data corruption caused by WAL compression with some alternative compression algorithm (zlib, zstd,...) is higher than in case of using builtin Postgres compression.
If speaking about zlib or zstd, then yes, the risk of corruption is very low. But again,
switching to zlib or zstd doesn't justify this patch.
>> 2) Idea of custom compression method is that some columns may
>> have specific data distribution, which could be handled better with
>> particular compression method and particular parameters. In the
>> WAL compression you're dealing with the whole WAL stream containing
>> all the values from database cluster. Moreover, if custom compression
>> method are defined for columns, then in WAL stream you've values
>> already compressed in the most efficient way. However, it might
>> appear that some compression method is better for WAL in general
>> case (there are benchmarks showing our pglz is not very good in
>> comparison to the alternatives). But in this case I would prefer to just
>> switch our WAL to different compression method one day. Thankfully
>> we don't preserve WAL compatibility between major releases.
Frankly speaking I do not believe that somebody will use custom compression in this way: implement its own compression methods for the specific data type.
May be just for json/jsonb, but also only in the case when custom compression API allows to separately store compression dictionary (which as far as I understand is not currently supported).
When I worked for SciDB (database for scientists which has to deal mostly with multidimensional arrays of data) our first intention was to implement custom compression methods for the particular data types and data distributions. For example, there are very fast, simple and efficient algorithms for encoding sequence of monotonically increased integers, ....
But after several experiments we rejected this idea and switch to using generic compression methods. Mostly because we do not want compressor to know much about page layout, data type representation,... In Postgres, from my point of view, we have similar situation. Assume that we have column of serial type. So it is good candidate of compression, isn't it?
No, it's not. Exactly because compressor shouldn't deal with page layout etc.
But it's absolutely OK for datatype compressor to deal with particular type
representation.
But this approach deals only with particular attribute values. It can not take any advantages from the fact that this particular column is monotonically increased. It can be done only with page level compression, but it is a different story.
Yes, compression of data series spread across multiple rows is a different story.
So current approach works only for blob-like types: text, json,... But them usually have quite complex internal structure and for them universal compression algorithms used to be more efficient than any hand-written specific implementation. Also algorithms like zstd, are able to efficiently recognize and compress many common data distributions, line monotonic sequences, duplicates, repeated series,...
Some blob-like datatypes might not be long enough to let generic
compression algorithms like zlib or zstd train a dictionary. For example,
MySQL successfully utilizes column-level dictionaries for JSON [1]. Also,
JSON(B) might utilize some compression which lets the user extract
particular attributes without decompressing the whole document.
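To illustrate the preset-dictionary point with a generic library rather than anything from the patch: zlib's deflateSetDictionary() lets the caller prime the compressor with bytes expected to occur in the input (for example, common JSON keys), which is exactly what helps values that are individually too short to build a useful dictionary from. A minimal sketch, with a made-up helper name (the matching inflate side would call inflateSetDictionary() when inflate() returns Z_NEED_DICT):

#include <string.h>
#include <zlib.h>

/*
 * Compress src into dst using a preset dictionary.  Returns the number of
 * compressed bytes, or -1 on failure.  Purely an illustration of the
 * concept; not the patch's zlib method.
 */
static int
compress_with_dictionary(const char *src, size_t srclen,
                         char *dst, size_t dstcap,
                         const char *dict, size_t dictlen)
{
    z_stream    zs;
    int         rc;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;
    if (deflateSetDictionary(&zs, (const Bytef *) dict, (uInt) dictlen) != Z_OK)
    {
        deflateEnd(&zs);
        return -1;
    }

    zs.next_in = (Bytef *) src;
    zs.avail_in = (uInt) srclen;
    zs.next_out = (Bytef *) dst;
    zs.avail_out = (uInt) dstcap;

    rc = deflate(&zs, Z_FINISH);
    deflateEnd(&zs);

    return (rc == Z_STREAM_END) ? (int) (dstcap - zs.avail_out) : -1;
}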
> I do not think that assignment default compression method through GUC
> is so bad idea.
>
>> 3) This patch provides custom compression methods recorded in
>> the catalog. During recovery you don't have access to the system
>> catalog, because it's not recovered yet, and can't fetch compression
>> method metadata from there. The possible thing is to have GUC,
>> which stores shared module and function names for WAL compression.
>> But that seems like quite different mechanism from the one present
>> in this patch.
It's probably not so bad, but it's a different story. Unrelated to this patch, I think.
> Sorry for repeating the same thing, but from my point of view the main
> advantage of this patch is that it allows to replace pglz with more
> efficient compression algorithms.
>
>> Taking into account all of above, I think we would give up with custom
>> WAL compression method. Or, at least, consider it unrelated to this
>> patch.
I do not see much sense in specifying custom compression method for some particular columns.
This patch is about giving user an ability to select particular compression
method and its parameters for particular column.
It will be more useful from my point of view to include in this patch implementation of compression API not only or pglz, but also for zlib, zstd and may be for some other popular compressing libraries which proved their efficiency.
Postgres already has zlib dependency (unless explicitly excluded with --without-zlib), so zlib implementation can be included in Postgres build.
Other implementations can be left as module which customer can build himself. It is certainly less convenient, than using preexistred stuff, but much more convenient then making users to write this code themselves.
There is yet another aspect which is not covered by this patch: streaming compression.
Streaming compression is needed if we want to compress libpq traffic. It can be very efficient for COPY command and for replication. Also libpq compression can improve speed of queries returning large results (for example containing JSON columns) throw slow network.
I have proposed such patch for libpq, which is using either zlib, either zstd streaming API. Postgres built-in compression implementation doesn't have streaming API at all, so it can not be used here. Certainly support of streaming may significantly complicates compression API, so I am not sure that it actually needed to be included in this patch.
But I will be pleased if Ildus can consider this idea.
I think streaming compression seems like a completely different story.
client-server traffic compression is not just server feature. It must
be also supported at client side. And I really doubt it should be
pluggable.
In my opinion, you propose good things like compression of WAL
with better algorithm and compression of client-server traffic.
But I think those features are unrelated to this patch and should
be considered separately. It's not features, which should be
added to this patch. Regarding this patch the points you provided
more seems like criticism of the general idea.
I think the problem of this patch is that it lacks of good example.
It would be nice if Ildus implement simple compression with
column-defined dictionary (like [1] does), and show its efficiency
of real-life examples, which can't be achieved with generic
compression methods (like zlib or zstd). That would be a good
answer to the criticism you provide.
Links
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 23.04.2018 18:32, Alexander Korotkov wrote:
> But that the main goal of this patch: let somebody implement own compression
> algorithm which best fit for particular dataset.

Hmmm... Frankly speaking I don't believe in this "somebody".

>> From my point of view the main value of this patch is that it allows to
>> replace pglz algorithm with more efficient one, for example zstd.
>> At some data sets zstd provides more than 10 times better compression
>> ratio and at the same time is faster then pglz.
>
> Not exactly. If we want to replace pglz with more efficient one, then we should
> just replace pglz with better algorithm. Pluggable compression methods are
> definitely don't worth it for just replacing pglz with zstd.
As far as I understand, it is not possible for many reasons (portability, patents, ...) to replace pglz with zstd.
I think that even replacing pglz with zlib (which is much worse than zstd) will not be accepted by the community.
So from my point of view the main advantage of custom compression methods is to replace the built-in pglz compression with a more advanced one.
> Some types blob-like datatypes might be not long enough to let generic
> compression algorithms like zlib or zstd train a dictionary. For example,
> MySQL successfully utilize column-level dictionaries for JSON [1]. Also
> JSON(B) might utilize some compression which let user extract
> particular attributes without decompression of the whole document.
Well, I am not an expert in compression.
But I will be very surprised if somebody shows me a real example with a large enough compressed data buffer (>2kb) where some specialized algorithm provides a significantly
better compression ratio than an advanced universal compression algorithm.
Also, maybe I missed something, but the current compression API doesn't support partial extraction (extracting some particular attribute or range).
If we really need it, then it should be expressed in the custom compressor API. But I am not sure how frequently it will be needed.
Large values are split into 2kb TOAST chunks. With compression that can be about 4-8kb of raw data. IMHO storing larger JSON objects is a database design flaw.
And taking into account that for JSONB we also need to extract the header (so at least two chunks), the advantages of partial JSONB decompression become even less clear.
>> I do not think that assignment default compression method through GUC
>> is so bad idea.
>
> It's probably not so bad, but it's a different story. Unrelated to this
> patch, I think.
Maybe. But in any case, there are several directions where compression can be used:
- custom compression algorithms
- libpq compression
- page level compression
...
and they should somehow finally be "married" with each other.
> I think streaming compression seems like a completely different story.
> client-server traffic compression is not just server feature. It must
> be also supported at client side. And I really doubt it should be
> pluggable.
>
> In my opinion, you propose good things like compression of WAL
> with better algorithm and compression of client-server traffic.
> But I think those features are unrelated to this patch and should
> be considered separately. It's not features, which should be
> added to this patch. Regarding this patch the points you provided
> more seems like criticism of the general idea.
>
> I think the problem of this patch is that it lacks of good example.
> It would be nice if Ildus implement simple compression with
> column-defined dictionary (like [1] does), and show its efficiency
> of real-life examples, which can't be achieved with generic
> compression methods (like zlib or zstd). That would be a good
> answer to the criticism you provide.
>
> Links
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

Sorry, I'm really looking at this patch from a different angle.
And this is why I have some doubts about general idea.
Postgres allows defining custom types, access methods, ...
But do you know any production system using some special data types or custom indexes which are not included in standard Postgres distribution
or popular extensions (like postgis)?
IMHO end users do not have the skills and time to create their own compression algorithms. And without knowledge of the specifics of a particular data set,
it is very hard to implement something more efficient than a universal compression library.
But if you think that it is not a right place and time to discuss it, I do not insist.
But in any case, I think that it will be useful to provide some more examples of custom compression API usage.
From my point of view the most useful will be integration with zstd.
But if it is possible to find some example of data-specific compression algorithms which show better results than universal compression,
it will be even more impressive.
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, 23 Apr 2018 19:34:38 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > > > Sorry, I really looking at this patch under the different angle. > And this is why I have some doubts about general idea. > Postgres allows to defined custom types, access methods,... > But do you know any production system using some special data types > or custom indexes which are not included in standard Postgres > distribution or popular extensions (like postgis)? > > IMHO end-user do not have skills and time to create their own > compression algorithms. And without knowledge of specific of > particular data set, > it is very hard to implement something more efficient than universal > compression library. > But if you think that it is not a right place and time to discuss it, > I do not insist. > > But in any case, I think that it will be useful to provide some more > examples of custom compression API usage. > From my point of view the most useful will be integration with zstd. > But if it is possible to find some example of data-specific > compression algorithms which show better results than universal > compression, it will be even more impressive. > > Ok, let me clear up the purpose of this patch. I understand that you want to compress everything by it but now the idea is just to bring basic functionality to compress toasting values with external compression algorithms. It's unlikely that compression algorithms like zstd, snappy and others will be in postgres core but with this patch it's really easy to make an extension and start to compress values using it right away. And the end-user should not be expert in compression algorithms to make such extension. One of these algorithms could end up in core if its license will allow it. I'm not trying to cover all the places in postgres which will benefit from compression, and this patch only is the first step. It's quite big already and with every new feature that will increase its size, chances of its reviewing and commiting will decrease. The API is very simple now and contains what an every compression method can do - get some block of data and return a compressed form of the data. And it can be extended with streaming and other features in the future. Maybe the reason of your confusion is that there is no GUC that changes pglz to some custom compression so all new attributes will use it. I will think about adding it. Also there was a discussion about specifying the compression for the type and it was decided that's better to do it later by a separate patch. As an example of specialized compression could be time series compression described in [1]. [2] contains an example of an extension that adds lz4 compression using this patch. [1] http://www.vldb.org/pvldb/vol8/p1816-teller.pdf [2] https://github.com/zilder/pg_lz4 -- ---- Regards, Ildus Kurbangaliev
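As a rough illustration of the kind of data-specific encoder the time-series paper [1] above is about (this is a toy, not the paper's exact scheme and not code from the patch): near-regular, monotonically increasing timestamps can be stored as zigzag-encoded delta-of-delta varints, often collapsing to about one byte per value.

#include <stdint.h>
#include <stddef.h>

/* Emit v as an unsigned LEB128-style varint; returns bytes written. */
static size_t
encode_varint(uint64_t v, uint8_t *out)
{
    size_t      n = 0;

    while (v >= 0x80)
    {
        out[n++] = (uint8_t) (v | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t) v;
    return n;
}

/* Map signed values to unsigned so small deltas stay small. */
static uint64_t
zigzag(int64_t v)
{
    return ((uint64_t) v << 1) ^ (uint64_t) (v >> 63);
}

/*
 * Encode n timestamps as delta-of-delta varints.  Returns bytes written to
 * 'out' (caller sizes it generously, e.g. 10 bytes per value worst case).
 */
static size_t
encode_timestamps(const int64_t *ts, size_t n, uint8_t *out)
{
    size_t      pos = 0;
    size_t      i;
    int64_t     prev = 0;
    int64_t     prev_delta = 0;

    for (i = 0; i < n; i++)
    {
        int64_t     delta = ts[i] - prev;

        pos += encode_varint(zigzag(delta - prev_delta), out + pos);
        prev_delta = delta;
        prev = ts[i];
    }
    return pos;
}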
On Mon, Apr 23, 2018 at 7:34 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
IMHO end-user do not have skills and time to create their own compression algorithms. And without knowledge of specific of particular data set,
it is very hard to implement something more efficient than universal compression library.
But if you think that it is not a right place and time to discuss it, I do not insist.
For sure, end-users wouldn't implement own compression algorithms.
In the same way as end-users wouldn't implement custom datatypes,
operator classes, procedural language handlers etc. But those are
useful extension mechanisms which pass test of time. And extension
developers use them.
But in any case, I think that it will be useful to provide some more examples of custom compression API usage.
From my point of view the most useful will be integration with zstd.
But if it is possible to find some example of data-specific compression algorithms which show better results than universal compression,
it will be even more impressive.
Yes, this patch definitely lacks a good usage example. That may
lead to some misunderstanding of its purpose. Good use cases
should be shown before we can consider committing this. I think
Ildus should try to implement at least a custom dictionary compression
method where the dictionary is specified by the user in parameters.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, 24 Apr 2018 14:05:20 +0300
Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> Yes, this patch definitely lacks of good usage example. That may
> lead to some misunderstanding of its purpose. Good use-cases
> should be shown before we can consider committing this. I think
> Ildus should try to implement at least custom dictionary compression
> method where dictionary is specified by user in parameters.
>

Hi, attached is v16 of the patch. I have split the patch into 8 parts, so now it should be easier to review. The main improvement is a zlib compression method with dictionary support, like you mentioned. My synthetic tests showed that zlib gives better compression but is usually slower than pglz.

-- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v16.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v16.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v16.patch
- 0004-Add-pglz-compression-method-v16.patch
- 0005-Add-zlib-compression-method-v16.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v16.patch
- 0007-Add-tests-for-compression-methods-v16.patch
- 0008-Add-documentation-for-custom-compression-methods-v16.patch
On Mon, Apr 23, 2018 at 12:34 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > May be. But in any cases, there are several direction where compression can > be used: > - custom compression algorithms > - libpq compression > - page level compression > ... > > and them should be somehow finally "married" with each other. I agree that we should try to avoid multiplying the number of compression-related APIs. Ideally there should be one API for registering a compression algorithms, and then there can be different methods of selecting that compression algorithm depending on the purpose for which it will be used. For instance, you could select a column compression format using some variant of ALTER TABLE ... ALTER COLUMN, but you would obviously use some other method to select the WAL compression format. However, it's a little unclear to me how we would actually make the idea of a single API work. For column compression, we need everything to be accessible through the catalogs. For something like WAL compression, we need it to be completely independent of the catalogs. Those things are opposites, so a single API can't have both properties. Maybe there can be some pieces shared, but as much as I'd like it to be otherwise, it doesn't seem possible to share it completely. I also agree with Ildus and Alexander that we cannot and should not try to solve every problem in one patch. Rather, we should just think ahead, so that we make as much of what goes into this patch reusable in the future as we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 18 Jun 2018 17:30:45 +0300 Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > On Tue, 24 Apr 2018 14:05:20 +0300 > Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > > > > Yes, this patch definitely lacks of good usage example. That may > > lead to some misunderstanding of its purpose. Good use-cases > > should be shown before we can consider committing this. I think > > Ildus should try to implement at least custom dictionary compression > > method where dictionary is specified by user in parameters. > > > > Hi, > > attached v16 of the patch. I have splitted the patch to 8 parts so now > it should be easier to make a review. The main improvement is zlib > compression method with dictionary support like you mentioned. My > synthetic tests showed that zlib gives more compression but usually > slower than pglz. > Hi, I have noticed that my patch is failing to apply on cputube. Attached a rebased version of the patch. Nothing have really changed, just added and fixed some tests for zlib and improved documentation. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v17.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v17.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v17.patch
- 0004-Add-pglz-compression-method-v17.patch
- 0005-Add-zlib-compression-method-v17.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v17.patch
- 0007-Add-tests-for-compression-methods-v17.patch
- 0008-Add-documentation-for-custom-compression-methods-v17.patch
Hi! On Mon, Jul 2, 2018 at 3:56 PM Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > On Mon, 18 Jun 2018 17:30:45 +0300 > Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > > > On Tue, 24 Apr 2018 14:05:20 +0300 > > Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > > > > > > > Yes, this patch definitely lacks of good usage example. That may > > > lead to some misunderstanding of its purpose. Good use-cases > > > should be shown before we can consider committing this. I think > > > Ildus should try to implement at least custom dictionary compression > > > method where dictionary is specified by user in parameters. > > > > > > > Hi, > > > > attached v16 of the patch. I have splitted the patch to 8 parts so now > > it should be easier to make a review. The main improvement is zlib > > compression method with dictionary support like you mentioned. My > > synthetic tests showed that zlib gives more compression but usually > > slower than pglz. > > > > I have noticed that my patch is failing to apply on cputube. Attached a > rebased version of the patch. Nothing have really changed, just added > and fixed some tests for zlib and improved documentation. I'm going to review this patch. Could you please rebase it? It doesn't apply for me due to changes made in src/bin/psql/describe.c. patching file src/bin/psql/describe.c Hunk #1 FAILED at 1755. Hunk #2 FAILED at 1887. Hunk #3 FAILED at 1989. Hunk #4 FAILED at 2019. Hunk #5 FAILED at 2030. 5 out of 5 hunks FAILED -- saving rejects to file src/bin/psql/describe.c.rej Also, please not that PostgreSQL 11 already passed feature freeze some time ago. So, please adjust your patch to expect PostgreSQL 12 in the lines like this: + if (pset.sversion >= 110000) ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, 23 Jul 2018 16:16:19 +0300 Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > I'm going to review this patch. Could you please rebase it? It > doesn't apply for me due to changes made in src/bin/psql/describe.c. > > patching file src/bin/psql/describe.c > Hunk #1 FAILED at 1755. > Hunk #2 FAILED at 1887. > Hunk #3 FAILED at 1989. > Hunk #4 FAILED at 2019. > Hunk #5 FAILED at 2030. > 5 out of 5 hunks FAILED -- saving rejects to file > src/bin/psql/describe.c.rej > > Also, please not that PostgreSQL 11 already passed feature freeze some > time ago. So, please adjust your patch to expect PostgreSQL 12 in the > lines like this: > > + if (pset.sversion >= 110000) > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > Hi, attached latest set of patches. Rebased and fixed pg_upgrade errors related with zlib support. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v19.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v19.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v19.patch
- 0004-Add-pglz-compression-method-v19.patch
- 0005-Add-zlib-compression-method-v19.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v19.patch
- 0007-Add-tests-for-compression-methods-v19.patch
- 0008-Add-documentation-for-custom-compression-methods-v19.patch
On Thu, 6 Sep 2018 18:27:13 +0300 Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > > Hi, attached latest set of patches. Rebased and fixed pg_upgrade > errors related with zlib support. > Hi, just updated patches to current master. Nothing new. -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v20.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v20.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v20.patch
- 0004-Add-pglz-compression-method-v20.patch
- 0005-Add-zlib-compression-method-v20.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v20.patch
- 0007-Add-tests-for-compression-methods-v20.patch
- 0008-Add-documentation-for-custom-compression-methods-v20.patch
> On Thu, Sep 6, 2018 at 5:27 PM Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru> wrote: > > Hi, attached latest set of patches. Rebased and fixed pg_upgrade errors > related with zlib support. Thank you for working on this patch; I believe the ideas mentioned in this thread are quite important for improving Postgres. Unfortunately, the patch has some conflicts now; could you post a rebased version one more time? > On Mon, 23 Jul 2018 16:16:19 +0300 > Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > > I'm going to review this patch. Could you please rebase it? It > > doesn't apply for me due to changes made in src/bin/psql/describe.c. Is there any review underway? Could you share the results?
On Fri, 30 Nov 2018 15:08:39 +0100 Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > On Thu, Sep 6, 2018 at 5:27 PM Ildus Kurbangaliev > > <i.kurbangaliev@postgrespro.ru> wrote: > > > > Hi, attached latest set of patches. Rebased and fixed pg_upgrade > > errors related with zlib support. > > Thank you for working on this patch, I believe the ideas mentioned in > this thread are quite important for Postgres improvement. > Unfortunately, patch has some conflicts now, could you post a rebased > version one more time? Hi, here is a rebased version. I hope it will get some review :) -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v20.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v20.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v20.patch
- 0004-Add-pglz-compression-method-v20.patch
- 0005-Add-zlib-compression-method-v20.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v20.patch
- 0007-Add-tests-for-compression-methods-v20.patch
- 0008-Add-documentation-for-custom-compression-methods-v20.patch
On Mon, Dec 03, 2018 at 03:43:32PM +0300, Ildus Kurbangaliev wrote: > Hi, here is a rebased version. I hope it will get some review :) This patch set is failing to apply, so moved to next CF, waiting for author. -- Michael
Attachment
Hi, here is another set of patches, only rebased to current master. Also I will change the status on the commitfest to 'Needs review'. -- Regards, Ildus Kurbangaliev
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v21.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v21.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v21.patch
- 0004-Add-pglz-compression-method-v21.patch
- 0005-Add-zlib-compression-method-v21.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v21.patch
- 0007-Add-tests-for-compression-methods-v21.patch
- 0008-Add-documentation-for-custom-compression-methods-v21.patch
On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote: > there are another set of patches. > Only rebased to current master. > > Also I will change status on commitfest to 'Needs review'. This patch has seen periodic rebases but no code review that I can see since January 2018. As Andres noted in [1], I think that we need to decide if this is a feature that we want rather than just continuing to push it from CF to CF. -- -David david@pgmasters.net [1] https://www.postgresql.org/message-id/20190216054526.zss2cufdxfeudr4i%40alap3.anarazel.de
On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net> wrote:
On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
> there are another set of patches.
> Only rebased to current master.
>
> Also I will change status on commitfest to 'Needs review'.
This patch has seen periodic rebases but no code review that I can see
since last January 2018.
As Andres noted in [1], I think that we need to decide if this is a
feature that we want rather than just continuing to push it from CF to CF.
Yes. I took a look at the code of this patch. I think it's in pretty good shape. But a high-level review/discussion is required.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 3/7/19 11:50 AM, Alexander Korotkov wrote: > On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote: > > > there are another set of patches. > > Only rebased to current master. > > > > Also I will change status on commitfest to 'Needs review'. > > This patch has seen periodic rebases but no code review that I can see > since last January 2018. > > As Andres noted in [1], I think that we need to decide if this is a > feature that we want rather than just continuing to push it from CF > to CF. > > > Yes. I took a look at code of this patch. I think it's in pretty good > shape. But high level review/discussion is required. OK, but I think this patch can only be pushed one more time, maximum, before it should be rejected. Regards, -- -David david@pgmasters.net
On Fri, 15 Mar 2019 14:07:14 +0400 David Steele <david@pgmasters.net> wrote: > On 3/7/19 11:50 AM, Alexander Korotkov wrote: > > On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net > > <mailto:david@pgmasters.net>> wrote: > > > > On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote: > > > > > there are another set of patches. > > > Only rebased to current master. > > > > > > Also I will change status on commitfest to 'Needs review'. > > > > This patch has seen periodic rebases but no code review that I > > can see since last January 2018. > > > > As Andres noted in [1], I think that we need to decide if this > > is a feature that we want rather than just continuing to push it > > from CF to CF. > > > > > > Yes. I took a look at code of this patch. I think it's in pretty > > good shape. But high level review/discussion is required. > > OK, but I think this patch can only be pushed one more time, maximum, > before it should be rejected. > > Regards, Hi, in my opinion this patch is usually skipped not because it is not needed, but because of its size. It is not hard to maintain it until committers have time for it, or until I get an actual response that nobody is going to commit it. Attached is the latest set of patches. -- Best regards, Ildus Kurbangaliev
Attachment
- 0001-Make-syntax-changes-for-custom-compression-metho-v22.patch
- 0002-Add-compression-catalog-tables-and-the-basic-inf-v22.patch
- 0003-Add-rewrite-rules-and-tupdesc-flags-v22.patch
- 0004-Add-pglz-compression-method-v22.patch
- 0005-Add-zlib-compression-method-v22.patch
- 0006-Add-psql-pg_dump-and-pg_upgrade-support-v22.patch
- 0007-Add-tests-for-compression-methods-v22.patch
- 0008-Add-documentation-for-custom-compression-methods-v22.patch
On Fri, Mar 15, 2019 at 6:07 PM David Steele <david@pgmasters.net> wrote:
On 3/7/19 11:50 AM, Alexander Korotkov wrote:
> On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
>
> > there are another set of patches.
> > Only rebased to current master.
> >
> > Also I will change status on commitfest to 'Needs review'.
>
> This patch has seen periodic rebases but no code review that I can see
> since last January 2018.
>
> As Andres noted in [1], I think that we need to decide if this is a
> feature that we want rather than just continuing to push it from CF
> to CF.
>
>
> Yes. I took a look at code of this patch. I think it's in pretty good
> shape. But high level review/discussion is required.
OK, but I think this patch can only be pushed one more time, maximum,
before it should be rejected.
As a note, we at Adjust believe this would be very helpful for some of our use cases, as well as for some more general ones. As a feature, I think custom compression methods are a good thing, but we are not the only ones with an interest here, and we would be interested in pushing this forward if possible, or in finding ways to contribute to better approaches in this particular field.
Regards,
--
-David
david@pgmasters.net
Best Regards,
Chris Travers
Head of Database
Saarbrücker Straße 37a, 10405 Berlin
Hi Ildus,
On Fri, Mar 15, 2019 at 12:52 PM Ildus Kurbangaliev <i.kurbangaliev@gmail.com> wrote:
Hi,
in my opinion this patch is usually skipped not because it is not
needed, but because of its size. It is not hard to maintain it until
commiters will have time for it or I will get actual response that
nobody is going to commit it.
Attached latest set of patches.
As I understand it, the only thing that has changed since my last review is the
additional zlib compression method.
The code looks good. I have one suggestion, though. Currently you only predefine
two compression levels: `best_speed` and `best_compression`. But zlib itself
allows a finer gradation between those two: the level can be set to any value
from 0 to 9 (where zero means no compression at all, which I guess isn't
useful in our case). So I think we should allow the user to choose either the
textual representation (as you already did) or a numeric one. Another thing is
that one can specify, for instance, the `best_speed` level but not `BEST_SPEED`,
which can be a bit frustrating for the user.
Regards,
Ildar Musin
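(To make the suggestion above concrete, here is a small sketch, not taken from the patch, of how a zlib "level" option could accept both the existing textual names and a numeric 0-9 value, case-insensitively. The function name and the extra "default" spelling are assumptions.)

#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>        /* strcasecmp */
#include <zlib.h>

/* Parse a user-supplied zlib level; returns true on success. */
static bool
parse_zlib_level(const char *value, int *level)
{
    /* numeric form: a single digit 0..9 */
    if (strlen(value) == 1 && isdigit((unsigned char) value[0]))
    {
        *level = value[0] - '0';
        return true;
    }

    /* textual form, compared case-insensitively */
    if (strcasecmp(value, "best_speed") == 0)
        *level = Z_BEST_SPEED;              /* zlib's constant, 1 */
    else if (strcasecmp(value, "best_compression") == 0)
        *level = Z_BEST_COMPRESSION;        /* zlib's constant, 9 */
    else if (strcasecmp(value, "default") == 0)
        *level = Z_DEFAULT_COMPRESSION;     /* zlib's constant, -1 */
    else
        return false;
    return true;
}

Accepting a single digit keeps the parsing trivial while still exposing zlib's full 0-9 range, and the case-insensitive comparison removes the best_speed/BEST_SPEED distinction.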
On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote: > On Fri, 15 Mar 2019 14:07:14 +0400 > David Steele <david@pgmasters.net> wrote: > >> On 3/7/19 11:50 AM, Alexander Korotkov wrote: >>> On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net >>> <mailto:david@pgmasters.net>> wrote: >>> >>> On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote: >>> >>> > there are another set of patches. >>> > Only rebased to current master. >>> > >>> > Also I will change status on commitfest to 'Needs review'. >>> >>> This patch has seen periodic rebases but no code review that I >>> can see since last January 2018. >>> >>> As Andres noted in [1], I think that we need to decide if this >>> is a feature that we want rather than just continuing to push it >>> from CF to CF. >>> >>> >>> Yes. I took a look at code of this patch. I think it's in pretty >>> good shape. But high level review/discussion is required. >> >> OK, but I think this patch can only be pushed one more time, maximum, >> before it should be rejected. >> >> Regards, > > Hi, > in my opinion this patch is usually skipped not because it is not > needed, but because of its size. It is not hard to maintain it until > commiters will have time for it or I will get actual response that > nobody is going to commit it. > That may be one of the reasons, yes. But there are other reasons, which I think may be playing a bigger role. There's one practical issue with how the patch is structured - the docs and tests are in separate patches towards the end of the patch series, which makes it impossible to commit the preceding parts. This needs to change. Otherwise the patch size kills the patch as a whole. But there's a more important cost/benefit issue, I think. When I look at patches as a committer, I naturally have to weigh how much time I spend on getting it in (and then dealing with fallout from bugs etc) vs. what I get in return (measured in benefits for community, users). This patch is pretty large and complex, so the "costs" are quite high, while the benefit from the patch itself is the ability to pick between pg_lz and zlib. Which is not great, and so people tend to pick other patches. Now, I understand there are a lot of potential benefits further down the line, like column-level compression (which I think is the main goal here). But that's not included in the patch, so the gains are somewhat far in the future. But hey, I think there are committers working for postgrespro, who might have the motivation to get this over the line. Of course, assuming that there are no serious objections to having this functionality or how it's implemented ... But I don't think that was the case. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote:
> On Fri, 15 Mar 2019 14:07:14 +0400
> David Steele <david@pgmasters.net> wrote:
>
>> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
>>> On Thu, Mar 7, 2019 at 10:43 AM David Steele <david@pgmasters.net
>>> <mailto:david@pgmasters.net>> wrote:
>>>
>>> On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
>>>
>>> > there are another set of patches.
>>> > Only rebased to current master.
>>> >
>>> > Also I will change status on commitfest to 'Needs review'.
>>>
>>> This patch has seen periodic rebases but no code review that I
>>> can see since last January 2018.
>>>
>>> As Andres noted in [1], I think that we need to decide if this
>>> is a feature that we want rather than just continuing to push it
>>> from CF to CF.
>>>
>>>
>>> Yes. I took a look at code of this patch. I think it's in pretty
>>> good shape. But high level review/discussion is required.
>>
>> OK, but I think this patch can only be pushed one more time, maximum,
>> before it should be rejected.
>>
>> Regards,
>
> Hi,
> in my opinion this patch is usually skipped not because it is not
> needed, but because of its size. It is not hard to maintain it until
> commiters will have time for it or I will get actual response that
> nobody is going to commit it.
>
That may be one of the reasons, yes. But there are other reasons, which
I think may be playing a bigger role.
There's one practical issue with how the patch is structured - the docs
and tests are in separate patches towards the end of the patch series,
which makes it impossible to commit the preceding parts. This needs to
change. Otherwise the patch size kills the patch as a whole.
But there's a more important cost/benefit issue, I think. When I look at
patches as a committer, I naturally have to weight how much time I spend
on getting it in (and then dealing with fallout from bugs etc) vs. what
I get in return (measured in benefits for community, users). This patch
is pretty large and complex, so the "costs" are quite high, while the
benefits from the patch itself is the ability to pick between pg_lz and
zlib. Which is not great, and so people tend to pick other patches.
Now, I understand there's a lot of potential benefits further down the
line, like column-level compression (which I think is the main goal
here). But that's not included in the patch, so the gains are somewhat
far in the future.
Not discussing whether any particular committer should pick this up but I want to discuss an important use case we have at Adjust for this sort of patch.
The PostgreSQL compression strategy is something we find inadequate for at least one of our large deployments (a large debug log spanning 10PB+). Our current solution is to set storage so that it does not compress and then run on ZFS to get compression speedups on spinning disks.
But running PostgreSQL on ZFS has some annoying costs because we have copy-on-write on copy-on-write, and when you add file fragmentation... I would really like to be able to get away from having to do ZFS as an underlying filesystem. While we have good write throughput, read throughput is not as good as I would like.
An approach that would give us better row-level compression would allow us to ditch the COW filesystem under PostgreSQL approach.
So I think the benefits are actually quite high particularly for those dealing with volume/variety problems where things like JSONB might be a go-to solution. Similarly I could totally see having systems which handle large amounts of specialized text having extensions for dealing with these.
But hey, I think there are committers working for postgrespro, who might
have the motivation to get this over the line. Of course, assuming that
there are no serious objections to having this functionality or how it's
implemented ... But I don't think that was the case.
While I am not currently able to speak for questions of how it is implemented, I can say with very little doubt that we would almost certainly use this functionality if it were there and I could see plenty of other cases where this would be a very appropriate direction for some other projects as well.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Best Regards,
Chris Travers
Head of Database
Saarbrücker Straße 37a, 10405 Berlin
On 3/19/19 10:59 AM, Chris Travers wrote: > > > On Mon, Mar 18, 2019 at 11:09 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > > > On 3/15/19 12:52 PM, Ildus Kurbangaliev wrote: > > On Fri, 15 Mar 2019 14:07:14 +0400 > > David Steele <david@pgmasters.net <mailto:david@pgmasters.net>> wrote: > > > >> On 3/7/19 11:50 AM, Alexander Korotkov wrote: > >>> On Thu, Mar 7, 2019 at 10:43 AM David Steele > <david@pgmasters.net <mailto:david@pgmasters.net> > >>> <mailto:david@pgmasters.net <mailto:david@pgmasters.net>>> wrote: > >>> > >>> On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote: > >>> > >>> > there are another set of patches. > >>> > Only rebased to current master. > >>> > > >>> > Also I will change status on commitfest to 'Needs review'. > >>> > >>> This patch has seen periodic rebases but no code review that I > >>> can see since last January 2018. > >>> > >>> As Andres noted in [1], I think that we need to decide if this > >>> is a feature that we want rather than just continuing to push it > >>> from CF to CF. > >>> > >>> > >>> Yes. I took a look at code of this patch. I think it's in pretty > >>> good shape. But high level review/discussion is required. > >> > >> OK, but I think this patch can only be pushed one more time, > maximum, > >> before it should be rejected. > >> > >> Regards, > > > > Hi, > > in my opinion this patch is usually skipped not because it is not > > needed, but because of its size. It is not hard to maintain it until > > commiters will have time for it or I will get actual response that > > nobody is going to commit it. > > > > That may be one of the reasons, yes. But there are other reasons, which > I think may be playing a bigger role. > > There's one practical issue with how the patch is structured - the docs > and tests are in separate patches towards the end of the patch series, > which makes it impossible to commit the preceding parts. This needs to > change. Otherwise the patch size kills the patch as a whole. > > But there's a more important cost/benefit issue, I think. When I look at > patches as a committer, I naturally have to weight how much time I spend > on getting it in (and then dealing with fallout from bugs etc) vs. what > I get in return (measured in benefits for community, users). This patch > is pretty large and complex, so the "costs" are quite high, while the > benefits from the patch itself is the ability to pick between pg_lz and > zlib. Which is not great, and so people tend to pick other patches. > > Now, I understand there's a lot of potential benefits further down the > line, like column-level compression (which I think is the main goal > here). But that's not included in the patch, so the gains are somewhat > far in the future. > > > Not discussing whether any particular committer should pick this up but > I want to discuss an important use case we have at Adjust for this sort > of patch. > > The PostgreSQL compression strategy is something we find inadequate for > at least one of our large deployments (a large debug log spanning > 10PB+). Our current solution is to set storage so that it does not > compress and then run on ZFS to get compression speedups on spinning disks. > > But running PostgreSQL on ZFS has some annoying costs because we have > copy-on-write on copy-on-write, and when you add file fragmentation... I > would really like to be able to get away from having to do ZFS as an > underlying filesystem. 
While we have good write throughput, read > throughput is not as good as I would like. > > An approach that would give us better row-level compression would allow > us to ditch the COW filesystem under PostgreSQL approach. > > So I think the benefits are actually quite high particularly for those > dealing with volume/variety problems where things like JSONB might be a > go-to solution. Similarly I could totally see having systems which > handle large amounts of specialized text having extensions for dealing > with these. > Sure, I don't disagree - the proposed compression approach may be a big win for some deployments further down the road, no doubt about it. But as I said, it's unclear when we get there (or if the interesting stuff will be in some sort of extension, which I don't oppose in principle). > > But hey, I think there are committers working for postgrespro, who might > have the motivation to get this over the line. Of course, assuming that > there are no serious objections to having this functionality or how it's > implemented ... But I don't think that was the case. > > > While I am not currently able to speak for questions of how it is > implemented, I can say with very little doubt that we would almost > certainly use this functionality if it were there and I could see plenty > of other cases where this would be a very appropriate direction for some > other projects as well. > Well, I guess the best thing you can do to move this patch forward is to actually try that on your real-world use case, and report your results and possibly do a review of the patch. IIRC there was an extension [1] leveraging this custom compression interface for better jsonb compression, so perhaps that would work for you (not sure if it's up to date with the current patch, though). [1] https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 19, 2019 at 12:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 3/19/19 10:59 AM, Chris Travers wrote:
>
>
> Not discussing whether any particular committer should pick this up but
> I want to discuss an important use case we have at Adjust for this sort
> of patch.
>
> The PostgreSQL compression strategy is something we find inadequate for
> at least one of our large deployments (a large debug log spanning
> 10PB+). Our current solution is to set storage so that it does not
> compress and then run on ZFS to get compression speedups on spinning disks.
>
> But running PostgreSQL on ZFS has some annoying costs because we have
> copy-on-write on copy-on-write, and when you add file fragmentation... I
> would really like to be able to get away from having to do ZFS as an
> underlying filesystem. While we have good write throughput, read
> throughput is not as good as I would like.
>
> An approach that would give us better row-level compression would allow
> us to ditch the COW filesystem under PostgreSQL approach.
>
> So I think the benefits are actually quite high particularly for those
> dealing with volume/variety problems where things like JSONB might be a
> go-to solution. Similarly I could totally see having systems which
> handle large amounts of specialized text having extensions for dealing
> with these.
>
Sure, I don't disagree - the proposed compression approach may be a big
win for some deployments further down the road, no doubt about it. But
as I said, it's unclear when we get there (or if the interesting stuff
will be in some sort of extension, which I don't oppose in principle).
I would assume that if extensions are particularly stable and useful they could be moved into core.
But I would also assume that at first, this area would be sufficiently experimental that folks (like us) would write our own extensions for it.
>
> But hey, I think there are committers working for postgrespro, who might
> have the motivation to get this over the line. Of course, assuming that
> there are no serious objections to having this functionality or how it's
> implemented ... But I don't think that was the case.
>
>
> While I am not currently able to speak for questions of how it is
> implemented, I can say with very little doubt that we would almost
> certainly use this functionality if it were there and I could see plenty
> of other cases where this would be a very appropriate direction for some
> other projects as well.
>
Well, I guess the best thing you can do to move this patch forward is to
actually try that on your real-world use case, and report your results
and possibly do a review of the patch.
Yeah, I expect to do this within the next month or two.
IIRC there was an extension [1] leveraging this custom compression
interface for better jsonb compression, so perhaps that would work for
you (not sure if it's up to date with the current patch, though).
[1]
https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain
Yeah I will be looking at a couple different approaches here and reporting back. I don't expect it will be a full production workload but I do expect to be able to report on benchmarks in both storage and performance.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Best Regards,
Chris Travers
Head of Database
Saarbrücker Straße 37a, 10405 Berlin
On 3/19/19 4:44 PM, Chris Travers wrote: > > > On Tue, Mar 19, 2019 at 12:19 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > > On 3/19/19 10:59 AM, Chris Travers wrote: > > > > > > Not discussing whether any particular committer should pick this > up but > > I want to discuss an important use case we have at Adjust for this > sort > > of patch. > > > > The PostgreSQL compression strategy is something we find > inadequate for > > at least one of our large deployments (a large debug log spanning > > 10PB+). Our current solution is to set storage so that it does not > > compress and then run on ZFS to get compression speedups on > spinning disks. > > > > But running PostgreSQL on ZFS has some annoying costs because we have > > copy-on-write on copy-on-write, and when you add file > fragmentation... I > > would really like to be able to get away from having to do ZFS as an > > underlying filesystem. While we have good write throughput, read > > throughput is not as good as I would like. > > > > An approach that would give us better row-level compression would > allow > > us to ditch the COW filesystem under PostgreSQL approach. > > > > So I think the benefits are actually quite high particularly for those > > dealing with volume/variety problems where things like JSONB might > be a > > go-to solution. Similarly I could totally see having systems which > > handle large amounts of specialized text having extensions for dealing > > with these. > > > > Sure, I don't disagree - the proposed compression approach may be a big > win for some deployments further down the road, no doubt about it. But > as I said, it's unclear when we get there (or if the interesting stuff > will be in some sort of extension, which I don't oppose in principle). > > > I would assume that if extensions are particularly stable and useful > they could be moved into core. > > But I would also assume that at first, this area would be sufficiently > experimental that folks (like us) would write our own extensions for it. > > > > > > > But hey, I think there are committers working for postgrespro, > who might > > have the motivation to get this over the line. Of course, > assuming that > > there are no serious objections to having this functionality > or how it's > > implemented ... But I don't think that was the case. > > > > > > While I am not currently able to speak for questions of how it is > > implemented, I can say with very little doubt that we would almost > > certainly use this functionality if it were there and I could see > plenty > > of other cases where this would be a very appropriate direction > for some > > other projects as well. > > > Well, I guess the best thing you can do to move this patch forward is to > actually try that on your real-world use case, and report your results > and possibly do a review of the patch. > > > Yeah, I expect to do this within the next month or two. > > > > IIRC there was an extension [1] leveraging this custom compression > interface for better jsonb compression, so perhaps that would work for > you (not sure if it's up to date with the current patch, though). > > [1] > https://www.postgresql.org/message-id/20171130182009.1b492eb2%40wp.localdomain > > Yeah I will be looking at a couple different approaches here and > reporting back. I don't expect it will be a full production workload but > I do expect to be able to report on benchmarks in both storage and > performance. 
> FWIW I was a bit curious how that jsonb compression would affect the data set I'm using for testing jsonpath patches, so I spent a bit of time getting it to work with master. The attached patch gets it to compile, but unfortunately it then fails like this: ERROR: jsonbd: worker has detached It seems there's some bug in how shm_mq is used, but I don't have time to investigate that further. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, Mar 16, 2019 at 12:52 AM Ildus Kurbangaliev <i.kurbangaliev@gmail.com> wrote: > in my opinion this patch is usually skipped not because it is not > needed, but because of its size. It is not hard to maintain it until > commiters will have time for it or I will get actual response that > nobody is going to commit it. Hi Ildus, To maximise the chances of more review in the new Commitfest that is about to begin, could you please send a fresh rebase? This doesn't apply anymore. Thanks, -- Thomas Munro https://enterprisedb.com
Hi, Thomas! On Mon, Jul 1, 2019 at 1:22 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Mar 16, 2019 at 12:52 AM Ildus Kurbangaliev > <i.kurbangaliev@gmail.com> wrote: > > in my opinion this patch is usually skipped not because it is not > > needed, but because of its size. It is not hard to maintain it until > > commiters will have time for it or I will get actual response that > > nobody is going to commit it. > > To maximise the chances of more review in the new Commitfest that is > about to begin, could you please send a fresh rebase? This doesn't > apply anymore. As I understand it, we currently need to make a high-level decision on whether we need this [1]. I was going to bring this topic up at the last PGCon, but I didn't manage to attend. Is it worth bothering Ildus with continuous rebasing while we don't have this high-level decision yet? Links 1. https://www.postgresql.org/message-id/20190216054526.zss2cufdxfeudr4i%40alap3.anarazel.de ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2019-Jul-01, Alexander Korotkov wrote: > As I get we're currently need to make high-level decision of whether > we need this [1]. I was going to bring this topic up at last PGCon, > but I didn't manage to attend. Does it worth bothering Ildus with > continuous rebasing assuming we don't have this high-level decision > yet? I agree that having to constantly rebase a patch that doesn't get acted upon is a bit pointless. I see a bit of a process problem here: if the patch doesn't apply, it gets punted out of commitfest and reviewers don't look at it. This means the discussion goes unseen and no decisions are made. My immediate suggestion is to rebase even if other changes are needed. Longer-term I think it'd be useful to have patches marked as needing "high-level decisions" that may lag behind current master; maybe we have them provide a git commit-ID on top of which the patch applies cleanly. I recently found git-imerge which can make rebasing of large patch series easier, by letting you deal with smaller conflicts one step at a time rather than one giant conflict; it may prove useful. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jul 1, 2019 at 5:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Jul-01, Alexander Korotkov wrote: > > > As I get we're currently need to make high-level decision of whether > > we need this [1]. I was going to bring this topic up at last PGCon, > > but I didn't manage to attend. Does it worth bothering Ildus with > > continuous rebasing assuming we don't have this high-level decision > > yet? > > I agree that having to constantly rebase a patch that doesn't get acted > upon is a bit pointless. I see a bit of a process problem here: if the > patch doesn't apply, it gets punted out of commitfest and reviewers > don't look at it. This means the discussion goes unseen and no > decisions are made. My immediate suggestion is to rebase even if other > changes are needed. OK, let's do this assuming Ildus didn't give up yet :) > Longer-term I think it'd be useful to have patches > marked as needing "high-level decisions" that may lag behind current > master; maybe we have them provide a git commit-ID on top of which the > patch applies cleanly. +1, Sounds like good approach for me. > I recently found git-imerge which can make rebasing of large patch > series easier, by letting you deal with smaller conflicts one step at a > time rather than one giant conflict; it may prove useful. Thank you for pointing, will try. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, 1 Jul 2019 at 17:28, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Mon, Jul 1, 2019 at 5:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Jul-01, Alexander Korotkov wrote:
>
> > As I get we're currently need to make high-level decision of whether
> > we need this [1]. I was going to bring this topic up at last PGCon,
> > but I didn't manage to attend. Does it worth bothering Ildus with
> > continuous rebasing assuming we don't have this high-level decision
> > yet?
>
> I agree that having to constantly rebase a patch that doesn't get acted
> upon is a bit pointless. I see a bit of a process problem here: if the
> patch doesn't apply, it gets punted out of commitfest and reviewers
> don't look at it. This means the discussion goes unseen and no
> decisions are made. My immediate suggestion is to rebase even if other
> changes are needed.
OK, let's do this assuming Ildus didn't give up yet :)
No, I still didn't give up :)
I'm going to post a rebased version in a few days. I found that there are new conflicts with
slice decompression, and I'm not sure how to resolve them yet.
Also, I was thinking there may be a point in adding the possibility to compress any data
that goes into a column, regardless of the TOAST threshold size. In our company we have
types that could benefit from compression even on the smallest blocks.
Since pluggable storage was committed, I think I should note that compression
methods can also be used by table access methods and are not supposed to work only with TOAST tables.
Basically, it's just an interface for calling the compression functions associated with a column.
Best regards,
Ildus Kurbangaliev
Attached is the latest version of the patch. It adds a slice decompression function to the compression handler. Based on: 6b8548964bccd0f2e65c687d591b7345d5146bfa Best regards, Ildus Kurbangaliev On Tue, 2 Jul 2019 at 15:05, Ildus K <i.kurbangaliev@gmail.com> wrote: > > On Mon, 1 Jul 2019 at 17:28, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: >> >> On Mon, Jul 1, 2019 at 5:51 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> > On 2019-Jul-01, Alexander Korotkov wrote: >> > >> > > As I get we're currently need to make high-level decision of whether >> > > we need this [1]. I was going to bring this topic up at last PGCon, >> > > but I didn't manage to attend. Does it worth bothering Ildus with >> > > continuous rebasing assuming we don't have this high-level decision >> > > yet? >> > >> > I agree that having to constantly rebase a patch that doesn't get acted >> > upon is a bit pointless. I see a bit of a process problem here: if the >> > patch doesn't apply, it gets punted out of commitfest and reviewers >> > don't look at it. This means the discussion goes unseen and no >> > decisions are made. My immediate suggestion is to rebase even if other >> > changes are needed. >> >> OK, let's do this assuming Ildus didn't give up yet :) > > > No, I still didn't give up :) > I'm going to post rebased version in few days. I found that are new conflicts with > a slice decompression, not sure how to figure out them for now. > > Also I was thinking maybe there is a point to add possibility to compress any data > that goes to some column despite toast threshold size. In our company we have > types that could benefit from compression even on smallest blocks. > > Since pluggable storages were committed I think I should notice that compression > methods also can be used by them and are not supposed to work only with toast tables. > Basically it's just an interface to call compression functions which are related with some column. > > Best regards, > Ildus Kurbangaliev
Attachment
The compile of this one has been broken for a long time. Is there a rebase happening? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 7, 2019 at 2:51 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Yes. I took a look at code of this patch. I think it's in pretty good shape. But high level review/discussion is required. I agree that the code of this patch is in pretty good shape, although there is a lot of rebasing needed at this point. Here is an attempt at some high level review and discussion: - As far as I can see, there is broad agreement that we shouldn't consider ourselves to be locked into 'pglz' forever. I believe numerous people have reported that there are other methods of doing compression that either compress better, or compress faster, or decompress faster, or all of the above. This isn't surprising, nor is it a knock on 'pglz'; Jan committed it in 1999, and it's not surprising that in 20 years some people have come up with better ideas. Not only that, but the quantity and quality of open source software that is available for this kind of thing and for many other kinds of things have improved dramatically in that time. - I can see three possible ways of breaking our dependence on 'pglz' for TOAST compression. Option #1 is to pick one new algorithm which we think is better than 'pglz' in all relevant ways and use it as the default for all new compressed datums. This would be dramatically simpler than what this patch does, because there would be no user interface. It would just be out with the old and in with the new. Option #2 is to create a short list of new algorithms that have different trade-offs; e.g. one that is very fast (like lz4) and one that has an extremely high compression ratio, and provide an interface for users to choose between them. This would be moderately simpler than what this patch does, because we would not expose to the user anything about how a new compression method could be added, but it would still require a UI for the user to choose between the available (and hard-coded) options. It has the further advantage that every PostgreSQL cluster will offer the same options (or a subset of them, perhaps, depending on configure flags) and so you don't have to worry that, say, a pg_am row gets lost and suddenly all of your toasted data is inaccessible and uninterpretable. Option #3 is to do what this patch actually does, which is to allow for the addition of any number of compressors, including by extensions. It has the advantage that new compressors can be added without core's permission, so, for example, if it is unclear whether some excellent compressor is free of patent problems, we can elect not to ship support for it in core, while at the same time people who are willing to accept the associated legal risk can add that functionality to their own copy as an extension without having to patch core. The legal climate may even vary by jurisdiction, so what might be questionable in country A might be clearly just fine in country B. Aside from those issues, this approach allows people to experiment and innovate outside of core relatively quickly, instead of being bound by the somewhat cumbrous development process which has left this patch in limbo for the last few years. My view is that option #1 is likely to be impractical, because getting people to agree is hard, and better things are likely to come along later, and people like options. So I prefer either #2 or #3. - The next question is how a datum compressed with some non-default method should be represented on disk.
The patch handles this first of all by making the observation that the compressed size can't be >=1GB, because the uncompressed size can't be >=1GB, and we wouldn't have stored it compressed if it expanded. Therefore, the upper two bits of the compressed size should always be zero on disk, and the patch steals one of them to indicate whether "custom" compression is in use. If it is, the 4-byte varlena header is followed not only by a 4-byte size (now with the new flag bit also included) but also by a 4-byte OID, indicating the compression AM in use. I don't think this is a terrible approach, but I don't think it's amazing, either. 4 bytes is quite a bit to use for this; if I guess correctly what will be a typical cluster configuration, you probably would really only need about 2 bits. For a datum that is both stored externally and compressed, the overhead is likely negligible, because the length is probably measured in kB or MB. But for a datum that is compressed but not stored externally, it seems pretty expensive; the datum is probably short, and having an extra 4 bytes of uncompressible data kinda sucks. One possibility would be to allow only one byte here: require each compression AM that is installed to advertise a one-byte value that will denote its compressed datums. If more than one AM tries to claim the same byte value, complain. Another possibility is to abandon this approach and go with #2 from the previous paragraph. Or maybe we add 1 or 2 "privileged" built-in compressors that get dedicated bit-patterns in the upper 2 bits of the size field, with the last bit pattern being reserved for future algorithms. (e.g. 0x00 = pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for details). - I don't really like the use of the phrase "custom compression". I think the terminology needs to be rethought so that we just talk about compression methods. Perhaps in certain contexts we need to specify that we mean extensible compression methods or user-provided compression methods or something like that, but I don't think the word "custom" is very well-suited here. The main point of this shouldn't be for every cluster in the universe to use a different approach to compression, or to compress columns within a database in 47 different ways, but rather to help us get out from under 'pglz'. Eventually we probably want to change the default, but as the patch phrases things now, that default would be a custom method, which is almost a contradiction in terms. - Yet another possible approach to the on-disk format is to leave varatt_external.va_extsize and varattrib_4b.rawsize untouched and instead add new compression methods by adding new vartag_external values. There's quite a lot of bit-space available there: we have a whole byte, and we're currently only using 4 values. We could easily add a half-dozen new possibilities there for new compression methods without sweating the bit-space consumption. The main thing I don't like about this is that it only seems like a useful way to provide for out-of-line compression. Perhaps it could be generalized to allow for inline compression as well, but it seems like it would take some hacking. - One thing I really don't like about the patch is that it consumes a bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask bits are at a premium, and there's been no real progress in the decade plus that I've been hanging around here in clawing back any bit-space. 
I think we really need to avoid burning our remaining bits for anything other than a really critical need, and I don't think I understand what the need is in this case. I might be missing something, but I'd really strongly suggest looking for a way to get rid of this. It also invents the concept of a TupleDesc flag, and the flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we need that, either. - It seems like this kind of approach has a sort of built-in circularity problem. It means that every place that might need to detoast a datum needs to be able to access the pg_am catalog. I wonder if that's actually true. For instance, consider logical decoding. I guess that can do catalog lookups in general, but can it do them from the places where detoasting is happening? Moreover, can it do them with the right snapshot? Suppose we rewrite a table to change the compression method, then drop the old compression method, then try to decode a transaction that modified that table before those operations were performed. As an even more extreme example, suppose we need to open pg_am, and to do that we have to build a relcache entry for it, and suppose the relevant pg_class entry had a relacl or reloptions field that happened to be custom-compressed. Or equally suppose that any of the various other tables we use when building a relcache entry had the same kind of problem, especially those that have TOAST tables. We could just disallow the use of non-default compressors in the system catalogs, but the benefits mentioned in http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to ignore. - I think it would be awfully appealing if we could find some way of dividing this great big patch into some somewhat smaller patches. For example: Patch #1. Add syntax allowing a compression method to be specified, but the only possible choice is pglz, and the PRESERVE stuff isn't supported, and changing the value associated with an existing column isn't supported, but we can add tab-completion support and stuff. Patch #2. Add a second built-in method, like gzip or lz4. Patch #3. Add support for changing the compression method associated with a column, forcing a table rewrite. Patch #4. Add support for PRESERVE, so that you can change the compression method associated with a column without forcing a table rewrite, by including the old method in the PRESERVE list, or with a rewrite, by not including it in the PRESERVE list. Patch #5. Add support for compression methods via the AM interface. Perhaps methods added in this manner are prohibited in system catalogs. (This could also go before #4 or even before #3, but with a noticeable hit to usability.) Patch #6 (new development). Add a contrib module using the facility added in #5, perhaps with a slightly off-beat compressor like bzip2 that is more of a niche use case. I think that if the patch set were broken up this way, it would be a lot easier to review and get committed. I think you could commit each bit separately. I don't think you'd want to commit #1 unless you had a sense that #2 was pretty close to done, and similarly for #5 and #6, but that would still make things a lot easier than having one giant monolithic patch, at least IMHO. There might be more to say here, but that's what I have got for now. I hope it helps. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
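(To visualize the inline on-disk layout described in the review above, here is a rough sketch. It is not taken from the patch; "rawsize" and "va_cmid" are names that appear elsewhere in this thread, while the flag value and the remaining names are hypothetical.)

#include <stdint.h>

typedef uint32_t Oid;                       /* stand-in for PostgreSQL's Oid */

#define VARSIZE_MASK    0x3FFFFFFFu         /* raw size is always < 1GB */
#define VARFLAG_CUSTOM  0x40000000u         /* one stolen spare bit */

typedef struct CustomCompressedSketch
{
    uint32_t    va_header;      /* ordinary 4-byte varlena header */
    uint32_t    va_rawsize;     /* uncompressed size plus the flag bit */
    Oid         va_cmid;        /* compression method; present only when
                                 * VARFLAG_CUSTOM is set in va_rawsize */
    uint8_t     va_data[];      /* compressed payload */
} CustomCompressedSketch;

static inline int
is_custom_compressed(const CustomCompressedSketch *d)
{
    return (d->va_rawsize & VARFLAG_CUSTOM) != 0;
}

Under this reading, every inline custom-compressed datum pays 4 extra bytes for va_cmid, which is exactly the overhead questioned in the review.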
Hi, On 2020-06-19 13:03:02 -0400, Robert Haas wrote: > - I can see three possible ways of breaking our dependence on 'pglz' > for TOAST compression. Option #1 is to pick one new algorithm which we > think is better than 'pglz' in all relevant ways and use it as the > default for all new compressed datums. This would be dramatically > simpler than what this patch does, because there would be no user > interface. It would just be out with the old and in with the new. > Option #2 is to create a short list of new algorithms that have > different trade-offs; e.g. one that is very fast (like lz4) and one > that has an extremely high compression ratio, and provide an interface > for users to choose between them. This would be moderately simpler > than what this patch does, because we would expose to the user > anything about how a new compression method could be added, but it > would still require a UI for the user to choose between the available > (and hard-coded) options. It has the further advantage that every > PostgreSQL cluster will offer the same options (or a subset of them, > perhaps, depending on configure flags) and so you don't have to worry > that, say, a pg_am row gets lost and suddenly all of your toasted data > is inaccessible and uninterpretable. Option #3 is to do what this > patch actually does, which is to allow for the addition of any number > of compressors, including by extensions. It has the advantage that new > compressors can be added with core's permission, so, for example, if > it is unclear whether some excellent compressor is free of patent > problems, we can elect not to ship support for it in core, while at > the same time people who are willing to accept the associated legal > risk can add that functionality to their own copy as an extension > without having to patch core. The legal climate may even vary by > jurisdiction, so what might be questionable in country A might be > clearly just fine in country B. Aside from those issues, this approach > allows people to experiment and innovate outside of core relatively > quickly, instead of being bound by the somewhat cumbrous development > process which has left this patch in limbo for the last few years. My > view is that option #1 is likely to be impractical, because getting > people to agree is hard, and better things are likely to come along > later, and people like options. So I prefer either #2 or #3. I personally favor going for #2, at least initially. Then we can discuss the runtime-extensibility of #3 separately. > - The next question is how a datum compressed with some non-default > method should be represented on disk. The patch handles this first of > all by making the observation that the compressed size can't be >=1GB, > because the uncompressed size can't be >=1GB, and we wouldn't have > stored it compressed if it expanded. Therefore, the upper two bits of > the compressed size should always be zero on disk, and the patch > steals one of them to indicate whether "custom" compression is in use. > If it is, the 4-byte varlena header is followed not only by a 4-byte > size (now with the new flag bit also included) but also by a 4-byte > OID, indicating the compression AM in use. I don't think this is a > terrible approach, but I don't think it's amazing, either. 4 bytes is > quite a bit to use for this; if I guess correctly what will be a > typical cluster configuration, you probably would really only need > about 2 bits. 
For a datum that is both stored externally and > compressed, the overhead is likely negligible, because the length is > probably measured in kB or MB. But for a datum that is compressed but > not stored externally, it seems pretty expensive; the datum is > probably short, and having an extra 4 bytes of uncompressible data > kinda sucks. One possibility would be to allow only one byte here: > require each compression AM that is installed to advertise a one-byte > value that will denote its compressed datums. If more than one AM > tries to claim the same byte value, complain. Another possibility is > to abandon this approach and go with #2 from the previous paragraph. > Or maybe we add 1 or 2 "privileged" built-in compressors that get > dedicated bit-patterns in the upper 2 bits of the size field, with the > last bit pattern being reserved for future algorithms. (e.g. 0x00 = > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for > details). Agreed. I favor an approach roughly like I'd implemented below https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de I.e. leave the vartag etc as-is, but utilize the fact that pglz compressed datums starts with a 4 byte length header, and that due to the 1GB limit, the first two bits currently have to be 0. That allows to indicate 2 compression methods without any space overhead, and additional compression methods are supported by using an additional byte (or some variable length encoded larger amount) if both bits are 1. > - Yet another possible approach to the on-disk format is to leave > varatt_external.va_extsize and varattrib_4b.rawsize untouched and > instead add new compression methods by adding new vartag_external > values. There's quite a lot of bit-space available there: we have a > whole byte, and we're currently only using 4 values. We could easily > add a half-dozen new possibilities there for new compression methods > without sweating the bit-space consumption. The main thing I don't > like about this is that it only seems like a useful way to provide for > out-of-line compression. Perhaps it could be generalized to allow for > inline compression as well, but it seems like it would take some > hacking. One additional note: Adding additional vartag_external values does incur some noticable cost, distributed across lots of places. > - One thing I really don't like about the patch is that it consumes a > bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask > bits are at a premium, and there's been no real progress in the decade > plus that I've been hanging around here in clawing back any bit-space. > I think we really need to avoid burning our remaining bits for > anything other than a really critical need, and I don't think I > understand what the need is in this case. I might be missing > something, but I'd really strongly suggest looking for a way to get > rid of this. It also invents the concept of a TupleDesc flag, and the > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we > need that, either. +many Small note: The current patch adds #include "postgres.h" to a few headers - it shouldn't do so. Greetings, Andres Freund
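(For illustration, a minimal sketch of the "two spare bits" idea discussed above; it is not code from any of the patches, and the assignment of lz4/zstd to the two free patterns is only an assumption echoing the examples in the discussion.)

#include <stdint.h>
#include <string.h>

#define RAWSIZE_MASK    0x3FFFFFFFu     /* low 30 bits: uncompressed size */
#define METHOD_MASK     0xC0000000u     /* high 2 bits: compression method */
#define METHOD_SHIFT    30

/* 00 = pglz (existing datums), 01/10 = two built-ins (e.g. lz4, zstd),
 * 11 = escape: the method id continues in one extra byte. */
static unsigned
decode_compression_method(const uint8_t *hdr, uint32_t *rawsize, int *hdrlen)
{
    uint32_t    word;

    memcpy(&word, hdr, sizeof(word));   /* first 4 bytes: size + spare bits */
    *rawsize = word & RAWSIZE_MASK;

    if ((word & METHOD_MASK) != METHOD_MASK)
    {
        *hdrlen = 4;                    /* no extra space used */
        return (word & METHOD_MASK) >> METHOD_SHIFT;    /* 0, 1 or 2 */
    }

    *hdrlen = 5;                        /* escape byte follows */
    return 3 + hdr[4];                  /* extended method ids: 3..258 */
}

With such a scheme, existing pglz datums keep their current representation, two more built-in methods cost no space at all, and anything beyond that costs a single extra byte.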
On Mon, Jun 22, 2020 at 4:53 PM Andres Freund <andres@anarazel.de> wrote: > > Or maybe we add 1 or 2 "privileged" built-in compressors that get > > dedicated bit-patterns in the upper 2 bits of the size field, with the > > last bit pattern being reserved for future algorithms. (e.g. 0x00 = > > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for > > details). > > Agreed. I favor an approach roughly like I'd implemented below > https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de > I.e. leave the vartag etc as-is, but utilize the fact that pglz > compressed datums starts with a 4 byte length header, and that due to > the 1GB limit, the first two bits currently have to be 0. That allows to > indicate 2 compression methods without any space overhead, and > additional compression methods are supported by using an additional byte > (or some variable length encoded larger amount) if both bits are 1. I think there's essentially no difference between these two ideas, unless the two bits we're talking about stealing are not the same in the two cases. Am I missing something? > One additional note: Adding additional vartag_external values does incur > some noticable cost, distributed across lots of places. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2020-06-23 14:27:47 -0400, Robert Haas wrote: > On Mon, Jun 22, 2020 at 4:53 PM Andres Freund <andres@anarazel.de> wrote: > > > Or maybe we add 1 or 2 "privileged" built-in compressors that get > > > dedicated bit-patterns in the upper 2 bits of the size field, with the > > > last bit pattern being reserved for future algorithms. (e.g. 0x00 = > > > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for > > > details). > > > > Agreed. I favor an approach roughly like I'd implemented below > > https://postgr.es/m/20130605150144.GD28067%40alap2.anarazel.de > > I.e. leave the vartag etc as-is, but utilize the fact that pglz > > compressed datums starts with a 4 byte length header, and that due to > > the 1GB limit, the first two bits currently have to be 0. That allows to > > indicate 2 compression methods without any space overhead, and > > additional compression methods are supported by using an additional byte > > (or some variable length encoded larger amount) if both bits are 1. https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a thread with more information / patches further along. > I think there's essentially no difference between these two ideas, > unless the two bits we're talking about stealing are not the same in > the two cases. Am I missing something? I confused this patch with the approach in https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com sorry for that. It obviously still differs by having lower space overhead (by virtue of not having a 4-byte 'va_cmid': no additional space for two methods, and then 1 byte of overhead for 256 more), but that's not that fundamental a difference. I do think it's nicer to hide the details of the compression inside toast-specific code as the version in the "further along" thread above did. The varlena stuff feels so archaic, it's hard to keep it all in my head... I think I've pondered that elsewhere before (but perhaps just on IM with you?), but I do think we'll need a better toast pointer format at some point. It's pretty fundamentally based on having the 1GB limit, which I don't think we can justify for that much longer. Using something like https://postgr.es/m/20191210015054.5otdfuftxrqb5gum%40alap3.anarazel.de I'd probably make it something roughly like:
1) signed varint indicating "in-place" length
1a) if positive, it's "plain" "in-place" data
1b) if negative, a data type indicator follows. abs(length) includes the size of the metadata.
2) optional: unsigned varint metadata type indicator
3) data
Because 1) is the size of the data, toast datums can be skipped with a relatively low amount of instructions during tuple deforming, instead of needing a fair number of branches, as is the case right now. So a small in-place uncompressed varlena2 would have an overhead of 1 byte up to 63 bytes, and 2 bytes otherwise (with 8 kb pages at least). An in-place compressed datum could have an overhead as low as 3 bytes (1 byte length, 1 byte indicator for type of compression, 1 byte raw size), although I suspect it's rarely going to be useful at such small sizes. Anyway. I think it's probably reasonable to utilize those two bits before going to a new toast format. But if somebody were more interested in working on toastv2 I'd not push back either. Regards, Andres
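As a rough illustration of the 1)-3) header layout sketched above - purely an assumption about how such a "varlena2" header could be encoded, not the actual proposal - a zigzag-style signed varint keeps the sign test cheap and keeps short lengths to a single byte:

    #include <stdint.h>
    #include <stddef.h>

    /*
     * Zigzag-encode a signed length so small magnitudes stay small and the
     * "metadata follows" case (a negative length) is a one-bit test on the
     * first encoded byte.
     */
    static size_t
    encode_signed_varint(int64_t value, uint8_t *out)
    {
        uint64_t    zz = ((uint64_t) value << 1) ^ (uint64_t) (value >> 63);
        size_t      n = 0;

        do
        {
            uint8_t     byte = zz & 0x7F;

            zz >>= 7;
            if (zz != 0)
                byte |= 0x80;       /* continuation bit */
            out[n++] = byte;
        } while (zz != 0);

        return n;
    }

    /*
     * Hypothetical datum layout built on that encoding:
     *   1) signed varint: in-place length (negative => metadata follows)
     *   2) optional unsigned varint: metadata type, e.g. which compressor
     *   3) payload bytes
     */

Under such an encoding a short, plain datum needs only one or two header bytes, and tuple deforming can skip over the datum after decoding just the first field, which is the property being highlighted above.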
On Tue, Jun 23, 2020 at 4:00 PM Andres Freund <andres@anarazel.de> wrote: > https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a > thread with more information / patches further along. > > I confused this patch with the approach in > https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com > sorry for that. It obviously still differs by not having lower space > overhead (by virtue of not having a 4 byte 'va_cmid', but no additional > space for two methods, and then 1 byte overhead for 256 more), but > that's not that fundamental a difference. Wait a minute. Are we saying there are three (3) dueling patches for adding an alternate TOAST algorithm? It seems like there is: This "custom compression methods" thread - vintage 2017 - Original code by Nikita Glukhov, later work by Ildus Kurbangaliev The "pluggable compression support" thread - vintage 2013 - Andres Freund The "plgz performance" thread - vintage 2019 - Petr Jelinek Anyone want to point to a FOURTH implementation of this feature? I guess the next thing to do is figure out which one is the best basis for further work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 24, 2020 at 5:30 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Jun 23, 2020 at 4:00 PM Andres Freund <andres@anarazel.de> wrote: > > https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a > > thread with more information / patches further along. > > > > I confused this patch with the approach in > > https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com > > sorry for that. It obviously still differs by not having lower space > > overhead (by virtue of not having a 4 byte 'va_cmid', but no additional > > space for two methods, and then 1 byte overhead for 256 more), but > > that's not that fundamental a difference. > > Wait a minute. Are we saying there are three (3) dueling patches for > adding an alternate TOAST algorithm? It seems like there is: > > This "custom compression methods" thread - vintage 2017 - Original > code by Nikita Glukhov, later work by Ildus Kurbangaliev > The "pluggable compression support" thread - vintage 2013 - Andres Freund > The "plgz performance" thread - vintage 2019 - Petr Jelinek > > Anyone want to point to a FOURTH implementation of this feature? > > I guess the next thing to do is figure out which one is the best basis > for further work. I have gone through these 3 threads and here is a summary of what I understand from them. Feel free to correct me if I have missed something.
#1. Custom compression methods: Provides a mechanism to create/drop compression methods using external libraries, and also provides a way to set the compression method for columns/types. There are a few complexities with this approach, listed below: a. We need to maintain the dependencies between the column and the compression method. The bigger issue is that even if the compression method is changed, we need to keep the dependencies on the older compression methods, as we might have some older tuples that were compressed with those older methods. b. Inside the compressed attribute, we need to maintain the compression method so that we know how to decompress it. For this, we use 2 bits from the raw_size of the compressed varlena header.
#2. pglz performance: Along with pglz, this patch provides an additional compression method using lz4. The new compression method can be enabled/disabled at configure time or using SIGHUP. We use 1 bit from the raw_size of the compressed varlena header to identify the compression method (pglz or lz4).
#3. pluggable compression: This proposal is to replace the existing pglz algorithm with snappy or lz4, whichever is better. As per the performance data[1], it appeared that lz4 is the winner in most of the cases. - This also provides an additional patch to plug in any compression method. - This will also use 2 bits from the raw_size of the compressed attribute for identifying the compression method. - Provides an option to select the compression method using a GUC, but the comments in the patch suggest removing the GUC, so it seems that the GUC was used only for the POC. - Honestly, I did not clearly understand from this patch set whether it proposes to replace the existing compression method with the best method (and the plugin is just provided for performance testing) or whether it actually proposes an option to have pluggable compression methods.
IMHO, we can provide a solution based on #1 and #2, i.e. we can provide a few of the best compression methods in core, and on top of that, we can also provide a mechanism to create/drop external compression methods.
[1] https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 29, 2020 at 12:31 PM Andres Freund <andres@anarazel.de> wrote: > > This "custom compression methods" thread - vintage 2017 - Original > > code by Nikita Glukhov, later work by Ildus Kurbangaliev > > The "pluggable compression support" thread - vintage 2013 - Andres Freund > > The "plgz performance" thread - vintage 2019 - Petr Jelinek > > > > Anyone want to point to a FOURTH implementation of this feature? > > To be clear, I don't think the 2003 patch should be considered as being > "in the running". I guess you mean 2013, not 2003? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 19, 2020 at 10:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 7, 2019 at 2:51 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Yes. I took a look at code of this patch. I think it's in pretty good shape. But high level review/discussion is required. > > I agree that the code of this patch is in pretty good shape, although > there is a lot of rebasing needed at this point. Here is an attempt at > some high level review and discussion: > > - As far as I can see, there is broad agreement that we shouldn't > consider ourselves to be locked into 'pglz' forever. I believe > numerous people have reported that there are other methods of doing > compression that either compress better, or compress faster, or > decompress faster, or all of the above. This isn't surprising and nor > is it a knock on 'pglz'; Jan committed it in 1999, and it's not > surprising that in 20 years some people have come up with better > ideas. Not only that, but the quantity and quality of open source > software that is available for this kind of thing and for many other > kinds of things have improved dramatically in that time. > > - I can see three possible ways of breaking our dependence on 'pglz' > for TOAST compression. Option #1 is to pick one new algorithm which we > think is better than 'pglz' in all relevant ways and use it as the > default for all new compressed datums. This would be dramatically > simpler than what this patch does, because there would be no user > interface. It would just be out with the old and in with the new. > Option #2 is to create a short list of new algorithms that have > different trade-offs; e.g. one that is very fast (like lz4) and one > that has an extremely high compression ratio, and provide an interface > for users to choose between them. This would be moderately simpler > than what this patch does, because we would not expose to the user > anything about how a new compression method could be added, but it > would still require a UI for the user to choose between the available > (and hard-coded) options. It has the further advantage that every > PostgreSQL cluster will offer the same options (or a subset of them, > perhaps, depending on configure flags) and so you don't have to worry > that, say, a pg_am row gets lost and suddenly all of your toasted data > is inaccessible and uninterpretable. Option #3 is to do what this > patch actually does, which is to allow for the addition of any number > of compressors, including by extensions. It has the advantage that new > compressors can be added without core's permission, so, for example, if > it is unclear whether some excellent compressor is free of patent > problems, we can elect not to ship support for it in core, while at > the same time people who are willing to accept the associated legal > risk can add that functionality to their own copy as an extension > without having to patch core. The legal climate may even vary by > jurisdiction, so what might be questionable in country A might be > clearly just fine in country B. Aside from those issues, this approach > allows people to experiment and innovate outside of core relatively > quickly, instead of being bound by the somewhat cumbrous development > process which has left this patch in limbo for the last few years. My > view is that option #1 is likely to be impractical, because getting > people to agree is hard, and better things are likely to come along > later, and people like options. So I prefer either #2 or #3. 
> > - The next question is how a datum compressed with some non-default > method should be represented on disk. The patch handles this first of > all by making the observation that the compressed size can't be >=1GB, > because the uncompressed size can't be >=1GB, and we wouldn't have > stored it compressed if it expanded. Therefore, the upper two bits of > the compressed size should always be zero on disk, and the patch > steals one of them to indicate whether "custom" compression is in use. > If it is, the 4-byte varlena header is followed not only by a 4-byte > size (now with the new flag bit also included) but also by a 4-byte > OID, indicating the compression AM in use. I don't think this is a > terrible approach, but I don't think it's amazing, either. 4 bytes is > quite a bit to use for this; if I guess correctly what will be a > typical cluster configuration, you probably would really only need > about 2 bits. For a datum that is both stored externally and > compressed, the overhead is likely negligible, because the length is > probably measured in kB or MB. But for a datum that is compressed but > not stored externally, it seems pretty expensive; the datum is > probably short, and having an extra 4 bytes of uncompressible data > kinda sucks. One possibility would be to allow only one byte here: > require each compression AM that is installed to advertise a one-byte > value that will denote its compressed datums. If more than one AM > tries to claim the same byte value, complain. Another possibility is > to abandon this approach and go with #2 from the previous paragraph. > Or maybe we add 1 or 2 "privileged" built-in compressors that get > dedicated bit-patterns in the upper 2 bits of the size field, with the > last bit pattern being reserved for future algorithms. (e.g. 0x00 = > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for > details). > > - I don't really like the use of the phrase "custom compression". I > think the terminology needs to be rethought so that we just talk about > compression methods. Perhaps in certain contexts we need to specify > that we mean extensible compression methods or user-provided > compression methods or something like that, but I don't think the word > "custom" is very well-suited here. The main point of this shouldn't be > for every cluster in the universe to use a different approach to > compression, or to compress columns within a database in 47 different > ways, but rather to help us get out from under 'pglz'. Eventually we > probably want to change the default, but as the patch phrases things > now, that default would be a custom method, which is almost a > contradiction in terms. > > - Yet another possible approach to the on-disk format is to leave > varatt_external.va_extsize and varattrib_4b.rawsize untouched and > instead add new compression methods by adding new vartag_external > values. There's quite a lot of bit-space available there: we have a > whole byte, and we're currently only using 4 values. We could easily > add a half-dozen new possibilities there for new compression methods > without sweating the bit-space consumption. The main thing I don't > like about this is that it only seems like a useful way to provide for > out-of-line compression. Perhaps it could be generalized to allow for > inline compression as well, but it seems like it would take some > hacking. > > - One thing I really don't like about the patch is that it consumes a > bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. 
infomask > bits are at a premium, and there's been no real progress in the decade > plus that I've been hanging around here in clawing back any bit-space. > I think we really need to avoid burning our remaining bits for > anything other than a really critical need, and I don't think I > understand what the need is in this case. I might be missing > something, but I'd really strongly suggest looking for a way to get > rid of this. It also invents the concept of a TupleDesc flag, and the > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we > need that, either. > > - It seems like this kind of approach has a sort of built-in > circularity problem. It means that every place that might need to > detoast a datum needs to be able to access the pg_am catalog. I wonder > if that's actually true. For instance, consider logical decoding. I > guess that can do catalog lookups in general, but can it do them from > the places where detoasting is happening? Moreover, can it do them > with the right snapshot? Suppose we rewrite a table to change the > compression method, then drop the old compression method, then try to > decode a transaction that modified that table before those operations > were performed. As an even more extreme example, suppose we need to > open pg_am, and to do that we have to build a relcache entry for it, > and suppose the relevant pg_class entry had a relacl or reloptions > field that happened to be custom-compressed. Or equally suppose that > any of the various other tables we use when building a relcache entry > had the same kind of problem, especially those that have TOAST tables. > We could just disallow the use of non-default compressors in the > system catalogs, but the benefits mentioned in > http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to > ignore. > > - I think it would be awfully appealing if we could find some way of > dividing this great big patch into some somewhat smaller patches. For > example: > > Patch #1. Add syntax allowing a compression method to be specified, > but the only possible choice is pglz, and the PRESERVE stuff isn't > supported, and changing the value associated with an existing column > isn't supported, but we can add tab-completion support and stuff. > > Patch #2. Add a second built-in method, like gzip or lz4. > > Patch #3. Add support for changing the compression method associated > with a column, forcing a table rewrite. > > Patch #4. Add support for PRESERVE, so that you can change the > compression method associated with a column without forcing a table > rewrite, by including the old method in the PRESERVE list, or with a > rewrite, by not including it in the PRESERVE list. > > Patch #5. Add support for compression methods via the AM interface. > Perhaps methods added in this manner are prohibited in system > catalogs. (This could also go before #4 or even before #3, but with a > noticeable hit to usability.) > > Patch #6 (new development). Add a contrib module using the facility > added in #5, perhaps with a slightly off-beat compressor like bzip2 > that is more of a niche use case. > > I think that if the patch set were broken up this way, it would be a > lot easier to review and get committed. I think you could commit each > bit separately. I don't think you'd want to commit #1 unless you had a > sense that #2 was pretty close to done, and similarly for #5 and #6, > but that would still make things a lot easier than having one giant > monolithic patch, at least IMHO. 
> > There might be more to say here, but that's what I have got for now. I > hope it helps. I have rebased the patch on the latest head; it is currently broken into 3 parts.
v1-0001: As suggested by Robert, it provides the syntax support for setting the compression method for a column while creating a table and adding columns. However, we don't support changing the compression method for an existing column. As part of this patch, there is only one built-in compression method that can be set (pglz). In this, we have one built-in AM (pglz) and the compressed attributes will directly store the oid of the AM. In this patch, I have removed pg_attr_compression as we don't support changing the compression for an existing column, so we don't need to preserve the old compressions.
v1-0002: Add another built-in compression method (zlib)
v1-0003: Remaining patch set (nothing is changed except rebasing on the current head and stabilizing check-world; 0001 and 0002 are pulled out of this)
Next, I will be working on separating out the remaining patches as per the suggestion by Robert. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
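Going by the DDL examples that appear later in this thread, the user-visible surface of v1-0001 and v1-0002 looks roughly like the following; the WITH (level ...) option spelling only shows up for later versions of the patch, so treat the details as provisional:

    -- only built-in methods exist at this stage
    CREATE TABLE zlibtab (t text COMPRESSION zlib WITH (level '4'));
    CREATE TABLE lztab (t text COMPRESSION pglz);
    ALTER TABLE lztab ADD COLUMN u text COMPRESSION zlib;

Changing the method of an existing column (ALTER TABLE ... ALTER COLUMN ... SET COMPRESSION ...) is deliberately left to the later patches in the series.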
On Thu, Aug 13, 2020 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > There was some question which Robert has asked in this mail, please find my answer inline. Also, I have a few questions regarding further splitting up this patch. > On Fri, Jun 19, 2020 at 10:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > - One thing I really don't like about the patch is that it consumes a > > bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask > > bits are at a premium, and there's been no real progress in the decade > > plus that I've been hanging around here in clawing back any bit-space. > > I think we really need to avoid burning our remaining bits for > > anything other than a really critical need, and I don't think I > > understand what the need is in this case. IIUC, the main reason for using this flag is to decide whether we need any detoasting for this tuple. For example, if we are rewriting the table because the compression method has changed, then if the HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and the tuple length (tup->t_len) is not > TOAST_TUPLE_THRESHOLD, we don't need to call the heap_toast_insert_or_update function for this tuple. Whereas if this flag is set then we need to, because we might have to uncompress and compress back using a different compression method. The same is the case with INSERT into SELECT * FROM. > > I might be missing > > something, but I'd really strongly suggest looking for a way to get > > rid of this. It also invents the concept of a TupleDesc flag, and the > > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we > > need that, either. This is also used in a similar way as the above, but for the target table, i.e. if the target table has a custom-compressed attribute then maybe we cannot directly insert the tuple, because it might have data compressed using the default compression methods. > > - It seems like this kind of approach has a sort of built-in > > circularity problem. It means that every place that might need to > > detoast a datum needs to be able to access the pg_am catalog. I wonder > > if that's actually true. For instance, consider logical decoding. I > > guess that can do catalog lookups in general, but can it do them from > > the places where detoasting is happening? Moreover, can it do them > > with the right snapshot? Suppose we rewrite a table to change the > > compression method, then drop the old compression method, then try to > > decode a transaction that modified that table before those operations > > were performed. As an even more extreme example, suppose we need to > > open pg_am, and to do that we have to build a relcache entry for it, > > and suppose the relevant pg_class entry had a relacl or reloptions > > field that happened to be custom-compressed. Or equally suppose that > > any of the various other tables we use when building a relcache entry > > had the same kind of problem, especially those that have TOAST tables. > > We could just disallow the use of non-default compressors in the > > system catalogs, but the benefits mentioned in > > http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to > > ignore. > > > > - I think it would be awfully appealing if we could find some way of > > dividing this great big patch into some somewhat smaller patches. For > > example: > > > > Patch #1. 
Add syntax allowing a compression method to be specified, > > but the only possible choice is pglz, and the PRESERVE stuff isn't > > supported, and changing the value associated with an existing column > > isn't supported, but we can add tab-completion support and stuff. > > > > Patch #2. Add a second built-in method, like gzip or lz4. I have already extracted these 2 patches from the main patch set. But, in these patches, I am still storing the am_oid in the toast header. I am not sure whether we can get rid of that, at least for these 2 patches. But then, wherever we try to uncompress the tuple, we need to know the tuple descriptor to get the am_oid, and I think that is not possible in all cases. Am I missing something here? > > Patch #3. Add support for changing the compression method associated > > with a column, forcing a table rewrite. > > > > Patch #4. Add support for PRESERVE, so that you can change the > > compression method associated with a column without forcing a table > > rewrite, by including the old method in the PRESERVE list, or with a > > rewrite, by not including it in the PRESERVE list. Does it make sense to have Patch #3 and Patch #4 without having Patch #5? I mean, why do we need to support rewrite or preserve unless we have the custom compression methods, right? Because the built-in compression method cannot be dropped, why do we need to preserve it? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > IIUC, the main reason for using this flag is for taking the decision > whether we need any detoasting for this tuple. For example, if we are > rewriting the table because the compression method is changed then if > HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple > length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to > call heap_toast_insert_or_update function for this tuple. Whereas if > this flag is set then we need to because we might need to uncompress > and compress back using a different compression method. The same is > the case with INSERT into SELECT * FROM. This doesn't really seem worth it to me. I don't see how we can justify burning an on-disk bit just to save a little bit of overhead during a rare maintenance operation. If there's a performance problem here we need to look for another way of mitigating it. Slowing CLUSTER and/or VACUUM FULL down by a large amount for this feature would be unacceptable, but is that really a problem? And if so, can we solve it without requiring this bit? > > > something, but I'd really strongly suggest looking for a way to get > > > rid of this. It also invents the concept of a TupleDesc flag, and the > > > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we > > > need that, either. > > This is also used in a similar way as the above but for the target > table, i.e. if the target table has the custom compressed attribute > then maybe we can not directly insert the tuple because it might have > compressed data which are compressed using the default compression > methods. I think this is just an in-memory flag, which is much less precious than an on-disk bit. However, I still wonder whether it's really the right design. I think that if we offer lz4 we may well want to make it the default eventually, or perhaps even right away. If that ends up causing this flag to get set on every tuple all the time, then it won't really optimize anything. > I have already extracted these 2 patches from the main patch set. > But, in these patches, I am still storing the am_oid in the toast > header. I am not sure can we get rid of that at least for these 2 > patches? But, then wherever we try to uncompress the tuple we need to > know the tuple descriptor to get the am_oid but I think that is not > possible in all the cases. Am I missing something here? I think we should instead use the high bits of the toast size word for patches #1-#4, as discussed upthread. > > > Patch #3. Add support for changing the compression method associated > > > with a column, forcing a table rewrite. > > > > > > Patch #4. Add support for PRESERVE, so that you can change the > > > compression method associated with a column without forcing a table > > > rewrite, by including the old method in the PRESERVE list, or with a > > > rewrite, by not including it in the PRESERVE list. > > Does this make sense to have Patch #3 and Patch #4, without having > Patch #5? I mean why do we need to support rewrite or preserve unless > we have the customer compression methods right? because the build-in > compression method can not be dropped so why do we need to preserve? 
I think that patch #3 makes sense because somebody might have a table that is currently compressed with pglz and they want to switch to lz4, and I think patch #4 also makes sense because they might want to start using lz4 for future data but not force a rewrite to get rid of all the pglz data they've already got. Those options are valuable as soon as there is more than one possible compression algorithm, even if they're all built in. Now, as I said upthread, it's also true that you could do #5 before #3 and #4. I don't think that's insane. But I prefer it in the other order, because I think having #5 without #3 and #4 wouldn't be too much fun for users. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut@gmail.com> wrote: > I have rebased the patch on the latest head and currently, broken into 3 parts. > > v1-0001: As suggested by Robert, it provides the syntax support for > setting the compression method for a column while creating a table and > adding columns. However, we don't support changing the compression > method for the existing column. As part of this patch, there is only > one built-in compression method that can be set (pglz). In this, we > have one in-build am (pglz) and the compressed attributes will directly > store the oid of the AM. In this patch, I have removed the > pg_attr_compresion as we don't support changing the compression > for the existing column so we don't need to preserve the old > compressions. > v1-0002: Add another built-in compression method (zlib) > v1:0003: Remaining patch set (nothing is changed except rebase on the > current head, stabilizing check-world and 0001 and 0002 are pulled > out of this) > > Next, I will be working on separating out the remaining patches as per > the suggestion by Robert. Thanks for this new feature. Looks promising and very useful, with so many good compression libraries already available. I see that with the patch-set, I would be able to create an extension that defines a PostgreSQL C handler function which assigns all the required hook function implementations for compressing, decompressing and validating, etc. In short, I would be able to use a completely different compression algorithm to compress toast data if I write such an extension. Correct me if I am wrong with my interpretation. Just a quick superficial set of review comments .... A minor re-base is required due to a conflict in a regression test ------------- In heap_toast_insert_or_update() and in other places, the comments for new parameter preserved_am_info are missing. ------------- +toast_compress_datum(Datum value, Oid acoid) { struct varlena *tmp = NULL; int32 valsize; - CompressionAmOptions cmoptions; + CompressionAmOptions *cmoptions = NULL; I think tmp and cmoptions need not be initialized to NULL ------------- - TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize); - SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ); /* successful compression */ + toast_set_compressed_datum_info(tmp, amoid, valsize); return PointerGetDatum(tmp); Any particular reason why is this code put in a new extern function ? Is there a plan to re-use it ? Otherwise, it's not necessary to do this. ------------ Also, not sure why "HTAB *amoptions_cache" and "MemoryContext amoptions_cache_mcxt" aren't static declarations. 
They are being used only in toast_internals.c ----------- The tab-completion doesn't show COMPRESSION : postgres=# create access method my_method TYPE INDEX TABLE postgres=# create access method my_method TYPE Also, the below syntax also would better be tab-completed so as to display all the installed compression methods, in line with how we show all the storage methods like plain,extended,etc: postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION ------------ I could see the differences in compression ratio, and the compression and decompression speed when I use lz versus zib : CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4')); create table lztab(t text); ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz; pgg:s2:pg$ time psql -c "\copy zlibtab from text.data" COPY 13050 real 0m1.344s user 0m0.031s sys 0m0.026s pgg:s2:pg$ time psql -c "\copy lztab from text.data" COPY 13050 real 0m2.088s user 0m0.008s sys 0m0.050s pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass), pg_table_size('lztab'::regclass)" pg_table_size | pg_table_size ---------------+--------------- 1261568 | 1687552 pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'" > /dev/null real 0m0.127s user 0m0.000s sys 0m0.002s pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'" > /dev/null real 0m0.050s user 0m0.002s sys 0m0.000s -- Thanks, -Amit Khandekar Huawei Technologies
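For anyone wanting to try the extension route Amit describes, the shape of such a module would presumably be something like the sketch below. The struct layout, member names, and the handler wiring are all guesses based on the descriptions in this thread (a C handler function that returns hooks for compressing, decompressing, and validating options), not the patch's actual API; in the real patch the routine struct would come from a server header rather than being declared locally:

    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/pg_list.h"

    PG_MODULE_MAGIC;

    /* assumed shape of the routine struct the handler must return */
    typedef struct CompressionAmRoutine
    {
        void            (*check_options) (List *options);   /* validate WITH (...) */
        struct varlena *(*compress) (const struct varlena *value, List *options);
        struct varlena *(*decompress) (const struct varlena *value);
    } CompressionAmRoutine;

    static void
    my_check_options(List *options)
    {
        /* e.g. reject unknown options; left empty in this sketch */
    }

    static struct varlena *
    my_compress(const struct varlena *value, List *options)
    {
        return NULL;            /* NULL meaning "not worth compressing" is an assumption */
    }

    static struct varlena *
    my_decompress(const struct varlena *value)
    {
        return (struct varlena *) value;    /* placeholder */
    }

    PG_FUNCTION_INFO_V1(my_compression_handler);

    Datum
    my_compression_handler(PG_FUNCTION_ARGS)
    {
        CompressionAmRoutine *routine = palloc0(sizeof(CompressionAmRoutine));

        routine->check_options = my_check_options;
        routine->compress = my_compress;
        routine->decompress = my_decompress;

        PG_RETURN_POINTER(routine);
    }

Going by the tab-completion comment above, such a handler would presumably be registered with something like CREATE ACCESS METHOD my_method TYPE COMPRESSION HANDLER my_compression_handler; the exact spelling should be checked against the patch.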
> On Aug 13, 2020, at 4:48 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > v1-0001: As suggested by Robert, it provides the syntax support for > setting the compression method for a column while creating a table and > adding columns. However, we don't support changing the compression > method for the existing column. As part of this patch, there is only > one built-in compression method that can be set (pglz). In this, we > have one in-build am (pglz) and the compressed attributes will directly > store the oid of the AM. In this patch, I have removed the > pg_attr_compresion as we don't support changing the compression > for the existing column so we don't need to preserve the old > compressions. I do not like the way pglz compression is handled in this patch. After upgrading PostgreSQL to the first version with this patch included, pre-existing on-disk compressed data will not include any custom compression Oid in the header, and toast_decompress_datum will notice that and decompress the data directly using pglz_decompress. If the same data were then written back out, perhaps to another table, into a column with no custom compression method defined, it will get compressed by toast_compress_datum using DefaultCompressionOid, which is defined as PGLZ_COMPRESSION_AM_OID. That isn't a proper round-trip for the data, as when it gets re-compressed, the PGLZ_COMPRESSION_AM_OID gets written into the header, which makes the data a bit longer, but also means that it is not byte-for-byte the same as it was, which is counter-intuitive. Given that any given pglz compressed datum now has two totally different formats that might occur on disk, code may have to consider both of them, which increases code complexity, and regression tests will need to be written with coverage for both of them, which increases test complexity. It's also not easy to write the extra tests, as there isn't any way (that I see) to intentionally write out the traditional shorter form from a newer database server; you'd have to do something like a pg_upgrade test where you install an older server to write the older format, upgrade, and then check that the new server can handle it. The cleanest solution to this would seem to be removal of the compression am's Oid from the header for all compression ams, so that pre-patch written data and post-patch written data look exactly the same. The other solution is to give pglz pride-of-place as the original compression mechanism, and just say that when pglz is the compression method, no Oid gets written to the header, and only when other compression methods are used does the Oid get written. This second option seems closer to the implementation that you already have, because you already handle the decompression of data where the Oid is lacking, so all you have to do is intentionally not write the Oid when compressing using pglz. Or did I misunderstand your implementation? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 2, 2020 at 4:57 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > > > > > On Aug 13, 2020, at 4:48 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > v1-0001: As suggested by Robert, it provides the syntax support for > > setting the compression method for a column while creating a table and > > adding columns. However, we don't support changing the compression > > method for the existing column. As part of this patch, there is only > > one built-in compression method that can be set (pglz). In this, we > > have one in-build am (pglz) and the compressed attributes will directly > > store the oid of the AM. In this patch, I have removed the > > pg_attr_compresion as we don't support changing the compression > > for the existing column so we don't need to preserve the old > > compressions. > > I do not like the way pglz compression is handled in this patch. After upgrading PostgreSQL to the first version with this patch included, pre-existing on-disk compressed data will not include any custom compression Oid in the header, and toast_decompress_datum will notice that and decompress the data directly using pglz_decompress. If the same data were then written back out, perhaps to another table, into a column with no custom compression method defined, it will get compressed by toast_compress_datum using DefaultCompressionOid, which is defined as PGLZ_COMPRESSION_AM_OID. That isn't a proper round-trip for the data, as when it gets re-compressed, the PGLZ_COMPRESSION_AM_OID gets written into the header, which makes the data a bit longer, but also means that it is not byte-for-byte the same as it was, which is counter-intuitive. Given that any given pglz compressed datum now has two totally different formats that might occur on disk, code may have to consider both of them, which increases code complexity, and regression tests will need to be written with coverage for both of them, which increases test complexity. It's also not easy to write the extra tests, as there isn't any way (that I see) to intentionally write out the traditional shorter form from a newer database server; you'd have to do something like a pg_upgrade test where you install an older server to write the older format, upgrade, and then check that the new server can handle it. > > The cleanest solution to this would seem to be removal of the compression am's Oid from the header for all compression ams, so that pre-patch written data and post-patch written data look exactly the same. The other solution is to give pglz pride-of-place as the original compression mechanism, and just say that when pglz is the compression method, no Oid gets written to the header, and only when other compression methods are used does the Oid get written. This second option seems closer to the implementation that you already have, because you already handle the decompression of data where the Oid is lacking, so all you have to do is intentionally not write the Oid when compressing using pglz. > > Or did I misunderstand your implementation? Thanks for looking into it. Actually, I am planning to change this patch such that we will use the upper 2 bits of the size field instead of storing the amoid for the built-in compression methods, e.g. 0x00 = pglz, 0x01 = zlib, 0x10 = other built-in, 0x11 -> custom compression method. Only when 0x11 is set will we store the amoid in the toast header. I think after a week or two I will make these changes and post my updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
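A self-contained sketch of the dispatch this implies: the top two bits select one of the built-in handlers directly, and only the 11 pattern pays for an amoid lookup. All type and function names here are made up for illustration, not taken from the patch:

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t Oid;                   /* stand-in for PostgreSQL's typedef */

    typedef struct CompressionRoutine
    {
        const char *name;
        int32_t   (*decompress) (const void *src, int32_t srclen,
                                 void *dst, int32_t rawsize);
    } CompressionRoutine;

    /* 00 = pglz, 01 = zlib, 10 = reserved for another built-in method */
    extern const CompressionRoutine builtin_compressors[3];
    /* 11 = custom: look the handler up via its pg_am Oid */
    extern const CompressionRoutine *lookup_compression_handler(Oid amoid);

    static const CompressionRoutine *
    toast_get_compression_routine(uint32_t rawsize_word, const uint8_t *after_size)
    {
        uint32_t    bits = rawsize_word >> 30;  /* the two stolen header bits */

        if (bits != 3)
            return &builtin_compressors[bits];  /* no extra header bytes needed */

        /* custom method: its Oid is stored right after the size word */
        Oid         amoid;

        memcpy(&amoid, after_size, sizeof(Oid));
        return lookup_compression_handler(amoid);
    }

This layout also addresses Mark's round-trip concern for pglz: a pglz datum written before or after the change stays byte-for-byte identical, since the 00 pattern adds nothing to the header.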
On Mon, Aug 31, 2020 at 10:45 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have rebased the patch on the latest head and currently, broken into 3 parts. > > > > v1-0001: As suggested by Robert, it provides the syntax support for > > setting the compression method for a column while creating a table and > > adding columns. However, we don't support changing the compression > > method for the existing column. As part of this patch, there is only > > one built-in compression method that can be set (pglz). In this, we > > have one in-build am (pglz) and the compressed attributes will directly > > store the oid of the AM. In this patch, I have removed the > > pg_attr_compresion as we don't support changing the compression > > for the existing column so we don't need to preserve the old > > compressions. > > v1-0002: Add another built-in compression method (zlib) > > v1:0003: Remaining patch set (nothing is changed except rebase on the > > current head, stabilizing check-world and 0001 and 0002 are pulled > > out of this) > > > > Next, I will be working on separating out the remaining patches as per > > the suggestion by Robert. > > Thanks for this new feature. Looks promising and very useful, with so > many good compression libraries already available. Thanks for looking into it. > I see that with the patch-set, I would be able to create an extension > that defines a PostgreSQL C handler function which assigns all the > required hook function implementations for compressing, decompressing > and validating, etc. In short, I would be able to use a completely > different compression algorithm to compress toast data if I write such > an extension. Correct me if I am wrong with my interpretation. > > Just a quick superficial set of review comments .... > > A minor re-base is required due to a conflict in a regression test Okay, I will do this. > ------------- > > In heap_toast_insert_or_update() and in other places, the comments for > new parameter preserved_am_info are missing. > > ------------- ok > +toast_compress_datum(Datum value, Oid acoid) > { > struct varlena *tmp = NULL; > int32 valsize; > - CompressionAmOptions cmoptions; > + CompressionAmOptions *cmoptions = NULL; > > I think tmp and cmoptions need not be initialized to NULL Right > ------------- > > - TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize); > - SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ); > /* successful compression */ > + toast_set_compressed_datum_info(tmp, amoid, valsize); > return PointerGetDatum(tmp); > > Any particular reason why is this code put in a new extern function ? > Is there a plan to re-use it ? Otherwise, it's not necessary to do > this. > > ------------ > > Also, not sure why "HTAB *amoptions_cache" and "MemoryContext > amoptions_cache_mcxt" aren't static declarations. They are being used > only in toast_internals.c > ----------- > > The tab-completion doesn't show COMPRESSION : > postgres=# create access method my_method TYPE > INDEX TABLE > postgres=# create access method my_method TYPE > > Also, the below syntax also would better be tab-completed so as to > display all the installed compression methods, in line with how we > show all the storage methods like plain,extended,etc: > postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION > > ------------ I will fix these comments in the next version of the patch. 
> I could see the differences in compression ratio, and the compression > and decompression speed when I use lz versus zib : > > CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4')); > create table lztab(t text); > ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz; > > pgg:s2:pg$ time psql -c "\copy zlibtab from text.data" > COPY 13050 > > real 0m1.344s > user 0m0.031s > sys 0m0.026s > > pgg:s2:pg$ time psql -c "\copy lztab from text.data" > COPY 13050 > > real 0m2.088s > user 0m0.008s > sys 0m0.050s > > > pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass), > pg_table_size('lztab'::regclass)" > pg_table_size | pg_table_size > ---------------+--------------- > 1261568 | 1687552 > > pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'" > > /dev/null > > real 0m0.127s > user 0m0.000s > sys 0m0.002s > > pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'" > > /dev/null > > real 0m0.050s > user 0m0.002s > sys 0m0.000s > Thanks for testing this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 25, 2020 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > IIUC, the main reason for using this flag is for taking the decision > > whether we need any detoasting for this tuple. For example, if we are > > rewriting the table because the compression method is changed then if > > HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple > > length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to > > call heap_toast_insert_or_update function for this tuple. Whereas if > > this flag is set then we need to because we might need to uncompress > > and compress back using a different compression method. The same is > > the case with INSERT into SELECT * FROM. > > This doesn't really seem worth it to me. I don't see how we can > justify burning an on-disk bit just to save a little bit of overhead > during a rare maintenance operation. If there's a performance problem > here we need to look for another way of mitigating it. Slowing CLUSTER > and/or VACUUM FULL down by a large amount for this feature would be > unacceptable, but is that really a problem? And if so, can we solve it > without requiring this bit? Okay, if we want to avoid keeping the bit then there are multiple ways to handle this; the only thing is that none of them will be specific to those scenarios.
approach1. In ExecModifyTable, we can process the source tuple and see whether any of the varlena attributes is compressed with a stored compression method that is not the same as the target table attribute's; if so, we can decompress it.
approach2. In heap_prepare_insert, always call heap_toast_insert_or_update; therein we can check whether any of the source tuple attributes are compressed with a compression method different from the target table's, and if so decompress them.
With either approach, we have to do this in a generic path because the source of the tuple is not known; I mean, it can be the output of a function, a join tuple, or a subquery. So in the attached patch, I have implemented approach1.
For testing, I have implemented both approach1 and approach2 and checked pgbench performance to see whether this impacts the generic paths, but I did not see any impact. > > > I have already extracted these 2 patches from the main patch set. > > But, in these patches, I am still storing the am_oid in the toast > > header. I am not sure can we get rid of that at least for these 2 > > patches? But, then wherever we try to uncompress the tuple we need to > > know the tuple descriptor to get the am_oid but I think that is not > > possible in all the cases. Am I missing something here? > > I think we should instead use the high bits of the toast size word for > patches #1-#4, as discussed upthread. > > > > > Patch #3. Add support for changing the compression method associated > > > > with a column, forcing a table rewrite. > > > > > > > > Patch #4. Add support for PRESERVE, so that you can change the > > > > compression method associated with a column without forcing a table > > > > rewrite, by including the old method in the PRESERVE list, or with a > > > > rewrite, by not including it in the PRESERVE list. > > Does this make sense to have Patch #3 and Patch #4, without having > > Patch #5? I mean why do we need to support rewrite or preserve unless > > we have the customer compression methods right? 
because the build-in > > compression method can not be dropped so why do we need to preserve? > > I think that patch #3 makes sense because somebody might have a table > that is currently compressed with pglz and they want to switch to lz4, > and I think patch #4 also makes sense because they might want to start > using lz4 for future data but not force a rewrite to get rid of all > the pglz data they've already got. Those options are valuable as soon > as there is more than one possible compression algorithm, even if > they're all built in. Now, as I said upthread, it's also true that you > could do #5 before #3 and #4. I don't think that's insane. But I > prefer it in the other order, because I think having #5 without #3 and > #4 wouldn't be too much fun for users. Details of the attached patch set:
0001: This provides syntax to set the compression method to one of the built-in compression methods (pglz or zlib). pg_attribute stores the compression method (char) and there are conversion functions to convert that compression method to the built-in compression array index. As discussed upthread, the first 2 bits will store the compression method index; using that, we can directly get the handler routine from the built-in compression method array.
0002: This patch provides an option to change the compression method for an existing column, and it will rewrite the table.
Next, I will be working on providing an option to alter the compression method without rewriting the whole table; basically, we can provide a preserve list to keep the old compression methods. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
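To make approach1 a bit more concrete, here is a rough, self-contained sketch of the kind of check it implies before the tuple reaches the target relation. The attcompression column is the one 0001 adds to pg_attribute, while stored_compression_method() and the function as a whole are hypothetical helpers invented for this illustration, not code from the patch (out-of-line datums would need analogous handling after detoasting):

    #include "postgres.h"
    #include "access/detoast.h"
    #include "access/tupdesc.h"
    #include "executor/tuptable.h"

    /* hypothetical: read the method id from a compressed varlena's header bits */
    extern char stored_compression_method(Datum value);

    static void
    decompress_mismatched_attrs(TupleTableSlot *slot, TupleDesc targetDesc)
    {
        slot_getallattrs(slot);

        for (int i = 0; i < targetDesc->natts; i++)
        {
            Form_pg_attribute att = TupleDescAttr(targetDesc, i);
            Datum       val = slot->tts_values[i];

            /* only varlena columns can carry inline-compressed data */
            if (slot->tts_isnull[i] || att->attlen != -1)
                continue;

            if (VARATT_IS_COMPRESSED(DatumGetPointer(val)) &&
                stored_compression_method(val) != att->attcompression)
            {
                /* decompress so the target column can recompress its own way */
                slot->tts_values[i] =
                    PointerGetDatum(detoast_attr((struct varlena *) DatumGetPointer(val)));
            }
        }
    }

Since this has to run in a generic path, this per-row check is the cost that the pgbench comparison mentioned above was meant to show is negligible.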
On Sat, Sep 19, 2020 at 1:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Aug 25, 2020 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > IIUC, the main reason for using this flag is for taking the decision > > > whether we need any detoasting for this tuple. For example, if we are > > > rewriting the table because the compression method is changed then if > > > HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple > > > length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to > > > call heap_toast_insert_or_update function for this tuple. Whereas if > > > this flag is set then we need to because we might need to uncompress > > > and compress back using a different compression method. The same is > > > the case with INSERT into SELECT * FROM. > > > > This doesn't really seem worth it to me. I don't see how we can > > justify burning an on-disk bit just to save a little bit of overhead > > during a rare maintenance operation. If there's a performance problem > > here we need to look for another way of mitigating it. Slowing CLUSTER > > and/or VACUUM FULL down by a large amount for this feature would be > > unacceptable, but is that really a problem? And if so, can we solve it > > without requiring this bit? > > Okay, if we want to avoid keeping the bit then there are multiple ways > to handle this, but the only thing is none of that will be specific to > those scenarios. > approach1. In ExecModifyTable, we can process the source tuple and see > if any of the varlena attributes is compressed and its stored > compression method is not the same as the target table attribute then > we can decompress it. > approach2. In heap_prepare_insert, always call the > heap_toast_insert_or_update, therein we can check if any of the source > tuple attributes are compressed with different compression methods > then the target table then we can decompress it. > > With either of the approach, we have to do this in a generic path > because the source of the tuple is not known, I mean it can be a > output from a function, or the join tuple or a subquery. So in the > attached patch, I have implemented with approach1. > > For testing, I have implemented using approach1 as well as using > approach2 and I have checked the performance of the pg_bench to see > whether it impacts the performance of the generic paths or not, but I > did not see any impact. > > > > > > I have already extracted these 2 patches from the main patch set. > > > But, in these patches, I am still storing the am_oid in the toast > > > header. I am not sure can we get rid of that at least for these 2 > > > patches? But, then wherever we try to uncompress the tuple we need to > > > know the tuple descriptor to get the am_oid but I think that is not > > > possible in all the cases. Am I missing something here? > > > > I think we should instead use the high bits of the toast size word for > > patches #1-#4, as discussed upthread. > > > > > > > Patch #3. Add support for changing the compression method associated > > > > > with a column, forcing a table rewrite. > > > > > > > > > > Patch #4. Add support for PRESERVE, so that you can change the > > > > > compression method associated with a column without forcing a table > > > > > rewrite, by including the old method in the PRESERVE list, or with a > > > > > rewrite, by not including it in the PRESERVE list. 
> > > > > > Does this make sense to have Patch #3 and Patch #4, without having > > > Patch #5? I mean why do we need to support rewrite or preserve unless > > > we have the customer compression methods right? because the build-in > > > compression method can not be dropped so why do we need to preserve? > > > > I think that patch #3 makes sense because somebody might have a table > > that is currently compressed with pglz and they want to switch to lz4, > > and I think patch #4 also makes sense because they might want to start > > using lz4 for future data but not force a rewrite to get rid of all > > the pglz data they've already got. Those options are valuable as soon > > as there is more than one possible compression algorithm, even if > > they're all built in. Now, as I said upthread, it's also true that you > > could do #5 before #3 and #4. I don't think that's insane. But I > > prefer it in the other order, because I think having #5 without #3 and > > #4 wouldn't be too much fun for users. > > Details of the attached patch set > > 0001: This provides syntax to set the compression method from the > built-in compression method (pglz or zlib). pg_attribute stores the > compression method (char) and there are conversion functions to > convert that compression method to the built-in compression array > index. As discussed up thread the first 2 bits will be storing the > compression method index using that we can directly get the handler > routing using the built-in compression method array. > > 0002: This patch provides an option to changes the compression method > for an existing column and it will rewrite the table. > > Next, I will be working on providing an option to alter the > compression method without rewriting the whole table, basically, we > can provide a preserve list to preserve old compression methods. I have rebased the patch and I have also done a couple of defect fixes and some cleanup. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, Sep 28, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Sep 19, 2020 at 1:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Aug 25, 2020 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > IIUC, the main reason for using this flag is for taking the decision > > > > whether we need any detoasting for this tuple. For example, if we are > > > > rewriting the table because the compression method is changed then if > > > > HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple > > > > length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to > > > > call heap_toast_insert_or_update function for this tuple. Whereas if > > > > this flag is set then we need to because we might need to uncompress > > > > and compress back using a different compression method. The same is > > > > the case with INSERT into SELECT * FROM. > > > > > > This doesn't really seem worth it to me. I don't see how we can > > > justify burning an on-disk bit just to save a little bit of overhead > > > during a rare maintenance operation. If there's a performance problem > > > here we need to look for another way of mitigating it. Slowing CLUSTER > > > and/or VACUUM FULL down by a large amount for this feature would be > > > unacceptable, but is that really a problem? And if so, can we solve it > > > without requiring this bit? > > > > Okay, if we want to avoid keeping the bit then there are multiple ways > > to handle this, but the only thing is none of that will be specific to > > those scenarios. > > approach1. In ExecModifyTable, we can process the source tuple and see > > if any of the varlena attributes is compressed and its stored > > compression method is not the same as the target table attribute then > > we can decompress it. > > approach2. In heap_prepare_insert, always call the > > heap_toast_insert_or_update, therein we can check if any of the source > > tuple attributes are compressed with different compression methods > > then the target table then we can decompress it. > > > > With either of the approach, we have to do this in a generic path > > because the source of the tuple is not known, I mean it can be a > > output from a function, or the join tuple or a subquery. So in the > > attached patch, I have implemented with approach1. > > > > For testing, I have implemented using approach1 as well as using > > approach2 and I have checked the performance of the pg_bench to see > > whether it impacts the performance of the generic paths or not, but I > > did not see any impact. > > > > > > > > > I have already extracted these 2 patches from the main patch set. > > > > But, in these patches, I am still storing the am_oid in the toast > > > > header. I am not sure can we get rid of that at least for these 2 > > > > patches? But, then wherever we try to uncompress the tuple we need to > > > > know the tuple descriptor to get the am_oid but I think that is not > > > > possible in all the cases. Am I missing something here? > > > > > > I think we should instead use the high bits of the toast size word for > > > patches #1-#4, as discussed upthread. > > > > > > > > > Patch #3. Add support for changing the compression method associated > > > > > > with a column, forcing a table rewrite. > > > > > > > > > > > > Patch #4. 
Add support for PRESERVE, so that you can change the > > > > > > compression method associated with a column without forcing a table > > > > > > rewrite, by including the old method in the PRESERVE list, or with a > > > > > > rewrite, by not including it in the PRESERVE list. > > > > > > > > Does this make sense to have Patch #3 and Patch #4, without having > > > > Patch #5? I mean why do we need to support rewrite or preserve unless > > > > we have the customer compression methods right? because the build-in > > > > compression method can not be dropped so why do we need to preserve? > > > > > > I think that patch #3 makes sense because somebody might have a table > > > that is currently compressed with pglz and they want to switch to lz4, > > > and I think patch #4 also makes sense because they might want to start > > > using lz4 for future data but not force a rewrite to get rid of all > > > the pglz data they've already got. Those options are valuable as soon > > > as there is more than one possible compression algorithm, even if > > > they're all built in. Now, as I said upthread, it's also true that you > > > could do #5 before #3 and #4. I don't think that's insane. But I > > > prefer it in the other order, because I think having #5 without #3 and > > > #4 wouldn't be too much fun for users. > > > > Details of the attached patch set > > > > 0001: This provides syntax to set the compression method from the > > built-in compression method (pglz or zlib). pg_attribute stores the > > compression method (char) and there are conversion functions to > > convert that compression method to the built-in compression array > > index. As discussed up thread the first 2 bits will be storing the > > compression method index using that we can directly get the handler > > routing using the built-in compression method array. > > > > 0002: This patch provides an option to changes the compression method > > for an existing column and it will rewrite the table. > > > > Next, I will be working on providing an option to alter the > > compression method without rewriting the whole table, basically, we > > can provide a preserve list to preserve old compression methods. > > I have rebased the patch and I have also done a couple of defect fixes > and some cleanup. Here is the next patch which allows providing a PRESERVE list using this we can avoid table rewrite while altering the compression method. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
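Again for illustration: a minimal sketch of the PRESERVE form added by this patch, using the syntax that shows up in the review below; the table and column names are made up.

    -- switch new data to zlib while keeping old pglz-compressed values readable,
    -- so no table rewrite is needed
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION zlib PRESERVE (pglz);

    -- without PRESERVE (or with only a partial list) the table is rewritten
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION zlib;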
Hi,

I took a look at this patch after a long time, and did a bit of review and testing. I haven't re-read the whole thread since 2017, so some of the following comments might be mistaken - sorry about that :-(

1) The "cmapi.h" naming seems unnecessarily short. I'd suggest using simply compression or something like that. I see little reason to shorten "compression" to "cm", or to prefix files with "cm_". For example compression/cm_zlib.c might just be compression/zlib.c.

2) I see index_form_tuple does this:

    Datum cvalue = toast_compress_datum(untoasted_values[i],
                                        DefaultCompressionMethod);

which seems wrong - why shouldn't the indexes use the same compression method as the underlying table?

3) dumpTableSchema in pg_dump.c does this:

    switch (tbinfo->attcompression[j])
    {
        case 'p':
            cmname = "pglz";
        case 'z':
            cmname = "zlib";
    }

which is broken as it's missing break, so 'p' will produce 'zlib'.

4) The name ExecCompareCompressionMethod is somewhat misleading, as the function is not merely comparing compression methods - it also recompresses the data.

5) CheckCompressionMethodsPreserved should document what the return value is (true when the new list contains all old values, thus not requiring a rewrite). Maybe "Compare" would be a better name?

6) The new field in ColumnDef is missing a comment.

7) It's not clear to me what "partial list" in the PRESERVE docs means.

    + which of them should be kept on the column. Without PRESERVE or partial
    + list of compression methods the table will be rewritten.

8) The initial synopsis in alter_table.sgml includes the PRESERVE syntax, but then later in the page it's omitted (yet the section talks about the keyword).

9) attcompression ...

The main issue I see is what the patch does with attcompression. Instead of just using it to store the compression method, it's also used to store the preserved compression methods. And using NameData to store this seems wrong too - if we really want to store this info, the correct way is either using text[] or inventing charvector or similar.

But to me this seems very much like a misuse of attcompression to track dependencies on compression methods, necessary because we don't have a separate catalog listing compression methods. If we had that, I think we could simply add dependencies between attributes and that catalog.

Moreover, having the catalog would allow adding compression methods (from extensions etc) instead of just having a list of hard-coded compression methods. Which seems like a strange limitation, considering this thread is called "custom compression methods".

10) compression parameters?

I wonder if we could/should allow parameters, like compression level (and maybe other stuff, depending on the compression method). PG13 allowed that for opclasses, so perhaps we should allow it here too.

11) pg_column_compression

When specifying a compression method not present in attcompression, we get this error message and hint:

    test=# alter table t alter COLUMN a set compression "pglz" preserve (zlib);
    ERROR:  "zlib" compression access method cannot be preserved
    HINT:  use "pg_column_compression" function for list of compression methods

but there is no pg_column_compression function, so the hint is wrong.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Thanks, Tomas, for your feedback.

> 9) attcompression ...
>
> The main issue I see is what the patch does with attcompression. Instead
> of just using it to store a the compression method, it's also used to
> store the preserved compression methods. And using NameData to store
> this seems wrong too - if we really want to store this info, the correct
> way is either using text[] or inventing charvector or similar.

The reason for using NameData is to get it in the fixed part of the data structure.

> But to me this seems very much like a misuse of attcompression to track
> dependencies on compression methods, necessary because we don't have a
> separate catalog listing compression methods. If we had that, I think we
> could simply add dependencies between attributes and that catalog.

Basically, up to this patch we only have built-in compression methods, and those cannot be dropped, so we don't need any dependency at all. We just want to know what the current compression method is and which preserved compression methods are supported for this attribute. Maybe we can do it better instead of using the NameData, but I don't think it makes sense to add a separate catalog?

> Moreover, having the catalog would allow adding compression methods
> (from extensions etc) instead of just having a list of hard-coded
> compression methods. Which seems like a strange limitation, considering
> this thread is called "custom compression methods".

I think I forgot to mention while submitting the previous patch that the next patch I am planning to submit is support for creating custom compression methods, wherein we can use the pg_am catalog to insert the new compression method. And for dependency handling, we can create an attribute dependency on the pg_am row. Basically, we will create the attribute dependency on the current compression method AM as well as on the preserved compression methods AM. As part of this, we will add two built-in AMs for zlib and pglz, and the attcompression field will be converted to the oid_vector (the first OID will be that of the current compression method, followed by the preserved compression methods' OIDs).

> 10) compression parameters?
>
> I wonder if we could/should allow parameters, like compression level
> (and maybe other stuff, depending on the compression method). PG13
> allowed that for opclasses, so perhaps we should allow it here too.

Yes, that is also in the plan. For doing this we are planning to add an extra column in pg_attribute which will store the compression options for the current compression method. The original patch was creating an extra catalog pg_column_compression, which maintained the oid of the compression method as well as the compression options. The advantage of creating an extra catalog is that we can keep the compression options for the preserved compression methods too, so that we can also support options which are needed for decompressing the data. Whereas if we want to avoid this extra catalog then we cannot use such compression options for decompression. But most of the options, e.g. compression level, are only used while compressing, so it is enough to store them for the current compression method only. What are your thoughts?

Other comments look fine to me, so I will work on them and post the updated patch set.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 5, 2020 at 11:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > Thanks, Tomas for your feedback. > > > 9) attcompression ... > > > > The main issue I see is what the patch does with attcompression. Instead > > of just using it to store a the compression method, it's also used to > > store the preserved compression methods. And using NameData to store > > this seems wrong too - if we really want to store this info, the correct > > way is either using text[] or inventing charvector or similar. > > The reason for using the NameData is the get it in the fixed part of > the data structure. > > > But to me this seems very much like a misuse of attcompression to track > > dependencies on compression methods, necessary because we don't have a > > separate catalog listing compression methods. If we had that, I think we > > could simply add dependencies between attributes and that catalog. > > Basically, up to this patch, we are having only built-in compression > methods and those can not be dropped so we don't need any dependency > at all. We just want to know what is the current compression method > and what is the preserve compression methods supported for this > attribute. Maybe we can do it better instead of using the NameData > but I don't think it makes sense to add a separate catalog? > > > Moreover, having the catalog would allow adding compression methods > > (from extensions etc) instead of just having a list of hard-coded > > compression methods. Which seems like a strange limitation, considering > > this thread is called "custom compression methods". > > I think I forgot to mention while submitting the previous patch that > the next patch I am planning to submit is, Support creating the custom > compression methods wherein we can use pg_am catalog to insert the new > compression method. And for dependency handling, we can create an > attribute dependency on the pg_am row. Basically, we will create the > attribute dependency on the current compression method AM as well as > on the preserved compression methods AM. As part of this, we will > add two build-in AMs for zlib and pglz, and the attcompression field > will be converted to the oid_vector (first OID will be of the current > compression method, followed by the preserved compression method's > oids). > > > 10) compression parameters? > > > > I wonder if we could/should allow parameters, like compression level > > (and maybe other stuff, depending on the compression method). PG13 > > allowed that for opclasses, so perhaps we should allow it here too. > > Yes, that is also in the plan. For doing this we are planning to add > an extra column in the pg_attribute which will store the compression > options for the current compression method. The original patch was > creating an extra catalog pg_column_compression, therein it maintains > the oid of the compression method as well as the compression options. > The advantage of creating an extra catalog is that we can keep the > compression options for the preserved compression methods also so that > we can support the options which can be used for decompressing the > data as well. Whereas if we want to avoid this extra catalog then we > can not use that compression option for decompressing. But most of > the options e.g. compression level are just for the compressing so it > is enough to store for the current compression method only. What's > your thoughts? 
>
> Other comments look fine to me so I will work on them and post the
> updated patch set.

I have fixed the other comments, except this one:

> 2) I see index_form_tuple does this:
>
> Datum cvalue = toast_compress_datum(untoasted_values[i],
> DefaultCompressionMethod);
> which seems wrong - why shouldn't the indexes use the same compression
> method as the underlying table?

I will fix this in the next version of the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: > >Thanks, Tomas for your feedback. > >> 9) attcompression ... >> >> The main issue I see is what the patch does with attcompression. Instead >> of just using it to store a the compression method, it's also used to >> store the preserved compression methods. And using NameData to store >> this seems wrong too - if we really want to store this info, the correct >> way is either using text[] or inventing charvector or similar. > >The reason for using the NameData is the get it in the fixed part of >the data structure. > Why do we need that? It's possible to have varlena fields with direct access (see pg_index.indkey for example). Adding NameData just to make it fixed-length means we're always adding 64B even if we just need a single byte, which means ~30% overhead for the FormData_pg_attribute. That seems a bit unnecessary, and might be an issue with many attributes (e.g. with many temp tables, etc.). >> But to me this seems very much like a misuse of attcompression to track >> dependencies on compression methods, necessary because we don't have a >> separate catalog listing compression methods. If we had that, I think we >> could simply add dependencies between attributes and that catalog. > >Basically, up to this patch, we are having only built-in compression >methods and those can not be dropped so we don't need any dependency >at all. We just want to know what is the current compression method >and what is the preserve compression methods supported for this >attribute. Maybe we can do it better instead of using the NameData >but I don't think it makes sense to add a separate catalog? > Sure, I understand what the goal was - all I'm saying is that it looks very much like a workaround needed because we don't have the catalog. I don't quite understand how could we support custom compression methods without listing them in some sort of catalog? >> Moreover, having the catalog would allow adding compression methods >> (from extensions etc) instead of just having a list of hard-coded >> compression methods. Which seems like a strange limitation, considering >> this thread is called "custom compression methods". > >I think I forgot to mention while submitting the previous patch that >the next patch I am planning to submit is, Support creating the custom >compression methods wherein we can use pg_am catalog to insert the new >compression method. And for dependency handling, we can create an >attribute dependency on the pg_am row. Basically, we will create the >attribute dependency on the current compression method AM as well as >on the preserved compression methods AM. As part of this, we will >add two build-in AMs for zlib and pglz, and the attcompression field >will be converted to the oid_vector (first OID will be of the current >compression method, followed by the preserved compression method's >oids). > Hmmm, ok. Not sure pg_am is the right place - compression methods don't quite match what I though AMs are about, but maybe it's just my fault. FWIW it seems a bit strange to first do the attcompression magic and then add the catalog later - I think we should start with the catalog right away. The advantage is that if we end up committing only some of the patches in this cycle, we already have all the infrastructure etc. We can reorder that later, though. >> 10) compression parameters? 
>> >> I wonder if we could/should allow parameters, like compression level >> (and maybe other stuff, depending on the compression method). PG13 >> allowed that for opclasses, so perhaps we should allow it here too. > >Yes, that is also in the plan. For doing this we are planning to add >an extra column in the pg_attribute which will store the compression >options for the current compression method. The original patch was >creating an extra catalog pg_column_compression, therein it maintains >the oid of the compression method as well as the compression options. >The advantage of creating an extra catalog is that we can keep the >compression options for the preserved compression methods also so that >we can support the options which can be used for decompressing the >data as well. Whereas if we want to avoid this extra catalog then we >can not use that compression option for decompressing. But most of >the options e.g. compression level are just for the compressing so it >is enough to store for the current compression method only. What's >your thoughts? > Not sure. My assumption was we'd end up with a new catalog, but maybe stashing it into pg_attribute is fine. I was really thinking about two kinds of options - compression level, and some sort of column-level dictionary. Compression level is not necessary for decompression, but the dictionary ID would be needed. (I think the global dictionary was one of the use cases, aimed at JSON compression.) But I don't think stashing it in pg_attribute means we couldn't use it for decompression - we'd just need to keep an array of options, one for each compression method. Keeping it in a separate new catalog might be cleaner, and I'm not sure how large the configuration might be. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > > > >Thanks, Tomas for your feedback. > > > >> 9) attcompression ... > >> > >> The main issue I see is what the patch does with attcompression. Instead > >> of just using it to store a the compression method, it's also used to > >> store the preserved compression methods. And using NameData to store > >> this seems wrong too - if we really want to store this info, the correct > >> way is either using text[] or inventing charvector or similar. > > > >The reason for using the NameData is the get it in the fixed part of > >the data structure. > > > > Why do we need that? It's possible to have varlena fields with direct > access (see pg_index.indkey for example). I see. While making it NameData I was thinking whether we have an option to direct access the varlena. Thanks for pointing me there. I will change this. Adding NameData just to make > it fixed-length means we're always adding 64B even if we just need a > single byte, which means ~30% overhead for the FormData_pg_attribute. > That seems a bit unnecessary, and might be an issue with many attributes > (e.g. with many temp tables, etc.). You are right. Even I did not like to keep 64B for this, so I will change it. > > >> But to me this seems very much like a misuse of attcompression to track > >> dependencies on compression methods, necessary because we don't have a > >> separate catalog listing compression methods. If we had that, I think we > >> could simply add dependencies between attributes and that catalog. > > > >Basically, up to this patch, we are having only built-in compression > >methods and those can not be dropped so we don't need any dependency > >at all. We just want to know what is the current compression method > >and what is the preserve compression methods supported for this > >attribute. Maybe we can do it better instead of using the NameData > >but I don't think it makes sense to add a separate catalog? > > > > Sure, I understand what the goal was - all I'm saying is that it looks > very much like a workaround needed because we don't have the catalog. > > I don't quite understand how could we support custom compression methods > without listing them in some sort of catalog? Yeah for supporting custom compression we need some catalog. > >> Moreover, having the catalog would allow adding compression methods > >> (from extensions etc) instead of just having a list of hard-coded > >> compression methods. Which seems like a strange limitation, considering > >> this thread is called "custom compression methods". > > > >I think I forgot to mention while submitting the previous patch that > >the next patch I am planning to submit is, Support creating the custom > >compression methods wherein we can use pg_am catalog to insert the new > >compression method. And for dependency handling, we can create an > >attribute dependency on the pg_am row. Basically, we will create the > >attribute dependency on the current compression method AM as well as > >on the preserved compression methods AM. As part of this, we will > >add two build-in AMs for zlib and pglz, and the attcompression field > >will be converted to the oid_vector (first OID will be of the current > >compression method, followed by the preserved compression method's > >oids). > > > > Hmmm, ok. 
Not sure pg_am is the right place - compression methods don't > quite match what I though AMs are about, but maybe it's just my fault. > > FWIW it seems a bit strange to first do the attcompression magic and > then add the catalog later - I think we should start with the catalog > right away. The advantage is that if we end up committing only some of > the patches in this cycle, we already have all the infrastructure etc. > We can reorder that later, though. Hmm, yeah we can do this way as well that first create a new catalog table and add entries for these two built-in methods and the attcompression can store the oid vector. But if we only commit the build-in compression methods part then does it make sense to create an extra catalog or adding these build-in methods to the existing catalog (if we plan to use pg_am). Then in attcompression instead of using one byte for each preserve compression method, we need to use oid. So from Robert's mail[1], it appeared to me that he wants that the build-in compression methods part should be independently committable and if we think from that perspective then adding a catalog doesn't make much sense. But if we are planning to commit the custom method also then it makes more sense to directly start with the catalog because that way it will be easy to expand without much refactoring. [1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > >> 10) compression parameters? > >> > >> I wonder if we could/should allow parameters, like compression level > >> (and maybe other stuff, depending on the compression method). PG13 > >> allowed that for opclasses, so perhaps we should allow it here too. > > > >Yes, that is also in the plan. For doing this we are planning to add > >an extra column in the pg_attribute which will store the compression > >options for the current compression method. The original patch was > >creating an extra catalog pg_column_compression, therein it maintains > >the oid of the compression method as well as the compression options. > >The advantage of creating an extra catalog is that we can keep the > >compression options for the preserved compression methods also so that > >we can support the options which can be used for decompressing the > >data as well. Whereas if we want to avoid this extra catalog then we > >can not use that compression option for decompressing. But most of > >the options e.g. compression level are just for the compressing so it > >is enough to store for the current compression method only. What's > >your thoughts? > > > > Not sure. My assumption was we'd end up with a new catalog, but maybe > stashing it into pg_attribute is fine. I was really thinking about two > kinds of options - compression level, and some sort of column-level > dictionary. Compression level is not necessary for decompression, but > the dictionary ID would be needed. (I think the global dictionary was > one of the use cases, aimed at JSON compression.) Ok > But I don't think stashing it in pg_attribute means we couldn't use it > for decompression - we'd just need to keep an array of options, one for > each compression method. Yeah, we can do that. Keeping it in a separate new catalog might be > cleaner, and I'm not sure how large the configuration might be. 
Yeah, in that case it will be better to store it in a separate catalog, because if multiple attributes are using the same compression method with the same options, we can store the same OID in attcompression instead of duplicating the option field.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra >> ><tomas.vondra@2ndquadrant.com> wrote: >> > >> >Thanks, Tomas for your feedback. >> > >> >> 9) attcompression ... >> >> >> >> The main issue I see is what the patch does with attcompression. Instead >> >> of just using it to store a the compression method, it's also used to >> >> store the preserved compression methods. And using NameData to store >> >> this seems wrong too - if we really want to store this info, the correct >> >> way is either using text[] or inventing charvector or similar. >> > >> >The reason for using the NameData is the get it in the fixed part of >> >the data structure. >> > >> >> Why do we need that? It's possible to have varlena fields with direct >> access (see pg_index.indkey for example). > >I see. While making it NameData I was thinking whether we have an >option to direct access the varlena. Thanks for pointing me there. I >will change this. > > Adding NameData just to make >> it fixed-length means we're always adding 64B even if we just need a >> single byte, which means ~30% overhead for the FormData_pg_attribute. >> That seems a bit unnecessary, and might be an issue with many attributes >> (e.g. with many temp tables, etc.). > >You are right. Even I did not like to keep 64B for this, so I will change it. > >> >> >> But to me this seems very much like a misuse of attcompression to track >> >> dependencies on compression methods, necessary because we don't have a >> >> separate catalog listing compression methods. If we had that, I think we >> >> could simply add dependencies between attributes and that catalog. >> > >> >Basically, up to this patch, we are having only built-in compression >> >methods and those can not be dropped so we don't need any dependency >> >at all. We just want to know what is the current compression method >> >and what is the preserve compression methods supported for this >> >attribute. Maybe we can do it better instead of using the NameData >> >but I don't think it makes sense to add a separate catalog? >> > >> >> Sure, I understand what the goal was - all I'm saying is that it looks >> very much like a workaround needed because we don't have the catalog. >> >> I don't quite understand how could we support custom compression methods >> without listing them in some sort of catalog? > >Yeah for supporting custom compression we need some catalog. > >> >> Moreover, having the catalog would allow adding compression methods >> >> (from extensions etc) instead of just having a list of hard-coded >> >> compression methods. Which seems like a strange limitation, considering >> >> this thread is called "custom compression methods". >> > >> >I think I forgot to mention while submitting the previous patch that >> >the next patch I am planning to submit is, Support creating the custom >> >compression methods wherein we can use pg_am catalog to insert the new >> >compression method. And for dependency handling, we can create an >> >attribute dependency on the pg_am row. Basically, we will create the >> >attribute dependency on the current compression method AM as well as >> >on the preserved compression methods AM. 
As part of this, we will >> >add two build-in AMs for zlib and pglz, and the attcompression field >> >will be converted to the oid_vector (first OID will be of the current >> >compression method, followed by the preserved compression method's >> >oids). >> > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't >> quite match what I though AMs are about, but maybe it's just my fault. >> >> FWIW it seems a bit strange to first do the attcompression magic and >> then add the catalog later - I think we should start with the catalog >> right away. The advantage is that if we end up committing only some of >> the patches in this cycle, we already have all the infrastructure etc. >> We can reorder that later, though. > >Hmm, yeah we can do this way as well that first create a new catalog >table and add entries for these two built-in methods and the >attcompression can store the oid vector. But if we only commit the >build-in compression methods part then does it make sense to create an >extra catalog or adding these build-in methods to the existing catalog >(if we plan to use pg_am). Then in attcompression instead of using >one byte for each preserve compression method, we need to use oid. So >from Robert's mail[1], it appeared to me that he wants that the >build-in compression methods part should be independently committable >and if we think from that perspective then adding a catalog doesn't >make much sense. But if we are planning to commit the custom method >also then it makes more sense to directly start with the catalog >because that way it will be easy to expand without much refactoring. > >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > Hmmm. Maybe I'm missing something subtle, but I think that plan can be interpreted in various ways - it does not really say whether the initial list of built-in methods should be in some C array, or already in a proper catalog. All I'm saying is it seems a bit weird to first implement dependencies based on strange (mis)use of attcompression attribute, and then replace it with a proper catalog. My understanding is those patches are expected to be committable one by one, but the attcompression approach seems a bit too hacky to me - not sure I'd want to commit that ... >> >> 10) compression parameters? >> >> >> >> I wonder if we could/should allow parameters, like compression level >> >> (and maybe other stuff, depending on the compression method). PG13 >> >> allowed that for opclasses, so perhaps we should allow it here too. >> > >> >Yes, that is also in the plan. For doing this we are planning to add >> >an extra column in the pg_attribute which will store the compression >> >options for the current compression method. The original patch was >> >creating an extra catalog pg_column_compression, therein it maintains >> >the oid of the compression method as well as the compression options. >> >The advantage of creating an extra catalog is that we can keep the >> >compression options for the preserved compression methods also so that >> >we can support the options which can be used for decompressing the >> >data as well. Whereas if we want to avoid this extra catalog then we >> >can not use that compression option for decompressing. But most of >> >the options e.g. compression level are just for the compressing so it >> >is enough to store for the current compression method only. What's >> >your thoughts? >> > >> >> Not sure. 
My assumption was we'd end up with a new catalog, but maybe
>> stashing it into pg_attribute is fine. I was really thinking about two
>> kinds of options - compression level, and some sort of column-level
>> dictionary. Compression level is not necessary for decompression, but
>> the dictionary ID would be needed. (I think the global dictionary was
>> one of the use cases, aimed at JSON compression.)
>
>Ok
>
>> But I don't think stashing it in pg_attribute means we couldn't use it
>> for decompression - we'd just need to keep an array of options, one for
>> each compression method.
>
>Yeah, we can do that.
>
>Keeping it in a separate new catalog might be
>> cleaner, and I'm not sure how large the configuration might be.
>
>Yeah in that case it will be better to store in a separate catalog,
>because sometimes if multiple attributes are using the same
>compression method with the same options then we can store the same
>oid in attcompression instead of duplicating the option field.
>

I doubt deduplicating the options like this (sharing options between columns) is really worth it, as it means extra complexity e.g. during ALTER TABLE ... SET COMPRESSION. I don't think we do that for other catalogs, so why should we do it here?

Ultimately I think it's a question of how large we expect the options to be, and how flexible it needs to be. For example, what happens if the user does this:

    ALTER ... SET COMPRESSION my_compression WITH (options1) PRESERVE;
    ALTER ... SET COMPRESSION pglz PRESERVE;
    ALTER ... SET COMPRESSION my_compression WITH (options2) PRESERVE;

I believe it's enough to keep just the last value, but maybe I'm wrong and we need to keep the whole history?

The use case I'm thinking about is the column-level JSON compression, where one of the options identifies the dictionary. OTOH I'm not sure this is the right way to track this info - we need to know which values were compressed with which options, i.e. it needs to be encoded in each value directly. It'd also require changes to the PRESERVE handling because it'd be necessary to identify which options to preserve ...

So maybe this is either nonsense or something we don't want to support, and we should only allow one option for each compression method.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
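A concrete, purely hypothetical example of the WITH (...) options discussed above - "level" is a made-up option name here, not something the patch defines:

    -- pass a compression level to the new method, keeping old pglz data readable
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION zlib WITH (level '6') PRESERVE (pglz);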
On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > >> ><tomas.vondra@2ndquadrant.com> wrote: > >> > > >> >Thanks, Tomas for your feedback. > >> > > >> >> 9) attcompression ... > >> >> > >> >> The main issue I see is what the patch does with attcompression. Instead > >> >> of just using it to store a the compression method, it's also used to > >> >> store the preserved compression methods. And using NameData to store > >> >> this seems wrong too - if we really want to store this info, the correct > >> >> way is either using text[] or inventing charvector or similar. > >> > > >> >The reason for using the NameData is the get it in the fixed part of > >> >the data structure. > >> > > >> > >> Why do we need that? It's possible to have varlena fields with direct > >> access (see pg_index.indkey for example). > > > >I see. While making it NameData I was thinking whether we have an > >option to direct access the varlena. Thanks for pointing me there. I > >will change this. > > > > Adding NameData just to make > >> it fixed-length means we're always adding 64B even if we just need a > >> single byte, which means ~30% overhead for the FormData_pg_attribute. > >> That seems a bit unnecessary, and might be an issue with many attributes > >> (e.g. with many temp tables, etc.). > > > >You are right. Even I did not like to keep 64B for this, so I will change it. > > > >> > >> >> But to me this seems very much like a misuse of attcompression to track > >> >> dependencies on compression methods, necessary because we don't have a > >> >> separate catalog listing compression methods. If we had that, I think we > >> >> could simply add dependencies between attributes and that catalog. > >> > > >> >Basically, up to this patch, we are having only built-in compression > >> >methods and those can not be dropped so we don't need any dependency > >> >at all. We just want to know what is the current compression method > >> >and what is the preserve compression methods supported for this > >> >attribute. Maybe we can do it better instead of using the NameData > >> >but I don't think it makes sense to add a separate catalog? > >> > > >> > >> Sure, I understand what the goal was - all I'm saying is that it looks > >> very much like a workaround needed because we don't have the catalog. > >> > >> I don't quite understand how could we support custom compression methods > >> without listing them in some sort of catalog? > > > >Yeah for supporting custom compression we need some catalog. > > > >> >> Moreover, having the catalog would allow adding compression methods > >> >> (from extensions etc) instead of just having a list of hard-coded > >> >> compression methods. Which seems like a strange limitation, considering > >> >> this thread is called "custom compression methods". > >> > > >> >I think I forgot to mention while submitting the previous patch that > >> >the next patch I am planning to submit is, Support creating the custom > >> >compression methods wherein we can use pg_am catalog to insert the new > >> >compression method. And for dependency handling, we can create an > >> >attribute dependency on the pg_am row. 
Basically, we will create the > >> >attribute dependency on the current compression method AM as well as > >> >on the preserved compression methods AM. As part of this, we will > >> >add two build-in AMs for zlib and pglz, and the attcompression field > >> >will be converted to the oid_vector (first OID will be of the current > >> >compression method, followed by the preserved compression method's > >> >oids). > >> > > >> > >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > >> quite match what I though AMs are about, but maybe it's just my fault. > >> > >> FWIW it seems a bit strange to first do the attcompression magic and > >> then add the catalog later - I think we should start with the catalog > >> right away. The advantage is that if we end up committing only some of > >> the patches in this cycle, we already have all the infrastructure etc. > >> We can reorder that later, though. > > > >Hmm, yeah we can do this way as well that first create a new catalog > >table and add entries for these two built-in methods and the > >attcompression can store the oid vector. But if we only commit the > >build-in compression methods part then does it make sense to create an > >extra catalog or adding these build-in methods to the existing catalog > >(if we plan to use pg_am). Then in attcompression instead of using > >one byte for each preserve compression method, we need to use oid. So > >from Robert's mail[1], it appeared to me that he wants that the > >build-in compression methods part should be independently committable > >and if we think from that perspective then adding a catalog doesn't > >make much sense. But if we are planning to commit the custom method > >also then it makes more sense to directly start with the catalog > >because that way it will be easy to expand without much refactoring. > > > >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > > > > Hmmm. Maybe I'm missing something subtle, but I think that plan can be > interpreted in various ways - it does not really say whether the initial > list of built-in methods should be in some C array, or already in a proper > catalog. > > All I'm saying is it seems a bit weird to first implement dependencies > based on strange (mis)use of attcompression attribute, and then replace > it with a proper catalog. My understanding is those patches are expected > to be committable one by one, but the attcompression approach seems a > bit too hacky to me - not sure I'd want to commit that ... Okay, I will change this. So I will make create a new catalog pg_compression and add the entry for two built-in compression methods from the very first patch. > >> >> 10) compression parameters? > >> >> > >> >> I wonder if we could/should allow parameters, like compression level > >> >> (and maybe other stuff, depending on the compression method). PG13 > >> >> allowed that for opclasses, so perhaps we should allow it here too. > >> > > >> >Yes, that is also in the plan. For doing this we are planning to add > >> >an extra column in the pg_attribute which will store the compression > >> >options for the current compression method. The original patch was > >> >creating an extra catalog pg_column_compression, therein it maintains > >> >the oid of the compression method as well as the compression options. 
> >> >The advantage of creating an extra catalog is that we can keep the > >> >compression options for the preserved compression methods also so that > >> >we can support the options which can be used for decompressing the > >> >data as well. Whereas if we want to avoid this extra catalog then we > >> >can not use that compression option for decompressing. But most of > >> >the options e.g. compression level are just for the compressing so it > >> >is enough to store for the current compression method only. What's > >> >your thoughts? > >> > > >> > >> Not sure. My assumption was we'd end up with a new catalog, but maybe > >> stashing it into pg_attribute is fine. I was really thinking about two > >> kinds of options - compression level, and some sort of column-level > >> dictionary. Compression level is not necessary for decompression, but > >> the dictionary ID would be needed. (I think the global dictionary was > >> one of the use cases, aimed at JSON compression.) > > > >Ok > > > >> But I don't think stashing it in pg_attribute means we couldn't use it > >> for decompression - we'd just need to keep an array of options, one for > >> each compression method. > > > >Yeah, we can do that. > > > >Keeping it in a separate new catalog might be > >> cleaner, and I'm not sure how large the configuration might be. > > > >Yeah in that case it will be better to store in a separate catalog, > >because sometimes if multiple attributes are using the same > >compression method with the same options then we can store the same > >oid in attcompression instead of duplicating the option field. > > > > I doubt deduplicating the options like this is (sharing options between > columns) is really worth it, as it means extra complexity e.g. during > ALTER TABLE ... SET COMPRESSION. I don't think we do that for other > catalogs, so why should we do it here? Yeah, valid point. > > Ultimately I think it's a question of how large we expect the options to > be, and how flexible it needs to be. > > For example, what happens if the user does this: > > ALTER ... SET COMPRESSION my_compression WITH (options1) PRESERVE; > ALTER ... SET COMPRESSION pglz PRESERVE; > ALTER ... SET COMPRESSION my_compression WITH (options2) PRESERVE; > > I believe it's enough to keep just the last value, but maybe I'm wrong > and we need to keep the whole history? Currently, the syntax is like ALTER ... SET COMPRESSION my_compression WITH (options1) PRESERVE (old_compression1, old_compression2..). But I think if the user just gives PRESERVE without a list then we should just preserve the latest one. > The use case I'm thinking about is the column-level JSON compression, > where one of the options identifies the dictionary. OTOH I'm not sure > this is the right way to track this info - we need to know which options > were compressed with which options, i.e. it needs to be encoded in each > value directly. It'd also require changes to the PRESERVE handling > because it'd be necessary to identify which options to preserve ... > > So maybe this is either nonsense or something we don't want to support, > and we should only allow one option for each compression method. Yeah, it is a bit confusing to add the same compression method with different compression options, then in the preserve list, we will have to allow the option as well along with the compression method to know which compression method with what options we want to preserve. And also as you mentioned that in rows we need to know the option as well. 
I think for solving this: for the custom compression methods we will anyway have to store the OID of the compression method in the toast header, so instead we can provide an intermediate catalog which creates a new row for each combination of compression method + options, and the toast header can store the OID of that row. That way we know which compression method and which options each value was compressed with.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
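Sketching the "intermediate catalog" idea above as DDL, purely to illustrate the shape - none of these names exist in the patch set:

    -- one row per (compression method, options) combination; a compressed value's
    -- toast header would store the OID of the matching row
    CREATE TABLE compression_opt_sketch (
        cmoid      oid,     -- the compression method this row belongs to
        cmoptions  text[]   -- the options the data was compressed with
    );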
On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra >> ><tomas.vondra@2ndquadrant.com> wrote: >> >> >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra >> >> ><tomas.vondra@2ndquadrant.com> wrote: >> >> > >> >> >Thanks, Tomas for your feedback. >> >> > >> >> >> 9) attcompression ... >> >> >> >> >> >> The main issue I see is what the patch does with attcompression. Instead >> >> >> of just using it to store a the compression method, it's also used to >> >> >> store the preserved compression methods. And using NameData to store >> >> >> this seems wrong too - if we really want to store this info, the correct >> >> >> way is either using text[] or inventing charvector or similar. >> >> > >> >> >The reason for using the NameData is the get it in the fixed part of >> >> >the data structure. >> >> > >> >> >> >> Why do we need that? It's possible to have varlena fields with direct >> >> access (see pg_index.indkey for example). >> > >> >I see. While making it NameData I was thinking whether we have an >> >option to direct access the varlena. Thanks for pointing me there. I >> >will change this. >> > >> > Adding NameData just to make >> >> it fixed-length means we're always adding 64B even if we just need a >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. >> >> That seems a bit unnecessary, and might be an issue with many attributes >> >> (e.g. with many temp tables, etc.). >> > >> >You are right. Even I did not like to keep 64B for this, so I will change it. >> > >> >> >> >> >> But to me this seems very much like a misuse of attcompression to track >> >> >> dependencies on compression methods, necessary because we don't have a >> >> >> separate catalog listing compression methods. If we had that, I think we >> >> >> could simply add dependencies between attributes and that catalog. >> >> > >> >> >Basically, up to this patch, we are having only built-in compression >> >> >methods and those can not be dropped so we don't need any dependency >> >> >at all. We just want to know what is the current compression method >> >> >and what is the preserve compression methods supported for this >> >> >attribute. Maybe we can do it better instead of using the NameData >> >> >but I don't think it makes sense to add a separate catalog? >> >> > >> >> >> >> Sure, I understand what the goal was - all I'm saying is that it looks >> >> very much like a workaround needed because we don't have the catalog. >> >> >> >> I don't quite understand how could we support custom compression methods >> >> without listing them in some sort of catalog? >> > >> >Yeah for supporting custom compression we need some catalog. >> > >> >> >> Moreover, having the catalog would allow adding compression methods >> >> >> (from extensions etc) instead of just having a list of hard-coded >> >> >> compression methods. Which seems like a strange limitation, considering >> >> >> this thread is called "custom compression methods". >> >> > >> >> >I think I forgot to mention while submitting the previous patch that >> >> >the next patch I am planning to submit is, Support creating the custom >> >> >compression methods wherein we can use pg_am catalog to insert the new >> >> >compression method. 
And for dependency handling, we can create an >> >> >attribute dependency on the pg_am row. Basically, we will create the >> >> >attribute dependency on the current compression method AM as well as >> >> >on the preserved compression methods AM. As part of this, we will >> >> >add two build-in AMs for zlib and pglz, and the attcompression field >> >> >will be converted to the oid_vector (first OID will be of the current >> >> >compression method, followed by the preserved compression method's >> >> >oids). >> >> > >> >> >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't >> >> quite match what I though AMs are about, but maybe it's just my fault. >> >> >> >> FWIW it seems a bit strange to first do the attcompression magic and >> >> then add the catalog later - I think we should start with the catalog >> >> right away. The advantage is that if we end up committing only some of >> >> the patches in this cycle, we already have all the infrastructure etc. >> >> We can reorder that later, though. >> > >> >Hmm, yeah we can do this way as well that first create a new catalog >> >table and add entries for these two built-in methods and the >> >attcompression can store the oid vector. But if we only commit the >> >build-in compression methods part then does it make sense to create an >> >extra catalog or adding these build-in methods to the existing catalog >> >(if we plan to use pg_am). Then in attcompression instead of using >> >one byte for each preserve compression method, we need to use oid. So >> >from Robert's mail[1], it appeared to me that he wants that the >> >build-in compression methods part should be independently committable >> >and if we think from that perspective then adding a catalog doesn't >> >make much sense. But if we are planning to commit the custom method >> >also then it makes more sense to directly start with the catalog >> >because that way it will be easy to expand without much refactoring. >> > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com >> > >> >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be >> interpreted in various ways - it does not really say whether the initial >> list of built-in methods should be in some C array, or already in a proper >> catalog. >> >> All I'm saying is it seems a bit weird to first implement dependencies >> based on strange (mis)use of attcompression attribute, and then replace >> it with a proper catalog. My understanding is those patches are expected >> to be committable one by one, but the attcompression approach seems a >> bit too hacky to me - not sure I'd want to commit that ... > >Okay, I will change this. So I will make create a new catalog >pg_compression and add the entry for two built-in compression methods >from the very first patch. > OK. >> >> >> 10) compression parameters? >> >> >> >> >> >> I wonder if we could/should allow parameters, like compression level >> >> >> (and maybe other stuff, depending on the compression method). PG13 >> >> >> allowed that for opclasses, so perhaps we should allow it here too. >> >> > >> >> >Yes, that is also in the plan. For doing this we are planning to add >> >> >an extra column in the pg_attribute which will store the compression >> >> >options for the current compression method. The original patch was >> >> >creating an extra catalog pg_column_compression, therein it maintains >> >> >the oid of the compression method as well as the compression options. 
>> >> >The advantage of creating an extra catalog is that we can keep the >> >> >compression options for the preserved compression methods also so that >> >> >we can support the options which can be used for decompressing the >> >> >data as well. Whereas if we want to avoid this extra catalog then we >> >> >can not use that compression option for decompressing. But most of >> >> >the options e.g. compression level are just for the compressing so it >> >> >is enough to store for the current compression method only. What's >> >> >your thoughts? >> >> > >> >> >> >> Not sure. My assumption was we'd end up with a new catalog, but maybe >> >> stashing it into pg_attribute is fine. I was really thinking about two >> >> kinds of options - compression level, and some sort of column-level >> >> dictionary. Compression level is not necessary for decompression, but >> >> the dictionary ID would be needed. (I think the global dictionary was >> >> one of the use cases, aimed at JSON compression.) >> > >> >Ok >> > >> >> But I don't think stashing it in pg_attribute means we couldn't use it >> >> for decompression - we'd just need to keep an array of options, one for >> >> each compression method. >> > >> >Yeah, we can do that. >> > >> >Keeping it in a separate new catalog might be >> >> cleaner, and I'm not sure how large the configuration might be. >> > >> >Yeah in that case it will be better to store in a separate catalog, >> >because sometimes if multiple attributes are using the same >> >compression method with the same options then we can store the same >> >oid in attcompression instead of duplicating the option field. >> > >> >> I doubt deduplicating the options like this is (sharing options between >> columns) is really worth it, as it means extra complexity e.g. during >> ALTER TABLE ... SET COMPRESSION. I don't think we do that for other >> catalogs, so why should we do it here? > >Yeah, valid point. > >> >> Ultimately I think it's a question of how large we expect the options to >> be, and how flexible it needs to be. >> >> For example, what happens if the user does this: >> >> ALTER ... SET COMPRESSION my_compression WITH (options1) PRESERVE; >> ALTER ... SET COMPRESSION pglz PRESERVE; >> ALTER ... SET COMPRESSION my_compression WITH (options2) PRESERVE; >> >> I believe it's enough to keep just the last value, but maybe I'm wrong >> and we need to keep the whole history? > >Currently, the syntax is like ALTER ... SET COMPRESSION my_compression >WITH (options1) PRESERVE (old_compression1, old_compression2..). But I >think if the user just gives PRESERVE without a list then we should >just preserve the latest one. > Hmmm. Not sure that's very convenient. I'd expect the most common use case for PRESERVE being "I want to change compression for new data, without rewrite". If PRESERVE by default preserves the latest one, that pretty much forces users to always list all methods. I suggest iterpreting it as "preserve everything" instead. Another option would be to require either a list of methods, or some keyword defining what to preserve. Like for example ... PRESERVE (m1, m2, ...) ... PRESERVE ALL ... PRESERVE LAST Does that make sense? >> The use case I'm thinking about is the column-level JSON compression, >> where one of the options identifies the dictionary. OTOH I'm not sure >> this is the right way to track this info - we need to know which options >> were compressed with which options, i.e. it needs to be encoded in each >> value directly. 
It'd also require changes to the PRESERVE handling
>> because it'd be necessary to identify which options to preserve ...
>>
>> So maybe this is either nonsense or something we don't want to support,
>> and we should only allow one option for each compression method.
>
>Yeah, it is a bit confusing to add the same compression method with
>different compression options, then in the preserve list, we will
>have to allow the option as well along with the compression method to
>know which compression method with what options we want to preserve.
>
>And also as you mentioned that in rows we need to know the option as
>well. I think for solving this anyways for the custom compression
>methods we will have to store the OID of the compression method in the
>toast header so we can provide an intermediate catalog which will
>create a new row for each combination of compression method + option
>and the toast header can store the OID of that row so that we know
>with which compression method + option it was compressed with.
>

I agree. After thinking about this a bit more, I think we should just keep the last options for each compression method. If we need to allow multiple options for some future compression method, we can improve this, but until then it'd be over-engineering. Let's do the simplest possible thing here.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
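For completeness, how the keyword variants suggested in the message above might look in full statements - again just a sketch of proposed syntax, nothing that exists yet:

    ALTER TABLE t ALTER COLUMN a SET COMPRESSION zlib PRESERVE (pglz);  -- explicit list
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION zlib PRESERVE ALL;     -- keep every old method, no rewrite
    ALTER TABLE t ALTER COLUMN a SET COMPRESSION zlib PRESERVE LAST;    -- keep only the most recent method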
On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > >> ><tomas.vondra@2ndquadrant.com> wrote: > >> >> > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > >> >> > > >> >> >Thanks, Tomas for your feedback. > >> >> > > >> >> >> 9) attcompression ... > >> >> >> > >> >> >> The main issue I see is what the patch does with attcompression. Instead > >> >> >> of just using it to store a the compression method, it's also used to > >> >> >> store the preserved compression methods. And using NameData to store > >> >> >> this seems wrong too - if we really want to store this info, the correct > >> >> >> way is either using text[] or inventing charvector or similar. > >> >> > > >> >> >The reason for using the NameData is the get it in the fixed part of > >> >> >the data structure. > >> >> > > >> >> > >> >> Why do we need that? It's possible to have varlena fields with direct > >> >> access (see pg_index.indkey for example). > >> > > >> >I see. While making it NameData I was thinking whether we have an > >> >option to direct access the varlena. Thanks for pointing me there. I > >> >will change this. > >> > > >> > Adding NameData just to make > >> >> it fixed-length means we're always adding 64B even if we just need a > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > >> >> That seems a bit unnecessary, and might be an issue with many attributes > >> >> (e.g. with many temp tables, etc.). > >> > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > >> > > >> >> > >> >> >> But to me this seems very much like a misuse of attcompression to track > >> >> >> dependencies on compression methods, necessary because we don't have a > >> >> >> separate catalog listing compression methods. If we had that, I think we > >> >> >> could simply add dependencies between attributes and that catalog. > >> >> > > >> >> >Basically, up to this patch, we are having only built-in compression > >> >> >methods and those can not be dropped so we don't need any dependency > >> >> >at all. We just want to know what is the current compression method > >> >> >and what is the preserve compression methods supported for this > >> >> >attribute. Maybe we can do it better instead of using the NameData > >> >> >but I don't think it makes sense to add a separate catalog? > >> >> > > >> >> > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > >> >> very much like a workaround needed because we don't have the catalog. > >> >> > >> >> I don't quite understand how could we support custom compression methods > >> >> without listing them in some sort of catalog? > >> > > >> >Yeah for supporting custom compression we need some catalog. > >> > > >> >> >> Moreover, having the catalog would allow adding compression methods > >> >> >> (from extensions etc) instead of just having a list of hard-coded > >> >> >> compression methods. Which seems like a strange limitation, considering > >> >> >> this thread is called "custom compression methods". 
> >> >> > > >> >> >I think I forgot to mention while submitting the previous patch that > >> >> >the next patch I am planning to submit is, Support creating the custom > >> >> >compression methods wherein we can use pg_am catalog to insert the new > >> >> >compression method. And for dependency handling, we can create an > >> >> >attribute dependency on the pg_am row. Basically, we will create the > >> >> >attribute dependency on the current compression method AM as well as > >> >> >on the preserved compression methods AM. As part of this, we will > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > >> >> >will be converted to the oid_vector (first OID will be of the current > >> >> >compression method, followed by the preserved compression method's > >> >> >oids). > >> >> > > >> >> > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > >> >> quite match what I though AMs are about, but maybe it's just my fault. > >> >> > >> >> FWIW it seems a bit strange to first do the attcompression magic and > >> >> then add the catalog later - I think we should start with the catalog > >> >> right away. The advantage is that if we end up committing only some of > >> >> the patches in this cycle, we already have all the infrastructure etc. > >> >> We can reorder that later, though. > >> > > >> >Hmm, yeah we can do this way as well that first create a new catalog > >> >table and add entries for these two built-in methods and the > >> >attcompression can store the oid vector. But if we only commit the > >> >build-in compression methods part then does it make sense to create an > >> >extra catalog or adding these build-in methods to the existing catalog > >> >(if we plan to use pg_am). Then in attcompression instead of using > >> >one byte for each preserve compression method, we need to use oid. So > >> >from Robert's mail[1], it appeared to me that he wants that the > >> >build-in compression methods part should be independently committable > >> >and if we think from that perspective then adding a catalog doesn't > >> >make much sense. But if we are planning to commit the custom method > >> >also then it makes more sense to directly start with the catalog > >> >because that way it will be easy to expand without much refactoring. > >> > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > >> > > >> > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be > >> interpreted in various ways - it does not really say whether the initial > >> list of built-in methods should be in some C array, or already in a proper > >> catalog. > >> > >> All I'm saying is it seems a bit weird to first implement dependencies > >> based on strange (mis)use of attcompression attribute, and then replace > >> it with a proper catalog. My understanding is those patches are expected > >> to be committable one by one, but the attcompression approach seems a > >> bit too hacky to me - not sure I'd want to commit that ... > > > >Okay, I will change this. So I will make create a new catalog > >pg_compression and add the entry for two built-in compression methods > >from the very first patch. > > > > OK. > > >> >> >> 10) compression parameters? > >> >> >> > >> >> >> I wonder if we could/should allow parameters, like compression level > >> >> >> (and maybe other stuff, depending on the compression method). PG13 > >> >> >> allowed that for opclasses, so perhaps we should allow it here too. 
> >> >> > > >> >> >Yes, that is also in the plan. For doing this we are planning to add > >> >> >an extra column in the pg_attribute which will store the compression > >> >> >options for the current compression method. The original patch was > >> >> >creating an extra catalog pg_column_compression, therein it maintains > >> >> >the oid of the compression method as well as the compression options. > >> >> >The advantage of creating an extra catalog is that we can keep the > >> >> >compression options for the preserved compression methods also so that > >> >> >we can support the options which can be used for decompressing the > >> >> >data as well. Whereas if we want to avoid this extra catalog then we > >> >> >can not use that compression option for decompressing. But most of > >> >> >the options e.g. compression level are just for the compressing so it > >> >> >is enough to store for the current compression method only. What's > >> >> >your thoughts? > >> >> > > >> >> > >> >> Not sure. My assumption was we'd end up with a new catalog, but maybe > >> >> stashing it into pg_attribute is fine. I was really thinking about two > >> >> kinds of options - compression level, and some sort of column-level > >> >> dictionary. Compression level is not necessary for decompression, but > >> >> the dictionary ID would be needed. (I think the global dictionary was > >> >> one of the use cases, aimed at JSON compression.) > >> > > >> >Ok > >> > > >> >> But I don't think stashing it in pg_attribute means we couldn't use it > >> >> for decompression - we'd just need to keep an array of options, one for > >> >> each compression method. > >> > > >> >Yeah, we can do that. > >> > > >> >Keeping it in a separate new catalog might be > >> >> cleaner, and I'm not sure how large the configuration might be. > >> > > >> >Yeah in that case it will be better to store in a separate catalog, > >> >because sometimes if multiple attributes are using the same > >> >compression method with the same options then we can store the same > >> >oid in attcompression instead of duplicating the option field. > >> > > >> > >> I doubt deduplicating the options like this is (sharing options between > >> columns) is really worth it, as it means extra complexity e.g. during > >> ALTER TABLE ... SET COMPRESSION. I don't think we do that for other > >> catalogs, so why should we do it here? > > > >Yeah, valid point. > > > >> > >> Ultimately I think it's a question of how large we expect the options to > >> be, and how flexible it needs to be. > >> > >> For example, what happens if the user does this: > >> > >> ALTER ... SET COMPRESSION my_compression WITH (options1) PRESERVE; > >> ALTER ... SET COMPRESSION pglz PRESERVE; > >> ALTER ... SET COMPRESSION my_compression WITH (options2) PRESERVE; > >> > >> I believe it's enough to keep just the last value, but maybe I'm wrong > >> and we need to keep the whole history? > > > >Currently, the syntax is like ALTER ... SET COMPRESSION my_compression > >WITH (options1) PRESERVE (old_compression1, old_compression2..). But I > >think if the user just gives PRESERVE without a list then we should > >just preserve the latest one. > > > > Hmmm. Not sure that's very convenient. I'd expect the most common use > case for PRESERVE being "I want to change compression for new data, > without rewrite". If PRESERVE by default preserves the latest one, that > pretty much forces users to always list all methods. I suggest > iterpreting it as "preserve everything" instead. 
> > Another option would be to require either a list of methods, or some > keyword defining what to preserve. Like for example > > ... PRESERVE (m1, m2, ...) > ... PRESERVE ALL > ... PRESERVE LAST > > Does that make sense? Yeah, this makes sense to me. > > >> The use case I'm thinking about is the column-level JSON compression, > >> where one of the options identifies the dictionary. OTOH I'm not sure > >> this is the right way to track this info - we need to know which options > >> were compressed with which options, i.e. it needs to be encoded in each > >> value directly. It'd also require changes to the PRESERVE handling > >> because it'd be necessary to identify which options to preserve ... > >> > >> So maybe this is either nonsense or something we don't want to support, > >> and we should only allow one option for each compression method. > > > >Yeah, it is a bit confusing to add the same compression method with > >different compression options, then in the preserve list, we will > >have to allow the option as well along with the compression method to > >know which compression method with what options we want to preserve. > > > >And also as you mentioned that in rows we need to know the option as > >well. I think for solving this anyways for the custom compression > >methods we will have to store the OID of the compression method in the > >toast header so we can provide an intermediate catalog which will > >create a new row for each combination of compression method + option > >and the toast header can store the OID of that row so that we know > >with which compression method + option it was compressed with. > > > > I agree. After thinking about this a bit more, I think we should just > keep the last options for each compression method. If we need to allow > multiple options for some future compression method, we can improve > this, but until then it'd be an over-engineering. Let's do the simplest > possible thing here. Okay. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
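PS: Just to make the toast-header idea concrete, here is a rough sketch of the kind of compressed-datum header I have in mind. This is purely illustrative - the struct and field names are placeholders, not what the patch actually uses:

    /*
     * Illustration only: if every (compression method, options) combination
     * gets its own row in an intermediate catalog, the compressed datum
     * only needs to carry that row's OID next to the usual raw size.
     */
    typedef struct CompressedDatumHeader
    {
        int32   vl_len_;    /* varlena header: total compressed size */
        uint32  rawsize;    /* original (uncompressed) datum size */
        Oid     cmoptoid;   /* OID of the catalog row describing the
                             * compression method + options used */
        /* compressed payload follows */
    } CompressedDatumHeader;

Decompression would then look up cmoptoid once to find the handler and its options.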
On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > >> ><tomas.vondra@2ndquadrant.com> wrote: > > >> >> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > > >> >> > > > >> >> >Thanks, Tomas for your feedback. > > >> >> > > > >> >> >> 9) attcompression ... > > >> >> >> > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > > >> >> >> of just using it to store a the compression method, it's also used to > > >> >> >> store the preserved compression methods. And using NameData to store > > >> >> >> this seems wrong too - if we really want to store this info, the correct > > >> >> >> way is either using text[] or inventing charvector or similar. > > >> >> > > > >> >> >The reason for using the NameData is the get it in the fixed part of > > >> >> >the data structure. > > >> >> > > > >> >> > > >> >> Why do we need that? It's possible to have varlena fields with direct > > >> >> access (see pg_index.indkey for example). > > >> > > > >> >I see. While making it NameData I was thinking whether we have an > > >> >option to direct access the varlena. Thanks for pointing me there. I > > >> >will change this. > > >> > > > >> > Adding NameData just to make > > >> >> it fixed-length means we're always adding 64B even if we just need a > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > > >> >> (e.g. with many temp tables, etc.). > > >> > > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > > >> > > > >> >> > > >> >> >> But to me this seems very much like a misuse of attcompression to track > > >> >> >> dependencies on compression methods, necessary because we don't have a > > >> >> >> separate catalog listing compression methods. If we had that, I think we > > >> >> >> could simply add dependencies between attributes and that catalog. > > >> >> > > > >> >> >Basically, up to this patch, we are having only built-in compression > > >> >> >methods and those can not be dropped so we don't need any dependency > > >> >> >at all. We just want to know what is the current compression method > > >> >> >and what is the preserve compression methods supported for this > > >> >> >attribute. Maybe we can do it better instead of using the NameData > > >> >> >but I don't think it makes sense to add a separate catalog? > > >> >> > > > >> >> > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > > >> >> very much like a workaround needed because we don't have the catalog. > > >> >> > > >> >> I don't quite understand how could we support custom compression methods > > >> >> without listing them in some sort of catalog? > > >> > > > >> >Yeah for supporting custom compression we need some catalog. > > >> > > > >> >> >> Moreover, having the catalog would allow adding compression methods > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > >> >> >> compression methods. 
Which seems like a strange limitation, considering > > >> >> >> this thread is called "custom compression methods". > > >> >> > > > >> >> >I think I forgot to mention while submitting the previous patch that > > >> >> >the next patch I am planning to submit is, Support creating the custom > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > > >> >> >compression method. And for dependency handling, we can create an > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > > >> >> >attribute dependency on the current compression method AM as well as > > >> >> >on the preserved compression methods AM. As part of this, we will > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > > >> >> >will be converted to the oid_vector (first OID will be of the current > > >> >> >compression method, followed by the preserved compression method's > > >> >> >oids). > > >> >> > > > >> >> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > > >> >> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > > >> >> then add the catalog later - I think we should start with the catalog > > >> >> right away. The advantage is that if we end up committing only some of > > >> >> the patches in this cycle, we already have all the infrastructure etc. > > >> >> We can reorder that later, though. > > >> > > > >> >Hmm, yeah we can do this way as well that first create a new catalog > > >> >table and add entries for these two built-in methods and the > > >> >attcompression can store the oid vector. But if we only commit the > > >> >build-in compression methods part then does it make sense to create an > > >> >extra catalog or adding these build-in methods to the existing catalog > > >> >(if we plan to use pg_am). Then in attcompression instead of using > > >> >one byte for each preserve compression method, we need to use oid. So > > >> >from Robert's mail[1], it appeared to me that he wants that the > > >> >build-in compression methods part should be independently committable > > >> >and if we think from that perspective then adding a catalog doesn't > > >> >make much sense. But if we are planning to commit the custom method > > >> >also then it makes more sense to directly start with the catalog > > >> >because that way it will be easy to expand without much refactoring. > > >> > > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > > >> > > > >> > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be > > >> interpreted in various ways - it does not really say whether the initial > > >> list of built-in methods should be in some C array, or already in a proper > > >> catalog. > > >> > > >> All I'm saying is it seems a bit weird to first implement dependencies > > >> based on strange (mis)use of attcompression attribute, and then replace > > >> it with a proper catalog. My understanding is those patches are expected > > >> to be committable one by one, but the attcompression approach seems a > > >> bit too hacky to me - not sure I'd want to commit that ... > > > > > >Okay, I will change this. So I will make create a new catalog > > >pg_compression and add the entry for two built-in compression methods > > >from the very first patch. > > > > > > > OK. 
I have changed the first two patches: there is now a new catalog, pg_compression, and pg_attribute stores the OID of the compression method. The patches still need some cleanup, and there is also one open comment that an index should use its table's compression method. I am still working on the preserve patch. For preserving compression methods I am planning to convert the attcompression field to an oidvector so that we can store the OIDs of the preserved methods as well. I am not sure whether we can still access this oidvector as part of the fixed portion of FormData_pg_attribute. The problem is that for building the tuple descriptor we need to give the size of the fixed part (#define ATTRIBUTE_FIXED_PART_SIZE \ (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))), but if we convert the field to an oidvector then we no longer know the size of the fixed part. Am I missing something? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
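PS: For anyone skimming, this is roughly the shape of the new catalog I mean - a sketch only; the OID and column names are placeholders rather than the actual header in the attached patches:

    /* sketch of a pg_compression catalog header (placeholder OID and names) */
    CATALOG(pg_compression,4001,CompressionRelationId)
    {
        Oid         oid;        /* OID of the compression method */
        NameData    cmname;     /* method name, e.g. "pglz" or "zlib" */
        regproc     cmhandler BKI_LOOKUP(pg_proc);  /* handler function
                                                     * returning the compress/
                                                     * decompress routines */
    } FormData_pg_compression;

    typedef FormData_pg_compression *Form_pg_compression;

attcompression in pg_attribute then simply holds the OID of one of these rows.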
Attachment
On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > > ><tomas.vondra@2ndquadrant.com> wrote: > > > >> > > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > > >> ><tomas.vondra@2ndquadrant.com> wrote: > > > >> >> > > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > > > >> >> > > > > >> >> >Thanks, Tomas for your feedback. > > > >> >> > > > > >> >> >> 9) attcompression ... > > > >> >> >> > > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > > > >> >> >> of just using it to store a the compression method, it's also used to > > > >> >> >> store the preserved compression methods. And using NameData to store > > > >> >> >> this seems wrong too - if we really want to store this info, the correct > > > >> >> >> way is either using text[] or inventing charvector or similar. > > > >> >> > > > > >> >> >The reason for using the NameData is the get it in the fixed part of > > > >> >> >the data structure. > > > >> >> > > > > >> >> > > > >> >> Why do we need that? It's possible to have varlena fields with direct > > > >> >> access (see pg_index.indkey for example). > > > >> > > > > >> >I see. While making it NameData I was thinking whether we have an > > > >> >option to direct access the varlena. Thanks for pointing me there. I > > > >> >will change this. > > > >> > > > > >> > Adding NameData just to make > > > >> >> it fixed-length means we're always adding 64B even if we just need a > > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > > > >> >> (e.g. with many temp tables, etc.). > > > >> > > > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > > > >> > > > > >> >> > > > >> >> >> But to me this seems very much like a misuse of attcompression to track > > > >> >> >> dependencies on compression methods, necessary because we don't have a > > > >> >> >> separate catalog listing compression methods. If we had that, I think we > > > >> >> >> could simply add dependencies between attributes and that catalog. > > > >> >> > > > > >> >> >Basically, up to this patch, we are having only built-in compression > > > >> >> >methods and those can not be dropped so we don't need any dependency > > > >> >> >at all. We just want to know what is the current compression method > > > >> >> >and what is the preserve compression methods supported for this > > > >> >> >attribute. Maybe we can do it better instead of using the NameData > > > >> >> >but I don't think it makes sense to add a separate catalog? > > > >> >> > > > > >> >> > > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > > > >> >> very much like a workaround needed because we don't have the catalog. > > > >> >> > > > >> >> I don't quite understand how could we support custom compression methods > > > >> >> without listing them in some sort of catalog? > > > >> > > > > >> >Yeah for supporting custom compression we need some catalog. 
> > > >> > > > > >> >> >> Moreover, having the catalog would allow adding compression methods > > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > > >> >> >> compression methods. Which seems like a strange limitation, considering > > > >> >> >> this thread is called "custom compression methods". > > > >> >> > > > > >> >> >I think I forgot to mention while submitting the previous patch that > > > >> >> >the next patch I am planning to submit is, Support creating the custom > > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > > > >> >> >compression method. And for dependency handling, we can create an > > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > > > >> >> >attribute dependency on the current compression method AM as well as > > > >> >> >on the preserved compression methods AM. As part of this, we will > > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > > > >> >> >will be converted to the oid_vector (first OID will be of the current > > > >> >> >compression method, followed by the preserved compression method's > > > >> >> >oids). > > > >> >> > > > > >> >> > > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > > > >> >> > > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > > > >> >> then add the catalog later - I think we should start with the catalog > > > >> >> right away. The advantage is that if we end up committing only some of > > > >> >> the patches in this cycle, we already have all the infrastructure etc. > > > >> >> We can reorder that later, though. > > > >> > > > > >> >Hmm, yeah we can do this way as well that first create a new catalog > > > >> >table and add entries for these two built-in methods and the > > > >> >attcompression can store the oid vector. But if we only commit the > > > >> >build-in compression methods part then does it make sense to create an > > > >> >extra catalog or adding these build-in methods to the existing catalog > > > >> >(if we plan to use pg_am). Then in attcompression instead of using > > > >> >one byte for each preserve compression method, we need to use oid. So > > > >> >from Robert's mail[1], it appeared to me that he wants that the > > > >> >build-in compression methods part should be independently committable > > > >> >and if we think from that perspective then adding a catalog doesn't > > > >> >make much sense. But if we are planning to commit the custom method > > > >> >also then it makes more sense to directly start with the catalog > > > >> >because that way it will be easy to expand without much refactoring. > > > >> > > > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > > > >> > > > > >> > > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be > > > >> interpreted in various ways - it does not really say whether the initial > > > >> list of built-in methods should be in some C array, or already in a proper > > > >> catalog. > > > >> > > > >> All I'm saying is it seems a bit weird to first implement dependencies > > > >> based on strange (mis)use of attcompression attribute, and then replace > > > >> it with a proper catalog. 
My understanding is those patches are expected > > > >> to be committable one by one, but the attcompression approach seems a > > > >> bit too hacky to me - not sure I'd want to commit that ... > > > > > > > >Okay, I will change this. So I will make create a new catalog > > > >pg_compression and add the entry for two built-in compression methods > > > >from the very first patch. > > > > > > > > > > OK. > > I have changed the first 2 patches, basically, now we are providing a > new catalog pg_compression and the pg_attribute is storing the oid of > the compression method. The patches still need some cleanup and there > is also one open comment that for index we should use its table > compression. > > I am still working on the preserve patch. For preserving the > compression method I am planning to convert the attcompression field > to the oidvector so that we can store the oid of the preserve method > also. I am not sure whether we can access this oidvector as a fixed > part of the FormData_pg_attribute or not. The reason is that for > building the tuple descriptor, we need to give the size of the fixed > part (#define ATTRIBUTE_FIXED_PART_SIZE \ > (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))). But > if we convert this to the oidvector then we don't know the size of the > fixed part. Am I missing something? I can think of two solutions here. Sol1: Make the first OID of the oidvector part of the fixed size, like below: #define ATTRIBUTE_FIXED_PART_SIZE \ (offsetof(FormData_pg_attribute, attcompression) + OidVectorSize(1)) Sol2: Keep attcompression as a plain Oid and, for the preserve list, add another field of type oidvector in the variable part. Most of the time we only need the current compression method, and with this solution we can still get that from the tuple descriptor. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
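PS: Spelling both options out, purely as a sketch - the _SOL1/_SOL2 macro names exist only for this mail, and OidVectorSize(1) above is shorthand for the oidvector header plus one OID:

    /*
     * Sol1: attcompression becomes an oidvector placed last in
     * FormData_pg_attribute; the fixed part covers the oidvector header
     * plus exactly one OID (the current method).  Any preserved OIDs
     * spill past the fixed part and are not accessible through the
     * tuple descriptor.
     */
    #define ATTRIBUTE_FIXED_PART_SIZE_SOL1 \
        (offsetof(FormData_pg_attribute, attcompression) + \
         offsetof(oidvector, values) + sizeof(Oid))

    /*
     * Sol2: attcompression stays a plain Oid in the fixed part (so the
     * current method is always available from the tuple descriptor),
     * and the preserved methods go into a separate oidvector column in
     * the variable-length part, next to attacl/attoptions.
     */
    #define ATTRIBUTE_FIXED_PART_SIZE_SOL2 \
        (offsetof(FormData_pg_attribute, attcompression) + sizeof(Oid))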
On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > > ><tomas.vondra@2ndquadrant.com> wrote: > > > >> > > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > > >> ><tomas.vondra@2ndquadrant.com> wrote: > > > >> >> > > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > > > >> >> > > > > >> >> >Thanks, Tomas for your feedback. > > > >> >> > > > > >> >> >> 9) attcompression ... > > > >> >> >> > > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > > > >> >> >> of just using it to store a the compression method, it's also used to > > > >> >> >> store the preserved compression methods. And using NameData to store > > > >> >> >> this seems wrong too - if we really want to store this info, the correct > > > >> >> >> way is either using text[] or inventing charvector or similar. > > > >> >> > > > > >> >> >The reason for using the NameData is the get it in the fixed part of > > > >> >> >the data structure. > > > >> >> > > > > >> >> > > > >> >> Why do we need that? It's possible to have varlena fields with direct > > > >> >> access (see pg_index.indkey for example). > > > >> > > > > >> >I see. While making it NameData I was thinking whether we have an > > > >> >option to direct access the varlena. Thanks for pointing me there. I > > > >> >will change this. > > > >> > > > > >> > Adding NameData just to make > > > >> >> it fixed-length means we're always adding 64B even if we just need a > > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > > > >> >> (e.g. with many temp tables, etc.). > > > >> > > > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > > > >> > > > > >> >> > > > >> >> >> But to me this seems very much like a misuse of attcompression to track > > > >> >> >> dependencies on compression methods, necessary because we don't have a > > > >> >> >> separate catalog listing compression methods. If we had that, I think we > > > >> >> >> could simply add dependencies between attributes and that catalog. > > > >> >> > > > > >> >> >Basically, up to this patch, we are having only built-in compression > > > >> >> >methods and those can not be dropped so we don't need any dependency > > > >> >> >at all. We just want to know what is the current compression method > > > >> >> >and what is the preserve compression methods supported for this > > > >> >> >attribute. Maybe we can do it better instead of using the NameData > > > >> >> >but I don't think it makes sense to add a separate catalog? > > > >> >> > > > > >> >> > > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > > > >> >> very much like a workaround needed because we don't have the catalog. > > > >> >> > > > >> >> I don't quite understand how could we support custom compression methods > > > >> >> without listing them in some sort of catalog? > > > >> > > > > >> >Yeah for supporting custom compression we need some catalog. 
> > > >> > > > > >> >> >> Moreover, having the catalog would allow adding compression methods > > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > > >> >> >> compression methods. Which seems like a strange limitation, considering > > > >> >> >> this thread is called "custom compression methods". > > > >> >> > > > > >> >> >I think I forgot to mention while submitting the previous patch that > > > >> >> >the next patch I am planning to submit is, Support creating the custom > > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > > > >> >> >compression method. And for dependency handling, we can create an > > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > > > >> >> >attribute dependency on the current compression method AM as well as > > > >> >> >on the preserved compression methods AM. As part of this, we will > > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > > > >> >> >will be converted to the oid_vector (first OID will be of the current > > > >> >> >compression method, followed by the preserved compression method's > > > >> >> >oids). > > > >> >> > > > > >> >> > > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > > > >> >> > > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > > > >> >> then add the catalog later - I think we should start with the catalog > > > >> >> right away. The advantage is that if we end up committing only some of > > > >> >> the patches in this cycle, we already have all the infrastructure etc. > > > >> >> We can reorder that later, though. > > > >> > > > > >> >Hmm, yeah we can do this way as well that first create a new catalog > > > >> >table and add entries for these two built-in methods and the > > > >> >attcompression can store the oid vector. But if we only commit the > > > >> >build-in compression methods part then does it make sense to create an > > > >> >extra catalog or adding these build-in methods to the existing catalog > > > >> >(if we plan to use pg_am). Then in attcompression instead of using > > > >> >one byte for each preserve compression method, we need to use oid. So > > > >> >from Robert's mail[1], it appeared to me that he wants that the > > > >> >build-in compression methods part should be independently committable > > > >> >and if we think from that perspective then adding a catalog doesn't > > > >> >make much sense. But if we are planning to commit the custom method > > > >> >also then it makes more sense to directly start with the catalog > > > >> >because that way it will be easy to expand without much refactoring. > > > >> > > > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > > > >> > > > > >> > > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be > > > >> interpreted in various ways - it does not really say whether the initial > > > >> list of built-in methods should be in some C array, or already in a proper > > > >> catalog. > > > >> > > > >> All I'm saying is it seems a bit weird to first implement dependencies > > > >> based on strange (mis)use of attcompression attribute, and then replace > > > >> it with a proper catalog. 
My understanding is those patches are expected > > > >> to be committable one by one, but the attcompression approach seems a > > > >> bit too hacky to me - not sure I'd want to commit that ... > > > > > > > >Okay, I will change this. So I will make create a new catalog > > > >pg_compression and add the entry for two built-in compression methods > > > >from the very first patch. > > > > > > > > > > OK. > > I have changed the first 2 patches, basically, now we are providing a > new catalog pg_compression and the pg_attribute is storing the oid of > the compression method. The patches still need some cleanup and there > is also one open comment that for index we should use its table > compression. There was some unwanted code in the previous patch so attaching the updated patches. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Oct 08, 2020 at 02:38:27PM +0530, Dilip Kumar wrote: >On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> >> On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> > >> > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra >> > <tomas.vondra@2ndquadrant.com> wrote: >> > > >> > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: >> > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra >> > > ><tomas.vondra@2ndquadrant.com> wrote: >> > > >> >> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: >> > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra >> > > >> ><tomas.vondra@2ndquadrant.com> wrote: >> > > >> >> >> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: >> > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra >> > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: >> > > >> >> > >> > > >> >> >Thanks, Tomas for your feedback. >> > > >> >> > >> > > >> >> >> 9) attcompression ... >> > > >> >> >> >> > > >> >> >> The main issue I see is what the patch does with attcompression. Instead >> > > >> >> >> of just using it to store a the compression method, it's also used to >> > > >> >> >> store the preserved compression methods. And using NameData to store >> > > >> >> >> this seems wrong too - if we really want to store this info, the correct >> > > >> >> >> way is either using text[] or inventing charvector or similar. >> > > >> >> > >> > > >> >> >The reason for using the NameData is the get it in the fixed part of >> > > >> >> >the data structure. >> > > >> >> > >> > > >> >> >> > > >> >> Why do we need that? It's possible to have varlena fields with direct >> > > >> >> access (see pg_index.indkey for example). >> > > >> > >> > > >> >I see. While making it NameData I was thinking whether we have an >> > > >> >option to direct access the varlena. Thanks for pointing me there. I >> > > >> >will change this. >> > > >> > >> > > >> > Adding NameData just to make >> > > >> >> it fixed-length means we're always adding 64B even if we just need a >> > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. >> > > >> >> That seems a bit unnecessary, and might be an issue with many attributes >> > > >> >> (e.g. with many temp tables, etc.). >> > > >> > >> > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. >> > > >> > >> > > >> >> >> > > >> >> >> But to me this seems very much like a misuse of attcompression to track >> > > >> >> >> dependencies on compression methods, necessary because we don't have a >> > > >> >> >> separate catalog listing compression methods. If we had that, I think we >> > > >> >> >> could simply add dependencies between attributes and that catalog. >> > > >> >> > >> > > >> >> >Basically, up to this patch, we are having only built-in compression >> > > >> >> >methods and those can not be dropped so we don't need any dependency >> > > >> >> >at all. We just want to know what is the current compression method >> > > >> >> >and what is the preserve compression methods supported for this >> > > >> >> >attribute. Maybe we can do it better instead of using the NameData >> > > >> >> >but I don't think it makes sense to add a separate catalog? >> > > >> >> > >> > > >> >> >> > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks >> > > >> >> very much like a workaround needed because we don't have the catalog. 
>> > > >> >> >> > > >> >> I don't quite understand how could we support custom compression methods >> > > >> >> without listing them in some sort of catalog? >> > > >> > >> > > >> >Yeah for supporting custom compression we need some catalog. >> > > >> > >> > > >> >> >> Moreover, having the catalog would allow adding compression methods >> > > >> >> >> (from extensions etc) instead of just having a list of hard-coded >> > > >> >> >> compression methods. Which seems like a strange limitation, considering >> > > >> >> >> this thread is called "custom compression methods". >> > > >> >> > >> > > >> >> >I think I forgot to mention while submitting the previous patch that >> > > >> >> >the next patch I am planning to submit is, Support creating the custom >> > > >> >> >compression methods wherein we can use pg_am catalog to insert the new >> > > >> >> >compression method. And for dependency handling, we can create an >> > > >> >> >attribute dependency on the pg_am row. Basically, we will create the >> > > >> >> >attribute dependency on the current compression method AM as well as >> > > >> >> >on the preserved compression methods AM. As part of this, we will >> > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field >> > > >> >> >will be converted to the oid_vector (first OID will be of the current >> > > >> >> >compression method, followed by the preserved compression method's >> > > >> >> >oids). >> > > >> >> > >> > > >> >> >> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't >> > > >> >> quite match what I though AMs are about, but maybe it's just my fault. >> > > >> >> >> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and >> > > >> >> then add the catalog later - I think we should start with the catalog >> > > >> >> right away. The advantage is that if we end up committing only some of >> > > >> >> the patches in this cycle, we already have all the infrastructure etc. >> > > >> >> We can reorder that later, though. >> > > >> > >> > > >> >Hmm, yeah we can do this way as well that first create a new catalog >> > > >> >table and add entries for these two built-in methods and the >> > > >> >attcompression can store the oid vector. But if we only commit the >> > > >> >build-in compression methods part then does it make sense to create an >> > > >> >extra catalog or adding these build-in methods to the existing catalog >> > > >> >(if we plan to use pg_am). Then in attcompression instead of using >> > > >> >one byte for each preserve compression method, we need to use oid. So >> > > >> >from Robert's mail[1], it appeared to me that he wants that the >> > > >> >build-in compression methods part should be independently committable >> > > >> >and if we think from that perspective then adding a catalog doesn't >> > > >> >make much sense. But if we are planning to commit the custom method >> > > >> >also then it makes more sense to directly start with the catalog >> > > >> >because that way it will be easy to expand without much refactoring. >> > > >> > >> > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com >> > > >> > >> > > >> >> > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be >> > > >> interpreted in various ways - it does not really say whether the initial >> > > >> list of built-in methods should be in some C array, or already in a proper >> > > >> catalog. 
>> > > >> >> > > >> All I'm saying is it seems a bit weird to first implement dependencies >> > > >> based on strange (mis)use of attcompression attribute, and then replace >> > > >> it with a proper catalog. My understanding is those patches are expected >> > > >> to be committable one by one, but the attcompression approach seems a >> > > >> bit too hacky to me - not sure I'd want to commit that ... >> > > > >> > > >Okay, I will change this. So I will make create a new catalog >> > > >pg_compression and add the entry for two built-in compression methods >> > > >from the very first patch. >> > > > >> > > >> > > OK. >> >> I have changed the first 2 patches, basically, now we are providing a >> new catalog pg_compression and the pg_attribute is storing the oid of >> the compression method. The patches still need some cleanup and there >> is also one open comment that for index we should use its table >> compression. >> >> I am still working on the preserve patch. For preserving the >> compression method I am planning to convert the attcompression field >> to the oidvector so that we can store the oid of the preserve method >> also. I am not sure whether we can access this oidvector as a fixed >> part of the FormData_pg_attribute or not. The reason is that for >> building the tuple descriptor, we need to give the size of the fixed >> part (#define ATTRIBUTE_FIXED_PART_SIZE \ >> (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))). But >> if we convert this to the oidvector then we don't know the size of the >> fixed part. Am I missing something? > >I could think of two solutions here >Sol1. >Make the first oid of the oidvector as part of the fixed size, like below >#define ATTRIBUTE_FIXED_PART_SIZE \ >(offsetof(FormData_pg_attribute, attcompression) + OidVectorSize(1)) > >Sol2: >Keep attcompression as oid only and for the preserve list, adds >another field in the variable part which will be of type oidvector. I >think most of the time we need to access the current compression >method and with this solution, we will be able to access that as part >of the tuple desc. > And is the oidvector actually needed? If we have the extra catalog, can't we track this simply using the regular dependencies? So we'd have the attcompression OID of the current compression method, and the preserved values would be tracked in pg_depend. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
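To be clear, I mean the usual dependency machinery - something along these lines (untested sketch; CompressionRelationId stands in for whatever the new catalog's relation OID macro ends up being):

    /*
     * Record that column "attnum" of relation "relid" still depends on a
     * preserved compression method, so DROP of the method is blocked (or
     * cascades) and the preserved set can be rebuilt from pg_depend.
     */
    ObjectAddress column;
    ObjectAddress method;

    ObjectAddressSubSet(column, RelationRelationId, relid, attnum);
    ObjectAddressSet(method, CompressionRelationId, preserved_cmoid);

    recordDependencyOn(&column, &method, DEPENDENCY_NORMAL);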
On Fri, Oct 9, 2020 at 3:24 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Thu, Oct 08, 2020 at 02:38:27PM +0530, Dilip Kumar wrote: > >On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> > >> On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> > > >> > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > >> > <tomas.vondra@2ndquadrant.com> wrote: > >> > > > >> > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > >> > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > >> > > ><tomas.vondra@2ndquadrant.com> wrote: > >> > > >> > >> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > >> > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > >> > > >> ><tomas.vondra@2ndquadrant.com> wrote: > >> > > >> >> > >> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > >> > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > >> > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > >> > > >> >> > > >> > > >> >> >Thanks, Tomas for your feedback. > >> > > >> >> > > >> > > >> >> >> 9) attcompression ... > >> > > >> >> >> > >> > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > >> > > >> >> >> of just using it to store a the compression method, it's also used to > >> > > >> >> >> store the preserved compression methods. And using NameData to store > >> > > >> >> >> this seems wrong too - if we really want to store this info, the correct > >> > > >> >> >> way is either using text[] or inventing charvector or similar. > >> > > >> >> > > >> > > >> >> >The reason for using the NameData is the get it in the fixed part of > >> > > >> >> >the data structure. > >> > > >> >> > > >> > > >> >> > >> > > >> >> Why do we need that? It's possible to have varlena fields with direct > >> > > >> >> access (see pg_index.indkey for example). > >> > > >> > > >> > > >> >I see. While making it NameData I was thinking whether we have an > >> > > >> >option to direct access the varlena. Thanks for pointing me there. I > >> > > >> >will change this. > >> > > >> > > >> > > >> > Adding NameData just to make > >> > > >> >> it fixed-length means we're always adding 64B even if we just need a > >> > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > >> > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > >> > > >> >> (e.g. with many temp tables, etc.). > >> > > >> > > >> > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > >> > > >> > > >> > > >> >> > >> > > >> >> >> But to me this seems very much like a misuse of attcompression to track > >> > > >> >> >> dependencies on compression methods, necessary because we don't have a > >> > > >> >> >> separate catalog listing compression methods. If we had that, I think we > >> > > >> >> >> could simply add dependencies between attributes and that catalog. > >> > > >> >> > > >> > > >> >> >Basically, up to this patch, we are having only built-in compression > >> > > >> >> >methods and those can not be dropped so we don't need any dependency > >> > > >> >> >at all. We just want to know what is the current compression method > >> > > >> >> >and what is the preserve compression methods supported for this > >> > > >> >> >attribute. Maybe we can do it better instead of using the NameData > >> > > >> >> >but I don't think it makes sense to add a separate catalog? 
> >> > > >> >> > > >> > > >> >> > >> > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > >> > > >> >> very much like a workaround needed because we don't have the catalog. > >> > > >> >> > >> > > >> >> I don't quite understand how could we support custom compression methods > >> > > >> >> without listing them in some sort of catalog? > >> > > >> > > >> > > >> >Yeah for supporting custom compression we need some catalog. > >> > > >> > > >> > > >> >> >> Moreover, having the catalog would allow adding compression methods > >> > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > >> > > >> >> >> compression methods. Which seems like a strange limitation, considering > >> > > >> >> >> this thread is called "custom compression methods". > >> > > >> >> > > >> > > >> >> >I think I forgot to mention while submitting the previous patch that > >> > > >> >> >the next patch I am planning to submit is, Support creating the custom > >> > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > >> > > >> >> >compression method. And for dependency handling, we can create an > >> > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > >> > > >> >> >attribute dependency on the current compression method AM as well as > >> > > >> >> >on the preserved compression methods AM. As part of this, we will > >> > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > >> > > >> >> >will be converted to the oid_vector (first OID will be of the current > >> > > >> >> >compression method, followed by the preserved compression method's > >> > > >> >> >oids). > >> > > >> >> > > >> > > >> >> > >> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > >> > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > >> > > >> >> > >> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > >> > > >> >> then add the catalog later - I think we should start with the catalog > >> > > >> >> right away. The advantage is that if we end up committing only some of > >> > > >> >> the patches in this cycle, we already have all the infrastructure etc. > >> > > >> >> We can reorder that later, though. > >> > > >> > > >> > > >> >Hmm, yeah we can do this way as well that first create a new catalog > >> > > >> >table and add entries for these two built-in methods and the > >> > > >> >attcompression can store the oid vector. But if we only commit the > >> > > >> >build-in compression methods part then does it make sense to create an > >> > > >> >extra catalog or adding these build-in methods to the existing catalog > >> > > >> >(if we plan to use pg_am). Then in attcompression instead of using > >> > > >> >one byte for each preserve compression method, we need to use oid. So > >> > > >> >from Robert's mail[1], it appeared to me that he wants that the > >> > > >> >build-in compression methods part should be independently committable > >> > > >> >and if we think from that perspective then adding a catalog doesn't > >> > > >> >make much sense. But if we are planning to commit the custom method > >> > > >> >also then it makes more sense to directly start with the catalog > >> > > >> >because that way it will be easy to expand without much refactoring. > >> > > >> > > >> > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > >> > > >> > > >> > > >> > >> > > >> Hmmm. 
Maybe I'm missing something subtle, but I think that plan can be > >> > > >> interpreted in various ways - it does not really say whether the initial > >> > > >> list of built-in methods should be in some C array, or already in a proper > >> > > >> catalog. > >> > > >> > >> > > >> All I'm saying is it seems a bit weird to first implement dependencies > >> > > >> based on strange (mis)use of attcompression attribute, and then replace > >> > > >> it with a proper catalog. My understanding is those patches are expected > >> > > >> to be committable one by one, but the attcompression approach seems a > >> > > >> bit too hacky to me - not sure I'd want to commit that ... > >> > > > > >> > > >Okay, I will change this. So I will make create a new catalog > >> > > >pg_compression and add the entry for two built-in compression methods > >> > > >from the very first patch. > >> > > > > >> > > > >> > > OK. > >> > >> I have changed the first 2 patches, basically, now we are providing a > >> new catalog pg_compression and the pg_attribute is storing the oid of > >> the compression method. The patches still need some cleanup and there > >> is also one open comment that for index we should use its table > >> compression. > >> > >> I am still working on the preserve patch. For preserving the > >> compression method I am planning to convert the attcompression field > >> to the oidvector so that we can store the oid of the preserve method > >> also. I am not sure whether we can access this oidvector as a fixed > >> part of the FormData_pg_attribute or not. The reason is that for > >> building the tuple descriptor, we need to give the size of the fixed > >> part (#define ATTRIBUTE_FIXED_PART_SIZE \ > >> (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))). But > >> if we convert this to the oidvector then we don't know the size of the > >> fixed part. Am I missing something? > > > >I could think of two solutions here > >Sol1. > >Make the first oid of the oidvector as part of the fixed size, like below > >#define ATTRIBUTE_FIXED_PART_SIZE \ > >(offsetof(FormData_pg_attribute, attcompression) + OidVectorSize(1)) > > > >Sol2: > >Keep attcompression as oid only and for the preserve list, adds > >another field in the variable part which will be of type oidvector. I > >think most of the time we need to access the current compression > >method and with this solution, we will be able to access that as part > >of the tuple desc. > > > > And is the oidvector actually needed? If we have the extra catalog, > can't we track this simply using the regular dependencies? So we'd have > the attcompression OID of the current compression method, and the > preserved values would be tracked in pg_depend. Right, we can do that as well. Actually, the preserved list needs to be accessed only for ALTER TABLE ... SET COMPRESSION and INSERT INTO ... SELECT queries, so in those cases I think it is okay to get the preserved compression OIDs from pg_depend. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
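PS: Reading the preserved set back would then be a plain pg_depend scan, roughly like this (sketch only; CompressionRelationId is again a placeholder for the new catalog's relation OID):

    Relation    depRel;
    ScanKeyData key[3];
    SysScanDesc scan;
    HeapTuple   tup;
    List       *preserved = NIL;

    depRel = table_open(DependRelationId, AccessShareLock);

    /* all dependencies recorded for this particular column */
    ScanKeyInit(&key[0], Anum_pg_depend_classid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(RelationRelationId));
    ScanKeyInit(&key[1], Anum_pg_depend_objid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(relid));
    ScanKeyInit(&key[2], Anum_pg_depend_objsubid,
                BTEqualStrategyNumber, F_INT4EQ,
                Int32GetDatum(attnum));

    scan = systable_beginscan(depRel, DependDependerIndexId, true,
                              NULL, 3, key);
    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_depend dep = (Form_pg_depend) GETSTRUCT(tup);

        /* keep only references to compression methods */
        if (dep->refclassid == CompressionRelationId)
            preserved = lappend_oid(preserved, dep->refobjid);
    }
    systable_endscan(scan);
    table_close(depRel, AccessShareLock);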
On Fri, Oct 9, 2020 at 3:01 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 9, 2020 at 3:24 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Thu, Oct 08, 2020 at 02:38:27PM +0530, Dilip Kumar wrote: > > >On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > >> > > >> On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > >> > > > >> > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra > > >> > <tomas.vondra@2ndquadrant.com> wrote: > > >> > > > > >> > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote: > > >> > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra > > >> > > ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> > > >> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote: > > >> > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra > > >> > > >> ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> >> > > >> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote: > > >> > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra > > >> > > >> >> ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> >> > > > >> > > >> >> >Thanks, Tomas for your feedback. > > >> > > >> >> > > > >> > > >> >> >> 9) attcompression ... > > >> > > >> >> >> > > >> > > >> >> >> The main issue I see is what the patch does with attcompression. Instead > > >> > > >> >> >> of just using it to store a the compression method, it's also used to > > >> > > >> >> >> store the preserved compression methods. And using NameData to store > > >> > > >> >> >> this seems wrong too - if we really want to store this info, the correct > > >> > > >> >> >> way is either using text[] or inventing charvector or similar. > > >> > > >> >> > > > >> > > >> >> >The reason for using the NameData is the get it in the fixed part of > > >> > > >> >> >the data structure. > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> Why do we need that? It's possible to have varlena fields with direct > > >> > > >> >> access (see pg_index.indkey for example). > > >> > > >> > > > >> > > >> >I see. While making it NameData I was thinking whether we have an > > >> > > >> >option to direct access the varlena. Thanks for pointing me there. I > > >> > > >> >will change this. > > >> > > >> > > > >> > > >> > Adding NameData just to make > > >> > > >> >> it fixed-length means we're always adding 64B even if we just need a > > >> > > >> >> single byte, which means ~30% overhead for the FormData_pg_attribute. > > >> > > >> >> That seems a bit unnecessary, and might be an issue with many attributes > > >> > > >> >> (e.g. with many temp tables, etc.). > > >> > > >> > > > >> > > >> >You are right. Even I did not like to keep 64B for this, so I will change it. > > >> > > >> > > > >> > > >> >> > > >> > > >> >> >> But to me this seems very much like a misuse of attcompression to track > > >> > > >> >> >> dependencies on compression methods, necessary because we don't have a > > >> > > >> >> >> separate catalog listing compression methods. If we had that, I think we > > >> > > >> >> >> could simply add dependencies between attributes and that catalog. > > >> > > >> >> > > > >> > > >> >> >Basically, up to this patch, we are having only built-in compression > > >> > > >> >> >methods and those can not be dropped so we don't need any dependency > > >> > > >> >> >at all. We just want to know what is the current compression method > > >> > > >> >> >and what is the preserve compression methods supported for this > > >> > > >> >> >attribute. 
Maybe we can do it better instead of using the NameData > > >> > > >> >> >but I don't think it makes sense to add a separate catalog? > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> Sure, I understand what the goal was - all I'm saying is that it looks > > >> > > >> >> very much like a workaround needed because we don't have the catalog. > > >> > > >> >> > > >> > > >> >> I don't quite understand how could we support custom compression methods > > >> > > >> >> without listing them in some sort of catalog? > > >> > > >> > > > >> > > >> >Yeah for supporting custom compression we need some catalog. > > >> > > >> > > > >> > > >> >> >> Moreover, having the catalog would allow adding compression methods > > >> > > >> >> >> (from extensions etc) instead of just having a list of hard-coded > > >> > > >> >> >> compression methods. Which seems like a strange limitation, considering > > >> > > >> >> >> this thread is called "custom compression methods". > > >> > > >> >> > > > >> > > >> >> >I think I forgot to mention while submitting the previous patch that > > >> > > >> >> >the next patch I am planning to submit is, Support creating the custom > > >> > > >> >> >compression methods wherein we can use pg_am catalog to insert the new > > >> > > >> >> >compression method. And for dependency handling, we can create an > > >> > > >> >> >attribute dependency on the pg_am row. Basically, we will create the > > >> > > >> >> >attribute dependency on the current compression method AM as well as > > >> > > >> >> >on the preserved compression methods AM. As part of this, we will > > >> > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression field > > >> > > >> >> >will be converted to the oid_vector (first OID will be of the current > > >> > > >> >> >compression method, followed by the preserved compression method's > > >> > > >> >> >oids). > > >> > > >> >> > > > >> > > >> >> > > >> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods don't > > >> > > >> >> quite match what I though AMs are about, but maybe it's just my fault. > > >> > > >> >> > > >> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and > > >> > > >> >> then add the catalog later - I think we should start with the catalog > > >> > > >> >> right away. The advantage is that if we end up committing only some of > > >> > > >> >> the patches in this cycle, we already have all the infrastructure etc. > > >> > > >> >> We can reorder that later, though. > > >> > > >> > > > >> > > >> >Hmm, yeah we can do this way as well that first create a new catalog > > >> > > >> >table and add entries for these two built-in methods and the > > >> > > >> >attcompression can store the oid vector. But if we only commit the > > >> > > >> >build-in compression methods part then does it make sense to create an > > >> > > >> >extra catalog or adding these build-in methods to the existing catalog > > >> > > >> >(if we plan to use pg_am). Then in attcompression instead of using > > >> > > >> >one byte for each preserve compression method, we need to use oid. So > > >> > > >> >from Robert's mail[1], it appeared to me that he wants that the > > >> > > >> >build-in compression methods part should be independently committable > > >> > > >> >and if we think from that perspective then adding a catalog doesn't > > >> > > >> >make much sense. 
But if we are planning to commit the custom method > > >> > > >> >also then it makes more sense to directly start with the catalog > > >> > > >> >because that way it will be easy to expand without much refactoring. > > >> > > >> > > > >> > > >> >[1] https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com > > >> > > >> > > > >> > > >> > > >> > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be > > >> > > >> interpreted in various ways - it does not really say whether the initial > > >> > > >> list of built-in methods should be in some C array, or already in a proper > > >> > > >> catalog. > > >> > > >> > > >> > > >> All I'm saying is it seems a bit weird to first implement dependencies > > >> > > >> based on strange (mis)use of attcompression attribute, and then replace > > >> > > >> it with a proper catalog. My understanding is those patches are expected > > >> > > >> to be committable one by one, but the attcompression approach seems a > > >> > > >> bit too hacky to me - not sure I'd want to commit that ... > > >> > > > > > >> > > >Okay, I will change this. So I will make create a new catalog > > >> > > >pg_compression and add the entry for two built-in compression methods > > >> > > >from the very first patch. > > >> > > > > > >> > > > > >> > > OK. > > >> > > >> I have changed the first 2 patches, basically, now we are providing a > > >> new catalog pg_compression and the pg_attribute is storing the oid of > > >> the compression method. The patches still need some cleanup and there > > >> is also one open comment that for index we should use its table > > >> compression. > > >> > > >> I am still working on the preserve patch. For preserving the > > >> compression method I am planning to convert the attcompression field > > >> to the oidvector so that we can store the oid of the preserve method > > >> also. I am not sure whether we can access this oidvector as a fixed > > >> part of the FormData_pg_attribute or not. The reason is that for > > >> building the tuple descriptor, we need to give the size of the fixed > > >> part (#define ATTRIBUTE_FIXED_PART_SIZE \ > > >> (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))). But > > >> if we convert this to the oidvector then we don't know the size of the > > >> fixed part. Am I missing something? > > > > > >I could think of two solutions here > > >Sol1. > > >Make the first oid of the oidvector as part of the fixed size, like below > > >#define ATTRIBUTE_FIXED_PART_SIZE \ > > >(offsetof(FormData_pg_attribute, attcompression) + OidVectorSize(1)) > > > > > >Sol2: > > >Keep attcompression as oid only and for the preserve list, adds > > >another field in the variable part which will be of type oidvector. I > > >think most of the time we need to access the current compression > > >method and with this solution, we will be able to access that as part > > >of the tuple desc. > > > > > > > And is the oidvector actually needed? If we have the extra catalog, > > can't we track this simply using the regular dependencies? So we'd have > > the attcompression OID of the current compression method, and the > > preserved values would be tracked in pg_depend. > > Right, we can do that as well. Actually, the preserved list need to > be accessed only in case of ALTER TABLE SET COMPRESSION and INSERT > INTO SELECT * FROM queries. So in such cases, I think it is okay to > get the preserved compression oids from pg_depends. 
I have worked on this patch, so as discussed, I am now maintaining the preserved compression methods using dependencies. The PRESERVE ALL syntax is still not supported; I will work on that part. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
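For readers following the dependency-based approach above, a minimal sketch of the bookkeeping it implies is shown below: record a pg_depend entry from the table column to the row in the new compression catalog, so that a preserved method cannot simply be dropped while data may still use it. This is not the patch's actual code; the function name and the CompressionRelationId macro for the pg_compression catalog are assumptions, while ObjectAddressSubSet(), ObjectAddressSet() and recordDependencyOn() are existing PostgreSQL primitives.

#include "postgres.h"

#include "access/attnum.h"
#include "catalog/dependency.h"
#include "catalog/objectaddress.h"
#include "catalog/pg_class_d.h"		/* RelationRelationId */

/* CompressionRelationId is assumed to identify the new pg_compression catalog */

static void
record_preserved_compression(Oid relid, AttrNumber attnum, Oid cmoid)
{
	ObjectAddress attref;
	ObjectAddress cmref;

	/* the dependent object is the table column, a sub-object of its pg_class entry */
	ObjectAddressSubSet(attref, RelationRelationId, relid, attnum);

	/* the referenced object is the compression method's catalog row */
	ObjectAddressSet(cmref, CompressionRelationId, cmoid);

	/* a NORMAL dependency makes a plain DROP of the method fail while the column depends on it */
	recordDependencyOn(&attref, &cmref, DEPENDENCY_NORMAL);
}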
On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > >> ... > >I have worked on this patch, so as discussed now I am maintaining the >preserved compression methods using dependency. Still PRESERVE ALL >syntax is not supported, I will work on that part. > Cool, I'll take a look. What's your opinion on doing it this way? Do you think it's cleaner / more elegant, or is it something contrary to what the dependencies are meant to do? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > > >> ... > > > >I have worked on this patch, so as discussed now I am maintaining the > >preserved compression methods using dependency. Still PRESERVE ALL > >syntax is not supported, I will work on that part. > > > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > think it's cleaner / more elegant, or is it something contrary to what > the dependencies are meant to do? I think this looks much cleaner. Moreover, once we start supporting custom compression methods we will have to maintain the dependency anyway, so using it to find the preserved compression methods is a good option. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > > > > >> ... > > > > > >I have worked on this patch, so as discussed now I am maintaining the > > >preserved compression methods using dependency. Still PRESERVE ALL > > >syntax is not supported, I will work on that part. > > > > > > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > > think it's cleaner / more elegant, or is it something contrary to what > > the dependencies are meant to do? > > I think this looks much cleaner. Moreover, I feel that once we start > supporting the custom compression methods then we anyway have to > maintain the dependency so using that for finding the preserved > compression method is good option. I have also implemented the next set of patches. 0004 -> Provide a way to create custom compression methods. 0005 -> Extension to implement lz4 as a custom compression method. A pending list of items: 1. Provide support for handling compression options - as discussed upthread, I will store the compression options of the latest compression method in a new field in the pg_attribute table. 2. As of now I have kept zlib as the second built-in option and lz4 as a custom compression extension. In an off-list discussion, Robert suggested that we keep lz4 as the built-in method and move zlib to an extension, because lz4 is faster than zlib and is therefore the better built-in default. So in the next version I will change that. Any different opinion on this? 3. Improve the documentation, especially for create_compression_method. 4. By default, make an index use its table's compression method. 5. Support the PRESERVE ALL option so that all existing compression methods can be preserved without listing them explicitly. 6. Cleanup of 0004 and 0005, as they are still WIP. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
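As context for the 0004 patch mentioned above (creating custom compression methods), a handler of the shape discussed in this thread might look roughly like the sketch below. The CompressionRoutine node, its cmcompress/cmdecompress members, and the access/compressionapi.h header are the ones referenced elsewhere in the thread; the lz4_* function names and the exact callback signatures are assumptions, not the patch's literal code.

#include "postgres.h"

#include "access/compressionapi.h"
#include "fmgr.h"
#include "nodes/nodes.h"

PG_MODULE_MAGIC;

/* compression/decompression callbacks implemented elsewhere in the module (assumed) */
extern struct varlena *lz4_cmcompress(const struct varlena *value);
extern struct varlena *lz4_cmdecompress(const struct varlena *value);

PG_FUNCTION_INFO_V1(lz4_cmhandler);

/*
 * The handler simply hands back the routine table; the command that registers
 * the compression method is expected to call it and use the callbacks.
 */
Datum
lz4_cmhandler(PG_FUNCTION_ARGS)
{
	CompressionRoutine *routine = makeNode(CompressionRoutine);

	routine->cmcompress = lz4_cmcompress;
	routine->cmdecompress = lz4_cmdecompress;

	PG_RETURN_POINTER(routine);
}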
On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > > > > > > >> ... > > > > > > > >I have worked on this patch, so as discussed now I am maintaining the > > > >preserved compression methods using dependency. Still PRESERVE ALL > > > >syntax is not supported, I will work on that part. > > > > > > > > > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > > > think it's cleaner / more elegant, or is it something contrary to what > > > the dependencies are meant to do? > > > > I think this looks much cleaner. Moreover, I feel that once we start > > supporting the custom compression methods then we anyway have to > > maintain the dependency so using that for finding the preserved > > compression method is good option. > > I have also implemented the next set of patches. > 0004 -> Provide a way to create custom compression methods > 0005 -> Extention to implement lz4 as a custom compression method. In the updated version I have worked on some of the listed items > A pending list of items: > 1. Provide support for handling the compression option > - As discussed up thread I will store the compression option of the > latest compression method in a new field in pg_atrribute table > 2. As of now I have kept zlib as the second built-in option and lz4 as > a custom compression extension. In Offlist discussion with Robert, he > suggested that we should keep lz4 as the built-in method and we can > move zlib as an extension because lz4 is faster than zlib so better to > keep that as the built-in method. So in the next version, I will > change that. Any different opinion on this? Done > 3. Improve the documentation, especially for create_compression_method. > 4. By default support table compression method for the index. Done > 5. Support the PRESERVE ALL option so that we can preserve all > existing lists of compression methods without providing the whole > list. 1,3,5 points are still pending. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Oct 8, 2020 at 5:54 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > And is the oidvector actually needed? If we have the extra catalog, > can't we track this simply using the regular dependencies? So we'd have > the attcompression OID of the current compression method, and the > preserved values would be tracked in pg_depend. If we go that route, we have to be sure that no such dependencies can exist for any other reason. Otherwise, there would be confusion about whether the dependency was there because values of that type were being preserved in the table, or whether it was for the hypothetical other reason. Now, admittedly, I can't quite think how that would happen. For example, if the attribute default expression somehow embedded a reference to a compression AM, that wouldn't cause this problem, because the dependency would be on the attribute default rather than the attribute itself. So maybe it's fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote: >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra >> > <tomas.vondra@2ndquadrant.com> wrote: >> > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: >> > > > >> > > >> ... >> > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the >> > > >preserved compression methods using dependency. Still PRESERVE ALL >> > > >syntax is not supported, I will work on that part. >> > > > >> > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you >> > > think it's cleaner / more elegant, or is it something contrary to what >> > > the dependencies are meant to do? >> > >> > I think this looks much cleaner. Moreover, I feel that once we start >> > supporting the custom compression methods then we anyway have to >> > maintain the dependency so using that for finding the preserved >> > compression method is good option. >> >> I have also implemented the next set of patches. >> 0004 -> Provide a way to create custom compression methods >> 0005 -> Extention to implement lz4 as a custom compression method. > >In the updated version I have worked on some of the listed items >> A pending list of items: >> 1. Provide support for handling the compression option >> - As discussed up thread I will store the compression option of the >> latest compression method in a new field in pg_atrribute table >> 2. As of now I have kept zlib as the second built-in option and lz4 as >> a custom compression extension. In Offlist discussion with Robert, he >> suggested that we should keep lz4 as the built-in method and we can >> move zlib as an extension because lz4 is faster than zlib so better to >> keep that as the built-in method. So in the next version, I will >> change that. Any different opinion on this? > >Done > >> 3. Improve the documentation, especially for create_compression_method. >> 4. By default support table compression method for the index. > >Done > >> 5. Support the PRESERVE ALL option so that we can preserve all >> existing lists of compression methods without providing the whole >> list. > >1,3,5 points are still pending. > Thanks. I took a quick look at the patches and I think it seems fine. I have one question, though - toast_compress_datum contains this code: /* Call the actual compression function */ tmp = cmroutine->cmcompress((const struct varlena *) value); if (!tmp) return PointerGetDatum(NULL); Shouldn't this really throw an error instead? I mean, if the compression library returns NULL, isn't that an error? regards >-- >Regards, >Dilip Kumar >EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote: > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > >> > > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > >> > <tomas.vondra@2ndquadrant.com> wrote: > >> > > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > >> > > > > >> > > >> ... > >> > > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the > >> > > >preserved compression methods using dependency. Still PRESERVE ALL > >> > > >syntax is not supported, I will work on that part. > >> > > > > >> > > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > >> > > think it's cleaner / more elegant, or is it something contrary to what > >> > > the dependencies are meant to do? > >> > > >> > I think this looks much cleaner. Moreover, I feel that once we start > >> > supporting the custom compression methods then we anyway have to > >> > maintain the dependency so using that for finding the preserved > >> > compression method is good option. > >> > >> I have also implemented the next set of patches. > >> 0004 -> Provide a way to create custom compression methods > >> 0005 -> Extention to implement lz4 as a custom compression method. > > > >In the updated version I have worked on some of the listed items > >> A pending list of items: > >> 1. Provide support for handling the compression option > >> - As discussed up thread I will store the compression option of the > >> latest compression method in a new field in pg_atrribute table > >> 2. As of now I have kept zlib as the second built-in option and lz4 as > >> a custom compression extension. In Offlist discussion with Robert, he > >> suggested that we should keep lz4 as the built-in method and we can > >> move zlib as an extension because lz4 is faster than zlib so better to > >> keep that as the built-in method. So in the next version, I will > >> change that. Any different opinion on this? > > > >Done > > > >> 3. Improve the documentation, especially for create_compression_method. > >> 4. By default support table compression method for the index. > > > >Done > > > >> 5. Support the PRESERVE ALL option so that we can preserve all > >> existing lists of compression methods without providing the whole > >> list. > > > >1,3,5 points are still pending. > > > > Thanks. I took a quick look at the patches and I think it seems fine. I > have one question, though - toast_compress_datum contains this code: > > > /* Call the actual compression function */ > tmp = cmroutine->cmcompress((const struct varlena *) value); > if (!tmp) > return PointerGetDatum(NULL); > > > Shouldn't this really throw an error instead? I mean, if the compression > library returns NULL, isn't that an error? I don't think that we can throw an error here because pglz_compress might return -1 if it finds that it can not reduce the size of the data and we consider such data as "incompressible data" and return NULL. In such a case the caller will try to compress another attribute of the tuple. I think we can handle such cases in the specific handler functions. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
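To make that control flow concrete, a simplified sketch of a pglz-based cmcompress callback is below: pglz_compress() returning a negative value ("did not shrink") is mapped to a NULL return, which the toast code treats as "store this value uncompressed" rather than as an error. The function name is illustrative and the patch's compressed-datum header bookkeeping is omitted; pglz_compress(), PGLZ_MAX_OUTPUT and PGLZ_strategy_default are the stock PostgreSQL APIs.

#include "postgres.h"

#include "common/pg_lzcompress.h"

static struct varlena *
pglz_cmcompress_sketch(const struct varlena *value)
{
	int32		valsize = VARSIZE_ANY_EXHDR(value);
	int32		len;
	struct varlena *compressed;

	/* worst-case pglz output, plus room for the varlena header */
	compressed = (struct varlena *) palloc(PGLZ_MAX_OUTPUT(valsize) + VARHDRSZ);

	len = pglz_compress(VARDATA_ANY(value), valsize,
						VARDATA(compressed),
						PGLZ_strategy_default);
	if (len < 0)
	{
		/* incompressible data: not an error, just give up on compressing it */
		pfree(compressed);
		return NULL;
	}

	SET_VARSIZE(compressed, len + VARHDRSZ);
	return compressed;
}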
On Wed, Oct 21, 2020 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 5:54 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > And is the oidvector actually needed? If we have the extra catalog, > > can't we track this simply using the regular dependencies? So we'd have > > the attcompression OID of the current compression method, and the > > preserved values would be tracked in pg_depend. > > If we go that route, we have to be sure that no such dependencies can > exist for any other reason. Otherwise, there would be confusion about > whether the dependency was there because values of that type were > being preserved in the table, or whether it was for the hypothetical > other reason. Now, admittedly, I can't quite think how that would > happen. For example, if the attribute default expression somehow > embedded a reference to a compression AM, that wouldn't cause this > problem, because the dependency would be on the attribute default > rather than the attribute itself. So maybe it's fine. Yeah, and moreover, in the new patchset we are storing the compression methods in the new catalog 'pg_compression' instead of merging them into pg_am. So I think we will maintain the attribute -> pg_compression dependency only for tracking the preserved methods, and that should be fine. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote: > > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > >> > > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > >> > > > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > > >> > <tomas.vondra@2ndquadrant.com> wrote: > > >> > > > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > >> > > > > > >> > > >> ... > > >> > > > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the > > >> > > >preserved compression methods using dependency. Still PRESERVE ALL > > >> > > >syntax is not supported, I will work on that part. > > >> > > > > > >> > > > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > > >> > > think it's cleaner / more elegant, or is it something contrary to what > > >> > > the dependencies are meant to do? > > >> > > > >> > I think this looks much cleaner. Moreover, I feel that once we start > > >> > supporting the custom compression methods then we anyway have to > > >> > maintain the dependency so using that for finding the preserved > > >> > compression method is good option. > > >> > > >> I have also implemented the next set of patches. > > >> 0004 -> Provide a way to create custom compression methods > > >> 0005 -> Extention to implement lz4 as a custom compression method. > > > > > >In the updated version I have worked on some of the listed items > > >> A pending list of items: > > >> 1. Provide support for handling the compression option > > >> - As discussed up thread I will store the compression option of the > > >> latest compression method in a new field in pg_atrribute table > > >> 2. As of now I have kept zlib as the second built-in option and lz4 as > > >> a custom compression extension. In Offlist discussion with Robert, he > > >> suggested that we should keep lz4 as the built-in method and we can > > >> move zlib as an extension because lz4 is faster than zlib so better to > > >> keep that as the built-in method. So in the next version, I will > > >> change that. Any different opinion on this? > > > > > >Done > > > > > >> 3. Improve the documentation, especially for create_compression_method. > > >> 4. By default support table compression method for the index. > > > > > >Done > > > > > >> 5. Support the PRESERVE ALL option so that we can preserve all > > >> existing lists of compression methods without providing the whole > > >> list. > > > > > >1,3,5 points are still pending. > > > > > > > Thanks. I took a quick look at the patches and I think it seems fine. I > > have one question, though - toast_compress_datum contains this code: > > > > > > /* Call the actual compression function */ > > tmp = cmroutine->cmcompress((const struct varlena *) value); > > if (!tmp) > > return PointerGetDatum(NULL); > > > > > > Shouldn't this really throw an error instead? I mean, if the compression > > library returns NULL, isn't that an error? > > I don't think that we can throw an error here because pglz_compress > might return -1 if it finds that it can not reduce the size of the > data and we consider such data as "incompressible data" and return > NULL. In such a case the caller will try to compress another > attribute of the tuple. I think we can handle such cases in the > specific handler functions. 
I have added the compression failure error in lz4.c; please refer to lz4_cmcompress in the v9-0001 patch. Apart from that, I have also added support for the PRESERVE ALL syntax to preserve all the existing compression methods, and I have rebased the patch on the current head. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
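The lz4 counterpart in v9-0001 is described as raising an error on a genuine library failure while still treating data that does not shrink as incompressible. A simplified sketch of that shape, using the stock liblz4 API (the function name, the size-based incompressibility check and the omitted header handling are assumptions, not the patch's exact code):

#include "postgres.h"

#include <lz4.h>

static struct varlena *
lz4_cmcompress_sketch(const struct varlena *value)
{
	int32		valsize = VARSIZE_ANY_EXHDR(value);
	int32		max_size = LZ4_compressBound(valsize);
	int32		len;
	struct varlena *compressed;

	compressed = (struct varlena *) palloc(max_size + VARHDRSZ);

	len = LZ4_compress_default(VARDATA_ANY(value), VARDATA(compressed),
							   valsize, max_size);
	if (len <= 0)
		elog(ERROR, "lz4: could not compress data");	/* genuine library failure */

	if (len > valsize)
	{
		/* compression did not help: report "incompressible", like pglz does */
		pfree(compressed);
		return NULL;
	}

	SET_VARSIZE(compressed, len + VARHDRSZ);
	return compressed;
}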
On Thu, Oct 22, 2020 at 5:56 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote: > > > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > >> > > > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > >> > > > > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > > > >> > <tomas.vondra@2ndquadrant.com> wrote: > > > >> > > > > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > > >> > > > > > > >> > > >> ... > > > >> > > > > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the > > > >> > > >preserved compression methods using dependency. Still PRESERVE ALL > > > >> > > >syntax is not supported, I will work on that part. > > > >> > > > > > > >> > > > > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > > > >> > > think it's cleaner / more elegant, or is it something contrary to what > > > >> > > the dependencies are meant to do? > > > >> > > > > >> > I think this looks much cleaner. Moreover, I feel that once we start > > > >> > supporting the custom compression methods then we anyway have to > > > >> > maintain the dependency so using that for finding the preserved > > > >> > compression method is good option. > > > >> > > > >> I have also implemented the next set of patches. > > > >> 0004 -> Provide a way to create custom compression methods > > > >> 0005 -> Extention to implement lz4 as a custom compression method. > > > > > > > >In the updated version I have worked on some of the listed items > > > >> A pending list of items: > > > >> 1. Provide support for handling the compression option > > > >> - As discussed up thread I will store the compression option of the > > > >> latest compression method in a new field in pg_atrribute table > > > >> 2. As of now I have kept zlib as the second built-in option and lz4 as > > > >> a custom compression extension. In Offlist discussion with Robert, he > > > >> suggested that we should keep lz4 as the built-in method and we can > > > >> move zlib as an extension because lz4 is faster than zlib so better to > > > >> keep that as the built-in method. So in the next version, I will > > > >> change that. Any different opinion on this? > > > > > > > >Done > > > > > > > >> 3. Improve the documentation, especially for create_compression_method. > > > >> 4. By default support table compression method for the index. > > > > > > > >Done > > > > > > > >> 5. Support the PRESERVE ALL option so that we can preserve all > > > >> existing lists of compression methods without providing the whole > > > >> list. > > > > > > > >1,3,5 points are still pending. > > > > > > > > > > Thanks. I took a quick look at the patches and I think it seems fine. I > > > have one question, though - toast_compress_datum contains this code: > > > > > > > > > /* Call the actual compression function */ > > > tmp = cmroutine->cmcompress((const struct varlena *) value); > > > if (!tmp) > > > return PointerGetDatum(NULL); > > > > > > > > > Shouldn't this really throw an error instead? I mean, if the compression > > > library returns NULL, isn't that an error? 
> > > > I don't think that we can throw an error here because pglz_compress > > might return -1 if it finds that it can not reduce the size of the > > data and we consider such data as "incompressible data" and return > > NULL. In such a case the caller will try to compress another > > attribute of the tuple. I think we can handle such cases in the > > specific handler functions. > > I have added the compression failure error in lz4.c, please refer > lz4_cmcompress in v9-0001 patch. Apart from that, I have also > supported the PRESERVE ALL syntax to preserve all the existing > compression methods. I have also rebased the patch on the current > head. I have added the next patch to support compression options. I am storing the compression options only for the latest compression method, so with this design we can only support options that are needed at compression time (not at decompression time). As of now, the compression-option infrastructure is in place and compression options are supported for the built-in method pglz and the external method zlib. Next, I will work on adding options for the lz4 method. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
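To illustrate how a stored per-column option might be consumed by a handler, here is a sketch of a zlib callback that applies a compression "level" via zlib's compress2(). The signature, the choice of "level" as the option, and the omitted option parsing are assumptions for illustration; compress2() and compressBound() are the standard zlib API.

#include "postgres.h"

#include <zlib.h>

static struct varlena *
zlib_cmcompress_sketch(const struct varlena *value, int level)
{
	uLong		valsize = VARSIZE_ANY_EXHDR(value);
	uLongf		outsize = compressBound(valsize);
	struct varlena *compressed;

	compressed = (struct varlena *) palloc(outsize + VARHDRSZ);

	/* "level" is assumed to come from the column's stored compression options */
	if (compress2((Bytef *) VARDATA(compressed), &outsize,
				  (const Bytef *) VARDATA_ANY(value), valsize,
				  level) != Z_OK)
		elog(ERROR, "zlib: could not compress data");

	SET_VARSIZE(compressed, outsize + VARHDRSZ);
	return compressed;
}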
On Tue, Oct 27, 2020 at 10:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 22, 2020 at 5:56 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 22, 2020 at 10:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 22, 2020 at 2:11 AM Tomas Vondra > > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > On Wed, Oct 21, 2020 at 01:59:50PM +0530, Dilip Kumar wrote: > > > > >On Sat, Oct 17, 2020 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > >> > > > > >> On Tue, Oct 13, 2020 at 10:30 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > >> > > > > > >> > On Mon, Oct 12, 2020 at 7:32 PM Tomas Vondra > > > > >> > <tomas.vondra@2ndquadrant.com> wrote: > > > > >> > > > > > > >> > > On Mon, Oct 12, 2020 at 02:28:43PM +0530, Dilip Kumar wrote: > > > > >> > > > > > > > >> > > >> ... > > > > >> > > > > > > > >> > > >I have worked on this patch, so as discussed now I am maintaining the > > > > >> > > >preserved compression methods using dependency. Still PRESERVE ALL > > > > >> > > >syntax is not supported, I will work on that part. > > > > >> > > > > > > > >> > > > > > > >> > > Cool, I'll take a look. What's your opinion on doing it this way? Do you > > > > >> > > think it's cleaner / more elegant, or is it something contrary to what > > > > >> > > the dependencies are meant to do? > > > > >> > > > > > >> > I think this looks much cleaner. Moreover, I feel that once we start > > > > >> > supporting the custom compression methods then we anyway have to > > > > >> > maintain the dependency so using that for finding the preserved > > > > >> > compression method is good option. > > > > >> > > > > >> I have also implemented the next set of patches. > > > > >> 0004 -> Provide a way to create custom compression methods > > > > >> 0005 -> Extention to implement lz4 as a custom compression method. > > > > > > > > > >In the updated version I have worked on some of the listed items > > > > >> A pending list of items: > > > > >> 1. Provide support for handling the compression option > > > > >> - As discussed up thread I will store the compression option of the > > > > >> latest compression method in a new field in pg_atrribute table > > > > >> 2. As of now I have kept zlib as the second built-in option and lz4 as > > > > >> a custom compression extension. In Offlist discussion with Robert, he > > > > >> suggested that we should keep lz4 as the built-in method and we can > > > > >> move zlib as an extension because lz4 is faster than zlib so better to > > > > >> keep that as the built-in method. So in the next version, I will > > > > >> change that. Any different opinion on this? > > > > > > > > > >Done > > > > > > > > > >> 3. Improve the documentation, especially for create_compression_method. > > > > >> 4. By default support table compression method for the index. > > > > > > > > > >Done > > > > > > > > > >> 5. Support the PRESERVE ALL option so that we can preserve all > > > > >> existing lists of compression methods without providing the whole > > > > >> list. > > > > > > > > > >1,3,5 points are still pending. > > > > > > > > > > > > > Thanks. I took a quick look at the patches and I think it seems fine. I > > > > have one question, though - toast_compress_datum contains this code: > > > > > > > > > > > > /* Call the actual compression function */ > > > > tmp = cmroutine->cmcompress((const struct varlena *) value); > > > > if (!tmp) > > > > return PointerGetDatum(NULL); > > > > > > > > > > > > Shouldn't this really throw an error instead? 
I mean, if the compression > > > > library returns NULL, isn't that an error? > > > > > > I don't think that we can throw an error here because pglz_compress > > > might return -1 if it finds that it can not reduce the size of the > > > data and we consider such data as "incompressible data" and return > > > NULL. In such a case the caller will try to compress another > > > attribute of the tuple. I think we can handle such cases in the > > > specific handler functions. > > > > I have added the compression failure error in lz4.c, please refer > > lz4_cmcompress in v9-0001 patch. Apart from that, I have also > > supported the PRESERVE ALL syntax to preserve all the existing > > compression methods. I have also rebased the patch on the current > > head. > > I have added the next patch to support the compression options. I am > storing the compression options only for the latest compression > method. Basically, based on this design we would be able to support > the options which are used only for compressions. As of now, the > compression option infrastructure is added and the compression options > for inbuilt method pglz and the external method zlib are added. Next, > I will work on adding the options for the lz4 method. In the attached patch set I have also included the compression option support for lz4. As of now, I have only supported the acceleration for LZ4_compress_fast. There is also support for the dictionary-based compression but if we try to support that then we will need the dictionary for decompression also. Since we are only keeping the options for the current compression methods, we can not support dictionary-based options as of now. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
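For the lz4 "acceleration" option mentioned above, the relevant liblz4 entry point is LZ4_compress_fast(), where acceleration = 1 behaves like LZ4_compress_default() and larger values trade compression ratio for speed. A sketch in the same shape as the earlier lz4 example (how the parsed option value reaches the callback is an assumption):

#include "postgres.h"

#include <lz4.h>

static struct varlena *
lz4_cmcompress_fast_sketch(const struct varlena *value, int32 acceleration)
{
	int32		valsize = VARSIZE_ANY_EXHDR(value);
	int32		max_size = LZ4_compressBound(valsize);
	int32		len;
	struct varlena *compressed;

	compressed = (struct varlena *) palloc(max_size + VARHDRSZ);

	/* acceleration is assumed to be parsed from the column's compression options */
	len = LZ4_compress_fast(VARDATA_ANY(value), VARDATA(compressed),
							valsize, max_size, Max(acceleration, 1));
	if (len <= 0)
		elog(ERROR, "lz4: could not compress data");

	SET_VARSIZE(compressed, len + VARHDRSZ);
	return compressed;
}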
On Wed, Oct 28, 2020 at 01:16:31PM +0530, Dilip Kumar wrote: >> >> ... >> >> I have added the next patch to support the compression options. I am >> storing the compression options only for the latest compression >> method. Basically, based on this design we would be able to support >> the options which are used only for compressions. As of now, the >> compression option infrastructure is added and the compression options >> for inbuilt method pglz and the external method zlib are added. Next, >> I will work on adding the options for the lz4 method. > >In the attached patch set I have also included the compression option >support for lz4. As of now, I have only supported the acceleration >for LZ4_compress_fast. There is also support for the dictionary-based >compression but if we try to support that then we will need the >dictionary for decompression also. Since we are only keeping the >options for the current compression methods, we can not support >dictionary-based options as of now. > OK, thanks. Do you have any other plans to improve this patch series? I plan to do some testing and review, but if you're likely to post another version soon then I'd wait a bit. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 29, 2020 at 12:31 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Wed, Oct 28, 2020 at 01:16:31PM +0530, Dilip Kumar wrote: > >> > >> ... > >> > >> I have added the next patch to support the compression options. I am > >> storing the compression options only for the latest compression > >> method. Basically, based on this design we would be able to support > >> the options which are used only for compressions. As of now, the > >> compression option infrastructure is added and the compression options > >> for inbuilt method pglz and the external method zlib are added. Next, > >> I will work on adding the options for the lz4 method. > > > >In the attached patch set I have also included the compression option > >support for lz4. As of now, I have only supported the acceleration > >for LZ4_compress_fast. There is also support for the dictionary-based > >compression but if we try to support that then we will need the > >dictionary for decompression also. Since we are only keeping the > >options for the current compression methods, we can not support > >dictionary-based options as of now. > > > > OK, thanks. Do you have any other plans to improve this patch series? I > plan to do some testing and review, but if you're likely to post another > version soon then I'd wait a bit. There was some issue in create_compression_method.sgml and the drop_compression_method.sgml was missing. I have fixed that in the attached patch. Now I am not planning to change anything soon so you can review. Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Oct 29, 2020 at 12:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 29, 2020 at 12:31 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Wed, Oct 28, 2020 at 01:16:31PM +0530, Dilip Kumar wrote: > > >> > > >> ... > > >> > > >> I have added the next patch to support the compression options. I am > > >> storing the compression options only for the latest compression > > >> method. Basically, based on this design we would be able to support > > >> the options which are used only for compressions. As of now, the > > >> compression option infrastructure is added and the compression options > > >> for inbuilt method pglz and the external method zlib are added. Next, > > >> I will work on adding the options for the lz4 method. > > > > > >In the attached patch set I have also included the compression option > > >support for lz4. As of now, I have only supported the acceleration > > >for LZ4_compress_fast. There is also support for the dictionary-based > > >compression but if we try to support that then we will need the > > >dictionary for decompression also. Since we are only keeping the > > >options for the current compression methods, we can not support > > >dictionary-based options as of now. > > > > > > > OK, thanks. Do you have any other plans to improve this patch series? I > > plan to do some testing and review, but if you're likely to post another > > version soon then I'd wait a bit. > > There was some issue in create_compression_method.sgml and the > drop_compression_method.sgml was missing. I have fixed that in the > attached patch. Now I am not planning to change anything soon so you > can review. Thanks. The patches were not applying on the current head so I have re-based them. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sun, Nov 8, 2020 at 4:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 29, 2020 at 12:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 29, 2020 at 12:31 AM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Wed, Oct 28, 2020 at 01:16:31PM +0530, Dilip Kumar wrote: > > > >> > > > >> ... > > > >> > > > >> I have added the next patch to support the compression options. I am > > > >> storing the compression options only for the latest compression > > > >> method. Basically, based on this design we would be able to support > > > >> the options which are used only for compressions. As of now, the > > > >> compression option infrastructure is added and the compression options > > > >> for inbuilt method pglz and the external method zlib are added. Next, > > > >> I will work on adding the options for the lz4 method. > > > > > > > >In the attached patch set I have also included the compression option > > > >support for lz4. As of now, I have only supported the acceleration > > > >for LZ4_compress_fast. There is also support for the dictionary-based > > > >compression but if we try to support that then we will need the > > > >dictionary for decompression also. Since we are only keeping the > > > >options for the current compression methods, we can not support > > > >dictionary-based options as of now. > > > > > > > > > > OK, thanks. Do you have any other plans to improve this patch series? I > > > plan to do some testing and review, but if you're likely to post another > > > version soon then I'd wait a bit. > > > > There was some issue in create_compression_method.sgml and the > > drop_compression_method.sgml was missing. I have fixed that in the > > attached patch. Now I am not planning to change anything soon so you > > can review. Thanks. > > The patches were not applying on the current head so I have re-based them. There were a few problems in this rebased version, basically, the compression options were not passed while compressing values from the brin_form_tuple, so I have fixed this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > There were a few problems in this rebased version, basically, the > compression options were not passed while compressing values from the > brin_form_tuple, so I have fixed this. Since the authorship history of this patch is complicated, it would be nice if you would include authorship information and relevant "Discussion" links in the patches. Design level considerations and overall notes: configure is autogenerated from configure.in, so the patch shouldn't include changes only to the former. Looking over the changes to src/include: + PGLZ_COMPRESSION_ID, + LZ4_COMPRESSION_ID I think that it would be good to assign values to these explicitly. +/* compresion handler routines */ Spelling. + /* compression routine for the compression method */ + cmcompress_function cmcompress; + + /* decompression routine for the compression method */ + cmcompress_function cmdecompress; Don't reuse cmcompress_function; that's confusing. Just have a typedef per structure member, even if they end up being the same. #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ - (((toast_compress_header *) (ptr))->rawsize = (len)) +do { \ + Assert(len > 0 && len <= RAWSIZEMASK); \ + ((toast_compress_header *) (ptr))->info = (len); \ +} while (0) Indentation. +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \ + ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30); What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument? And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or something? It seems not great to have separate functions each setting part of a 4-byte quantity. Too much chance of failing to set both parts. I guess you've got a function called toast_set_compressed_datum_info() for that, but it's just a wrapper around two macros that could just be combined, which would reduce complexity overall. + T_CompressionRoutine, /* in access/compressionapi.h */ This looks misplaced. I guess it should go just after these: T_FdwRoutine, /* in foreign/fdwapi.h */ T_IndexAmRoutine, /* in access/amapi.h */ T_TableAmRoutine, /* in access/tableam.h */ Looking over the regression test changes: The tests at the top of create_cm.out that just test that we can create tables with various storage types seem unrelated to the purpose of the patch. And the file doesn't test creating a compression method either, as the file name would suggest, so either the file name needs to be changed (compression, compression_method?) or the tests don't go here. +-- check data is okdd I guess whoever is responsible for this comment prefers vi to emacs. I don't quite understand the purpose of all of these tests, and there are some things that I feel like ought to be tested that seemingly aren't. Like, you seem to test using an UPDATE to move a datum from a table to another table with the same compression method, but not one with a different compression method. Testing the former is nice and everything, but that's the easy case: I think we also need to test the latter. I think it would be good to verify not only that the data is readable but that it's compressed the way we expect. I think it would be a great idea to add a pg_column_compression() function in a similar spirit to pg_column_size(). Perhaps it could return NULL when compression is not in use or the data type is not varlena, and the name of the compression method otherwise. 
That would allow for better testing of this feature, and it would also be useful to users who are switching methods, to see what data they still have that's using the old method. It could be useful for debugging problems on customer systems, too. I wonder if we need a test that moves data between tables through an intermediary. For instance, suppose a plpgsql function or DO block fetches some data and stores it in a plpgsql variable and then uses the variable to insert into another table. Hmm, maybe that would force de-TOASTing. But perhaps there are other cases. Maybe a more general way to approach the problem is: have you tried running a coverage report and checked which parts of your code are getting exercised by the existing tests and which parts are not? The stuff that isn't, we should try to add more tests. It's easy to get corner cases wrong with this kind of thing. I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at least not by 0001, which reinforces my feeling that the tests here are not as thorough as they could be. +NOTICE: pg_compression contains unpinned initdb-created object(s) This seems wrong to me - why is it OK? - result = (struct varlena *) - palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); - SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); + cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr)); - if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr), - TOAST_COMPRESS_SIZE(attr), - VARDATA(result), - TOAST_COMPRESS_RAWSIZE(attr), true) < 0) - elog(ERROR, "compressed data is corrupted"); + /* get compression method handler routines */ + cmroutine = GetCompressionRoutine(cmoid); - return result; + return cmroutine->cmdecompress(attr); I'm worried about how expensive this might be, and I think we could make it cheaper. The reason why I think this might be expensive is: currently, for every datum, you have a single direct function call. Now, with this, you first have a direct function call to GetCompressionOidFromCompressionId(). Then you have a call to GetCompressionRoutine(), which does a syscache lookup and calls a handler function, which is quite a lot more expensive than a single function call. And the handler isn't even returning a statically allocated structure, but is allocating new memory every time, which involves more function calls and maybe memory leaks. Then you use the results of all that to make an indirect function call. I'm not sure exactly what combination of things we could use to make this better, but it seems like there are a few possibilities: (1) The handler function could return a pointer to the same CompressionRoutine every time instead of constructing a new one every time. (2) The CompressionRoutine to which the handler function returns a pointer could be statically allocated instead of being built at runtime. (3) GetCompressionRoutine could have an OID -> handler cache instead of relying on syscache + calling the handler function all over again. (4) For the compression types that have dedicated bit patterns in the high bits of the compressed TOAST size, toast_compress_datum() could just have hard-coded logic to use the correct handlers instead of translating the bit pattern into an OID and then looking it up over again. (5) Going even further than #4 we could skip the handler layer entirely for such methods, and just call the right function directly. I think we should definitely do (1), and also (2) unless there's some reason it's hard. 
(3) doesn't need to be part of this patch, but might be something to consider later in the series. It's possible that it doesn't have enough benefit to be worth the work, though. Also, I think we should do either (4) or (5). I have a mild preference for (5) unless it looks too ugly. Note that I'm not talking about hard-coding a fast path for a hard-coded list of OIDs - which would seem a little bit unprincipled - but hard-coding a fast path for the bit patterns that are themselves hard-coded. I don't think we lose anything in terms of extensibility or even-handedness there; it's just avoiding a bunch of rigamarole that doesn't really buy us anything. All these points apply equally to toast_decompress_datum_slice() and toast_compress_datum(). + /* Fallback to default compression method, if not specified */ + if (!OidIsValid(cmoid)) + cmoid = DefaultCompressionOid; I think that the caller should be required to specify a legal value, and this should be an elog(ERROR) or an Assert(). The change to equalTupleDescs() makes me wonder. Like, can we specify the compression method for a function parameter, or a function return value? I would think not. But then how are the tuple descriptors set up in that case? Under what circumstances do we actually need the tuple descriptors to compare unequal? lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too. I wonder if we should try to adopt a convention for the names of these files that isn't just the compression method name, like cmlz4 or compress_lz4. I kind of like the latter one. I am a little worried that just calling it lz4.c will result in name collisions later - not in this directory, of course, but elsewhere in the system. It's not a disaster if that happens, but for example verbose error reports print the file name, so it's nice if it's unambiguous. + if (!IsBinaryUpgrade && + (relkind == RELKIND_RELATION || + relkind == RELKIND_PARTITIONED_TABLE)) + attr->attcompression = + GetAttributeCompressionMethod(attr, colDef->compression); + else + attr->attcompression = InvalidOid; Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID. Otherwise, we need to preserve whatever value was present in the old version. Or am I confused here? I think there should be tests for the way this interacts with partitioning, and I think the intended interaction should be documented. Perhaps it should behave like TABLESPACE, where the parent property has no effect on what gets stored because the parent has no storage, but is inherited by each new child. I wonder in passing about TOAST tables and materialized views, which are the other things that have storage. What gets stored for attcompression? For a TOAST table it probably doesn't matter much since TOAST table entries shouldn't ever be toasted themselves, so anything that doesn't crash is fine (but maybe we should test that trying to alter the compression properties of a TOAST table doesn't crash, for example). For a materialized view it seems reasonable to want to set column properties, but I'm not quite sure how that works today for things like STORAGE anyway. If we do allow setting STORAGE or COMPRESSION for materialized view columns then dump-and-reload needs to preserve the values. + /* + * Use default compression method if the existing compression method is + * invalid but the new storage type is non plain storage. 
+ */ > + if (!OidIsValid(attrtuple->attcompression) && > + (newstorage != TYPSTORAGE_PLAIN)) > + attrtuple->attcompression = DefaultCompressionOid; You have a few too many parens in there. I don't see a particularly good reason to treat plain and external differently. More generally, I think there's a question here about when we need an attribute to have a valid compression type and when we don't. If typstorage is plain or external, then there's no point in ever having a compression type and maybe we should even reject attempts to set one (but I'm not sure). However, the attstorage is a different case. Suppose the column is created with extended storage and then later it's changed to plain. That's only a hint, so there may still be toasted values in that column, so the compression setting must endure. At any rate, we need to make sure we have clear and sensible rules for when attcompression (a) must be valid, (b) may be valid, and (c) must be invalid. And those rules need to at least be documented in the comments, and maybe in the SGML docs. I'm out of time for today, so I'll have to look at this more another day. Hope this helps for a start. -- Robert Haas EDB: http://www.enterprisedb.com
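Points (1) and (2) above amount to the pattern already used by the table and index AM handlers: return a pointer to one statically allocated routine table instead of building a fresh node on every call. A sketch of what that could look like for the lz4 handler (names follow those used earlier in the thread; this is an illustration, not the committed design):

#include "postgres.h"

#include "access/compressionapi.h"
#include "fmgr.h"

extern struct varlena *lz4_cmcompress(const struct varlena *value);
extern struct varlena *lz4_cmdecompress(const struct varlena *value);

/* one statically allocated routine table, in the style of the heap AM's TableAmRoutine */
static const CompressionRoutine lz4_compression_routine = {
	.type = T_CompressionRoutine,
	.cmcompress = lz4_cmcompress,
	.cmdecompress = lz4_cmdecompress,
};

PG_FUNCTION_INFO_V1(lz4_cmhandler);

Datum
lz4_cmhandler(PG_FUNCTION_ARGS)
{
	/* no per-call allocation, no leak, always the same pointer */
	PG_RETURN_POINTER(&lz4_compression_routine);
}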
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > There were a few problems in this rebased version, basically, the > > compression options were not passed while compressing values from the > > brin_form_tuple, so I have fixed this. > > Since the authorship history of this patch is complicated, it would be > nice if you would include authorship information and relevant > "Discussion" links in the patches. > > Design level considerations and overall notes: > > configure is autogenerated from configure.in, so the patch shouldn't > include changes only to the former. > > Looking over the changes to src/include: > > + PGLZ_COMPRESSION_ID, > + LZ4_COMPRESSION_ID > > I think that it would be good to assign values to these explicitly. > > +/* compresion handler routines */ > > Spelling. > > + /* compression routine for the compression method */ > + cmcompress_function cmcompress; > + > + /* decompression routine for the compression method */ > + cmcompress_function cmdecompress; > > Don't reuse cmcompress_function; that's confusing. Just have a typedef > per structure member, even if they end up being the same. > > #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ > - (((toast_compress_header *) (ptr))->rawsize = (len)) > +do { \ > + Assert(len > 0 && len <= RAWSIZEMASK); \ > + ((toast_compress_header *) (ptr))->info = (len); \ > +} while (0) > > Indentation. > > +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \ > + ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30); > > What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument? > And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or > something? It seems not great to have separate functions each setting > part of a 4-byte quantity. Too much chance of failing to set both > parts. I guess you've got a function called > toast_set_compressed_datum_info() for that, but it's just a wrapper > around two macros that could just be combined, which would reduce > complexity overall. > > + T_CompressionRoutine, /* in access/compressionapi.h */ > > This looks misplaced. I guess it should go just after these: > > T_FdwRoutine, /* in foreign/fdwapi.h */ > T_IndexAmRoutine, /* in access/amapi.h */ > T_TableAmRoutine, /* in access/tableam.h */ > > Looking over the regression test changes: > > The tests at the top of create_cm.out that just test that we can > create tables with various storage types seem unrelated to the purpose > of the patch. And the file doesn't test creating a compression method > either, as the file name would suggest, so either the file name needs > to be changed (compression, compression_method?) or the tests don't go > here. > > +-- check data is okdd > > I guess whoever is responsible for this comment prefers vi to emacs. > > I don't quite understand the purpose of all of these tests, and there > are some things that I feel like ought to be tested that seemingly > aren't. Like, you seem to test using an UPDATE to move a datum from a > table to another table with the same compression method, but not one > with a different compression method. Testing the former is nice and > everything, but that's the easy case: I think we also need to test the > latter. I think it would be good to verify not only that the data is > readable but that it's compressed the way we expect. 
I think it would > be a great idea to add a pg_column_compression() function in a similar > spirit to pg_column_size(). Perhaps it could return NULL when > compression is not in use or the data type is not varlena, and the > name of the compression method otherwise. That would allow for better > testing of this feature, and it would also be useful to users who are > switching methods, to see what data they still have that's using the > old method. It could be useful for debugging problems on customer > systems, too. > > I wonder if we need a test that moves data between tables through an > intermediary. For instance, suppose a plpgsql function or DO block > fetches some data and stores it in a plpgsql variable and then uses > the variable to insert into another table. Hmm, maybe that would force > de-TOASTing. But perhaps there are other cases. Maybe a more general > way to approach the problem is: have you tried running a coverage > report and checked which parts of your code are getting exercised by > the existing tests and which parts are not? The stuff that isn't, we > should try to add more tests. It's easy to get corner cases wrong with > this kind of thing. > > I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at > least not by 0001, which reinforces my feeling that the tests here are > not as thorough as they could be. > > +NOTICE: pg_compression contains unpinned initdb-created object(s) > > This seems wrong to me - why is it OK? > > - result = (struct varlena *) > - palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > - SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > + cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr)); > > - if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr), > - TOAST_COMPRESS_SIZE(attr), > - VARDATA(result), > - > TOAST_COMPRESS_RAWSIZE(attr), true) < 0) > - elog(ERROR, "compressed data is corrupted"); > + /* get compression method handler routines */ > + cmroutine = GetCompressionRoutine(cmoid); > > - return result; > + return cmroutine->cmdecompress(attr); > > I'm worried about how expensive this might be, and I think we could > make it cheaper. The reason why I think this might be expensive is: > currently, for every datum, you have a single direct function call. > Now, with this, you first have a direct function call to > GetCompressionOidFromCompressionId(). Then you have a call to > GetCompressionRoutine(), which does a syscache lookup and calls a > handler function, which is quite a lot more expensive than a single > function call. And the handler isn't even returning a statically > allocated structure, but is allocating new memory every time, which > involves more function calls and maybe memory leaks. Then you use the > results of all that to make an indirect function call. > > I'm not sure exactly what combination of things we could use to make > this better, but it seems like there are a few possibilities: > > (1) The handler function could return a pointer to the same > CompressionRoutine every time instead of constructing a new one every > time. > (2) The CompressionRoutine to which the handler function returns a > pointer could be statically allocated instead of being built at > runtime. > (3) GetCompressionRoutine could have an OID -> handler cache instead > of relying on syscache + calling the handler function all over again. 
> (4) For the compression types that have dedicated bit patterns in the > high bits of the compressed TOAST size, toast_compress_datum() could > just have hard-coded logic to use the correct handlers instead of > translating the bit pattern into an OID and then looking it up over > again. > (5) Going even further than #4 we could skip the handler layer > entirely for such methods, and just call the right function directly. > > I think we should definitely do (1), and also (2) unless there's some > reason it's hard. (3) doesn't need to be part of this patch, but might > be something to consider later in the series. It's possible that it > doesn't have enough benefit to be worth the work, though. Also, I > think we should do either (4) or (5). I have a mild preference for (5) > unless it looks too ugly. > > Note that I'm not talking about hard-coding a fast path for a > hard-coded list of OIDs - which would seem a little bit unprincipled - > but hard-coding a fast path for the bit patterns that are themselves > hard-coded. I don't think we lose anything in terms of extensibility > or even-handedness there; it's just avoiding a bunch of rigamarole > that doesn't really buy us anything. > > All these points apply equally to toast_decompress_datum_slice() and > toast_compress_datum(). > > + /* Fallback to default compression method, if not specified */ > + if (!OidIsValid(cmoid)) > + cmoid = DefaultCompressionOid; > > I think that the caller should be required to specify a legal value, > and this should be an elog(ERROR) or an Assert(). > > The change to equalTupleDescs() makes me wonder. Like, can we specify > the compression method for a function parameter, or a function return > value? I would think not. But then how are the tuple descriptors set > up in that case? Under what circumstances do we actually need the > tuple descriptors to compare unequal? > > lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too. > > I wonder if we should try to adopt a convention for the names of these > files that isn't just the compression method name, like cmlz4 or > compress_lz4. I kind of like the latter one. I am a little worried > that just calling it lz4.c will result in name collisions later - not > in this directory, of course, but elsewhere in the system. It's not a > disaster if that happens, but for example verbose error reports print > the file name, so it's nice if it's unambiguous. > > + if (!IsBinaryUpgrade && > + (relkind == RELKIND_RELATION || > + relkind == RELKIND_PARTITIONED_TABLE)) > + attr->attcompression = > + > GetAttributeCompressionMethod(attr, colDef->compression); > + else > + attr->attcompression = InvalidOid; > > Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If > upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID. > Otherwise, we need to preserve whatever value was present in the old > version. Or am I confused here? > > I think there should be tests for the way this interacts with > partitioning, and I think the intended interaction should be > documented. Perhaps it should behave like TABLESPACE, where the parent > property has no effect on what gets stored because the parent has no > storage, but is inherited by each new child. > > I wonder in passing about TOAST tables and materialized views, which > are the other things that have storage. What gets stored for > attcompression? 
For a TOAST table it probably doesn't matter much > since TOAST table entries shouldn't ever be toasted themselves, so > anything that doesn't crash is fine (but maybe we should test that > trying to alter the compression properties of a TOAST table doesn't > crash, for example). For a materialized view it seems reasonable to > want to set column properties, but I'm not quite sure how that works > today for things like STORAGE anyway. If we do allow setting STORAGE > or COMPRESSION for materialized view columns then dump-and-reload > needs to preserve the values. > > + /* > + * Use default compression method if the existing compression method is > + * invalid but the new storage type is non plain storage. > + */ > + if (!OidIsValid(attrtuple->attcompression) && > + (newstorage != TYPSTORAGE_PLAIN)) > + attrtuple->attcompression = DefaultCompressionOid; > > You have a few too many parens in there. > > I don't see a particularly good reason to treat plain and external > differently. More generally, I think there's a question here about > when we need an attribute to have a valid compression type and when we > don't. If typstorage is plan or external, then there's no point in > ever having a compression type and maybe we should even reject > attempts to set one (but I'm not sure). However, the attstorage is a > different case. Suppose the column is created with extended storage > and then later it's changed to plain. That's only a hint, so there may > still be toasted values in that column, so the compression setting > must endure. At any rate, we need to make sure we have clear and > sensible rules for when attcompression (a) must be valid, (b) may be > valid, and (c) must be invalid. And those rules need to at least be > documented in the comments, and maybe in the SGML docs. > > I'm out of time for today, so I'll have to look at this more another > day. Hope this helps for a start. > Thanks for the review Robert, I will work on these comments and provide my analysis along with the updated patch in a couple of days. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
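To make the macro suggestion quoted above concrete, here is one possible shape for a combined setter, obtained by merging the two quoted macros (toast_compress_header, RAWSIZEMASK and the two-bit method id in the top bits follow the patch; the macro name and the extra Assert are illustrative):

#define TOAST_COMPRESS_SET_SIZE_AND_METHOD(ptr, len, cm_method) \
    do { \
        Assert((len) > 0 && (len) <= RAWSIZEMASK); \
        Assert((cm_method) <= 3);   /* only two bits are available */ \
        ((toast_compress_header *) (ptr))->info = \
            (len) | ((uint32) (cm_method) << 30); \
    } while (0)

With a single setter like this, toast_set_compressed_datum_info() could shrink to a thin wrapper or go away entirely, which is the complexity reduction being asked for.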
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: Most of the comments looks fine to me but I have a slightly different opinion for one of the comment so replying only for that. > I'm worried about how expensive this might be, and I think we could > make it cheaper. The reason why I think this might be expensive is: > currently, for every datum, you have a single direct function call. > Now, with this, you first have a direct function call to > GetCompressionOidFromCompressionId(). Then you have a call to > GetCompressionRoutine(), which does a syscache lookup and calls a > handler function, which is quite a lot more expensive than a single > function call. And the handler isn't even returning a statically > allocated structure, but is allocating new memory every time, which > involves more function calls and maybe memory leaks. Then you use the > results of all that to make an indirect function call. > > I'm not sure exactly what combination of things we could use to make > this better, but it seems like there are a few possibilities: > > (1) The handler function could return a pointer to the same > CompressionRoutine every time instead of constructing a new one every > time. > (2) The CompressionRoutine to which the handler function returns a > pointer could be statically allocated instead of being built at > runtime. > (3) GetCompressionRoutine could have an OID -> handler cache instead > of relying on syscache + calling the handler function all over again. > (4) For the compression types that have dedicated bit patterns in the > high bits of the compressed TOAST size, toast_compress_datum() could > just have hard-coded logic to use the correct handlers instead of > translating the bit pattern into an OID and then looking it up over > again. > (5) Going even further than #4 we could skip the handler layer > entirely for such methods, and just call the right function directly. > > I think we should definitely do (1), and also (2) unless there's some > reason it's hard. (3) doesn't need to be part of this patch, but might > be something to consider later in the series. It's possible that it > doesn't have enough benefit to be worth the work, though. Also, I > think we should do either (4) or (5). I have a mild preference for (5) > unless it looks too ugly. > > Note that I'm not talking about hard-coding a fast path for a > hard-coded list of OIDs - which would seem a little bit unprincipled - > but hard-coding a fast path for the bit patterns that are themselves > hard-coded. I don't think we lose anything in terms of extensibility > or even-handedness there; it's just avoiding a bunch of rigamarole > that doesn't really buy us anything. > > All these points apply equally to toast_decompress_datum_slice() and > toast_compress_datum(). I agree that (1) and (2) we shall definitely do as part of the first patch and (3) we might do in later patches. I think from (4) and (5) I am more inclined to do (4) for a couple of reasons a) If we bypass the handler function and directly calls the compression and decompression routines then we need to check whether the current executable is compiled with this particular compression library or not for example in 'lz4handler' we have this below check, now if we don't have the handler function we either need to put this in each compression/decompression functions or we need to put is in each caller. 
Datum
lz4handler(PG_FUNCTION_ARGS)
{
#ifndef HAVE_LIBLZ4
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("not built with lz4 support")));
#else

b) Another reason is that once we start supporting the compression options (0006-Support-compression-methods-options.patch) then we also need to call 'cminitstate_function' for parsing the compression options and then calling the compression function, so we need to hardcode multiple function calls. I think b) is still okay but because of a) I am more inclined to do (4), what is your opinion on this? About (4), one option is that we directly call the correct handler function for the built-in type directly from toast_(de)compress(_slice) functions but in that case, we are duplicating the code, another option is that we call the GetCompressionRoutine() a common function and in that, for the built-in type, we can directly call the corresponding handler function and get the routine. The only thing is to avoid duplicating in decompression routine we need to convert CompressionId to Oid before calling GetCompressionRoutine(), but now we can avoid sys cache lookup for the built-in type. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > About (4), one option is that we directly call the correct handler > function for the built-in type directly from > toast_(de)compress(_slice) functions but in that case, we are > duplicating the code, another option is that we call the > GetCompressionRoutine() a common function and in that, for the > built-in type, we can directly call the corresponding handler function > and get the routine. The only thing is to avoid duplicating in > decompression routine we need to convert CompressionId to Oid before > calling GetCompressionRoutine(), but now we can avoid sys cache lookup > for the built-in type. Suppose that we have a variable lz4_methods (like heapam_methods) that is always defined, whether or not lz4 support is present. It's defined like this: const CompressionAmRoutine lz4_compress_methods = { .datum_compress = lz4_datum_compress, .datum_decompress = lz4_datum_decompress, .datum_decompress_slice = lz4_datum_decompress_slice }; (It would be good, I think, to actually name things something like this - in particular why would we have TableAmRoutine and IndexAmRoutine but not include "Am" in the one for compression? In general I think tableam is a good pattern to adhere to and we should try to make this patch hew closely to it.) Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either do their thing, or complain that lz4 compression is not supported. Then in this function you can just say, well, if we have the 01 bit pattern, handler = &lz4_compress_methods and proceed from there. BTW, I think the "not supported" message should probably use the 'by this build' language we use in some places i.e. [rhaas pgsql]$ git grep errmsg.*'this build' | grep -vF .po: contrib/pg_prewarm/pg_prewarm.c: errmsg("prefetch is not supported by this build"))); src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build", src/backend/libpq/be-secure-openssl.c: (errmsg("\"%s\" setting \"%s\" not supported by this build", src/backend/libpq/hba.c: errmsg("local connections are not supported by this build"), src/backend/libpq/hba.c: errmsg("hostssl record cannot match because SSL is not supported by this build"), src/backend/libpq/hba.c: errmsg("hostgssenc record cannot match because GSSAPI is not supported by this build"), src/backend/libpq/hba.c: errmsg("invalid authentication method \"%s\": not supported by this build", src/backend/utils/adt/pg_locale.c: errmsg("ICU is not supported in this build"), \ src/backend/utils/misc/guc.c: GUC_check_errmsg("Bonjour is not supported by this build"); src/backend/utils/misc/guc.c: GUC_check_errmsg("SSL is not supported by this build"); -- Robert Haas EDB: http://www.enterprisedb.com
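As a minimal sketch of what is described above: each callback in lz4_compress_methods is compiled unconditionally and simply raises the "by this build" error when liblz4 is absent (the callback signature is illustrative; only HAVE_LIBLZ4 and the struct member names come from the discussion):

static struct varlena *
lz4_datum_compress(const struct varlena *value)
{
#ifndef HAVE_LIBLZ4
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("lz4 compression is not supported by this build")));
    return NULL;                /* keep the compiler quiet */
#else
    /*
     * A real implementation would size the output with LZ4_compressBound(),
     * compress VARDATA_ANY(value) with LZ4_compress_default(), and return
     * NULL when the data does not shrink; omitted to keep the sketch short.
     */
    return NULL;
#endif
}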
On Tue, Nov 24, 2020 at 7:14 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Nov 24, 2020 at 7:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > About (4), one option is that we directly call the correct handler > > function for the built-in type directly from > > toast_(de)compress(_slice) functions but in that case, we are > > duplicating the code, another option is that we call the > > GetCompressionRoutine() a common function and in that, for the > > built-in type, we can directly call the corresponding handler function > > and get the routine. The only thing is to avoid duplicating in > > decompression routine we need to convert CompressionId to Oid before > > calling GetCompressionRoutine(), but now we can avoid sys cache lookup > > for the built-in type. > > Suppose that we have a variable lz4_methods (like heapam_methods) that > is always defined, whether or not lz4 support is present. It's defined > like this: > > const CompressionAmRoutine lz4_compress_methods = { > .datum_compress = lz4_datum_compress, > .datum_decompress = lz4_datum_decompress, > .datum_decompress_slice = lz4_datum_decompress_slice > }; Yeah, this makes sense. > > (It would be good, I think, to actually name things something like > this - in particular why would we have TableAmRoutine and > IndexAmRoutine but not include "Am" in the one for compression? In > general I think tableam is a good pattern to adhere to and we should > try to make this patch hew closely to it.) For the compression routine name, I did not include "Am" because currently, we are storing the compression method in the new catalog "pg_compression" not in the pg_am. So are you suggesting that we should store the compression methods also in the pg_am instead of creating a new catalog? IMHO, storing the compression methods in a new catalog is a better option instead of storing them in pg_am because actually, the compression methods are not the same as heap or index AMs, I mean they are actually not the access methods. Am I missing something? > Then those functions are contingent on #ifdef HAVE_LIBLZ4: they either > do their thing, or complain that lz4 compression is not supported. > Then in this function you can just say, well, if we have the 01 bit > pattern, handler = &lz4_compress_methods and proceed from there. Okay > BTW, I think the "not supported" message should probably use the 'by > this build' language we use in some places i.e. > Okay -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > For the compression routine name, I did not include "Am" because > currently, we are storing the compression method in the new catalog > "pg_compression" not in the pg_am. So are you suggesting that we > should store the compression methods also in the pg_am instead of > creating a new catalog? IMHO, storing the compression methods in a > new catalog is a better option instead of storing them in pg_am > because actually, the compression methods are not the same as heap or > index AMs, I mean they are actually not the access methods. Am I > missing something? Oh, I thought it had been suggested in previous discussions that these should be treated as access methods rather than inventing a whole new concept just for this, and it seemed like a good idea to me. I guess I missed the fact that the patch wasn't doing it that way. Hmm. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Nov 24, 2020 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> For the compression routine name, I did not include "Am" because >> currently, we are storing the compression method in the new catalog >> "pg_compression" not in the pg_am. So are you suggesting that we >> should store the compression methods also in the pg_am instead of >> creating a new catalog? IMHO, storing the compression methods in a >> new catalog is a better option instead of storing them in pg_am >> because actually, the compression methods are not the same as heap or >> index AMs, I mean they are actually not the access methods. Am I >> missing something? > Oh, I thought it had been suggested in previous discussions that these > should be treated as access methods rather than inventing a whole new > concept just for this, and it seemed like a good idea to me. I guess I > missed the fact that the patch wasn't doing it that way. Hmm. FWIW, I kind of agree with Robert's take on this. Heap and index AMs are pretty fundamentally different animals, yet we don't have a problem sticking them in the same catalog. I think anything that is related to storage access could reasonably go into that catalog, rather than inventing a new one. regards, tom lane
On Tue, Nov 24, 2020 at 1:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > FWIW, I kind of agree with Robert's take on this. Heap and index AMs > are pretty fundamentally different animals, yet we don't have a problem > sticking them in the same catalog. I think anything that is related to > storage access could reasonably go into that catalog, rather than > inventing a new one. It's good to have your opinion on this since I wasn't totally sure what was best, but for the record, I can't take credit. Looks like it was Álvaro's suggestion originally: http://postgr.es/m/20171130205155.7mgq2cuqv6zxi25a@alvherre.pgsql -- Robert Haas EDB: http://www.enterprisedb.com
On 2020-Nov-24, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > Oh, I thought it had been suggested in previous discussions that these > > should be treated as access methods rather than inventing a whole new > > concept just for this, and it seemed like a good idea to me. I guess I > > missed the fact that the patch wasn't doing it that way. Hmm. > > FWIW, I kind of agree with Robert's take on this. Heap and index AMs > are pretty fundamentally different animals, yet we don't have a problem > sticking them in the same catalog. I think anything that is related to > storage access could reasonably go into that catalog, rather than > inventing a new one. Right -- Something like amname=lz4, amhandler=lz4handler, amtype=c. The core code must of course know how to instantiate an AM of type 'c' and what to use it for. https://postgr.es/m/20171213151818.75a20259@postgrespro.ru
On Wed, Nov 25, 2020 at 12:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2020-Nov-24, Tom Lane wrote: > > > Robert Haas <robertmhaas@gmail.com> writes: > > > > Oh, I thought it had been suggested in previous discussions that these > > > should be treated as access methods rather than inventing a whole new > > > concept just for this, and it seemed like a good idea to me. I guess I > > > missed the fact that the patch wasn't doing it that way. Hmm. > > > > FWIW, I kind of agree with Robert's take on this. Heap and index AMs > > are pretty fundamentally different animals, yet we don't have a problem > > sticking them in the same catalog. I think anything that is related to > > storage access could reasonably go into that catalog, rather than > > inventing a new one. > > Right -- Something like amname=lz4, amhandler=lz4handler, amtype=c. > The core code must of course know how to instantiate an AM of type > 'c' and what to use it for. > > https://postgr.es/m/20171213151818.75a20259@postgrespro.ru I have changed this, I agree that using the access method for creating compression has simplified the code. I will share the updated patch set after fixing other review comments by Robert. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
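For reference, once compression methods are rows in pg_am the handler lookup can mirror GetIndexAmRoutine()/GetTableAmRoutine(); a rough sketch follows (the function name, struct name and node tag are assumptions about the reworked patch rather than its actual code):

CompressionAmRoutine *
GetCompressionAmRoutine(Oid amhandler)
{
    Datum       datum;
    CompressionAmRoutine *routine;

    datum = OidFunctionCall0(amhandler);
    routine = (CompressionAmRoutine *) DatumGetPointer(datum);

    if (routine == NULL || !IsA(routine, CompressionAmRoutine))
        elog(ERROR, "compression access method handler function %u did not return a CompressionAmRoutine struct",
             amhandler);

    return routine;
}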
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: While working on this comment I have doubts. > I wonder in passing about TOAST tables and materialized views, which > are the other things that have storage. What gets stored for > attcompression? For a TOAST table it probably doesn't matter much > since TOAST table entries shouldn't ever be toasted themselves, so > anything that doesn't crash is fine (but maybe we should test that > trying to alter the compression properties of a TOAST table doesn't > crash, for example). Yeah for the toast table it doesn't matter, but I am not sure what do you mean by altering the compression method for the toast table. Do you mean manually update the pg_attribute tuple for the toast table and set different compression methods? Or there is some direct way to alter the toast table? For a materialized view it seems reasonable to > want to set column properties, but I'm not quite sure how that works > today for things like STORAGE anyway. If we do allow setting STORAGE > or COMPRESSION for materialized view columns then dump-and-reload > needs to preserve the values. I see that we allow setting the STORAGE for the materialized view but I am not sure what is the use case. Basically, the tuples are directly getting selected from the host table and inserted in the materialized view without checking target and source storage type. The behavior is the same if you execute INSERT INTO dest_table SELECT * FROM source_table. Basically, if the source_table attribute has extended storage and the target table has plain storage, still the value will be inserted directly into the target table without any conversion. However, in the table, you can insert the new tuple and that will be stored as per the new storage method so that is still fine but I don't know any use case for the materialized view. Now I am thinking what should be the behavior for the materialized view? For the materialized view can we have the same behavior as storage? I think for the built-in compression method that might not be a problem but for the external compression method how can we handle the dependency, I mean when the materialized view has created the table was having an external compression method "cm1" and we have created the materialized view based on that now if we alter table and set the new compression method and force table rewrite then what will happen to the tuple inside the materialized view, I mean tuple is still compressed with "cm1" and there is no attribute is maintaining the dependency on "cm1" because the materialized view can point to any compression method. Now if we drop the cm1 it will be allowed to drop. So I think for the compression method we can consider the materialized view same as the table, I mean we can allow setting the compression method for the materialized view and we can always ensure that all the tuple in this view is compressed with the current or the preserved compression methods. So whenever we are inserting in the materialized view then we should compare the datum compression method with the target compression method. > + /* > + * Use default compression method if the existing compression method is > + * invalid but the new storage type is non plain storage. > + */ > + if (!OidIsValid(attrtuple->attcompression) && > + (newstorage != TYPSTORAGE_PLAIN)) > + attrtuple->attcompression = DefaultCompressionOid; > > You have a few too many parens in there. 
> > I don't see a particularly good reason to treat plain and external > differently. Yeah, I think they should be treated the same. More generally, I think there's a question here about > when we need an attribute to have a valid compression type and when we > don't. If typstorage is plan or external, then there's no point in > ever having a compression type and maybe we should even reject > attempts to set one (but I'm not sure). I agree. > However, the attstorage is a > different case. Suppose the column is created with extended storage > and then later it's changed to plain. That's only a hint, so there may > still be toasted values in that column, so the compression setting > must endure. At any rate, we need to make sure we have clear and > sensible rules for when attcompression (a) must be valid, (b) may be > valid, and (c) must be invalid. And those rules need to at least be > documented in the comments, and maybe in the SGML docs. IIUC, even if we change the attstorage the existing tuples are stored as it is without changing the tuple storage. So I think even if the attstorage is changed the attcompression should not have any change. After observing this behavior of storage I tend to think that for built-in compression methods also we should have the same behavior, I mean if the tuple is compressed with one of the built-in compression methods and if we are altering the compression method or we are doing INSERT INTO SELECT to the target field having a different compression method then we should not rewrite/decompress those tuples. Basically, I mean to say that the built-in compression methods can always be treated as PRESERVE because those can not be dropped. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 1, 2020 at 4:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > > While working on this comment I have doubts. > > > I wonder in passing about TOAST tables and materialized views, which > > are the other things that have storage. What gets stored for > > attcompression? For a TOAST table it probably doesn't matter much > > since TOAST table entries shouldn't ever be toasted themselves, so > > anything that doesn't crash is fine (but maybe we should test that > > trying to alter the compression properties of a TOAST table doesn't > > crash, for example). > > Yeah for the toast table it doesn't matter, but I am not sure what do > you mean by altering the compression method for the toast table. Do you > mean manually update the pg_attribute tuple for the toast table and > set different compression methods? Or there is some direct way to > alter the toast table? > > For a materialized view it seems reasonable to > > want to set column properties, but I'm not quite sure how that works > > today for things like STORAGE anyway. If we do allow setting STORAGE > > or COMPRESSION for materialized view columns then dump-and-reload > > needs to preserve the values. > > I see that we allow setting the STORAGE for the materialized view but > I am not sure what is the use case. Basically, the tuples are > directly getting selected from the host table and inserted in the > materialized view without checking target and source storage type. > The behavior is the same if you execute INSERT INTO dest_table SELECT > * FROM source_table. Basically, if the source_table attribute has > extended storage and the target table has plain storage, still the > value will be inserted directly into the target table without any > conversion. However, in the table, you can insert the new tuple and > that will be stored as per the new storage method so that is still > fine but I don't know any use case for the materialized view. Now I am > thinking what should be the behavior for the materialized view? > > For the materialized view can we have the same behavior as storage? I > think for the built-in compression method that might not be a problem > but for the external compression method how can we handle the > dependency, I mean when the materialized view has created the table > was having an external compression method "cm1" and we have created > the materialized view based on that now if we alter table and set the > new compression method and force table rewrite then what will happen > to the tuple inside the materialized view, I mean tuple is still > compressed with "cm1" and there is no attribute is maintaining the > dependency on "cm1" because the materialized view can point to any > compression method. Now if we drop the cm1 it will be allowed to > drop. So I think for the compression method we can consider the > materialized view same as the table, I mean we can allow setting the > compression method for the materialized view and we can always ensure > that all the tuple in this view is compressed with the current or the > preserved compression methods. So whenever we are inserting in the > materialized view then we should compare the datum compression method > with the target compression method. > > > > + /* > > + * Use default compression method if the existing compression method is > > + * invalid but the new storage type is non plain storage. 
> > + */ > > + if (!OidIsValid(attrtuple->attcompression) && > > + (newstorage != TYPSTORAGE_PLAIN)) > > + attrtuple->attcompression = DefaultCompressionOid; > > > > You have a few too many parens in there. > > > > I don't see a particularly good reason to treat plain and external > > differently. > > Yeah, I think they should be treated the same. > > More generally, I think there's a question here about > > when we need an attribute to have a valid compression type and when we > > don't. If typstorage is plan or external, then there's no point in > > ever having a compression type and maybe we should even reject > > attempts to set one (but I'm not sure). > > I agree. > > > However, the attstorage is a > > different case. Suppose the column is created with extended storage > > and then later it's changed to plain. That's only a hint, so there may > > still be toasted values in that column, so the compression setting > > must endure. At any rate, we need to make sure we have clear and > > sensible rules for when attcompression (a) must be valid, (b) may be > > valid, and (c) must be invalid. And those rules need to at least be > > documented in the comments, and maybe in the SGML docs. > > IIUC, even if we change the attstorage the existing tuples are stored > as it is without changing the tuple storage. So I think even if the > attstorage is changed the attcompression should not have any change. > I have put some more thought into this and IMHO the rules should be as below 1. If attstorage is EXTENDED -> attcompression "must be valid" 2. if attstorage is PLAIN/EXTERNAL -> atttcompression "maybe valid" 3. if typstorage is PLAIN/EXTERNAL -> atttcompression "must be invalid" I am a little bit confused about (2), basically, it will be valid in the scenario u mentioned that change the atttstorege from EXTENDED to PLAIN/EXTERNAL. But I think in this case also we can just set the attcompression to invalid, however, we have to maintain the dependency between attribute and compression method so that the old methods using which we might have compressed a few tuples in the table doesn't get dropped. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 1, 2020 at 9:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Dec 1, 2020 at 4:50 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > While working on this comment I have doubts. > > > > > I wonder in passing about TOAST tables and materialized views, which > > > are the other things that have storage. What gets stored for > > > attcompression? For a TOAST table it probably doesn't matter much > > > since TOAST table entries shouldn't ever be toasted themselves, so > > > anything that doesn't crash is fine (but maybe we should test that > > > trying to alter the compression properties of a TOAST table doesn't > > > crash, for example). > > > > Yeah for the toast table it doesn't matter, but I am not sure what do > > you mean by altering the compression method for the toast table. Do you > > mean manually update the pg_attribute tuple for the toast table and > > set different compression methods? Or there is some direct way to > > alter the toast table? > > > > For a materialized view it seems reasonable to > > > want to set column properties, but I'm not quite sure how that works > > > today for things like STORAGE anyway. If we do allow setting STORAGE > > > or COMPRESSION for materialized view columns then dump-and-reload > > > needs to preserve the values. > > > > I see that we allow setting the STORAGE for the materialized view but > > I am not sure what is the use case. Basically, the tuples are > > directly getting selected from the host table and inserted in the > > materialized view without checking target and source storage type. > > The behavior is the same if you execute INSERT INTO dest_table SELECT > > * FROM source_table. Basically, if the source_table attribute has > > extended storage and the target table has plain storage, still the > > value will be inserted directly into the target table without any > > conversion. However, in the table, you can insert the new tuple and > > that will be stored as per the new storage method so that is still > > fine but I don't know any use case for the materialized view. Now I am > > thinking what should be the behavior for the materialized view? > > > > For the materialized view can we have the same behavior as storage? I > > think for the built-in compression method that might not be a problem > > but for the external compression method how can we handle the > > dependency, I mean when the materialized view has created the table > > was having an external compression method "cm1" and we have created > > the materialized view based on that now if we alter table and set the > > new compression method and force table rewrite then what will happen > > to the tuple inside the materialized view, I mean tuple is still > > compressed with "cm1" and there is no attribute is maintaining the > > dependency on "cm1" because the materialized view can point to any > > compression method. Now if we drop the cm1 it will be allowed to > > drop. So I think for the compression method we can consider the > > materialized view same as the table, I mean we can allow setting the > > compression method for the materialized view and we can always ensure > > that all the tuple in this view is compressed with the current or the > > preserved compression methods. So whenever we are inserting in the > > materialized view then we should compare the datum compression method > > with the target compression method. 
As per the offlist discussion with Robert, for materialized/table we will always compress the value as per the target attribute compression method. So if we are creating/refreshing the materialized view and the attcompression for the target attribute is different than the source table then we will decompress it and then compress it back as per the target table/view. > > > > > > > + /* > > > + * Use default compression method if the existing compression method is > > > + * invalid but the new storage type is non plain storage. > > > + */ > > > + if (!OidIsValid(attrtuple->attcompression) && > > > + (newstorage != TYPSTORAGE_PLAIN)) > > > + attrtuple->attcompression = DefaultCompressionOid; > > > > > > You have a few too many parens in there. > > > > > > I don't see a particularly good reason to treat plain and external > > > differently. > > > > Yeah, I think they should be treated the same. > > > > More generally, I think there's a question here about > > > when we need an attribute to have a valid compression type and when we > > > don't. If typstorage is plan or external, then there's no point in > > > ever having a compression type and maybe we should even reject > > > attempts to set one (but I'm not sure). > > > > I agree. > > > > > However, the attstorage is a > > > different case. Suppose the column is created with extended storage > > > and then later it's changed to plain. That's only a hint, so there may > > > still be toasted values in that column, so the compression setting > > > must endure. At any rate, we need to make sure we have clear and > > > sensible rules for when attcompression (a) must be valid, (b) may be > > > valid, and (c) must be invalid. And those rules need to at least be > > > documented in the comments, and maybe in the SGML docs. > > > > IIUC, even if we change the attstorage the existing tuples are stored > > as it is without changing the tuple storage. So I think even if the > > attstorage is changed the attcompression should not have any change. > > > > I have put some more thought into this and IMHO the rules should be as below > > 1. If attstorage is EXTENDED -> attcompression "must be valid" > 2. if attstorage is PLAIN/EXTERNAL -> atttcompression "maybe valid" > 3. if typstorage is PLAIN/EXTERNAL -> atttcompression "must be invalid" > > I am a little bit confused about (2), basically, it will be valid in > the scenario u mentioned that change the atttstorege from EXTENDED to > PLAIN/EXTERNAL. But I think in this case also we can just set the > attcompression to invalid, however, we have to maintain the dependency > between attribute and compression method so that the old methods using > which we might have compressed a few tuples in the table doesn't get > dropped. For this also I had an offlist discussion with Robert and we decided that it make sense to always have a valid compression method stored in the attribute if the attribute type is compressible irrespective of what is the current attribute storage. For example, if the attribute type is varchar then it will always have a valid compression method, it does not matter even if the att storage is plain or external. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
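A sketch of the recompression decision described above, as it might be applied while storing a value into a target attribute (toast_get_compression_oid() and the two-argument toast_compress_datum() are assumptions based on this thread, not settled interfaces; handling of externally stored values is glossed over):

static Datum
recompress_for_target_attribute(Datum value, Form_pg_attribute att)
{
    struct varlena *val = (struct varlena *) DatumGetPointer(value);

    /* only an inline-compressed datum can disagree with the target's method */
    if (VARATT_IS_COMPRESSED(val) &&
        toast_get_compression_oid(val) != att->attcompression)
    {
        /* decompress with the old method ... */
        struct varlena *raw = detoast_attr(val);

        /* ... and recompress with the target attribute's method */
        return toast_compress_datum(PointerGetDatum(raw), att->attcompression);
    }

    return value;
}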
On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > There were a few problems in this rebased version, basically, the > > compression options were not passed while compressing values from the > > brin_form_tuple, so I have fixed this. > > Since the authorship history of this patch is complicated, it would be > nice if you would include authorship information and relevant > "Discussion" links in the patches. I have added that. > Design level considerations and overall notes: > > configure is autogenerated from configure.in, so the patch shouldn't > include changes only to the former. Yeah, I missed those changes. Done now. > Looking over the changes to src/include: > > + PGLZ_COMPRESSION_ID, > + LZ4_COMPRESSION_ID > > I think that it would be good to assign values to these explicitly. Done > +/* compresion handler routines */ > > Spelling. Done > + /* compression routine for the compression method */ > + cmcompress_function cmcompress; > + > + /* decompression routine for the compression method */ > + cmcompress_function cmdecompress; > > Don't reuse cmcompress_function; that's confusing. Just have a typedef > per structure member, even if they end up being the same. Fixed as suggested > #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ > - (((toast_compress_header *) (ptr))->rawsize = (len)) > +do { \ > + Assert(len > 0 && len <= RAWSIZEMASK); \ > + ((toast_compress_header *) (ptr))->info = (len); \ > +} while (0) > > Indentation. Done > +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \ > + ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30); > > What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument? > And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or > something? It seems not great to have separate functions each setting > part of a 4-byte quantity. Too much chance of failing to set both > parts. I guess you've got a function called > toast_set_compressed_datum_info() for that, but it's just a wrapper > around two macros that could just be combined, which would reduce > complexity overall. Done that way > + T_CompressionRoutine, /* in access/compressionapi.h */ > > This looks misplaced. I guess it should go just after these: > > T_FdwRoutine, /* in foreign/fdwapi.h */ > T_IndexAmRoutine, /* in access/amapi.h */ > T_TableAmRoutine, /* in access/tableam.h */ Done > Looking over the regression test changes: > > The tests at the top of create_cm.out that just test that we can > create tables with various storage types seem unrelated to the purpose > of the patch. And the file doesn't test creating a compression method > either, as the file name would suggest, so either the file name needs > to be changed (compression, compression_method?) or the tests don't go > here. Changed to "compression" > +-- check data is okdd > > I guess whoever is responsible for this comment prefers vi to emacs. Fixed > I don't quite understand the purpose of all of these tests, and there > are some things that I feel like ought to be tested that seemingly > aren't. Like, you seem to test using an UPDATE to move a datum from a > table to another table with the same compression method, but not one > with a different compression method. Added test for this, and some other tests to improve overall coverage. Testing the former is nice and > everything, but that's the easy case: I think we also need to test the > latter. 
I think it would be good to verify not only that the data is > readable but that it's compressed the way we expect. I think it would > be a great idea to add a pg_column_compression() function in a similar > spirit to pg_column_size(). Perhaps it could return NULL when > compression is not in use or the data type is not varlena, and the > name of the compression method otherwise. That would allow for better > testing of this feature, and it would also be useful to users who are > switching methods, to see what data they still have that's using the > old method. It could be useful for debugging problems on customer > systems, too. This is a really great idea, I have added this function and used in my test. > I wonder if we need a test that moves data between tables through an > intermediary. For instance, suppose a plpgsql function or DO block > fetches some data and stores it in a plpgsql variable and then uses > the variable to insert into another table. Hmm, maybe that would force > de-TOASTing. But perhaps there are other cases. Maybe a more general > way to approach the problem is: have you tried running a coverage > report and checked which parts of your code are getting exercised by > the existing tests and which parts are not? The stuff that isn't, we > should try to add more tests. It's easy to get corner cases wrong with > this kind of thing. > > I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at > least not by 0001, which reinforces my feeling that the tests here are > not as thorough as they could be. Added test for this as well. > +NOTICE: pg_compression contains unpinned initdb-created object(s) > This seems wrong to me - why is it OK? Yeah, this is wrong, now fixed. > - result = (struct varlena *) > - palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > - SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > + cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr)); > > - if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr), > - TOAST_COMPRESS_SIZE(attr), > - VARDATA(result), > - > TOAST_COMPRESS_RAWSIZE(attr), true) < 0) > - elog(ERROR, "compressed data is corrupted"); > + /* get compression method handler routines */ > + cmroutine = GetCompressionRoutine(cmoid); > > - return result; > + return cmroutine->cmdecompress(attr); > > I'm worried about how expensive this might be, and I think we could > make it cheaper. The reason why I think this might be expensive is: > currently, for every datum, you have a single direct function call. > Now, with this, you first have a direct function call to > GetCompressionOidFromCompressionId(). Then you have a call to > GetCompressionRoutine(), which does a syscache lookup and calls a > handler function, which is quite a lot more expensive than a single > function call. And the handler isn't even returning a statically > allocated structure, but is allocating new memory every time, which > involves more function calls and maybe memory leaks. Then you use the > results of all that to make an indirect function call. > > I'm not sure exactly what combination of things we could use to make > this better, but it seems like there are a few possibilities: > > (1) The handler function could return a pointer to the same > CompressionRoutine every time instead of constructing a new one every > time. > (2) The CompressionRoutine to which the handler function returns a > pointer could be statically allocated instead of being built at > runtime. 
> (3) GetCompressionRoutine could have an OID -> handler cache instead > of relying on syscache + calling the handler function all over again. > (4) For the compression types that have dedicated bit patterns in the > high bits of the compressed TOAST size, toast_compress_datum() could > just have hard-coded logic to use the correct handlers instead of > translating the bit pattern into an OID and then looking it up over > again. > (5) Going even further than #4 we could skip the handler layer > entirely for such methods, and just call the right function directly. > I think we should definitely do (1), and also (2) unless there's some > reason it's hard. (3) doesn't need to be part of this patch, but might > be something to consider later in the series. It's possible that it > doesn't have enough benefit to be worth the work, though. Also, I > think we should do either (4) or (5). I have a mild preference for (5) > unless it looks too ugly. > Note that I'm not talking about hard-coding a fast path for a > hard-coded list of OIDs - which would seem a little bit unprincipled - > but hard-coding a fast path for the bit patterns that are themselves > hard-coded. I don't think we lose anything in terms of extensibility > or even-handedness there; it's just avoiding a bunch of rigamarole > that doesn't really buy us anything. > > All these points apply equally to toast_decompress_datum_slice() and > toast_compress_datum(). Fixed as discussed at [1] > + /* Fallback to default compression method, if not specified */ > + if (!OidIsValid(cmoid)) > + cmoid = DefaultCompressionOid; > > I think that the caller should be required to specify a legal value, > and this should be an elog(ERROR) or an Assert(). > > The change to equalTupleDescs() makes me wonder. Like, can we specify > the compression method for a function parameter, or a function return > value? I would think not. But then how are the tuple descriptors set > up in that case? Under what circumstances do we actually need the > tuple descriptors to compare unequal? If we alter the compression method then we check whether we need to rebuild the tuple descriptor or not based on what value is changed so if the attribute compression method is changed we need to rebuild the compression method right. You might say that in the first patch we are not allowing altering the compression method so we might move this to the second patch but I thought since we added this field to pg_attribute in this patch then better to add this check as well. What am I missing? > lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too. > > I wonder if we should try to adopt a convention for the names of these > files that isn't just the compression method name, like cmlz4 or > compress_lz4. I kind of like the latter one. I am a little worried > that just calling it lz4.c will result in name collisions later - not > in this directory, of course, but elsewhere in the system. It's not a > disaster if that happens, but for example verbose error reports print > the file name, so it's nice if it's unambiguous. Changed to compress_lz4. > + if (!IsBinaryUpgrade && > + (relkind == RELKIND_RELATION || > + relkind == RELKIND_PARTITIONED_TABLE)) > + attr->attcompression = > + > GetAttributeCompressionMethod(attr, colDef->compression); > + else > + attr->attcompression = InvalidOid; > > Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If > upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID. 
> Otherwise, we need to preserve whatever value was present in the old > version. Or am I confused here? Okay, so I think we can simply remove the IsBinaryUpgrade check so it will behave as expected. Basically, now it the compression method is specified then it will take that compression method and if it is not specified then it will take the PGLZ_COMPRESSION_OID. > I think there should be tests for the way this interacts with > partitioning, and I think the intended interaction should be > documented. Perhaps it should behave like TABLESPACE, where the parent > property has no effect on what gets stored because the parent has no > storage, but is inherited by each new child. I have added the test for this and also documented the same. > I wonder in passing about TOAST tables and materialized views, which > are the other things that have storage. What gets stored for > attcompression? I have changed this to store the Invalid compression method always. For a TOAST table it probably doesn't matter much > since TOAST table entries shouldn't ever be toasted themselves, so > anything that doesn't crash is fine (but maybe we should test that > trying to alter the compression properties of a TOAST table doesn't > crash, for example). You mean to update the pg_attribute table for the toasted field (e.g chunk_data) and set the attcompression to something valid? Or there is a better way to write this test? For a materialized view it seems reasonable to > want to set column properties, but I'm not quite sure how that works > today for things like STORAGE anyway. If we do allow setting STORAGE > or COMPRESSION for materialized view columns then dump-and-reload > needs to preserve the values. Fixed as described as [2] > + /* > + * Use default compression method if the existing compression method is > + * invalid but the new storage type is non plain storage. > + */ > + if (!OidIsValid(attrtuple->attcompression) && > + (newstorage != TYPSTORAGE_PLAIN)) > + attrtuple->attcompression = DefaultCompressionOid; > > You have a few too many parens in there. Fixed > I don't see a particularly good reason to treat plain and external > differently. More generally, I think there's a question here about > when we need an attribute to have a valid compression type and when we > don't. If typstorage is plan or external, then there's no point in > ever having a compression type and maybe we should even reject > attempts to set one (but I'm not sure). However, the attstorage is a > different case. Suppose the column is created with extended storage > and then later it's changed to plain. That's only a hint, so there may > still be toasted values in that column, so the compression setting > must endure. At any rate, we need to make sure we have clear and > sensible rules for when attcompression (a) must be valid, (b) may be > valid, and (c) must be invalid. And those rules need to at least be > documented in the comments, and maybe in the SGML docs. > > I'm out of time for today, so I'll have to look at this more another > day. Hope this helps for a start. Fixed as I have described at [2], and the rules are documented in pg_attribute.h (atop attcompression field) [1] https://www.postgresql.org/message-id/CA%2BTgmob3W8cnLgOQX%2BJQzeyGN3eKGmRrBkUY6WGfNyHa%2Bt_qEw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAFiTN-tzTTT2oqWdRGLv1dvvS5MC1W%2BLE%2B3bqWPJUZj4GnHOJg%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
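For readers following along, a rough sketch of what the pg_column_compression() function added above might look like (a complete version would also need to handle short-header varlenas, external TOAST pointers and non-varlena arguments; TOAST_COMPRESS_METHOD and the *_COMPRESSION_ID values follow the patch, the rest is illustrative):

Datum
pg_column_compression(PG_FUNCTION_ARGS)
{
    struct varlena *attr = (struct varlena *) PG_GETARG_POINTER(0);
    const char *method = NULL;

    if (!VARATT_IS_COMPRESSED(attr))
        PG_RETURN_NULL();

    switch (TOAST_COMPRESS_METHOD(attr))
    {
        case PGLZ_COMPRESSION_ID:
            method = "pglz";
            break;
        case LZ4_COMPRESSION_ID:
            method = "lz4";
            break;
        default:
            elog(ERROR, "invalid compression method id %d",
                 TOAST_COMPRESS_METHOD(attr));
    }

    PG_RETURN_TEXT_P(cstring_to_text(method));
}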
On Wed, Dec 9, 2020 at 5:37 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > There were a few problems in this rebased version, basically, the > > > compression options were not passed while compressing values from the > > > brin_form_tuple, so I have fixed this. > > > > Since the authorship history of this patch is complicated, it would be > > nice if you would include authorship information and relevant > > "Discussion" links in the patches. > > I have added that. > > > Design level considerations and overall notes: > > > > configure is autogenerated from configure.in, so the patch shouldn't > > include changes only to the former. > > Yeah, I missed those changes. Done now. > > > Looking over the changes to src/include: > > > > + PGLZ_COMPRESSION_ID, > > + LZ4_COMPRESSION_ID > > > > I think that it would be good to assign values to these explicitly. > > Done > > > +/* compresion handler routines */ > > > > Spelling. > > Done > > > + /* compression routine for the compression method */ > > + cmcompress_function cmcompress; > > + > > + /* decompression routine for the compression method */ > > + cmcompress_function cmdecompress; > > > > Don't reuse cmcompress_function; that's confusing. Just have a typedef > > per structure member, even if they end up being the same. > > Fixed as suggested > > > #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ > > - (((toast_compress_header *) (ptr))->rawsize = (len)) > > +do { \ > > + Assert(len > 0 && len <= RAWSIZEMASK); \ > > + ((toast_compress_header *) (ptr))->info = (len); \ > > +} while (0) > > > > Indentation. > > Done > > > +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \ > > + ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30); > > > > What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument? > > And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or > > something? It seems not great to have separate functions each setting > > part of a 4-byte quantity. Too much chance of failing to set both > > parts. I guess you've got a function called > > toast_set_compressed_datum_info() for that, but it's just a wrapper > > around two macros that could just be combined, which would reduce > > complexity overall. > > Done that way > > > + T_CompressionRoutine, /* in access/compressionapi.h */ > > > > This looks misplaced. I guess it should go just after these: > > > > T_FdwRoutine, /* in foreign/fdwapi.h */ > > T_IndexAmRoutine, /* in access/amapi.h */ > > T_TableAmRoutine, /* in access/tableam.h */ > > Done > > > Looking over the regression test changes: > > > > The tests at the top of create_cm.out that just test that we can > > create tables with various storage types seem unrelated to the purpose > > of the patch. And the file doesn't test creating a compression method > > either, as the file name would suggest, so either the file name needs > > to be changed (compression, compression_method?) or the tests don't go > > here. > > Changed to "compression" > > > +-- check data is okdd > > > > I guess whoever is responsible for this comment prefers vi to emacs. > > Fixed > > > I don't quite understand the purpose of all of these tests, and there > > are some things that I feel like ought to be tested that seemingly > > aren't. 
Like, you seem to test using an UPDATE to move a datum from a > > table to another table with the same compression method, but not one > > with a different compression method. > > Added test for this, and some other tests to improve overall coverage. > > Testing the former is nice and > > everything, but that's the easy case: I think we also need to test the > > latter. I think it would be good to verify not only that the data is > > readable but that it's compressed the way we expect. I think it would > > be a great idea to add a pg_column_compression() function in a similar > > spirit to pg_column_size(). Perhaps it could return NULL when > > compression is not in use or the data type is not varlena, and the > > name of the compression method otherwise. That would allow for better > > testing of this feature, and it would also be useful to users who are > > switching methods, to see what data they still have that's using the > > old method. It could be useful for debugging problems on customer > > systems, too. > > This is a really great idea, I have added this function and used in my test. > > > I wonder if we need a test that moves data between tables through an > > intermediary. For instance, suppose a plpgsql function or DO block > > fetches some data and stores it in a plpgsql variable and then uses > > the variable to insert into another table. Hmm, maybe that would force > > de-TOASTing. But perhaps there are other cases. Maybe a more general > > way to approach the problem is: have you tried running a coverage > > report and checked which parts of your code are getting exercised by > > the existing tests and which parts are not? The stuff that isn't, we > > should try to add more tests. It's easy to get corner cases wrong with > > this kind of thing. > > > > I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at > > least not by 0001, which reinforces my feeling that the tests here are > > not as thorough as they could be. > > Added test for this as well. > > > +NOTICE: pg_compression contains unpinned initdb-created object(s) > > > This seems wrong to me - why is it OK? > > Yeah, this is wrong, now fixed. > > > - result = (struct varlena *) > > - palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > > - SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > > + cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr)); > > > > - if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr), > > - TOAST_COMPRESS_SIZE(attr), > > - VARDATA(result), > > - > > TOAST_COMPRESS_RAWSIZE(attr), true) < 0) > > - elog(ERROR, "compressed data is corrupted"); > > + /* get compression method handler routines */ > > + cmroutine = GetCompressionRoutine(cmoid); > > > > - return result; > > + return cmroutine->cmdecompress(attr); > > > > I'm worried about how expensive this might be, and I think we could > > make it cheaper. The reason why I think this might be expensive is: > > currently, for every datum, you have a single direct function call. > > Now, with this, you first have a direct function call to > > GetCompressionOidFromCompressionId(). Then you have a call to > > GetCompressionRoutine(), which does a syscache lookup and calls a > > handler function, which is quite a lot more expensive than a single > > function call. And the handler isn't even returning a statically > > allocated structure, but is allocating new memory every time, which > > involves more function calls and maybe memory leaks. 
Then you use the > > results of all that to make an indirect function call. > > > > I'm not sure exactly what combination of things we could use to make > > this better, but it seems like there are a few possibilities: > > > > (1) The handler function could return a pointer to the same > > CompressionRoutine every time instead of constructing a new one every > > time. > > (2) The CompressionRoutine to which the handler function returns a > > pointer could be statically allocated instead of being built at > > runtime. > > (3) GetCompressionRoutine could have an OID -> handler cache instead > > of relying on syscache + calling the handler function all over again. > > (4) For the compression types that have dedicated bit patterns in the > > high bits of the compressed TOAST size, toast_compress_datum() could > > just have hard-coded logic to use the correct handlers instead of > > translating the bit pattern into an OID and then looking it up over > > again. > > (5) Going even further than #4 we could skip the handler layer > > entirely for such methods, and just call the right function directly. > > I think we should definitely do (1), and also (2) unless there's some > > reason it's hard. (3) doesn't need to be part of this patch, but might > > be something to consider later in the series. It's possible that it > > doesn't have enough benefit to be worth the work, though. Also, I > > think we should do either (4) or (5). I have a mild preference for (5) > > unless it looks too ugly. > > Note that I'm not talking about hard-coding a fast path for a > > hard-coded list of OIDs - which would seem a little bit unprincipled - > > but hard-coding a fast path for the bit patterns that are themselves > > hard-coded. I don't think we lose anything in terms of extensibility > > or even-handedness there; it's just avoiding a bunch of rigamarole > > that doesn't really buy us anything. > > > > All these points apply equally to toast_decompress_datum_slice() and > > toast_compress_datum(). > > Fixed as discussed at [1] > > > + /* Fallback to default compression method, if not specified */ > > + if (!OidIsValid(cmoid)) > > + cmoid = DefaultCompressionOid; > > > > I think that the caller should be required to specify a legal value, > > and this should be an elog(ERROR) or an Assert(). > > > > The change to equalTupleDescs() makes me wonder. Like, can we specify > > the compression method for a function parameter, or a function return > > value? I would think not. But then how are the tuple descriptors set > > up in that case? Under what circumstances do we actually need the > > tuple descriptors to compare unequal? > > If we alter the compression method then we check whether we need to > rebuild the tuple descriptor or not based on what value is changed so > if the attribute compression method is changed we need to rebuild the > compression method right. You might say that in the first patch we > are not allowing altering the compression method so we might move this > to the second patch but I thought since we added this field to > pg_attribute in this patch then better to add this check as well. > What am I missing? > > > lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too. > > > > I wonder if we should try to adopt a convention for the names of these > > files that isn't just the compression method name, like cmlz4 or > > compress_lz4. I kind of like the latter one. 
I am a little worried > > that just calling it lz4.c will result in name collisions later - not > > in this directory, of course, but elsewhere in the system. It's not a > > disaster if that happens, but for example verbose error reports print > > the file name, so it's nice if it's unambiguous. > > Changed to compress_lz4. > > > + if (!IsBinaryUpgrade && > > + (relkind == RELKIND_RELATION || > > + relkind == RELKIND_PARTITIONED_TABLE)) > > + attr->attcompression = > > + > > GetAttributeCompressionMethod(attr, colDef->compression); > > + else > > + attr->attcompression = InvalidOid; > > > > Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If > > upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID. > > Otherwise, we need to preserve whatever value was present in the old > > version. Or am I confused here? > > Okay, so I think we can simply remove the IsBinaryUpgrade check so it > will behave as expected. Basically, now it the compression method is > specified then it will take that compression method and if it is not > specified then it will take the PGLZ_COMPRESSION_OID. > > > I think there should be tests for the way this interacts with > > partitioning, and I think the intended interaction should be > > documented. Perhaps it should behave like TABLESPACE, where the parent > > property has no effect on what gets stored because the parent has no > > storage, but is inherited by each new child. > > I have added the test for this and also documented the same. > > > I wonder in passing about TOAST tables and materialized views, which > > are the other things that have storage. What gets stored for > > attcompression? > > I have changed this to store the Invalid compression method always. > > For a TOAST table it probably doesn't matter much > > since TOAST table entries shouldn't ever be toasted themselves, so > > anything that doesn't crash is fine (but maybe we should test that > > trying to alter the compression properties of a TOAST table doesn't > > crash, for example). > > You mean to update the pg_attribute table for the toasted field (e.g > chunk_data) and set the attcompression to something valid? Or there > is a better way to write this test? > > For a materialized view it seems reasonable to > > want to set column properties, but I'm not quite sure how that works > > today for things like STORAGE anyway. If we do allow setting STORAGE > > or COMPRESSION for materialized view columns then dump-and-reload > > needs to preserve the values. > > Fixed as described as [2] > > > + /* > > + * Use default compression method if the existing compression method is > > + * invalid but the new storage type is non plain storage. > > + */ > > + if (!OidIsValid(attrtuple->attcompression) && > > + (newstorage != TYPSTORAGE_PLAIN)) > > + attrtuple->attcompression = DefaultCompressionOid; > > > > You have a few too many parens in there. > > Fixed > > > I don't see a particularly good reason to treat plain and external > > differently. More generally, I think there's a question here about > > when we need an attribute to have a valid compression type and when we > > don't. If typstorage is plan or external, then there's no point in > > ever having a compression type and maybe we should even reject > > attempts to set one (but I'm not sure). However, the attstorage is a > > different case. Suppose the column is created with extended storage > > and then later it's changed to plain. 
That's only a hint, so there may > > still be toasted values in that column, so the compression setting > > must endure. At any rate, we need to make sure we have clear and > > sensible rules for when attcompression (a) must be valid, (b) may be > > valid, and (c) must be invalid. And those rules need to at least be > > documented in the comments, and maybe in the SGML docs. > > > > I'm out of time for today, so I'll have to look at this more another > > day. Hope this helps for a start. > > Fixed as I have described at [2], and the rules are documented in > pg_attribute.h (atop attcompression field) > > [1] https://www.postgresql.org/message-id/CA%2BTgmob3W8cnLgOQX%2BJQzeyGN3eKGmRrBkUY6WGfNyHa%2Bt_qEw%40mail.gmail.com > [2] https://www.postgresql.org/message-id/CAFiTN-tzTTT2oqWdRGLv1dvvS5MC1W%2BLE%2B3bqWPJUZj4GnHOJg%40mail.gmail.com > I was working on analyzing the behavior of how the attribute merging should work for the compression method for an inherited child so for that, I was analyzing the behavior for the storage method. I found some behavior that doesn't seem right. Basically, while creating the inherited child we don't allow the storage to be different than the parent attribute's storage but later we are allowed to alter that, is that correct behavior. Here is the test case to demonstrate this. postgres[12546]=# create table t (a varchar); postgres[12546]=# alter table t ALTER COLUMN a SET STORAGE plain; postgres[12546]=# create table t1 (a varchar); postgres[12546]=# alter table t1 ALTER COLUMN a SET STORAGE external; /* Not allowing to set the external because parent attribute has plain */ postgres[12546]=# create table t2 (LIKE t1 INCLUDING STORAGE) INHERITS ( t); NOTICE: 00000: merging column "a" with inherited definition LOCATION: MergeAttributes, tablecmds.c:2685 ERROR: 42804: column "a" has a storage parameter conflict DETAIL: PLAIN versus EXTERNAL LOCATION: MergeAttributes, tablecmds.c:2730 postgres[12546]=# create table t2 (LIKE t1 ) INHERITS (t); /* But you can alter now */ postgres[12546]=# alter TABLE t2 ALTER COLUMN a SET STORAGE EXTERNAL ; postgres[12546]=# \d+ t Table "public.t" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+-------------------+-----------+----------+---------+---------+-------------+--------------+------------- a | character varying | | | | plain | pglz | | Child tables: t2 Access method: heap postgres[12546]=# \d+ t2 Table "public.t2" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+-------------------+-----------+----------+---------+----------+-------------+--------------+------------- a | character varying | | | | external | pglz | | Inherits: t Access method: heap -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
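One way to read points (1) and (2) of the review quoted above is that the handler should hand out a single statically allocated routine struct instead of building a node on every call. A rough C sketch, reusing names quoted in this thread (CompressionRoutine, pglzhandler); the exact member list, the header path, and the pglz callbacks are assumptions, not the final patch:

    #include "postgres.h"
    #include "access/compressionapi.h"   /* assumed header from the patch */
    #include "fmgr.h"

    /*
     * pglz_cmcompress()/pglz_cmdecompress() are assumed to be the pglz
     * callbacks defined earlier in the same file; other callbacks are
     * omitted for brevity.
     */
    static const CompressionRoutine pglz_compress_methods = {
        .type = T_CompressionRoutine,
        .cmcompress = pglz_cmcompress,
        .cmdecompress = pglz_cmdecompress
    };

    Datum
    pglzhandler(PG_FUNCTION_ARGS)
    {
        /* no palloc, no per-call construction: just return the static struct */
        PG_RETURN_POINTER(&pglz_compress_methods);
    }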
On Thu, Dec 17, 2020 at 10:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Dec 9, 2020 at 5:37 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sat, Nov 21, 2020 at 3:50 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > There were a few problems in this rebased version, basically, the > > > > compression options were not passed while compressing values from the > > > > brin_form_tuple, so I have fixed this. > > > > > > Since the authorship history of this patch is complicated, it would be > > > nice if you would include authorship information and relevant > > > "Discussion" links in the patches. > > > > I have added that. > > > > > Design level considerations and overall notes: > > > > > > configure is autogenerated from configure.in, so the patch shouldn't > > > include changes only to the former. > > > > Yeah, I missed those changes. Done now. > > > > > Looking over the changes to src/include: > > > > > > + PGLZ_COMPRESSION_ID, > > > + LZ4_COMPRESSION_ID > > > > > > I think that it would be good to assign values to these explicitly. > > > > Done > > > > > +/* compresion handler routines */ > > > > > > Spelling. > > > > Done > > > > > + /* compression routine for the compression method */ > > > + cmcompress_function cmcompress; > > > + > > > + /* decompression routine for the compression method */ > > > + cmcompress_function cmdecompress; > > > > > > Don't reuse cmcompress_function; that's confusing. Just have a typedef > > > per structure member, even if they end up being the same. > > > > Fixed as suggested > > > > > #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \ > > > - (((toast_compress_header *) (ptr))->rawsize = (len)) > > > +do { \ > > > + Assert(len > 0 && len <= RAWSIZEMASK); \ > > > + ((toast_compress_header *) (ptr))->info = (len); \ > > > +} while (0) > > > > > > Indentation. > > > > Done > > > > > +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \ > > > + ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30); > > > > > > What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument? > > > And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or > > > something? It seems not great to have separate functions each setting > > > part of a 4-byte quantity. Too much chance of failing to set both > > > parts. I guess you've got a function called > > > toast_set_compressed_datum_info() for that, but it's just a wrapper > > > around two macros that could just be combined, which would reduce > > > complexity overall. > > > > Done that way > > > > > + T_CompressionRoutine, /* in access/compressionapi.h */ > > > > > > This looks misplaced. I guess it should go just after these: > > > > > > T_FdwRoutine, /* in foreign/fdwapi.h */ > > > T_IndexAmRoutine, /* in access/amapi.h */ > > > T_TableAmRoutine, /* in access/tableam.h */ > > > > Done > > > > > Looking over the regression test changes: > > > > > > The tests at the top of create_cm.out that just test that we can > > > create tables with various storage types seem unrelated to the purpose > > > of the patch. And the file doesn't test creating a compression method > > > either, as the file name would suggest, so either the file name needs > > > to be changed (compression, compression_method?) or the tests don't go > > > here. > > > > Changed to "compression" > > > > > +-- check data is okdd > > > > > > I guess whoever is responsible for this comment prefers vi to emacs. 
> > > > Fixed > > > > > I don't quite understand the purpose of all of these tests, and there > > > are some things that I feel like ought to be tested that seemingly > > > aren't. Like, you seem to test using an UPDATE to move a datum from a > > > table to another table with the same compression method, but not one > > > with a different compression method. > > > > Added test for this, and some other tests to improve overall coverage. > > > > Testing the former is nice and > > > everything, but that's the easy case: I think we also need to test the > > > latter. I think it would be good to verify not only that the data is > > > readable but that it's compressed the way we expect. I think it would > > > be a great idea to add a pg_column_compression() function in a similar > > > spirit to pg_column_size(). Perhaps it could return NULL when > > > compression is not in use or the data type is not varlena, and the > > > name of the compression method otherwise. That would allow for better > > > testing of this feature, and it would also be useful to users who are > > > switching methods, to see what data they still have that's using the > > > old method. It could be useful for debugging problems on customer > > > systems, too. > > > > This is a really great idea, I have added this function and used in my test. > > > > > I wonder if we need a test that moves data between tables through an > > > intermediary. For instance, suppose a plpgsql function or DO block > > > fetches some data and stores it in a plpgsql variable and then uses > > > the variable to insert into another table. Hmm, maybe that would force > > > de-TOASTing. But perhaps there are other cases. Maybe a more general > > > way to approach the problem is: have you tried running a coverage > > > report and checked which parts of your code are getting exercised by > > > the existing tests and which parts are not? The stuff that isn't, we > > > should try to add more tests. It's easy to get corner cases wrong with > > > this kind of thing. > > > > > > I notice that LIKE INCLUDING COMPRESSION doesn't seem to be tested, at > > > least not by 0001, which reinforces my feeling that the tests here are > > > not as thorough as they could be. > > > > Added test for this as well. > > > > > +NOTICE: pg_compression contains unpinned initdb-created object(s) > > > > > This seems wrong to me - why is it OK? > > > > Yeah, this is wrong, now fixed. > > > > > - result = (struct varlena *) > > > - palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > > > - SET_VARSIZE(result, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ); > > > + cmoid = GetCompressionOidFromCompressionId(TOAST_COMPRESS_METHOD(attr)); > > > > > > - if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr), > > > - TOAST_COMPRESS_SIZE(attr), > > > - VARDATA(result), > > > - > > > TOAST_COMPRESS_RAWSIZE(attr), true) < 0) > > > - elog(ERROR, "compressed data is corrupted"); > > > + /* get compression method handler routines */ > > > + cmroutine = GetCompressionRoutine(cmoid); > > > > > > - return result; > > > + return cmroutine->cmdecompress(attr); > > > > > > I'm worried about how expensive this might be, and I think we could > > > make it cheaper. The reason why I think this might be expensive is: > > > currently, for every datum, you have a single direct function call. > > > Now, with this, you first have a direct function call to > > > GetCompressionOidFromCompressionId(). 
Then you have a call to > > > GetCompressionRoutine(), which does a syscache lookup and calls a > > > handler function, which is quite a lot more expensive than a single > > > function call. And the handler isn't even returning a statically > > > allocated structure, but is allocating new memory every time, which > > > involves more function calls and maybe memory leaks. Then you use the > > > results of all that to make an indirect function call. > > > > > > I'm not sure exactly what combination of things we could use to make > > > this better, but it seems like there are a few possibilities: > > > > > > (1) The handler function could return a pointer to the same > > > CompressionRoutine every time instead of constructing a new one every > > > time. > > > (2) The CompressionRoutine to which the handler function returns a > > > pointer could be statically allocated instead of being built at > > > runtime. > > > (3) GetCompressionRoutine could have an OID -> handler cache instead > > > of relying on syscache + calling the handler function all over again. > > > (4) For the compression types that have dedicated bit patterns in the > > > high bits of the compressed TOAST size, toast_compress_datum() could > > > just have hard-coded logic to use the correct handlers instead of > > > translating the bit pattern into an OID and then looking it up over > > > again. > > > (5) Going even further than #4 we could skip the handler layer > > > entirely for such methods, and just call the right function directly. > > > I think we should definitely do (1), and also (2) unless there's some > > > reason it's hard. (3) doesn't need to be part of this patch, but might > > > be something to consider later in the series. It's possible that it > > > doesn't have enough benefit to be worth the work, though. Also, I > > > think we should do either (4) or (5). I have a mild preference for (5) > > > unless it looks too ugly. > > > Note that I'm not talking about hard-coding a fast path for a > > > hard-coded list of OIDs - which would seem a little bit unprincipled - > > > but hard-coding a fast path for the bit patterns that are themselves > > > hard-coded. I don't think we lose anything in terms of extensibility > > > or even-handedness there; it's just avoiding a bunch of rigamarole > > > that doesn't really buy us anything. > > > > > > All these points apply equally to toast_decompress_datum_slice() and > > > toast_compress_datum(). > > > > Fixed as discussed at [1] > > > > > + /* Fallback to default compression method, if not specified */ > > > + if (!OidIsValid(cmoid)) > > > + cmoid = DefaultCompressionOid; > > > > > > I think that the caller should be required to specify a legal value, > > > and this should be an elog(ERROR) or an Assert(). > > > > > > The change to equalTupleDescs() makes me wonder. Like, can we specify > > > the compression method for a function parameter, or a function return > > > value? I would think not. But then how are the tuple descriptors set > > > up in that case? Under what circumstances do we actually need the > > > tuple descriptors to compare unequal? > > > > If we alter the compression method then we check whether we need to > > rebuild the tuple descriptor or not based on what value is changed so > > if the attribute compression method is changed we need to rebuild the > > compression method right. 
You might say that in the first patch we > > are not allowing altering the compression method so we might move this > > to the second patch but I thought since we added this field to > > pg_attribute in this patch then better to add this check as well. > > What am I missing? > > > > > lz4.c's header comment calls it cm_lz4.c, and the pathname is wrong too. > > > > > > I wonder if we should try to adopt a convention for the names of these > > > files that isn't just the compression method name, like cmlz4 or > > > compress_lz4. I kind of like the latter one. I am a little worried > > > that just calling it lz4.c will result in name collisions later - not > > > in this directory, of course, but elsewhere in the system. It's not a > > > disaster if that happens, but for example verbose error reports print > > > the file name, so it's nice if it's unambiguous. > > > > Changed to compress_lz4. > > > > > + if (!IsBinaryUpgrade && > > > + (relkind == RELKIND_RELATION || > > > + relkind == RELKIND_PARTITIONED_TABLE)) > > > + attr->attcompression = > > > + > > > GetAttributeCompressionMethod(attr, colDef->compression); > > > + else > > > + attr->attcompression = InvalidOid; > > > > > > Storing InvalidOid in the IsBinaryUpgrade case looks wrong. If > > > upgrading from pre-v14, we need to store PGLZ_COMPRESSION_OID. > > > Otherwise, we need to preserve whatever value was present in the old > > > version. Or am I confused here? > > > > Okay, so I think we can simply remove the IsBinaryUpgrade check so it > > will behave as expected. Basically, now it the compression method is > > specified then it will take that compression method and if it is not > > specified then it will take the PGLZ_COMPRESSION_OID. > > > > > I think there should be tests for the way this interacts with > > > partitioning, and I think the intended interaction should be > > > documented. Perhaps it should behave like TABLESPACE, where the parent > > > property has no effect on what gets stored because the parent has no > > > storage, but is inherited by each new child. > > > > I have added the test for this and also documented the same. > > > > > I wonder in passing about TOAST tables and materialized views, which > > > are the other things that have storage. What gets stored for > > > attcompression? > > > > I have changed this to store the Invalid compression method always. > > > > For a TOAST table it probably doesn't matter much > > > since TOAST table entries shouldn't ever be toasted themselves, so > > > anything that doesn't crash is fine (but maybe we should test that > > > trying to alter the compression properties of a TOAST table doesn't > > > crash, for example). > > > > You mean to update the pg_attribute table for the toasted field (e.g > > chunk_data) and set the attcompression to something valid? Or there > > is a better way to write this test? > > > > For a materialized view it seems reasonable to > > > want to set column properties, but I'm not quite sure how that works > > > today for things like STORAGE anyway. If we do allow setting STORAGE > > > or COMPRESSION for materialized view columns then dump-and-reload > > > needs to preserve the values. > > > > Fixed as described as [2] > > > > > + /* > > > + * Use default compression method if the existing compression method is > > > + * invalid but the new storage type is non plain storage. 
> > > + */ > > > + if (!OidIsValid(attrtuple->attcompression) && > > > + (newstorage != TYPSTORAGE_PLAIN)) > > > + attrtuple->attcompression = DefaultCompressionOid; > > > > > > You have a few too many parens in there. > > > > Fixed > > > > > I don't see a particularly good reason to treat plain and external > > > differently. More generally, I think there's a question here about > > > when we need an attribute to have a valid compression type and when we > > > don't. If typstorage is plan or external, then there's no point in > > > ever having a compression type and maybe we should even reject > > > attempts to set one (but I'm not sure). However, the attstorage is a > > > different case. Suppose the column is created with extended storage > > > and then later it's changed to plain. That's only a hint, so there may > > > still be toasted values in that column, so the compression setting > > > must endure. At any rate, we need to make sure we have clear and > > > sensible rules for when attcompression (a) must be valid, (b) may be > > > valid, and (c) must be invalid. And those rules need to at least be > > > documented in the comments, and maybe in the SGML docs. > > > > > > I'm out of time for today, so I'll have to look at this more another > > > day. Hope this helps for a start. > > > > Fixed as I have described at [2], and the rules are documented in > > pg_attribute.h (atop attcompression field) > > > > [1] https://www.postgresql.org/message-id/CA%2BTgmob3W8cnLgOQX%2BJQzeyGN3eKGmRrBkUY6WGfNyHa%2Bt_qEw%40mail.gmail.com > > [2] https://www.postgresql.org/message-id/CAFiTN-tzTTT2oqWdRGLv1dvvS5MC1W%2BLE%2B3bqWPJUZj4GnHOJg%40mail.gmail.com > > > > I was working on analyzing the behavior of how the attribute merging > should work for the compression method for an inherited child so for > that, I was analyzing the behavior for the storage method. I found > some behavior that doesn't seem right. Basically, while creating the > inherited child we don't allow the storage to be different than the > parent attribute's storage but later we are allowed to alter that, is > that correct behavior. > > Here is the test case to demonstrate this. > > postgres[12546]=# create table t (a varchar); > postgres[12546]=# alter table t ALTER COLUMN a SET STORAGE plain; > postgres[12546]=# create table t1 (a varchar); > postgres[12546]=# alter table t1 ALTER COLUMN a SET STORAGE external; > > /* Not allowing to set the external because parent attribute has plain */ > postgres[12546]=# create table t2 (LIKE t1 INCLUDING STORAGE) INHERITS ( t); > NOTICE: 00000: merging column "a" with inherited definition > LOCATION: MergeAttributes, tablecmds.c:2685 > ERROR: 42804: column "a" has a storage parameter conflict > DETAIL: PLAIN versus EXTERNAL > LOCATION: MergeAttributes, tablecmds.c:2730 On further analysis, IMHO the reason for this error is not that it can not allow different storage methods for inherited child's attributes but it is reporting error because of conflicting storage between child and parent. For example, if we inherit a child from two-parent who have the same attribute name with a different storage type then also it will conflict. I know that if it conflicts between parent and child we might give preference to the child's storage but I don't see much problem with the current behavior also. So as of now, I have kept the same behavior for the compression as well. I have added a test case for the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
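For reference, the compression-method analogue of the STORAGE conflict shown above would be along these lines (a sketch only, reusing the t/t1/t2 names from the storage example purely for illustration, and assuming the error mirrors the storage-parameter conflict, e.g. pglz versus lz4):

    CREATE TABLE t (a text COMPRESSION pglz);
    CREATE TABLE t1 (a text COMPRESSION lz4);

    -- conflicting definitions: inherited pglz versus copied lz4,
    -- so this is expected to fail, like PLAIN versus EXTERNAL above
    CREATE TABLE t2 (LIKE t1 INCLUDING COMPRESSION) INHERITS (t);

    -- without INCLUDING COMPRESSION the child simply takes the parent's setting
    CREATE TABLE t2 (LIKE t1) INHERITS (t);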
> On 25 Dec 2020, at 14:34, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > <v16-0002-alter-table-set-compression.patch> <v16-0004-Create-custom-compression-methods.patch> <v16-0005-new-compression-method-extension-for-zlib.patch><v16-0001-Built-in-compression-method.patch> <v16-0003-Add-support-for-PRESERVE.patch><v16-0006-Support-compression-methods-options.patch> Maybe add Lz4\Zlib WAL FPI compression on top of this patchset? I'm not insisting on anything, it just would be so cool to have it... BTW currently there are Oid collisions in the original patchset. Best regards, Andrey Borodin.
Attachment
On Sun, Dec 27, 2020 at 12:40 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > > > On 25 Dec 2020, at 14:34, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > <v16-0002-alter-table-set-compression.patch> <v16-0004-Create-custom-compression-methods.patch> <v16-0005-new-compression-method-extension-for-zlib.patch><v16-0001-Built-in-compression-method.patch> <v16-0003-Add-support-for-PRESERVE.patch><v16-0006-Support-compression-methods-options.patch> > > Maybe add Lz4\Zlib WAL FPI compression on top of this patchset? I'm not insisting on anything, it just would be so cool to have it... > > BTW currently there are Oid collisions in the original patchset. Thanks for the patch. Maybe we can allow setting custom compression methods for wal compression as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
> On 28 Dec 2020, at 10:20, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Dec 27, 2020 at 12:40 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> >> >> >>> On 25 Dec 2020, at 14:34, Dilip Kumar <dilipbalaut@gmail.com> wrote: >>> >>> <v16-0002-alter-table-set-compression.patch> <v16-0004-Create-custom-compression-methods.patch> <v16-0005-new-compression-method-extension-for-zlib.patch><v16-0001-Built-in-compression-method.patch> <v16-0003-Add-support-for-PRESERVE.patch><v16-0006-Support-compression-methods-options.patch> >> >> Maybe add Lz4\Zlib WAL FPI compression on top of this patchset? I'm not insisting on anything, it just would be so cool to have it... >> >> BTW currently there are Oid collisions in the original patchset. > > Thanks for the patch. Maybe we can allow setting custom compression > methods for wal compression as well. No, unfortunately, we can't use truly custom methods. Custom compression handlers are WAL-logged. So we can use only a static set of hardcoded compression methods. Thanks! Best regards, Andrey Borodin.
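To illustrate what a static set of hard-coded FPI compression methods could look like, here is a rough C sketch; the WalCompressionMethod enum, the compress_fpi() helper, and the caller-sized destination buffer are purely hypothetical and are not part of the posted patches:

    #include "postgres.h"
    #include "common/pg_lzcompress.h"
    #ifdef USE_LZ4
    #include <lz4.h>
    #endif

    typedef enum WalCompressionMethod
    {
        WAL_COMPRESSION_PGLZ,
        WAL_COMPRESSION_LZ4
    } WalCompressionMethod;

    /*
     * Compress a full-page image with one of the hard-coded methods.
     * The caller sizes "dest" appropriately (PGLZ_MAX_OUTPUT(slen) for pglz).
     * Returns the compressed length, or -1 if the data could not be
     * compressed into the destination buffer.
     */
    static int32
    compress_fpi(const char *source, int32 slen, char *dest, int32 destlen,
                 WalCompressionMethod method)
    {
        if (method == WAL_COMPRESSION_PGLZ)
            return pglz_compress(source, slen, dest, PGLZ_strategy_default);

    #ifdef USE_LZ4
        if (method == WAL_COMPRESSION_LZ4)
        {
            /* LZ4_compress_default() returns 0 when it cannot fit the output */
            int         len = LZ4_compress_default(source, dest, slen, destlen);

            return (len <= 0) ? -1 : len;
        }
    #endif

        elog(ERROR, "unsupported WAL compression method %d", (int) method);
        return -1;              /* keep the compiler quiet */
    }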
> On 28 Dec 2020, at 11:14, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > >> Thanks for the patch. Maybe we can allow setting custom compression >> methods for wal compression as well. > > No, unfortunately, we can't use truly custom methods. Custom compression handlers are WAL-logged. So we can use only a static set of hardcoded compression methods. So, I've made some very basic benchmarks on my machine [0]. With pglz after checkpoint I observe 1146 and 1225 tps. With lz4 I observe 1485 and 1524 tps. Without wal_compression I see 1529 tps. These observations can be explained with a plain statement: pglz is a bottleneck on my machine, lz4 is not. While this effect can be reached with other means [1], I believe having lz4 for WAL FPIs would be much more CPU efficient. PFA lz4 for WAL FPI patch v17. Changes: fixed some frontend issues, added some comments. Best regards, Andrey Borodin. [0] https://yadi.sk/d/6y5YiROXQRkoEw [1] https://www.postgresql.org/message-id/flat/25991595-1848-4178-AA57-872B10309DA2%40yandex-team.ru#e7bb0e048358bcff281011dcf115ad42
Attachment
On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > The most recent patch doesn't compile --without-lz4: > > compress_lz4.c:191:17: error: ‘lz4_cmcheck’ undeclared here (not in a function) > .datum_check = lz4_cmcheck, > ... > > And fails pg_upgrade check, apparently losing track of the compression (?) > > CREATE TABLE public.cmdata2 ( > - f1 text COMPRESSION lz4 > + f1 text > ); > > You added pg_dump --no-compression, but the --help isn't updated. I think > there should also be an option for pg_restore, like --no-tablespaces. And I > think there should be a GUC for default_compression, like > default_table_access_method, so one can restore into an alternate compression > by setting PGOPTIONS=-cdefault_compression=lz4. > > I'd like to be able to make all compressible columns of a table use a > non-default compression (except those which cannot), without having to use > \gexec... We have tables with up to 1600 columns. So a GUC would allow that. > > Previously (on separate threads) I wondered whether pg_dump > --no-table-access-method was needed - maybe that be sufficient for this case, > too, but I think it should be possible to separately avoid restoring > compression AM and AM "proper". So maybe it'd be like --no-tableam=compress > --no-tableam=storage or --no-tableam-all. > > Some language fixes: > > Subject: [PATCH v16 1/6] Built-in compression method > > +++ b/doc/src/sgml/ddl.sgml > @@ -3762,6 +3762,8 @@ CREATE TABLE measurement ( > <productname>PostgreSQL</productname> > tables (or, possibly, foreign tables). It is possible to specify a > tablespace and storage parameters for each partition separately. > + Partitions inherits the compression method of the parent for each column > + however we can set different compression method for each partition. > > Should say: > + By default, each column in a partition inherits the compression method from its parent table, > + however a different compression method can be set for each partition. > > +++ b/doc/src/sgml/ref/create_table.sgml > > + <varlistentry> > + <term><literal>INCLUDING COMPRESSION</literal></term> > + <listitem> > + <para> > + Compression method of the columns will be coppied. The default > + behavior is to exclude compression method, resulting in the copied > + column will have the default compression method if the column type is > + compressible. > > Say: > + Compression method of the columns will be copied. The default > + behavior is to exclude compression methods, resulting in the > + columns having the default compression method. > > + <varlistentry> > + <term><literal>COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal></term> > + <listitem> > + <para> > + This clause adds the compression method to a column. Compression method > + can be set from the available built-in compression methods. The available > + options are <literal>pglz</literal> and <literal>lz4</literal>. If the > + compression method is not sepcified for the compressible type then it will > + have the default compression method. The default compression method is > + <literal>pglz</literal>. > > Say "The compression method can be set from available compression methods" (or > remove this sentence). > Say "The available BUILT-IN methods are ..." 
> sepcified => specified > > + > + /* > + * No point in wasting a palloc cycle if value size is out of the allowed > + * range for compression > > say "outside the allowed range" > > + if (pset.sversion >= 120000 && > + if (pset.sversion >= 120000 && > > A couple places that need to say >= 14 > > Subject: [PATCH v16 2/6] alter table set compression > > + <literal>SET COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal> > + This clause adds compression to a column. Compression method can be set > + from available built-in compression methods. The available built-in > + methods are <literal>pglz</literal> and <literal>lz4</literal>. > > Should say "The compression method can be set to any available method. The > built in methods are >PGLZ< or >LZ<" > That fixes grammar, and correction that it's possible to set to an available > method other than what's "built-in". > > +++ b/src/include/commands/event_trigger.h > @@ -32,7 +32,7 @@ typedef struct EventTriggerData > #define AT_REWRITE_ALTER_PERSISTENCE 0x01 > #define AT_REWRITE_DEFAULT_VAL 0x02 > #define AT_REWRITE_COLUMN_REWRITE 0x04 > - > +#define AT_REWRITE_ALTER_COMPRESSION 0x08 > /* > > This is losing a useful newline. > > Subject: [PATCH v16 4/6] Create custom compression methods > > + This clause adds compression to a column. Compression method > + could be created with <xref linkend="sql-create-access-method"/> or it can > + be set from the available built-in compression methods. The available > + built-in methods are <literal>pglz</literal> and <literal>lz4</literal>. > + The PRESERVE list contains list of compression methods used on the column > + and determines which of them should be kept on the column. Without > + PRESERVE or if all the previous compression methods are not preserved then > + the table will be rewritten. If PRESERVE ALL is specified then all the > + previous methods will be preserved and the table will not be rewritten. > </para> > </listitem> > </varlistentry> > > diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml > index f404dd1088..ade3989d75 100644 > --- a/doc/src/sgml/ref/create_table.sgml > +++ b/doc/src/sgml/ref/create_table.sgml > @@ -999,11 +999,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM > + could be created with <xref linkend="sql-create-access-method"/> or it can > + be set from the available built-in compression methods. The available > > remove this first "built-in" ? > > + built-in methods are <literal>pglz</literal> and <literal>lz4</literal>. > > > +GetCompressionAmRoutineByAmId(Oid amoid) > ... > + /* Check if it's an index access method as opposed to some other AM */ > + if (amform->amtype != AMTYPE_COMPRESSION) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("access method \"%s\" is not of type %s", > + NameStr(amform->amname), "INDEX"))); > ... > + errmsg("index access method \"%s\" does not have a handler", > > In 3 places, the comment and code should say "COMPRESSION" right ? > > Subject: [PATCH v16 6/6] Support compression methods options > > + If compression method has options they could be specified with > + <literal>WITH</literal> parameter. > > If *the* compression method has options, they *can* be specified with *the* ... > > @@ -1004,7 +1004,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM > + method is <literal>pglz</literal>. 
If the compression method has options > + they could be specified by <literal>WITH</literal> > + parameter. > > same > > +static void * > +lz4_cminitstate(List *options) > +{ > + int32 *acceleration = palloc(sizeof(int32)); > + > + /* initialize with the default acceleration */ > + *acceleration = 1; > + > + if (list_length(options) > 0) > + { > + ListCell *lc; > + > + foreach(lc, options) > + { > + DefElem *def = (DefElem *) lfirst(lc); > + > + if (strcmp(def->defname, "acceleration") == 0) > + *acceleration = pg_atoi(defGetString(def), sizeof(int32), 0); > > Don't you need to say "else: error: unknown compression option" ? > > + /* > + * Compression option must be only valid if we are updating the compression > + * method. > + */ > + Assert(DatumGetPointer(acoptions) == NULL || OidIsValid(newcompression)); > + > > should say "need be valid only if .." > Thanks for the review, I will work on these and respond along with the updated patches. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > The most recent patch doesn't compile --without-lz4: > > compress_lz4.c:191:17: error: ‘lz4_cmcheck’ undeclared here (not in a function) > .datum_check = lz4_cmcheck, > ... > My bad, fixed this. > And fails pg_upgrade check, apparently losing track of the compression (?) > > CREATE TABLE public.cmdata2 ( > - f1 text COMPRESSION lz4 > + f1 text > ); I did not get this? pg_upgrade check is passing for me. > You added pg_dump --no-compression, but the --help isn't updated. Fixed. I think > there should also be an option for pg_restore, like --no-tablespaces. And I > think there should be a GUC for default_compression, like > default_table_access_method, so one can restore into an alternate compression > by setting PGOPTIONS=-cdefault_compression=lz4. > > I'd like to be able to make all compressible columns of a table use a > non-default compression (except those which cannot), without having to use > \gexec... We have tables with up to 1600 columns. So a GUC would allow that. > > Previously (on separate threads) I wondered whether pg_dump > --no-table-access-method was needed - maybe that be sufficient for this case, > too, but I think it should be possible to separately avoid restoring > compression AM and AM "proper". So maybe it'd be like --no-tableam=compress > --no-tableam=storage or --no-tableam-all. I will put more thought into this and respond separately. > Some language fixes: > > Subject: [PATCH v16 1/6] Built-in compression method > > +++ b/doc/src/sgml/ddl.sgml > @@ -3762,6 +3762,8 @@ CREATE TABLE measurement ( > <productname>PostgreSQL</productname> > tables (or, possibly, foreign tables). It is possible to specify a > tablespace and storage parameters for each partition separately. > + Partitions inherits the compression method of the parent for each column > + however we can set different compression method for each partition. > > Should say: > + By default, each column in a partition inherits the compression method from its parent table, > + however a different compression method can be set for each partition. Done > +++ b/doc/src/sgml/ref/create_table.sgml > > + <varlistentry> > + <term><literal>INCLUDING COMPRESSION</literal></term> > + <listitem> > + <para> > + Compression method of the columns will be coppied. The default > + behavior is to exclude compression method, resulting in the copied > + column will have the default compression method if the column type is > + compressible. > > Say: > + Compression method of the columns will be copied. The default > + behavior is to exclude compression methods, resulting in the > + columns having the default compression method. Done > + <varlistentry> > + <term><literal>COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal></term> > + <listitem> > + <para> > + This clause adds the compression method to a column. Compression method > + can be set from the available built-in compression methods. The available > + options are <literal>pglz</literal> and <literal>lz4</literal>. If the > + compression method is not sepcified for the compressible type then it will > + have the default compression method. The default compression method is > + <literal>pglz</literal>. > > Say "The compression method can be set from available compression methods" (or > remove this sentence). > Say "The available BUILT-IN methods are ..." 
> sepcified => specified Done > + > + /* > + * No point in wasting a palloc cycle if value size is out of the allowed > + * range for compression > > say "outside the allowed range" > > + if (pset.sversion >= 120000 && > + if (pset.sversion >= 120000 && > > A couple places that need to say >= 14 Fixed > Subject: [PATCH v16 2/6] alter table set compression > > + <literal>SET COMPRESSION <replaceable class="parameter">compression_method</replaceable></literal> > + This clause adds compression to a column. Compression method can be set > + from available built-in compression methods. The available built-in > + methods are <literal>pglz</literal> and <literal>lz4</literal>. > > Should say "The compression method can be set to any available method. The > built in methods are >PGLZ< or >LZ<" > That fixes grammar, and correction that it's possible to set to an available > method other than what's "built-in". Done > +++ b/src/include/commands/event_trigger.h > @@ -32,7 +32,7 @@ typedef struct EventTriggerData > #define AT_REWRITE_ALTER_PERSISTENCE 0x01 > #define AT_REWRITE_DEFAULT_VAL 0x02 > #define AT_REWRITE_COLUMN_REWRITE 0x04 > - > +#define AT_REWRITE_ALTER_COMPRESSION 0x08 > /* > > This is losing a useful newline. Fixed > Subject: [PATCH v16 4/6] Create custom compression methods > > + This clause adds compression to a column. Compression method > + could be created with <xref linkend="sql-create-access-method"/> or it can > + be set from the available built-in compression methods. The available > + built-in methods are <literal>pglz</literal> and <literal>lz4</literal>. > + The PRESERVE list contains list of compression methods used on the column > + and determines which of them should be kept on the column. Without > + PRESERVE or if all the previous compression methods are not preserved then > + the table will be rewritten. If PRESERVE ALL is specified then all the > + previous methods will be preserved and the table will not be rewritten. > </para> > </listitem> > </varlistentry> > > diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml > index f404dd1088..ade3989d75 100644 > --- a/doc/src/sgml/ref/create_table.sgml > +++ b/doc/src/sgml/ref/create_table.sgml > @@ -999,11 +999,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM > + could be created with <xref linkend="sql-create-access-method"/> or it can > + be set from the available built-in compression methods. The available > > remove this first "built-in" ? Done > + built-in methods are <literal>pglz</literal> and <literal>lz4</literal>. > > > +GetCompressionAmRoutineByAmId(Oid amoid) > ... > + /* Check if it's an index access method as opposed to some other AM */ > + if (amform->amtype != AMTYPE_COMPRESSION) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("access method \"%s\" is not of type %s", > + NameStr(amform->amname), "INDEX"))); > ... > + errmsg("index access method \"%s\" does not have a handler", > > In 3 places, the comment and code should say "COMPRESSION" right ? Fixed, along with some other refactoring around this code. > Subject: [PATCH v16 6/6] Support compression methods options > > + If compression method has options they could be specified with > + <literal>WITH</literal> parameter. > > If *the* compression method has options, they *can* be specified with *the* ... Done > @@ -1004,7 +1004,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM > + method is <literal>pglz</literal>. 
If the compression method has options > + they could be specified by <literal>WITH</literal> > + parameter. > > same Done > +static void * > +lz4_cminitstate(List *options) > +{ > + int32 *acceleration = palloc(sizeof(int32)); > + > + /* initialize with the default acceleration */ > + *acceleration = 1; > + > + if (list_length(options) > 0) > + { > + ListCell *lc; > + > + foreach(lc, options) > + { > + DefElem *def = (DefElem *) lfirst(lc); > + > + if (strcmp(def->defname, "acceleration") == 0) > + *acceleration = pg_atoi(defGetString(def), sizeof(int32), 0); > > Don't you need to say "else: error: unknown compression option" ? Done > + /* > + * Compression option must be only valid if we are updating the compression > + * method. > + */ > + Assert(DatumGetPointer(acoptions) == NULL || OidIsValid(newcompression)); > + > > should say "need be valid only if .." Changed. Apart from this, I have also done some refactoring and comment improvement. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
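As a concrete reading of the "unknown compression option" point above, the option loop from the quoted lz4_cminitstate() hunk could reject anything it does not recognize. A minimal sketch based on that hunk; the errcode choice is an assumption:

    #include "postgres.h"
    #include "commands/defrem.h"
    #include "utils/builtins.h"

    static void *
    lz4_cminitstate(List *options)
    {
        int32      *acceleration = palloc(sizeof(int32));
        ListCell   *lc;

        /* initialize with the default acceleration */
        *acceleration = 1;

        foreach(lc, options)
        {
            DefElem    *def = (DefElem *) lfirst(lc);

            if (strcmp(def->defname, "acceleration") == 0)
                *acceleration = pg_atoi(defGetString(def), sizeof(int32), 0);
            else
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("unknown compression option \"%s\"",
                                def->defname)));
        }

        return acceleration;
    }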
On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > CREATE TABLE public.cmdata2 ( > > - f1 text COMPRESSION lz4 > > + f1 text > > ); > > I did not get this? pg_upgrade check is passing for me. I realized that this was failing in your v16 patch sent Dec 25. It's passing on current patches because they do "DROP TABLE cmdata2", but that's only masking the error. I think this patch needs to be specifically concerned with pg_upgrade, so I suggest not dropping your tables and MVs, to allow the pg_upgrade test to check them. That exposes this issue: pg_dump: error: Error message from server: ERROR: cache lookup failed for access method 36447 pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; pg_dumpall: error: pg_dump failed on database "regression", exiting waiting for server to shut down.... done server stopped pg_dumpall of post-upgrade database cluster failed I found that's the AM's OID in the old cluster: regression=# SELECT * FROM pg_am WHERE oid=36447; oid | amname | amhandler | amtype -------+--------+-------------+-------- 36447 | pglz2 | pglzhandler | c But in the new cluster, the OID has changed. Since that's written into table data, I think you have to ensure that the compression OIDs are preserved on upgrade: 16755 | pglz2 | pglzhandler | c In my brief attempt to inspect it, I got this crash: $ tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data & regression=# SELECT pg_column_compression(f1) FROM cmdata a; server closed the connection unexpectedly Thread 1 "postgres" received signal SIGSEGV, Segmentation fault. __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory. (gdb) bt #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 #1 0x000055c6049fde62 in cstring_to_text (s=0x0) at varlena.c:193 #2 pg_column_compression () at varlena.c:5335 (gdb) up #2 pg_column_compression () at varlena.c:5335 5335 PG_RETURN_TEXT_P(cstring_to_text(get_am_name( (gdb) l 5333 varvalue = (struct varlena *) DatumGetPointer(value); 5334 5335 PG_RETURN_TEXT_P(cstring_to_text(get_am_name( 5336 toast_get_compression_oid(varvalue)))); I guess a missing AM here is a "shouldn't happen" case, but I'd prefer it to be caught with an elog() (maybe in get_am_name()) or at least an Assert. -- Justin
On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > CREATE TABLE public.cmdata2 ( > > > - f1 text COMPRESSION lz4 > > > + f1 text > > > ); > > > > I did not get this? pg_upgrade check is passing for me. > > I realized that this was failing in your v16 patch sent Dec 25. > It's passing on current patches because they do "DROP TABLE cmdata2", but > that's only masking the error. > > I think this patch needs to be specifically concerned with pg_upgrade, so I > suggest to not drop your tables and MVs, to allow the pg_upgrade test to check > them. That exposes this issue: Thanks for the suggestion I will try this. > pg_dump: error: Error message from server: ERROR: cache lookup failed for access method 36447 > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; > pg_dumpall: error: pg_dump failed on database "regression", exiting > waiting for server to shut down.... done > server stopped > pg_dumpall of post-upgrade database cluster failed > > I found that's the AM's OID in the old clsuter: > regression=# SELECT * FROM pg_am WHERE oid=36447; > oid | amname | amhandler | amtype > -------+--------+-------------+-------- > 36447 | pglz2 | pglzhandler | c > > But in the new cluster, the OID has changed. Since that's written into table > data, I think you have to ensure that the compression OIDs are preserved on > upgrade: > > 16755 | pglz2 | pglzhandler | c Yeah, basically we are storing am oid in the compressed data so Oid must be preserved. I will look into this and fix it. > In my brief attempt to inspect it, I got this crash: > > $ tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data & > regression=# SELECT pg_column_compression(f1) FROM cmdata a; > server closed the connection unexpectedly > > Thread 1 "postgres" received signal SIGSEGV, Segmentation fault. > __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 > 120 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory. > (gdb) bt > #0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 > #1 0x000055c6049fde62 in cstring_to_text (s=0x0) at varlena.c:193 > #2 pg_column_compression () at varlena.c:5335 > > (gdb) up > #2 pg_column_compression () at varlena.c:5335 > 5335 PG_RETURN_TEXT_P(cstring_to_text(get_am_name( > (gdb) l > 5333 varvalue = (struct varlena *) DatumGetPointer(value); > 5334 > 5335 PG_RETURN_TEXT_P(cstring_to_text(get_am_name( > 5336 toast_get_compression_oid(varvalue)))); > > I guess a missing AM here is a "shouldn't happen" case, but I'd prefer it to be > caught with an elog() (maybe in get_am_name()) or at least an Assert. Yeah, this makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
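A minimal sketch of the kind of defensive check being suggested, using the names visible in the backtrace above (toast_get_compression_oid() and the patch's pg_column_compression() are assumed from the patch, get_am_name() is the existing catalog lookup that returns NULL when the access method is missing); detoasting and NULL-argument handling are elided:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"

    Datum
    pg_column_compression(PG_FUNCTION_ARGS)
    {
        struct varlena *varvalue;
        Oid         cmoid;
        char       *amname;

        /* NULL handling and the "is it actually compressed?" checks are elided */
        varvalue = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));

        cmoid = toast_get_compression_oid(varvalue);
        amname = get_am_name(cmoid);
        if (amname == NULL)
            elog(ERROR, "cache lookup failed for compression access method %u",
                 cmoid);

        PG_RETURN_TEXT_P(cstring_to_text(amname));
    }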
On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > - f1 text COMPRESSION lz4 > > > > + f1 text > > > > ); > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > I realized that this was failing in your v16 patch sent Dec 25. > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > that's only masking the error. I tested specifically pg_upgrade by removing all the DROP table and MV and it is passing. I don't see the reason why should it fail. I mean after the upgrade why COMPRESSION lz4 is missing? > > I think this patch needs to be specifically concerned with pg_upgrade, so I > > suggest to not drop your tables and MVs, to allow the pg_upgrade test to check > > them. That exposes this issue: > > Thanks for the suggestion I will try this. > > > pg_dump: error: Error message from server: ERROR: cache lookup failed for access method 36447 > > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout; > > pg_dumpall: error: pg_dump failed on database "regression", exiting > > waiting for server to shut down.... done > > server stopped > > pg_dumpall of post-upgrade database cluster failed > > > > I found that's the AM's OID in the old clsuter: > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > oid | amname | amhandler | amtype > > -------+--------+-------------+-------- > > 36447 | pglz2 | pglzhandler | c > > > > But in the new cluster, the OID has changed. Since that's written into table > > data, I think you have to ensure that the compression OIDs are preserved on > > upgrade: > > > > 16755 | pglz2 | pglzhandler | c > > Yeah, basically we are storing am oid in the compressed data so Oid > must be preserved. I will look into this and fix it. On further analysis, if we are dumping and restoring then we will compress the data back while inserting it so why would we need to old OID. I mean in the new cluster we are inserting data again so it will be compressed again and now it will store the new OID. Am I missing something here? > > In my brief attempt to inspect it, I got this crash: > > > > $ tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data & > > regression=# SELECT pg_column_compression(f1) FROM cmdata a; > > server closed the connection unexpectedly I tried to test this after the upgrade but I can get the proper value. 
Laptop309pnin:bin dilipkumar$ ./pg_ctl -D /Users/dilipkumar/Documents/PG/custom_compression/src/bin/pg_upgrade/tmp_check/data.old/ start waiting for server to start....2021-01-11 11:53:28.153 IST [43412] LOG: starting PostgreSQL 14devel on x86_64-apple-darwin19.6.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.62), 64-bit 2021-01-11 11:53:28.170 IST [43412] LOG: database system is ready to accept connections done server started Laptop309pnin:bin dilipkumar$ ./psql -d regression regression[43421]=# SELECT pg_column_compression(f1) FROM cmdata a; pg_column_compression ----------------------- lz4 lz4 pglz2 (3 rows) Manual test: (dump and load on the new cluster) --------------- postgres[43903]=# CREATE ACCESS METHOD pglz2 TYPE COMPRESSION HANDLER pglzhandler; CREATE ACCESS METHOD postgres[43903]=# select oid from pg_am where amname='pglz2'; oid ------- 16384 (1 row) postgres[43903]=# CREATE TABLE cmdata_test(f1 text COMPRESSION pglz2); CREATE TABLE postgres[43903]=# INSERT INTO cmdata_test VALUES(repeat('1234567890',1000)); INSERT 0 1 postgres[43903]=# SELECT pg_column_compression(f1) FROM cmdata_test; pg_column_compression ----------------------- pglz2 (1 row) Laptop309pnin:bin dilipkumar$ ./pg_dump -d postgres > 1.sql —restore on new cluster— postgres[44030]=# select oid from pg_am where amname='pglz2'; oid ------- 16385 (1 row) postgres[44030]=# SELECT pg_column_compression(f1) FROM cmdata_test; pg_column_compression ----------------------- pglz2 (1 row) You can see on the new cluster the OID of the pglz2 is changed but there is no issue. Is it possible for you to give me a self-contained test case to reproduce the issue or a theory that why it should fail? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote: > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > > - f1 text COMPRESSION lz4 > > > > > + f1 text > > > > > ); > > > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > > > I realized that this was failing in your v16 patch sent Dec 25. > > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > > that's only masking the error. > > I tested specifically pg_upgrade by removing all the DROP table and MV > and it is passing. I don't see the reason why should it fail. I mean > after the upgrade why COMPRESSION lz4 is missing? How did you test it ? I'm not completely clear how this is intended to work... has it been tested before ? According to the comments, in binary upgrade mode, there's an ALTER which is supposed to SET COMPRESSION, but that's evidently not happening. > > > I found that's the AM's OID in the old clsuter: > > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > > oid | amname | amhandler | amtype > > > -------+--------+-------------+-------- > > > 36447 | pglz2 | pglzhandler | c > > > > > > But in the new cluster, the OID has changed. Since that's written into table > > > data, I think you have to ensure that the compression OIDs are preserved on > > > upgrade: > > > > > > 16755 | pglz2 | pglzhandler | c > > > > Yeah, basically we are storing am oid in the compressed data so Oid > > must be preserved. I will look into this and fix it. > > On further analysis, if we are dumping and restoring then we will > compress the data back while inserting it so why would we need to old > OID. I mean in the new cluster we are inserting data again so it will > be compressed again and now it will store the new OID. Am I missing > something here? I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data, but rather recreates catalogs only and then links to the old tables (either with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is included in make check-world). -- Justin
On Mon, Jan 11, 2021 at 12:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote: > > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > > > - f1 text COMPRESSION lz4 > > > > > > + f1 text > > > > > > ); > > > > > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > > > > > I realized that this was failing in your v16 patch sent Dec 25. > > > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > > > that's only masking the error. > > > > I tested specifically pg_upgrade by removing all the DROP table and MV > > and it is passing. I don't see the reason why should it fail. I mean > > after the upgrade why COMPRESSION lz4 is missing? > > How did you test it ? > > I'm not completely clear how this is intended to work... has it been tested > before ? According to the comments, in binary upgrade mode, there's an ALTER > which is supposed to SET COMPRESSION, but that's evidently not happening. I am able to reproduce this issue, If I run pg_dump with binary_upgrade mode then I can see the issue (./pg_dump --binary-upgrade -d Postgres). Yes you are right that for fixing this there should be an ALTER..SET COMPRESSION method. > > > > I found that's the AM's OID in the old clsuter: > > > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > > > oid | amname | amhandler | amtype > > > > -------+--------+-------------+-------- > > > > 36447 | pglz2 | pglzhandler | c > > > > > > > > But in the new cluster, the OID has changed. Since that's written into table > > > > data, I think you have to ensure that the compression OIDs are preserved on > > > > upgrade: > > > > > > > > 16755 | pglz2 | pglzhandler | c > > > > > > Yeah, basically we are storing am oid in the compressed data so Oid > > > must be preserved. I will look into this and fix it. > > > > On further analysis, if we are dumping and restoring then we will > > compress the data back while inserting it so why would we need to old > > OID. I mean in the new cluster we are inserting data again so it will > > be compressed again and now it will store the new OID. Am I missing > > something here? > > I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data, > but rather recreates catalogs only and then links to the old tables (either > with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is > included in make check-world). Got this as well. I will fix these two issues and post the updated patch by tomorrow. Thanks for your findings. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 11, 2021 at 3:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Jan 11, 2021 at 12:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote: > > > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > > > > - f1 text COMPRESSION lz4 > > > > > > > + f1 text > > > > > > > ); > > > > > > > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > > > > > > > I realized that this was failing in your v16 patch sent Dec 25. > > > > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > > > > that's only masking the error. > > > > > > I tested specifically pg_upgrade by removing all the DROP table and MV > > > and it is passing. I don't see the reason why should it fail. I mean > > > after the upgrade why COMPRESSION lz4 is missing? > > > > How did you test it ? > > > > I'm not completely clear how this is intended to work... has it been tested > > before ? According to the comments, in binary upgrade mode, there's an ALTER > > which is supposed to SET COMPRESSION, but that's evidently not happening. > > I am able to reproduce this issue, If I run pg_dump with > binary_upgrade mode then I can see the issue (./pg_dump > --binary-upgrade -d Postgres). Yes you are right that for fixing > this there should be an ALTER..SET COMPRESSION method. > > > > > > I found that's the AM's OID in the old clsuter: > > > > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > > > > oid | amname | amhandler | amtype > > > > > -------+--------+-------------+-------- > > > > > 36447 | pglz2 | pglzhandler | c > > > > > > > > > > But in the new cluster, the OID has changed. Since that's written into table > > > > > data, I think you have to ensure that the compression OIDs are preserved on > > > > > upgrade: > > > > > > > > > > 16755 | pglz2 | pglzhandler | c > > > > > > > > Yeah, basically we are storing am oid in the compressed data so Oid > > > > must be preserved. I will look into this and fix it. > > > > > > On further analysis, if we are dumping and restoring then we will > > > compress the data back while inserting it so why would we need to old > > > OID. I mean in the new cluster we are inserting data again so it will > > > be compressed again and now it will store the new OID. Am I missing > > > something here? > > > > I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data, > > but rather recreates catalogs only and then links to the old tables (either > > with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is > > included in make check-world). > > Got this as well. > > I will fix these two issues and post the updated patch by tomorrow. > > Thanks for your findings. I have fixed this issue in the v18 version, please test and let me know your thoughts. There is one more issue pending from an upgrade perspective in v18-0003, basically, for the preserved method we need to restore the dependency as well. I will work on this part and shared the next version soon. 
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Jan 13, 2021 at 2:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Jan 11, 2021 at 3:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Jan 11, 2021 at 12:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Mon, Jan 11, 2021 at 12:11:54PM +0530, Dilip Kumar wrote: > > > > On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > > > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote: > > > > > > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > And fails pg_upgrade check, apparently losing track of the compression (?) > > > > > > > > > > > > > > > > CREATE TABLE public.cmdata2 ( > > > > > > > > - f1 text COMPRESSION lz4 > > > > > > > > + f1 text > > > > > > > > ); > > > > > > > > > > > > > > I did not get this? pg_upgrade check is passing for me. > > > > > > > > > > > > I realized that this was failing in your v16 patch sent Dec 25. > > > > > > It's passing on current patches because they do "DROP TABLE cmdata2", but > > > > > > that's only masking the error. > > > > > > > > I tested specifically pg_upgrade by removing all the DROP table and MV > > > > and it is passing. I don't see the reason why should it fail. I mean > > > > after the upgrade why COMPRESSION lz4 is missing? > > > > > > How did you test it ? > > > > > > I'm not completely clear how this is intended to work... has it been tested > > > before ? According to the comments, in binary upgrade mode, there's an ALTER > > > which is supposed to SET COMPRESSION, but that's evidently not happening. > > > > I am able to reproduce this issue, If I run pg_dump with > > binary_upgrade mode then I can see the issue (./pg_dump > > --binary-upgrade -d Postgres). Yes you are right that for fixing > > this there should be an ALTER..SET COMPRESSION method. > > > > > > > > I found that's the AM's OID in the old clsuter: > > > > > > regression=# SELECT * FROM pg_am WHERE oid=36447; > > > > > > oid | amname | amhandler | amtype > > > > > > -------+--------+-------------+-------- > > > > > > 36447 | pglz2 | pglzhandler | c > > > > > > > > > > > > But in the new cluster, the OID has changed. Since that's written into table > > > > > > data, I think you have to ensure that the compression OIDs are preserved on > > > > > > upgrade: > > > > > > > > > > > > 16755 | pglz2 | pglzhandler | c > > > > > > > > > > Yeah, basically we are storing am oid in the compressed data so Oid > > > > > must be preserved. I will look into this and fix it. > > > > > > > > On further analysis, if we are dumping and restoring then we will > > > > compress the data back while inserting it so why would we need to old > > > > OID. I mean in the new cluster we are inserting data again so it will > > > > be compressed again and now it will store the new OID. Am I missing > > > > something here? > > > > > > I'm referring to pg_upgrade which uses pg_dump, but does *not* re-insert data, > > > but rather recreates catalogs only and then links to the old tables (either > > > with copy, link, or clone). Test with make -C src/bin/pg_upgrade (which is > > > included in make check-world). > > > > Got this as well. > > > > I will fix these two issues and post the updated patch by tomorrow. > > > > Thanks for your findings. > > I have fixed this issue in the v18 version, please test and let me > know your thoughts. 
There is one more issue pending from an upgrade > perspective in v18-0003, basically, for the preserved method we need > to restore the dependency as well. I will work on this part and > shared the next version soon. Now I have added support for handling the preserved method in the binary upgrade, please find the updated patch set. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Jan 20, 2021 at 12:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Thanks for updating the patch. Thanks for the review > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > The most recent patch doesn't compile --without-lz4: > On Tue, Jan 05, 2021 at 11:19:33AM +0530, Dilip Kumar wrote: > > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > I think I first saw it on cfbot and I reproduced it locally, too. > > > http://cfbot.cputube.org/dilip-kumar.html > > > > > > I think you'll have to make --without-lz4 the default until the build > > > environments include it, otherwise the patch checker will show red :( > > > > Oh ok, but if we make by default --without-lz4 then the test cases > > will start failing which is using lz4 compression. Am I missing > > something? > > The CIs are failing like this: > > http://cfbot.cputube.org/dilip-kumar.html > |checking for LZ4_compress in -llz4... no > |configure: error: lz4 library not found > |If you have lz4 already installed, see config.log for details on the > |failure. It is possible the compiler isn't looking in the proper directory. > |Use --without-lz4 to disable lz4 support. > > I thought that used to work (except for windows). I don't see that anything > changed in the configure tests... Is it because the CI moved off travis 2 > weeks ago ? I don't' know whether the travis environment had liblz4, and I > don't remember if the build was passing or if it was failing for some other > reason. I'm guessing historic logs from travis are not available, if they ever > were. > > I'm not sure how to deal with that, but maybe you'd need: > 1) A separate 0001 patch *allowing* LZ4 to be enabled/disabled; > 2) Current patchset needs to compile with/without LZ4, and pass tests in both > cases - maybe you can use "alternate test" output [0] to handle the "without" > case. Okay, let me think about how to deal with this. > 3) Eventually, the CI and build environments may have LZ4 installed, and then > we can have a separate debate about whether to enable it by default. > > [0] cp -iv src/test/regress/results/compression.out src/test/regress/expected/compression_1.out > > On Tue, Jan 05, 2021 at 02:20:26PM +0530, Dilip Kumar wrote: > > On Tue, Jan 5, 2021 at 11:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > I see the windows build is failing: > > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123730 > > > > |undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at src/tools/msvc/Mkvcbuild.pm line 852. > > > > This needs to be patched: src/tools/msvc/Solution.pm > > > > You can see my zstd/pg_dump patch for an example, if needed (actually I'm not > > > > 100% sure it's working yet, since the windows build failed for another reason). > > > > > > Okay, I will check that. > > This still needs help. > perl ./src/tools/msvc/mkvcbuild.pl > ... > undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at /home/pryzbyj/src/postgres/src/tools/msvc/Mkvcbuild.pmline 852. > > Fix like: > > + HAVE_LIBLZ4 => $self->{options}->{zlib} ? 1 : undef, I will do that. 
> Some more language fixes: > > commit 3efafee52414503a87332fa6070541a3311a408c > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Tue Sep 8 15:24:33 2020 +0530 > > Built-in compression method > > + If the compression method is not specified for the compressible type then > + it will have the default compression method. The default compression > > I think this should say: > If no compression method is specified, then compressible types will have the > default compression method (pglz). > > + * > + * Since version 11 TOAST_COMPRESS_SET_RAWSIZE also marks compressed > > Should say v14 ?? > > diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h > index 059dec3647..e4df6bc5c1 100644 > --- a/src/include/catalog/pg_attribute.h > +++ b/src/include/catalog/pg_attribute.h > @@ -156,6 +156,14 @@ CATALOG(pg_attribute,1249,AttributeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(75, > /* attribute's collation */ > Oid attcollation; > > + /* > + * Oid of the compression method that will be used for compressing the value > + * for this attribute. For the compressible atttypid this must always be a > > say "For compressible types, ..." > > + * valid Oid irrespective of what is the current value of the attstorage. > + * And for the incompressible atttypid this must always be an invalid Oid. > > say "must be InvalidOid" > > @@ -685,6 +686,7 @@ typedef enum TableLikeOption > CREATE_TABLE_LIKE_INDEXES = 1 << 5, > CREATE_TABLE_LIKE_STATISTICS = 1 << 6, > CREATE_TABLE_LIKE_STORAGE = 1 << 7, > + CREATE_TABLE_LIKE_COMPRESSION = 1 << 8, > > This is interesting... > I have a patch to implement LIKE .. (INCLUDING ACCESS METHOD). > I guess I should change it to say LIKE .. (TABLE ACCESS METHOD), right ? > https://commitfest.postgresql.org/31/2865/ > > Your first patch is large due to updating a large number of test cases to > include the "compression" column in \d+ output. Maybe that column should be > hidden when HIDE_TABLEAM is set by pg_regress ? I think that would allow > testing with alternate, default compression. > > commit ddcae4095e36e94e3e7080e2ab5a8d42cc2ca843 > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Tue Jan 19 15:10:14 2021 +0530 > > Support compression methods options > > + * we don't need do it again in cminitstate function. > > need *to* do it again > > + * Fetch atttributes compression options > > attribute's :) > > commit b7946eda581230424f73f23d90843f4c2db946c2 > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Wed Jan 13 12:14:40 2021 +0530 > > Create custom compression methods > > + * compression header otherwise, directly translate the buil-in compression > > built-in > > commit 0746a4d7a14209ebf62fe0dc1d12999ded879cfd > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Mon Jan 4 15:15:20 2021 +0530 > > Add support for PRESERVE > > --- a/src/backend/catalog/objectaddress.c > +++ b/src/backend/catalog/objectaddress.c > @@ -15,6 +15,7 @@ > > #include "postgres.h" > > +#include "access/compressamapi.h" > > Unnecessary change to this file ? > > + * ... Collect the list of access method > + * oids on which this attribute has a dependency upon. > > "upon" is is redundant. Say "on which this attribute has a dependency". > > + * Check whether the given compression method oid is supported by > + * the target attribue. > > attribute > > + * In binary upgrade mode just create the dependency for all preserve > + * list compression method as a dependecy. 
> > dependency > I think you could say: "In binary upgrade mode, just create a dependency on all > preserved methods". I will work on other comments and send the updated patch in a day or two. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Hi,
I have been testing the patches for a while; below is the code coverage observed on the v19 patches.
Sr No | File name | Before Line % | Before Function % | After Line % | After Function %
------+-----------+---------------+-------------------+--------------+-----------------
1 | src/backend/access/brin/brin_tuple.c | 96.7 | 100 | 96.7 | 100
2 | src/backend/access/common/detoast.c | 88 | 100 | 88.6 | 100
3 | src/backend/access/common/indextuple.c | 97.1 | 100 | 97.1 | 100
4 | src/backend/access/common/toast_internals.c | 88.8 | 88.9 | 88.6 | 88.9
5 | src/backend/access/common/tupdesc.c | 97.2 | 100 | 97.2 | 100
6 | src/backend/access/compression/compress_lz4.c | NA | NA | 93.5 | 100
7 | src/backend/access/compression/compress_pglz.c | NA | NA | 82.2 | 100
8 | src/backend/access/compression/compressamapi.c | NA | NA | 78.3 | 100
9 | src/backend/access/index/amapi.c | 73.5 | 100 | 74.5 | 100
10 | src/backend/access/table/toast_helper.c | 97.5 | 100 | 97.5 | 100
11 | src/backend/access/common/reloptions.c | 90.6 | 83.3 | 89.7 | 81.6
12 | src/backend/bootstrap/bootparse.y | 84.2 | 100 | 84.2 | 100
13 | src/backend/bootstrap/bootstrap.c | 66.4 | 100 | 66.4 | 100
14 | src/backend/commands/cluster.c | 90.4 | 100 | 90.4 | 100
15 | src/backend/catalog/heap.c | 97.3 | 100 | 97.3 | 100
16 | src/backend/catalog/index.c | 93.8 | 94.6 | 93.8 | 94.6
17 | src/backend/catalog/toasting.c | 96.7 | 100 | 96.8 | 100
18 | src/backend/catalog/objectaddress.c | 89.7 | 95.9 | 89.7 | 95.9
19 | src/backend/catalog/pg_depend.c | 98.6 | 100 | 98.6 | 100
20 | src/backend/commands/foreigncmds.c | 95.7 | 95.5 | 95.6 | 95.2
21 | src/backend/commands/compressioncmds.c | NA | NA | 97.2 | 100
22 | src/backend/commands/amcmds.c | 92.1 | 100 | 90.1 | 100
23 | src/backend/commands/createas.c | 96.8 | 90 | 96.8 | 90
24 | src/backend/commands/matview.c | 92.5 | 85.7 | 92.6 | 85.7
25 | src/backend/commands/tablecmds.c | 93.6 | 98.5 | 93.7 | 98.5
26 | src/backend/executor/nodeModifyTable.c | 93.8 | 92.9 | 93.7 | 92.9
27 | src/backend/nodes/copyfuncs.c | 79.1 | 78.7 | 79.2 | 78.8
28 | src/backend/nodes/equalfuncs.c | 28.8 | 23.9 | 28.7 | 23.8
29 | src/backend/nodes/nodeFuncs.c | 80.4 | 100 | 80.3 | 100
30 | src/backend/nodes/outfuncs.c | 38.2 | 38.1 | 38.1 | 38
31 | src/backend/parser/gram.y | 87.6 | 100 | 87.7 | 100
32 | src/backend/parser/parse_utilcmd.c | 91.6 | 100 | 91.6 | 100
33 | src/backend/replication/logical/reorderbuffer.c | 94.1 | 97 | 94.1 | 97
34 | src/backend/utils/adt/pg_upgrade_support.c | 56.2 | 83.3 | 58.4 | 84.6
35 | src/backend/utils/adt/pseudotypes.c | 18.5 | 11.3 | 18.3 | 10.9
36 | src/backend/utils/adt/varlena.c | 86.5 | 89 | 86.6 | 89.1
37 | src/bin/pg_dump/pg_dump.c | 89.4 | 97.4 | 89.5 | 97.4
38 | src/bin/psql/tab-complete.c | 50.8 | 57.7 | 50.8 | 57.7
39 | src/bin/psql/describe.c | 60.7 | 55.1 | 60.6 | 54.2
40 | contrib/cmzlib/cmzlib.c | NA | NA | 74.7 | 87.5
Thanks.
--
Regards,
Neha Sharma
On Wed, Jan 20, 2021 at 10:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> [quoted text of Dilip's message of Jan 20 trimmed; the full message appears above]
On Fri, Jan 29, 2021 at 9:47 AM Neha Sharma <neha.sharma@enterprisedb.com> wrote:
>
> Hi,
>
> I have been testing the patches for a while; below is the code coverage observed on the v19 patches.
>
> [code coverage table trimmed; see the full table in Neha's message above]

Thanks, Neha, for testing this. Overall coverage looks good to me except for compress_pglz.c, compressamapi.c and cmzlib.c. I will analyze this and see if we can improve coverage for these files.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 20, 2021 at 12:37 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Thanks for updating the patch. > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > The most recent patch doesn't compile --without-lz4: > On Tue, Jan 05, 2021 at 11:19:33AM +0530, Dilip Kumar wrote: > > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > I think I first saw it on cfbot and I reproduced it locally, too. > > > http://cfbot.cputube.org/dilip-kumar.html > > > > > > I think you'll have to make --without-lz4 the default until the build > > > environments include it, otherwise the patch checker will show red :( > > > > Oh ok, but if we make by default --without-lz4 then the test cases > > will start failing which is using lz4 compression. Am I missing > > something? > > The CIs are failing like this: > > http://cfbot.cputube.org/dilip-kumar.html > |checking for LZ4_compress in -llz4... no > |configure: error: lz4 library not found > |If you have lz4 already installed, see config.log for details on the > |failure. It is possible the compiler isn't looking in the proper directory. > |Use --without-lz4 to disable lz4 support. > > I thought that used to work (except for windows). I don't see that anything > changed in the configure tests... Is it because the CI moved off travis 2 > weeks ago ? I don't' know whether the travis environment had liblz4, and I > don't remember if the build was passing or if it was failing for some other > reason. I'm guessing historic logs from travis are not available, if they ever > were. > > I'm not sure how to deal with that, but maybe you'd need: > 1) A separate 0001 patch *allowing* LZ4 to be enabled/disabled; > 2) Current patchset needs to compile with/without LZ4, and pass tests in both > cases - maybe you can use "alternate test" output [0] to handle the "without" > case. > 3) Eventually, the CI and build environments may have LZ4 installed, and then > we can have a separate debate about whether to enable it by default. > > [0] cp -iv src/test/regress/results/compression.out src/test/regress/expected/compression_1.out I have done that so now default will be --without-lz4 > On Tue, Jan 05, 2021 at 02:20:26PM +0530, Dilip Kumar wrote: > > On Tue, Jan 5, 2021 at 11:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > On Mon, Jan 4, 2021 at 10:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > I see the windows build is failing: > > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123730 > > > > |undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at src/tools/msvc/Mkvcbuild.pm line 852. > > > > This needs to be patched: src/tools/msvc/Solution.pm > > > > You can see my zstd/pg_dump patch for an example, if needed (actually I'm not > > > > 100% sure it's working yet, since the windows build failed for another reason). > > > > > > Okay, I will check that. > > This still needs help. > perl ./src/tools/msvc/mkvcbuild.pl > ... > undefined symbol: HAVE_LIBLZ4 at src/include/pg_config.h line 350 at /home/pryzbyj/src/postgres/src/tools/msvc/Mkvcbuild.pmline 852. > > Fix like: > > + HAVE_LIBLZ4 => $self->{options}->{zlib} ? 1 : undef, I added HAVE_LIBLZ4 undef, but I haven't yet tested on windows as I don't have a windows system. Later I will check this and fix if it doesn't work. 
> Some more language fixes: > > commit 3efafee52414503a87332fa6070541a3311a408c > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Tue Sep 8 15:24:33 2020 +0530 > > Built-in compression method > > + If the compression method is not specified for the compressible type then > + it will have the default compression method. The default compression > > I think this should say: > If no compression method is specified, then compressible types will have the > default compression method (pglz). > > + * > + * Since version 11 TOAST_COMPRESS_SET_RAWSIZE also marks compressed > > Should say v14 ?? > > diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h > index 059dec3647..e4df6bc5c1 100644 > --- a/src/include/catalog/pg_attribute.h > +++ b/src/include/catalog/pg_attribute.h > @@ -156,6 +156,14 @@ CATALOG(pg_attribute,1249,AttributeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(75, > /* attribute's collation */ > Oid attcollation; > > + /* > + * Oid of the compression method that will be used for compressing the value > + * for this attribute. For the compressible atttypid this must always be a > > say "For compressible types, ..." > > + * valid Oid irrespective of what is the current value of the attstorage. > + * And for the incompressible atttypid this must always be an invalid Oid. > > say "must be InvalidOid" > > @@ -685,6 +686,7 @@ typedef enum TableLikeOption > CREATE_TABLE_LIKE_INDEXES = 1 << 5, > CREATE_TABLE_LIKE_STATISTICS = 1 << 6, > CREATE_TABLE_LIKE_STORAGE = 1 << 7, > + CREATE_TABLE_LIKE_COMPRESSION = 1 << 8, > > This is interesting... > I have a patch to implement LIKE .. (INCLUDING ACCESS METHOD). > I guess I should change it to say LIKE .. (TABLE ACCESS METHOD), right ? > https://commitfest.postgresql.org/31/2865/ > > Your first patch is large due to updating a large number of test cases to > include the "compression" column in \d+ output. Maybe that column should be > hidden when HIDE_TABLEAM is set by pg_regress ? I think that would allow > testing with alternate, default compression. I am not sure whether we should hide the compression method when HIDE_TABLEAM is set. I agree that it is actually an access method but it is not the same right? Because we are using it for compression not for storing data. > commit ddcae4095e36e94e3e7080e2ab5a8d42cc2ca843 > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Tue Jan 19 15:10:14 2021 +0530 > > Support compression methods options > > + * we don't need do it again in cminitstate function. > > need *to* do it again Fixed > + * Fetch atttributes compression options > > attribute's :) Fixed > commit b7946eda581230424f73f23d90843f4c2db946c2 > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Wed Jan 13 12:14:40 2021 +0530 > > Create custom compression methods > > + * compression header otherwise, directly translate the buil-in compression > > built-in Fixed > commit 0746a4d7a14209ebf62fe0dc1d12999ded879cfd > Author: dilipkumar <dilipbalaut@gmail.com> > Date: Mon Jan 4 15:15:20 2021 +0530 > > Add support for PRESERVE > > --- a/src/backend/catalog/objectaddress.c > +++ b/src/backend/catalog/objectaddress.c > @@ -15,6 +15,7 @@ > > #include "postgres.h" > > +#include "access/compressamapi.h" > > Unnecessary change to this file ? Fixed > > + * ... Collect the list of access method > + * oids on which this attribute has a dependency upon. > > "upon" is is redundant. Say "on which this attribute has a dependency". Changed > + * Check whether the given compression method oid is supported by > + * the target attribue. 
> > attribute Fixed > > + * In binary upgrade mode just create the dependency for all preserve > + * list compression method as a dependecy. > > dependency > I think you could say: "In binary upgrade mode, just create a dependency on all > preserved methods". Fixed -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Some more review comments: 'git am' barfs on v0001 because it's got a whitespace error. VARFLAGS_4B_C() doesn't seem to be used in any of the patches. I'm OK with keeping it even if it's not used just because maybe someone will need it later but, uh, don't we need to use it someplace? To avoid moving the goalposts for a basic install, I suggest that --with-lz4 should default to disabled. Maybe we'll want to rethink that at some point, but since we're just getting started with this whole thing, I don't think now is the time. The change to ddl.sgml doesn't seem to make sense to me. There might be someplace where we want to explain how properties are inherited in partitioning hierarchies, but I don't think this is the right place, and I don't think this explanation is particularly clear. + This clause adds the compression method to a column. The Compression + method can be set from available compression methods. The built-in + methods are <literal>pglz</literal> and <literal>lz4</literal>. + If no compression method is specified, then compressible types will have + the default compression method <literal>pglz</literal>. Suggest: This sets the compression method for a column. The supported compression methods are <literal>pglz</literal> and <literal>lz4</literal>. <literal>lz4</literal> is available only if <literal>--with-lz4</literal> was used when building <productname>PostgreSQL</productname>. The default is <literal>pglz</literal>. We should make sure, if you haven't already, that trying to create a column with LZ4 compression fails at table creation time if the build does not support LZ4. But, someone could also create a table using a build that has LZ4 support and then switch to a different set of binaries that do not have it, so we need the runtime checks also. However, those runtime checks shouldn't fail simplify from trying to access a table that is set to use LZ4 compression; they should only fail if we actually need to decompress an LZ4'd value. Since indexes don't have TOAST tables, it surprises me that brin_form_tuple() thinks it can TOAST anything. But I guess that's not this patch's problem, if it's a problem at all. I like the fact that you changed the message "compressed data is corrupt" to indicate the compression method, but I think the resulting message doesn't follow style guidelines because I don't believe we normally put something with a colon prefix at the beginning of a primary error message. So instead of saying "pglz: compressed data is corrupt" I think you should say something like "compressed pglz data is corrupt". Also, I suggest that we take this opportunity to switch to ereport() rather than elog() and set errcode(ERRCODE_DATA_CORRUPTED). What testing have you done for performance impacts? Does the patch slow things down noticeably with pglz? (Hopefully not.) Can you measure a performance improvement with pglz? (Hopefully so.) Is it likely to hurt performance that there's no minimum size for lz4 compression as we have for pglz? Seems like that could result in a lot of wasted cycles trying to compress short strings. pglz_cmcompress() cancels compression if the resulting value would be larger than the original one, but it looks like lz4_cmcompress() will just store the enlarged value. That seems bad. pglz_cmcompress() doesn't need to pfree(tmp) before elog(ERROR). CompressionOidToId(), CompressionIdToOid() and maybe other places need to remember the message style guidelines. Primary error messages are not capitalized. 
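To make the lz4_cmcompress() and error-style points concrete, here is a rough sketch rather than the patch's actual code: it assumes the cmcompress callback may return NULL to tell the toaster to store the value uncompressed (as the pglz path does), and it glosses over the patch's compressed-varlena header layout, which is where the raw size lives.

#include "postgres.h"
#include <lz4.h>

static struct varlena *
lz4_cmcompress(const struct varlena *value)
{
	int32		valsize = VARSIZE_ANY_EXHDR(value);
	int32		maxsize = LZ4_compressBound(valsize);
	struct varlena *tmp = (struct varlena *) palloc(maxsize + VARHDRSZ);
	int32		len;

	len = LZ4_compress_default(VARDATA_ANY(value), VARDATA(tmp),
							   valsize, maxsize);

	/* Give up if compression failed or did not actually save space. */
	if (len <= 0 || len >= valsize)
	{
		pfree(tmp);
		return NULL;
	}

	SET_VARSIZE(tmp, len + VARHDRSZ);
	return tmp;
}

On the decompression side, corruption could then be reported in the suggested style, for example (rawsize standing in for whatever uncompressed length the compressed header claims):

	if (LZ4_decompress_safe(VARDATA(value), VARDATA(result),
							VARSIZE(value) - VARHDRSZ, rawsize) < 0)
		ereport(ERROR,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("compressed lz4 data is corrupt")));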
Why should we now have to include toast_internals.h in reorderbuffer.c, which has no other changes? That definitely shouldn't be necessary. If something in another header file now requires something from toast_internals.h, then that header file would be obliged to include toast_internals.h itself. But actually that shouldn't happen, because the whole point of toast_internals.h is that it should not be included in very many places at all. If we're adding stuff there that is going to be broadly needed, we're adding it in the wrong place. varlena.c shouldn't need toast_internals.h either, and if it did, it should be in alphabetical order. -- Robert Haas EDB: http://www.enterprisedb.com
Even more review comments, still looking mostly at 0001: If there's a reason why parallel_schedule is arranging to run the compression test in parallel with nothing else, the comment in that file should explain the reason. If there isn't, it should be added to a parallel group that doesn't have the maximum number of tests yet, probably the last such group in the file. serial_schedule should add the test in a position that roughly corresponds to where it appears in parallel_schedule. I believe it's relatively standard practice to put variable declarations at the top of the file. compress_lz4.c and compress_pglz.c instead put those declarations nearer to the point of use. compressamapi.c has an awful lot of #include directives for the code it actually contains. I believe that we should cut that down to what is required by 0001, and other patches can add more later as required. In fact, it's tempting to just get rid of this .c file altogether and make the two functions it contains static inline functions in the header, but I'm not 100% sure that's a good idea. The copyright dates in a number of the file headers are out of date. binary_upgrade_next_pg_am_oid and the related changes to CreateAccessMethod don't belong in 0001, because it doesn't support non-built-in compression methods. These changes and the related pg_dump change should be moved to the patch that adds support for that. The comments added to dumpTableSchema() say that "compression is assigned by ALTER" but don't give a reason. I think they should. I don't know how much they need to explain about what the code does, but they definitely need to explain why it does it. Also, isn't this bad? If we create the column with the wrong compression setting initially and then ALTER it, we have to rewrite the table. If it's empty, that's cheap, but it'd still be better not to do it at all. I'm not sure it's a good idea for dumpTableSchema() to leave out specifying the compression method if it happens to be pglz. I think we definitely shouldn't do it in binary-upgrade mode. What if we changed the default in a future release? For that matter, even 0002 could make the current approach unsafe.... I think, anyway. The changes to pg_dump.h look like they haven't had a visit from pgindent. You should probably try to do that for the whole patch, though it's a bit annoying since you'll have to manually remove unrelated changes to the same files that are being modified by the patch. Also, why the extra blank line here? GetAttributeCompression() is hard to understand. I suggest changing the comment to "resolve column compression specification to an OID" and somehow rejigger the code so that you aren't using one not-NULL test and one NULL test on the same variable. Like maybe change the first part to if (!IsStorageCompressible(typstorage)) { if (compression == NULL) return InvalidOid; ereport(ERROR, ...); } It puzzles me that CompareCompressionMethodAndDecompress() calls slot_getallattrs() just before clearing the slot. It seems like this ought to happen before we loop over the attributes, so that we don't need to call slot_getattr() every time. See the comment for that function. But even if we didn't do that for some reason, why would we do it here? If it's already been done, it shouldn't do anything, and if it hasn't been done, it might overwrite some of the values we just poked into tts_values. It also seems suspicious that we can get away with clearing the slot and then again marking it valid. I'm not sure it really works like that. 
Like, can't clearing the slot invalidate pointers stored in tts_values[]? For instance, if they are pointers into an in-memory heap tuple, tts_heap_clear() is going to free the tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is going to unpin it. I think the supported procedure for this sort of thing is to have a second slot, set tts_values, tts_isnull etc. and then materialize the slot. After materializing the new slot, it's independent of the old slot, which can then be cleared. See for example tts_virtual_materialize(). The whole approach you've taken here might need to be rethought a bit. I think you are right to want to avoid copying everything over into a new slot if nothing needs to be done, and I think we should definitely keep that optimization, but I think if you need to copy stuff, you have to do the above procedure and then continue using the other slot instead of the original one. Some places I think we have functions that return either the original slot or a different one depending on how it goes; that might be a useful idea here. But, you also can't just spam-create slots; it's important that whatever ones we end up with get reused for every tuple. Doesn't the change to describeOneTableDetails() require declaring changing the declaration of char *headers[11] to char *headers[12]? How does this not fail Assert(cols <= lengthof(headers))? Why does describeOneTableDetais() arrange to truncate the printed value? We don't seem to do that for the other column properties, and it's not like this one is particularly long. Perhaps the changes to pg_am.dat shouldn't remove the blank line? I think the comment to pg_attribute.h could be rephrased to stay something like: "OID of compression AM. Must be InvalidOid if and only if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the right letters are. This would be shorter and I think also clearer than what you have The first comment change in postgres.h is wrong. You changed va_extsize to "size in va_extinfo" but the associated structure definition is unchanged, so the comment shouldn't be changed either. In toast_internals.h, you end using 30 as a constant several times but have no #define for it. You do have a #define for RAWSIZEMASK, but that's really a derived value from 30. Also, it's not a great name because it's kind of generic. So how about something like: #define TOAST_RAWSIZE_BITS 30 #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) But then again on second thought, this 30 seems to be the same 30 that shows up in the changes to postgres.h, and there again 0x3FFFFFFF shows up too. So maybe we should actually be defining these constants there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK and then having toast_internals.h use those constants as well. Taken with the email I sent yesterday, I think this is a more or less complete review of 0001. Although there are a bunch of things to fix here still, I don't think this is that far from being committable. I don't at this point see too much in terms of big design problems. Probably the CompareCompressionMethodAndDecompress() is the closest to a design-level problem, and certainly something needs to be done about it, but even that is a fairly localized problem in the context of the entire patch. -- Robert Haas EDB: http://www.enterprisedb.com
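For reference, the constants being proposed could look like the sketch below; note that reproducing the existing 0x3FFFFFFF mask needs a shift by the bit count itself rather than the bit count plus one. toast_internals.h could then use these instead of its own RAWSIZEMASK and the bare 30.

/* In postgres.h, next to the varattrib definitions (names as suggested above) */
#define VARLENA_RAWSIZE_BITS	30
#define VARLENA_RAWSIZE_MASK	((1U << VARLENA_RAWSIZE_BITS) - 1)	/* 0x3FFFFFFF */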
On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Even more review comments, still looking mostly at 0001: > > If there's a reason why parallel_schedule is arranging to run the > compression test in parallel with nothing else, the comment in that > file should explain the reason. If there isn't, it should be added to > a parallel group that doesn't have the maximum number of tests yet, > probably the last such group in the file. > > serial_schedule should add the test in a position that roughly > corresponds to where it appears in parallel_schedule. > > I believe it's relatively standard practice to put variable > declarations at the top of the file. compress_lz4.c and > compress_pglz.c instead put those declarations nearer to the point of > use. > > compressamapi.c has an awful lot of #include directives for the code > it actually contains. I believe that we should cut that down to what > is required by 0001, and other patches can add more later as required. > In fact, it's tempting to just get rid of this .c file altogether and > make the two functions it contains static inline functions in the > header, but I'm not 100% sure that's a good idea. > > The copyright dates in a number of the file headers are out of date. > > binary_upgrade_next_pg_am_oid and the related changes to > CreateAccessMethod don't belong in 0001, because it doesn't support > non-built-in compression methods. These changes and the related > pg_dump change should be moved to the patch that adds support for > that. > > The comments added to dumpTableSchema() say that "compression is > assigned by ALTER" but don't give a reason. I think they should. I > don't know how much they need to explain about what the code does, but > they definitely need to explain why it does it. Also, isn't this bad? > If we create the column with the wrong compression setting initially > and then ALTER it, we have to rewrite the table. If it's empty, that's > cheap, but it'd still be better not to do it at all. > > I'm not sure it's a good idea for dumpTableSchema() to leave out > specifying the compression method if it happens to be pglz. I think we > definitely shouldn't do it in binary-upgrade mode. What if we changed > the default in a future release? For that matter, even 0002 could make > the current approach unsafe.... I think, anyway. > > The changes to pg_dump.h look like they haven't had a visit from > pgindent. You should probably try to do that for the whole patch, > though it's a bit annoying since you'll have to manually remove > unrelated changes to the same files that are being modified by the > patch. Also, why the extra blank line here? > > GetAttributeCompression() is hard to understand. I suggest changing > the comment to "resolve column compression specification to an OID" > and somehow rejigger the code so that you aren't using one not-NULL > test and one NULL test on the same variable. Like maybe change the > first part to if (!IsStorageCompressible(typstorage)) { if > (compression == NULL) return InvalidOid; ereport(ERROR, ...); } > > It puzzles me that CompareCompressionMethodAndDecompress() calls > slot_getallattrs() just before clearing the slot. It seems like this > ought to happen before we loop over the attributes, so that we don't > need to call slot_getattr() every time. See the comment for that > function. But even if we didn't do that for some reason, why would we > do it here? 
If it's already been done, it shouldn't do anything, and > if it hasn't been done, it might overwrite some of the values we just > poked into tts_values. It also seems suspicious that we can get away > with clearing the slot and then again marking it valid. I'm not sure > it really works like that. Like, can't clearing the slot invalidate > pointers stored in tts_values[]? For instance, if they are pointers > into an in-memory heap tuple, tts_heap_clear() is going to free the > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > going to unpin it. I think the supported procedure for this sort of > thing is to have a second slot, set tts_values, tts_isnull etc. and > then materialize the slot. After materializing the new slot, it's > independent of the old slot, which can then be cleared. See for > example tts_virtual_materialize(). The whole approach you've taken > here might need to be rethought a bit. I think you are right to want > to avoid copying everything over into a new slot if nothing needs to > be done, and I think we should definitely keep that optimization, but > I think if you need to copy stuff, you have to do the above procedure > and then continue using the other slot instead of the original one. > Some places I think we have functions that return either the original > slot or a different one depending on how it goes; that might be a > useful idea here. But, you also can't just spam-create slots; it's > important that whatever ones we end up with get reused for every > tuple. > > Doesn't the change to describeOneTableDetails() require declaring > changing the declaration of char *headers[11] to char *headers[12]? > How does this not fail Assert(cols <= lengthof(headers))? > > Why does describeOneTableDetais() arrange to truncate the printed > value? We don't seem to do that for the other column properties, and > it's not like this one is particularly long. > > Perhaps the changes to pg_am.dat shouldn't remove the blank line? > > I think the comment to pg_attribute.h could be rephrased to stay > something like: "OID of compression AM. Must be InvalidOid if and only > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the > right letters are. This would be shorter and I think also clearer than > what you have > > The first comment change in postgres.h is wrong. You changed > va_extsize to "size in va_extinfo" but the associated structure > definition is unchanged, so the comment shouldn't be changed either. > > In toast_internals.h, you end using 30 as a constant several times but > have no #define for it. You do have a #define for RAWSIZEMASK, but > that's really a derived value from 30. Also, it's not a great name > because it's kind of generic. So how about something like: > > #define TOAST_RAWSIZE_BITS 30 > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) > > But then again on second thought, this 30 seems to be the same 30 that > shows up in the changes to postgres.h, and there again 0x3FFFFFFF > shows up too. So maybe we should actually be defining these constants > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK > and then having toast_internals.h use those constants as well. > > Taken with the email I sent yesterday, I think this is a more or less > complete review of 0001. Although there are a bunch of things to fix > here still, I don't think this is that far from being committable. I > don't at this point see too much in terms of big design problems. 
> Probably the CompareCompressionMethodAndDecompress() is the closest to > a design-level problem, and certainly something needs to be done about > it, but even that is a fairly localized problem in the context of the > entire patch. Thanks, Robert, for the detailed review. I will work on these comments and post an updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: While going through your comments, I need some suggestions about one of them; all the other comments look fine to me. > > It puzzles me that CompareCompressionMethodAndDecompress() calls > slot_getallattrs() just before clearing the slot. It seems like this > ought to happen before we loop over the attributes, so that we don't > need to call slot_getattr() every time. Yeah, actually, I thought I would avoid calling slot_getallattrs if none of the attributes got decompressed. I agree that if we call it up front we can avoid calling slot_getattr, but slot_getattr is only called for attributes with attlen -1. I also agree that if we call slot_getattr for attnum n it will deform all the attributes before that, so slot_getallattrs then only needs to deform the remaining attributes, not all of them. But maybe we can call slot_getallattrs as soon as we see the first attribute with attlen -1 and then avoid the subsequent slot_getattr calls; that may be better than what I have, because we would avoid calling slot_getattr for many attributes, especially when there are many varlenas. See the comment for that > function. But even if we didn't do that for some reason, why would we > do it here? If it's already been done, it shouldn't do anything, and > if it hasn't been done, it might overwrite some of the values we just > poked into tts_values. It will not overwrite those values, because slot_getallattrs only fetches values for "attnum > slot->tts_nvalid", so whatever we already fetched will not be overwritten. I did it at the end to optimize the normal cases where we are not doing "insert into ... select * from ...", so that those can get away without calling slot_getallattrs at all. However, calling slot_getattr for each varlena might cost us extra, so I am okay with calling slot_getallattrs this early. It also seems suspicious that we can get away > with clearing the slot and then again marking it valid. I'm not sure > it really works like that. Like, can't clearing the slot invalidate > pointers stored in tts_values[]? For instance, if they are pointers > into an in-memory heap tuple, tts_heap_clear() is going to free the > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > going to unpin it. Yeah, that's completely wrong; I missed that part. One solution could be to detach the tuple from the slot and then materialize it, so that it forms the tuple with the new values, and then clear the old tuple. But that seems a bit hacky. I think the supported procedure for this sort of > thing is to have a second slot, set tts_values, tts_isnull etc. and > then materialize the slot. After materializing the new slot, it's > independent of the old slot, which can then be cleared. See for > example tts_virtual_materialize(). Okay, so if we take a new slot then we also need to set this slot reference in the ScanState, otherwise that might still point to the old slot. I haven't yet analyzed all the places where we might be keeping a reference to the old slot, or maybe I am missing something. Anyway, I will get a better idea once I try to implement this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Feb 4, 2021 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > Yeah, actually, I thought I would avoid calling slot_getallattrs if > none of the attributes got decompress. I agree if we call this before > we can avoid calling slot_getattr but slot_getattr > is only called for the attribute which has attlen -1. I agree that if > we call slot_getattr for attnum n then it will deform all the > attributes before that. But then slot_getallattrs only need to deform > the remaining attributes not all. But maybe we can call the > slot_getallattrs as soon as we see the first attribute with attlen -1 > and then avoid calling subsequent slot_getattr, maybe that is better > than compared to what I have because we will avoid calling > slot_getattr for many attributes, especially when there are many > verlena. I think that if we need to deform at all, we need to deform all attributes, right? So there's no point in considering e.g. slot_getsomeattrs(). But just slot_getallattrs() as soon as we know we need to do it might be worthwhile. Could even have two loops: one that just figures out whether we need to deform; if not, return. Then slot_getallattrs(). Then another loop to do the work. > I think the supported procedure for this sort of > > thing is to have a second slot, set tts_values, tts_isnull etc. and > > then materialize the slot. After materializing the new slot, it's > > independent of the old slot, which can then be cleared. See for > > example tts_virtual_materialize(). > > Okay, so if we take a new slot then we need to set this slot reference > in the ScanState also otherwise that might point to the old slot. I > haven't yet analyzed where all we might be keeping the reference to > that old slot. Or I am missing something. My guess is you want to leave the ScanState alone so that we keep fetching into the same slot as before and have an extra slot on the side someplace. -- Robert Haas EDB: http://www.enterprisedb.com
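As an illustration of the two-loop shape suggested above, a minimal C sketch (needs_recompression and datum_compression_differs are hypothetical names used only for this example; slot_getattr, slot_getallattrs and TupleDescAttr are the existing slot/tupdesc APIs):

    /*
     * Pass 1: decide whether any varlena attribute needs recompression.
     * Only varlena columns (attlen == -1) can carry a compression method.
     */
    static bool
    needs_recompression(TupleTableSlot *slot, TupleDesc tupdesc)
    {
        for (int i = 0; i < tupdesc->natts; i++)
        {
            Form_pg_attribute att = TupleDescAttr(tupdesc, i);
            bool        isnull;
            Datum       val;

            if (att->attlen != -1)
                continue;

            val = slot_getattr(slot, i + 1, &isnull);

            /* hypothetical check: stored compression differs from target */
            if (!isnull && datum_compression_differs(val, att))
                return true;
        }
        return false;
    }

    /* Caller: fast path returns the original slot untouched. */
    if (!needs_recompression(slot, tupdesc))
        return slot;

    slot_getallattrs(slot);     /* deform everything once */
    /* Pass 2: decompress/recompress the offending attributes ... */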
On Fri, Feb 5, 2021 at 3:51 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Feb 4, 2021 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > Yeah, actually, I thought I would avoid calling slot_getallattrs if > > none of the attributes got decompress. I agree if we call this before > > we can avoid calling slot_getattr but slot_getattr > > is only called for the attribute which has attlen -1. I agree that if > > we call slot_getattr for attnum n then it will deform all the > > attributes before that. But then slot_getallattrs only need to deform > > the remaining attributes not all. But maybe we can call the > > slot_getallattrs as soon as we see the first attribute with attlen -1 > > and then avoid calling subsequent slot_getattr, maybe that is better > > than compared to what I have because we will avoid calling > > slot_getattr for many attributes, especially when there are many > > verlena. > > I think that if we need to deform at all, we need to deform all > attributes, right? IMHO that is not true, because we might need to deform an attribute just to check its stored compression. For example, suppose the first attribute is a varchar and the remaining 100 attributes are integers: we only need to deform the first attribute, and if its compression method is the same as the target attribute's then we are done; we don't need to deform the rest and can simply continue with the original slot and tuple. I am not saying this is a very practical example or that we have to do it that way; I am just making the point that it is not true that if we deform at all then we have to deform everything. However, if we decompress anything then we do have to deform all attributes, because we need to materialize the tuple again. So there's no point in considering e.g. > slot_getsomeattrs(). But just slot_getallattrs() as soon as we know we > need to do it might be worthwhile. Could even have two loops: one that > just figures out whether we need to deform; if not, return. Then > slot_getallattrs(). Then another loop to do the work. > > > I think the supported procedure for this sort of > > > thing is to have a second slot, set tts_values, tts_isnull etc. and > > > then materialize the slot. After materializing the new slot, it's > > > independent of the old slot, which can then be cleared. See for > > > example tts_virtual_materialize(). > > > > Okay, so if we take a new slot then we need to set this slot reference > > in the ScanState also otherwise that might point to the old slot. I > > haven't yet analyzed where all we might be keeping the reference to > > that old slot. Or I am missing something. > > My guess is you want to leave the ScanState alone so that we keep > fetching into the same slot as before and have an extra slot on the > side someplace. Okay, got your point. Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
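For the "extra slot on the side" idea, a rough sketch of the materialize step (the executor calls shown are existing APIs; slot ownership and reuse details are glossed over, and decompressed_value is illustrative):

    /* created once and reused for every tuple that needs rewriting */
    newslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual);
    ...
    ExecClearTuple(newslot);    /* reset before refilling on reuse */
    memcpy(newslot->tts_values, slot->tts_values, natts * sizeof(Datum));
    memcpy(newslot->tts_isnull, slot->tts_isnull, natts * sizeof(bool));
    newslot->tts_values[attnum - 1] = PointerGetDatum(decompressed_value);

    ExecStoreVirtualTuple(newslot);   /* mark the filled virtual slot valid */
    ExecMaterializeSlot(newslot);     /* copy data so it no longer depends on 'slot' */

    return newslot;                   /* the original slot keeps being fetched into */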
On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Even more review comments, still looking mostly at 0001: > > If there's a reason why parallel_schedule is arranging to run the > compression test in parallel with nothing else, the comment in that > file should explain the reason. If there isn't, it should be added to > a parallel group that doesn't have the maximum number of tests yet, > probably the last such group in the file. > > serial_schedule should add the test in a position that roughly > corresponds to where it appears in parallel_schedule. Done > I believe it's relatively standard practice to put variable > declarations at the top of the file. compress_lz4.c and > compress_pglz.c instead put those declarations nearer to the point of > use. Do you mean pglz_compress_methods and lz4_compress_methods? I followed that style from heapam_handler.c. If you think that doesn't look good, I can move them up. > compressamapi.c has an awful lot of #include directives for the code > it actually contains. I believe that we should cut that down to what > is required by 0001, and other patches can add more later as required. > In fact, it's tempting to just get rid of this .c file altogether and > make the two functions it contains static inline functions in the > header, but I'm not 100% sure that's a good idea. I think it looks better to move them to compressamapi.h, so I have done that. > The copyright dates in a number of the file headers are out of date. Fixed > binary_upgrade_next_pg_am_oid and the related changes to > CreateAccessMethod don't belong in 0001, because it doesn't support > non-built-in compression methods. These changes and the related > pg_dump change should be moved to the patch that adds support for > that. Fixed > The comments added to dumpTableSchema() say that "compression is > assigned by ALTER" but don't give a reason. I think they should. I > don't know how much they need to explain about what the code does, but > they definitely need to explain why it does it. Also, isn't this bad? > If we create the column with the wrong compression setting initially > and then ALTER it, we have to rewrite the table. If it's empty, that's > cheap, but it'd still be better not to do it at all. Yeah, actually that part should go in the 0003 patch, where we implement the custom compression method. In that patch we need to ALTER and SET because we want to keep the preserved method as well, so I will add it there. > I'm not sure it's a good idea for dumpTableSchema() to leave out > specifying the compression method if it happens to be pglz. I think we > definitely shouldn't do it in binary-upgrade mode. What if we changed > the default in a future release? For that matter, even 0002 could make > the current approach unsafe.... I think, anyway. Fixed > The changes to pg_dump.h look like they haven't had a visit from > pgindent. You should probably try to do that for the whole patch, > though it's a bit annoying since you'll have to manually remove > unrelated changes to the same files that are being modified by the > patch. Also, why the extra blank line here? Fixed, and ran pgindent on the other files as well. > GetAttributeCompression() is hard to understand. I suggest changing > the comment to "resolve column compression specification to an OID" > and somehow rejigger the code so that you aren't using one not-NULL > test and one NULL test on the same variable. Like maybe change the > first part to if (!IsStorageCompressible(typstorage)) { if > (compression == NULL) return InvalidOid; ereport(ERROR, ...); } Done > It puzzles me that CompareCompressionMethodAndDecompress() calls > slot_getallattrs() just before clearing the slot. It seems like this > ought to happen before we loop over the attributes, so that we don't > need to call slot_getattr() every time. See the comment for that > function. But even if we didn't do that for some reason, why would we > do it here? If it's already been done, it shouldn't do anything, and > if it hasn't been done, it might overwrite some of the values we just > poked into tts_values. It also seems suspicious that we can get away > with clearing the slot and then again marking it valid. I'm not sure > it really works like that. Like, can't clearing the slot invalidate > pointers stored in tts_values[]? For instance, if they are pointers > into an in-memory heap tuple, tts_heap_clear() is going to free the > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > going to unpin it. I think the supported procedure for this sort of > thing is to have a second slot, set tts_values, tts_isnull etc. and > then materialize the slot. After materializing the new slot, it's > independent of the old slot, which can then be cleared. See for > example tts_virtual_materialize(). The whole approach you've taken > here might need to be rethought a bit. I think you are right to want > to avoid copying everything over into a new slot if nothing needs to > be done, and I think we should definitely keep that optimization, but > I think if you need to copy stuff, you have to do the above procedure > and then continue using the other slot instead of the original one. > Some places I think we have functions that return either the original > slot or a different one depending on how it goes; that might be a > useful idea here. But, you also can't just spam-create slots; it's > important that whatever ones we end up with get reused for every > tuple. I have changed this algorithm: now, if we have to decompress anything, we use the new slot and stick that new slot into the ModifyTableState, DR_transientrel for matviews, and DR_intorel for CTAS. Does this look okay, or do we need to do something else? If this logic looks fine, then maybe we can think about some more optimization and cleanup in this function. > Doesn't the change to describeOneTableDetails() require declaring > changing the declaration of char *headers[11] to char *headers[12]? > How does this not fail Assert(cols <= lengthof(headers))? Fixed > Why does describeOneTableDetais() arrange to truncate the printed > value? We don't seem to do that for the other column properties, and > it's not like this one is particularly long. Not required, fixed. > Perhaps the changes to pg_am.dat shouldn't remove the blank line? Fixed > I think the comment to pg_attribute.h could be rephrased to stay > something like: "OID of compression AM. Must be InvalidOid if and only > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the > right letters are. This would be shorter and I think also clearer than > what you have Fixed > The first comment change in postgres.h is wrong. You changed > va_extsize to "size in va_extinfo" but the associated structure > definition is unchanged, so the comment shouldn't be changed either. Yup, not required. > In toast_internals.h, you end using 30 as a constant several times but > have no #define for it. You do have a #define for RAWSIZEMASK, but > that's really a derived value from 30. Also, it's not a great name > because it's kind of generic. So how about something like: > > #define TOAST_RAWSIZE_BITS 30 > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) > > But then again on second thought, this 30 seems to be the same 30 that > shows up in the changes to postgres.h, and there again 0x3FFFFFFF > shows up too. So maybe we should actually be defining these constants > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK > and then having toast_internals.h use those constants as well. Done. IMHO it should be #define VARLENA_RAWSIZE_BITS 30 and #define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) - 1). > Taken with the email I sent yesterday, I think this is a more or less > complete review of 0001. Although there are a bunch of things to fix > here still, I don't think this is that far from being committable. I > don't at this point see too much in terms of big design problems. > Probably the CompareCompressionMethodAndDecompress() is the closest to > a design-level problem, and certainly something needs to be done about > it, but even that is a fairly localized problem in the context of the > entire patch. 0001 is attached; the pending parts now are: - Confirm the new design of CompareCompressionMethodAndDecompress - Performance tests, especially lz4 with small varlena - Rebase the other patches atop this patch - Comment in ddl.sgml -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
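For reference, one way those shared constants could be laid out (the placement in postgres.h follows Robert's suggestion; the TOAST_COMPRESS_RAWSIZE usage is only illustrative of how toast_internals.h could reuse them, and the field name in the compression header may differ in the patch):

    /* postgres.h */
    #define VARLENA_RAWSIZE_BITS    30
    #define VARLENA_RAWSIZE_MASK    ((1 << VARLENA_RAWSIZE_BITS) - 1)

    /* toast_internals.h then reuses them instead of a private RAWSIZEMASK */
    #define TOAST_COMPRESS_RAWSIZE(ptr) \
        (((toast_compress_header *) (ptr))->rawsize & VARLENA_RAWSIZE_MASK)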
On Tue, Feb 2, 2021 at 2:45 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Some more review comments: > > 'git am' barfs on v0001 because it's got a whitespace error. Fixed > VARFLAGS_4B_C() doesn't seem to be used in any of the patches. I'm OK > with keeping it even if it's not used just because maybe someone will > need it later but, uh, don't we need to use it someplace? Actually, I was using TOAST_COMPRESS_METHOD, and that required including toast_internals.h; now I have used VARFLAGS_4B_C, and with that we are able to remove the inclusion of toast_internals.h from places that don't want it. > To avoid moving the goalposts for a basic install, I suggest that > --with-lz4 should default to disabled. Maybe we'll want to rethink > that at some point, but since we're just getting started with this > whole thing, I don't think now is the time. Done > The change to ddl.sgml doesn't seem to make sense to me. There might > be someplace where we want to explain how properties are inherited in > partitioning hierarchies, but I don't think this is the right place, > and I don't think this explanation is particularly clear. Not yet done. I thought that since that is where we describe the storage relationship with the partitions, it is also the place for compression. Maybe I will have to read the ddl.sgml file and find the most suitable place, so I kept it as pending. > + This clause adds the compression method to a column. The Compression > + method can be set from available compression methods. The built-in > + methods are <literal>pglz</literal> and <literal>lz4</literal>. > + If no compression method is specified, then compressible types will have > + the default compression method <literal>pglz</literal>. > > Suggest: This sets the compression method for a column. The supported > compression methods are <literal>pglz</literal> and > <literal>lz4</literal>. <literal>lz4</literal> is available only if > <literal>--with-lz4</literal> was used when building > <productname>PostgreSQL</productname>. The default is > <literal>pglz</literal>. Done > We should make sure, if you haven't already, that trying to create a > column with LZ4 compression fails at table creation time if the build > does not support LZ4. But, someone could also create a table using a > build that has LZ4 support and then switch to a different set of > binaries that do not have it, so we need the runtime checks also. > However, those runtime checks shouldn't fail simply from trying to > access a table that is set to use LZ4 compression; they should only > fail if we actually need to decompress an LZ4'd value. Done. I check whether the compression method Oid is LZ4, and if the lz4 library is not installed then we error out. We could also use a handler-specific check function, but I am not sure whether it makes sense to add an extra routine for that. In the later 0006 patch we have a check function to verify the options, so we can error out there and there is no need to check this outside. > Since indexes don't have TOAST tables, it surprises me that > brin_form_tuple() thinks it can TOAST anything. But I guess that's not > this patch's problem, if it's a problem at all. It is just trying to compress it, not externalize it. > I like the fact that you changed the message "compressed data is > corrupt" to indicate the compression method, but I think the resulting > message doesn't follow style guidelines because I don't believe we > normally put something with a colon prefix at the beginning of a > primary error message. So instead of saying "pglz: compressed data is > corrupt" I think you should say something like "compressed pglz data > is corrupt". Also, I suggest that we take this opportunity to switch > to ereport() rather than elog() and set > errcode(ERRCODE_DATA_CORRUPTED). Done > > What testing have you done for performance impacts? Does the patch > slow things down noticeably with pglz? (Hopefully not.) Can you > measure a performance improvement with pglz? (Hopefully so.) Is it > likely to hurt performance that there's no minimum size for lz4 > compression as we have for pglz? Seems like that could result in a lot > of wasted cycles trying to compress short strings. Not sure what to do about this; I will check the performance with small varlenas and see. > pglz_cmcompress() cancels compression if the resulting value would be > larger than the original one, but it looks like lz4_cmcompress() will > just store the enlarged value. That seems bad. You mean lz4_cmcompress? Done. > pglz_cmcompress() doesn't need to pfree(tmp) before elog(ERROR). Done > CompressionOidToId(), CompressionIdToOid() and maybe other places need > to remember the message style guidelines. Primary error messages are > not capitalized. Fixed > Why should we now have to include toast_internals.h in > reorderbuffer.c, which has no other changes? That definitely shouldn't > be necessary. If something in another header file now requires > something from toast_internals.h, then that header file would be > obliged to include toast_internals.h itself. But actually that > shouldn't happen, because the whole point of toast_internals.h is that > it should not be included in very many places at all. If we're adding > stuff there that is going to be broadly needed, we're adding it in the > wrong place. Done > varlena.c shouldn't need toast_internals.h either, and if it did, it > should be in alphabetical order. It was the wrong usage, fixed now. Please refer to the latest patch at https://www.postgresql.org/message-id/CAFiTN-v9Cs1MORnp-3bGZ5QBwr5v3VarSvfaDizHi1acXES5xQ%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
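A hedged sketch of the two fixes discussed above for lz4_cmcompress (the LZ4_* calls are the stock liblz4 API; hdrsz stands in for the patch's compressed-varlena header size and is an assumption, as are the variable names):

    /* compression: give up when lz4 cannot actually shrink the value */
    valsize = VARSIZE_ANY_EXHDR(value);
    len = LZ4_compress_default(VARDATA_ANY(value), tmp + hdrsz,
                               valsize, LZ4_compressBound(valsize));
    if (len <= 0 || len >= valsize)
    {
        pfree(tmp);
        return NULL;            /* caller stores the datum uncompressed */
    }

    /* decompression: report corruption per the message style guidelines */
    if (LZ4_decompress_safe(src, dest, compressed_size, rawsize) < 0)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("compressed lz4 data is corrupt")));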
On Fri, Feb 5, 2021 at 10:56 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > Could you comment on the patch I sent on Jan 30 ? I think it would be squished > into 0001. I don't see why we have to do that. Seems fine to have it as a separate patch. > Also, what about the idea to add HIDE_COMPRESSAM ? Right now, your patch > changes a great many regression tests, and I doubt many people are going to try > to look closely to verify the differences, now, or when setting a non-default > compression method. Personally, my preference is to just update the test outputs. It's not important whether many people look closely to verify the differences; we just need to look them over on a one-time basis to see if they seem OK. After that it's 0 effort, vs. having to maintain HIDE_COMPRESSAM forever. > Also, I think we may want to make enable-lz4 the default *for testing > purposes*, now that the linux and BSD environments include that. My guess was that would annoy some hackers whose build environments got broken. If everyone thinks otherwise I'm willing to be persuaded, but it's going to take more than 1 vote... -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 5, 2021 at 11:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > Personally, my preference is to just update the test outputs. It's not > important whether many people look closely to verify the differences; > we just need to look them over on a one-time basis to see if they seem > OK. After that it's 0 effort, vs. having to maintain HIDE_COMPRESSAM > forever. Oh, I guess you're thinking about the case where someone wants to run the tests with a different default. That might be a good reason to have this. But then those changes should go in 0002. Regarding 0002, I'm not feeling very excited about having every call to TupleDescInitEntry() do an extra syscache lookup. It's going to be the same lookup every time forever to get the same value every time forever. Now maybe that function can never get hot enough for it to matter, but can't we find a way to be smarter about this? Like, suppose we cache the OID in a global variable the first time we look it up, and then use CacheRegisterSyscacheCallback() to have it zeroed out if pg_am is updated? Taking that idea a bit further, suppose you get rid of all the places where you do get_compression_am_oid(default_toast_compression, false) and change them to get_default_compression_am_oid(), which is defined thus: static Oid get_default_compression_am_oid(void) { if (unlikely(!OidIsValid(cached_default_compression_oid)) // figure it out; return cached_default_compression_oid; } Also, how about removing the debugging leftovers from syscache.c? -- Robert Haas EDB: http://www.enterprisedb.com
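A sketch of the caching Robert describes (CacheRegisterSyscacheCallback, AMOID, unlikely() and OidIsValid() are existing backend facilities; get_compression_am_oid and default_toast_compression are names from the 0002 patch, and real code would likely register the callback during backend initialization rather than lazily):

    static Oid  cached_default_compression_oid = InvalidOid;

    /* invalidation callback: any pg_am change clears the cached OID */
    static void
    default_compression_cache_callback(Datum arg, int cacheid, uint32 hashvalue)
    {
        cached_default_compression_oid = InvalidOid;
    }

    static Oid
    get_default_compression_am_oid(void)
    {
        if (unlikely(!OidIsValid(cached_default_compression_oid)))
        {
            static bool callback_registered = false;

            if (!callback_registered)
            {
                CacheRegisterSyscacheCallback(AMOID,
                                              default_compression_cache_callback,
                                              (Datum) 0);
                callback_registered = true;
            }
            cached_default_compression_oid =
                get_compression_am_oid(default_toast_compression, false);
        }
        return cached_default_compression_oid;
    }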
On Fri, Feb 5, 2021 at 8:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > Even more review comments, still looking mostly at 0001: > > > > If there's a reason why parallel_schedule is arranging to run the > > compression test in parallel with nothing else, the comment in that > > file should explain the reason. If there isn't, it should be added to > > a parallel group that doesn't have the maximum number of tests yet, > > probably the last such group in the file. > > > > serial_schedule should add the test in a position that roughly > > corresponds to where it appears in parallel_schedule. > > Done > > > I believe it's relatively standard practice to put variable > > declarations at the top of the file. compress_lz4.c and > > compress_pglz.c instead put those declarations nearer to the point of > > use. > > Do you mean pglz_compress_methods and lz4_compress_methods ? I > followed that style from > heapam_handler.c. If you think that doesn't look good then I can move it up. > > > compressamapi.c has an awful lot of #include directives for the code > > it actually contains. I believe that we should cut that down to what > > is required by 0001, and other patches can add more later as required. > > In fact, it's tempting to just get rid of this .c file altogether and > > make the two functions it contains static inline functions in the > > header, but I'm not 100% sure that's a good idea. > > I think it looks better to move them to compressamapi.h so done that. > > > The copyright dates in a number of the file headers are out of date. > > Fixed > > > binary_upgrade_next_pg_am_oid and the related changes to > > CreateAccessMethod don't belong in 0001, because it doesn't support > > non-built-in compression methods. These changes and the related > > pg_dump change should be moved to the patch that adds support for > > that. > > Fixed > > > The comments added to dumpTableSchema() say that "compression is > > assigned by ALTER" but don't give a reason. I think they should. I > > don't know how much they need to explain about what the code does, but > > they definitely need to explain why it does it. Also, isn't this bad? > > If we create the column with the wrong compression setting initially > > and then ALTER it, we have to rewrite the table. If it's empty, that's > > cheap, but it'd still be better not to do it at all. > > Yeah, actually that part should go in 0003 patch where we implement > the custom compression method. > in that patch we need to alter and set because we want to keep the > preserved method as well > So I will add it there > > > I'm not sure it's a good idea for dumpTableSchema() to leave out > > specifying the compression method if it happens to be pglz. I think we > > definitely shouldn't do it in binary-upgrade mode. What if we changed > > the default in a future release? For that matter, even 0002 could make > > the current approach unsafe.... I think, anyway. > > Fixed > > > > The changes to pg_dump.h look like they haven't had a visit from > > pgindent. You should probably try to do that for the whole patch, > > though it's a bit annoying since you'll have to manually remove > > unrelated changes to the same files that are being modified by the > > patch. Also, why the extra blank line here? > > Fixed, ran pgindent for other files as well. > > > GetAttributeCompression() is hard to understand. 
I suggest changing > > the comment to "resolve column compression specification to an OID" > > and somehow rejigger the code so that you aren't using one not-NULL > > test and one NULL test on the same variable. Like maybe change the > > first part to if (!IsStorageCompressible(typstorage)) { if > > (compression == NULL) return InvalidOid; ereport(ERROR, ...); } > > Done > > > It puzzles me that CompareCompressionMethodAndDecompress() calls > > slot_getallattrs() just before clearing the slot. It seems like this > > ought to happen before we loop over the attributes, so that we don't > > need to call slot_getattr() every time. See the comment for that > > function. But even if we didn't do that for some reason, why would we > > do it here? If it's already been done, it shouldn't do anything, and > > if it hasn't been done, it might overwrite some of the values we just > > poked into tts_values. It also seems suspicious that we can get away > > with clearing the slot and then again marking it valid. I'm not sure > > it really works like that. Like, can't clearing the slot invalidate > > pointers stored in tts_values[]? For instance, if they are pointers > > into an in-memory heap tuple, tts_heap_clear() is going to free the > > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > > going to unpin it. I think the supported procedure for this sort of > > thing is to have a second slot, set tts_values, tts_isnull etc. and > > then materialize the slot. After materializing the new slot, it's > > independent of the old slot, which can then be cleared. See for > > example tts_virtual_materialize(). The whole approach you've taken > > here might need to be rethought a bit. I think you are right to want > > to avoid copying everything over into a new slot if nothing needs to > > be done, and I think we should definitely keep that optimization, but > > I think if you need to copy stuff, you have to do the above procedure > > and then continue using the other slot instead of the original one. > > Some places I think we have functions that return either the original > > slot or a different one depending on how it goes; that might be a > > useful idea here. But, you also can't just spam-create slots; it's > > important that whatever ones we end up with get reused for every > > tuple. > > I have changed this algorithm, so now if we have to decompress > anything we will use the new slot and we will stick that new slot to > the ModifyTableState, DR_transientrel for matviews and DR_intorel for > CTAS. Does this looks okay or we need to do something else? If this > logic looks fine then maybe we can think of some more optimization and > cleanup in this function. > > > > Doesn't the change to describeOneTableDetails() require declaring > > changing the declaration of char *headers[11] to char *headers[12]? > > How does this not fail Assert(cols <= lengthof(headers))? > > Fixed > > > Why does describeOneTableDetais() arrange to truncate the printed > > value? We don't seem to do that for the other column properties, and > > it's not like this one is particularly long. > > Not required, fixed. > > > Perhaps the changes to pg_am.dat shouldn't remove the blank line? > > Fixed > > > I think the comment to pg_attribute.h could be rephrased to stay > > something like: "OID of compression AM. Must be InvalidOid if and only > > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the > > right letters are. 
This would be shorter and I think also clearer than > > what you have > > Fixed > > > The first comment change in postgres.h is wrong. You changed > > va_extsize to "size in va_extinfo" but the associated structure > > definition is unchanged, so the comment shouldn't be changed either. > > Yup, not required. > > > In toast_internals.h, you end using 30 as a constant several times but > > have no #define for it. You do have a #define for RAWSIZEMASK, but > > that's really a derived value from 30. Also, it's not a great name > > because it's kind of generic. So how about something like: > > > > #define TOAST_RAWSIZE_BITS 30 > > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) > > > > But then again on second thought, this 30 seems to be the same 30 that > > shows up in the changes to postgres.h, and there again 0x3FFFFFFF > > shows up too. So maybe we should actually be defining these constants > > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK > > and then having toast_internals.h use those constants as well. > > Done, IMHO it should be > #define VARLENA_RAWSIZE_BITS 30 > #define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) -1 ) > > > > Taken with the email I sent yesterday, I think this is a more or less > > complete review of 0001. Although there are a bunch of things to fix > > here still, I don't think this is that far from being committable. I > > don't at this point see too much in terms of big design problems. > > Probably the CompareCompressionMethodAndDecompress() is the closest to > > a design-level problem, and certainly something needs to be done about > > it, but even that is a fairly localized problem in the context of the > > entire patch. > > 0001 is attached, now pending parts are > > - Confirm the new design of CompareCompressionMethodAndDecompress > - Performance test, especially lz4 with small varlena I have tested the performance of pglz vs. lz4. Test1: a small, simple string; pglz doesn't attempt compression but lz4 does, since it has no minimum size limit. Table: 100 varchar columns. Test: insert 1000 tuples, each column a 25-byte string (32 bytes is the minimum limit for pglz). Result: pglz: 1030 ms (doesn't attempt compression, so externalizes), lz4: 212 ms. Test2: a small incompressible string; pglz doesn't attempt compression, lz4 tries but cannot compress. Table: 100 varchar columns. Test: insert 1000 tuples, each column a 25-byte string (32 bytes is the minimum limit for pglz). Result: pglz: 1030 ms (doesn't attempt compression, so externalizes), lz4: 1090 ms (attempts to compress but still externalizes). Test3: a few columns with large random data. Table: 3 varchar columns. Test: insert 1000 tuples, column sizes 3500 bytes, 4200 bytes, 4900 bytes. Result: pglz: 150 ms (compression ratio: 3.02%), lz4: 30 ms (compression ratio: 2.3%). Test4: like Test3 but with large, slightly compressible random data that needs to be compressed and externalized. Table: 3 varchar columns. Insert: 1000 tuples, 3 columns of 8192 bytes each. CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS 'select array_agg(md5(g::text))::text from generate_series(1, 256) g'; Test: insert into t1 select large_val(), large_val(), large_val() from generate_series(1,1000); Result: pglz: 2000 ms, lz4: 1500 ms. Conclusion: 1. In most cases lz4 is faster and compresses better as well. 2. In Test2, when the small data is incompressible, lz4 tries to compress whereas pglz doesn't try, so there is some performance loss. If we want, we can fix that by setting a minimum size limit for lz4 as well, maybe the same size as for pglz? > - Rebase other patches atop this patch > - comment in ddl.sgml Other changes in the patch: - We now dump the default compression method in binary-upgrade mode, so the pg_dump test needed some changes; fixed that. - In compress_pglz.c and compress_lz4.c we were using toast_internals.h macros, so I removed those and used the varlena macros instead. While testing, I noticed that if the compressed data are externalized then pg_column_compression() doesn't fetch the compression method from the toast chunk; I think we should do that. I will analyze this and fix it in the next version. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
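If conclusion 2 is addressed by giving lz4 a minimum input size as well, the check could mirror pglz's existing threshold (PGLZ_strategy_default and its min_input_size field are the existing pglz strategy; whether lz4 should reuse that exact cutoff is the open question above):

    /* skip the compression attempt entirely for very small inputs */
    if (valsize < PGLZ_strategy_default->min_input_size)
        return NULL;            /* store the short value as-is */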
On Sun, Feb 7, 2021 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Feb 5, 2021 at 8:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > Even more review comments, still looking mostly at 0001: > > > > > > If there's a reason why parallel_schedule is arranging to run the > > > compression test in parallel with nothing else, the comment in that > > > file should explain the reason. If there isn't, it should be added to > > > a parallel group that doesn't have the maximum number of tests yet, > > > probably the last such group in the file. > > > > > > serial_schedule should add the test in a position that roughly > > > corresponds to where it appears in parallel_schedule. > > > > Done > > > > > I believe it's relatively standard practice to put variable > > > declarations at the top of the file. compress_lz4.c and > > > compress_pglz.c instead put those declarations nearer to the point of > > > use. > > > > Do you mean pglz_compress_methods and lz4_compress_methods ? I > > followed that style from > > heapam_handler.c. If you think that doesn't look good then I can move it up. > > > > > compressamapi.c has an awful lot of #include directives for the code > > > it actually contains. I believe that we should cut that down to what > > > is required by 0001, and other patches can add more later as required. > > > In fact, it's tempting to just get rid of this .c file altogether and > > > make the two functions it contains static inline functions in the > > > header, but I'm not 100% sure that's a good idea. > > > > I think it looks better to move them to compressamapi.h so done that. > > > > > The copyright dates in a number of the file headers are out of date. > > > > Fixed > > > > > binary_upgrade_next_pg_am_oid and the related changes to > > > CreateAccessMethod don't belong in 0001, because it doesn't support > > > non-built-in compression methods. These changes and the related > > > pg_dump change should be moved to the patch that adds support for > > > that. > > > > Fixed > > > > > The comments added to dumpTableSchema() say that "compression is > > > assigned by ALTER" but don't give a reason. I think they should. I > > > don't know how much they need to explain about what the code does, but > > > they definitely need to explain why it does it. Also, isn't this bad? > > > If we create the column with the wrong compression setting initially > > > and then ALTER it, we have to rewrite the table. If it's empty, that's > > > cheap, but it'd still be better not to do it at all. > > > > Yeah, actually that part should go in 0003 patch where we implement > > the custom compression method. > > in that patch we need to alter and set because we want to keep the > > preserved method as well > > So I will add it there > > > > > I'm not sure it's a good idea for dumpTableSchema() to leave out > > > specifying the compression method if it happens to be pglz. I think we > > > definitely shouldn't do it in binary-upgrade mode. What if we changed > > > the default in a future release? For that matter, even 0002 could make > > > the current approach unsafe.... I think, anyway. > > > > Fixed > > > > > > > The changes to pg_dump.h look like they haven't had a visit from > > > pgindent. You should probably try to do that for the whole patch, > > > though it's a bit annoying since you'll have to manually remove > > > unrelated changes to the same files that are being modified by the > > > patch. 
Also, why the extra blank line here? > > > > Fixed, ran pgindent for other files as well. > > > > > GetAttributeCompression() is hard to understand. I suggest changing > > > the comment to "resolve column compression specification to an OID" > > > and somehow rejigger the code so that you aren't using one not-NULL > > > test and one NULL test on the same variable. Like maybe change the > > > first part to if (!IsStorageCompressible(typstorage)) { if > > > (compression == NULL) return InvalidOid; ereport(ERROR, ...); } > > > > Done > > > > > It puzzles me that CompareCompressionMethodAndDecompress() calls > > > slot_getallattrs() just before clearing the slot. It seems like this > > > ought to happen before we loop over the attributes, so that we don't > > > need to call slot_getattr() every time. See the comment for that > > > function. But even if we didn't do that for some reason, why would we > > > do it here? If it's already been done, it shouldn't do anything, and > > > if it hasn't been done, it might overwrite some of the values we just > > > poked into tts_values. It also seems suspicious that we can get away > > > with clearing the slot and then again marking it valid. I'm not sure > > > it really works like that. Like, can't clearing the slot invalidate > > > pointers stored in tts_values[]? For instance, if they are pointers > > > into an in-memory heap tuple, tts_heap_clear() is going to free the > > > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > > > going to unpin it. I think the supported procedure for this sort of > > > thing is to have a second slot, set tts_values, tts_isnull etc. and > > > then materialize the slot. After materializing the new slot, it's > > > independent of the old slot, which can then be cleared. See for > > > example tts_virtual_materialize(). The whole approach you've taken > > > here might need to be rethought a bit. I think you are right to want > > > to avoid copying everything over into a new slot if nothing needs to > > > be done, and I think we should definitely keep that optimization, but > > > I think if you need to copy stuff, you have to do the above procedure > > > and then continue using the other slot instead of the original one. > > > Some places I think we have functions that return either the original > > > slot or a different one depending on how it goes; that might be a > > > useful idea here. But, you also can't just spam-create slots; it's > > > important that whatever ones we end up with get reused for every > > > tuple. > > > > I have changed this algorithm, so now if we have to decompress > > anything we will use the new slot and we will stick that new slot to > > the ModifyTableState, DR_transientrel for matviews and DR_intorel for > > CTAS. Does this looks okay or we need to do something else? If this > > logic looks fine then maybe we can think of some more optimization and > > cleanup in this function. > > > > > > > Doesn't the change to describeOneTableDetails() require declaring > > > changing the declaration of char *headers[11] to char *headers[12]? > > > How does this not fail Assert(cols <= lengthof(headers))? > > > > Fixed > > > > > Why does describeOneTableDetais() arrange to truncate the printed > > > value? We don't seem to do that for the other column properties, and > > > it's not like this one is particularly long. > > > > Not required, fixed. > > > > > Perhaps the changes to pg_am.dat shouldn't remove the blank line? 
> > > > Fixed > > > > > I think the comment to pg_attribute.h could be rephrased to stay > > > something like: "OID of compression AM. Must be InvalidOid if and only > > > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the > > > right letters are. This would be shorter and I think also clearer than > > > what you have > > > > Fixed > > > > > The first comment change in postgres.h is wrong. You changed > > > va_extsize to "size in va_extinfo" but the associated structure > > > definition is unchanged, so the comment shouldn't be changed either. > > > > Yup, not required. > > > > > In toast_internals.h, you end using 30 as a constant several times but > > > have no #define for it. You do have a #define for RAWSIZEMASK, but > > > that's really a derived value from 30. Also, it's not a great name > > > because it's kind of generic. So how about something like: > > > > > > #define TOAST_RAWSIZE_BITS 30 > > > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) > > > > > > But then again on second thought, this 30 seems to be the same 30 that > > > shows up in the changes to postgres.h, and there again 0x3FFFFFFF > > > shows up too. So maybe we should actually be defining these constants > > > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK > > > and then having toast_internals.h use those constants as well. > > > > Done, IMHO it should be > > #define VARLENA_RAWSIZE_BITS 30 > > #define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) -1 ) > > > > > > > Taken with the email I sent yesterday, I think this is a more or less > > > complete review of 0001. Although there are a bunch of things to fix > > > here still, I don't think this is that far from being committable. I > > > don't at this point see too much in terms of big design problems. > > > Probably the CompareCompressionMethodAndDecompress() is the closest to > > > a design-level problem, and certainly something needs to be done about > > > it, but even that is a fairly localized problem in the context of the > > > entire patch. 
> > > > 0001 is attached, now pending parts are > > > > - Confirm the new design of CompareCompressionMethodAndDecompress > > - Performance test, especially lz4 with small varlena > > I have tested the performance, pglz vs lz4 > > Test1: With a small simple string, pglz doesn't select compression but > lz4 select as no min limit > Table: 100 varchar column > Test: Insert 1000 tuple, each column of 25 bytes string (32 is min > limit for pglz) > Result: > pglz: 1030 ms (doesn't attempt compression so externalize), > lz4: 212 ms > > Test2: With small incompressible string, pglz don't select compression > lz4 select but can not compress > Table: 100 varchar column > Test: Insert 1000 tuple, each column of 25 bytes string (32 is min > limit for pglz) > Result: > pglz: 1030 ms (doesn't attempt compression so externalize), > lz4: 1090 ms (attempt to compress but externalize): > > Test3: Test a few columns with large random data > Table: 3 varchar column > Test: Insert 1000 tuple 3 columns size(3500 byes, 4200 bytes, 4900bytes) > pglz: 150 ms (compression ratio: 3.02%), > lz4: 30 ms (compression ratio : 2.3%) > > Test4: Test3 with different large random slighly compressible, need to > compress + externalize: > Table: 3 varchar column > Insert: Insert 1000 tuple 3 columns size(8192, 8192, 8192) > CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS > 'select array_agg(md5(g::text))::text from generate_series(1, 256) g'; > Test: insert into t1 select large_val(), large_val(), large_val() from > generate_series(1,1000); > pglz: 2000 ms > lz4: 1500 ms > > Conclusion: > 1. In most cases lz4 is faster and doing better compression as well. > 2. In Test2 when small data is incompressible then lz4 tries to > compress whereas pglz doesn't try so there is some performance loss. > But if we want we can fix > it by setting some minimum limit of size for lz4 as well, maybe the > same size as pglz? > > > - Rebase other patches atop this patch > > - comment in ddl.sgml > > Other changes in patch: > - Now we are dumping the default compression method in the > binary-upgrade mode so the pg_dump test needed some change, fixed > that. > - in compress_pglz.c and compress_lz4.c, we were using > toast_internal.h macros so I removed and used varlena macros instead. > > While testing, I noticed that if the compressed data are externalized > then pg_column_compression(), doesn't fetch the compression method > from the toast chunk, I think we should do that. I will analyze this > and fix it in the next version. While trying to fix this, I have realized this problem exists in CompareCompressionMethodAndDecompress see below code. --- + new_value = (struct varlena *) + DatumGetPointer(slot->tts_values[attnum - 1]); + + /* nothing to be done, if it is not compressed */ + if (!VARATT_IS_COMPRESSED(new_value)) + continue; --- Basically, we are just checking whether the stored value is compressed or not, but we are clearly ignoring the fact that it might be compressed and stored externally on disk. So basically if the value is stored externally we can not know whether the external data were compressed or not without fetching the values from the toast table, I think instead of fetching the complete data from toast we can just fetch the header using 'toast_fetch_datum_slice'. Any other thoughts on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 9, 2021 at 2:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Feb 7, 2021 at 5:15 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Feb 5, 2021 at 8:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > > > Even more review comments, still looking mostly at 0001: > > > > > > > > If there's a reason why parallel_schedule is arranging to run the > > > > compression test in parallel with nothing else, the comment in that > > > > file should explain the reason. If there isn't, it should be added to > > > > a parallel group that doesn't have the maximum number of tests yet, > > > > probably the last such group in the file. > > > > > > > > serial_schedule should add the test in a position that roughly > > > > corresponds to where it appears in parallel_schedule. > > > > > > Done > > > > > > > I believe it's relatively standard practice to put variable > > > > declarations at the top of the file. compress_lz4.c and > > > > compress_pglz.c instead put those declarations nearer to the point of > > > > use. > > > > > > Do you mean pglz_compress_methods and lz4_compress_methods ? I > > > followed that style from > > > heapam_handler.c. If you think that doesn't look good then I can move it up. > > > > > > > compressamapi.c has an awful lot of #include directives for the code > > > > it actually contains. I believe that we should cut that down to what > > > > is required by 0001, and other patches can add more later as required. > > > > In fact, it's tempting to just get rid of this .c file altogether and > > > > make the two functions it contains static inline functions in the > > > > header, but I'm not 100% sure that's a good idea. > > > > > > I think it looks better to move them to compressamapi.h so done that. > > > > > > > The copyright dates in a number of the file headers are out of date. > > > > > > Fixed > > > > > > > binary_upgrade_next_pg_am_oid and the related changes to > > > > CreateAccessMethod don't belong in 0001, because it doesn't support > > > > non-built-in compression methods. These changes and the related > > > > pg_dump change should be moved to the patch that adds support for > > > > that. > > > > > > Fixed > > > > > > > The comments added to dumpTableSchema() say that "compression is > > > > assigned by ALTER" but don't give a reason. I think they should. I > > > > don't know how much they need to explain about what the code does, but > > > > they definitely need to explain why it does it. Also, isn't this bad? > > > > If we create the column with the wrong compression setting initially > > > > and then ALTER it, we have to rewrite the table. If it's empty, that's > > > > cheap, but it'd still be better not to do it at all. > > > > > > Yeah, actually that part should go in 0003 patch where we implement > > > the custom compression method. > > > in that patch we need to alter and set because we want to keep the > > > preserved method as well > > > So I will add it there > > > > > > > I'm not sure it's a good idea for dumpTableSchema() to leave out > > > > specifying the compression method if it happens to be pglz. I think we > > > > definitely shouldn't do it in binary-upgrade mode. What if we changed > > > > the default in a future release? For that matter, even 0002 could make > > > > the current approach unsafe.... I think, anyway. 
> > > > > > Fixed > > > > > > > > > > The changes to pg_dump.h look like they haven't had a visit from > > > > pgindent. You should probably try to do that for the whole patch, > > > > though it's a bit annoying since you'll have to manually remove > > > > unrelated changes to the same files that are being modified by the > > > > patch. Also, why the extra blank line here? > > > > > > Fixed, ran pgindent for other files as well. > > > > > > > GetAttributeCompression() is hard to understand. I suggest changing > > > > the comment to "resolve column compression specification to an OID" > > > > and somehow rejigger the code so that you aren't using one not-NULL > > > > test and one NULL test on the same variable. Like maybe change the > > > > first part to if (!IsStorageCompressible(typstorage)) { if > > > > (compression == NULL) return InvalidOid; ereport(ERROR, ...); } > > > > > > Done > > > > > > > It puzzles me that CompareCompressionMethodAndDecompress() calls > > > > slot_getallattrs() just before clearing the slot. It seems like this > > > > ought to happen before we loop over the attributes, so that we don't > > > > need to call slot_getattr() every time. See the comment for that > > > > function. But even if we didn't do that for some reason, why would we > > > > do it here? If it's already been done, it shouldn't do anything, and > > > > if it hasn't been done, it might overwrite some of the values we just > > > > poked into tts_values. It also seems suspicious that we can get away > > > > with clearing the slot and then again marking it valid. I'm not sure > > > > it really works like that. Like, can't clearing the slot invalidate > > > > pointers stored in tts_values[]? For instance, if they are pointers > > > > into an in-memory heap tuple, tts_heap_clear() is going to free the > > > > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > > > > going to unpin it. I think the supported procedure for this sort of > > > > thing is to have a second slot, set tts_values, tts_isnull etc. and > > > > then materialize the slot. After materializing the new slot, it's > > > > independent of the old slot, which can then be cleared. See for > > > > example tts_virtual_materialize(). The whole approach you've taken > > > > here might need to be rethought a bit. I think you are right to want > > > > to avoid copying everything over into a new slot if nothing needs to > > > > be done, and I think we should definitely keep that optimization, but > > > > I think if you need to copy stuff, you have to do the above procedure > > > > and then continue using the other slot instead of the original one. > > > > Some places I think we have functions that return either the original > > > > slot or a different one depending on how it goes; that might be a > > > > useful idea here. But, you also can't just spam-create slots; it's > > > > important that whatever ones we end up with get reused for every > > > > tuple. > > > > > > I have changed this algorithm, so now if we have to decompress > > > anything we will use the new slot and we will stick that new slot to > > > the ModifyTableState, DR_transientrel for matviews and DR_intorel for > > > CTAS. Does this looks okay or we need to do something else? If this > > > logic looks fine then maybe we can think of some more optimization and > > > cleanup in this function. > > > > > > > > > > Doesn't the change to describeOneTableDetails() require declaring > > > > changing the declaration of char *headers[11] to char *headers[12]? 
> > > > How does this not fail Assert(cols <= lengthof(headers))? > > > > > > Fixed > > > > > > > Why does describeOneTableDetais() arrange to truncate the printed > > > > value? We don't seem to do that for the other column properties, and > > > > it's not like this one is particularly long. > > > > > > Not required, fixed. > > > > > > > Perhaps the changes to pg_am.dat shouldn't remove the blank line? > > > > > > Fixed > > > > > > > I think the comment to pg_attribute.h could be rephrased to stay > > > > something like: "OID of compression AM. Must be InvalidOid if and only > > > > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the > > > > right letters are. This would be shorter and I think also clearer than > > > > what you have > > > > > > Fixed > > > > > > > The first comment change in postgres.h is wrong. You changed > > > > va_extsize to "size in va_extinfo" but the associated structure > > > > definition is unchanged, so the comment shouldn't be changed either. > > > > > > Yup, not required. > > > > > > > In toast_internals.h, you end using 30 as a constant several times but > > > > have no #define for it. You do have a #define for RAWSIZEMASK, but > > > > that's really a derived value from 30. Also, it's not a great name > > > > because it's kind of generic. So how about something like: > > > > > > > > #define TOAST_RAWSIZE_BITS 30 > > > > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) > > > > > > > > But then again on second thought, this 30 seems to be the same 30 that > > > > shows up in the changes to postgres.h, and there again 0x3FFFFFFF > > > > shows up too. So maybe we should actually be defining these constants > > > > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK > > > > and then having toast_internals.h use those constants as well. > > > > > > Done, IMHO it should be > > > #define VARLENA_RAWSIZE_BITS 30 > > > #define VARLENA_RAWSIZE_MASK ((1 << VARLENA_RAWSIZE_BITS) -1 ) > > > > > > > > > > Taken with the email I sent yesterday, I think this is a more or less > > > > complete review of 0001. Although there are a bunch of things to fix > > > > here still, I don't think this is that far from being committable. I > > > > don't at this point see too much in terms of big design problems. > > > > Probably the CompareCompressionMethodAndDecompress() is the closest to > > > > a design-level problem, and certainly something needs to be done about > > > > it, but even that is a fairly localized problem in the context of the > > > > entire patch. 
> > > > > > 0001 is attached, now pending parts are > > > > > > - Confirm the new design of CompareCompressionMethodAndDecompress > > > - Performance test, especially lz4 with small varlena > > > > I have tested the performance, pglz vs lz4 > > > > Test1: With a small simple string, pglz doesn't select compression but > > lz4 select as no min limit > > Table: 100 varchar column > > Test: Insert 1000 tuple, each column of 25 bytes string (32 is min > > limit for pglz) > > Result: > > pglz: 1030 ms (doesn't attempt compression so externalize), > > lz4: 212 ms > > > > Test2: With small incompressible string, pglz don't select compression > > lz4 select but can not compress > > Table: 100 varchar column > > Test: Insert 1000 tuple, each column of 25 bytes string (32 is min > > limit for pglz) > > Result: > > pglz: 1030 ms (doesn't attempt compression so externalize), > > lz4: 1090 ms (attempt to compress but externalize): > > > > Test3: Test a few columns with large random data > > Table: 3 varchar column > > Test: Insert 1000 tuple 3 columns size(3500 byes, 4200 bytes, 4900bytes) > > pglz: 150 ms (compression ratio: 3.02%), > > lz4: 30 ms (compression ratio : 2.3%) > > > > Test4: Test3 with different large random slighly compressible, need to > > compress + externalize: > > Table: 3 varchar column > > Insert: Insert 1000 tuple 3 columns size(8192, 8192, 8192) > > CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS > > 'select array_agg(md5(g::text))::text from generate_series(1, 256) g'; > > Test: insert into t1 select large_val(), large_val(), large_val() from > > generate_series(1,1000); > > pglz: 2000 ms > > lz4: 1500 ms > > > > Conclusion: > > 1. In most cases lz4 is faster and doing better compression as well. > > 2. In Test2 when small data is incompressible then lz4 tries to > > compress whereas pglz doesn't try so there is some performance loss. > > But if we want we can fix > > it by setting some minimum limit of size for lz4 as well, maybe the > > same size as pglz? > > > > > - Rebase other patches atop this patch > > > - comment in ddl.sgml > > > > Other changes in patch: > > - Now we are dumping the default compression method in the > > binary-upgrade mode so the pg_dump test needed some change, fixed > > that. > > - in compress_pglz.c and compress_lz4.c, we were using > > toast_internal.h macros so I removed and used varlena macros instead. > > > > While testing, I noticed that if the compressed data are externalized > > then pg_column_compression(), doesn't fetch the compression method > > from the toast chunk, I think we should do that. I will analyze this > > and fix it in the next version. > > While trying to fix this, I have realized this problem exists in > CompareCompressionMethodAndDecompress > see below code. > --- > + new_value = (struct varlena *) > + DatumGetPointer(slot->tts_values[attnum - 1]); > + > + /* nothing to be done, if it is not compressed */ > + if (!VARATT_IS_COMPRESSED(new_value)) > + continue; > --- > > Basically, we are just checking whether the stored value is compressed > or not, but we are clearly ignoring the fact that it might be > compressed and stored externally on disk. So basically if the value > is stored externally we can not know whether the external data were > compressed or not without fetching the values from the toast table, I > think instead of fetching the complete data from toast we can just > fetch the header using 'toast_fetch_datum_slice'. > > Any other thoughts on this? I think I was partially wrong here. 
Basically, there is a way to know whether the external data are compressed or not using the VARATT_EXTERNAL_IS_COMPRESSED macro. However, if the data are compressed then we will have to fetch a toast slice of size toast_compress_header to know the compression method. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
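To make the lookup described above a little more concrete, it might look roughly like the sketch below. This is illustrative only: ToastCompressionId, TOAST_INVALID_COMPRESSION_ID and VARLENA_RAWSIZE_BITS are names from the patch as discussed up-thread, toast_fetch_datum_slice() is assumed to be callable from here, and the helper name itself is made up.

/*
 * Sketch only: work out the compression method of a varlena that may be
 * stored inline-compressed or externalized.  Not the patch's code.
 */
static ToastCompressionId
get_compression_id_of(struct varlena *attr)
{
	uint32		info;			/* the patch's rawsize word plus two method bits */
	struct varlena *slice = NULL;

	if (VARATT_IS_EXTERNAL_ONDISK(attr))
	{
		struct varatt_external toast_pointer;

		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
			return TOAST_INVALID_COMPRESSION_ID;

		/*
		 * The externalized bytes of a compressed value start with the
		 * compression header, so one small slice is enough to read it.
		 */
		slice = toast_fetch_datum_slice(attr, 0, sizeof(uint32));
		attr = slice;
	}
	else if (!VARATT_IS_COMPRESSED(attr))
		return TOAST_INVALID_COMPRESSION_ID;

	memcpy(&info, VARDATA(attr), sizeof(info));
	if (slice)
		pfree(slice);

	/* the two bits above the 30-bit raw size carry the method id */
	return (ToastCompressionId) (info >> VARLENA_RAWSIZE_BITS);
}

The point of the sketch is only to show that the external case costs one extra (tiny) toast fetch, which is the overhead discussed in the following messages.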
Please remember to trim unnecessary quoted material. On Sun, Feb 7, 2021 at 6:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > [ a whole lot of quoted stuff ] > > I have tested the performance, pglz vs lz4 > > Test1: With a small simple string, pglz doesn't select compression but > lz4 select as no min limit > Table: 100 varchar column > Test: Insert 1000 tuple, each column of 25 bytes string (32 is min > limit for pglz) > Result: > pglz: 1030 ms (doesn't attempt compression so externalize), > lz4: 212 ms > > Test2: With small incompressible string, pglz don't select compression > lz4 select but can not compress > Table: 100 varchar column > Test: Insert 1000 tuple, each column of 25 bytes string (32 is min > limit for pglz) > Result: > pglz: 1030 ms (doesn't attempt compression so externalize), > lz4: 1090 ms (attempt to compress but externalize): > > Test3: Test a few columns with large random data > Table: 3 varchar column > Test: Insert 1000 tuple 3 columns size(3500 byes, 4200 bytes, 4900bytes) > pglz: 150 ms (compression ratio: 3.02%), > lz4: 30 ms (compression ratio : 2.3%) > > Test4: Test3 with different large random slighly compressible, need to > compress + externalize: > Table: 3 varchar column > Insert: Insert 1000 tuple 3 columns size(8192, 8192, 8192) > CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS > 'select array_agg(md5(g::text))::text from generate_series(1, 256) g'; > Test: insert into t1 select large_val(), large_val(), large_val() from > generate_series(1,1000); > pglz: 2000 ms > lz4: 1500 ms > > Conclusion: > 1. In most cases lz4 is faster and doing better compression as well. > 2. In Test2 when small data is incompressible then lz4 tries to > compress whereas pglz doesn't try so there is some performance loss. > But if we want we can fix > it by setting some minimum limit of size for lz4 as well, maybe the > same size as pglz? So my conclusion here is that perhaps there's no real problem. It looks like externalizing is so expensive compared to compression that it's worth trying to compress even though it may not always pay off. If, by trying to compress, we avoid externalizing, it's a huge win (~5x). If we try to compress and don't manage to avoid externalizing, it's a small loss (~6%). It's probably reasonable to expect that compressible data is more common than incompressible data, so not only is the win a lot bigger than the loss, but we should be able to expect it to happen a lot more often. It's not impossible that somebody could get bitten, but it doesn't feel like a huge risk to me. One thing that does occur to me is that it might be a good idea to skip compression if it doesn't change the number of chunks that will be stored into the TOAST table. If we compress the value but still need to externalize it, and the compression didn't save enough to reduce the number of chunks, I suppose we ideally would externalize the uncompressed version. That would save decompression time later, without really costing anything. However, I suppose that would be a separate improvement from this patch. Maybe the possibility of compressing smaller values makes it slightly more important, but I'm not sure that it's worth getting excited about. If anyone feels otherwise on either point, it'd be good to hear about it. -- Robert Haas EDB: http://www.enterprisedb.com
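The "skip compression when it doesn't save a TOAST chunk" idea floated above could boil down to a comparison along these lines; this is a sketch under the assumption that the inputs are the byte counts that would actually be written to the TOAST table in each case, not code from the patch.

/*
 * Sketch only: would storing the compressed form use fewer TOAST chunks
 * than storing the raw form?
 */
static bool
compression_saves_a_chunk(int32 uncompressed_bytes, int32 compressed_bytes)
{
	int32		raw_chunks = (uncompressed_bytes + TOAST_MAX_CHUNK_SIZE - 1) / TOAST_MAX_CHUNK_SIZE;
	int32		cmp_chunks = (compressed_bytes + TOAST_MAX_CHUNK_SIZE - 1) / TOAST_MAX_CHUNK_SIZE;

	return cmp_chunks < raw_chunks;
}

A caller would externalize the uncompressed version whenever this returns false and the value still has to go out of line, which is exactly the "save decompression time later without really costing anything" trade-off described above.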
On Tue, Feb 9, 2021 at 3:37 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I think you misunderstood: I mean that the WIP patch should default to > --enable-lz4, to exercise on a few CI. It's hardly useful to run CI with the > feature disabled. I assume that the patch would be committed with default > --disable-lz4. Oh, I see. I guess we could do that. > Right, it's not one-time, it's also whenever setting a non-default compression > method. I say it should go into 0001 to avoid a whole bunch of churn in > src/test/regress, and then more churn (and rebase conflicts in other patches) > while adding HIDE_COMPRESSAM in 0002. Hmm, I guess that makes some sense, too. I'm not sure either one is completely critical, but it does make sense to me now. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Feb 10, 2021 at 1:42 AM Robert Haas <robertmhaas@gmail.com> wrote: > > Please remember to trim unnecessary quoted material. Okay, I will. > On Sun, Feb 7, 2021 at 6:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > [ a whole lot of quoted stuff ] > > > > Conclusion: > > 1. In most cases lz4 is faster and doing better compression as well. > > 2. In Test2 when small data is incompressible then lz4 tries to > > compress whereas pglz doesn't try so there is some performance loss. > > But if we want we can fix > > it by setting some minimum limit of size for lz4 as well, maybe the > > same size as pglz? > > So my conclusion here is that perhaps there's no real problem. It > looks like externalizing is so expensive compared to compression that > it's worth trying to compress even though it may not always pay off. > If, by trying to compress, we avoid externalizing, it's a huge win > (~5x). If we try to compress and don't manage to avoid externalizing, > it's a small loss (~6%). It's probably reasonable to expect that > compressible data is more common than incompressible data, so not only > is the win a lot bigger than the loss, but we should be able to expect > it to happen a lot more often. It's not impossible that somebody could > get bitten, but it doesn't feel like a huge risk to me. I agree with this. That said maybe we could test the performance of pglz also by lowering/removing the min compression limit but maybe that should be an independent change. > One thing that does occur to me is that it might be a good idea to > skip compression if it doesn't change the number of chunks that will > be stored into the TOAST table. If we compress the value but still > need to externalize it, and the compression didn't save enough to > reduce the number of chunks, I suppose we ideally would externalize > the uncompressed version. That would save decompression time later, > without really costing anything. However, I suppose that would be a > separate improvement from this patch. Yeah, this seems like a good idea and we can work on that in a different thread. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 9, 2021 at 6:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > Basically, we are just checking whether the stored value is compressed > > or not, but we are clearly ignoring the fact that it might be > > compressed and stored externally on disk. So basically if the value > > is stored externally we can not know whether the external data were > > compressed or not without fetching the values from the toast table, I > > think instead of fetching the complete data from toast we can just > > fetch the header using 'toast_fetch_datum_slice'. > > > > Any other thoughts on this? > > I think I was partially wrong here. Basically, there is a way to know > whether the external data are compressed or not using > VARATT_EXTERNAL_IS_COMPRESSED macro. However, if it is compressed > then we will have to fetch the toast slice of size > toast_compress_header, to know the compression method. I have fixed this issue, so now we will be able to detect the compression method of externalized compressed data as well, and I have added a test case for this. I have also rebased the other patches on top of this patch and fixed the doc compilation issue in patch 0004 raised by Justin. I still could not figure out the right place in "ddl.sgml" for the change describing how the compression method is inherited, so it is still where it was; any suggestions on that? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Feb 10, 2021 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > [ new patches ] I think that in both varattrib_4b and toast_internals.h it would be better to pick a less generic field name. In toast_internals.h it's just info; in postgres.h it's va_info. But: [rhaas pgsql]$ git grep info | wc -l 24552 There are no references in the current source tree to va_info, so at least that one is greppable, but it's still not very descriptive. I suggest info -> tcinfo and va_info -> va_tcinfo, where "tc" stands for "TOAST compression". Looking through 24552 references to info to find the ones that pertain to this feature might take longer than searching the somewhat shorter list of references to tcinfo, which prepatch is just: [rhaas pgsql]$ git grep tcinfo | wc -l 0 I don't see why we should allow for datum_decompress to be optional, as toast_decompress_datum_slice does. Likely every serious compression method will support that anyway. If not, the compression AM can deal with the problem, rather than having the core code do it. That will save some tiny amount of performance, too. src/backend/access/compression/Makefile is missing a copyright header. It's really sad that lz4_cmdecompress_slice allocates VARRAWSIZE_4B_C(value) + VARHDRSZ rather than slicelength + VARHDRSZ as pglz_cmdecompress_slice() does. Is that a mistake, or is that necessary for some reason? If it's a mistake, let's fix it. If it's necessary, let's add a comment about why, probably starting with "Unfortunately, ....". I think you have a fairly big problem with row types. Consider this example: create table t1 (a int, b text compression pglz); create table t2 (a int, b text compression lz4); create table t3 (x t1); insert into t1 values (1, repeat('foo', 1000)); insert into t2 values (1, repeat('foo', 1000)); insert into t3 select t1 from t1; insert into t3 select row(a, b)::t1 from t2; rhaas=# select pg_column_compression((t3.x).b) from t3; pg_column_compression ----------------------- pglz lz4 (2 rows) That's not good, because now -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Feb 10, 2021 at 3:06 PM Robert Haas <robertmhaas@gmail.com> wrote: > I think you have a fairly big problem with row types. Consider this example: > > create table t1 (a int, b text compression pglz); > create table t2 (a int, b text compression lz4); > create table t3 (x t1); > insert into t1 values (1, repeat('foo', 1000)); > insert into t2 values (1, repeat('foo', 1000)); > insert into t3 select t1 from t1; > insert into t3 select row(a, b)::t1 from t2; > > rhaas=# select pg_column_compression((t3.x).b) from t3; > pg_column_compression > ----------------------- > pglz > lz4 > (2 rows) > > That's not good, because now ...because now I hit send too soon. Also, because now column b has implicit dependencies on both compression AMs and the rest of the system has no idea. I think we probably should have a rule that nothing except pglz is allowed inside of a record, just to keep it simple. The record overall can be toasted so it's not clear why we should also be toasting the original columns at all, but I think precedent probably argues for continuing to allow PGLZ, as it can already be like that on disk and pg_upgrade is a thing. The same kind of issue probably exists for arrays and range types. I poked around a bit trying to find ways of getting data compressed with one compression method into a column that was marked with another compression method, but wasn't able to break it. In CompareCompressionMethodAndDecompress, I think this is still playing a bit fast and loose with the rules around slots. I think we can do better. Suppose that at the point where we discover that we need to decompress at least one attribute, we create the new slot right then, and also memcpy tts_values and tts_isnull. Then, for that attribute and any future attributes that need decompression, we reset tts_values in the *new* slot, leaving the old one untouched. Then, after finishing all the attributes, the if (decompressed_any) block, you just have a lot less stuff to do. The advantage of this is that you haven't tainted the old slot; it's still got whatever contents it had before, and is in a clean state, which seems better to me. It's unclear to me whether this function actually needs to ExecMaterializeSlot(newslot). It definitely does need to ExecStoreVirtualTuple(newslot) and I think it's a very good idea, if not absolutely mandatory, for it not to modify anything about the old slot. But what's the argument that the new slot needs to be materialized at this point? It may be needed, if the old slot would've had to be materialized at this point. But it's something to think about. The CREATE TABLE documentation says that COMPRESSION is a kind of column constraint, but that's wrong. For example, you can't write CREATE TABLE a (b int4 CONSTRAINT thunk COMPRESSION lz4), for example, contrary to what the syntax summary implies. When you fix this so that the documentation matches the grammar change, you may also need to move the longer description further up in create_table.sgml so the order matches. The use of VARHDRSZ_COMPRESS in toast_get_compression_oid() appears to be incorrect. VARHDRSZ_COMPRESS is offsetof(varattrib_4b, va_compressed.va_data). But what gets externalized in the case of a compressed datum is just VARDATA(dval), which excludes the length word, unlike VARHDRSZ_COMPRESS, which does not. This has no consequences since we're only going to fetch 1 chunk either way, but I think we should make it correct. TOAST_COMPRESS_SET_SIZE_AND_METHOD() could Assert something about cm_method. 
Small delta patch with a few other suggested changes attached. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
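For readers trying to follow the slot discussion, the approach Robert sketches above might end up looking roughly like this. It is illustrative only: needs_rewrite() is a hypothetical stand-in for the patch's real test of whether a datum is compressed with the "wrong" method, and the spare slot is assumed to be a virtual slot created once with the same tuple descriptor and reused for every tuple.

/*
 * Sketch only: leave the original slot alone, copy values into a spare slot
 * the first time an attribute needs decompression, and return whichever slot
 * ends up holding the row.
 */
static TupleTableSlot *
decompress_into_spare_slot(TupleTableSlot *slot, TupleTableSlot *newslot)
{
	int			natts = slot->tts_tupleDescriptor->natts;
	bool		copied = false;

	slot_getallattrs(slot);		/* fill tts_values/tts_isnull once, up front */

	for (int i = 0; i < natts; i++)
	{
		struct varlena *val;

		if (slot->tts_isnull[i] ||
			TupleDescAttr(slot->tts_tupleDescriptor, i)->attlen != -1)
			continue;

		val = (struct varlena *) DatumGetPointer(slot->tts_values[i]);
		if (!needs_rewrite(val))	/* hypothetical helper, not the patch's name */
			continue;

		if (!copied)
		{
			/* first offending column: copy the arrays into the spare slot */
			ExecClearTuple(newslot);
			memcpy(newslot->tts_values, slot->tts_values, natts * sizeof(Datum));
			memcpy(newslot->tts_isnull, slot->tts_isnull, natts * sizeof(bool));
			copied = true;
		}

		/* overwrite only the copy; the original slot is left untouched */
		newslot->tts_values[i] = PointerGetDatum(detoast_attr(val));
	}

	if (!copied)
		return slot;			/* fast path: nothing needed decompression */

	ExecStoreVirtualTuple(newslot);
	return newslot;
}

Whichever slot comes back is the one the caller keeps using, which matches the "return either the original slot or a different one" pattern mentioned earlier in the thread; memory management of the detoasted copies is glossed over here.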
On Thu, Feb 11, 2021 at 3:26 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 3:06 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I think you have a fairly big problem with row types. Consider this example: > > > > create table t1 (a int, b text compression pglz); > > create table t2 (a int, b text compression lz4); > > create table t3 (x t1); > > insert into t1 values (1, repeat('foo', 1000)); > > insert into t2 values (1, repeat('foo', 1000)); > > insert into t3 select t1 from t1; > > insert into t3 select row(a, b)::t1 from t2; > > > > rhaas=# select pg_column_compression((t3.x).b) from t3; > > pg_column_compression > > ----------------------- > > pglz > > lz4 > > (2 rows) > > > > That's not good, because now Yeah, that's really bad. > ...because now I hit send too soon. Also, because now column b has > implicit dependencies on both compression AMs and the rest of the > system has no idea. I think we probably should have a rule that > nothing except pglz is allowed inside of a record, just to keep it > simple. The record overall can be toasted so it's not clear why we > should also be toasting the original columns at all, but I think > precedent probably argues for continuing to allow PGLZ, as it can > already be like that on disk and pg_upgrade is a thing. The same kind > of issue probably exists for arrays and range types. While constructing a row type from the tuple we flatten the external data I think that would be the place to decompress the data if they are not compressed with PGLZ. For array-type, we are already detoasting/decompressing the source attribute see "construct_md_array" so the array type doesn't have this problem. I haven't yet checked the range type. Based on my analysis it appeared that the different data types are getting constructed at different paths so maybe we should find some centralized place or we need to make some function call in all such places so that we can decompress the attribute if required before forming the tuple for the composite type. I have quickly hacked the code and after that, your test case is working fine. postgres[55841]=# select pg_column_compression((t3.x).b) from t3; pg_column_compression ----------------------- pglz (2 rows) -> now the attribute 'b' inside the second tuple is decompressed (because it was not compressed with PGLZ) so the compression method of b is NULL postgres[55841]=# select pg_column_compression((t3.x)) from t3; pg_column_compression ----------------------- pglz (2 rows) --> but the second row itself is compressed back using the local compression method of t3 W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even attempt to detoast if there is no external field in the tuple, in POC I have got rid of that check, but I think we might need to do better. Maybe we can add a flag in infomask to detect whether the tuple has any compressed data or not as we have for detecting the external data (HEAP_HASEXTERNAL). So I will do some more analysis in this area and try to come up with a clean solution. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even > attempt to detoast if there is no external field in the tuple, in POC > I have got rid of that check, but I think we might need to do better. > Maybe we can add a flag in infomask to detect whether the tuple has > any compressed data or not as we have for detecting the external data > (HEAP_HASEXTERNAL). No. This feature isn't close to being important enough to justify consuming an infomask bit. I don't really see why we need it anyway. If array construction already categorically detoasts, why can't we do the same thing here? Would it really cost that much? In what case? Having compressed values in a record we're going to store on disk actually seems like a pretty dumb idea. We might end up trying to recompress something parts of which have already been compressed. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Feb 11, 2021 at 8:17 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even > > attempt to detoast if there is no external field in the tuple, in POC > > I have got rid of that check, but I think we might need to do better. > > Maybe we can add a flag in infomask to detect whether the tuple has > > any compressed data or not as we have for detecting the external data > > (HEAP_HASEXTERNAL). > > No. This feature isn't close to being important enough to justify > consuming an infomask bit. Okay, > I don't really see why we need it anyway. If array construction > already categorically detoasts, why can't we do the same thing here? > Would it really cost that much? In what case? Having compressed values > in a record we're going to store on disk actually seems like a pretty > dumb idea. We might end up trying to recompress something parts of > which have already been compressed. > If we refer the comments atop function "toast_flatten_tuple_to_datum" --------------- * We have a general rule that Datums of container types (rows, arrays, * ranges, etc) must not contain any external TOAST pointers. Without * this rule, we'd have to look inside each Datum when preparing a tuple * for storage, which would be expensive and would fail to extend cleanly * to new sorts of container types. * * However, we don't want to say that tuples represented as HeapTuples * can't contain toasted fields, so instead this routine should be called * when such a HeapTuple is being converted into a Datum. * * While we're at it, we decompress any compressed fields too. This is not * necessary for correctness, but reflects an expectation that compression * will be more effective if applied to the whole tuple not individual * fields. We are not so concerned about that that we want to deconstruct * and reconstruct tuples just to get rid of compressed fields, however. * So callers typically won't call this unless they see that the tuple has * at least one external field. ---------------- It appears that the general rule we want to follow is that while creating the composite type we want to flatten any external pointer, but while doing that we also decompress any compressed field with the assumption that compressing the whole row/array will be a better idea instead of keeping them compressed individually. However, if there are no external toast pointers then we don't want to make an effort to just decompress the compressed data. Having said that I don't think this rule is followed throughout the code for example 1. "ExecEvalRow" is calling HeapTupleHeaderGetDatum only if there is any external field and which is calling "toast_flatten_tuple_to_datum" so this is following the rule. 2. "ExecEvalWholeRowVar" is calling "toast_build_flattened_tuple", but this is just flattening the external toast pointer but not doing anything to the compressed data. 3. "ExecEvalArrayExpr" is calling "construct_md_array", which will detoast the attribute if attlen is -1, so this will decompress any compressed data even though there is no external toast pointer. So in 1 we are following the rule but in 2 and 3 we are not. IMHO, for the composite data types we should make common a rule and we should follow that everywhere. 
IMHO, for the composite data types we should make a common rule and we should follow it everywhere. As you said, it will be good if we can always detoast any external/compressed data: that will help in getting better compression, and fetching the data will be faster because we can avoid multi-level detoasting/decompression. I will analyse this further and post a patch for the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Feb 15, 2021 at 1:58 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sun, Feb 14, 2021 at 12:49:40PM -0600, Justin Pryzby wrote: > > On Wed, Feb 10, 2021 at 04:56:17PM -0500, Robert Haas wrote: > > > Small delta patch with a few other suggested changes attached. > > > > Robert's fixup patch caused the CI to fail, since it 1) was called *.patch; > > and, 2) didn't include the previous patches. > > > > This includes a couple proposals of mine as separate patches. > > CIs failed on BSD and linux due to a test in contrib/, but others passed. > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.127551 > https://cirrus-ci.com/task/6087701947482112 > https://cirrus-ci.com/task/6650651900903424 > https://cirrus-ci.com/task/5524751994060800 > > Resending with fixes to configure.ac and missed autoconf run. I think this is > expected to fail on mac, due to missing LZ4. > > BTW, compressamapi.h doesn't need to be included in any of these, at least in > the 0001 patch: > > src/backend/access/common/indextuple.c | 2 +- > src/backend/catalog/heap.c | 2 +- > src/backend/catalog/index.c | 2 +- > src/backend/parser/parse_utilcmd.c | 2 +- > > It's pretty unfriendly that this requires quoting the integer to be > syntactically valid: > > |postgres=# create table j(q text compression pglz with (level 1) ); > |2021-01-30 01:26:33.554 CST [31814] ERROR: syntax error at or near "1" at character 52 > |2021-01-30 01:26:33.554 CST [31814] STATEMENT: create table j(q text compression pglz with (level 1) ); > |ERROR: syntax error at or near "1" > |LINE 1: create table j(q text compression pglz with (level 1) ); Thanks for the review and patch for HIDE_COMPRESSAM, I will merge this into the main patch. And work on other comments after fixing the issue related to compressed data in composite types. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sat, Feb 13, 2021 at 8:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Feb 11, 2021 at 8:17 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Feb 11, 2021 at 7:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > W.R.T the attached patch, In HeapTupleHeaderGetDatum, we don't even > > > attempt to detoast if there is no external field in the tuple, in POC > > > I have got rid of that check, but I think we might need to do better. > > > Maybe we can add a flag in infomask to detect whether the tuple has > > > any compressed data or not as we have for detecting the external data > > > (HEAP_HASEXTERNAL). > > > > No. This feature isn't close to being important enough to justify > > consuming an infomask bit. > > Okay, > > > I don't really see why we need it anyway. If array construction > > already categorically detoasts, why can't we do the same thing here? > > Would it really cost that much? In what case? Having compressed values > > in a record we're going to store on disk actually seems like a pretty > > dumb idea. We might end up trying to recompress something parts of > > which have already been compressed. > > > > If we refer the comments atop function "toast_flatten_tuple_to_datum" > > --------------- > * We have a general rule that Datums of container types (rows, arrays, > * ranges, etc) must not contain any external TOAST pointers. Without > * this rule, we'd have to look inside each Datum when preparing a tuple > * for storage, which would be expensive and would fail to extend cleanly > * to new sorts of container types. > * > * However, we don't want to say that tuples represented as HeapTuples > * can't contain toasted fields, so instead this routine should be called > * when such a HeapTuple is being converted into a Datum. > * > * While we're at it, we decompress any compressed fields too. This is not > * necessary for correctness, but reflects an expectation that compression > * will be more effective if applied to the whole tuple not individual > * fields. We are not so concerned about that that we want to deconstruct > * and reconstruct tuples just to get rid of compressed fields, however. > * So callers typically won't call this unless they see that the tuple has > * at least one external field. > ---------------- > > It appears that the general rule we want to follow is that while > creating the composite type we want to flatten any external pointer, > but while doing that we also decompress any compressed field with the > assumption that compressing the whole row/array will be a better idea > instead of keeping them compressed individually. However, if there > are no external toast pointers then we don't want to make an effort to > just decompress the compressed data. > > Having said that I don't think this rule is followed throughout the > code for example > > 1. "ExecEvalRow" is calling HeapTupleHeaderGetDatum only if there is > any external field and which is calling "toast_flatten_tuple_to_datum" > so this is following the rule. > 2. "ExecEvalWholeRowVar" is calling "toast_build_flattened_tuple", but > this is just flattening the external toast pointer but not doing > anything to the compressed data. > 3. "ExecEvalArrayExpr" is calling "construct_md_array", which will > detoast the attribute if attlen is -1, so this will decompress any > compressed data even though there is no external toast pointer. > > So in 1 we are following the rule but in 2 and 3 we are not. 
> > IMHO, for the composite data types we should make common a rule and we > should follow that everywhere. As you said it will be good if we can > always detoast any external/compressed data, that will help in getting > better compression as well as fetching the data will be faster because > we can avoid multi level detoasting/decompression. I will analyse > this further and post a patch for the same. I have done further analysis of this issue and came up with the attached patch. With this patch, just as with external toast pointers, we no longer allow any compressed data in composite types either. The problem is that now we will be processing the whole tuple while forming the composite type, irrespective of the source of the tuple. I mean, if the user is inserting directly into the array type rather than selecting from another table, then there will not be any compressed data, so checking each field of the tuple for compressed data is unnecessary, but I am not sure how to distinguish between those cases. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Fri, Feb 19, 2021 at 2:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote: I had an off list discussion with Robert and based on his suggestion and a poc patch, I have come up with an updated version for handling the composite type. Basically, the problem was that ExecEvalRow we are first forming the tuple and then we are calling HeapTupleHeaderGetDatum and then we again need to deform to find any compressed data so that can cause huge performance penalty in all unrelated paths which don't even contain any compressed data. So Robert's idea was to check for the compressed/external data even before forming the tuple. I have implemented that and I can see we are not seeing any performance penalty. Test setup: ---------------- create table t1 (f1 int, f2 text, f3 text, f4 text, f5 text, f6 text,f7 text, f8 text, f9 text); create table t2 (f1 int, f2 text, f3 text, f4 text, f5 text, f6 text,f7 text, f8 text, f9 text); create table t3(x t1); pgbench custom script for all test: ------------------------------------------------ \set x random(1, 10000) select row(f1,f2,f3,f4,f5,f6,f7,f8,f9)::t1 from t2 where f1=:x; test1: Objective: Just select on data and form row, data contain no compressed/external (should not create regression on unrelated paths) data: insert into t2 select i, repeat('f1', 10),repeat('f2',10),repeat('f3', 10),repeat('f4', 10),repeat('f5', 10),repeat('f6',10),repeat('f7', 10),repeat('f8', 10) from generate_series(1,10000) as i; Result(TPS): Head: 1509.79 Patch: 1509.67 test2: data contains 1 compressed filed no external data data: insert into t2 select i, repeat('f2', 10),repeat('f3',10000),repeat('f3', 10),repeat('f5', 10),repeat('f6', 4000),repeat('f7',10),repeat('f8', 10),repeat('f9', 10) from generate_series(1,10000) as i; Result(TPS): Head: 1088.08 Patch: 1071.48 test4: data contains 1 compressed/1 external field (alter table t2 alter COLUMN f2 set storage external;) data: (insert into t2 select i, repeat('f2', 10000),repeat('f3',10000),repeat('f3', 10),repeat('f5', 10),repeat('f6', 4000),repeat('f7',10),repeat('f8', 10),repeat('f9', 10) from generate_series(1,10000) as i;) Result(TPS): Head: 1459.28 Patch: 1459.37 test5: where head need not decompress but patch needs to: data: insert into t2 select i, repeat('f2', 10),repeat('f3',6000),repeat('f34', 5000),repeat('f5', 10),repeat('f6', 4000),repeat('f7',10),repeat('f8', 10),repeat('f9', 10) from generate_series(1,10000) as I; --pgbench script \set x random(1, 10000) insert into t3 select row(f1,f2,f3,f4,f5,f6,f7,f8,f9)::t1 from t2 where f1=:x; Result(TPS): Head: 562.36 Patch: 469.91 Summary: It seems like in most of the unrelated cases we are not creating any regression with the attached patch. There is only some performance loss when there is only the compressed data in such cases with the patch we have to decompress whereas in head we don't. But, I think it is not a overall loss because eventually if we have to fetch the data multiple time then with patch we just have to decompress once as whole row is compressed whereas on head we have to decompress field by field, so I don't think this can be considered as a regression. I also had to put the handling in the extended record so that it can decompress any compressed data in the extended record. I think I need to put some more effort into cleaning up this code. I have put a very localized fix in ER_get_flat_size, basically this will ignore the ER_FLAG_HAVE_EXTERNAL flag and it will always process the record. 
I think the handling might not be perfect, but I posted it to get feedback on the idea.

Other changes:
- I have fixed the other pending comments from Robert. I will reply to the individual comments in a separate mail.
- Merged HIDE_COMPRESSAM into 0001.

Pending work:
- Clean up 0001, especially for extended records.
- Rebase the other patches.
- Review the default compression method GUC from Justin.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
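To make the pre-check idea above concrete, a header-only scan of roughly this shape would be enough to decide whether the expensive path is needed at all before heap_form_tuple() is called. This is a sketch, not the code from the attached patch; in the real thing the check presumably also has to look at which compression method was used, which is still header-only work.

/*
 * Sketch only: return true if any field of the row-to-be would need
 * flattening (external or compressed varlena).  Only varlena headers are
 * inspected, so the common "plain values" case stays cheap.
 */
static bool
row_needs_flattening(TupleDesc tupdesc, Datum *values, bool *isnull)
{
	for (int i = 0; i < tupdesc->natts; i++)
	{
		struct varlena *val;

		if (isnull[i] || TupleDescAttr(tupdesc, i)->attlen != -1)
			continue;

		val = (struct varlena *) DatumGetPointer(values[i]);
		if (VARATT_IS_EXTERNAL(val) || VARATT_IS_COMPRESSED(val))
			return true;
	}

	return false;
}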
On Thu, Feb 11, 2021 at 1:37 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Feb 10, 2021 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > [ new patches ] > > I think that in both varattrib_4b and toast_internals.h it would be > better to pick a less generic field name. In toast_internals.h it's > just info; in postgres.h it's va_info. But: > > [rhaas pgsql]$ git grep info | wc -l > 24552 > > There are no references in the current source tree to va_info, so at > least that one is greppable, but it's still not very descriptive. I > suggest info -> tcinfo and va_info -> va_tcinfo, where "tc" stands for > "TOAST compression". Looking through 24552 references to info to find > the ones that pertain to this feature might take longer than searching > the somewhat shorter list of references to tcinfo, which prepatch is > just: > > [rhaas pgsql]$ git grep tcinfo | wc -l > 0 Done as suggested > > I don't see why we should allow for datum_decompress to be optional, > as toast_decompress_datum_slice does. Likely every serious compression > method will support that anyway. If not, the compression AM can deal > with the problem, rather than having the core code do it. That will > save some tiny amount of performance, too. Done > src/backend/access/compression/Makefile is missing a copyright header. Fixed > It's really sad that lz4_cmdecompress_slice allocates > VARRAWSIZE_4B_C(value) + VARHDRSZ rather than slicelength + VARHDRSZ > as pglz_cmdecompress_slice() does. Is that a mistake, or is that > necessary for some reason? If it's a mistake, let's fix it. If it's > necessary, let's add a comment about why, probably starting with > "Unfortunately, ....". In older versions of the lz4 there was a problem that the decompressed data size could be bigger than the slicelength which is resolved now so we can allocate slicelength + VARHDRSZ, I have fixed it. Please refer the latest patch at https://www.postgresql.org/message-id/CAFiTN-u2pyXDDDwZXJ-fVUwbLhJSe9TbrVR6rfW_rhdyL1A5bg%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Feb 11, 2021 at 3:26 AM Robert Haas <robertmhaas@gmail.com> wrote: > > In CompareCompressionMethodAndDecompress, I think this is still > playing a bit fast and loose with the rules around slots. I think we > can do better. Suppose that at the point where we discover that we > need to decompress at least one attribute, we create the new slot > right then, and also memcpy tts_values and tts_isnull. Then, for that > attribute and any future attributes that need decompression, we reset > tts_values in the *new* slot, leaving the old one untouched. Then, > after finishing all the attributes, the if (decompressed_any) block, > you just have a lot less stuff to do. The advantage of this is that > you haven't tainted the old slot; it's still got whatever contents it > had before, and is in a clean state, which seems better to me. Fixed > > It's unclear to me whether this function actually needs to > ExecMaterializeSlot(newslot). It definitely does need to > ExecStoreVirtualTuple(newslot) and I think it's a very good idea, if > not absolutely mandatory, for it not to modify anything about the old > slot. But what's the argument that the new slot needs to be > materialized at this point? It may be needed, if the old slot would've > had to be materialized at this point. But it's something to think > about. I think if the original slot was materialized then materialing the new slot make more sense to me so done that way. > > The CREATE TABLE documentation says that COMPRESSION is a kind of > column constraint, but that's wrong. For example, you can't write > CREATE TABLE a (b int4 CONSTRAINT thunk COMPRESSION lz4), for example, > contrary to what the syntax summary implies. When you fix this so that > the documentation matches the grammar change, you may also need to > move the longer description further up in create_table.sgml so the > order matches. Fixed > The use of VARHDRSZ_COMPRESS in toast_get_compression_oid() appears to > be incorrect. VARHDRSZ_COMPRESS is offsetof(varattrib_4b, > va_compressed.va_data). But what gets externalized in the case of a > compressed datum is just VARDATA(dval), which excludes the length > word, unlike VARHDRSZ_COMPRESS, which does not. This has no > consequences since we're only going to fetch 1 chunk either way, but I > think we should make it correct. Fixed > TOAST_COMPRESS_SET_SIZE_AND_METHOD() could Assert something about cm_method. While replying to the comments, I realised that I have missed it. I will fix it in the next version. > Small delta patch with a few other suggested changes attached. Merged -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Hi, On 2021-03-15 15:29:05 -0400, Robert Haas wrote: > On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > In the attached patches I have changed this, ... > > OK, so just looking over this patch series, here's what I think: > > - 0001 and 0002 are now somewhat independent of the rest of this work, > and could be dropped, but I think they're a good idea, so I'd like to > commit them. I went over 0001 carefully this morning and didn't find > any problems. I still need to do some more review of 0002. I don't particularly like PG_RETURN_HEAPTUPLEHEADER_RAW(). What is "raw" about it? It also seems to me like there needs to at least be a sentence or two explaining when to use which of the functions. I think heap_copy_tuple_as_raw_datum() should grow an assert checking there are no external columns? The commit messages could use a bit more explanation about motivation. I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm() contain a nearly identical copy of the same code. And make_tuple_from_row() also is similar. It seem that there should be a heap_form_tuple() version doing this for us? > - 0003 through 0005 are the core of this patch set. I'd like to get > them into this release, but I think we're likely to run out of time. Comments about 0003: - why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be comparable to HIDE_TABLEAM? - (you comment on this later): toast_get_compression_method() needing to fetch some of the data to figure out the compression method is pretty painful. Especially because it then goes and throws away that data! - Adding all these indirect function calls via toast_compression[] just for all of two builtin methods isn't fun either. - I guess NO_LZ4_SUPPORT() is a macro so it shows the proper file/function name? - I wonder if adding compression to the equalTupleDesc() is really necessary / won't cause problems (thinking of cases like the equalTupleDesc() call in pg_proc.c). - Is nodeModifyTable.c really the right place for the logic around CompareCompressionMethodAndDecompress()? And is doing it in every place that does "user initiated" inserts really the right way? Why isn't this done on the tuptoasting level? - CompareCompressionMethodAndDecompress() is pretty deeply indented. Perhaps rewrite a few more of the conditions to be continue;? Comments about 0005: - I'm personally not really convinced tracking the compression type in pg_attribute the way you do is really worth it (. Especially given that it's right now only about new rows anyway. Seems like it'd be easier to just treat it as a default for new rows, and dispense with all the logic around mismatching compression types etc? > The biggest thing that jumps out at me while looking at this with > fresh eyes is that the patch doesn't touch varatt_external.va_extsize > at all. In a varatt_external, we can't use the va_rawsize to indicate > the compression method, because there are no bits free, because the 2 > bits not required to store the size are used to indicate what type of > varlena we've got. Once you get to varatt_external, you could also just encode it via vartag_external... > But, that means that the size of a varlena is limited to 1GB, so there > are 2 bits free in varatt_external.va_extsize, just like there are in > va_compressed.va_rawsize. We could store the same two bits in > varatt_external.va_extsize that we're storing in > va_compressed.va_rawsize aka va_tcinfo. 
That's a big deal, because > then toast_get_compression_method() doesn't have to call > toast_fetch_datum_slice() any more, which is a rather large savings. > If it's only impacting pg_column_compression() then whatever, but > that's not the case: we've got calls to > CompareCompressionMethodAndDecompress in places like intorel_receive() > and ExecModifyTable() that look pretty performance-critical. Yea, I agree, that does seem problematic. > There's another, rather brute-force approach to this problem, too. We > could just decide that lz4 will only be used for external data, and > that there's no such thing as an inline-compressed lz4 varlena. > deotast_fetch_datum() would just notice that the value is lz4'd and > de-lz4 it before returning it, since a compressed lz4 datum is > impossible. That seems fairly terrible. > I'm open to being convinced that we don't need to do either of these > things, and that the cost of iterating over all varlenas in the tuple > is not so bad as to preclude doing things as you have them here. But, > I'm afraid it's going to be too expensive. I mean, I would just define several of those places away by not caring about tuples in a different compressino formation ending up in a table... Greetings, Andres Freund
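Concretely, the bit layout being discussed could be spelled as macros like the following. The names are invented here for illustration, the field is assumed to keep its current name va_extsize, and the cast to uint32 is only there so the two high bits behave sanely.

/*
 * Sketch only: varlena values are capped at 1GB, so the top two bits of the
 * 32-bit external size are free and could carry the compression method,
 * mirroring what va_tcinfo does for inline-compressed values.
 */
#define VARLENA_EXTSIZE_BITS	30
#define VARLENA_EXTSIZE_MASK	((1U << VARLENA_EXTSIZE_BITS) - 1)

/* external size without the method bits */
#define VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) \
	(((uint32) (toast_pointer).va_extsize) & VARLENA_EXTSIZE_MASK)

/* compression method stored in the two spare bits */
#define VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) \
	(((uint32) (toast_pointer).va_extsize) >> VARLENA_EXTSIZE_BITS)

/* set both at once when the toast pointer is built */
#define VARATT_EXTERNAL_SET_SIZE_AND_COMPRESS_METHOD(toast_pointer, len, cm) \
	((toast_pointer).va_extsize = \
		(int32) ((uint32) (len) | ((uint32) (cm) << VARLENA_EXTSIZE_BITS)))

With something along these lines, toast_get_compression_method() could read the method straight out of the toast pointer and would not need toast_fetch_datum_slice() at all for external values, which is the savings being discussed above.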
On Wed, Mar 17, 2021 at 7:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > 0002: > - Wrapper over heap_form_tuple and used in ExecEvalRow() and > ExecEvalFieldStoreForm() Instead of having heap_form_flattened_tuple(), how about heap_flatten_values(tupleDesc, values, isnull) that is documented to modify the values array? Then instead of replacing the heap_form_tuple() calls with a call to heap_form_flattened_tuple(), you just insert a call to heap_flatten_values() before the call to heap_form_tuple(). I think that might be easier for people looking at this code in the future to understand what's happening. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2021-03-17 13:31:14 -0400, Robert Haas wrote: > On Wed, Mar 17, 2021 at 7:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > 0002: > > - Wrapper over heap_form_tuple and used in ExecEvalRow() and > > ExecEvalFieldStoreForm() > > Instead of having heap_form_flattened_tuple(), how about > heap_flatten_values(tupleDesc, values, isnull) that is documented to > modify the values array? Then instead of replacing the > heap_form_tuple() calls with a call to heap_form_flattened_tuple(), > you just insert a call to heap_flatten_values() before the call to > heap_form_tuple(). I think that might be easier for people looking at > this code in the future to understand what's happening. OTOH heap_form_flattened_tuple() has the advantage that we can optimize it further (e.g. to do the conversion to flattened values in fill_val()) without changing the outside API. Greetings, Andres Freund
On Wed, Mar 17, 2021 at 2:17 PM Andres Freund <andres@anarazel.de> wrote: > OTOH heap_form_flattened_tuple() has the advantage that we can optimize > it further (e.g. to do the conversion to flattened values in fill_val()) > without changing the outside API. Well, in my view, that does change the outside API, because either the input values[] array is going to get scribbled on, or it's not. We should either decide we're not OK with it and just do the fill_val() thing now, or we should decide that we are and not worry about doing the fill_val() thing later. IMHO, anyway. -- Robert Haas EDB: http://www.enterprisedb.com
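Either way the call sites stay tiny. A sketch of the wrapper variant, with heap_flatten_values() as the in-place helper proposed above; both names come from this exchange, not from committed code.

/* the proposed helper: rewrites the caller's values[] array in place */
extern void heap_flatten_values(TupleDesc tupleDesc, Datum *values, bool *isnull);

/*
 * Sketch only: same arguments and behaviour as heap_form_tuple(), except
 * that external or compressed fields are replaced by flat copies first.
 */
HeapTuple
heap_form_flattened_tuple(TupleDesc tupleDescriptor, Datum *values, bool *isnull)
{
	heap_flatten_values(tupleDescriptor, values, isnull);
	return heap_form_tuple(tupleDescriptor, values, isnull);
}

Callers such as ExecEvalRow() would then call heap_form_flattened_tuple() instead of heap_form_tuple(), and the flattening could later be pushed down into fill_val() without touching them, which is the optimization argument above, at the cost of the scribbling-on-values[] question Robert raises.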
).On Mon, Mar 15, 2021 at 6:58 PM Andres Freund <andres@anarazel.de> wrote: > - Adding all these indirect function calls via toast_compression[] just > for all of two builtin methods isn't fun either. Yeah, it feels like this has too many layers of indirection now. Like, toast_decompress_datum() first gets TOAST_COMPRESS_METHOD(attr). Then it calls CompressionIdToMethod to convert one constant (like TOAST_PGLZ_COMPRESSION_ID) to another constant with a slightly different name (like TOAST_PGLZ_COMPRESSION). Then it calls GetCompressionRoutines() to get hold of the function pointers. Then it does an indirect functional call. That seemed like a pretty reasonable idea when we were trying to support arbitrary compression AMs without overly privileging the stuff that was built into core, but if we're just doing stuff that's built into core, then we could just switch (TOAST_COMPRESS_METHOD(attr)) and call the correct function. In fact, we could even move the stuff from toast_compression.c into detoast.c, which would allow the compiler to optimize better (e.g. by inlining, if it wants). The same applies to toast_decompress_datum_slice(). There's a similar issue in toast_get_compression_method() and the only caller, pg_column_compression(). Here the multiple mapping layers and the indirect function call are split across those two functions rather than all in the same one, but here again one could presumably find a place to just switch on TOAST_COMPRESS_METHOD(attr) or VARATT_EXTERNAL_GET_COMPRESSION(attr) and return "pglz" or "lz4" directly. In toast_compress_datum(), I think we could have a switch that invokes the appropriate compressor based on cmethod and sets a variable to the value to be passed as the final argument of TOAST_COMPRESS_SET_SIZE_AND_METHOD(). Likewise, I suppose CompressionNameToMethod could at least be simplified to use constant strings rather than stuff like toast_compression[TOAST_PGLZ_COMPRESSION_ID].cmname. > - why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be > comparable to HIDE_TABLEAM? Andres, what do you mean by this exactly? It's exactly the same issue: without this, if you change the default compression method, every test that uses \d+ breaks. If you want to be able to run the whole test suite with either compression method and get the same results, you need this. Now, maybe you don't, because perhaps that doesn't seem so important with compression methods as with table AMs. I think that's a defensible position. But, it is at the underlying level, the same thing. -- Robert Haas EDB: http://www.enterprisedb.com
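Spelled out, the direct dispatch described above is short. The constant and function names below follow the patch as discussed in this thread (pglz_cmdecompress()/lz4_cmdecompress() at this point) and are not necessarily what will be committed.

/*
 * Sketch only: decompress an inline-compressed varlena with a plain switch,
 * no handler lookup and no CompressionIdToMethod mapping.
 */
static struct varlena *
toast_decompress_datum(struct varlena *attr)
{
	ToastCompressionId cmid;

	Assert(VARATT_IS_COMPRESSED(attr));

	cmid = (ToastCompressionId) TOAST_COMPRESS_METHOD(attr);
	switch (cmid)
	{
		case TOAST_PGLZ_COMPRESSION_ID:
			return pglz_cmdecompress(attr);
		case TOAST_LZ4_COMPRESSION_ID:
			return lz4_cmdecompress(attr);
		default:
			elog(ERROR, "invalid compression method id %d", (int) cmid);
			return NULL;		/* keep compiler quiet */
	}
}

toast_decompress_datum_slice() would get the same treatment, and keeping the two built-in methods as compile-time cases is what lets the compiler inline if it wants to.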
Hi, On 2021-03-17 16:01:58 -0400, Robert Haas wrote: > > - why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be > > comparable to HIDE_TABLEAM? > > Andres, what do you mean by this exactly? It's exactly the same issue: > without this, if you change the default compression method, every test > that uses \d+ breaks. If you want to be able to run the whole test > suite with either compression method and get the same results, you > need this. Now, maybe you don't, because perhaps that doesn't seem so > important with compression methods as with table AMs. I think that latter part is why I wasn't sure such an option is warranted. Given that it's a built-in feature, I didn't really foresee a need to be able to run all the tests with a different compression method. And it looked like it could just have been copied from the tableam logic, without a clear need. But if it's useful, then ... Greetings, Andres Freund
On Thu, Mar 18, 2021 at 1:32 AM Robert Haas <robertmhaas@gmail.com> wrote: > > ).On Mon, Mar 15, 2021 at 6:58 PM Andres Freund <andres@anarazel.de> wrote: > > - Adding all these indirect function calls via toast_compression[] just > > for all of two builtin methods isn't fun either. > > Yeah, it feels like this has too many layers of indirection now. Like, > toast_decompress_datum() first gets TOAST_COMPRESS_METHOD(attr). Then > it calls CompressionIdToMethod to convert one constant (like > TOAST_PGLZ_COMPRESSION_ID) to another constant with a slightly > different name (like TOAST_PGLZ_COMPRESSION). Then it calls > GetCompressionRoutines() to get hold of the function pointers. Then it > does an indirect functional call. That seemed like a pretty reasonable > idea when we were trying to support arbitrary compression AMs without > overly privileging the stuff that was built into core, but if we're > just doing stuff that's built into core, then we could just switch > (TOAST_COMPRESS_METHOD(attr)) and call the correct function. In fact, > we could even move the stuff from toast_compression.c into detoast.c, > which would allow the compiler to optimize better (e.g. by inlining, > if it wants). > > The same applies to toast_decompress_datum_slice(). Changed this, but I have still kept the functions in toast_compression.c. I think keeping compression-related functionality in a separate file looks much cleaner. Please have a look and let me know whether you still feel we should move it to detoast.c. If the reason is that we can inline, then I feel we are already paying the cost of compression/decompression, and compared to that, inlining a function will not make much difference. > There's a similar issue in toast_get_compression_method() and the only > caller, pg_column_compression(). Here the multiple mapping layers and > the indirect function call are split across those two functions rather > than all in the same one, but here again one could presumably find a > place to just switch on TOAST_COMPRESS_METHOD(attr) or > VARATT_EXTERNAL_GET_COMPRESSION(attr) and return "pglz" or "lz4" > directly. I have simplified that; now there is only one level of function call from pg_column_compression. I have kept a toast_get_compression_id function because in the later patch 0005 we will be using it for getting the compression id from the compressed data. > In toast_compress_datum(), I think we could have a switch that invokes > the appropriate compressor based on cmethod and sets a variable to the > value to be passed as the final argument of > TOAST_COMPRESS_SET_SIZE_AND_METHOD(). Done > Likewise, I suppose CompressionNameToMethod could at least be > simplified to use constant strings rather than stuff like > toast_compression[TOAST_PGLZ_COMPRESSION_ID].cmname. Done

Other changes:
- As suggested by Andres, removed the compression method comparison from equalTupleDesc, because it is not required now.
- I found one problem in the existing patch, in detoast_attr_slice: if externally stored data is compressed then we compute the maximum possible compressed size to fetch based on the slice length, and for that we were using pglz_maximum_compressed_size, which is not correct for lz4. For lz4, I think we need to fetch the complete compressed data. We might think that for lz4 we could compute something like Min(LZ4_compressBound(slicelength), total_compressed_size), but IMHO we can not do that, and the reason is the same as why we should not use PGLZ_MAX_OUTPUT for pglz (explained in the comment atop pglz_maximum_compressed_size).
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
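The slice-size decision described above might be captured by something like this inside detoast_attr_slice(); cmid and total_compressed_size are assumed to be available from the toast pointer, and the helper name is made up for illustration.

/*
 * Sketch only: how many compressed bytes must be fetched to decompress a
 * prefix of length slicelength?
 */
static int32
compressed_bytes_needed_for_slice(ToastCompressionId cmid,
								  int32 slicelength,
								  int32 total_compressed_size)
{
	if (cmid == TOAST_PGLZ_COMPRESSION_ID)
		/* pglz gives us a safe upper bound for a prefix of the raw data */
		return pglz_maximum_compressed_size(slicelength, total_compressed_size);

	/* lz4: no equivalent bound is safe, so fetch the whole compressed value */
	return total_compressed_size;
}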
On Thu, Mar 18, 2021 at 4:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: I just realized that in the last patch (0003) I forgot to remove 2 unused functions, CompressionMethodToId and CompressionIdToMethod. Removed in the latest patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Mar 18, 2021 at 10:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I just realized that in the last patch (0003) I forgot to remove 2 > unused functions, CompressionMethodToId and CompressionIdToMethod. > Removed in the latest patch. I spent a little time polishing 0001 and here's what I came up with. I adjusted some comments, added documentation, fixed up the commit message, etc. I still don't quite like the approach in 0002. I feel that the function should not construct the tuple but modify the caller's arrays as a side effect. And if we're absolutely committed to the design where it does that, the comments need to call it out clearly, which they don't. Regarding 0003: I think it might make sense to change the names of the compression and decompression functions to match the names of the callers more closely. Like, toast_decompress_datum() calls either pglz_cmdecompress() or lz4_cmdecompress(). But, why not pglz_decompress_datum() or lz4_decompress_datum()? The "cm" thing doesn't really mean anything, and because the varlena is allocated by that function itself rather than the caller, this can't be used for anything other than TOAST. In toast_compress_datum(), if (tmp == NULL) return PointerGetDatum(NULL) is duplicated. It would be better to move it after the switch. Instead of "could not compress data with lz4" I suggest "lz4 compression failed". In catalogs.sgml, you shouldn't mention InvalidCompressionMethod, but you should explain what the actual possible values mean. Look at the way attidentity and attgenerated are documented and do it like that. In pg_column_compression() it might be a bit more elegant to add a char *result variable or similar, and have the switch cases just set it, and then do PG_RETURN_TEXT_P(cstring_to_text(result)) at the bottom. In getTableAttrs(), if the remoteVersion is new, the column gets a different alias than if the column is old. In dumpTableSchema(), the condition tbinfo->attcompression[j] means exactly the thing as the condition tbinfo->attcompression[j] != '\0', so it can't be right to test both. I think that there's some confusion here about the data type of tbinfo->attcompression[j]. It seems to be char *. Maybe you intended to test the first character in that second test, but that's not what this does. But you don't need to test that anyway because the switch already takes care of it. So I suggest (a) removing tbinfo->attcompression[j] != '\0' from this if-statement and (b) adding != NULL to the previous line for clarity. I would also suggest concluding the switch with a break just for symmetry. The patch removes 11 references to va_extsize and leaves behind 4. None of those 4 look like things that should have been left. The comment which says "When fetching a prefix of a compressed external datum, account for the rawsize tracking amount of raw data, which is stored at the beginning as an int32 value)" is no longer 100% accurate. I suggest changing it to say something like "When fetching a prefix of a compressed external datum, account for the space required by va_tcinfo" and leave out the rest. In describeOneTableDetails, the comment "compresssion info" needs to be compressed by removing one "s". It seems a little unfortunate that we need to include access/toast_compression.h in detoast.h. It seems like the reason we need to do that is because otherwise we won't have ToastCompressionId defined and so we won't be able to prototype toast_get_compression_id. 
But I think we should solve that problem by moving that function to toast_compression.c. (I'm OK if you want to keep the files separate, or if you want to reverse course and combine them I'm OK with that too, but the extra header dependency is clearly a sign of a problem with the split.) Regarding 0005: I think ApplyChangesToIndexes() should be renamed to something like SetIndexStorageProperties(). It's too generic right now. I think 0004 and 0005 should just be merged into 0003. I can't see committing them separately. I know I was the one who made you split the patch up in the first place, but those patches are quite small and simple now, so it makes more sense to me to combine them. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
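For readers of the archive, a minimal sketch of the pg_column_compression() shape suggested in the review above: a single result variable set by the switch, with one return at the bottom. Only toast_get_compression_id() and the "pglz"/"lz4" labels come from the discussion; the ToastCompressionId member names, the header layout, and the argument handling are assumptions for illustration, not the committed code.

```c
#include "postgres.h"

#include "access/toast_compression.h"	/* toast_get_compression_id(), per the split discussed above */
#include "fmgr.h"
#include "utils/builtins.h"				/* cstring_to_text() */

Datum
pg_column_compression(PG_FUNCTION_ARGS)
{
	/* NULL handling and non-varlena arguments are elided in this sketch. */
	struct varlena *attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0));
	char	   *result;

	switch (toast_get_compression_id(attr))
	{
		case TOAST_PGLZ_COMPRESSION_ID:		/* assumed enum member name */
			result = "pglz";
			break;
		case TOAST_LZ4_COMPRESSION_ID:		/* assumed enum member name */
			result = "lz4";
			break;
		default:
			PG_RETURN_NULL();	/* datum is not compressed */
	}

	PG_RETURN_TEXT_P(cstring_to_text(result));
}
```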
On Fri, Mar 19, 2021 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 18, 2021 at 10:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I just realized that in the last patch (0003) I forgot to remove 2 > > unused functions, CompressionMethodToId and CompressionIdToMethod. > > Removed in the latest patch. > > I spent a little time polishing 0001 and here's what I came up with. I > adjusted some comments, added documentation, fixed up the commit > message, etc. Thanks, the changes look fine to me. > > I still don't quite like the approach in 0002. I feel that the > function should not construct the tuple but modify the caller's arrays > as a side effect. And if we're absolutely committed to the design > where it does that, the comments need to call it out clearly, which > they don't. Added comment for the same. > Regarding 0003: > > I think it might make sense to change the names of the compression and > decompression functions to match the names of the callers more > closely. Like, toast_decompress_datum() calls either > pglz_cmdecompress() or lz4_cmdecompress(). But, why not > pglz_decompress_datum() or lz4_decompress_datum()? The "cm" thing > doesn't really mean anything, and because the varlena is allocated by > that function itself rather than the caller, this can't be used for > anything other than TOAST. Done > In toast_compress_datum(), if (tmp == NULL) return > PointerGetDatum(NULL) is duplicated. It would be better to move it > after the switch. Done > Instead of "could not compress data with lz4" I suggest "lz4 > compression failed". Done > In catalogs.sgml, you shouldn't mention InvalidCompressionMethod, but > you should explain what the actual possible values mean. Look at the > way attidentity and attgenerated are documented and do it like that. Done > In pg_column_compression() it might be a bit more elegant to add a > char *result variable or similar, and have the switch cases just set > it, and then do PG_RETURN_TEXT_P(cstring_to_text(result)) at the > bottom. Done > In getTableAttrs(), if the remoteVersion is new, the column gets a > different alias than if it is old. Fixed > In dumpTableSchema(), the condition tbinfo->attcompression[j] means > exactly the same thing as the condition tbinfo->attcompression[j] != '\0', > so it can't be right to test both. I think that there's some confusion > here about the data type of tbinfo->attcompression[j]. It seems to be > char *. Maybe you intended to test the first character in that second > test, but that's not what this does. But you don't need to test that > anyway because the switch already takes care of it. So I suggest (a) > removing tbinfo->attcompression[j] != '\0' from this if-statement and > (b) adding != NULL to the previous line for clarity. I would also > suggest concluding the switch with a break just for symmetry. Fixed > The patch removes 11 references to va_extsize and leaves behind 4. > None of those 4 look like things that should have been left. Fixed > The comment which says "When fetching a prefix of a compressed > external datum, account for the rawsize tracking amount of raw data, > which is stored at the beginning as an int32 value)" is no longer 100% > accurate. I suggest changing it to say something like "When fetching a > prefix of a compressed external datum, account for the space required > by va_tcinfo" and leave out the rest. Done > In describeOneTableDetails, the comment "compresssion info" needs to > be compressed by removing one "s".
Done > It seems a little unfortunate that we need to include > access/toast_compression.h in detoast.h. It seems like the reason we > need to do that is because otherwise we won't have ToastCompressionId > defined and so we won't be able to prototype toast_get_compression_id. > But I think we should solve that problem by moving that function to > toast_compression.c. (I'm OK if you want to keep the files separate, > or if you want to reverse course and combine them I'm OK with that > too, but the extra header dependency is clearly a sign of a problem > with the split.) Moved to toast_compression.c > Regarding 0005: > > I think ApplyChangesToIndexes() should be renamed to something like > SetIndexStorageProperties(). It's too generic right now. Done > I think 0004 and 0005 should just be merged into 0003. I can't see > committing them separately. I know I was the one who made you split > the patch up in the first place, but those patches are quite small and > simple now, so it makes more sense to me to combine them. Done Also added a test case for VACUUM FULL to recompress the data. One question: as with STORAGE, should we apply the ALTER ... SET COMPRESSION changes recursively to inherited children? (I have attached a separate patch for this.) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
I sent offlist a couple of times but notice that the latest patch is missing this bit around AC_CHECK_HEADERS, which apparently can sometimes cause warnings on mac. ac_save_CPPFLAGS=$CPPFLAGS CPPFLAGS="$LZ4_CFLAGS $CPPFLAGS" AC_CHECK_HEADERS(lz4/lz4.h, [], [AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])]) CPPFLAGS=$ac_save_CPPFLAGS > diff --git a/configure.ac b/configure.ac > index 2f1585a..54efbb2 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -1410,6 +1425,11 @@ failure. It is possible the compiler isn't looking in the proper directory. > Use --without-zlib to disable zlib support.])]) > fi > > +if test "$with_lz4" = yes; then > + AC_CHECK_HEADERS(lz4/lz4.h, [], > + [AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])]) > +fi > + > if test "$with_gssapi" = yes ; then > AC_CHECK_HEADERS(gssapi/gssapi.h, [], > [AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
On Fri, Mar 19, 2021 at 12:35 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I sent offlist a couple of times but notice that the latest patch is missing > this bit around AC_CHECK_HEADERS, which apparently can sometimes cause > warnings on mac. > > ac_save_CPPFLAGS=$CPPFLAGS > CPPFLAGS="$LZ4_CFLAGS $CPPFLAGS" > AC_CHECK_HEADERS(lz4/lz4.h, [], > [AC_CHECK_HEADERS(lz4.h, [], [AC_MSG_ERROR([lz4.h header file is required for LZ4])])]) > CPPFLAGS=$ac_save_CPPFLAGS Hmm, it's working for me on macOS Catalina without this. Why do we need it? Can you provide a patch that inserts it in the exact place you think it needs to go? -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Mar 19, 2021 at 1:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Working with one of Andrey's patches on another thread, he reported offlist > getting this message, resolved by this patch. Do you see this warning during > ./configure ? The latest CI is of a single patch without the LZ4 stuff, so I > can't check its log. > > configure: WARNING: lz4.h: accepted by the compiler, rejected by the preprocessor! > configure: WARNING: lz4.h: proceeding with the compiler's result No, I don't see this. I wonder whether this could possibly be an installation issue on Andrey's machine? If not, it must be version-dependent or installation-dependent in some way. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Mar 19, 2021 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > Also added a test case for vacuum full to recompress the data. I committed the core patch (0003) with a bit more editing. Let's see what the buildfarm thinks. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I committed the core patch (0003) with a bit more editing. Let's see > what the buildfarm thinks. Since no animals will be using --with-lz4, I'd expect vast silence. regards, tom lane
I wrote: > Since no animals will be using --with-lz4, I'd expect vast silence. Nope ... crake's displeased with your assumption that it's OK to clutter dumps with COMPRESSION clauses. As am I: that is going to be utterly fatal for cross-version transportation of dumps. regards, tom lane
Hmm, if I use configure --with-lz4, I get this: checking whether to build with LZ4 support... yes checking for liblz4... no configure: error: Package requirements (liblz4) were not met: No package 'liblz4' found Consider adjusting the PKG_CONFIG_PATH environment variable if you installed software in a non-standard prefix. Alternatively, you may set the environment variables LZ4_CFLAGS and LZ4_LIBS to avoid the need to call pkg-config. See the pkg-config man page for more details. running CONFIG_SHELL=/bin/bash /bin/bash /pgsql/source/master/configure --enable-debug --enable-depend --enable-cassert --enable-nls --cache-file=/home/alvherre/run/pgconfig.master.cache --enable-thread-safety --with-python --with-perl --with-tcl --with-openssl --with-libxml --enable-tap-tests --with-tclconfig=/usr/lib/tcl8.6 PYTHON=/usr/bin/python3 --with-llvm --prefix=/pgsql/install/master --with-pgport=55432 --no-create --no-recursion ... I find this behavior confusing; I'd rather have configure error out if it can't find the package support I requested, than continuing with a set of configure options different from what I gave. -- Álvaro Herrera 39°49'30"S 73°17'W "Postgres is bloatware by design: it was built to house PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)
On Fri, Mar 19, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nope ... crake's displeased with your assumption that it's OK to > clutter dumps with COMPRESSION clauses. As am I: that is going to > be utterly fatal for cross-version transportation of dumps. Yes, and prion's got this concerning diff: Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- - f1 | integer | | | | plain | | | + f1 | integer | | | | plain | pglz | | Since the column is not a varlena, it shouldn't have a compression method configured, yet on that machine it does, possibly because that machine uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE. Regarding your point, that does look like clutter. We don't annotate the dump with a storage clause unless it's non-default, so probably we should do the same thing here. I think I gave Dilip bad advice here... -- Robert Haas EDB: http://www.enterprisedb.com
On 2021-03-19 17:35:58 -0300, Alvaro Herrera wrote: > I find this behavior confusing; I'd rather have configure error out if > it can't find the package support I requested, than continuing with a > set of configure options different from what I gave. +1
On 3/19/21 9:40 PM, Andres Freund wrote: > On 2021-03-19 17:35:58 -0300, Alvaro Herrera wrote: >> I find this behavior confusing; I'd rather have configure error out if >> it can't find the package support I requested, than continuing with a >> set of configure options different from what I gave. > > +1 > Yeah. And why does it even require pkg-config, unlike any other library that I'm aware of? checking for liblz4... no configure: error: in `/home/ubuntu/postgres': configure: error: The pkg-config script could not be found or is too old. Make sure it is in your PATH or set the PKG_CONFIG environment variable to the full path to pkg-config. Alternatively, you may set the environment variables LZ4_CFLAGS and LZ4_LIBS to avoid the need to call pkg-config. See the pkg-config man page for more details. To get pkg-config, see <http://pkg-config.freedesktop.org/>. See `config.log' for more details I see xml2 also mentions pkg-config in configure (next to XML2_CFLAGS), but works fine without it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2021-03-19 22:19:49 +0100, Tomas Vondra wrote: > Yeah. And why does it even require pkg-config, unlike any other library > that I'm aware of? IMO it's fine to require pkg-config to simplify the configure code. Especially for new optional features. Adding multiple alternative ways to discover libraries for something like this makes configure slower, without a commensurate benefit.
On Fri, Mar 19, 2021 at 4:38 PM Robert Haas <robertmhaas@gmail.com> wrote: > Yes, and prion's got this concerning diff: > > Column | Type | Collation | Nullable | Default | Storage | > Compression | Stats target | Description > --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- > - f1 | integer | | | | plain | > | | > + f1 | integer | | | | plain | pglz > | | > > Since the column is not a varlena, it shouldn't have a compression > method configured, yet on that machine it does, possibly because that > machine uses -DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE. I could reproduce the problem with those flags. I pushed a fix. > Regarding your point, that does look like clutter. We don't annotate > the dump with a storage clause unless it's non-default, so probably we > should do the same thing here. I think I gave Dilip bad advice here... Here's a patch for that. It's a little strange because you're going to skip dumping the toast compression based on the default value on the source system, but that might not be the default on the system where the dump is being restored, so you could fail to recreate the state you had. That is avoidable if you understand how things work, but some people might not. I don't have a better idea, though, so let me know what you think of this. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Fri, Mar 19, 2021 at 5:29 PM Andres Freund <andres@anarazel.de> wrote: > On 2021-03-19 22:19:49 +0100, Tomas Vondra wrote: > > Yeah. And why does it even require pkg-config, unlike any other library > > that I'm aware of? > > IMO it's fine to require pkg-config to simplify the configure > code. Especially for new optional features. Adding multiple alternative > ways to discover libraries for something like this makes configure > slower, without a comensurate benefit. So, would anyone like to propose a patch to revise the logic in a way that they like better? Here's one from me that tries to make the handling of the LZ4 stuff more like what we already do for zlib, but I'm not sure if it's correct, or if it's what everyone wants. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On 2021-Mar-19, Robert Haas wrote: > > Regarding your point, that does look like clutter. We don't annotate > > the dump with a storage clause unless it's non-default, so probably we > > should do the same thing here. I think I gave Dilip bad advice here... > > Here's a patch for that. It's a little strange because you're going to > skip dumping the toast compression based on the default value on the > source system, but that might not be the default on the system where > the dump is being restored, so you could fail to recreate the state > you had. That is avoidable if you understand how things work, but some > people might not. I don't have a better idea, though, so let me know > what you think of this. Do you mean the column storage strategy, attstorage? I don't think that's really related, because the difference there is not a GUC setting but a compiled-in default for the type. In the case of compression, I'm not sure it makes sense to do it like that, but I can see the clutter argument: if we dump compression for all columns, it's going to be super noisy. (At least, for binary upgrade surely you must make sure to apply the correct setting regardless of defaults on either system). Maybe it makes sense to dump the compression clause if it is different from pglz, regardless of the default on the source server. Then, if the target server has chosen lz4 as default, *all* columns are going to end up as lz4, and if it hasn't, then only the ones that were lz4 in the source server are going to. That seems reasonable behavior. Also, if some columns are lz4 in source, and target does not have lz4, then everything is going to work out to not-lz4 with just a bunch of errors in the output. -- Álvaro Herrera 39°49'30"S 73°17'W
On 2021-Mar-19, Robert Haas wrote: > On Fri, Mar 19, 2021 at 10:11 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Also added a test case for vacuum full to recompress the data. > > I committed the core patch (0003) with a bit more editing. Let's see > what the buildfarm thinks. I updated the coverage script to use --with-lz4; results are updated. While eyeballing the results I noticed this bit in lz4_decompress_datum_slice(): + /* slice decompression not supported prior to 1.8.3 */ + if (LZ4_versionNumber() < 10803) + return lz4_decompress_datum(value); which I read as returning the complete decompressed datum if slice decompression is not supported. I thought that was a bug, but looking at the caller I realize that this isn't really a problem, since it's detoast_attr_slice's responsibility to slice the result further -- no bug, it's just wasteful. I suggest to add comments to this effect, perhaps as the attached (feel free to reword, I think mine is awkward.) -- Álvaro Herrera 39°49'30"S 73°17'W Si no sabes adonde vas, es muy probable que acabes en otra parte.
Attachment
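To spell out the point for readers skimming this subthread: the suggested comment would make it explicit that the early return in lz4_decompress_datum_slice() is a fallback, not a bug. One possible wording (mine, not necessarily what the attached patch says), read in the context of the snippet quoted above:

```c
	/* slice decompression not supported prior to 1.8.3 */
	if (LZ4_versionNumber() < 10803)
	{
		/*
		 * Fall back to decompressing the whole datum.  The caller
		 * (detoast_attr_slice) slices the result further, so this is
		 * merely wasteful, not incorrect.
		 */
		return lz4_decompress_datum(value);
	}
```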
On 2021-Mar-19, Robert Haas wrote: > Here's one from me that tries to make the handling of the LZ4 stuff > more like what we already do for zlib, but I'm not sure if it's > correct, or if it's what everyone wants. This one seems to behave as expected (Debian 10, with and without liblz4-dev). -- Álvaro Herrera Valdivia, Chile "Just treat us the way you want to be treated + some extra allowance for ignorance." (Michael Brusser)
On Fri, Mar 19, 2021 at 6:22 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Do you mean the column storage strategy, attstorage? I don't think > that's really related, because the difference there is not a GUC setting > but a compiled-in default for the type. In the case of compression, I'm > not sure it makes sense to do it like that, but I can see the clutter > argument: if we dump compression for all columns, it's going to be super > noisy. I agree. > (At least, for binary upgrade surely you must make sure to apply the > correct setting regardless of defaults on either system). It's not critical from a system integrity point of view; the catalog state just dictates what happens to new data. You could argue that if, in a future release, we change the default to lz4, it's good for pg_upgrade to migrate users to a set of column definitions that will use that for new data. > Maybe it makes sense to dump the compression clause if it is different > from pglz, regardless of the default on the source server. Then, if the > target server has chosen lz4 as default, *all* columns are going to end > up as lz4, and if it hasn't, then only the ones that were lz4 in the > source server are going to. That seems reasonable behavior. Also, if > some columns are lz4 in source, and target does not have lz4, then > everything is going to work out to not-lz4 with just a bunch of errors > in the output. Well, I really do hope that some day in the bright future, pglz will no longer be the thing we're shipping as the postgresql.conf default. So we'd just be postponing the noise until then. I think we need a better idea than that. -- Robert Haas EDB: http://www.enterprisedb.com
On 2021-Mar-19, Robert Haas wrote: > On Fri, Mar 19, 2021 at 6:22 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > (At least, for binary upgrade surely you must make sure to apply the > > correct setting regardless of defaults on either system). > > It's not critical from a system integrity point of view; the catalog > state just dictates what happens to new data. Oh, okay. > You could argue that if, in a future release, we change the default to > lz4, it's good for pg_upgrade to migrate users to a set of column > definitions that will use that for new data. Agreed, that seems a worthy goal. > > Maybe it makes sense to dump the compression clause if it is different > > from pglz, regardless of the default on the source server. > > Well, I really do hope that some day in the bright future, pglz will > no longer be the thing we're shipping as the postgresql.conf default. > So we'd just be postponing the noise until then. I think we need a > better idea than that. Hmm, why? In that future, we can just change the pg_dump behavior to no longer dump the compression clause if it's lz4 or whatever better algorithm we choose. So I think I'm clarifying my proposal to be "dump the compression clause if it's different from the compiled-in default" rather than "different from the GUC default". -- Álvaro Herrera Valdivia, Chile "Para tener más hay que desear menos"
On 2021-03-19 15:44:34 -0400, Robert Haas wrote: > I committed the core patch (0003) with a bit more editing. Let's see > what the buildfarm thinks. Congrats Dilip, Robert, All. The slow toast compression has been a significant issue for a long time.
On Fri, Mar 19, 2021 at 04:38:03PM -0400, Robert Haas wrote: > On Fri, Mar 19, 2021 at 4:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Nope ... crake's displeased with your assumption that it's OK to > > clutter dumps with COMPRESSION clauses. As am I: that is going to > > be utterly fatal for cross-version transportation of dumps. > > Regarding your point, that does look like clutter. We don't annotate > the dump with a storage clause unless it's non-default, so probably we > should do the same thing here. I think I gave Dilip bad advice here... On Fri, Mar 19, 2021 at 05:49:37PM -0400, Robert Haas wrote: > Here's a patch for that. It's a little strange because you're going to > skip dumping the toast compression based on the default value on the > source system, but that might not be the default on the system where > the dump is being restored, so you could fail to recreate the state > you had. That is avoidable if you understand how things work, but some > people might not. I don't have a better idea, though, so let me know > what you think of this. On Fri, Mar 19, 2021 at 07:22:42PM -0300, Alvaro Herrera wrote: > Do you mean the column storage strategy, attstorage? I don't think > that's really related, because the difference there is not a GUC setting > but a compiled-in default for the type. In the case of compression, I'm > not sure it makes sense to do it like that, but I can see the clutter > argument: if we dump compression for all columns, it's going to be super > noisy. > > (At least, for binary upgrade surely you must make sure to apply the > correct setting regardless of defaults on either system). > > Maybe it makes sense to dump the compression clause if it is different > from pglz, regardless of the default on the source server. Then, if the > target server has chosen lz4 as default, *all* columns are going to end > up as lz4, and if it hasn't, then only the ones that were lz4 in the > source server are going to. That seems reasonable behavior. Also, if > some columns are lz4 in source, and target does not have lz4, then > everything is going to work out to not-lz4 with just a bunch of errors > in the output. I think what's missing is dumping the GUC value itself, and then also dump any columns that differ from the GUC's setting. An early version of the GUC patch actually had an "XXX" comment about pg_dump support, and I was waiting for a review before polishing it. This was modelled after default_tablespace and default_table_access_method - I've mentioned that before that there's no pg_restore --no-table-am, and I have an unpublished patch to add it. That may be how I missed this until now. Then, this will output COMPRESSION on "a" (x)or "b" depending on the current default: | CREATE TABLE a(a text compression lz4, b text compression pglz); When we restore it, we set the default before restoring columns. I think it may be a good idea to document that dumps of columns with non-default compression aren't portable to older server versions, or servers --without-lz4. This is a consequence of the CREATE command being a big text blob, so pg_restore can't reasonably elide the COMPRESSION clause. While looking at this, I realized that someone added the GUC to postgresql.conf.sample, but not to doc/ - this was a separate patch until yesterday. I think since we're not doing catalog access for "pluggable" compression, this should just be an enum GUC, with #ifdef LZ4. Then we don't need a hook to validate it. ALTER and CREATE are silently accepting bogus compression names. 
I can write patches for these later. -- Justin
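On the enum-GUC idea, a minimal sketch of what the options table could look like. The struct config_enum_entry shape is PostgreSQL's standard one for enum GUCs; the ToastCompressionId member names and the USE_LZ4 symbol are assumptions here, not taken verbatim from any posted patch.

```c
#include "postgres.h"
#include "utils/guc.h"			/* struct config_enum_entry */

/* assumed names; the real patch may spell these differently */
typedef enum ToastCompressionId
{
	TOAST_PGLZ_COMPRESSION_ID = 0,
	TOAST_LZ4_COMPRESSION_ID = 1
} ToastCompressionId;

/*
 * Values accepted by SET default_toast_compression.  "lz4" is offered only
 * when the server was built --with-lz4, so no separate validation hook is
 * needed to reject bogus names.
 */
static const struct config_enum_entry default_toast_compression_options[] = {
	{"pglz", TOAST_PGLZ_COMPRESSION_ID, false},
#ifdef USE_LZ4
	{"lz4", TOAST_LZ4_COMPRESSION_ID, false},
#endif
	{NULL, 0, false}
};
```

The table would then be referenced from the usual enum-GUC machinery in guc.c (details omitted here).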
On 3/19/21 8:00 PM, Andres Freund wrote: > On 2021-03-19 15:44:34 -0400, Robert Haas wrote: >> I committed the core patch (0003) with a bit more editing. Let's see >> what the buildfarm thinks. > > Congrats Dilip, Robert, All. The slow toast compression has been a > significant issue for a long time. Yes, congratulations! This is a terrific improvement. Plus, now that lz4 is part of configure it lowers the bar for other features that want to use it. I'm guessing there will be a few. Thanks! -- -David david@pgmasters.net
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Mar-19, Robert Haas wrote: >> Well, I really do hope that some day in the bright future, pglz will >> no longer be the thing we're shipping as the postgresql.conf default. >> So we'd just be postponing the noise until then. I think we need a >> better idea than that. > Hmm, why? In that future, we can just change the pg_dump behavior to no > longer dump the compression clause if it's lz4 or whatever better > algorithm we choose. So I think I'm clarifying my proposal to be "dump > the compression clause if it's different from the compiled-in default" > rather than "different from the GUC default". Extrapolating from the way we've dealt with similar issues in the past, I think the structure of pg_dump's output ought to be: 1. SET default_toast_compression = 'source system's value' in among the existing passel of SETs at the top. Doesn't matter whether or not that is the compiled-in value. 2. No mention of compression in any CREATE TABLE command. 3. For any column having a compression option different from the default, emit ALTER TABLE SET ... to set that option after the CREATE TABLE. (You did implement such a SET, I trust.) This minimizes the chatter for the normal case where all or most columns have the same setting, and more importantly it allows the dump to be read by older PG systems (or non-PG systems, or newer systems built without --with-lz4) that would fail altogether if the CREATE TABLE commands contained compression options. To use the dump that way, you do have to be willing to ignore errors from the SET and the ALTERs ... but that beats the heck out of having to manually edit the dump script to get rid of embedded COMPRESSION clauses. I'm not sure whether we'd still need to mess around beyond that to make the buildfarm's existing upgrade tests happy. But we *must* do this much in any case, because as it stands this patch has totally destroyed some major use-cases for pg_dump. There might be scope for a dump option to suppress mention of compression altogether (comparable to, eg, --no-tablespaces). But I think that's optional. In any case, we don't want to put people in a position where they should have used such an option and now they have no good way to recover their dump to the system they want to recover to. regards, tom lane
On 3/19/21 8:25 PM, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> On 2021-Mar-19, Robert Haas wrote: >>> Well, I really do hope that some day in the bright future, pglz will >>> no longer be the thing we're shipping as the postgresql.conf default. >>> So we'd just be postponing the noise until then. I think we need a >>> better idea than that. >> Hmm, why? In that future, we can just change the pg_dump behavior to no >> longer dump the compression clause if it's lz4 or whatever better >> algorithm we choose. So I think I'm clarifying my proposal to be "dump >> the compression clause if it's different from the compiled-in default" >> rather than "different from the GUC default". > Extrapolating from the way we've dealt with similar issues > in the past, I think the structure of pg_dump's output ought to be: > > 1. SET default_toast_compression = 'source system's value' > in among the existing passel of SETs at the top. Doesn't > matter whether or not that is the compiled-in value. > > 2. No mention of compression in any CREATE TABLE command. > > 3. For any column having a compression option different from > the default, emit ALTER TABLE SET ... to set that option after > the CREATE TABLE. (You did implement such a SET, I trust.) > > This minimizes the chatter for the normal case where all or most > columns have the same setting, and more importantly it allows the > dump to be read by older PG systems (or non-PG systems, or newer > systems built without --with-lz4) that would fail altogether > if the CREATE TABLE commands contained compression options. > To use the dump that way, you do have to be willing to ignore > errors from the SET and the ALTERs ... but that beats the heck > out of having to manually edit the dump script to get rid of > embedded COMPRESSION clauses. > > I'm not sure whether we'd still need to mess around beyond > that to make the buildfarm's existing upgrade tests happy. > But we *must* do this much in any case, because as it stands > this patch has totally destroyed some major use-cases for > pg_dump. > > There might be scope for a dump option to suppress mention > of compression altogether (comparable to, eg, --no-tablespaces). > But I think that's optional. In any case, we don't want > to put people in a position where they should have used such > an option and now they have no good way to recover their > dump to the system they want to recover to. > > I'm fairly sure this prescription would satisfy the buildfarm. It sounds pretty sane to me - I'd independently come to a very similar conclusion before reading the above. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Mar 19, 2021 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Extrapolating from the way we've dealt with similar issues > in the past, I think the structure of pg_dump's output ought to be: > > 1. SET default_toast_compression = 'source system's value' > in among the existing passel of SETs at the top. Doesn't > matter whether or not that is the compiled-in value. > > 2. No mention of compression in any CREATE TABLE command. > > 3. For any column having a compression option different from > the default, emit ALTER TABLE SET ... to set that option after > the CREATE TABLE. (You did implement such a SET, I trust.) Actually, *I* didn't implement any of this. But ALTER TABLE sometab ALTER somecol SET COMPRESSION somealgo works. This sounds like a reasonable approach. > There might be scope for a dump option to suppress mention > of compression altogether (comparable to, eg, --no-tablespaces). > But I think that's optional. In any case, we don't want > to put people in a position where they should have used such > an option and now they have no good way to recover their > dump to the system they want to recover to. The patch already has --no-toast-compression. -- Robert Haas EDB: http://www.enterprisedb.com
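To make the prescription concrete, a rough sketch of the pg_dump-side emission using pg_dump's usual PQExpBuffer helper. This is only an illustration of points 1 through 3, not the patch that was eventually committed (which, as discussed further down, routes the SET through a separate TOC entry); identifier quoting is elided, and in real code the SET would of course be emitted once per dump rather than per column.

```c
#include "pqexpbuffer.h"
#include <string.h>

/*
 * Sketch only: emit the default once, say nothing in CREATE TABLE, and
 * issue an ALTER only for columns that differ from that default.
 */
static void
sketch_toast_compression(PQExpBuffer q, const char *src_default,
						 const char *tabname, const char *colname,
						 const char *colcompression)
{
	/* 1. Dump the source system's default near the top of the output. */
	appendPQExpBuffer(q, "SET default_toast_compression = '%s';\n",
					  src_default);

	/* 2. The CREATE TABLE command itself says nothing about compression. */

	/* 3. Only columns differing from that default get an explicit ALTER. */
	if (colcompression != NULL && strcmp(colcompression, src_default) != 0)
		appendPQExpBuffer(q,
						  "ALTER TABLE %s ALTER COLUMN %s SET COMPRESSION %s;\n",
						  tabname, colname, colcompression);
}
```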
On Fri, Mar 19, 2021 at 6:38 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I suggest to add comments to this effect, > perhaps as the attached (feel free to reword, I think mine is awkward.) It's not bad, although "the decompressed version of the full datum" might be a little better. I'd probably say instead: "This method might decompress the entire datum rather than just a slice, if slicing is not supported." or something to that effect. Feel free to commit something you like. -- Robert Haas EDB: http://www.enterprisedb.com
On Sat, Mar 20, 2021 at 8:11 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Mar 19, 2021 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Extrapolating from the way we've dealt with similar issues > > in the past, I think the structure of pg_dump's output ought to be: > > > > 1. SET default_toast_compression = 'source system's value' > > in among the existing passel of SETs at the top. Doesn't > > matter whether or not that is the compiled-in value. > > > > 2. No mention of compression in any CREATE TABLE command. > > > > 3. For any column having a compression option different from > > the default, emit ALTER TABLE SET ... to set that option after > > the CREATE TABLE. (You did implement such a SET, I trust.) > > Actually, *I* didn't implement any of this. But ALTER TABLE sometab > ALTER somecol SET COMPRESSION somealgo works. > > This sounds like a reasonable approach. The attached patch implements that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sat, Mar 20, 2021 at 1:22 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Mar 20, 2021 at 8:11 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Fri, Mar 19, 2021 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Extrapolating from the way we've dealt with similar issues > > > in the past, I think the structure of pg_dump's output ought to be: > > > > > > 1. SET default_toast_compression = 'source system's value' > > > in among the existing passel of SETs at the top. Doesn't > > > matter whether or not that is the compiled-in value. > > > > > > 2. No mention of compression in any CREATE TABLE command. > > > > > > 3. For any column having a compression option different from > > > the default, emit ALTER TABLE SET ... to set that option after > > > the CREATE TABLE. (You did implement such a SET, I trust.) > > > > Actually, *I* didn't implement any of this. But ALTER TABLE sometab > > ALTER somecol SET COMPRESSION somealgo works. > > > > This sounds like a reasonable approach. > > The attached patch implements that. After sending this, I saw that Justin also included patches for this. I think the ALTER ... SET COMPRESSION part is more or less the same; I just fetched it from the older version of the patch set. But the SET default_toast_compression handling is slightly different. I will look into your version and give my opinion on which one looks better so we can commit that; please feel free to share your thoughts as well. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Hi, I think this bit in brin_tuple.c is wrong: ... Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc, keyno); Datum cvalue = toast_compress_datum(value, att->attcompression); The problem is that this is looking at the index descriptor (i.e. what types are indexed) instead of the stored type. For BRIN those may be only loosely related, which is why the code does this a couple lines above: /* We must look at the stored type, not at the index descriptor. */ TypeCacheEntry *atttype = brdesc->bd_info[keyno]->oi_typcache[datumno]; For the built-in BRIN opclasses this happens to work, because e.g. minmax stores two values of the original type. But it may not work for other out-of-core opclasses, and it certainly doesn't work for the new BRIN opclasses (bloom and minmax-multi). Unfortunately, the only thing we have here is the type OID, so I guess the only option is using GetDefaultToastCompression(). Perhaps we might include that into BrinOpcInfo too, in the future. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 20, 2021 at 3:05 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Hi, > > I think this bit in brin_tuple.c is wrong: > > ... > Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc, > keyno); > Datum cvalue = toast_compress_datum(value, > att->attcompression); > > The problem is that this is looking at the index descriptor (i.e. what > types are indexed) instead of the stored type. For BRIN those may be > only loosely related, which is why the code does this a couple lines above: > > /* We must look at the stored type, not at the index descriptor. */ > TypeCacheEntry *atttype > = brdesc->bd_info[keyno]->oi_typcache[datumno]; Ok, I was not aware of this. > For the built-in BRIN opclasses this happens to work, because e.g. > minmax stores two values of the original type. But it may not work for > other out-of-core opclasses, and it certainly doesn't work for the new > BRIN opclasses (bloom and minmax-multi). Okay > Unfortunately, the only thing we have here is the type OID, so I guess > the only option is using GetDefaultToastCompression(). Perhaps we might > include that into BrinOpcInfo too, in the future. Right, I think for now we can use default compression for this case. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On 3/20/21 11:18 AM, Dilip Kumar wrote: > On Sat, Mar 20, 2021 at 3:05 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> Hi, >> >> I think this bit in brin_tuple.c is wrong: >> >> ... >> Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc, >> keyno); >> Datum cvalue = toast_compress_datum(value, >> att->attcompression); >> >> The problem is that this is looking at the index descriptor (i.e. what >> types are indexed) instead of the stored type. For BRIN those may be >> only loosely related, which is why the code does this a couple lines above: >> >> /* We must look at the stored type, not at the index descriptor. */ >> TypeCacheEntry *atttype >> = brdesc->bd_info[keyno]->oi_typcache[datumno]; > > Ok, I was not aware of this. > Yeah, the BRIN internal structure is not obvious, and the fact that none of the built-in BRIN variants triggers the issue makes it harder to spot. >> For the built-in BRIN opclasses this happens to work, because e.g. >> minmax stores two values of the original type. But it may not work for >> other out-of-core opclasses, and it certainly doesn't work for the new >> BRIN opclasses (bloom and minmax-multi). > > Okay > >> Unfortunately, the only thing we have here is the type OID, so I guess >> the only option is using GetDefaultToastCompression(). Perhaps we might >> include that into BrinOpcInfo too, in the future. > > Right, I think for now we can use default compression for this case. > Good. I wonder if we might have "per type" preferred compression in the future, which would address this. But for now just using the default compression seems fine. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 20, 2021 at 1:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > See attached. I have looked into your patches. 0001 to 0005 and 0007 look fine to me, so maybe you can merge them all and create a fixup patch. Thanks for fixing these; they were some silly mistakes I made in my patch. - 0006 is fine, but I'm not sure what the advantage is over what we have today. - As for 0008 and 0009, I think my 0001-Fixup-dump-toast-compression-method.patch[1] does this in a much simpler way; please have a look and let me know if you see any problems with it that would require doing it the way you are doing here. [1] https://www.postgresql.org/message-id/CAFiTN-v7EULPqVJ-6J%3DzH6n0%2BkO%3DYFtgpte%2BFTre%3DWrwcWBBTA%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On 3/20/21 11:45 AM, Tomas Vondra wrote: > > > On 3/20/21 11:18 AM, Dilip Kumar wrote: >> On Sat, Mar 20, 2021 at 3:05 PM Tomas Vondra >> <tomas.vondra@enterprisedb.com> wrote: >>> >>> Hi, >>> >>> I think this bit in brin_tuple.c is wrong: >>> >>> ... >>> Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc, >>> keyno); >>> Datum cvalue = toast_compress_datum(value, >>> att->attcompression); >>> >>> The problem is that this is looking at the index descriptor (i.e. what >>> types are indexed) instead of the stored type. For BRIN those may be >>> only loosely related, which is why the code does this a couple lines above: >>> >>> /* We must look at the stored type, not at the index descriptor. */ >>> TypeCacheEntry *atttype >>> = brdesc->bd_info[keyno]->oi_typcache[datumno]; >> >> Ok, I was not aware of this. >> > > Yeah, the BRIN internal structure is not obvious, and the fact that all > the built-in BRIN variants triggers the issue makes it harder to spot. > >>> For the built-in BRIN opclasses this happens to work, because e.g. >>> minmax stores two values of the original type. But it may not work for >>> other out-of-core opclasses, and it certainly doesn't work for the new >>> BRIN opclasses (bloom and minmax-multi). >> >> Okay >> >>> Unfortunately, the only thing we have here is the type OID, so I guess >>> the only option is using GetDefaultToastCompression(). Perhaps we might >>> include that into BrinOpcInfo too, in the future. >> >> Right, I think for now we can use default compression for this case. >> > > Good. I wonder if we might have "per type" preferred compression in the > future, which would address this. But for now just using the default > compression seems fine. > Actually, we can be a bit smarter - when the data types match, we can use the compression method defined for the attribute. That works fine for all built-in BRIN opclasses, and it seems quite reasonable - if the user picked a particular compression method for a column, it's likely because the data compress better with that method. So why not use that for the BRIN summary, when possible (even though the BRIN indexes tend to be tiny). Attached is a patch doing this. Barring objection I'll push that soon, so that I can push the BRIN index improvements (bloom etc.). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
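Since the attached patch is not inlined in the archive, here is the gist of the fix in sketch form. Variable names follow the brin_form_tuple() excerpt quoted earlier and the hunk quoted a little further down the thread; treat it as a summary of the approach, not the exact committed diff.

```c
	/* Inside brin_form_tuple(), when compressing a stored summary value. */
	Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc, keyno);

	/* We must look at the stored type, not at the index descriptor. */
	TypeCacheEntry *atttype = brdesc->bd_info[keyno]->oi_typcache[datumno];
	char		compression;
	Datum		cvalue;

	/*
	 * Use the column's configured compression method only when the stored
	 * summary type matches the indexed column's type (true for the built-in
	 * opclasses); otherwise fall back to the default method.
	 */
	if (att->atttypid == atttype->type_id)
		compression = att->attcompression;
	else
		compression = GetDefaultToastCompression();

	cvalue = toast_compress_datum(value, compression);
```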
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sat, Mar 20, 2021 at 04:37:53PM +0530, Dilip Kumar wrote: >> - And, 0008 and 0009, I think my >> 0001-Fixup-dump-toast-compression-method.patch[1] is doing this in a >> much simpler way, please have a look and let me know if you think that >> has any problems and we need to do the way you are doing here? > I tested and saw that your patch doesn't output "SET default_toast_compression" > in non-text dumps (pg_dump -Fc). Yeah, _doSetFixedOutputState is the wrong place: that runs on the pg_restore side of the fence, and would not have access to the necessary info in a separated dump/restore run. It might be necessary to explicitly pass the state through in a TOC item, as we do for things like the standard_conforming_strings setting. regards, tom lane
I wrote: > Yeah, _doSetFixedOutputState is the wrong place: that runs on the > pg_restore side of the fence, and would not have access to the > necessary info in a separated dump/restore run. > It might be necessary to explicitly pass the state through in a TOC item, > as we do for things like the standard_conforming_strings setting. Ah, now that I read your patch I see that's exactly what you did. I fixed up some issues in 0008/0009 (mostly cosmetic, except that you forgot a server version check in dumpToastCompression) and pushed that, so we can see if it makes crake happy. regards, tom lane
On 3/20/21 3:03 PM, Tom Lane wrote: > I wrote: >> Yeah, _doSetFixedOutputState is the wrong place: that runs on the >> pg_restore side of the fence, and would not have access to the >> necessary info in a separated dump/restore run. >> It might be necessary to explicitly pass the state through in a TOC item, >> as we do for things like the standard_conforming_strings setting. > Ah, now that I read your patch I see that's exactly what you did. > > I fixed up some issues in 0008/0009 (mostly cosmetic, except that > you forgot a server version check in dumpToastCompression) and > pushed that, so we can see if it makes crake happy. > > It's still produced a significant amount more difference between the dumps. For now I've increased the fuzz factor a bit like this: diff --git a/PGBuild/Modules/TestUpgradeXversion.pm b/PGBuild/Modules/TestUpgradeXversion.pm index 1d1d313..567d7cb 100644 --- a/PGBuild/Modules/TestUpgradeXversion.pm +++ b/PGBuild/Modules/TestUpgradeXversion.pm @@ -621,7 +621,7 @@ sub test_upgrade ## no critic (Subroutines::ProhibitManyArgs) # generally from reordering of larg object output. # If not we heuristically allow up to 2000 lines of diffs - if ( ($oversion ne $this_branch && $difflines < 2000) + if ( ($oversion ne $this_branch && $difflines < 2700) || ($oversion eq $this_branch) && $difflines < 50) { return 1; I'll try to come up with something better. Maybe just ignore lines like SET default_toast_compression = 'pglz'; when taking the diff. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
I wrote: > I fixed up some issues in 0008/0009 (mostly cosmetic, except that > you forgot a server version check in dumpToastCompression) and > pushed that, so we can see if it makes crake happy. crake was still unhappy with that: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2021-03-20%2019%3A03%3A56 but I see it just went green ... did you do something to adjust the expected output? regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes: > On 3/20/21 3:03 PM, Tom Lane wrote: >> I fixed up some issues in 0008/0009 (mostly cosmetic, except that >> you forgot a server version check in dumpToastCompression) and >> pushed that, so we can see if it makes crake happy. > It's still produced a significant amount more difference between the > dumps. For now I've increased the fuzz factor a bit like this: Ah, our emails crossed. > I'll try to come up with something better. Maybe just ignore lines like > SET default_toast_compression = 'pglz'; > when taking the diff. I noticed that there were a fair number of other diffs besides those. Seems like we need some better comparison technology, really, but I'm not certain what. regards, tom lane
Justin Pryzby <pryzby@telsasoft.com> writes: > On Fri, Mar 19, 2021 at 05:35:58PM -0300, Alvaro Herrera wrote: >> I find this behavior confusing; I'd rather have configure error out if >> it can't find the package support I requested, than continuing with a >> set of configure options different from what I gave. > That's clearly wrong, but that's not the behavior I see: Yeah, it errors out as-expected for me too, on a couple of different machines (see sifaka's latest run for documentation). regards, tom lane
Justin Pryzby <pryzby@telsasoft.com> writes: > On Fri, Mar 19, 2021 at 02:07:31PM -0400, Robert Haas wrote: >> On Fri, Mar 19, 2021 at 1:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote: >>> configure: WARNING: lz4.h: accepted by the compiler, rejected by the preprocessor! >>> configure: WARNING: lz4.h: proceeding with the compiler's result >> No, I don't see this. I wonder whether this could possibly be an >> installation issue on Andrey's machine? If not, it must be >> version-dependent or installation-dependent in some way. > Andrey, can you check if latest HEAD (bbe0a81db) has these ./configure warnings ? FWIW, I also saw that, when building HEAD against MacPorts' version of liblz4 on an M1 Mac. config.log has configure:13536: checking lz4.h usability configure:13536: ccache clang -c -I/opt/local/include -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O2 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk conftest.c >&5 configure:13536: $? = 0 configure:13536: result: yes configure:13536: checking lz4.h presence configure:13536: ccache clang -E -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk conftest.c conftest.c:67:10: fatal error: 'lz4.h' file not found #include <lz4.h> ^~~~~~~ 1 error generated. configure:13536: $? = 1 Digging around, it looks like the "-I/opt/local/include" bit came from LZ4_CFLAGS, which we then stuck into CFLAGS, but it needed to be put in CPPFLAGS in order to make this test work. regards, tom lane
On 3/20/21 4:21 PM, Justin Pryzby wrote: > On Sat, Mar 20, 2021 at 04:13:47PM +0100, Tomas Vondra wrote: >> +++ b/src/backend/access/brin/brin_tuple.c >> @@ -213,10 +213,20 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple, >> (atttype->typstorage == TYPSTORAGE_EXTENDED || >> atttype->typstorage == TYPSTORAGE_MAIN)) >> { >> + Datum cvalue; >> + char compression = GetDefaultToastCompression(); >> Form_pg_attribute att = TupleDescAttr(brdesc->bd_tupdesc, >> keyno); >> - Datum cvalue = toast_compress_datum(value, >> - att->attcompression); >> + >> + /* >> + * If the BRIN summary and indexed attribute use the same data >> + * type, we can the same compression method. Otherwise we have > > can *use ? > >> + * to use the default method. >> + */ >> + if (att->atttypid == atttype->type_id) >> + compression = att->attcompression; > > It would be more obvious to me if this said here: > | else: compression = GetDefaultToastCompression > Thanks. I've pushed a patch tweaked per your feedback. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sat, Mar 20, 2021 at 05:37:07PM -0400, Tom Lane wrote: >> Digging around, it looks like the "-I/opt/local/include" bit came >> from LZ4_CFLAGS, which we then stuck into CFLAGS, but it needed >> to be put in CPPFLAGS in order to make this test work. > If it's the same as the issue Andrey reported, then it causes a ./configure > WARNING, which is resolved by the ac_save hack, which I copied from ICU. I think probably what we need to do, rather than shove the pkg-config results willy-nilly into our flags, is to disassemble them like we do with the same results for xml2. If you ask me, the way we are handling ICU flags is a poor precedent that is going to blow up at some point; the only reason it hasn't is that people aren't building --with-icu that much yet. regards, tom lane
BTW, I tried doing "make installcheck" after having adjusted default_toast_compression to be "lz4". The compression test itself fails because it's expecting the other setting; that ought to be made more robust. Also, I see some diffs in the indirect_toast test, which seems perhaps worthy of investigation. (The diffs look to be just row ordering, but why?) regards, tom lane
On Sun, Mar 21, 2021 at 7:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > BTW, I tried doing "make installcheck" after having adjusted > default_toast_compression to be "lz4". The compression test > itself fails because it's expecting the other setting; that > ought to be made more robust. Yeah, we need to set the default_toast_compression in the beginning of the test as attached. > Also, I see some diffs in the > indirect_toast test, which seems perhaps worthy of investigation. > (The diffs look to be just row ordering, but why?) I will look into this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sun, Mar 21, 2021 at 9:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sun, Mar 21, 2021 at 7:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > BTW, I tried doing "make installcheck" after having adjusted > > default_toast_compression to be "lz4". The compression test > > itself fails because it's expecting the other setting; that > > ought to be made more robust. > > Yeah, we need to set the default_toast_compression in the beginning of > the test as attached. In the last patch, I did not adjust compression_1.out, so I have fixed that in the attached patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sun, Mar 21, 2021 at 7:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Also, I see some diffs in the > indirect_toast test, which seems perhaps worthy of investigation. > (The diffs look to be just row ordering, but why?) I have investigated that. In the insert below, the compressed size of repeat('1234567890',50000) is 1980 bytes with lz4, whereas with pglz it is 5737 bytes. So with lz4 the compressed data are stored inline, whereas with pglz they get externalized. Because of this, for one of the update statements followed by an insert there is no space left on the first page (since the data are stored inline), so the new tuple goes onto the next page, and that is what affects the row order. I hope this makes sense. INSERT INTO indtoasttest(descr, f1, f2) VALUES('one-toasted,one-null', NULL, repeat('1234567890',50000)); -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Dilip Kumar <dilipbalaut@gmail.com> writes: >> Yeah, we need to set the default_toast_compression in the beginning of >> the test as attached. > In the last patch, I did not adjust the compression_1.out so fixed > that in the attached patch. Pushed that; however, while testing that it works as expected, I saw a new and far more concerning regression diff: diff -U3 /home/postgres/pgsql/src/test/regress/expected/strings.out /home/postgres/pgsql/src/test/regress/results/strings.out --- /home/postgres/pgsql/src/test/regress/expected/strings.out 2021-02-18 10:34:58.190304138 -0500 +++ /home/postgres/pgsql/src/test/regress/results/strings.out 2021-03-21 16:27:22.029402834 -0400 @@ -1443,10 +1443,10 @@ -- If start plus length is > string length, the result is truncated to -- string length SELECT substr(f1, 99995, 10) from toasttest; - substr --------- - 567890 - 567890 + substr +------------------------ + 567890\x7F\x7F\x7F\x7F + 567890\x7F\x7F\x7F\x7F 567890 567890 (4 rows) @@ -1520,10 +1520,10 @@ -- If start plus length is > string length, the result is truncated to -- string length SELECT substr(f1, 99995, 10) from toasttest; - substr --------- - 567890 - 567890 + substr +------------------------ + 567890\177\177\177\177 + 567890\177\177\177\177 567890 567890 (4 rows) This seems somewhat repeatable (three identical failures in three attempts). Not sure why I did not see it yesterday; but anyway, there is something wrong with partial detoasting for LZ4. regards, tom lane
On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote: > This seems somewhat repeatable (three identical failures in three > attempts). Not sure why I did not see it yesterday; but anyway, > there is something wrong with partial detoasting for LZ4. With what version of LZ4 ? -- Justin
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote: >> This seems somewhat repeatable (three identical failures in three >> attempts). Not sure why I did not see it yesterday; but anyway, >> there is something wrong with partial detoasting for LZ4. > With what version of LZ4 ? RHEL8's, which is lz4-1.8.3-2.el8.x86_64 regards, tom lane
Justin Pryzby <pryzby@telsasoft.com> writes: > Rebased on HEAD. > 0005 forgot to update compression_1.out. > Included changes to ./configure.ac and some other patches, but not Tomas's, > since it'll make CFBOT get mad as soon as that's pushed. I pushed a version of the configure fixes that passes my own sanity checks, and removes the configure warning with MacPorts. That obsoletes your 0006. Of the rest, I prefer the 0009 approach (make the GUC an enum) to 0008, and the others seem sane but I haven't studied the code, so I'll leave it to Robert to handle them. regards, tom lane
... btw, now that I look at this, why are we expending a configure probe for <lz4/lz4.h> ? If we need to cater for that spelling of the header name, the C code proper is not ready for it. regards, tom lane
I wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: >> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote: >>> This seems somewhat repeatable (three identical failures in three >>> attempts). Not sure why I did not see it yesterday; but anyway, >>> there is something wrong with partial detoasting for LZ4. >> With what version of LZ4 ? > RHEL8's, which is > lz4-1.8.3-2.el8.x86_64 I hate to be the bearer of bad news, but this suggests that LZ4_decompress_safe_partial is seriously broken in 1.9.2 as well: https://github.com/lz4/lz4/issues/783 Maybe we cannot rely on that function for a few more years yet. Also, I don't really understand why this code: /* slice decompression not supported prior to 1.8.3 */ if (LZ4_versionNumber() < 10803) return lz4_decompress_datum(value); It seems likely to me that we'd get a flat out build failure from library versions lacking LZ4_decompress_safe_partial, and thus that this run-time test is dead code and we should better be using a configure probe if we intend to allow old liblz4 versions. Though that might be moot. regards, tom lane
On Sun, Mar 21, 2021 at 07:11:50PM -0400, Tom Lane wrote: > I wrote: > > Justin Pryzby <pryzby@telsasoft.com> writes: > >> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote: > >>> This seems somewhat repeatable (three identical failures in three > >>> attempts). Not sure why I did not see it yesterday; but anyway, > >>> there is something wrong with partial detoasting for LZ4. > > >> With what version of LZ4 ? > > > RHEL8's, which is > > lz4-1.8.3-2.el8.x86_64 > > I hate to be the bearer of bad news, but this suggests that > LZ4_decompress_safe_partial is seriously broken in 1.9.2 > as well: > > https://github.com/lz4/lz4/issues/783 Ouch > Maybe we cannot rely on that function for a few more years yet. > > Also, I don't really understand why this code: > > /* slice decompression not supported prior to 1.8.3 */ > if (LZ4_versionNumber() < 10803) > return lz4_decompress_datum(value); > > It seems likely to me that we'd get a flat out build failure > from library versions lacking LZ4_decompress_safe_partial, > and thus that this run-time test is dead code and we should > better be using a configure probe if we intend to allow old > liblz4 versions. Though that might be moot. The function existed before 1.8.3, but didn't handle slicing. https://github.com/lz4/lz4/releases/tag/v1.8.3 |Finally, an existing function, LZ4_decompress_safe_partial(), has been enhanced to make it possible to decompress only the beginning of an LZ4 block, up to a specified number of bytes. Partial decoding can be useful to save CPU time and memory, when the objective is to extract a limited portion from a larger block. Possibly we could allow ver >= 1.9.3 || (ver >= 1.8.3 && ver < 1.9.2). Or maybe not: the second half apparently worked "by accident", and we shouldn't need to have intimate knowledge of someone else's patchlevel releases. -- Justin
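A minimal, self-contained sketch of the slice decompression under discussion, assuming liblz4 >= 1.8.3; the helper name and buffer handling are illustrative and are not the PostgreSQL code:

    #include <stdlib.h>
    #include <lz4.h>

    /*
     * Decompress only the first "want" bytes of a complete LZ4 block.
     * "src"/"srclen" hold the compressed block and "rawsize" is its true
     * uncompressed size.  Returns a malloc'd buffer, or NULL on failure.
     */
    static char *
    lz4_decompress_prefix(const char *src, int srclen, int rawsize, int want)
    {
        char   *dst;
        int     got;

        /* liblz4 assumes the requested slice is not an overestimate */
        if (want > rawsize)
            want = rawsize;

        dst = malloc(want);
        if (dst == NULL)
            return NULL;

        /* both the target size and the destination capacity must be honest */
        got = LZ4_decompress_safe_partial(src, dst, srclen, want, want);
        if (got < 0)
        {
            free(dst);
            return NULL;        /* corrupt or inconsistent input */
        }
        return dst;
    }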
Justin Pryzby <pryzby@telsasoft.com> writes: > On Sun, Mar 21, 2021 at 07:11:50PM -0400, Tom Lane wrote: >> I hate to be the bearer of bad news, but this suggests that >> LZ4_decompress_safe_partial is seriously broken in 1.9.2 >> as well: >> https://github.com/lz4/lz4/issues/783 > Ouch Actually, after reading that closer, the problem only affects the case where the compressed-data-length passed to the function is a lie. So it shouldn't be a problem for our usage. Also, after studying the documentation for LZ4_decompress_safe and LZ4_decompress_safe_partial, I realized that liblz4 is also counting on the *output* buffer size to not be a lie. So we cannot pass it a number larger than the chunk's true decompressed size. The attached patch resolves the issue I'm seeing. regards, tom lane diff --git a/src/backend/access/common/toast_compression.c b/src/backend/access/common/toast_compression.c index 00af1740cf..74e449992a 100644 --- a/src/backend/access/common/toast_compression.c +++ b/src/backend/access/common/toast_compression.c @@ -220,6 +220,10 @@ lz4_decompress_datum_slice(const struct varlena *value, int32 slicelength) if (LZ4_versionNumber() < 10803) return lz4_decompress_datum(value); + /* liblz4 assumes that slicelength is not an overestimate */ + if (slicelength >= VARRAWSIZE_4B_C(value)) + return lz4_decompress_datum(value); + /* allocate memory for the uncompressed data */ result = (struct varlena *) palloc(slicelength + VARHDRSZ);
On Mon, Mar 22, 2021 at 5:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Actually, after reading that closer, the problem only affects the > case where the compressed-data-length passed to the function is > a lie. So it shouldn't be a problem for our usage. > > Also, after studying the documentation for LZ4_decompress_safe > and LZ4_decompress_safe_partial, I realized that liblz4 is also > counting on the *output* buffer size to not be a lie. So we > cannot pass it a number larger than the chunk's true decompressed > size. The attached patch resolves the issue I'm seeing. Okay, the fix makes sense. In fact, IMHO, in general also this fix looks like an optimization, I mean when slicelength >= VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory even in the case of pglz. So shall we put this check directly in toast_decompress_datum_slice instead of handling it at the lz4 level? Like this. diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c index bed50e8..099ac15 100644 --- a/src/backend/access/common/detoast.c +++ b/src/backend/access/common/detoast.c @@ -506,6 +506,10 @@ toast_decompress_datum_slice(struct varlena *attr, int32 slicelength) Assert(VARATT_IS_COMPRESSED(attr)); + /* liblz4 assumes that slicelength is not an overestimate */ + if (slicelength >= VARRAWSIZE_4B_C(attr)) + return toast_decompress_datum(attr); + /* * Fetch the compression method id stored in the compression header and * decompress the data slice using the appropriate decompression routine. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 22, 2021 at 5:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sat, Mar 20, 2021 at 06:20:39PM -0500, Justin Pryzby wrote: > > Rebased on HEAD. > > 0005 forgot to update compression_1.out. > > Included changes to ./configure.ac and some other patches, but not Tomas's, > > since it'll make CFBOT get mad as soon as that's pushed. > > Rebased again. > Renamed "t" to a badcompresstbl to avoid name conflicts. > Polish the enum GUC patch some. > > I noticed that TOAST_INVALID_COMPRESSION_ID was unused ... but then I found a > use for it. Yeah, it is used in toast_compress_datum, toast_get_compression_id, reform_and_rewrite_tuple and pg_column_compression function. Your patches look fine to me. I agree that v3-0006 also makes sense as it is simplifying the GUC handling. Thanks for fixing these. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sun, Mar 21, 2021 at 7:55 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Rebased again. Thanks, Justin. I committed 0003 and 0004 together as 226e2be3876d0bda3dc33d16dfa0bed246b7b74f. I also committed 0001 and 0002 together as 24f0e395ac5892cd12e8914646fe921fac5ba23d, but with some revisions, because your text was not clear that this is setting the default for new tables, not new values; it also implied that this only affects out-of-line compression, which is not true. In lieu of trying to explain how TOAST works here, I added a link. It looks, though, like that documentation also needs to be patched for this change. I'll look into that, and your remaining patches, next. -- Robert Haas EDB: http://www.enterprisedb.com
Dilip Kumar <dilipbalaut@gmail.com> writes: > On Mon, Mar 22, 2021 at 5:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, after studying the documentation for LZ4_decompress_safe >> and LZ4_decompress_safe_partial, I realized that liblz4 is also >> counting on the *output* buffer size to not be a lie. So we >> cannot pass it a number larger than the chunk's true decompressed >> size. The attached patch resolves the issue I'm seeing. > Okay, the fix makes sense. In fact, IMHO, in general also this fix > looks like an optimization, I mean when slicelength >= > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory > even in the case of pglz. So shall we put this check directly in > toast_decompress_datum_slice instead of handling it at the lz4 level? Yeah, I thought about that too, but do we want to assume that VARRAWSIZE_4B_C is the correct way to get the decompressed size for all compression methods? (If so, I think it would be better style to have a less opaque macro name for the purpose.) regards, tom lane
On Mon, Mar 22, 2021 at 10:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > Thanks. I just realized that if you also push the GUC change, then the docs > should change from <string> to <enum> > > doc/src/sgml/config.sgml: <term><varname>default_toast_compression</varname> (<type>string</type>) I've now also committed your 0005. As for 0006, aside from the note above, which is a good one, is there any particular reason why this patch is labelled as WIP? I think this change makes sense and we should just do it unless there's some problem with it. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Mar 22, 2021 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Dilip Kumar <dilipbalaut@gmail.com> writes: > > On Mon, Mar 22, 2021 at 5:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Also, after studying the documentation for LZ4_decompress_safe > >> and LZ4_decompress_safe_partial, I realized that liblz4 is also > >> counting on the *output* buffer size to not be a lie. So we > >> cannot pass it a number larger than the chunk's true decompressed > >> size. The attached patch resolves the issue I'm seeing. > > > Okay, the fix makes sense. In fact, IMHO, in general also this fix > > looks like an optimization, I mean when slicelength >= > > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory > > even in the case of pglz. So shall we put this check directly in > > toast_decompress_datum_slice instead of handling it at the lz4 level? > > Yeah, I thought about that too, but do we want to assume that > VARRAWSIZE_4B_C is the correct way to get the decompressed size > for all compression methods? Yeah, VARRAWSIZE_4B_C is the macro getting the rawsize of the data stored in the compressed varlena. > (If so, I think it would be better style to have a less opaque macro > name for the purpose.) Okay, I have added another macro that is less opaque and came up with this patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, Mar 22, 2021 at 10:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Okay, the fix makes sense. In fact, IMHO, in general also this fix > > looks like an optimization, I mean when slicelength >= > > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory > > even in the case of pglz. So shall we put this check directly in > > toast_decompress_datum_slice instead of handling it at the lz4 level? > > Yeah, I thought about that too, but do we want to assume that > VARRAWSIZE_4B_C is the correct way to get the decompressed size > for all compression methods? I think it's OK to assume this. If and when we add a third compression method, it seems certain to just grab one of the two remaining bit patterns. Now, things get a bit more complicated if and when we want to add a fourth method, because at that point you've got to ask yourself how comfortable you feel about stealing the last bit pattern for your feature. But, if the solution to that problem were to decide that whenever that last bit pattern is used, we will add an extra byte (or word) after va_tcinfo indicating the real compression method, then using VARRAWSIZE_4B_C here would still be correct. To imagine this decision being wrong, you have to posit a world in which one of the two remaining bit patterns for the high 2 bits causes the low 30 bits to be interpreted as something other than the size, which I guess is not totally impossible, but my first reaction is to think that such a design would be (1) hard to make work and (2) unnecessarily painful. > (If so, I think it would be better style to have a less opaque macro > name for the purpose.) Complaining about the name of one particular TOAST-related macro name seems a bit like complaining about the greenhouse gasses emitted by one particular car. They're pretty uniformly terrible. Does anyone really know when to use VARATT_IS_1B_E or VARATT_IS_4B_U or any of that cruft? Like, who decided that "is this varatt 1B E?" would be a perfectly reasonable way of asking "is this varlena a TOAST pointer?". While I'm complaining, it's hard to say enough bad things about the fact that we have 12 consecutive completely obscure macro definitions for which the only comments are (a) that they are endian-dependent - which isn't even true for all of them - and (b) that they are "considered internal." Apparently, they're SO internal that they don't even need to be understandable to other developers. Anyway, this particular macro name was chosen, it seems, for symmetry with VARDATA_4B_C, but if you want to change it to something else, I'm OK with that, too. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 22, 2021 at 10:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I thought about that too, but do we want to assume that >> VARRAWSIZE_4B_C is the correct way to get the decompressed size >> for all compression methods? > I think it's OK to assume this. OK, cool. >> (If so, I think it would be better style to have a less opaque macro >> name for the purpose.) > Complaining about the name of one particular TOAST-related macro name > seems a bit like complaining about the greenhouse gasses emitted by > one particular car. Maybe, but that's not a reason to make it worse. Anyway, my understanding of that is that the really opaque names are *only* meant to be used in this very stretch of postgres.h, ie they are just intermediate steps on the way to the macros below them. As an example, the only use of VARDATA_1B_E() is in VARDATA_EXTERNAL(). > Anyway, this particular macro name was chosen, it seems, for symmetry > with VARDATA_4B_C, but if you want to change it to something else, I'm > OK with that, too. After looking at postgres.h for a bit, I'm thinking that what these should have been symmetric with is the considerably-less-terrible names used for the corresponding VARATT_EXTERNAL cases. Thus, something like s/VARRAWSIZE_4B_C/VARDATA_COMPRESSED_GET_RAWSIZE/ s/VARCOMPRESS_4B_C/VARDATA_COMPRESSED_GET_COMPRESSION/ Possibly the former names should survive and the latter become wrappers around them, not sure. But we shouldn't be using the "4B" terminology anyplace except this part of postgres.h. regards, tom lane
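For concreteness, the wrapper arrangement floated above would look roughly like this (a sketch only; the names finally committed may differ):

    /* keep the internal "4B" macros, expose better-named wrappers */
    #define VARDATA_COMPRESSED_GET_RAWSIZE(PTR)     VARRAWSIZE_4B_C(PTR)
    #define VARDATA_COMPRESSED_GET_COMPRESSION(PTR) VARCOMPRESS_4B_C(PTR)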
On Mon, Mar 22, 2021 at 11:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Anyway, this particular macro name was chosen, it seems, for symmetry > > with VARDATA_4B_C, but if you want to change it to something else, I'm > > OK with that, too. > > After looking at postgres.h for a bit, I'm thinking that what these > should have been symmetric with is the considerably-less-terrible > names used for the corresponding VARATT_EXTERNAL cases. Thus, > something like > > s/VARRAWSIZE_4B_C/VARDATA_COMPRESSED_GET_RAWSIZE/ > s/VARCOMPRESS_4B_C/VARDATA_COMPRESSED_GET_COMPRESSION/ Works for me. > Possibly the former names should survive and the latter become > wrappers around them, not sure. But we shouldn't be using the "4B" > terminology anyplace except this part of postgres.h. I would argue that it shouldn't be used any place at all, and that we ought to go the other direction and get rid of the existing macros - e.g. change #define VARATT_IS_1B_E to #define VARATT_IS_EXTERNAL instead of defining the latter as a no-value-added wrapper around the former. Maybe at one time somebody thought that the test for VARATT_IS_EXTERNAL might someday have more cases than just VARATT_IS_1B_E, but that's not looking like a good bet in 2021. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 22, 2021 at 11:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Possibly the former names should survive and the latter become >> wrappers around them, not sure. But we shouldn't be using the "4B" >> terminology anyplace except this part of postgres.h. > I would argue that it shouldn't be used any place at all, and that we > ought to go the other direction and get rid of the existing macros - > e.g. change #define VARATT_IS_1B_E to #define VARATT_IS_EXTERNAL > instead of defining the latter as a no-value-added wrapper around the > former. Maybe at one time somebody thought that the test for > VARATT_IS_EXTERNAL might someday have more cases than just > VARATT_IS_1B_E, but that's not looking like a good bet in 2021. Maybe. I think the original idea was exactly what the comment says, to have a layer of macros that'd deal with endianness issues and no more. That still seems like a reasonable plan to me, though perhaps it wasn't executed very well. regards, tom lane
On Mon, Mar 22, 2021 at 11:13 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > The first iteration was pretty rough, and there's still some question in my > mind about where default_toast_compression_options[] should be defined. If > it's in the header file, then I could use lengthof() - but then it probably > gets multiply defined. What do you want to use lengthof() for? > In the latest patch, there's multiple "externs". Maybe > guc.c doesn't need the extern, since it includes toast_compression.h. But then > it's the only "struct config_enum_entry" which has an "extern" outside of > guc.c. Oh, yeah, we certainly shouldn't have an extern in guc.c itself, if we've already got it in the header file. As to the more general question of where to put stuff, I don't think there's any conceptual problem with putting it in a header file rather than in guc.c. It's not very scalable to just keep inventing new GUCs and sticking all their accoutrement into guc.c. That might have kind of made sense when guc.c was invented, since there were probably fewer settings there and guc.c itself was new, but at this point it's a well-established part of the infrastructure and having other subsystems cater to what it needs rather than the other way around seems logical. However, it's not great to have "utils/guc.h" included in "access/toast_compression.h", because then anything that includes "access/toast_compression.h" or "access/toast_internals.h" sucks in "utils/guc.h" even though it's not really topically related to what they intended to include. We can't avoid that just by choosing to put this enum in guc.c, because GetDefaultToastCompression() also uses it. But, what about giving the default_toast_compression_method GUC an assign hook that sets a global variable of type "char" to the appropriate value? Then GetDefaultToastCompression() goes away entirely. That might be worth exploring. > Also, it looks like you added default_toast_compression out of order, so maybe > you'd fix that at the same time. You know, I looked at where you had it and said to myself, "surely this is a silly place to put this, it would make much more sense to move this up a bit." Now I feel dumb. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Mar 22, 2021 at 12:16 PM Robert Haas <robertmhaas@gmail.com> wrote: > But, what about giving the default_toast_compression_method GUC an > assign hook that sets a global variable of type "char" to the > appropriate value? Then GetDefaultToastCompression() goes away > entirely. That might be worth exploring. Actually, we can do even better. We should just make the values actually assigned to the GUC be TOAST_PGLZ_COMPRESSION etc. rather than TOAST_PGLZ_COMPRESSION_ID etc. Then a whole lot of complexity just goes away. I added some comments explaining why using TOAST_PGLZ_COMPRESSION_ID is the wrong thing anyway. Then I got hacking and rearranged a few other things. So the attached patch does these things: - Changes default_toast_compression to an enum, as in your patch, but now with values that are the same as what ultimately gets stored in attcompression. - Adds a comment warning against incautious use of TOAST_PGLZ_COMPRESSION_ID, etc. - Moves default_toast_compression_options to guc.c. - After doing the above two things, we can remove the #include of utils/guc.h into access/toast_compression.h, so the patch does that. - Moves NO_LZ4_SUPPORT, GetCompressionMethodName, and CompressionNameToMethod to guc.c. Making these inline functions doesn't save anything meaningful; it's more important not to export a bunch of random identifiers. - Removes an unnecessary cast to bool from the definition of CompressionMethodIsValid. I think this is significantly cleaner than what we have now, and I also prefer it to your proposal. Comments? -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
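A sketch of the enum-GUC mapping described in the previous message, with the char-valued constants written out so the fragment stands alone; the exact definitions in the committed headers may differ:

    #include "postgres.h"
    #include "utils/guc.h"

    /*
     * Illustrative char values for the compression methods discussed in
     * this thread ('p' for pglz, 'l' for lz4); spelled out here only to
     * make the sketch self-contained.
     */
    #define TOAST_PGLZ_COMPRESSION  'p'
    #define TOAST_LZ4_COMPRESSION   'l'

    /*
     * Map the user-visible GUC names directly to the same char values
     * that are stored in attcompression, so no separate "ID" translation
     * layer is needed.
     */
    static const struct config_enum_entry default_toast_compression_options[] =
    {
        {"pglz", TOAST_PGLZ_COMPRESSION, false},
    #ifdef USE_LZ4
        {"lz4", TOAST_LZ4_COMPRESSION, false},
    #endif
        {NULL, 0, false}
    };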
Robert Haas <robertmhaas@gmail.com> writes: > I think this is significantly cleaner than what we have now, and I > also prefer it to your proposal. +1 in general. However, I suspect that you did not try to compile this without --with-lz4, because if you had you'd have noticed the other uses of NO_LZ4_SUPPORT() that you broke. I think you need to leave that macro where it is. Also, it's not nice for GUC check functions to throw ereport(ERROR); we prefer the caller to be able to decide if it's a hard error or not. That usage should be using GUC_check_errdetail() or a cousin, so it can't share the macro anyway. regards, tom lane
On Mon, Mar 22, 2021 at 2:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I think this is significantly cleaner than what we have now, and I > > also prefer it to your proposal. > > +1 in general. However, I suspect that you did not try to compile > this without --with-lz4, because if you had you'd have noticed the > other uses of NO_LZ4_SUPPORT() that you broke. I think you need > to leave that macro where it is. You're correct that I hadn't tried this without --with-lz4, but I did grep for other uses of NO_LZ4_SUPPORT() and found none. I also just tried it without --with-lz4 just now, and it worked fine. > Also, it's not nice for GUC check > functions to throw ereport(ERROR); we prefer the caller to be able > to decide if it's a hard error or not. That usage should be using > GUC_check_errdetail() or a cousin, so it can't share the macro anyway. I agree that these are valid points about GUC check functions in general, but the patch I sent adds 0 GUC check functions and removes 1, and it didn't do the stuff you describe here anyway. Are you sure you're looking at the patch I sent, toast-compression-guc-rmh.patch? I can't help wondering if you applied it to a dirty source tree or got the wrong file or something, because otherwise I don't understand why you're seeing things that I'm not seeing. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Mar 22, 2021 at 1:58 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > guc.c should not longer define this as extern: > default_toast_compression_options Fixed. > I think you should comment that default_toast_compression is an int as far as > guc.c is concerned, but storing one of the char value of TOAST_*_COMPRESSION Done. > Shouldn't varlena.c pg_column_compression() call GetCompressionMethodName () ? > I guess it should already have done that. It has a 0-3 integer, not a char value. > Maybe pg_dump.c can't use those constants, though (?) Hmm, toast_compression.h might actually be safe for frontend code now, or if necessary we could add #ifdef FRONTEND stanzas to make it so. I don't know if that is really this patch's job, but I guess it could be. A couple of other things: - Upon further reflection, I think the NO_LZ4_SUPPORT() message is kinda not great. I'm thinking we should change it to say "LZ4 is not supported by this build" instead of "unsupported LZ4 compression method" and drop the hint and detail. That seems more like how we've handled other such cases. - It is not very nice that the three possible values of attcompression are TOAST_PGLZ_COMPRESSION, TOAST_LZ4_COMPRESSION, and InvalidCompressionMethod. One of those three identifiers looks very little like the other two, and there's no real good reason for that. I think we should try to standardize on something, but I'm not sure what it should be. It would also be nice if these names were more visually distinct from the related but very different enum values TOAST_PGLZ_COMPRESSION_ID and TOAST_LZ4_COMPRESSION_ID. Really, as the comments I added explain, we want to minimize the amount of code that knows about the 0-3 "ID" values, and use the char values whenever we can. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Mon, Mar 22, 2021 at 4:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Mar 22, 2021 at 1:58 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > guc.c should not longer define this as extern: > > default_toast_compression_options > > Fixed. Fixed some more. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Mon, Mar 22, 2021 at 03:47:58PM -0400, Robert Haas wrote: > On Mon, Mar 22, 2021 at 2:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > > I think this is significantly cleaner than what we have now, and I > > > also prefer it to your proposal. > > > > +1 in general. However, I suspect that you did not try to compile > > this without --with-lz4, because if you had you'd have noticed the > > other uses of NO_LZ4_SUPPORT() that you broke. I think you need > > to leave that macro where it is. > > You're correct that I hadn't tried this without --with-lz4, but I did > grep for other uses of NO_LZ4_SUPPORT() and found none. I also just > tried it without --with-lz4 just now, and it worked fine. > > > Also, it's not nice for GUC check > > functions to throw ereport(ERROR); we prefer the caller to be able > > to decide if it's a hard error or not. That usage should be using > > GUC_check_errdetail() or a cousin, so it can't share the macro anyway. > > I agree that these are valid points about GUC check functions in > general, but the patch I sent adds 0 GUC check functions and removes > 1, and it didn't do the stuff you describe here anyway. > > Are you sure you're looking at the patch I sent, > toast-compression-guc-rmh.patch? I can't help wondering if you applied > it to a dirty source tree or got the wrong file or something, because > otherwise I don't understand why you're seeing things that I'm not > seeing. I'm guessing Tom read this hunk as being changes to check_default_toast_compression() rather than removing the function ? - * Validate a new value for the default_toast_compression GUC. + * CompressionNameToMethod - Get compression method from compression name + * + * Search in the available built-in methods. If the compression not found + * in the built-in methods then return InvalidCompressionMethod. */ -bool -check_default_toast_compression(char **newval, void **extra, GucSource source) +char +CompressionNameToMethod(const char *compression) { - if (**newval == '\0') + if (strcmp(compression, "pglz") == 0) + return TOAST_PGLZ_COMPRESSION; + else if (strcmp(compression, "lz4") == 0) { - GUC_check_errdetail("%s cannot be empty.", - "default_toast_compression"); - return false; +#ifndef USE_LZ4 + NO_LZ4_SUPPORT(); +#endif + return TOAST_LZ4_COMPRESSION; -- Justin
Justin Pryzby <pryzby@telsasoft.com> writes: > On Mon, Mar 22, 2021 at 03:47:58PM -0400, Robert Haas wrote: >> Are you sure you're looking at the patch I sent, >> toast-compression-guc-rmh.patch? I can't help wondering if you applied >> it to a dirty source tree or got the wrong file or something, because >> otherwise I don't understand why you're seeing things that I'm not >> seeing. > I'm guessing Tom read this hunk as being changes to > check_default_toast_compression() rather than removing the function ? Yeah, after looking closer, the diff looks like check_default_toast_compression is being modified in-place, whereas actually it's getting replaced by CompressionNameToMethod which does something entirely different. I'd also not looked closely enough at where NO_LZ4_SUPPORT() was being moved to. My apologies --- I can only plead -ENOCAFFEINE. regards, tom lane
On Fri, Mar 19, 2021 at 2:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I committed the core patch (0003) with a bit more editing. Let's see > what the buildfarm thinks. > I think this is bbe0a81db69bd10bd166907c3701492a29aca294, right? This introduced a new assert failure, steps to reproduce: """ create table t1 (col1 text, col2 text); create unique index on t1 ((col1 || col2)); insert into t1 values((select array_agg(md5(g::text))::text from generate_series(1, 256) g), version()); """ Attached is a backtrace from current HEAD -- Jaime Casanova Director de Servicios Profesionales SYSTEMGUARDS - Consultores de PostgreSQL
Attachment
On Wed, Mar 24, 2021 at 1:22 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > On Fri, Mar 19, 2021 at 2:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > I committed the core patch (0003) with a bit more editing. Let's see > > what the buildfarm thinks. > > > > I think this is bbe0a81db69bd10bd166907c3701492a29aca294, right? > This introduced a new assert failure, steps to reproduce: > > """ > create table t1 (col1 text, col2 text); > create unique index on t1 ((col1 || col2)); > insert into t1 values((select array_agg(md5(g::text))::text from > generate_series(1, 256) g), version()); > """ > > Attached is a backtrace from current HEAD Thanks for reporting this issue. Actually, I missed setting the attcompression for the expression index and that is causing this assert. I will send a patch in some time. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > """ > > create table t1 (col1 text, col2 text); > > create unique index on t1 ((col1 || col2)); > > insert into t1 values((select array_agg(md5(g::text))::text from > > generate_series(1, 256) g), version()); > > """ > > > > Attached is a backtrace from current HEAD > > Thanks for reporting this issue. Actually, I missed setting the > attcompression for the expression index and that is causing this > assert. I will send a patch in some time. PFA, patch to fix the issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Mar 24, 2021 at 2:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, Mar 24, 2021 at 02:24:41PM +0530, Dilip Kumar wrote: > > On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > create table t1 (col1 text, col2 text); > > > > create unique index on t1 ((col1 || col2)); > > > > insert into t1 values((select array_agg(md5(g::text))::text from > > > > generate_series(1, 256) g), version()); > > > > > > > > Attached is a backtrace from current HEAD > > > > > > Thanks for reporting this issue. Actually, I missed setting the > > > attcompression for the expression index and that is causing this > > > assert. I will send a patch in some time. > > > > PFA, patch to fix the issue. > > Could you include a test case exercizing this code path ? > Like Jaime's reproducer. I will do that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 3:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 2:49 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Wed, Mar 24, 2021 at 02:24:41PM +0530, Dilip Kumar wrote: > > > On Wed, Mar 24, 2021 at 1:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > create table t1 (col1 text, col2 text); > > > > > create unique index on t1 ((col1 || col2)); > > > > > insert into t1 values((select array_agg(md5(g::text))::text from > > > > > generate_series(1, 256) g), version()); > > > > > > > > > > Attached is a backtrace from current HEAD > > > > > > > > Thanks for reporting this issue. Actually, I missed setting the > > > > attcompression for the expression index and that is causing this > > > > assert. I will send a patch in some time. > > > > > > PFA, patch to fix the issue. > > > > Could you include a test case exercizing this code path ? > > Like Jaime's reproducer. > > I will do that. 0001 ->shows compression method for the index attribute in index describe 0002 -> fix the reported bug (test case included) Apart from this, I was thinking that currently, we are allowing to ALTER SET COMPRESSION only for the table and matview, IMHO it makes sense to allow to alter the compression method for the index column as well? I mean it is just a one-line change, but just wanted to know the opinion from others. It is not required for the storage because indexes can not have a toast table but index attributes can be compressed so it makes sense to allow to alter the compression method. Thought? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Mar 24, 2021 at 3:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > 0001 ->shows compression method for the index attribute in index describe > 0002 -> fix the reported bug (test case included) > > Apart from this, I was thinking that currently, we are allowing to > ALTER SET COMPRESSION only for the table and matview, IMHO it makes > sense to allow to alter the compression method for the index column as > well? I mean it is just a one-line change, but just wanted to know > the opinion from others. It is not required for the storage because > indexes can not have a toast table but index attributes can be > compressed so it makes sense to allow to alter the compression method. > Thought? I have anyway created a patch for this as well. Including all three patches so we don't lose track. 0001 ->shows compression method for the index attribute in index describe 0002 -> fix the reported bug (test case included) (optional) 0003-> Alter set compression for index column -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Mar 24, 2021 at 7:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I have anyway created a patch for this as well. Including all three > patches so we don't lose track. > > 0001 ->shows compression method for the index attribute in index describe > 0002 -> fix the reported bug (test case included) > (optional) 0003-> Alter set compression for index column As I understand it, the design idea here up until now has been that the index's attcompression values are irrelevant and ignored and that any compression which happens for index attributes is based either on the table attribute's assigned attcompression value, or the default. If that's the idea, then all of these patches are wrong. Now, a possible alternative design would be that the index's attcompression controls compression for the index same as a table's does for the table. But in that case, it seems to me that these patches are insufficient, because then we'd also need to, for example, dump and restore the setting, which I don't think anything in these patches or the existing code will do. My vote, as of now, is for the first design, in which case you need to forget about trying to get pg_attribute to have the right contents - in fact, I think we should set all the values there to InvalidCompressionMethod to make sure we're not relying on them anywhere. And then you need to make sure that everything that tries to compress an index value uses the setting from the table column or the default, not the setting on the index column. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 8:41 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 7:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have anyway created a patch for this as well. Including all three > > patches so we don't lose track. > > > > 0001 ->shows compression method for the index attribute in index describe > > 0002 -> fix the reported bug (test case included) > > (optional) 0003-> Alter set compression for index column > > As I understand it, the design idea here up until now has been that > the index's attcompression values are irrelevant and ignored and that > any compression which happens for index attributes is based either on > the table attribute's assigned attcompression value, or the default. > If that's the idea, then all of these patches are wrong. The current design is that whenever we create an index, the index's attribute copies the attcompression from the table's attribute. And, while compressing the index tuple we will use the attcompression from the index attribute. > Now, a possible alternative design would be that the index's > attcompression controls compression for the index same as a table's > does for the table. But in that case, it seems to me that these > patches are insufficient, because then we'd also need to, for example, > dump and restore the setting, which I don't think anything in these > patches or the existing code will do. Yeah, you are right. > My vote, as of now, is for the first design, in which case you need to > forget about trying to get pg_attribute to have the right contents - > in fact, I think we should set all the values there to > InvalidCompressionMethod to make sure we're not relying on them > anywhere. And then you need to make sure that everything that tries to > compress an index value uses the setting from the table column or the > default, not the setting on the index column. Okay, that sounds like a reasonable design idea. But the problem is that in index_form_tuple we only have index tuple descriptor, not the heap tuple descriptor. Maybe we will have to pass the heap tuple descriptor as a parameter to index_form_tuple. I will think more about this that how can we do that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > On 3/20/21 3:03 PM, Tom Lane wrote: >> I fixed up some issues in 0008/0009 (mostly cosmetic, except that >> you forgot a server version check in dumpToastCompression) and >> pushed that, so we can see if it makes crake happy. > It's still produced a significant amount more difference between the > dumps. For now I've increased the fuzz factor a bit like this: > - if ( ($oversion ne $this_branch && $difflines < 2000) > + if ( ($oversion ne $this_branch && $difflines < 2700) > I'll try to come up with something better. Maybe just ignore lines like > SET default_toast_compression = 'pglz'; > when taking the diff. I see that some other buildfarm animals besides your own critters are still failing the xversion tests, presumably because they lack this hack :-(. On reflection, though, I wonder if we've made pg_dump do the right thing anyway. There is a strong case to be made for the idea that when dumping from a pre-14 server, it should emit SET default_toast_compression = 'pglz'; rather than omitting any mention of the variable, which is what I made it do in aa25d1089. If we changed that, I think all these diffs would go away. Am I right in thinking that what's being compared here is new pg_dump's dump from old server versus new pg_dump's dump from new server? The "strong case" goes like this: initdb a v14 cluster, change default_toast_compression to lz4 in its postgresql.conf, then try to pg_upgrade from an old server. If the dump script doesn't set default_toast_compression = 'pglz' then the upgrade will do the wrong thing because all the tables will be recreated with a different behavior than they had before. IIUC, this wouldn't result in broken data, but it still seems to me to be undesirable. dump/restore ought to do its best to preserve the old DB state, unless you explicitly tell it --no-toast-compression or the like. regards, tom lane
On Wed, Mar 24, 2021 at 11:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > Okay, that sounds like a reasonable design idea. But the problem is > that in index_form_tuple we only have index tuple descriptor, not the > heap tuple descriptor. Maybe we will have to pass the heap tuple > descriptor as a parameter to index_form_tuple. I will think more > about this that how can we do that. Another option might be to decide that the pg_attribute tuples for the index columns always have to match the corresponding table columns. So, if you alter with ALTER TABLE, it runs around and updates all of the indexes to match. For expression index columns, we could store InvalidCompressionMethod, causing index_form_tuple() to substitute the run-time default. That kinda sucks, because it's a significant impediment to ever reducing the lock level for ALTER TABLE .. ALTER COLUMN .. SET COMPRESSION, but I'm not sure we have the luxury of worrying about that problem right now. -- Robert Haas EDB: http://www.enterprisedb.com
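A rough sketch of the substitution described above, reusing helpers mentioned elsewhere in the thread (CompressionMethodIsValid, default_toast_compression); it illustrates the idea rather than the committed index_form_tuple() code:

    #include "postgres.h"
    #include "access/toast_compression.h"
    #include "catalog/pg_attribute.h"

    /*
     * Choose the compression method for an index attribute: use the
     * column's own setting when it is valid, otherwise fall back to the
     * run-time default (e.g. for expression columns whose attcompression
     * is stored as InvalidCompressionMethod).
     */
    static char
    index_attr_compression(Form_pg_attribute att)
    {
        if (CompressionMethodIsValid(att->attcompression))
            return att->attcompression;

        return default_toast_compression;
    }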
On Wed, Mar 24, 2021 at 9:32 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 11:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Okay, that sounds like a reasonable design idea. But the problem is > > that in index_form_tuple we only have index tuple descriptor, not the > > heap tuple descriptor. Maybe we will have to pass the heap tuple > > descriptor as a parameter to index_form_tuple. I will think more > > about this that how can we do that. > > Another option might be to decide that the pg_attribute tuples for the > index columns always have to match the corresponding table columns. > So, if you alter with ALTER TABLE, it runs around and updates all of > the indexes to match. For expression index columns, we could store > InvalidCompressionMethod, causing index_form_tuple() to substitute the > run-time default. That kinda sucks, because it's a significant > impediment to ever reducing the lock level for ALTER TABLE .. ALTER > COLUMN .. SET COMPRESSION, but I'm not sure we have the luxury of > worrying about that problem right now. Actually, we are already doing this, I mean ALTER TABLE .. ALTER COLUMN .. SET COMPRESSION is already updating the compression method of the index attribute. So 0003 doesn't make sense, sorry for the noise. However, 0001 and 0002 are still valid, or do you think that we don't want 0001 also? If we don't need 0001 also then we need to update the test output for 0002 slightly. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 11:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > On reflection, though, I wonder if we've made pg_dump do the right > thing anyway. There is a strong case to be made for the idea that > when dumping from a pre-14 server, it should emit > SET default_toast_compression = 'pglz'; > rather than omitting any mention of the variable, which is what > I made it do in aa25d1089. If we changed that, I think all these > diffs would go away. Am I right in thinking that what's being > compared here is new pg_dump's dump from old server versus new > pg_dump's dump from new server? > > The "strong case" goes like this: initdb a v14 cluster, change > default_toast_compression to lz4 in its postgresql.conf, then > try to pg_upgrade from an old server. If the dump script doesn't > set default_toast_compression = 'pglz' then the upgrade will > do the wrong thing because all the tables will be recreated with > a different behavior than they had before. IIUC, this wouldn't > result in broken data, but it still seems to me to be undesirable. > dump/restore ought to do its best to preserve the old DB state, > unless you explicitly tell it --no-toast-compression or the like. This feels a bit like letting the tail wag the dog, because one might reasonably guess that the user's intention in such a case was to switch to using LZ4, and we've subverted that intention by deciding that we know better. I wouldn't blame someone for thinking that using --no-toast-compression with a pre-v14 server ought to have no effect, but with your proposal here, it would. Furthermore, IIUC, the user has no way of passing --no-toast-compression through to pg_upgrade, so they're just going to have to do the upgrade and then fix everything manually afterward to the state that they intended to have all along. Now, on the other hand, if they wanted to make practically any other kind of change while upgrading, they'd have to do something like that anyway, so I guess this is no worse. But also ... aren't we just doing this to work around a test case that isn't especially good in the first place? Counting the number of lines in the diff between A and B is an extremely crude proxy for "they're similar enough that we probably haven't broken anything." -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Mar 22, 2021 at 4:57 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Fixed. > > Fixed some more. Committed. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 24, 2021 at 11:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> On reflection, though, I wonder if we've made pg_dump do the right >> thing anyway. There is a strong case to be made for the idea that >> when dumping from a pre-14 server, it should emit >> SET default_toast_compression = 'pglz'; >> rather than omitting any mention of the variable, which is what >> I made it do in aa25d1089. > But also ... aren't we just doing this to work around a test case that > isn't especially good in the first place? Counting the number of lines > in the diff between A and B is an extremely crude proxy for "they're > similar enough that we probably haven't broken anything." I wouldn't be proposing this if the xversion failures were the only reason; making them go away is just a nice side-effect. The core point is that the charter of pg_dump is to reproduce the source database's state, and as things stand we're failing to ensure we do that. (But yeah, we really need a better way of making this check in the xversion tests. I don't like the arbitrary "n lines of diff is probably OK" business one bit.) regards, tom lane
On Wed, Mar 24, 2021 at 12:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wouldn't be proposing this if the xversion failures were the only > reason; making them go away is just a nice side-effect. The core > point is that the charter of pg_dump is to reproduce the source > database's state, and as things stand we're failing to ensure we > do that. Well, that state is just a mental construct, right? In reality, there is no such state stored anywhere in the old database. You're choosing to attribute to it an implicit state that matches what would need to be configured in the newer version to get the same behavior, which is a reasonable thing to do, but it is an interpretive choice rather than a bare fact. I don't care very much if you want to change this, but to me it seems slightly worse than the status quo. It's hard to imagine that someone is going to create a new cluster, set the default to lz4, run pg_upgrade, and then complain that the new columns ended up with lz4 as the default. It seems much more likely that they're going to complain if the new columns *don't* end up with lz4 as the default. And I also can't see any other scenario where imagining that the TOAST compression property of the old database simply does not exist, rather than being pglz implicitly, is worse. But I could be wrong, and even if I'm right it's not a hill upon which I wish to die. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 12:24:38PM -0400, Robert Haas wrote: > On Wed, Mar 24, 2021 at 11:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > On reflection, though, I wonder if we've made pg_dump do the right > > thing anyway. There is a strong case to be made for the idea that > > when dumping from a pre-14 server, it should emit > > SET default_toast_compression = 'pglz'; > > rather than omitting any mention of the variable, which is what > > I made it do in aa25d1089. If we changed that, I think all these > > diffs would go away. Am I right in thinking that what's being > > compared here is new pg_dump's dump from old server versus new > > pg_dump's dump from new server? > > > > The "strong case" goes like this: initdb a v14 cluster, change > > default_toast_compression to lz4 in its postgresql.conf, then > > try to pg_upgrade from an old server. If the dump script doesn't > > set default_toast_compression = 'pglz' then the upgrade will > > do the wrong thing because all the tables will be recreated with > > a different behavior than they had before. IIUC, this wouldn't > > result in broken data, but it still seems to me to be undesirable. > > dump/restore ought to do its best to preserve the old DB state, > > unless you explicitly tell it --no-toast-compression or the like. > > This feels a bit like letting the tail wag the dog, because one might > reasonably guess that the user's intention in such a case was to > switch to using LZ4, and we've subverted that intention by deciding > that we know better. I wouldn't blame someone for thinking that using > --no-toast-compression with a pre-v14 server ought to have no effect, > but with your proposal here, it would. Furthermore, IIUC, the user has > no way of passing --no-toast-compression through to pg_upgrade, so > they're just going to have to do the upgrade and then fix everything > manually afterward to the state that they intended to have all along. > Now, on the other hand, if they wanted to make practically any other > kind of change while upgrading, they'd have to do something like that > anyway, so I guess this is no worse. I think it's not specific to pg_upgrade, but any pg_dump |pg_restore. The analogy with tablespaces is restoring from a cluster where the tablespace is named "vast" to one where it's named "huge". I do this by running PGOPTIONS=-cdefault_tablespace=huge pg_restore --no-tablespaces So I think as long as --no-toast-compression does the corresponding thing, the "restore with alternate compression" case is handled fine. -- Justin
On Wed, Mar 24, 2021 at 12:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > Actually, we are already doing this, I mean ALTER TABLE .. ALTER > COLUMN .. SET COMPRESSION is already updating the compression method > of the index attribute. So 0003 doesn't make sense, sorry for the > noise. However, 0001 and 0002 are still valid, or do you think that > we don't want 0001 also? If we don't need 0001 also then we need to > update the test output for 0002 slightly. It seems to me that 0002 is still not right. We can't fix the attcompression to whatever the default is at the time the index is created, because the default can be changed later, and there's no way to fix index afterward. I mean, it would be fine to do it that way if we were going to go with the other model, where the index state is separate from the table state, either can be changed independently, and it all gets dumped and restored. But, as it is, I think we should be deciding how to compress new values for an expression column based on the default_toast_compression setting at the time of compression, not the time of index creation. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 24, 2021 at 1:24 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I think it's not specific to pg_upgrade, but any pg_dump |pg_restore. > > The analogy with tablespaces is restoring from a cluster where the tablespace > is named "vast" to one where it's named "huge". I do this by running > PGOPTIONS=-cdefault_tablespace=huge pg_restore --no-tablespaces > > So I think as long as --no-toast-compression does the corresponding thing, the > "restore with alternate compression" case is handled fine. I think you might be missing the point. If you're using pg_dump and pg_restore, you can pass --no-toast-compression if you want. But if you're using pg_upgrade, and it's internally calling pg_dump --binary-upgrade, then you don't have control over what options get passed. So --no-toast-compression is just fine for people who are dumping and restoring, but it's no help at all if you want to switch TOAST compression methods while doing a pg_upgrade. However, what does help with that is sticking with what Tom committed before rather than changing to what he's proposing now. If you like his current proposal, that's fine with me, as long as we're on the same page about what happens if we adopt it. -- Robert Haas EDB: http://www.enterprisedb.com
Justin Pryzby <pryzby@telsasoft.com> writes: > On Wed, Mar 24, 2021 at 01:30:26PM -0400, Robert Haas wrote: >> ... So --no-toast-compression is just fine for people who are >> dumping and restoring, but it's no help at all if you want to switch >> TOAST compression methods while doing a pg_upgrade. However, what does >> help with that is sticking with what Tom committed before rather than >> changing to what he's proposing now. > I don't know what/any other cases support using pg_upgrade to change stuff like > the example (changing to lz4). The way to do it is to make the changes either > before or after. It seems weird to think that pg_upgrade would handle that. Yeah; I think the charter of pg_upgrade is to reproduce the old database state. If you try to twiddle the process to incorporate some changes in that state, maybe it will work, but if it breaks you get to keep both pieces. I surely don't wish to consider such shenanigans as supported. But let's ignore the case of pg_upgrade and just consider a dump/restore. I'd still say that unless you give --no-toast-compression then I would expect the dump/restore to preserve the tables' old compression behavior. Robert's argument that the pre-v14 database had no particular compression behavior seems nonsensical to me. We know exactly which compression behavior it has. regards, tom lane
On Wed, Mar 24, 2021 at 10:57 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 24, 2021 at 12:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Actually, we are already doing this, I mean ALTER TABLE .. ALTER > > COLUMN .. SET COMPRESSION is already updating the compression method > > of the index attribute. So 0003 doesn't make sense, sorry for the > > noise. However, 0001 and 0002 are still valid, or do you think that > > we don't want 0001 also? If we don't need 0001 also then we need to > > update the test output for 0002 slightly. > > It seems to me that 0002 is still not right. We can't fix the > attcompression to whatever the default is at the time the index is > created, because the default can be changed later, and there's no way > to fix index afterward. I mean, it would be fine to do it that way if > we were going to go with the other model, where the index state is > separate from the table state, either can be changed independently, > and it all gets dumped and restored. But, as it is, I think we should > be deciding how to compress new values for an expression column based > on the default_toast_compression setting at the time of compression, > not the time of index creation. > Okay got it. Fixed as suggested. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Mar 24, 2021 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > But let's ignore the case of pg_upgrade and just consider a dump/restore. > I'd still say that unless you give --no-toast-compression then I would > expect the dump/restore to preserve the tables' old compression behavior. > Robert's argument that the pre-v14 database had no particular compression > behavior seems nonsensical to me. We know exactly which compression > behavior it has. I said that it didn't have a state, not that it didn't have a behavior. That's not exactly the same thing. But I don't want to argue about it, either. It's a judgement call what's best here, and I don't pretend to have all the answers. If you're sure you've got it right ... great! -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 25, 2021 at 5:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > Okay got it. Fixed as suggested. Committed with a bit of editing of the comments. -- Robert Haas EDB: http://www.enterprisedb.com
On 3/24/21 12:45 PM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Mar 24, 2021 at 11:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> On reflection, though, I wonder if we've made pg_dump do the right >>> thing anyway. There is a strong case to be made for the idea that >>> when dumping from a pre-14 server, it should emit >>> SET default_toast_compression = 'pglz'; >>> rather than omitting any mention of the variable, which is what >>> I made it do in aa25d1089. >> But also ... aren't we just doing this to work around a test case that >> isn't especially good in the first place? Counting the number of lines >> in the diff between A and B is an extremely crude proxy for "they're >> similar enough that we probably haven't broken anything." > I wouldn't be proposing this if the xversion failures were the only > reason; making them go away is just a nice side-effect. The core > point is that the charter of pg_dump is to reproduce the source > database's state, and as things stand we're failing to ensure we > do that. > > (But yeah, we really need a better way of making this check in > the xversion tests. I don't like the arbitrary "n lines of diff > is probably OK" business one bit.) > > Well, I ran this module for years privately and used to have a matrix of the exact number of diff lines expected for each combination of source and target branch. If I didn't get that exact number of lines I reported an error on stderr. That was fine when we weren't reporting the results on the server, and I just sent an email to -hackers if I found an error. I kept this matrix by examining the diffs to make sure they were all benign. That was a pretty laborious process. So I decided to try a heuristic approach instead, and by trial and error came up with this 2000 lines measurement. When this appeared to be working and stable the module was released into the wild for other buildfarm owners to deploy. Nothing is hidden here - the diffs are reported, see for example <https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2021-03-28%2015%3A37%3A07&stg=xversion-upgrade-REL9_4_STABLE-HEAD> What we're comparing here is target pg_dumpall against the original source vs target pg_dumpall against the upgraded source. If someone wants to come up with a better rule for detecting that nothing has gone wrong, I'll be happy to implement it. I don't particularly like the current rule either, it's there faute de mieux. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sun, Mar 28, 2021 at 04:48:29PM -0400, Andrew Dunstan wrote: > Nothing is hidden here - the diffs are reported, see for example > <https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2021-03-28%2015%3A37%3A07&stg=xversion-upgrade-REL9_4_STABLE-HEAD> > What we're comparing here is target pg_dumpall against the original > source vs target pg_dumpall against the upgraded source. The command being run is: https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm#L610 system( "diff -I '^-- ' -u $upgrade_loc/origin-$oversion.sql " . "$upgrade_loc/converted-$oversion-to-$this_branch.sql " . "> $upgrade_loc/dumpdiff-$oversion 2>&1"); ... my $difflines = `wc -l < $upgrade_loc/dumpdiff-$oversion`; where -I means: --ignore-matching-lines=RE I think wc -l should actually be grep -c '^[-+]' otherwise context lines count for as much as diff lines. You could write that with diff -U0 |wc -l, except the context is useful to humans. With some more effort, the number of lines of diff can be very small, allowing a smaller fudge factor. For upgrade from v10: time make -C src/bin/pg_upgrade check oldsrc=`pwd`/10 oldbindir=`pwd`/10/tmp_install/usr/local/pgsql/bin $ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql src/bin/pg_upgrade/tmp_check/dump2.sql |wc -l 622 Without context: $ diff -u src/bin/pg_upgrade/tmp_check/dump1.sql src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]' 142 Without comments: $ diff -I '^-- ' -u src/bin/pg_upgrade/tmp_check/dump1.sql src/bin/pg_upgrade/tmp_check/dump2.sql |grep -c '^[-+]' 130 Without SET default stuff: diff -I '^$' -I "SET default_table_access_method = heap;" -I "^SET default_toast_compression = 'pglz';$" -I '^-- ' -u /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql|less |grep -c '^[-+]' 117 Without trigger function call noise: diff -I "^CREATE TRIGGER [_[:alnum:]]\+ .* FOR EACH \(ROW\|STATEMENT\) EXECUTE \(PROCEDURE\|FUNCTION\)" -I '^$' -I "SET default_table_access_method= heap;" -I "^SET default_toast_compression = 'pglz';$" -I '^-- ' -u /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql|grep -c '^[-+]' 11 Maybe it's important not to totally ignore that, and instead perhaps clean up the known/accepted changes like s/FUNCTION/PROCEDURE/: </home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump2.sql sed '/^CREATE TRIGGER/s/FUNCTION/PROCEDURE/' |diff -I'^$' -I "SET default_table_access_method = heap;" -I "^SET default_toast_compression = 'pglz';$" -I '^-- ' -u /home/pryzbyj/src/postgres/src/bin/pg_upgrade/tmp_check/dump1.sql- |grep -c '^[-+]' 11 It seems weird that we don't quote "heap" but we quote tablespaces and not toast compression methods. -- Justin
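In terms of the module itself, the suggested counting change might look something like the sketch below (this is not the actual TestUpgradeXversion.pm code, just an illustration; it keeps the context lines in the saved diff for human readers and only changes what gets counted):

    # sketch only -- not the real buildfarm module code
    sub count_diff_lines
    {
        my ($dumpdiff) = @_;
        open(my $fh, '<', $dumpdiff) or die "cannot open $dumpdiff: $!";
        my $count = 0;
        while (my $line = <$fh>)
        {
            # count added/removed lines only, skipping the +++/--- file headers
            $count++ if $line =~ /^[-+]/ && $line !~ /^(\+\+\+|---) /;
        }
        close($fh);
        return $count;
    }

    # instead of:  my $difflines = `wc -l < $upgrade_loc/dumpdiff-$oversion`;
    # my $difflines = count_diff_lines("$upgrade_loc/dumpdiff-$oversion");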
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 24, 2021 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But let's ignore the case of pg_upgrade and just consider a dump/restore. >> I'd still say that unless you give --no-toast-compression then I would >> expect the dump/restore to preserve the tables' old compression behavior. >> Robert's argument that the pre-v14 database had no particular compression >> behavior seems nonsensical to me. We know exactly which compression >> behavior it has. > I said that it didn't have a state, not that it didn't have a > behavior. That's not exactly the same thing. But I don't want to argue > about it, either. It's a judgement call what's best here, and I don't > pretend to have all the answers. If you're sure you've got it right > ... great! I've not heard any other comments about this, but I'm pretty sure that preserving a table's old toast behavior is in line with what we'd normally expect pg_dump to do --- especially in light of the fact that we did not provide any --preserve-toast-compression switch to tell it to do so. So I'm going to go change it. regards, tom lane
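For what it's worth, the user-visible difference is in the dump text itself. Roughly (an invented table, shown only to illustrate the two behaviours; the exact output is pg_dump's business): with compression preserved, the setting travels in the column definitions, while with --no-toast-compression it is omitted, so new values in the restored table follow default_toast_compression on the target server.

    -- default behaviour: the dump carries the compression per column
    CREATE TABLE public.t (
        a text COMPRESSION pglz,
        b text COMPRESSION lz4
    );

    -- with --no-toast-compression the clauses are dropped
    CREATE TABLE public.t (
        a text,
        b text
    );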
On 3/30/21 10:30 AM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Mar 24, 2021 at 2:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> But let's ignore the case of pg_upgrade and just consider a dump/restore. >>> I'd still say that unless you give --no-toast-compression then I would >>> expect the dump/restore to preserve the tables' old compression behavior. >>> Robert's argument that the pre-v14 database had no particular compression >>> behavior seems nonsensical to me. We know exactly which compression >>> behavior it has. > >> I said that it didn't have a state, not that it didn't have a >> behavior. That's not exactly the same thing. But I don't want to argue >> about it, either. It's a judgement call what's best here, and I don't >> pretend to have all the answers. If you're sure you've got it right >> ... great! > > I've not heard any other comments about this, but I'm pretty sure that > preserving a table's old toast behavior is in line with what we'd normally > expect pg_dump to do --- especially in light of the fact that we did not > provide any --preserve-toast-compression switch to tell it to do so. > So I'm going to go change it. It looks like this CF entry should have been marked as committed so I did that. Regards, -- -David david@pgmasters.net
On Thu, Apr 8, 2021 at 11:32 AM David Steele <david@pgmasters.net> wrote: > It looks like this CF entry should have been marked as committed so I > did that. Thanks. Here's a patch for the doc update which was mentioned as an open item upthread. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Thu, Apr 8, 2021 at 3:38 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > It looks like this should not remove the word "data" ? Oh, yes, right. > The compression technique used for either in-line or out-of-line compressed > -data is a fairly simple and very fast member > -of the LZ family of compression techniques. See > -<filename>src/common/pg_lzcompress.c</filename> for the details. > +can be selected using the <literal>COMPRESSION</literal> option on a per-column > +basis when creating a table. The default for columns with no explicit setting > +is taken from the value of <xref linkend="guc-default-toast-compression" />. > > I thought this patch would need to update parts about borrowing 2 spare bits, > but maybe that's the wrong header.. We're not borrowing any more bits from the places where we were borrowing 2 bits before. We are borrowing 2 bits from places that don't seem to be discussed in detail here, where no bits were borrowed before. -- Robert Haas EDB: http://www.enterprisedb.com
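For anyone trying to follow the bit-borrowing point without the header file in front of them, the general trick is simply to take the two high bits of a 32-bit size word for a compression-method ID, leaving 30 bits (1 GB) for the size itself. A stripped-down sketch, deliberately not the real varlena macros or field names:

    #include <stdint.h>

    /* illustrative only: not the actual PostgreSQL varlena layout */
    #define DEMO_SIZE_BITS  30
    #define DEMO_SIZE_MASK  ((1U << DEMO_SIZE_BITS) - 1)

    enum demo_compression_id
    {
        DEMO_PGLZ_ID = 0,
        DEMO_LZ4_ID = 1         /* two bits leave room for two more methods */
    };

    /* pack a raw size (under 1 GB) and a method id into one 32-bit word */
    static inline uint32_t
    demo_pack(uint32_t rawsize, enum demo_compression_id method)
    {
        return (rawsize & DEMO_SIZE_MASK) | ((uint32_t) method << DEMO_SIZE_BITS);
    }

    static inline uint32_t
    demo_rawsize(uint32_t info)
    {
        return info & DEMO_SIZE_MASK;
    }

    static inline enum demo_compression_id
    demo_method(uint32_t info)
    {
        return (enum demo_compression_id) (info >> DEMO_SIZE_BITS);
    }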