[HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Ildus Kurbangaliev |
---|---|
Subject | [HACKERS] Custom compression methods |
Date | |
Msg-id | 20170907194236.4cefce96@wp.localdomain |
List | pgsql-hackers |
Hello hackers!

I've attached a patch that implements custom compression methods. This patch is based on Nikita Glukhov's code for jsonb compression (which he hasn't published on the mailing lists). This is an early but working version of the patch; there are still a few fixes and features to implement (such as pg_dump support and support for compression options on types), and it requires more testing. But I'd like to get some feedback at the current stage first.

Alexander Korotkov made a proposal [1] earlier, and there has been some discussion about custom compression methods before. This patch is an implementation of per-datum compression. The syntax is similar to the one in the proposal, but not the same.

Syntax:

    CREATE COMPRESSION METHOD <cmname> HANDLER <compression_handler>;
    DROP COMPRESSION METHOD <cmname>;

The compression handler is a function that returns a structure containing the compression routines:

- configure - called when the compression method is applied to an attribute
- drop - called when the compression method is removed from an attribute
- compress - compress function
- decompress - decompress function

Users can create compressed columns with the commands below:

    CREATE TABLE t(a tsvector COMPRESSED <cmname> WITH <options>);
    ALTER TABLE t ALTER COLUMN a SET COMPRESSED <cmname> WITH <options>;
    ALTER TABLE t ALTER COLUMN a SET NOT COMPRESSED;

There is also syntax for binding compression methods to types:

    ALTER TYPE <type> SET COMPRESSED <cmname>;
    ALTER TYPE <type> SET NOT COMPRESSED;

There are two new tables in the catalog, pg_compression and pg_compression_opt. pg_compression stores the compression methods, and pg_compression_opt stores the compression options for a particular column. When a user binds a compression method to a column, a new record is created in pg_compression_opt, and all further attribute values will contain the Oid of that compression options record, while old values remain unchanged.
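To make the handler structure described above more concrete, here is a minimal standalone sketch in C. It is not taken from the patch: the names `CompressionRoutine` and `rle_compression_handler`, the function signatures, and the toy run-length encoding are all assumptions made for illustration. The real routines would operate on PostgreSQL varlena datums rather than raw byte buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the structure a compression handler returns. */
typedef struct CompressionRoutine
{
    void   (*configure)(const char *options); /* method applied to an attribute */
    void   (*drop)(void);                     /* method removed from an attribute */
    size_t (*compress)(const uint8_t *src, size_t len, uint8_t *dst, size_t cap);
    size_t (*decompress)(const uint8_t *src, size_t len, uint8_t *dst, size_t cap);
} CompressionRoutine;

/* Number of attributes this toy method is currently bound to. */
static int rle_bound_attributes = 0;

static void rle_configure(const char *options)
{
    (void) options;             /* the toy method takes no options */
    rle_bound_attributes++;
}

static void rle_drop(void)
{
    rle_bound_attributes--;
}

/* Encode as (run length, byte) pairs; returns 0 if dst is too small
 * (or if the input is empty). */
static size_t rle_compress(const uint8_t *src, size_t len, uint8_t *dst, size_t cap)
{
    size_t out = 0;
    size_t i = 0;

    while (i < len)
    {
        uint8_t byte = src[i];
        size_t  run = 1;

        while (i + run < len && src[i + run] == byte && run < 255)
            run++;
        if (out + 2 > cap)
            return 0;
        dst[out++] = (uint8_t) run;
        dst[out++] = byte;
        i += run;
    }
    return out;
}

/* Expand (run length, byte) pairs; returns 0 if dst is too small. */
static size_t rle_decompress(const uint8_t *src, size_t len, uint8_t *dst, size_t cap)
{
    size_t out = 0;

    assert(len % 2 == 0);       /* input must consist of whole pairs */
    for (size_t i = 0; i < len; i += 2)
    {
        size_t run = src[i];

        if (out + run > cap)
            return 0;
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}

/* The "handler": returns the routine table for this method. */
const CompressionRoutine *rle_compression_handler(void)
{
    static const CompressionRoutine routine = {
        .configure = rle_configure,
        .drop = rle_drop,
        .compress = rle_compress,
        .decompress = rle_decompress,
    };

    return &routine;
}
```

In this sketch a caller would fetch the table once through the handler and then call `compress` and `decompress` through it, which is roughly the role the handler function plays in CREATE COMPRESSION METHOD.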
When the compression method of an attribute is altered, the previous record in pg_compression_opt is not changed. Instead, a new record is created, and new values are stored with the new Oid. That way there is no need to recompress the old tuples. Also, tuples containing compressed datums can be copied to other tables, so records in pg_compression_opt shouldn't be removed. In the current patch they can be removed with DROP COMPRESSION METHOD CASCADE, but after that it won't be possible to decompress the compressed tuples. Maybe CASCADE should keep the compression options.

I haven't changed the base logic of working with compressed datums. Custom compressed datums behave exactly the same as the current LZ compressed datums, and the logic differs only in toast_compress_datum and toast_decompress_datum.

This patch doesn't break backward compatibility and should work seamlessly with older versions of the database. I used one of the two free bits in `va_rawsize` from `varattrib_4b->va_compressed` as a flag for custom compressed datums. I also renamed the field to `va_info`, since it now contains more than the raw size.

The patch also includes a custom compression method for tsvector, which is used in the tests.

[1] https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
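The `va_info` flag mentioned in the message can be illustrated with a small bit-packing sketch in C. This is an assumption-laden illustration, not the patch's code: the macro and function names are hypothetical, the raw size is assumed to occupy the low 30 bits of the 32-bit word (consistent with the two free bits mentioned above), and the choice of the top bit for the flag is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch's actual macros may differ. */
#define VA_INFO_RAWSIZE_BITS  30
#define VA_INFO_RAWSIZE_MASK  ((UINT32_C(1) << VA_INFO_RAWSIZE_BITS) - 1)
#define VA_INFO_CUSTOM_FLAG   (UINT32_C(1) << 31)   /* assumed: top bit */

/* Combine the raw (uncompressed) size with the custom-compression flag. */
static inline uint32_t va_info_pack(uint32_t rawsize, bool custom)
{
    assert(rawsize <= VA_INFO_RAWSIZE_MASK);    /* must fit in 30 bits */
    return rawsize | (custom ? VA_INFO_CUSTOM_FLAG : 0);
}

/* Extract the raw size, ignoring the two high bits. */
static inline uint32_t va_info_rawsize(uint32_t va_info)
{
    return va_info & VA_INFO_RAWSIZE_MASK;
}

/* True if the datum was compressed with a custom method. */
static inline bool va_info_is_custom(uint32_t va_info)
{
    return (va_info & VA_INFO_CUSTOM_FLAG) != 0;
}
```

With this layout, a datum written without the flag has a `va_info` value equal to its old `va_rawsize`, which is one way existing LZ compressed datums could stay readable without being rewritten.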