Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAJ3gD9c370yZFgHRAv+Up987o=t94sr22R36T=BKX5=_fw=TzA@mail.gmail.com Whole thread Raw |
In response to | Re: Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut@gmail.com> wrote: > I have rebased the patch on the latest head and currently, broken into 3 parts. > > v1-0001: As suggested by Robert, it provides the syntax support for > setting the compression method for a column while creating a table and > adding columns. However, we don't support changing the compression > method for the existing column. As part of this patch, there is only > one built-in compression method that can be set (pglz). In this, we > have one in-build am (pglz) and the compressed attributes will directly > store the oid of the AM. In this patch, I have removed the > pg_attr_compresion as we don't support changing the compression > for the existing column so we don't need to preserve the old > compressions. > v1-0002: Add another built-in compression method (zlib) > v1:0003: Remaining patch set (nothing is changed except rebase on the > current head, stabilizing check-world and 0001 and 0002 are pulled > out of this) > > Next, I will be working on separating out the remaining patches as per > the suggestion by Robert. Thanks for this new feature. Looks promising and very useful, with so many good compression libraries already available. I see that with the patch-set, I would be able to create an extension that defines a PostgreSQL C handler function which assigns all the required hook function implementations for compressing, decompressing and validating, etc. In short, I would be able to use a completely different compression algorithm to compress toast data if I write such an extension. Correct me if I am wrong with my interpretation. Just a quick superficial set of review comments .... A minor re-base is required due to a conflict in a regression test ------------- In heap_toast_insert_or_update() and in other places, the comments for new parameter preserved_am_info are missing. ------------- +toast_compress_datum(Datum value, Oid acoid) { struct varlena *tmp = NULL; int32 valsize; - CompressionAmOptions cmoptions; + CompressionAmOptions *cmoptions = NULL; I think tmp and cmoptions need not be initialized to NULL ------------- - TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize); - SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ); /* successful compression */ + toast_set_compressed_datum_info(tmp, amoid, valsize); return PointerGetDatum(tmp); Any particular reason why is this code put in a new extern function ? Is there a plan to re-use it ? Otherwise, it's not necessary to do this. ------------ Also, not sure why "HTAB *amoptions_cache" and "MemoryContext amoptions_cache_mcxt" aren't static declarations. They are being used only in toast_internals.c ----------- The tab-completion doesn't show COMPRESSION : postgres=# create access method my_method TYPE INDEX TABLE postgres=# create access method my_method TYPE Also, the below syntax also would better be tab-completed so as to display all the installed compression methods, in line with how we show all the storage methods like plain,extended,etc: postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION ------------ I could see the differences in compression ratio, and the compression and decompression speed when I use lz versus zib : CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4')); create table lztab(t text); ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz; pgg:s2:pg$ time psql -c "\copy zlibtab from text.data" COPY 13050 real 0m1.344s user 0m0.031s sys 0m0.026s pgg:s2:pg$ time psql -c "\copy lztab from text.data" COPY 13050 real 0m2.088s user 0m0.008s sys 0m0.050s pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass), pg_table_size('lztab'::regclass)" pg_table_size | pg_table_size ---------------+--------------- 1261568 | 1687552 pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'" > /dev/null real 0m0.127s user 0m0.000s sys 0m0.002s pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'" > /dev/null real 0m0.050s user 0m0.002s sys 0m0.000s -- Thanks, -Amit Khandekar Huawei Technologies
pgsql-hackers by date: