Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Re: [HACKERS] Custom compression methods
Date
Msg-id CAJ3gD9c370yZFgHRAv+Up987o=t94sr22R36T=BKX5=_fw=TzA@mail.gmail.com
Whole thread Raw
In response to Re: Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have rebased the patch on the latest head and currently, broken into 3 parts.
>
> v1-0001: As suggested by Robert, it provides the syntax support for
> setting the compression method for a column while creating a table and
> adding columns.  However, we don't support changing the compression
> method for the existing column.  As part of this patch, there is only
> one built-in compression method that can be set (pglz).  In this, we
> have one in-build am (pglz) and the compressed attributes will directly
> store the oid of the AM.  In this patch, I have removed the
> pg_attr_compresion as we don't support changing the compression
> for the existing column so we don't need to preserve the old
> compressions.
> v1-0002: Add another built-in compression method (zlib)
> v1:0003: Remaining patch set (nothing is changed except rebase on the
> current head, stabilizing check-world  and  0001 and 0002 are pulled
> out of this)
>
> Next, I will be working on separating out the remaining patches as per
> the suggestion by Robert.

Thanks for this new feature. Looks promising and very useful, with so
many good compression libraries already available.

I see that with the patch-set, I would be able to create an extension
that defines a PostgreSQL C handler function which assigns all the
required hook function implementations for compressing, decompressing
and validating, etc. In short, I would be able to use a completely
different compression algorithm to compress toast data if I write such
an extension. Correct me if I am wrong with my interpretation.

Just a quick superficial set of review comments ....

A minor re-base is required due to a conflict in a regression test

-------------

In heap_toast_insert_or_update() and in other places, the comments for
new parameter preserved_am_info are missing.

-------------

+toast_compress_datum(Datum value, Oid acoid)
 {
        struct varlena *tmp = NULL;
        int32 valsize;
-       CompressionAmOptions cmoptions;
+       CompressionAmOptions *cmoptions = NULL;

I think tmp and cmoptions need not be initialized to NULL

-------------

- TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
- SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
  /* successful compression */
+ toast_set_compressed_datum_info(tmp, amoid, valsize);
      return PointerGetDatum(tmp);

Any particular reason why is this code put in a new extern function ?
Is there a plan to re-use it ? Otherwise, it's not necessary to do
this.

------------

Also, not sure why "HTAB *amoptions_cache" and "MemoryContext
amoptions_cache_mcxt" aren't static declarations. They are being used
only in toast_internals.c

-----------

The tab-completion doesn't show COMPRESSION :
postgres=# create access method my_method TYPE
INDEX  TABLE
postgres=# create access method my_method TYPE

Also, the below syntax also would better be tab-completed  so as to
display all the installed compression methods, in line with how we
show all the storage methods like plain,extended,etc:
postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION

------------

I could see the differences in compression ratio, and the compression
and decompression speed when I use lz versus zib :

CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4'));
create table lztab(t text);
ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz;

pgg:s2:pg$ time psql -c "\copy zlibtab from text.data"
COPY 13050

real    0m1.344s
user    0m0.031s
sys     0m0.026s

pgg:s2:pg$ time psql -c "\copy lztab from text.data"
COPY 13050

real    0m2.088s
user    0m0.008s
sys     0m0.050s


pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass),
pg_table_size('lztab'::regclass)"
 pg_table_size | pg_table_size
---------------+---------------
       1261568 |       1687552

pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'"
 > /dev/null

real    0m0.127s
user    0m0.000s
sys     0m0.002s

pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'"
> /dev/null

real    0m0.050s
user    0m0.002s
sys     0m0.000s


-- 
Thanks,
-Amit Khandekar
Huawei Technologies



pgsql-hackers by date:

Previous
From: Rahila
Date:
Subject: Re: More tests with USING INDEX replident and dropped indexes
Next
From: Amit Kapila
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions