Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-tkGOtVSJD0FDZCuVrAYCCURWwRtBxT_4KVduJu-9fueQ@mail.gmail.com
Whole thread Raw
In response to Re: Re: [HACKERS] Custom compression methods  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
On Mon, Aug 31, 2020 at 10:45 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I have rebased the patch on the latest head and currently, broken into 3 parts.
> >
> > v1-0001: As suggested by Robert, it provides the syntax support for
> > setting the compression method for a column while creating a table and
> > adding columns.  However, we don't support changing the compression
> > method for the existing column.  As part of this patch, there is only
> > one built-in compression method that can be set (pglz).  In this, we
> > have one in-build am (pglz) and the compressed attributes will directly
> > store the oid of the AM.  In this patch, I have removed the
> > pg_attr_compresion as we don't support changing the compression
> > for the existing column so we don't need to preserve the old
> > compressions.
> > v1-0002: Add another built-in compression method (zlib)
> > v1:0003: Remaining patch set (nothing is changed except rebase on the
> > current head, stabilizing check-world  and  0001 and 0002 are pulled
> > out of this)
> >
> > Next, I will be working on separating out the remaining patches as per
> > the suggestion by Robert.
>
> Thanks for this new feature. Looks promising and very useful, with so
> many good compression libraries already available.

Thanks for looking into it.

> I see that with the patch-set, I would be able to create an extension
> that defines a PostgreSQL C handler function which assigns all the
> required hook function implementations for compressing, decompressing
> and validating, etc. In short, I would be able to use a completely
> different compression algorithm to compress toast data if I write such
> an extension. Correct me if I am wrong with my interpretation.
>
> Just a quick superficial set of review comments ....
>
> A minor re-base is required due to a conflict in a regression test

Okay, I will do this.

> -------------
>
> In heap_toast_insert_or_update() and in other places, the comments for
> new parameter preserved_am_info are missing.
>
> -------------

ok

> +toast_compress_datum(Datum value, Oid acoid)
>  {
>         struct varlena *tmp = NULL;
>         int32 valsize;
> -       CompressionAmOptions cmoptions;
> +       CompressionAmOptions *cmoptions = NULL;
>
> I think tmp and cmoptions need not be initialized to NULL

Right

> -------------
>
> - TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
> - SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
>   /* successful compression */
> + toast_set_compressed_datum_info(tmp, amoid, valsize);
>       return PointerGetDatum(tmp);
>
> Any particular reason why is this code put in a new extern function ?
> Is there a plan to re-use it ? Otherwise, it's not necessary to do
> this.
>
> ------------
>
> Also, not sure why "HTAB *amoptions_cache" and "MemoryContext
> amoptions_cache_mcxt" aren't static declarations. They are being used
> only in toast_internals.c

> -----------
>
> The tab-completion doesn't show COMPRESSION :
> postgres=# create access method my_method TYPE
> INDEX  TABLE
> postgres=# create access method my_method TYPE
>
> Also, the below syntax also would better be tab-completed  so as to
> display all the installed compression methods, in line with how we
> show all the storage methods like plain,extended,etc:
> postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION
>
> ------------

I will fix these comments in the next version of the patch.

> I could see the differences in compression ratio, and the compression
> and decompression speed when I use lz versus zib :
>
> CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4'));
> create table lztab(t text);
> ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz;
>
> pgg:s2:pg$ time psql -c "\copy zlibtab from text.data"
> COPY 13050
>
> real    0m1.344s
> user    0m0.031s
> sys     0m0.026s
>
> pgg:s2:pg$ time psql -c "\copy lztab from text.data"
> COPY 13050
>
> real    0m2.088s
> user    0m0.008s
> sys     0m0.050s
>
>
> pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass),
> pg_table_size('lztab'::regclass)"
>  pg_table_size | pg_table_size
> ---------------+---------------
>        1261568 |       1687552
>
> pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'"
>  > /dev/null
>
> real    0m0.127s
> user    0m0.000s
> sys     0m0.002s
>
> pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'"
> > /dev/null
>
> real    0m0.050s
> user    0m0.002s
> sys     0m0.000s
>

Thanks for testing this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: Michael Paquier
Date:
Subject: Refactor ReindexStmt and its "concurrent" boolean