Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
| From | Amit Khandekar |
|---|---|
| Subject | Re: Re: [HACKERS] Custom compression methods |
| Date | |
| Msg-id | CAJ3gD9c370yZFgHRAv+Up987o=t94sr22R36T=BKX5=_fw=TzA@mail.gmail.com Whole thread Raw |
| In response to | Re: Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
| Responses |
Re: Re: [HACKERS] Custom compression methods
|
| List | pgsql-hackers |
On Thu, 13 Aug 2020 at 17:18, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I have rebased the patch on the latest head and currently, broken into 3 parts.
>
> v1-0001: As suggested by Robert, it provides the syntax support for
> setting the compression method for a column while creating a table and
> adding columns. However, we don't support changing the compression
> method for the existing column. As part of this patch, there is only
> one built-in compression method that can be set (pglz). In this, we
> have one in-build am (pglz) and the compressed attributes will directly
> store the oid of the AM. In this patch, I have removed the
> pg_attr_compresion as we don't support changing the compression
> for the existing column so we don't need to preserve the old
> compressions.
> v1-0002: Add another built-in compression method (zlib)
> v1:0003: Remaining patch set (nothing is changed except rebase on the
> current head, stabilizing check-world and 0001 and 0002 are pulled
> out of this)
>
> Next, I will be working on separating out the remaining patches as per
> the suggestion by Robert.
Thanks for this new feature. Looks promising and very useful, with so
many good compression libraries already available.
I see that with the patch-set, I would be able to create an extension
that defines a PostgreSQL C handler function which assigns all the
required hook function implementations for compressing, decompressing
and validating, etc. In short, I would be able to use a completely
different compression algorithm to compress toast data if I write such
an extension. Correct me if I am wrong with my interpretation.
Just a quick superficial set of review comments ....
A minor re-base is required due to a conflict in a regression test
-------------
In heap_toast_insert_or_update() and in other places, the comments for
new parameter preserved_am_info are missing.
-------------
+toast_compress_datum(Datum value, Oid acoid)
{
struct varlena *tmp = NULL;
int32 valsize;
- CompressionAmOptions cmoptions;
+ CompressionAmOptions *cmoptions = NULL;
I think tmp and cmoptions need not be initialized to NULL
-------------
- TOAST_COMPRESS_SET_RAWSIZE(tmp, valsize);
- SET_VARSIZE_COMPRESSED(tmp, len + TOAST_COMPRESS_HDRSZ);
/* successful compression */
+ toast_set_compressed_datum_info(tmp, amoid, valsize);
return PointerGetDatum(tmp);
Any particular reason why is this code put in a new extern function ?
Is there a plan to re-use it ? Otherwise, it's not necessary to do
this.
------------
Also, not sure why "HTAB *amoptions_cache" and "MemoryContext
amoptions_cache_mcxt" aren't static declarations. They are being used
only in toast_internals.c
-----------
The tab-completion doesn't show COMPRESSION :
postgres=# create access method my_method TYPE
INDEX TABLE
postgres=# create access method my_method TYPE
Also, the below syntax also would better be tab-completed so as to
display all the installed compression methods, in line with how we
show all the storage methods like plain,extended,etc:
postgres=# ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION
------------
I could see the differences in compression ratio, and the compression
and decompression speed when I use lz versus zib :
CREATE TABLE zlibtab(t TEXT COMPRESSION zlib WITH (level '4'));
create table lztab(t text);
ALTER TABLE lztab ALTER COLUMN t SET COMPRESSION pglz;
pgg:s2:pg$ time psql -c "\copy zlibtab from text.data"
COPY 13050
real 0m1.344s
user 0m0.031s
sys 0m0.026s
pgg:s2:pg$ time psql -c "\copy lztab from text.data"
COPY 13050
real 0m2.088s
user 0m0.008s
sys 0m0.050s
pgg:s2:pg$ time psql -c "select pg_table_size('zlibtab'::regclass),
pg_table_size('lztab'::regclass)"
pg_table_size | pg_table_size
---------------+---------------
1261568 | 1687552
pgg:s2:pg$ time psql -c "select NULL from zlibtab where t like '0000'"
> /dev/null
real 0m0.127s
user 0m0.000s
sys 0m0.002s
pgg:s2:pg$ time psql -c "select NULL from lztab where t like '0000'"
> /dev/null
real 0m0.050s
user 0m0.002s
sys 0m0.000s
--
Thanks,
-Amit Khandekar
Huawei Technologies
pgsql-hackers by date: