Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Custom compression methods
Msg-id 20201004220713.6vlmm2e3amlz2dil@development
In response to Re: Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Custom compression methods
List pgsql-hackers
Hi,

I took a look at this patch after a long time, and did a bit of
review and testing. I haven't re-read the whole thread since 2017, so some
of the following comments might be mistaken - sorry about that :-(


1) The "cmapi.h" naming seems unnecessarily short. I'd suggest using
simply compression or something like that. I see little reason to
shorten "compression" to "cm", or to prefix files with "cm_". For
example compression/cm_zlib.c might just be compression/zlib.c.


2) I see index_form_tuple does this:

     Datum  cvalue = toast_compress_datum(untoasted_values[i],
                                          DefaultCompressionMethod);

which seems wrong - why shouldn't the indexes use the same compression
method as the underlying table?


3) dumpTableSchema in pg_dump.c does this:

     switch (tbinfo->attcompression[j])
     {
         case 'p':
             cmname = "pglz";
         case 'z':
             cmname = "zlib";
     }

which is broken as it's missing a break, so 'p' falls through and
produces 'zlib'.
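
For illustration, a minimal sketch of the same mapping with the breaks in
place. The helper name compression_method_name and the NULL result for an
unknown byte are assumptions made for this example, not code from the
patch:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): map an attcompression byte
 * to the compression method name. Returns NULL for an unknown byte.
 */
static const char *
compression_method_name(char attcompression)
{
    const char *cmname = NULL;

    switch (attcompression)
    {
        case 'p':
            cmname = "pglz";
            break;              /* without this, 'p' falls into 'z' */
        case 'z':
            cmname = "zlib";
            break;
        default:
            /* unknown method: leave cmname as NULL */
            break;
    }

    return cmname;
}
```

Whether an unknown byte should map to NULL or raise an error is a
separate question for the patch; the sketch only shows the fall-through
fix.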


4) The name ExecCompareCompressionMethod is somewhat misleading, as the
function is not merely comparing compression methods - it also
recompresses the data.


5) CheckCompressionMethodsPreserved should document what the return
value is (true when new list contains all old values, thus not requiring
a rewrite). Maybe "Compare" would be a better name?
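
To show the kind of header comment I mean, here is a rough sketch. It
assumes each method is identified by a single byte and that the lists are
NUL-terminated strings; the function name and that representation are
made up for the example and are not what the patch does:

```c
#include <stdbool.h>
#include <string.h>

/*
 * compression_methods_preserved (hypothetical name)
 *
 * Returns true when every method in old_methods also appears in
 * new_methods, in which case no table rewrite is required. Returns
 * false as soon as one old method is missing from the new list.
 *
 * Assumption for this sketch: each method is a single byte and both
 * lists are NUL-terminated strings.
 */
static bool
compression_methods_preserved(const char *old_methods,
                              const char *new_methods)
{
    for (const char *p = old_methods; *p != '\0'; p++)
    {
        if (strchr(new_methods, *p) == NULL)
            return false;       /* an old method was dropped */
    }

    return true;
}
```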


6) The new field in ColumnDef is missing a comment.


7) It's not clear to me what "partial list" in the PRESERVE docs means.

+ which of them should be kept on the column. Without PRESERVE or partial
+ list of compression methods the table will be rewritten.


8) The initial synopsis in alter_table.sgml includes the PRESERVE
syntax, but then later in the page it's omitted (yet the section talks
about the keyword).


9) attcompression ...

The main issue I see is what the patch does with attcompression. Instead
of just using it to store the compression method, it's also used to
store the preserved compression methods. And using NameData to store
this seems wrong too - if we really want to store this info, the correct
way is either using text[] or inventing charvector or similar.

But to me this seems very much like a misuse of attcompression to track
dependencies on compression methods, necessary because we don't have a
separate catalog listing compression methods. If we had that, I think we
could simply add dependencies between attributes and that catalog.

Moreover, having the catalog would allow adding compression methods
(from extensions etc) instead of just having a list of hard-coded
compression methods. Which seems like a strange limitation, considering
this thread is called "custom compression methods".


10) compression parameters?

I wonder if we could/should allow parameters, like compression level
(and maybe other stuff, depending on the compression method). PG13
allowed that for opclasses, so perhaps we should allow it here too.


11) pg_column_compression

When specifying a compression method not present in attcompression, we
get this error message and hint:

   test=# alter table t alter COLUMN a set compression "pglz" preserve (zlib);
   ERROR:  "zlib" compression access method cannot be preserved
   HINT:  use "pg_column_compression" function for list of compression methods

but there is no pg_column_compression function, so the hint is wrong.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


