Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | 20201004220713.6vlmm2e3amlz2dil@development |
In response to | Re: Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses | Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
Hi,

I took a look at this patch after a long time, and did a bit of review and testing. I haven't re-read the whole thread since 2017, so some of the following comments might be mistaken - sorry about that :-(

1) The "cmapi.h" naming seems unnecessarily short. I'd suggest simply using "compression" or something like that. I see little reason to shorten "compression" to "cm", or to prefix files with "cm_". For example, compression/cm_zlib.c might just be compression/zlib.c.

2) I see index_form_tuple does this:

    Datum cvalue = toast_compress_datum(untoasted_values[i],
                                        DefaultCompressionMethod);

which seems wrong - why shouldn't the indexes use the same compression method as the underlying table?

3) dumpTableSchema in pg_dump.c does this:

    switch (tbinfo->attcompression[j])
    {
        case 'p':
            cmname = "pglz";
        case 'z':
            cmname = "zlib";
    }

which is broken as it's missing a break, so 'p' will produce 'zlib'.

4) The name ExecCompareCompressionMethod is somewhat misleading, as the function is not merely comparing compression methods - it also recompresses the data.

5) CheckCompressionMethodsPreserved should document what the return value is (true when the new list contains all the old values, thus not requiring a rewrite). Maybe "Compare" would be a better name?

6) The new field in ColumnDef is missing a comment.

7) It's not clear to me what "partial list" in the PRESERVE docs means:

    + which of them should be kept on the column. Without PRESERVE or partial
    + list of compression methods the table will be rewritten.

8) The initial synopsis in alter_table.sgml includes the PRESERVE syntax, but later in the page it's omitted (yet the section talks about the keyword).

9) attcompression ...

The main issue I see is what the patch does with attcompression. Instead of just using it to store the compression method, it's also used to store the preserved compression methods.
And using NameData to store this seems wrong too - if we really want to store this info, the correct way is either using text[] or inventing charvector or similar.

But to me this seems very much like a misuse of attcompression to track dependencies on compression methods, necessary because we don't have a separate catalog listing compression methods. If we had that, I think we could simply add dependencies between attributes and that catalog.

Moreover, having the catalog would allow adding compression methods (from extensions etc.) instead of just having a list of hard-coded compression methods, which seems like a strange limitation, considering this thread is called "custom compression methods".

10) compression parameters?

I wonder if we could/should allow parameters, like compression level (and maybe other stuff, depending on the compression method). PG13 allowed that for opclasses, so perhaps we should allow it here too.

11) pg_column_compression

When specifying a compression method not present in attcompression, we get this error message and hint:

    test=# alter table t alter COLUMN a set compression "pglz" preserve (zlib);
    ERROR: "zlib" compression access method cannot be preserved
    HINT: use "pg_column_compression" function for list of compression methods

but there is no pg_column_compression function, so the hint is wrong.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
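The fall-through bug in item 3 can be illustrated with a minimal standalone sketch of the corrected switch. This is not the actual dumpTableSchema code: the helpers compression_method_name and check_compression_method_names are hypothetical, the 'p'/'z' codes are taken from the quoted snippet, and returning NULL for an unknown code is an assumption (the real caller would presumably report an error).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map a single-character compression method code
 * (as in the quoted pg_dump snippet) to the name pg_dump should emit.
 * Every case ends with a break, so 'p' no longer falls through to 'z'.
 */
const char *
compression_method_name(char code)
{
    const char *cmname = NULL;

    switch (code)
    {
        case 'p':
            cmname = "pglz";
            break;          /* the break missing in the reviewed code */
        case 'z':
            cmname = "zlib";
            break;
        default:
            cmname = NULL;  /* unknown code: left to the caller to report */
            break;
    }
    return cmname;
}

/* Hypothetical regression check for the fall-through bug. */
void
check_compression_method_names(void)
{
    assert(strcmp(compression_method_name('p'), "pglz") == 0);
    assert(strcmp(compression_method_name('z'), "zlib") == 0);
    assert(compression_method_name('?') == NULL);
}
```

The explicit default branch also means an unexpected code yields NULL, instead of silently leaving cmname with whatever value it had before the switch.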
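The return-value contract proposed in item 5 (true when the new list contains all the old values, so no rewrite is needed) amounts to a subset check. The sketch below is written under stated assumptions: compression_methods_preserved is a hypothetical name, and representing each list as a string of single-character codes is borrowed from the pg_dump snippet, not from the patch's actual CheckCompressionMethodsPreserved signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch: each list is a NUL-terminated string of
 * single-character compression method codes ('p' = pglz, 'z' = zlib).
 *
 * Returns true when every method in old_methods also appears in
 * new_methods (the new method plus the PRESERVE list), so existing
 * data remains decompressible and no table rewrite is needed.
 * Returns false when some old method would be dropped, in which case
 * the table has to be rewritten.
 */
bool
compression_methods_preserved(const char *old_methods, const char *new_methods)
{
    assert(old_methods != NULL && new_methods != NULL);

    for (const char *p = old_methods; *p != '\0'; p++)
    {
        if (strchr(new_methods, *p) == NULL)
            return false;   /* an old method is missing: rewrite needed */
    }
    return true;            /* all old methods kept: no rewrite */
}
```

For example, keeping pglz while switching to zlib ("p" against "zp") needs no rewrite, while dropping pglz ("pz" against "z") does.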