Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Ildar Musin
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id bc430af2-4388-4ad3-acda-f1a95b9310ae@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
Responses Re: [HACKERS] Custom compression methods  (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>)
List pgsql-hackers
Hello Ildus,

I continue reviewing your patch. Here are some thoughts.

1. When I set column storage to EXTERNAL then I cannot set compression.
Seems reasonable:
create table test(id serial, msg text);
alter table test alter column msg set storage external;
alter table test alter column msg set compression pg_lz4;
ERROR:  storage for "msg" should be MAIN or EXTENDED

But if I reorder commands then it's ok:
create table test(id serial, msg text);
alter table test alter column msg set compression pg_lz4;
alter table test alter column msg set storage external;
\d+ test
                 Table "public.test"
  Column |  Type   |  ...  | Storage  | Compression
--------+---------+  ...  +----------+-------------
  id     | integer |  ...  | plain    |
  msg    | text    |  ...  | external | pg_lz4

So we could either allow user to set compression settings even when
storage is EXTERNAL but with warning or prohibit user to set compression
and external storage at the same time. The same thing is with setting 
storage PLAIN.


2. I think TOAST_COMPRESS_SET_RAWSIZE macro could be rewritten like
following to prevent overwriting of higher bits of 'info':

((toast_compress_header *) (ptr))->info = \
    ((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK | (len);

It maybe does not matter at the moment since it is only used once, but
it could save some efforts for other developers in future.
In TOAST_COMPRESS_SET_CUSTOM() instead of changing individual bits you
may do something like this:

#define TOAST_COMPRESS_SET_CUSTOM(ptr) \
do { \
    ((toast_compress_header *) (ptr))->info = \
        ((toast_compress_header *) (ptr))->info & RAWSIZEMASK | ((uint32) 0x02
<< 30) \
} while (0)

Also it would be nice if bit flags were explained and maybe replaced by
a macro.


3. In AlteredTableInfo, BulkInsertStateData and some functions (eg
toast_insert_or_update) there is a hash table used to keep preserved
compression methods list per attribute. I think a simple array of List*
would be sufficient in this case.


4. In optionListToArray() you can use list_qsort() to sort options list 
instead of converting it manually into array and then back to a list.


5. Redundunt #includes:

    In heap.c:
        #include "access/reloptions.h"
    In tsvector.c:
        #include "catalog/pg_type.h"
        #include "common/pg_lzcompress.h"
    In relcache.c:
        #include "utils/datum.h"

6. Just a minor thing: no reason to change formatting in copy.c
-       heap_insert(resultRelInfo->ri_RelationDesc, tuple, mycid,
-                   hi_options, bistate);
+       heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+                   mycid, hi_options, bistate);

7. Also in utility.c the extra new line was added which isn't relevant 
for this patch.

8. In parse_utilcmd.h the 'extern' keyword was removed from 
transformRuleStmt declaration which doesn't make sense in this patch.

9. Comments. Again, they should be read by a native speaker. So just a 
few suggestions:
toast_prepare_varlena() - comment needed
invalidate_amoptions_cache() - comment format doesn't match other 
functions in the file

In htup_details.h:
/* tuple contain custom compressed
  * varlenas */
should be "contains"

-- 
Ildar Musin
i.musin@postgrespro.ru


pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Boolean partitions syntax
Next
From: Stephen Frost
Date:
Subject: Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump