Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | 62e46a47-08a8-6e06-5e4f-6f52e0a13202@2ndquadrant.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
Hi, On 11/14/2017 02:23 PM, Ildus Kurbangaliev wrote: > > ... > > Attached version 4 of the patch. Fixed pg_upgrade and few other bugs. > I did a review of this today, and I think there are some things that need improvement / fixing. Firstly, some basic comments from just eye-balling the diff, then some bugs I discovered after writing an extension adding lz4. 1) formatRelOptions/freeRelOptions are no longer needed (I see Ildar already pointer that out) 2) There's unnecessary whitespace (extra newlines) on a couple of places, which is needlessly increasing the size of the patch. Small difference, but annoying. 3) tuptoaster.c Why do you change 'info' from int32 to uint32? Seems unnecessary. Adding new 'att' variable in toast_insert_or_update is confusing, as there already is 'att' in the very next loop. Technically it's correct, but I'd bet it'll lead to some WTF?! moments later. I propose to just use TupleDescAttr(tupleDesc,i) on the two places where it matters, around line 808. There are no comments for init_compression_options_htab and get_compression_options_info, so that needs to be fixed. Moreover, the names are confusing because what we really get is not just 'options' but the compression routines too. 4) gen_db_file_maps probably shouldn't do the fprints, right? 5) not sure why you modify src/tools/pgindent/exclude_file_patterns 6) I'm rather confused by AttributeCompression vs. ColumnCompression. I mean, attribute==column, right? Of course, one is for data from parser, the other one is for internal info. But can we make the naming clearer? 7) The docs in general are somewhat unsatisfactory, TBH. For example the ColumnCompression has no comments, unlike everything else in parsenodes. Similarly for the SGML docs - I suggest to expand them to resemble FDW docs (https://www.postgresql.org/docs/10/static/fdwhandler.html) which also follows the handler/routines pattern. 8) One of the unclear things if why we even need 'drop' routing. It seems that if it's defined DropAttributeCompression does something. But what should it do? I suppose dropping the options should be done using dependencies (just like we drop columns in this case). BTW why does DropAttributeCompression mess with att->attisdropped in this way? That seems a bit odd. 9) configure routines that only check if (options != NIL) and then error out (like tsvector_configure) seem a bit unnecessary. Just allow it to be NULL in CompressionMethodRoutine, and throw an error if options is not NIL for such compression method. 10) toast_compress_datum still does this: if (!ac && (valsize < PGLZ_strategy_default->min_input_size || valsize > PGLZ_strategy_default->max_input_size)) which seems rather pglz-specific (the naming is a hint). Why shouldn't this be specific to compression, exposed either as min/max constants, or wrapped in another routine - size_is_valid() or something like that? 11) The comments in toast_compress_datum probably need updating, as it still references to pglz specifically. I guess the new compression methods do matter too. 12) get_compression_options_info organizes the compression info into a hash table by OID. The hash table implementation assumes the hash key is at the beginning of the entry, but AttributeCompression is defined like this: typedef struct { CompressionMethodRoutine *routine; List *options; Oid cmoptoid; } AttributeCompression; Which means get_compression_options_info is busted, will never lookup anything, and the hash table will grow by adding more and more entries into the same bucket. Of course, this has extremely negative impact on performance (pretty much arbitrarily bad, depending on how many entries you've already added to the hash table). Moving the OID to the beginning of the struct fixes the issue. 13) When writing the experimental extension, I was extremely confused about the regular varlena headers, custom compression headers, etc. In the end I stole the code from tsvector.c and whacked it a bit until it worked, but I wouldn't dare to claim I understand how it works. This needs to be documented somewhere. For example postgres.h has a bunch of paragraphs about varlena headers, so perhaps it should be there? I see the patch tweaks some of the constants, but does not update the comment at all. Perhaps it would be useful to provide some additional macros making access to custom-compressed varlena values easier. Or perhaps the VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already support that? This part is not very clear to me. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: