Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CA+TgmoYRsapWz2vDSQXFAEL_BAMq1aY37avcq9GAB=SUUvFhaw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Thu, Mar 18, 2021 at 10:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I just realized that in the last patch (0003) I forgot to remove 2 > unused functions, CompressionMethodToId and CompressionIdToMethod. > Removed in the latest patch. I spent a little time polishing 0001 and here's what I came up with. I adjusted some comments, added documentation, fixed up the commit message, etc. I still don't quite like the approach in 0002. I feel that the function should not construct the tuple but modify the caller's arrays as a side effect. And if we're absolutely committed to the design where it does that, the comments need to call it out clearly, which they don't. Regarding 0003: I think it might make sense to change the names of the compression and decompression functions to match the names of the callers more closely. Like, toast_decompress_datum() calls either pglz_cmdecompress() or lz4_cmdecompress(). But, why not pglz_decompress_datum() or lz4_decompress_datum()? The "cm" thing doesn't really mean anything, and because the varlena is allocated by that function itself rather than the caller, this can't be used for anything other than TOAST. In toast_compress_datum(), if (tmp == NULL) return PointerGetDatum(NULL) is duplicated. It would be better to move it after the switch. Instead of "could not compress data with lz4" I suggest "lz4 compression failed". In catalogs.sgml, you shouldn't mention InvalidCompressionMethod, but you should explain what the actual possible values mean. Look at the way attidentity and attgenerated are documented and do it like that. In pg_column_compression() it might be a bit more elegant to add a char *result variable or similar, and have the switch cases just set it, and then do PG_RETURN_TEXT_P(cstring_to_text(result)) at the bottom. In getTableAttrs(), if the remoteVersion is new, the column gets a different alias than if the column is old. In dumpTableSchema(), the condition tbinfo->attcompression[j] means exactly the thing as the condition tbinfo->attcompression[j] != '\0', so it can't be right to test both. I think that there's some confusion here about the data type of tbinfo->attcompression[j]. It seems to be char *. Maybe you intended to test the first character in that second test, but that's not what this does. But you don't need to test that anyway because the switch already takes care of it. So I suggest (a) removing tbinfo->attcompression[j] != '\0' from this if-statement and (b) adding != NULL to the previous line for clarity. I would also suggest concluding the switch with a break just for symmetry. The patch removes 11 references to va_extsize and leaves behind 4. None of those 4 look like things that should have been left. The comment which says "When fetching a prefix of a compressed external datum, account for the rawsize tracking amount of raw data, which is stored at the beginning as an int32 value)" is no longer 100% accurate. I suggest changing it to say something like "When fetching a prefix of a compressed external datum, account for the space required by va_tcinfo" and leave out the rest. In describeOneTableDetails, the comment "compresssion info" needs to be compressed by removing one "s". It seems a little unfortunate that we need to include access/toast_compression.h in detoast.h. It seems like the reason we need to do that is because otherwise we won't have ToastCompressionId defined and so we won't be able to prototype toast_get_compression_id. But I think we should solve that problem by moving that file to toast_compression.c. (I'm OK if you want to keep the files separate, or if you want to reverse course and combine them I'm OK with that too, but the extra header dependency is clearly a sign of a problem with the split.) Regarding 0005: I think ApplyChangesToIndexes() should be renamed to something like SetIndexStorageProperties(). It's too generic right now. I think 0004 and 0005 should just be merged into 0003. I can't see committing them separately. I know I was the one who made you split the patch up in the first place, but those patches are quite small and simple now, so it makes more sense to me to combine them. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: