Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-t03PqWCaRW1GoWQa0i2NwYw_p=FDj+Kxu_-rY_viALqw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
On Fri, Mar 19, 2021 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 18, 2021 at 10:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I just realized that in the last patch (0003) I forgot to remove 2 > > unused functions, CompressionMethodToId and CompressionIdToMethod. > > Removed in the latest patch. > > I spent a little time polishing 0001 and here's what I came up with. I > adjusted some comments, added documentation, fixed up the commit > message, etc. Thanks, the changes looks fine to me. > > I still don't quite like the approach in 0002. I feel that the > function should not construct the tuple but modify the caller's arrays > as a side effect. And if we're absolutely committed to the design > where it does that, the comments need to call it out clearly, which > they don't. Added comment for the same. > Regarding 0003: > > I think it might make sense to change the names of the compression and > decompression functions to match the names of the callers more > closely. Like, toast_decompress_datum() calls either > pglz_cmdecompress() or lz4_cmdecompress(). But, why not > pglz_decompress_datum() or lz4_decompress_datum()? The "cm" thing > doesn't really mean anything, and because the varlena is allocated by > that function itself rather than the caller, this can't be used for > anything other than TOAST. Done > In toast_compress_datum(), if (tmp == NULL) return > PointerGetDatum(NULL) is duplicated. It would be better to move it > after the switch. Done > Instead of "could not compress data with lz4" I suggest "lz4 > compression failed". Done > In catalogs.sgml, you shouldn't mention InvalidCompressionMethod, but > you should explain what the actual possible values mean. Look at the > way attidentity and attgenerated are documented and do it like that. Done > In pg_column_compression() it might be a bit more elegant to add a > char *result variable or similar, and have the switch cases just set > it, and then do PG_RETURN_TEXT_P(cstring_to_text(result)) at the > bottom. Done > In getTableAttrs(), if the remoteVersion is new, the column gets a > different alias than if the column is old. Fixed > In dumpTableSchema(), the condition tbinfo->attcompression[j] means > exactly the thing as the condition tbinfo->attcompression[j] != '\0', > so it can't be right to test both. I think that there's some confusion > here about the data type of tbinfo->attcompression[j]. It seems to be > char *. Maybe you intended to test the first character in that second > test, but that's not what this does. But you don't need to test that > anyway because the switch already takes care of it. So I suggest (a) > removing tbinfo->attcompression[j] != '\0' from this if-statement and > (b) adding != NULL to the previous line for clarity. I would also > suggest concluding the switch with a break just for symmetry. Fixed > The patch removes 11 references to va_extsize and leaves behind 4. > None of those 4 look like things that should have been left. Fixed > The comment which says "When fetching a prefix of a compressed > external datum, account for the rawsize tracking amount of raw data, > which is stored at the beginning as an int32 value)" is no longer 100% > accurate. I suggest changing it to say something like "When fetching a > prefix of a compressed external datum, account for the space required > by va_tcinfo" and leave out the rest. Done > In describeOneTableDetails, the comment "compresssion info" needs to > be compressed by removing one "s". Done > It seems a little unfortunate that we need to include > access/toast_compression.h in detoast.h. It seems like the reason we > need to do that is because otherwise we won't have ToastCompressionId > defined and so we won't be able to prototype toast_get_compression_id. > But I think we should solve that problem by moving that file to > toast_compression.c. (I'm OK if you want to keep the files separate, > or if you want to reverse course and combine them I'm OK with that > too, but the extra header dependency is clearly a sign of a problem > with the split.) Moved to toast_compression.c > Regarding 0005: > > I think ApplyChangesToIndexes() should be renamed to something like > SetIndexStorageProperties(). It's too generic right now. Done > I think 0004 and 0005 should just be merged into 0003. I can't see > committing them separately. I know I was the one who made you split > the patch up in the first place, but those patches are quite small and > simple now, so it makes more sense to me to combine them. Done Also added a test case for vacuum full to recompress the data. One question, like storage should we apply the alter set compression changes recursively to the inherited children (I have attached a separate patch for this )? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: