Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-t03PqWCaRW1GoWQa0i2NwYw_p=FDj+Kxu_-rY_viALqw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Custom compression methods  (Justin Pryzby <pryzby@telsasoft.com>)
Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Mar 19, 2021 at 1:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 18, 2021 at 10:22 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I just realized that in the last patch (0003) I forgot to remove 2
> > unused functions, CompressionMethodToId and CompressionIdToMethod.
> > Removed in the latest patch.
>
> I spent a little time polishing 0001 and here's what I came up with. I
> adjusted some comments, added documentation, fixed up the commit
> message, etc.

Thanks, the changes looks fine to me.
>
> I still don't quite like the approach in 0002. I feel that the
> function should not construct the tuple but modify the caller's arrays
> as a side effect. And if we're absolutely committed to the design
> where it does that, the comments need to call it out clearly, which
> they don't.

Added comment for the same.

> Regarding 0003:
>
> I think it might make sense to change the names of the compression and
> decompression functions to match the names of the callers more
> closely. Like, toast_decompress_datum() calls either
> pglz_cmdecompress() or lz4_cmdecompress(). But, why not
> pglz_decompress_datum() or lz4_decompress_datum()? The "cm" thing
> doesn't really mean anything, and because the varlena is allocated by
> that function itself rather than the caller, this can't be used for
> anything other than TOAST.

Done

> In toast_compress_datum(), if (tmp == NULL) return
> PointerGetDatum(NULL) is duplicated. It would be better to move it
> after the switch.

Done

> Instead of "could not compress data with lz4" I suggest "lz4
> compression failed".

Done

> In catalogs.sgml, you shouldn't mention InvalidCompressionMethod, but
> you should explain what the actual possible values mean. Look at the
> way attidentity and attgenerated are documented and do it like that.

Done

> In pg_column_compression() it might be a bit more elegant to add a
> char *result variable or similar, and have the switch cases just set
> it, and then do PG_RETURN_TEXT_P(cstring_to_text(result)) at the
> bottom.

Done

> In getTableAttrs(), if the remoteVersion is new, the column gets a
> different alias than if the column is old.

Fixed

> In dumpTableSchema(), the condition tbinfo->attcompression[j] means
> exactly the thing as the condition tbinfo->attcompression[j] != '\0',
> so it can't be right to test both. I think that there's some confusion
> here about the data type of tbinfo->attcompression[j]. It seems to be
> char *. Maybe you intended to test the first character in that second
> test, but that's not what this does. But you don't need to test that
> anyway because the switch already takes care of it. So I suggest (a)
> removing tbinfo->attcompression[j] != '\0' from this if-statement and
> (b) adding != NULL to the previous line for clarity. I would also
> suggest concluding the switch with a break just for symmetry.

Fixed

> The patch removes 11 references to va_extsize and leaves behind 4.
> None of those 4 look like things that should have been left.

Fixed

> The comment which says "When fetching a prefix of a compressed
> external datum, account for the rawsize tracking amount of raw data,
> which is stored at the beginning as an int32 value)" is no longer 100%
> accurate. I suggest changing it to say something like "When fetching a
> prefix of a compressed external datum, account for the space required
> by va_tcinfo" and leave out the rest.

Done

> In describeOneTableDetails, the comment "compresssion info" needs to
> be compressed by removing one "s".

Done

> It seems a little unfortunate that we need to include
> access/toast_compression.h in detoast.h.  It seems like the reason we
> need to do that is because otherwise we won't have ToastCompressionId
> defined and so we won't be able to prototype toast_get_compression_id.
> But I think we should solve that problem by moving that file to
> toast_compression.c. (I'm OK if you want to keep the files separate,
> or if you want to reverse course and combine them I'm OK with that
> too, but the extra header dependency is clearly a sign of a problem
> with the split.)

Moved to toast_compression.c

> Regarding 0005:
>
> I think ApplyChangesToIndexes() should be renamed to something like
> SetIndexStorageProperties(). It's too generic right now.

Done

> I think 0004 and 0005 should just be merged into 0003. I can't see
> committing them separately. I know I was the one who made you split
> the patch up in the first place, but those patches are quite small and
> simple now, so it makes more sense to me to combine them.

Done

Also added a test case for vacuum full to recompress the data.

One  question, like storage should we apply the alter set compression
changes recursively to the inherited children (I have attached a
separate patch for this )?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: support for MERGE
Next
From: Julien Rouhaud
Date:
Subject: Re: [PATCH] pg_stat_statements dealloc field ignores manual deallocation