Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-vZHv=MKADfi75cgvS48889n+K-fBP0i2HCDPa1bwfu6Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Custom compression methods  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Feb 2, 2021 at 2:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Some more review comments:
>
> 'git am' barfs on v0001 because it's got a whitespace error.

Fixed

> VARFLAGS_4B_C() doesn't seem to be used in any of the patches. I'm OK
> with keeping it even if it's not used just because maybe someone will
> need it later but, uh, don't we need to use it someplace?

Actually I was using TOAST_COMPRESS_METHOD and that required inclusion
of toast_internal.h so now
I have used VARFLAGS_4B_C and with that we are able to remove the
inclusion of toast_internal.h
in unwanted places.

> To avoid moving the goalposts for a basic install, I suggest that
> --with-lz4 should default to disabled. Maybe we'll want to rethink
> that at some point, but since we're just getting started with this
> whole thing, I don't think now is the time.

Done

> The change to ddl.sgml doesn't seem to make sense to me. There might
> be someplace where we want to explain how properties are inherited in
> partitioning hierarchies, but I don't think this is the right place,
> and I don't think this explanation is particularly clear.

Not yet done, I thought at the same place we are describing the
storage relationship with the partition so that is the place for the
compression also.  Maybe I will have to read ddl.sgml file and find
out the most suitable place.  So I kept is as pending.

> +      This clause adds the compression method to a column.  The Compression
> +      method can be set from available compression methods.  The built-in
> +      methods are <literal>pglz</literal> and <literal>lz4</literal>.
> +      If no compression method is specified, then compressible types will have
> +      the default compression method <literal>pglz</literal>.
>
> Suggest: This sets the compression method for a column. The supported
> compression methods are <literal>pglz</literal> and
> <literal>lz4</literal>. <literal>lz4</literal> is available only if
> <literal>--with-lz4</literal> was used when building
> <productname>PostgreSQL</productname>. The default is
> <literal>pglz</literal>.

Done

> We should make sure, if you haven't already, that trying to create a
> column with LZ4 compression fails at table creation time if the build
> does not support LZ4. But, someone could also create a table using a
> build that has LZ4 support and then switch to a different set of
> binaries that do not have it, so we need the runtime checks also.
> However, those runtime checks shouldn't fail simplify from trying to
> access a table that is set to use LZ4 compression; they should only
> fail if we actually need to decompress an LZ4'd value.

Done,  I have cheched the compression method Oid if it LZ4 and if the
lz4 library is not install
then error out.  We can also use the handler sepcific check function
but I am not sure does that make
sense to add extra routine for that.  In later patch 0006 we have an
check function to verify the
option so during that we can error out and no need to check this outside.

> Since indexes don't have TOAST tables, it surprises me that
> brin_form_tuple() thinks it can TOAST anything. But I guess that's not
> this patch's problem, if it's a problem at all.

it is just trying to compress it not externalize.

> I like the fact that you changed the message "compressed data is
> corrupt" to indicate the compression method, but I think the resulting
> message doesn't follow style guidelines because I don't believe we
> normally put something with a colon prefix at the beginning of a
> primary error message. So instead of saying "pglz: compressed data is
> corrupt" I think you should say something like "compressed pglz data
> is corrupt". Also, I suggest that we take this opportunity to switch
> to ereport() rather than elog() and set
> errcode(ERRCODE_DATA_CORRUPTED).

Done

>
> What testing have you done for performance impacts? Does the patch
> slow things down noticeably with pglz? (Hopefully not.) Can you
> measure a performance improvement with pglz? (Hopefully so.) Is it
> likely to hurt performance that there's no minimum size for lz4
> compression as we have for pglz? Seems like that could result in a lot
> of wasted cycles trying to compress short strings.

Not sure what to do about this,  I will check the performance with
small varlenas and see.

> pglz_cmcompress() cancels compression if the resulting value would be
> larger than the original one, but it looks like lz4_cmcompress() will
> just store the enlarged value. That seems bad.

you mean lz4_cmcompress, Done

> pglz_cmcompress() doesn't need to pfree(tmp) before elog(ERROR).

Done

> CompressionOidToId(), CompressionIdToOid() and maybe other places need
> to remember the message style guidelines. Primary error messages are
> not capitalized.

Fixed

> Why should we now have to include toast_internals.h in
> reorderbuffer.c, which has no other changes? That definitely shouldn't
> be necessary. If something in another header file now requires
> something from toast_internals.h, then that header file would be
> obliged to include toast_internals.h itself. But actually that
> shouldn't happen, because the whole point of toast_internals.h is that
> it should not be included in very many places at all. If we're adding
> stuff there that is going to be broadly needed, we're adding it in the
> wrong place.

Done

> varlena.c shouldn't need toast_internals.h either, and if it did, it
> should be in alphabetical order.
>

It was the wrong usage, fixed now.

Please refer to the latest patch at

https://www.postgresql.org/message-id/CAFiTN-v9Cs1MORnp-3bGZ5QBwr5v3VarSvfaDizHi1acXES5xQ%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] GSoC 2017: Foreign Key Arrays