Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CA+TgmoZ50fvbfhoqXZWAnrFx+_7qOiPE7xNXfat0h7awbm_nxw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
Some more review comments: 'git am' barfs on v0001 because it's got a whitespace error. VARFLAGS_4B_C() doesn't seem to be used in any of the patches. I'm OK with keeping it even if it's not used just because maybe someone will need it later but, uh, don't we need to use it someplace? To avoid moving the goalposts for a basic install, I suggest that --with-lz4 should default to disabled. Maybe we'll want to rethink that at some point, but since we're just getting started with this whole thing, I don't think now is the time. The change to ddl.sgml doesn't seem to make sense to me. There might be someplace where we want to explain how properties are inherited in partitioning hierarchies, but I don't think this is the right place, and I don't think this explanation is particularly clear. + This clause adds the compression method to a column. The Compression + method can be set from available compression methods. The built-in + methods are <literal>pglz</literal> and <literal>lz4</literal>. + If no compression method is specified, then compressible types will have + the default compression method <literal>pglz</literal>. Suggest: This sets the compression method for a column. The supported compression methods are <literal>pglz</literal> and <literal>lz4</literal>. <literal>lz4</literal> is available only if <literal>--with-lz4</literal> was used when building <productname>PostgreSQL</productname>. The default is <literal>pglz</literal>. We should make sure, if you haven't already, that trying to create a column with LZ4 compression fails at table creation time if the build does not support LZ4. But, someone could also create a table using a build that has LZ4 support and then switch to a different set of binaries that do not have it, so we need the runtime checks also. However, those runtime checks shouldn't fail simplify from trying to access a table that is set to use LZ4 compression; they should only fail if we actually need to decompress an LZ4'd value. Since indexes don't have TOAST tables, it surprises me that brin_form_tuple() thinks it can TOAST anything. But I guess that's not this patch's problem, if it's a problem at all. I like the fact that you changed the message "compressed data is corrupt" to indicate the compression method, but I think the resulting message doesn't follow style guidelines because I don't believe we normally put something with a colon prefix at the beginning of a primary error message. So instead of saying "pglz: compressed data is corrupt" I think you should say something like "compressed pglz data is corrupt". Also, I suggest that we take this opportunity to switch to ereport() rather than elog() and set errcode(ERRCODE_DATA_CORRUPTED). What testing have you done for performance impacts? Does the patch slow things down noticeably with pglz? (Hopefully not.) Can you measure a performance improvement with pglz? (Hopefully so.) Is it likely to hurt performance that there's no minimum size for lz4 compression as we have for pglz? Seems like that could result in a lot of wasted cycles trying to compress short strings. pglz_cmcompress() cancels compression if the resulting value would be larger than the original one, but it looks like lz4_cmcompress() will just store the enlarged value. That seems bad. pglz_cmcompress() doesn't need to pfree(tmp) before elog(ERROR). CompressionOidToId(), CompressionIdToOid() and maybe other places need to remember the message style guidelines. Primary error messages are not capitalized. Why should we now have to include toast_internals.h in reorderbuffer.c, which has no other changes? That definitely shouldn't be necessary. If something in another header file now requires something from toast_internals.h, then that header file would be obliged to include toast_internals.h itself. But actually that shouldn't happen, because the whole point of toast_internals.h is that it should not be included in very many places at all. If we're adding stuff there that is going to be broadly needed, we're adding it in the wrong place. varlena.c shouldn't need toast_internals.h either, and if it did, it should be in alphabetical order. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: