Re: pglz compression performance, take two - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: pglz compression performance, take two
Date
Msg-id EB24F56E-901B-46D6-BFFC-71D51603ABDE@enterprisedb.com
Whole thread Raw
In response to Re: pglz compression performance, take two  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: pglz compression performance, take two  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers

> On Jan 21, 2021, at 6:48 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> @cfbot: rebased
> <0001-Reorganize-pglz-compression-code.patch>

Review comments.

First, I installed a build from master without this patch, created a test installation with lots of compressed text and
arraycolumns, upgraded the binaries to a build with this patch included, and tried to find problems with the data left
overfrom the pre-patch binaries.  Everything checks out.  This is on little-endian mac osx intel core i9, not on any
ARMplatform that you are targeting with portions of the patch. 

+/**************************************
+ *  CPU Feature Detection            *
+ **************************************/
+/* PGLZ_FORCE_MEMORY_ACCESS
+ * By default, access to unaligned memory is controlled by `memcpy()`, which is safe and portable.
+ * Unfortunately, on some target/compiler combinations, the generated assembly is sub-optimal.
+ * The below switch allow to select different access method for improved performance.
+ * Method 0 (default) : use `memcpy()`. Safe and portable.
+ * Method 1 : direct access. This method is portable but violate C standard.
+ *         It can generate buggy code on targets which assembly generation depends on alignment.
+ *         But in some circumstances, it's the only known way to get the most performance (ie GCC + ARMv6)
+ * See https://fastcompression.blogspot.fr/2015/08/accessing-unaligned-memory.html for details.
+ * Prefer these methods in priority order (0 > 1)
+ */

The link to blogspot.fr has a lot more information than your summary in the code comments.  It might be hard to
understandthis comment if the blogspot article were ever to disappear.  Perhaps you could include a bit more of the
relevantdetails? 

+#ifndef PGLZ_FORCE_MEMORY_ACCESS   /* can be defined externally */
+#if defined(__GNUC__) && \
+  ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) \
+  || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
+#define PGLZ_FORCE_MEMORY_ACCESS 1
+#endif
+#endif

I can understand wanting to set this on gcc + ARMv6, but doesn't this belong in a configure script rather than directly
inthe compression code? 

The blogspot article indicates that the author lied about alignment to the compiler when using gcc on ARMv6, thereby
generatinga fast load instruction which happens to work on ARMv6.  You appear to be using that same approach.  Your #if
defined(__GNUC__),seems to assume that all future versions of gcc will generate the instructions that you want, and not
startgenerating some other set of instructions.  Wouldn't you at least need a configure test to verify that the version
ofgcc being used generates the desired assembly?  Even then, you'd be banking on gcc doing the same thing for the test
codeand for the pglz code, which I guess might not be true.  Have you considered using inline assembly instead? 



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] ProcessInterrupts_hook
Next
From: Peter Eisentraut
Date:
Subject: Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch