Re: pglz compression performance, take two - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: pglz compression performance, take two |
Date | |
Msg-id | EB24F56E-901B-46D6-BFFC-71D51603ABDE@enterprisedb.com Whole thread Raw |
In response to | Re: pglz compression performance, take two (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: pglz compression performance, take two
(Andrey Borodin <x4mmm@yandex-team.ru>)
|
List | pgsql-hackers |
> On Jan 21, 2021, at 6:48 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > @cfbot: rebased > <0001-Reorganize-pglz-compression-code.patch> Review comments. First, I installed a build from master without this patch, created a test installation with lots of compressed text and arraycolumns, upgraded the binaries to a build with this patch included, and tried to find problems with the data left overfrom the pre-patch binaries. Everything checks out. This is on little-endian mac osx intel core i9, not on any ARMplatform that you are targeting with portions of the patch. +/************************************** + * CPU Feature Detection * + **************************************/ +/* PGLZ_FORCE_MEMORY_ACCESS + * By default, access to unaligned memory is controlled by `memcpy()`, which is safe and portable. + * Unfortunately, on some target/compiler combinations, the generated assembly is sub-optimal. + * The below switch allow to select different access method for improved performance. + * Method 0 (default) : use `memcpy()`. Safe and portable. + * Method 1 : direct access. This method is portable but violate C standard. + * It can generate buggy code on targets which assembly generation depends on alignment. + * But in some circumstances, it's the only known way to get the most performance (ie GCC + ARMv6) + * See https://fastcompression.blogspot.fr/2015/08/accessing-unaligned-memory.html for details. + * Prefer these methods in priority order (0 > 1) + */ The link to blogspot.fr has a lot more information than your summary in the code comments. It might be hard to understandthis comment if the blogspot article were ever to disappear. Perhaps you could include a bit more of the relevantdetails? +#ifndef PGLZ_FORCE_MEMORY_ACCESS /* can be defined externally */ +#if defined(__GNUC__) && \ + ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) \ + || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) ) +#define PGLZ_FORCE_MEMORY_ACCESS 1 +#endif +#endif I can understand wanting to set this on gcc + ARMv6, but doesn't this belong in a configure script rather than directly inthe compression code? The blogspot article indicates that the author lied about alignment to the compiler when using gcc on ARMv6, thereby generatinga fast load instruction which happens to work on ARMv6. You appear to be using that same approach. Your #if defined(__GNUC__),seems to assume that all future versions of gcc will generate the instructions that you want, and not startgenerating some other set of instructions. Wouldn't you at least need a configure test to verify that the version ofgcc being used generates the desired assembly? Even then, you'd be banking on gcc doing the same thing for the test codeand for the pglz code, which I guess might not be true. Have you considered using inline assembly instead? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: