Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: Improve CRC32C performance on SSE4.2 |
Date | |
Msg-id | CANWCAZbG_pPFuLRaczYApBykaHCXQ-STAXO1=J6_4TRvy23Byg@mail.gmail.com Whole thread Raw |
In response to | RE: Improve CRC32C performance on SSE4.2 ("Devulapalli, Raghuveer" <raghuveer.devulapalli@intel.com>) |
List | pgsql-hackers |
On Wed, Feb 26, 2025 at 7:21 AM Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> wrote: > > > I agree it would be preferable to make a centralized check work. > > Here is my first stab at it. v9 is same as v8 + a commit to move all cpuid checks into one single place including the AVX512popcount code. Any new feature that requires CPUID information can access that information with pg_cpu_have[FEATURE]defined in pg_cpucap.h and initialized in pg_cpucap.c. v8 also had a typo in configure files which causeda build failure. Its fixed in v9. > > Pretty sure the ARM code paths need some correction. Let me know what you think. Thanks, I think this is a good direction. Some comments: +static void pg_cpuid(int leaf, int subleaf, unsigned int* exx) +{ +#if defined(HAVE__GET_CPUID_COUNT) + __get_cpuid_count(leaf, subleaf, &exx[0], &exx[1], &exx[2], &exx[3]); +#elif defined(HAVE__CPUIDEX) + __cpuidex(exx, leaf, subleaf); +#else +#error cpuid instruction not available +#endif +} Our current configure still looks for __get_cpuid and __cpuid. We committed checking these new ones fairly recently, and they were further gated by USE_AVX512_POPCNT_WITH_RUNTIME_CHECK. It seems like here we should do something like the following, where "+" lines are from the patch and other lines are mine: +static void +pg_cpucap_x86(void) +{ + unsigned int exx[4] = {0, 0, 0, 0}; #if defined(HAVE__GET_CPUID) __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUID) __cpuid(exx, 1); #endif + pg_cpucap[PG_CPU_FEATURE_SSE42] = (exx[2] & (1 << 20)) != 0; + pg_cpucap[PG_CPU_FEATURE_PCLMUL] = (exx[2] & (1 << 1)) != 0; + pg_cpucap[PG_CPU_FEATURE_POPCNT] = (exx[2] & (1 << 23)) != 0; + /* osxsave */ + if ((exx[2] & (1 << 27)) == 0) { + return; + } + /* avx512 os support */ + if (zmm_regs_available()) { + return; + } // BTW, I do like the gating here that reduces the number of places that have to know about zmm and xsave. (Side note: shouldn't that be "if !(zmm_regs_available())"?) + /* reset for second cpuid call on leaf 7 to check extended avx512 support */ exx[4] = {0, 0, 0, 0}; #if defined(HAVE__GET_CPUID_COUNT) __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUIDEX) __cpuidex(exx, 7, 0); #endif + pg_cpucap[PG_CPU_FEATURE_AVX512F] = (exx[1] & (1 << 16)) != 0; + pg_cpucap[PG_CPU_FEATURE_AVX512BW] = (exx[1] & (1 << 30)) != 0; + pg_cpucap[PG_CPU_FEATURE_AVX512VPOPCNTDQ] = (exx[2] & (1 << 14)) != 0; + +} What do you think? -#define PGCPUCAP_INIT (1 << 0) -#define PGCPUCAP_POPCNT (1 << 1) -#define PGCPUCAP_VPOPCNT (1 << 2) -#define PGCPUCAP_CRC32C (1 << 3) -#define PGCPUCAP_CLMUL (1 << 4) +enum pg_cpucap__ +{ + PG_CPU_FEATURE_INIT = 0, + // X86 + PG_CPU_FEATURE_SSE42 = 1, + PG_CPU_FEATURE_POPCNT = 2, + PG_CPU_FEATURE_PCLMUL = 3, [...] + PG_CPU_FEATURE_ARMV8_CRC32C = 100, I'm not a fan of exposing these architecture-specific details to places that consult the capabilities. That requires things like +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42) [...] +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C) I'd prefer to have 1 capability <-> one place to check at runtime for architectures that need that, and to keep architecture details private to the initialization step. . Even for things that test for which function pointer to use, I think it's a cleaner interface to look at one thing. +static void +pg_cpucap_arch() +{ + /* WIP: configure checks */ +#ifdef __x86_64__ + pg_cpucap_x86(); +#else // ARM: + pg_cpucap_arm(); +#endif +} If we're going to have a single file for the init step, we don't need this -- we'd just have a different definition of pg_cpucap_initialize() in each part, with a default that only adds the "init" slot: #if defined( __i386__ ) || defined(i386) || defined(_M_IX86) || defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64) <cpuid checks> #elif defined(__arm__) || defined(__arm) || defined(__aarch64__) || defined(_M_ARM64) <for now only pg_crc32c_armv8_available()> #else <only init> #endif -- John Naylor Amazon Web Services
pgsql-hackers by date: