RE: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From Devulapalli, Raghuveer
Subject RE: Improve CRC32C performance on SSE4.2
Date
Msg-id PH8PR11MB8286B017901C2C30665B106CFBC22@PH8PR11MB8286.namprd11.prod.outlook.com
Whole thread Raw
In response to Re: Improve CRC32C performance on SSE4.2  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Improve CRC32C performance on SSE4.2
List pgsql-hackers
Attached v10 with minor changes based on the feedback. 

> Thanks, I think this is a good direction. Some comments:
> 
> #if defined(HAVE__GET_CPUID)
> __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUID)
> __cpuid(exx, 1); #endif

Oh yeah good catch. Fixed in v10. 

> +    }
> +    /* avx512 os support */
> +    if (zmm_regs_available()) {
> +        return;
> +    }
> 
> // BTW, I do like the gating here that reduces the number of places that have to
> know about zmm and xsave. (Side note: shouldn't that be "if
> !(zmm_regs_available())"?)

Yup, good catch again 😊 

> What do you think?

Yup, those look correct to me. Fixed them in v10. 

> +    PG_CPU_FEATURE_PCLMUL            = 3,
> [...]
> +    PG_CPU_FEATURE_ARMV8_CRC32C      = 100,
> 
> I'm not a fan of exposing these architecture-specific details to places that consult
> the capabilities. That requires things like
> 
> +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)
> [...]
> +#define PGCPUCAP_CRC32C
> pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C)
> 
> I'd prefer to have 1 capability <-> one place to check at runtime for architectures
> that need that, and to keep architecture details private to the initialization step. .
> Even for things that test for which function pointer to use, I think it's a cleaner
> interface to look at one thing.

Isn't that one thing currently pg_cpu_have(FEATURE)? The reason we have the additional PGCPUCAP_CRC32C is because the
dispatchfor CRC32C is currently defined in the header file common to both ARM and SSE42. We should ideally have
separatedispatch for ARM and x86 in their own files (the way it is in the main branch). This also makes it easier to
addmore implementations in the future without having to make the dispatch function work for both ARM and x86. 
 

> If we're going to have a single file for the init step, we don't need this -- we'd just
> have a different definition of
> pg_cpucap_initialize() in each part, with a default that only adds the "init" slot:
> 
> #if defined( __i386__ ) || defined(i386) || defined(_M_IX86) ||
>   defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64)
> 
> <cpuid checks>
> 
> #elif defined(__arm__) || defined(__arm) ||
>    defined(__aarch64__) || defined(_M_ARM64)
> 
> <for now only pg_crc32c_armv8_available()>
> 
> #else
> <only init>
> #endif

Makes sense. Got rid of it in v10. 

Raghuveer 

Attachment

pgsql-hackers by date:

Previous
From: Dmytro Astapov
Date:
Subject: Re: Relaxing constraints on BitmapAnd eligibility?
Next
From: Robert Haas
Date:
Subject: Re: Statistics Import and Export