Attached v10 with minor changes based on the feedback.
> Thanks, I think this is a good direction. Some comments:
>
> #if defined(HAVE__GET_CPUID)
> __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); #elif defined(HAVE__CPUID)
> __cpuid(exx, 1); #endif
Oh yeah good catch. Fixed in v10.
> + }
> + /* avx512 os support */
> + if (zmm_regs_available()) {
> + return;
> + }
>
> // BTW, I do like the gating here that reduces the number of places that have to
> know about zmm and xsave. (Side note: shouldn't that be "if
> !(zmm_regs_available())"?)
Yup, good catch again 😊
> What do you think?
Yup, those look correct to me. Fixed them in v10.
> + PG_CPU_FEATURE_PCLMUL = 3,
> [...]
> + PG_CPU_FEATURE_ARMV8_CRC32C = 100,
>
> I'm not a fan of exposing these architecture-specific details to places that consult
> the capabilities. That requires things like
>
> +#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)
> [...]
> +#define PGCPUCAP_CRC32C
> pg_cpu_have(PG_CPU_FEATURE_ARMV8_CRC32C)
>
> I'd prefer to have 1 capability <-> one place to check at runtime for architectures
> that need that, and to keep architecture details private to the initialization step. .
> Even for things that test for which function pointer to use, I think it's a cleaner
> interface to look at one thing.
Isn't that one thing currently pg_cpu_have(FEATURE)? The reason we have the additional PGCPUCAP_CRC32C is because the
dispatchfor CRC32C is currently defined in the header file common to both ARM and SSE42. We should ideally have
separatedispatch for ARM and x86 in their own files (the way it is in the main branch). This also makes it easier to
addmore implementations in the future without having to make the dispatch function work for both ARM and x86.
> If we're going to have a single file for the init step, we don't need this -- we'd just
> have a different definition of
> pg_cpucap_initialize() in each part, with a default that only adds the "init" slot:
>
> #if defined( __i386__ ) || defined(i386) || defined(_M_IX86) ||
> defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64)
>
> <cpuid checks>
>
> #elif defined(__arm__) || defined(__arm) ||
> defined(__aarch64__) || defined(_M_ARM64)
>
> <for now only pg_crc32c_armv8_available()>
>
> #else
> <only init>
> #endif
Makes sense. Got rid of it in v10.
Raghuveer