Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Popcount optimization using AVX512
Date
Msg-id 20240418210158.GA3776258@nathanxps13
Whole thread Raw
In response to Re: Popcount optimization using AVX512  (David Rowley <dgrowleyml@gmail.com>)
Responses RE: Popcount optimization using AVX512
List pgsql-hackers
On Thu, Apr 18, 2024 at 08:24:03PM +0000, Devulapalli, Raghuveer wrote:
>> This seems to contradict the note about doing step 3 at any point, and
>> given step 1 is the OSXSAVE check, I'm not following what this means,
>> anyway.
> 
> It is recommended that you run the xgetbv code before you check for cpu
> features avx512-popcnt and avx512-bw. The way it is written now is the
> opposite order. I would also recommend splitting the cpuid feature check
> for avx512popcnt/avx512bw and xgetbv section into separate functions to
> make them modular. Something like:
> 
> static inline
> int check_os_avx512_support(void)
> {
>     // (1) run cpuid leaf 1 to check for xgetbv instruction support:
>         unsigned int exx[4] = {0, 0, 0, 0};
>         __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
>         if ((exx[2] & (1 << 27)) == 0)  /* xsave */
>                 return false;
> 
>         /* Does XGETBV say the ZMM/YMM/XMM registers are enabled? */
>         return (_xgetbv(0) & 0xe0) == 0xe0;
> }
> 
>> I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6
>> instead of just (_xgetbv(0) & 0xe0) != 0, as the status of the lower
>> half of some of the ZMM registers is stored in the SSE and AVX state
>> [0].  I don't know how likely it is that 0xe0 would succeed but 0xe6
>> wouldn't, but we might as well make it correct.
> 
> This is correct. It needs to check all the 3 bits (XMM/YMM and ZMM). The
> way it is written is now is in-correct. 

Thanks for the feedback.  I've attached an updated patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Cannot find a working 64-bit integer type on Illumos
Next
From: Nathan Bossart
Date:
Subject: Re: improve performance of pg_dump --binary-upgrade