Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From	Ants Aasma
Subject	Re: Popcount optimization using AVX512
Date	April 5, 2024 07:33:27
Msg-id	CANwKhkMQtZCxa+nq=9QAoT6rgSQ48cVpH83tO3Md+-ck4bVz2w@mail.gmail.com
In response to	Re: Popcount optimization using AVX512 (Nathan Bossart <nathandbossart@gmail.com>)
Responses	Re: Popcount optimization using AVX512
List	pgsql-hackers

On Fri, 5 Apr 2024 at 07:15, Nathan Bossart <nathandbossart@gmail.com> wrote:
> Here is an updated patch set.  IMHO this is in decent shape and is
> approaching committable.

I checked the code generation on various gcc and clang versions. It
looks mostly fine starting from the first versions that support
AVX-512, gcc-7.1 and clang-5.

The main issue I saw was that clang was able to peel off the first
iteration of the loop, which let it eliminate the mask assignment
inside the loop and replace the masked load with a memory operand for
vpopcntq. I was not able to convince gcc to do that regardless of
optimization options. Generated code for the inner loop:

clang:
<L2>:
      50:      add rdx, 64
      54:      cmp rdx, rdi
      57:      jae <L1>
      59:      vpopcntq zmm1, zmmword ptr [rdx]
      5f:      vpaddq zmm0, zmm1, zmm0
      65:      jmp <L2>

gcc:
<L1>:
      38:      kmovq k1, rdx
      3d:      vmovdqu8 zmm0 {k1} {z}, zmmword ptr [rax]
      43:      add rax, 64
      47:      mov rdx, -1
      4e:      vpopcntq zmm0, zmm0
      54:      vpaddq zmm0, zmm0, zmm1
      5a:      vmovdqa64 zmm1, zmm0
      60:      cmp rax, rsi
      63:      jb <L1>

I'm not sure how much that matters in practice. Attached is a patch
that does the peeling manually, giving essentially the same result in
gcc. As most distro packages are built using gcc, I think it would
make sense to have the extra code if it gives a noticeable benefit for
large inputs.
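
To illustrate the shape of the transformation, here is a minimal sketch of a
manually peeled loop. It is not the attached patch and not the code from the
patch set under review: the function names (popcount_avx512_peeled,
popcount_scalar, popcount_bytes) are hypothetical, and the layout is
simplified by assuming the partial chunk sits at the front of the buffer and
that unaligned loads are acceptable. The only point is that the mask is used
once, before the loop, so the loop body can be a plain load (or memory
operand) plus vpopcntq and vpaddq.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX512_SKETCH 1

/* Hypothetical helper: count set bits with the first iteration peeled. */
__attribute__((target("avx512f,avx512bw,avx512vpopcntdq")))
static uint64_t
popcount_avx512_peeled(const char *buf, size_t bytes)
{
	const char *end = buf + bytes;
	size_t		head = bytes % 64;	/* size of the partial first chunk */
	__m512i		accum = _mm512_setzero_si512();

	if (head != 0)
	{
		/* Peeled first iteration: the only place a mask is needed. */
		__mmask64	mask = ~UINT64_C(0) >> (64 - head);
		__m512i		val = _mm512_maskz_loadu_epi8(mask, buf);

		accum = _mm512_popcnt_epi64(val);
		buf += head;
	}

	/* Steady state: full 64-byte chunks, no mask register involved. */
	for (; buf < end; buf += 64)
	{
		__m512i		val = _mm512_loadu_si512((const void *) buf);

		accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
	}

	return (uint64_t) _mm512_reduce_add_epi64(accum);
}
#endif

/* Portable byte-at-a-time fallback, also useful as a reference result. */
static uint64_t
popcount_scalar(const char *buf, size_t bytes)
{
	uint64_t	total = 0;

	for (size_t i = 0; i < bytes; i++)
	{
		unsigned char b = (unsigned char) buf[i];

		for (; b != 0; b &= (unsigned char) (b - 1))
			total++;			/* clears the lowest set bit each pass */
	}
	return total;
}

/* Hypothetical dispatcher: use the AVX-512 path only when the CPU has it. */
static uint64_t
popcount_bytes(const char *buf, size_t bytes)
{
#ifdef HAVE_AVX512_SKETCH
	if (__builtin_cpu_supports("avx512f") &&
		__builtin_cpu_supports("avx512bw") &&
		__builtin_cpu_supports("avx512vpopcntdq"))
		return popcount_avx512_peeled(buf, bytes);
#endif
	return popcount_scalar(buf, bytes);
}
```

Whether gcc then emits the same inner loop as clang for this sketch would
need to be checked in the generated assembly; the sketch only shows the
source-level structure.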

The visibility map patch has the same issue, otherwise looks good.

Regards,
Ants Aasma

Attachment

avx512-peel-first-iteration.patch
