Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Popcount optimization using AVX512
Date
Msg-id 20240731014651.byn3exbu6n5v2yqa@awork3.anarazel.de
Whole thread Raw
In response to Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Popcount optimization using AVX512
List pgsql-hackers
Hi,

On 2024-07-30 20:20:34 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 05:49:59PM -0700, Andres Freund wrote:
> > Ah, I somehow thought we'd avoid the runtime check in case we determine at
> > compile time we don't need any extra flags to enable the AVX512 stuff (similar
> > to how we deal with crc32). But it looks like that's not the case - which
> > seems pretty odd to me:
> > 
> > This turns something that can be a single instruction into an indirect
> > function call, even if we could know that it's guaranteed to be available for
> > the compilation target, due to -march=....
> > 
> > It's one thing for the avx512 path to have that overhead, but it's
> > particularly absurd for pg_popcount32/pg_popcount64, where
> > 
> > a) The function call overhead is a larger proportion of the cost.
> > b) the instruction is almost universally available, including in the
> >    architecture baseline x86-64-v2, which several distros are using as the
> >    x86-64 baseline.
> 
> Yeah, pg_popcount32/64 have been doing this since v12 (02a6a54).  Until v17
> (cc4826d), pg_popcount() repeatedly calls these function pointers, too.  I
> think it'd be awesome if we could start requiring some of these "almost
> universally available" instructions, but AFAICT that brings its own
> complexity [0].

I'll respond there...


> > Why are we actually checking for xsave? We're not using xsave itself and I
> > couldn't find a comment in 792752af4eb5 explaining what we're using it as a
> > proxy for?  Is that just to know if _xgetbv() exists?  Is it actually possible
> > that xsave isn't available when avx512 is?
> 
> Yes, it's to verify we have XGETBV, which IIUC requires support from both
> the processor and the OS (see 598e011 and upthread discussion).  AFAIK the
> way we are detecting AVX-512 support is quite literally by-the-book unless
> I've gotten something wrong.

I'm basically wondering whether we need to check for compiler (not OS support)
support for xsave if we also check for -mavx512vpopcntdq -mavx512bw
support. Afaict the latter implies support for xsave.

andres@alap6:~$ echo|gcc -c - -march=x86-64 -xc -dM -E  - -o -|grep '__XSAVE__'
andres@alap6:~$ echo|gcc -c - -march=x86-64 -mavx512vpopcntdq -mavx512bw -xc -dM -E  - -o -|grep '__XSAVE__'
#define __XSAVE__ 1
#define __XSAVE__ 1

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Do we still need parent column in pg_backend_memory_context?
Next
From: Nathan Bossart
Date:
Subject: Re: Popcount optimization using AVX512