Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Popcount optimization using AVX512
Msg-id 20240731004959.6ys24432n6xlgemk@awork3.anarazel.de
In response to Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
Hi,

On 2024-07-30 16:32:07 -0500, Nathan Bossart wrote:
> On Tue, Jul 30, 2024 at 02:07:01PM -0700, Andres Freund wrote:
> > Now, a reasonable counter-argument would be that only some of these macros are
> > defined for msvc ([1]).  However, as it turns out, the test is broken
> > today, as msvc doesn't error out when using an intrinsic that's not
> > "available" by the target architecture, it seems to assume that the caller did
> > a cpuid check ahead of time.
> > 
> > 
> > Check out [2], it shows the various predefined macros for gcc, clang and msvc.
> > 
> > 
> > ISTM that the msvc checks for xsave/avx512 being broken should be an open
> > item?
> 
> I'm not following this one.  At the moment, we always do a runtime check
> for the AVX-512 stuff, so in the worst case we'd check CPUID at startup and
> set the function pointers appropriately, right?  We could, of course, still
> fix it, though.

Ah, I somehow thought we'd skip the runtime check when we determine at
compile time that we don't need any extra flags to enable the AVX512 stuff
(similar to how we deal with crc32). But it looks like that's not the case,
which seems pretty odd to me:

This turns something that can be a single instruction into an indirect
function call, even if we could know that it's guaranteed to be available for
the compilation target, due to -march=....

It's one thing for the avx512 path to have that overhead, but it's
particularly absurd for pg_popcount32/pg_popcount64, where

a) the function call overhead is a larger proportion of the cost.
b) the instruction is almost universally available, including in the
   architecture baseline x86-64-v2, which several distros are using as the
   x86-64 baseline.
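
To illustrate the difference, here is a minimal sketch of compile-time
dispatch. It is not the code in pg_bitutils.h; the names popcount64,
popcount64_slow and popcount64_impl are made up for this example. It assumes
gcc or clang, which define __POPCNT__ when the target is known to have the
instruction (e.g. with -mpopcnt or -march=x86-64-v2). msvc does not define
that macro, so it would need a different test.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not PostgreSQL's actual code. */

/* Portable fallback: clear the lowest set bit until none remain. */
static inline int
popcount64_slow(uint64_t word)
{
    int         result = 0;

    while (word != 0)
    {
        word &= word - 1;
        result++;
    }
    return result;
}

#if defined(__POPCNT__)

/*
 * The compilation target guarantees POPCNT, so the builtin can be
 * inlined as a single instruction. No function pointer, no runtime check.
 */
static inline int
popcount64(uint64_t word)
{
    return __builtin_popcountll(word);
}

#else

/*
 * No compile-time guarantee: go through a function pointer. It starts
 * out pointing at the portable version; a runtime CPUID check (not
 * shown) could redirect it to a faster implementation at startup.
 */
static int  (*popcount64_impl) (uint64_t word) = popcount64_slow;

static inline int
popcount64(uint64_t word)
{
    return popcount64_impl(word);
}

#endif
```

With this shape, a build using -march=x86-64-v2 pays nothing for the
dispatch, while a generic build keeps the indirect call.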


Why are we actually checking for xsave? We're not using xsave itself and I
couldn't find a comment in 792752af4eb5 explaining what we're using it as a
proxy for.  Is that just to know if _xgetbv() exists?  Is it actually possible
that xsave isn't available when avx512 is?
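
For reference, a rough sketch of the detection sequence as I understand it
from the CPUID/XGETBV definitions, not a copy of what 792752af4eb5 does. The
helper names (read_xcr0, zmm_state_enabled, avx512_popcnt_usable) are invented
for this example, and it assumes gcc or clang with <cpuid.h>. The OSXSAVE bit
(CPUID leaf 1, ECX bit 27) says whether the OS has enabled XGETBV; XCR0 then
says whether the OS saves the opmask and ZMM register state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; assumes gcc or clang. */

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0 via XGETBV; only valid once OSXSAVE has been confirmed. */
static inline uint64_t
read_xcr0(void)
{
    uint32_t    lo;
    uint32_t    hi;

    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t) hi << 32) | lo;
}

/* Does the OS save and restore the state AVX-512 needs? */
static bool
zmm_state_enabled(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if ((ecx & (1u << 27)) == 0)    /* OSXSAVE: XGETBV is usable */
        return false;

    /* XCR0 bits 1, 2 (SSE, AVX) and 5, 6, 7 (opmask, ZMM) must be set */
    return (read_xcr0() & 0xe6) == 0xe6;
}

/* AVX512F plus AVX512-VPOPCNTDQ, with OS support for the registers. */
static bool
avx512_popcnt_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!zmm_state_enabled())
        return false;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;

    return (ebx & (1u << 16)) != 0 &&   /* AVX512F */
        (ecx & (1u << 14)) != 0;        /* AVX512-VPOPCNTDQ */
}

#else

/* Not x86: nothing to detect. */
static bool
avx512_popcnt_usable(void)
{
    return false;
}

#endif
```

In this sketch the only check made before XGETBV is the OSXSAVE bit, not a
separate xsave feature test. Whether that is all the existing check is meant
to stand in for is exactly what I'm asking.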

Greetings,

Andres Freund