Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Popcount optimization using AVX512
Msg-id 20240212205507.GB1815383@nathanxps13
In response to Re: Popcount optimization using AVX512  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Sat, Feb 10, 2024 at 03:52:38PM -0800, Noah Misch wrote:
> On Fri, Feb 09, 2024 at 08:33:23PM -0800, Andres Freund wrote:
>> My understanding is that the ifunc mechanism just avoids the need for repeated
>> indirect calls/jumps to implement a single function call, not the use of
>> indirect function calls at all. Calls into shared libraries, like libc, are
>> indirected via the GOT / PLT, i.e. an indirect function call/jump.  Without
>> ifuncs, the target of the function call would then have to dispatch to the
>> resolved function. Ifuncs avoid this repeated dispatch by moving the
>> dispatch to the dynamic linker stage, modifying the contents of the GOT/PLT to
>> point to the right function. Thus ifuncs are an optimization when calling a
>> function in a shared library that's then dispatched depending on the cpu
>> capabilities.
>> 
>> However, in our case, where the code is in the same binary, function calls
>> implemented in the main binary directly (possibly via a static library) don't
>> go through GOT/PLT. In such a case, use of ifuncs turns a normal direct
>> function call into one going through the GOT/PLT, i.e. makes it indirect. The
>> same is true for calls within a shared library if either explicit symbol
>> visibility is used, or -symbolic, -Wl,-Bsymbolic or such is used. Therefore
>> there's no efficiency gain of ifuncs over a call via function pointer.
>> 
>> 
>> This isn't because ifunc is implemented badly or something - the reason for
>> this is that dynamic relocations aren't typically implemented by patching all
>> callsites (".text relocations"), which is what you would need to avoid the
>> need for an indirect call to something that fundamentally cannot be a constant
>> address at link time. The reason text relocations are disfavored is that
>> they can make program startup quite slow, that they require allowing
>> modifications to executable pages which are disliked due to the security
>> implications, and that they make the code non-shareable, as the in-memory
>> executable code has to differ from the on-disk code.
>> 
>> 
>> I actually think ifuncs within the same binary are a tad *slower* than plain
>> function pointer calls, unless -fno-plt is used. Without -fno-plt, an ifunc is
>> called by 1) a direct call into the PLT, 2) loading the target address from
>> the GOT, 3) making an indirect jump to that address.  Whereas a "plain
>> indirect function call" is just 1) loading the target address from a variable,
>> 2) making an indirect jump to that address. With -fno-plt the callsites themselves load
>> the address from the GOT.
> 
> That sounds more accurate than what I wrote.  Thanks.

+1, thanks for the detailed explanation, Andres.
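
For anyone following along, here is a minimal sketch of the two dispatch
styles being compared.  All names are hypothetical and this is not the code
from the patch.  It assumes GCC or Clang for the builtins; the "fast" variant
is only a stand-in for a CPU-specific implementation.  The ifunc variant is
compiled only if USE_IFUNC_SKETCH is defined, since it needs an ELF target
and toolchain support.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount64_slow(uint64_t x)
{
    int n = 0;

    while (x)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Stand-in for a CPU-specific implementation (assumption: GCC/Clang). */
static int popcount64_fast(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_slow(x);
#endif
}

/* Runtime CPU check; reports "no" where the builtin is unavailable. */
static int cpu_has_fast_popcount(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("popcnt") != 0;
#else
    return 0;
#endif
}

/*
 * Style 1: plain function pointer.  The pointer initially targets a
 * chooser that replaces it on first use, so every later call is
 * "load address from variable, indirect jump".
 */
static int popcount64_choose(uint64_t x);

int (*popcount64_ptr)(uint64_t) = popcount64_choose;

static int popcount64_choose(uint64_t x)
{
    popcount64_ptr = cpu_has_fast_popcount() ? popcount64_fast
                                             : popcount64_slow;
    return popcount64_ptr(x);
}

#ifdef USE_IFUNC_SKETCH
/*
 * Style 2: ifunc.  The dynamic linker runs the resolver once and stores
 * the result in the GOT, so callers in the same binary still reach the
 * function indirectly (via the PLT, or via the GOT with -fno-plt).
 */
static int (*resolve_popcount64(void))(uint64_t)
{
    __builtin_cpu_init();   /* resolvers can run before constructors */
    return cpu_has_fast_popcount() ? popcount64_fast : popcount64_slow;
}

int popcount64_ifunc(uint64_t x)
    __attribute__((ifunc("resolve_popcount64")));
#endif
```

Callers would write popcount64_ptr(word) in the first style and
popcount64_ifunc(word) in the second.  Either way the call site ends up
making an indirect jump, which is the point of the explanation above.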

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


