Hi,
On 2025-01-08 16:01:19 -0600, Nathan Bossart wrote:
> On Wed, Jan 08, 2025 at 03:07:24PM -0500, Andres Freund wrote:
> > Out of curiosity, have you measured whether this has a positive effect without
> > pg_stat_statements? I think it'll e.g. also affect lwlocks, as they also use
> > perform_spin_delay().
>
> AFAICT TAS_SPIN() is only used for s_lock(), which doesn't appear to be
> used by LWLocks.
Brainfart on my part, sorry. I was thinking of SPIN_DELAY() for a moment...
> But I did retry my test from upthread without pg_stat_statements and was
> surprised to find a reproducible 4-6% regression.
Uh, huh. I assume this was readonly pgbench with 256 clients just as you had
tested upthread? I don't think there's any hot spinlock meaningfully involved
in that workload? A r/w workload is a different story, but upthread you
mentioned select-only.
Do you see any spinlock in profiles?
> I'm not seeing any obvious differences in perf, but I do see that the thread
> for adding TAS_SPIN() for PPC mentions a regression at lower contention
> levels [0]. Perhaps the non-locked test is failing often enough to hurt
> performance in this case... Whatever it is, it'll be mighty frustrating to
> miss out on a >7x gain because of a 4% regression.
I don't think the explanation can be that simple - even with TAS_SPIN defined,
we do try to acquire the lock once without using TAS_SPIN:
#if !defined(S_LOCK)
#define S_LOCK(lock) \
	(TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
#endif	 /* S_LOCK */
Only s_lock() then uses TAS_SPIN(lock).
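To illustrate the shape of that, here is a rough, simplified sketch using C11 atomics. It is not the actual s_lock.h code: the demo_* names are made up, the real TAS()/TAS_SPIN() are platform-specific, and the real s_lock() adds delays and stuck-lock detection that are omitted here. The point is only that the fast path does one unconditional atomic exchange, and the non-locked read comes into play only once we're already in the slow path.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for slock_t; illustration only. */
typedef atomic_int demo_slock_t;

/* Like TAS(): unconditional atomic exchange, nonzero if the lock was held. */
static inline int
demo_tas(demo_slock_t *lock)
{
	return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Like a TAS_SPIN() with a non-locked test: read the lock word first and
 * only attempt the atomic exchange if the lock looks free.
 */
static inline int
demo_tas_spin(demo_slock_t *lock)
{
	return atomic_load_explicit(lock, memory_order_relaxed) != 0 ||
		demo_tas(lock);
}

/* Slow path: spin with the non-locked test; returns the number of retries. */
static int
demo_s_lock(demo_slock_t *lock)
{
	int			spins = 0;

	while (demo_tas_spin(lock))
		spins++;
	return spins;
}

/* Like S_LOCK(): one plain TAS attempt, then fall back to the slow path. */
static inline int
demo_lock(demo_slock_t *lock)
{
	return demo_tas(lock) ? demo_s_lock(lock) : 0;
}

/* Release the lock. */
static inline void
demo_unlock(demo_slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```

So in the uncontended case demo_lock() never executes the non-locked read at all, which is why I doubt a failing non-locked test explains a regression at low contention.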
I wonder if you're hitting an extreme case of binary-layout related effects?
I've never seen them at this magnitude though. I'd suggest using either lld
or mold as linker and comparing the numbers for a few
-Wl,--shuffle-sections=$seed seed values.
Greetings,
Andres Freund