On Wed, Jan 08, 2025 at 05:25:24PM -0500, Andres Freund wrote:
> On 2025-01-08 16:01:19 -0600, Nathan Bossart wrote:
>> But I did retry my test from upthread without pg_stat_statements and was
>> surprised to find a reproducible 4-6% regression.
>
> Uh, huh. I assume this was readonly pgbench with 256 clients just as you had
> tested upthread? I don't think there's any hot spinlock meaningfully involved
> in that workload? A r/w workload is a different story, but upthread you
> mentioned select-only.
>
> Do you see any spinlock in profiles?

Yes, this was using 256 clients. Looking closer, I don't see anything
spinlock related anywhere near the top of perf.

>> I'm not seeing any obvious differences in perf, but I do see that the thread
>> for adding TAS_SPIN() for PPC mentions a regression at lower contention
>> levels [0]. Perhaps the non-locked test is failing often enough to hurt
>> performance in this case... Whatever it is, it'll be mighty frustrating to
>> miss out on a >7x gain because of a 4% regression.
>
> I don't think the explanation can be that simple - even with TAS_SPIN defined,
> we do try to acquire the lock once without using TAS_SPIN:
>
> #if !defined(S_LOCK)
> #define S_LOCK(lock) \
> (TAS(lock) ? s_lock((lock), __FILE__, __LINE__, __func__) : 0)
> #endif /* S_LOCK */
>
> Only s_lock() then uses TAS_SPIN(lock).

Ah, right. FWIW I tried setting a cap on the number of times we do a
non-locked test, and the results still showed the regression, which seems
to match your intuition here.
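
As a rough illustration of the idea only, a capped non-locked test could
look something like the sketch below. This is a hypothetical standalone
example using C11 atomics, not the change that was tested and not the code
in s_lock.h; the demo_* names and the cap value are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for this sketch; not PostgreSQL's slock_t. */
typedef atomic_int demo_slock_t;

/* Hypothetical cap on consecutive non-locked tests. */
#define DEMO_MAX_UNLOCKED_TESTS 4

/*
 * Locked test-and-set.  Returns nonzero if the lock was already held
 * (i.e. the acquire failed), mirroring the TAS() convention.
 */
static int
demo_tas(demo_slock_t *lock)
{
	return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/*
 * Test-and-test-and-set with a cap: peek at the lock without a locked
 * instruction at most DEMO_MAX_UNLOCKED_TESTS times.  If it ever looks
 * free, try the locked TAS right away; once the cap is reached, try the
 * locked TAS anyway.
 */
static int
demo_tas_spin_capped(demo_slock_t *lock)
{
	for (int i = 0; i < DEMO_MAX_UNLOCKED_TESTS; i++)
	{
		if (atomic_load_explicit(lock, memory_order_relaxed) == 0)
			return demo_tas(lock);
	}
	return demo_tas(lock);
}

/* Release the lock. */
static void
demo_unlock(demo_slock_t *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
}
```
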

> I wonder if you're hitting an extreme case of binary-layout related effects?
> I've never seen them at this magnitude though. I'd suggest using either lld
> or mold as linker and comparing the numbers for a few
> -Wl,--shuffle-sections=$seed seed values.

Will do.

--
nathan