On Fri, Oct 3, 2025 at 10:48 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I quickly hacked together some patches for this. 0001 adds new static
> variables so that we have a separate array of the buffers and the index for
> the current ReservedRefCountEntry. 0002 optimizes the linear search in
> GetPrivateRefCountEntry() using our simd.h routines. This stuff feels
> expensive (see vector8_highbit_mask()'s implementation for AArch64), but if
> the main goal is to avoid branches, I think this is about as "branchless"
> as we can make it. I'm going to stare at this a bit longer, but I figured
> I'd get something on the lists while it is fresh in my mind.
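
(For anyone following along: below is a rough, portable sketch of the general
idea as I understand it from the description above. It is not the patch. The
names SKETCH_ENTRIES and sketch_find_slot are hypothetical, the array size of 8
is an assumption, and plain C stands in for the simd.h routines. Every slot is
compared with no early exit, the results are folded into a bitmask, and the
first match is read back out of the mask.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the small private array; an assumption for this sketch. */
#define SKETCH_ENTRIES 8

/*
 * Return the index of the first slot equal to "target", or -1 if none matches.
 * The comparison loop always visits every slot, so it has no data-dependent
 * early exit and a compiler is free to vectorize it.
 */
static int
sketch_find_slot(const int32_t *buffers, int32_t target)
{
    uint32_t    mask = 0;
    int         idx = 0;

    assert(buffers != NULL);

    /* Set bit i when slot i matches; no branch on the comparison result. */
    for (int i = 0; i < SKETCH_ENTRIES; i++)
        mask |= (uint32_t) (buffers[i] == target) << i;

    if (mask == 0)
        return -1;

    /* Locate the lowest set bit (a count-trailing-zeros, written portably). */
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        idx++;
    }
    return idx;
}
```

A real SIMD version would replace the comparison loop with vector compares and
a "movemask"-style reduction, which is presumably where the AArch64 cost
mentioned above comes from.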

I was unable to notice any improvements in any of the microbenchmarks
that I've been using to test the index prefetching patch set. For
whatever reason, these test cases are neither improved nor regressed
by your patch series.

I've never really played around with SIMD before. Is the precise CPU
microarchitecture relevant? Are power management settings important?

--
Peter Geoghegan