On Thu, Feb 06, 2025 at 08:44:35AM +0000, Chiranmoy.Bhattacharya@fujitsu.com wrote:
>> Does this hand-rolled loop unrolling offer any particular advantage? What
>> do the numbers look like if we don't do this or if we process, say, 4
>> vectors at a time?
>
> The unrolled version performs better than the non-unrolled one, but
> processing four vectors provides no additional benefit. The numbers
> and code used are given below.

Hm. Any idea why that is? I wonder if the compiler isn't using as many
SVE registers as it could for this.
--
nathan