On Fri, Mar 22, 2024 at 12:09 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> > If this were "<=" then for long arrays we could assume there is
> > always more than one block, and wouldn't need to check if any elements
> > remain -- first block, then a single loop and it's done.
> >
> > The loop could also then be a "do while" since it doesn't have to
> > check the exit condition up front.
>
> Good idea. That causes us to re-check all of the tail elements when the
> number of elements is evenly divisible by nelem_per_iteration, but that
> might be worth the trade-off.

Yeah, if there's no easy way to avoid that it's probably fine. I
wonder if we can subtract one first to force even multiples to round
down, although I admit I haven't thought through the consequences of
that.
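To make the subtract-one idea concrete, here is a rough scalar sketch.
It is not taken from the patch: NELEM_PER_ITERATION, block_contains()
and find_blocks() are made-up names, the block comparison is a plain
loop standing in for the vector code, and nchecks exists only to count
block comparisons for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block width (elements per iteration); a power of two. */
#define NELEM_PER_ITERATION 16

/* Scalar stand-in for one vectorized comparison of a full block. */
static bool
block_contains(uint32_t key, const uint32_t *block)
{
	for (size_t i = 0; i < NELEM_PER_ITERATION; i++)
	{
		if (block[i] == key)
			return true;
	}
	return false;
}

/*
 * Search base[0..nelem-1] for key.  *nchecks is set to the number of
 * block comparisons performed (illustration only).
 */
static bool
find_blocks(uint32_t key, const uint32_t *base, size_t nelem, size_t *nchecks)
{
	size_t		tail_idx;

	assert((NELEM_PER_ITERATION & (NELEM_PER_ITERATION - 1)) == 0);
	*nchecks = 0;

	/* Short arrays: check every element one at a time. */
	if (nelem < NELEM_PER_ITERATION)
	{
		for (size_t i = 0; i < nelem; i++)
		{
			if (base[i] == key)
				return true;
		}
		return false;
	}

	/*
	 * Subtracting one before masking makes exact multiples round down by
	 * a whole block, so the loop below leaves the last block to the final
	 * check instead of visiting it twice.
	 */
	tail_idx = (nelem - 1) & ~((size_t) NELEM_PER_ITERATION - 1);

	for (size_t i = 0; i < tail_idx; i += NELEM_PER_ITERATION)
	{
		(*nchecks)++;
		if (block_contains(key, &base[i]))
			return true;
	}

	/* Final block, overlapping the previous one unless nelem is a multiple. */
	(*nchecks)++;
	return block_contains(key, &base[nelem - NELEM_PER_ITERATION]);
}
```

With nelem = 32 this does two block checks, where masking without the
subtraction would do three. Non-multiples such as 33 are unaffected
(three checks, the last one overlapping).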
> [v8]

Seems pretty good. It'd be good to see the results of 2- vs.
4-register before committing, because that might lead to some
restructuring, but maybe it won't, and v8 is already an improvement
over HEAD.

/* Process the remaining elements one at a time. */

This now does all of them if that path is taken, so "remaining" can be removed.