On Tue, Mar 5, 2024 at 5:30 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Tue, Jan 30, 2024 at 5:04 PM John Naylor <johncnaylorls@gmail.com> wrote:
> >
> > On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
> > > But given that we know the data length and we have it in a register
> > > already, it's easy enough to just mask out data past the end with a
> > > shift. See patch 1. Performance benefit is about 1.5x Measured on a
> > > small test harness that just hashes and finalizes an array of strings,
> > > with a data dependency between consecutive hashes (next address
> > > depends on the previous hash output).
> >
> > Interesting work! I've taken this idea and (I'm guessing, haven't
> > tested) improved it by re-using an intermediate step for the
> > conditional, simplifying the creation of the mask, and moving the
> > bitscan out of the longest dependency chain.
>
> This needed a rebase, and is now 0001. I plan to push this soon.
I pushed but had to revert -- my version (and I believe both) failed
to keep the invariant that the aligned and unaligned must result in
the same hash. It's clear to me how to fix, but I've injured my strong
hand and won't be typing much in for a cuople days. I'll prioritize
the removal of strlen calls for v17, since the optimization can wait
and there is also a valgrind issue I haven't looked into.