On Thu, 7 Nov 2024 at 00:40, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> Do you mean add:
>
> "
> for (; p < aligned_end; p += sizeof(size_t))
> {
>     if (*(size_t *)p != 0)
>         return false;
> }
> "
>
> just before the last loop?
>
> If so, I did a few tests and did not see any major improvements. So, I thought
> it's simpler to not add more code in this inline function in v7 shared up-thread.

Did you try with a size where there's a decent remainder, say 124
bytes? FWIW, one of the cases has 112 bytes, and I think that is
aligned memory, meaning we'll do the first 64 bytes in the SIMD loop and
have to do the remaining 48 bytes in the byte-at-a-time loop. If you had
the loop Michael mentioned, that would instead be 6 iterations of the
size_t-at-a-time loop.
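
To make the arithmetic concrete, here is a minimal sketch that only
counts loop iterations. It is not the code from the patch: the names
LoopCounts and count_iters are hypothetical, and it assumes the buffer
starts aligned and that the wide loop consumes 8 words (64 bytes with
an 8-byte size_t) per iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: iterations spent in each phase. */
typedef struct LoopCounts
{
    size_t      wide_iters;     /* 8-words-at-a-time loop */
    size_t      word_iters;     /* size_t-at-a-time loop */
    size_t      byte_iters;     /* byte-at-a-time loop */
} LoopCounts;

/*
 * Count iterations needed to scan "len" bytes, assuming the start of
 * the buffer is already aligned to "word_size" (an assumption of this
 * sketch, so there is no leading alignment loop to account for).
 *
 * with_word_loop selects whether the size_t-at-a-time loop sits
 * between the wide loop and the final byte loop.
 */
static LoopCounts
count_iters(size_t len, size_t word_size, bool with_word_loop)
{
    LoopCounts  c = {0, 0, 0};
    size_t      chunk;
    size_t      rest;

    assert(word_size > 0);

    chunk = word_size * 8;      /* bytes handled per wide iteration */
    c.wide_iters = len / chunk;
    rest = len % chunk;         /* bytes left after the wide loop */

    if (with_word_loop)
    {
        c.word_iters = rest / word_size;
        c.byte_iters = rest % word_size;
    }
    else
        c.byte_iters = rest;

    return c;
}
```

Under those assumptions, count_iters(112, 8, false) gives 1 wide
iteration plus 48 byte iterations, while count_iters(112, 8, true)
gives 1 wide iteration plus 6 word iterations and no byte iterations.
For 124 bytes the remainder is 60, so it is 60 byte iterations versus
7 word iterations plus 4 byte iterations.
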
David