On Tue, Nov 7, 2023 at 9:47 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Separately, I'm wondering whether we should consider using CFLAGS_VECTORIZE
> on the whole tree. Commit fdea253 seems to be responsible for introducing
> this targeted autovectorization strategy, and AFAICT this was just done to
> minimize the impact elsewhere while optimizing page checksums. Are there
> fundamental problems with adding CFLAGS_VECTORIZE everywhere? Or is it
> just waiting on someone to do the analysis/benchmarking?
Vectorization is already enabled by default at -O2 as of gcc 12
(looking further in the docs, it uses the "very-cheap" vectorization
cost model), so it may be worth investigating what effect that change
had. I can't quickly find the equivalent info for clang.

If that's the case for your compiler, and the difference you found was
real, it must have been due to loop unrolling rather than
vectorization. What changed in the binary?
https://gcc.gnu.org/gcc-12/changes.html
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
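
For anyone who wants to check what changes in the binary on their own
compiler, here is a minimal sketch. The file name (vec_test.c), the
function, and the array size are made up for illustration and are not
from the PostgreSQL tree. Whether this loop actually gets vectorized
depends on the gcc version and target, so treat the compile commands in
the comment as a way to look, not as a statement of what you will see.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical test loop (not PostgreSQL code).
 *
 * The trip count is a compile-time constant and the pointers are marked
 * restrict, so no runtime alias check or scalar cleanup loop should be
 * needed.  That is the kind of loop a conservative cost model can accept.
 *
 * To compare the generated code, build the file several ways and diff
 * the disassembly:
 *
 *   gcc -O2 -c vec_test.c -o default.o
 *   gcc -O2 -fno-tree-vectorize -c vec_test.c -o novec.o
 *   gcc -O2 -funroll-loops -c vec_test.c -o unroll.o
 *   objdump -d default.o > default.s    (likewise for the others)
 *
 * Adding -fopt-info-vec makes gcc report the loops it vectorized.
 */
#define N_WORDS 1024

void
add_words(uint32_t *restrict dst,
		  const uint32_t *restrict a,
		  const uint32_t *restrict b)
{
	assert(dst != NULL && a != NULL && b != NULL);

	/* Simple element-wise add over a fixed number of 32-bit words. */
	for (size_t i = 0; i < N_WORDS; i++)
		dst[i] = a[i] + b[i];
}
```

If default.o and novec.o disassemble the same on a given compiler, then
vectorization was not contributing at plain -O2 there, and any
difference would have to come from the other flags.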