Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> On Mon, 7 Sep 2020 at 11:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, poking at this further, it seems that the patch only really
>> works for gcc. clang accepts the -ftree-vectorize switch, but
>> looking at the generated asm shows that it does nothing useful.
>> Which is odd, because clang does do loop vectorization.
> Hmm, yeah that's unfortunate. My guess is that the compiler would do
> vectorization only if 'i' is a constant, which is not true for our
> case.

No, the LLVM docs say that loops with variable trip counts are handled, per

https://llvm.org/docs/Vectorizers.html#loops-with-unknown-trip-count

I experimented with a few different ideas, such as adding restrict
qualifiers to the pointers, and eventually found that what works
is to write the loop termination condition as "i2 < limit"
rather than "i2 <= limit". It took me a long time to think of
trying that, because it seemed ridiculously stupid. But it works.
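
To make the difference concrete, here is a minimal sketch of the two
loop forms. The function names, argument types, and the zero-based
loop start are assumptions made for illustration; this is not the
actual patch, only the shape of the termination condition.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the inner multiply-accumulate loop.
 * Names and types are illustrative assumptions, not the real code.
 */

/* Inclusive bound: processes elements 0 .. limit. */
void
accumulate_inclusive(int32_t *dig, const int16_t *var2digits,
                     int32_t var1digit, int limit)
{
    int         i2;

    for (i2 = 0; i2 <= limit; i2++)
        dig[i2] += var1digit * var2digits[i2];
}

/*
 * Exclusive bound: processes elements 0 .. limit - 1.  A caller that
 * used the inclusive form with "limit" passes "limit + 1" here to
 * cover the same elements.
 */
void
accumulate_exclusive(int32_t *dig, const int16_t *var2digits,
                     int32_t var1digit, int limit)
{
    int         i2;

    for (i2 = 0; i2 < limit; i2++)
        dig[i2] += var1digit * var2digits[i2];
}
```

Both functions compute the same result when the exclusive one is
given a bound one larger. To see what a given clang version does with
each, something like "clang -O2 -Rpass=loop-vectorize
-Rpass-missed=loop-vectorize -c" on this file should print a remark
per loop; the outcome may well differ across compiler versions.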
regards, tom lane