Re: Auto-vectorization speeds up multiplication of large-precision numerics - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Auto-vectorization speeds up multiplication of large-precision numerics
Date
Msg-id 1694682.1599494835@sss.pgh.pa.us
In response to Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Amit Khandekar <amitdkhan.pg@gmail.com> writes:
> On Mon, 7 Sep 2020 at 11:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, poking at this further, it seems that the patch only really
>> works for gcc.  clang accepts the -ftree-vectorize switch, but
>> looking at the generated asm shows that it does nothing useful.
>> Which is odd, because clang does do loop vectorization.

> Hmm, yeah that's unfortunate. My guess is that the compiler would do
> vectorization only if 'i' is a constant, which is not true for our
> case.

No, they claim to handle variable trip counts, per

https://llvm.org/docs/Vectorizers.html#loops-with-unknown-trip-count

I experimented with a few different ideas such as adding restrict
decoration to the pointers, and eventually found that what works
is to write the loop termination condition as "i2 < limit"
rather than "i2 <= limit".  It took me a long time to think of
trying that, because it seemed ridiculously stupid.  But it works.
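For readers following along, here is a minimal sketch of the two loop shapes being compared. It is not the code from the patch: the function names, element types, and the forward-running accumulate loop are assumptions chosen for illustration. Only the "i2 <= limit" versus "i2 < limit" contrast and the restrict decoration come from the message above. The two functions do the same work when the exclusive bound is one larger than the inclusive one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical accumulate loop with an inclusive bound:
 * runs for i2 = 0 .. limit.
 */
void
accum_inclusive(int32_t *restrict dig, const int16_t *restrict digits,
                int32_t factor, int limit)
{
    int         i2;

    for (i2 = 0; i2 <= limit; i2++)
        dig[i2] += factor * digits[i2];
}

/*
 * Same loop body with an exclusive bound:
 * runs for i2 = 0 .. limit - 1.
 */
void
accum_exclusive(int32_t *restrict dig, const int16_t *restrict digits,
                int32_t factor, int limit)
{
    int         i2;

    for (i2 = 0; i2 < limit; i2++)
        dig[i2] += factor * digits[i2];
}

/*
 * Self-check: the inclusive form with bound n - 1 and the exclusive
 * form with bound n must leave identical results.
 */
void
check_loops_agree(void)
{
    int16_t     digits[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int32_t     a[8] = {0};
    int32_t     b[8] = {0};
    int         k;

    accum_inclusive(a, digits, 3, 7);
    accum_exclusive(b, digits, 3, 8);
    for (k = 0; k < 8; k++)
        assert(a[k] == b[k]);
}
```

To see whether a given compiler vectorizes either form, one option is to compile with optimization and ask for vectorization remarks, for example -Rpass=loop-vectorize with clang or -fopt-info-vec with gcc, and compare the output for the two functions. Results will depend on compiler version and target, so treat this as a way to reproduce the experiment rather than a statement of what any particular compiler does.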

            regards, tom lane


