Re: Auto-vectorization speeds up multiplication of large-precision numerics - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Auto-vectorization speeds up multiplication of large-precision numerics
Date
Msg-id CAJ3gD9e=X=oC+R2n7istwZ3Qfh3EHsQ=c2iH8uGyRoAujH=4Sw@mail.gmail.com
In response to Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 8 Sep 2020 at 02:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > I experimented with a few different ideas such as adding restrict
> > decoration to the pointers, and eventually found that what works
> > is to write the loop termination condition as "i2 < limit"
> > rather than "i2 <= limit".  It took me a long time to think of
> > trying that, because it seemed ridiculously stupid.  But it works.

Ah ok.

I checked the "Auto-Vectorization in LLVM" link that you shared. All
the examples use "< n" or "> n". None of them use "<= n". Looks like a
hidden restriction.
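
To make the two loop shapes concrete, here is a minimal sketch. The function and parameter names are hypothetical and this is not the actual numeric.c code; it only shows that an inclusive bound "last" can always be rewritten as an exclusive bound "last + 1" without changing the result. Whether a particular compiler vectorizes either form depends on its version and flags, so this is something to check in the generated assembly rather than assume.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the inner multiply-accumulate loop.
 * Names and types are assumptions made for illustration only.
 */

/* Inclusive upper bound: the "i <= limit" shape. */
void
accumulate_inclusive(int *dst, const short *src, int digit, int last)
{
    for (int i = 0; i <= last; i++)
        dst[i] += digit * src[i];
}

/* Exclusive upper bound: the "i < limit" shape, as in the LLVM examples. */
void
accumulate_exclusive(int *dst, const short *src, int digit, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += digit * src[i];
}
```

Calling accumulate_inclusive(dst, src, d, k) and accumulate_exclusive(dst, src, d, k + 1) touches the same elements, so switching forms only requires adjusting how the limit is computed.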

>
> I've done more testing and confirmed that both gcc and clang can
> vectorize the improved loop on aarch64 as well as x86_64.  (clang's
> results can be confusing because -ftree-vectorize doesn't seem to
> have any effect: its vectorizer is on by default.  But if you use
> -fno-vectorize it'll go back to the old, slower code.)
>
> The only buildfarm effect I've noticed is that locust and
> prairiedog, which are using nearly the same ancient gcc version,
> complain
>
> c1: warning: -ftree-vectorize enables strict aliasing. -fno-strict-aliasing is ignored when Auto Vectorization is used.
>
> which is expected (they say the same for checksum.c), but then
> there are a bunch of
>
> warning: dereferencing type-punned pointer will break strict-aliasing rules
>
> which seems worrisome.  (This sort of thing is the reason I'm
> hesitant to apply higher optimization levels across the board.)
> Both animals pass the regression tests anyway, but if any other
> compilers treat -ftree-vectorize as an excuse to apply stricter
> optimization assumptions, we could be in for trouble.
>
> I looked closer and saw that all of those warnings are about
> init_var(), and this change makes them go away:
>
> -#define init_var(v)        MemSetAligned(v, 0, sizeof(NumericVar))
> +#define init_var(v)        memset(v, 0, sizeof(NumericVar))
>
> I'm a little inclined to commit that as future-proofing.  It's
> essentially reversing out a micro-optimization I made in d72f6c750.
> I doubt I had hard evidence that it made any noticeable difference;
> and even if it did back then, modern compilers probably prefer the
> memset approach.
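
For reference, the memset form of such an initializer can be sketched as below. DemoVar and init_demo_var are hypothetical names, and the struct is only a stand-in that is not claimed to match NumericVar. The point is that memset zeroes the object byte-wise through a void pointer, so nothing is accessed through a pointer of a different type. As far as I understand it, the type-punning warnings come from zeroing a struct word by word through a cast pointer.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a small struct such as NumericVar.
 * The field list is an assumption made for illustration only.
 */
typedef struct DemoVar
{
    int     ndigits;
    int     weight;
    int     sign;
    short  *buf;
    short  *digits;
} DemoVar;

/*
 * Zero the whole struct with plain memset.  This relies on all-bits-zero
 * being a null pointer, which holds on the usual platforms.
 */
#define init_demo_var(v)  memset((v), 0, sizeof(DemoVar))
```

A caller would write "DemoVar v; init_demo_var(&v);" before using the variable.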

Thanks. I must admit it did not occur to me that I could simply have
installed clang on my Linux machine and tried compiling this file, or
tested with some older gcc versions. I think I was using gcc 8. Do you
know which gcc version gave these warnings?

-- 
Thanks,
-Amit Khandekar
Huawei Technologies


