On Tue, Apr 23, 2013 at 11:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:
>> + # important optimization flags for checksum.c
>> + ifeq ($(GCC),yes)
>> + checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
>> + endif
>
> I am pretty sure we can't do those unconditionally:
> - -funroll-loops and -ftree-vectorize weren't always part of gcc afair,
> so we would need a configure check for those
-funroll-loops is available from at least GCC 2.95. -ftree-vectorize
is GCC 4.0+. From what I read in the ICC documentation, -axSSE4.1
should generate both a plain and an accelerated version and pick one
with a runtime check. I don't know if ICC vectorizes the specific loop
in the patch, but I would expect it to, given that Intel's vectorization
has generally been better than GCC's and the loop is about as simple as
it gets. I don't know the relevant options for other compilers.
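The patch's loop is not reproduced here. As a rough illustration of why this kind of checksum is easy for an auto-vectorizer, here is a minimal C sketch assuming a hypothetical lane-parallel design: N_LANES independent 32-bit accumulators, each updated with an FNV-style xor/multiply/xor-shift step. The names, lane count, seeds and mix function are illustrative assumptions, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters, for illustration only. */
#define N_LANES 32
#define MIX_PRIME 16777619u /* the 32-bit FNV prime */

/* Mix one word into one lane: xor, multiply, xor with a right shift. */
static inline uint32_t mix(uint32_t acc, uint32_t value)
{
    uint32_t tmp = acc ^ value;
    return (tmp * MIX_PRIME) ^ (tmp >> 17);
}

/*
 * Checksum nwords 32-bit words (nwords must be a multiple of N_LANES).
 * The inner loop updates N_LANES independent accumulators, so its
 * iterations do not depend on each other; that is what lets GCC
 * (-O3 or -ftree-vectorize) or ICC turn it into SIMD code.
 */
uint32_t lane_checksum(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_LANES];
    uint32_t result = 0;

    assert(nwords % N_LANES == 0);
    for (size_t j = 0; j < N_LANES; j++)
        sums[j] = (uint32_t) j + 1; /* arbitrary distinct seeds */

    for (size_t i = 0; i < nwords; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++)
            sums[j] = mix(sums[j], words[i + j]);

    /* Fold the lanes into one value (a short serial step). */
    for (size_t j = 0; j < N_LANES; j++)
        result ^= sums[j];
    return result;
}
```

Because each lane depends only on its own previous value, the two loops can be interchanged or run four or eight lanes at a time without changing the result. On x86 the 32-bit multiply maps to a single instruction (`pmulld`) only with SSE4.1; with plain SSE2 a compiler has to emulate it or stay scalar.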
> - SSE4.1 looks like a total no-go; it's not available everywhere. We
> *can* add runtime detection of that with gcc fairly easily and
> one-time if we want to go there (later?) using 'ifunc's, but that
> needs a fair amount of infrastructure work.
> - We can rely on SSE1/2 on amd64, but I think that's automatically
> enabled there.
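Runtime detection can be sketched without ifuncs: compile a second copy of the loop with GCC's `target("sse4.1")` function attribute and choose between the copies once, on first call, with `__builtin_cpu_supports` (GCC 4.8 or later, x86 only). This is a plain function-pointer dispatch rather than the ifunc mechanism mentioned above, and `page_checksum`, `checksum_body` and the mix step are hypothetical names and logic for illustration. GCC and Clang already define `__SSE2__` by default on x86-64, so only the SSE4.1 copy needs the special treatment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LANES 32 /* hypothetical lane count */

/* Portable loop body; each wrapper below gets its own compiled copy. */
static inline uint32_t checksum_body(const uint32_t *w, size_t n)
{
    uint32_t sums[N_LANES] = {0};
    uint32_t result = 0;

    assert(n % N_LANES == 0);
    for (size_t i = 0; i < n; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++) {
            uint32_t tmp = sums[j] ^ w[i + j];
            sums[j] = (tmp * 16777619u) ^ (tmp >> 17);
        }
    for (size_t j = 0; j < N_LANES; j++)
        result ^= sums[j];
    return result;
}

/* Baseline copy, built with whatever flags the whole file uses. */
static uint32_t checksum_generic(const uint32_t *w, size_t n)
{
    return checksum_body(w, n);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/*
 * Same source, but this copy may use SSE4.1 instructions (e.g. pmulld)
 * when GCC inlines checksum_body here and vectorizes it (-O3 or
 * -ftree-vectorize). The rest of the binary stays SSE2-only.
 */
__attribute__((target("sse4.1")))
static uint32_t checksum_sse41(const uint32_t *w, size_t n)
{
    return checksum_body(w, n);
}
#endif

typedef uint32_t (*checksum_fn)(const uint32_t *, size_t);

/* Decide once which implementation this CPU can run. */
static checksum_fn choose_checksum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return checksum_sse41;
#endif
    return checksum_generic;
}

uint32_t page_checksum(const uint32_t *w, size_t n)
{
    static checksum_fn impl = NULL;

    if (impl == NULL)
        impl = choose_checksum(); /* unsynchronized: single-threaded use only */
    return impl(w, n);
}
```

An ifunc would move the same choice into the dynamic loader and drop the per-call pointer check, but it needs toolchain and dynamic-linker support (e.g. glibc), which is part of the infrastructure work referred to above.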
This is why I initially went for the lower-strength 16-bit checksum
calculation: requiring only SSE2 would have made supporting the
vectorized version on amd64 trivial. By now my feeling is that it's
not prudent to compromise on quality to save some infrastructure
complexity. If we set a hypothetical VECTORIZATION_FLAGS variable at
configure time, the performance is still there for those who need it
and can afford CPU-specific builds.
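The SSE2 point can be made concrete: SSE2 has a 16-bit low multiply (`pmullw`, `_mm_mullo_epi16`), while the 32-bit low multiply (`pmulld`, `_mm_mullo_epi32`) only arrived with SSE4.1. A minimal sketch, assuming a hypothetical 16-bit lane mix (the multiplier 0x0193 and shift of 9 are illustrative, not the earlier 16-bit patch), shows eight 16-bit lanes processed with SSE2 intrinsics only, next to a scalar reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

#define LANES16 8        /* 8 x 16-bit lanes = one 128-bit register */
#define PRIME16 0x0193u  /* hypothetical odd 16-bit multiplier */

/* Scalar reference: mix each 16-bit word into its lane. */
static void mix16_scalar(uint16_t sums[LANES16], const uint16_t *w, size_t n)
{
    assert(n % LANES16 == 0);
    for (size_t i = 0; i < n; i += LANES16)
        for (size_t j = 0; j < LANES16; j++) {
            uint16_t tmp = (uint16_t) (sums[j] ^ w[i + j]);
            sums[j] = (uint16_t) ((uint16_t) (tmp * PRIME16) ^ (tmp >> 9));
        }
}

#ifdef __SSE2__
/* Same computation using only SSE2, the amd64 baseline. */
static void mix16_sse2(uint16_t sums[LANES16], const uint16_t *w, size_t n)
{
    __m128i acc = _mm_loadu_si128((const __m128i *) sums);
    const __m128i prime = _mm_set1_epi16((short) PRIME16);

    assert(n % LANES16 == 0);
    for (size_t i = 0; i < n; i += LANES16) {
        __m128i v = _mm_loadu_si128((const __m128i *) (w + i));
        __m128i tmp = _mm_xor_si128(acc, v);
        /* low 16 bits of each product, xor logical right shift by 9 */
        acc = _mm_xor_si128(_mm_mullo_epi16(tmp, prime),
                            _mm_srli_epi16(tmp, 9));
    }
    _mm_storeu_si128((__m128i *) sums, acc);
}
#endif
```

The 32-bit equivalent of the multiply line would need `_mm_mullo_epi32`, which is exactly the SSE4.1 dependency discussed in this thread.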
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de