Re: Auto-vectorization speeds up multiplication of large-precision numerics - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Auto-vectorization speeds up multiplication of large-precision numerics
Msg-id CAJ3gD9e+j+DT1pWZDEk3Ou56=qVThH4TeJUwrTYNGv2LD57uew@mail.gmail.com
In response to Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Auto-vectorization speeds up multiplication of large-precision numerics
List pgsql-hackers
On Fri, 10 Jul 2020 at 19:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > We normally don't compile with -O3, so very few users would get the
> > benefit of this.
>
> Yeah.  I don't think changing that baseline globally would be a wise move.
>
> > We have CFLAGS_VECTOR for the checksum code.  I
> > suppose if we are making the numeric code vectorizable as well, we
> > should apply this there also.
>
> > I think we need a bit of a policy decision from the group here.
>
> I'd vote in favor of applying CFLAGS_VECTOR to specific source files
> that can benefit.  We already have experience with that and we've not
> detected any destabilization potential.

I tried this in utils/adt/Makefile:
+
+numeric.o: CFLAGS += ${CFLAGS_VECTOR}
+
and it works.

CFLAGS_VECTOR also includes the -funroll-loops option, which, I
believe, had shown improvements in the checksum.c runs ( [1] ). This
option makes the object file a bit bigger. For numeric.o, its size
increased by about 15K, from 116672 to 131360 bytes. I ran the
multiplication test and didn't see any additional speed-up with this
option. Also, loop unrolling does not seem to be related to
vectorization. So I was thinking of splitting CFLAGS_VECTOR into
CFLAGS_VECTOR and CFLAGS_UNROLL_LOOPS. checksum.c can use both these
flags, and numeric.c can use only CFLAGS_VECTOR.
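
For illustration, the per-file rules after such a split might look roughly
like the sketch below. It assumes CFLAGS_UNROLL_LOOPS ends up carrying only
-funroll-loops and that CFLAGS_VECTOR keeps only the vectorization option.
The variable is a proposal at this point, not something that exists in the
tree.

```make
# Sketch only: CFLAGS_UNROLL_LOOPS is the proposed (not yet existing) variable,
# assumed to hold just -funroll-loops once it is split out of CFLAGS_VECTOR.

# In the Makefile that builds checksum.o: keep both flags, so the checksum
# code is compiled the same way as before the split.
checksum.o: CFLAGS += ${CFLAGS_VECTOR} ${CFLAGS_UNROLL_LOOPS}

# In utils/adt/Makefile: vectorize numeric.o without paying for unrolling.
numeric.o: CFLAGS += ${CFLAGS_VECTOR}
```

Both rules use make's target-specific variable syntax, so the extra flags
apply only to the named object file and leave the global CFLAGS untouched.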

I was also wondering whether it's worth extracting only the code that we
think can be optimized and keeping it in a separate file (say
numeric_vectorize.c or adt_vectorize.c, which can have mul_var() to
start with), using this file as a collection of all such code in the
adt module; we could then add such files to other modules as and
when needed. For numeric.c, though, there may already be some scope for
auto-vectorization in other similar regions of that file, so we don't
require a separate numeric_vectorize.c and can just pass the
CFLAGS_VECTOR flag for this file itself.
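
To make the two alternatives concrete, here is a rough sketch of how each
could look in utils/adt/Makefile. The file name numeric_vectorize.c is only
the hypothetical name floated above, and the OBJS line assumes it is placed
after the existing object list in that Makefile.

```make
# Alternative A (hypothetical): move the hot loops into their own file and
# build only that file with the vectorization flags.
OBJS += numeric_vectorize.o
numeric_vectorize.o: CFLAGS += ${CFLAGS_VECTOR}

# Alternative B: leave the code where it is and flag the whole file, so any
# other vectorizable loops in numeric.c benefit as well.
numeric.o: CFLAGS += ${CFLAGS_VECTOR}
```

Only one of the two would be used. Alternative A limits the flag change to
a small amount of code, at the cost of an extra file to maintain.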


[1]
https://www.postgresql.org/message-id/flat/CA%2BU5nML8JYeGqM-k4eEwNJi5H%3DU57oPLBsBDoZUv4cfcmdnpUA%40mail.gmail.com#2ec419817ff429588dd1229fb663080e

-- 
Thanks,
-Amit Khandekar
Huawei Technologies


