Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
Date | |
Msg-id | 8e909218-1965-4515-99e3-4bb5b625e004@app.fastmail.com |
In response to | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
List | pgsql-hackers |
On Tue, Jul 2, 2024, at 00:19, Dean Rasheed wrote:
> I had a play with this, and came up with a slightly different way of
> doing it that works for var2 of any size, as long as var1 is just 1 or
> 2 digits.
>
> Repeating your benchmark where both numbers have up to 2 NBASE-digits,
> this new approach was slightly faster:
> ...
>
> (This was on an older Intel Core i9-9900K, so I'm not sure why all the
> timings are faster. What compiler settings are you using?)

Strange. I just did `./configure` with a --prefix.

Compiler settings on my Intel Core i9-14900K machine:

$ pg_config | grep -E '^(CC|CFLAGS|CPPFLAGS|LDFLAGS)'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/home/joel/pg-dev/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =

> The approach taken in this patch only uses 32-bit integers, so in
> theory it could be extended to work for var1ndigits = 3, 4, or even
> more, but the code would get increasingly complex, and I ran out of
> steam at 2 digits. It might be worth trying though.
>
> Regards,
> Dean
>
> Attachments:
> * optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt

Really nice!

I've benchmarked your patch on my three machines with great results.
I added a setseed() step to make the benchmarks reproducible. It
shouldn't matter much, since it should statistically average out,
but I thought why not.
CREATE TABLE bench_mul_var (num1 numeric, num2 numeric);
SELECT setseed(0.12345);
INSERT INTO bench_mul_var (num1, num2)
SELECT random(0::numeric,1e8::numeric), random(0::numeric,1e8::numeric) FROM generate_series(1,1e8);
\timing

/*
 * Apple M3 Max
 */

SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 3622.342 ms (00:03.622)
Time: 3029.786 ms (00:03.030)
Time: 3046.497 ms (00:03.046)
Time: 3035.910 ms (00:03.036)
Time: 3034.073 ms (00:03.034)

SELECT SUM(num1*num2) FROM bench_mul_var; -- optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Time: 2484.685 ms (00:02.485)
Time: 2478.341 ms (00:02.478)
Time: 2494.397 ms (00:02.494)
Time: 2470.987 ms (00:02.471)
Time: 2490.215 ms (00:02.490)

/*
 * Intel Core i9-14900K
 */

SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 2555.569 ms (00:02.556)
Time: 2523.145 ms (00:02.523)
Time: 2518.671 ms (00:02.519)
Time: 2514.501 ms (00:02.515)
Time: 2516.919 ms (00:02.517)

SELECT SUM(num1*num2) FROM bench_mul_var; -- optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Time: 2246.441 ms (00:02.246)
Time: 2243.900 ms (00:02.244)
Time: 2245.350 ms (00:02.245)
Time: 2245.080 ms (00:02.245)
Time: 2247.856 ms (00:02.248)

/*
 * AMD Ryzen 9 7950X3D
 */

SELECT SUM(num1*num2) FROM bench_mul_var; -- HEAD
Time: 3037.497 ms (00:03.037)
Time: 3010.037 ms (00:03.010)
Time: 3000.956 ms (00:03.001)
Time: 2989.424 ms (00:02.989)
Time: 2984.911 ms (00:02.985)

SELECT SUM(num1*num2) FROM bench_mul_var; -- optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
Time: 2645.530 ms (00:02.646)
Time: 2640.472 ms (00:02.640)
Time: 2638.613 ms (00:02.639)
Time: 2637.889 ms (00:02.638)
Time: 2638.054 ms (00:02.638)

/Joel
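
For readers who want a single number per machine, the timings above can be condensed with a small script. This is only a sketch and not part of the original message: the dictionary layout and the helper names `median_reduction_pct` and `summarize` are made up for illustration, and the choice of the median (rather than the mean) is an assumption, made so that the slower first HEAD run on the M3 Max does not skew the comparison.

```python
from statistics import median

# Timings in milliseconds, copied from the benchmark runs above.
TIMINGS_MS = {
    "Apple M3 Max": {
        "HEAD": [3622.342, 3029.786, 3046.497, 3035.910, 3034.073],
        "patch": [2484.685, 2478.341, 2494.397, 2470.987, 2490.215],
    },
    "Intel Core i9-14900K": {
        "HEAD": [2555.569, 2523.145, 2518.671, 2514.501, 2516.919],
        "patch": [2246.441, 2243.900, 2245.350, 2245.080, 2247.856],
    },
    "AMD Ryzen 9 7950X3D": {
        "HEAD": [3037.497, 3010.037, 3000.956, 2989.424, 2984.911],
        "patch": [2645.530, 2640.472, 2638.613, 2637.889, 2638.054],
    },
}


def median_reduction_pct(head, patched):
    """Percent reduction in median runtime going from HEAD to the patch."""
    return 100.0 * (1.0 - median(patched) / median(head))


def summarize(timings):
    """Map each machine name to its median runtime reduction, rounded to 0.1%."""
    return {
        machine: round(median_reduction_pct(runs["HEAD"], runs["patch"]), 1)
        for machine, runs in timings.items()
    }


if __name__ == "__main__":
    for machine, pct in summarize(TIMINGS_MS).items():
        print(f"{machine}: {pct}% lower median time with the patch")
```

By this measure the patched build is roughly 18% faster on the M3 Max, 11% on the i9-14900K, and 12% on the 7950X3D. Note that the query time also includes the table scan and the SUM() aggregation, so these figures understate the speedup of the multiplication itself.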