Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
Date | |
Msg-id | 58e5e7d2-7ad8-40b4-9b76-a5c3049346e5@app.fastmail.com |
In response to | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
List | pgsql-hackers |
On Fri, Jul 5, 2024, at 17:41, Dean Rasheed wrote:
> On Fri, 5 Jul 2024 at 12:56, Joel Jacobson <joel@compiler.org> wrote:
>>
>> Interesting you got so bad bench results for v6-mul_var_int64.patch
>> for var1ndigits=4, that patch is actually the winner on AMD Ryzen 9 7950X3D.
>
> Interesting.

I remeasured just to be sure, and yes, it was the winner among the previous patches, but the new v7 beats it.

>> On Intel Core i9-14900K the winner is v6-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.
>
> That must be random noise, since
> v6-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch doesn't
> invoke mul_var_small() for 4-digit inputs.

Yes, something was off with the HEAD measurements for that one. I remeasured and got almost identical results (as expected) for HEAD and v6-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch with 4-digit inputs.

>> On Apple M3 Max, HEAD is the winner.
>
> Importantly, mul_var_int64() is around 1.25x slower there, and it was
> even worse on my machine.
>
> Attached is a v7 mul_var_small() patch adding 4-digit support. For me,
> this gives a nice speedup:
>
> SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4;
> Time: 5617.150 ms (00:05.617) -- HEAD
> Time: 8203.081 ms (00:08.203) -- v6-mul_var_int64.patch
> Time: 4750.212 ms (00:04.750) -- v7-mul_var_small.patch
>
> The other advantage, of course, is that it doesn't require 128-bit
> integer support.
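PostgreSQL's numeric type stores values as base-NBASE digits (NBASE = 10000, one int16 NumericDigit per digit, most significant first). The following is a minimal sketch, assuming that representation, of why multiplying by a short var1 needs no 128-bit arithmetic. It is not the code from the v7 patch: the function name `mul_small_sketch` and its column-by-column loop are hypothetical. Each result digit collects at most four digit products, each at most 9999 * 9999, plus a carry, so every intermediate sum stays below 4 * 10^8 and fits in a `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000             /* numeric's digit base (4 decimal digits) */

typedef int16_t NumericDigit;   /* stand-in for PostgreSQL's NumericDigit */

/*
 * Hypothetical sketch (not the v7 patch): multiply var1 (1..4 base-NBASE
 * digits) by var2 (any number of digits).  Digits are stored most
 * significant first; res must have room for var1ndigits + var2ndigits
 * digits.
 */
static void
mul_small_sketch(const NumericDigit *var1, int var1ndigits,
                 const NumericDigit *var2, int var2ndigits,
                 NumericDigit *res)
{
    int         resndigits = var1ndigits + var2ndigits;
    uint32_t    carry = 0;

    assert(var1ndigits >= 1 && var1ndigits <= 4);
    assert(var2ndigits >= 1);

    /* Fill result digits from least to most significant, propagating carry. */
    for (int k = resndigits - 1; k >= 0; k--)
    {
        /*
         * At most 4 products of (NBASE - 1)^2 = 99980001 each, plus a carry
         * below 40000, so term < 4 * 10^8 and never overflows 32 bits.
         */
        uint32_t    term = carry;

        /* res[k] collects var1[i] * var2[j] for every i + j + 1 == k. */
        for (int i = 0; i < var1ndigits; i++)
        {
            int         j = k - 1 - i;

            if (j >= 0 && j < var2ndigits)
                term += (uint32_t) var1[i] * (uint32_t) var2[j];
        }
        res[k] = (NumericDigit) (term % NBASE);
        carry = term / NBASE;
    }
    /* The full product has exactly resndigits digits, so nothing is left. */
    assert(carry == 0);
}
```

For example, with var1 = {1234, 5678} (12345678) and var2 = {9999, 9999} (99999999), the sketch writes {1234, 5677, 8765, 4322}, i.e. 1234567787654322. In this sketch, going from three to four var1 digits only adds at most one more product per result column.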
Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch is now the winner on all my CPUs:

-- Apple M3 Max

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- HEAD
Time: 3574.865 ms (00:03.575)
Time: 3573.678 ms (00:03.574)
Time: 3576.953 ms (00:03.577)
Time: 3580.536 ms (00:03.581)
Time: 3589.007 ms (00:03.589)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
Time: 3110.171 ms (00:03.110)
Time: 3098.558 ms (00:03.099)
Time: 3105.873 ms (00:03.106)
Time: 3104.484 ms (00:03.104)
Time: 3109.035 ms (00:03.109)

-- Intel Core i9-14900K

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- HEAD
Time: 3751.767 ms (00:03.752)
Time: 3745.916 ms (00:03.746)
Time: 3742.542 ms (00:03.743)
Time: 3746.139 ms (00:03.746)
Time: 3745.493 ms (00:03.745)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- v6-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
Time: 3747.640 ms (00:03.748)
Time: 3747.231 ms (00:03.747)
Time: 3747.965 ms (00:03.748)
Time: 3748.309 ms (00:03.748)
Time: 3746.498 ms (00:03.746)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
Time: 3417.924 ms (00:03.418)
Time: 3417.088 ms (00:03.417)
Time: 3415.708 ms (00:03.416)
Time: 3415.453 ms (00:03.415)
Time: 3419.566 ms (00:03.420)

-- AMD Ryzen 9 7950X3D

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- HEAD
Time: 3970.131 ms (00:03.970)
Time: 3924.335 ms (00:03.924)
Time: 3927.863 ms (00:03.928)
Time: 3924.761 ms (00:03.925)
Time: 3926.290 ms (00:03.926)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- v6-add-mul_var_int64.patch
Time: 3874.769 ms (00:03.875)
Time: 3858.071 ms (00:03.858)
Time: 3836.698 ms (00:03.837)
Time: 3871.388 ms (00:03.871)
Time: 3844.907 ms (00:03.845)

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4; -- v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
Time: 3397.846 ms (00:03.398)
Time: 3398.050 ms (00:03.398)
Time: 3395.279 ms (00:03.395)
Time: 3393.285 ms (00:03.393)
Time: 3402.570 ms (00:03.403)

Code-wise, I think it's now very nice and clear, with just enough comments. It's also nice to see that the var1ndigits=4 case isn't much more complex than the var1ndigits=3 case, since it follows the same pattern.

Regards,
Joel
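The five timings per configuration in the benchmark runs above can be condensed into one speedup factor per CPU. The C sketch below is an illustration added here, not something from the thread: the array names are hypothetical, and using the median of the five runs (rather than the mean) is an assumption, chosen so that a single slow run, such as the 3970.131 ms HEAD run on the Ryzen, does not skew the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Five "Time:" values (ms) per configuration, copied from the runs above. */
static const double m3_head[5]    = {3574.865, 3573.678, 3576.953, 3580.536, 3589.007};
static const double m3_v7[5]      = {3110.171, 3098.558, 3105.873, 3104.484, 3109.035};
static const double i9_head[5]    = {3751.767, 3745.916, 3742.542, 3746.139, 3745.493};
static const double i9_v7[5]      = {3417.924, 3417.088, 3415.708, 3415.453, 3419.566};
static const double ryzen_head[5] = {3970.131, 3924.335, 3927.863, 3924.761, 3926.290};
static const double ryzen_v7[5]   = {3397.846, 3398.050, 3395.279, 3393.285, 3402.570};

/* qsort comparator for ascending doubles. */
static int
cmp_double(const void *a, const void *b)
{
    double      x = *(const double *) a;
    double      y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Median of five runs (sorts a private copy, input is left untouched). */
static double
median5(const double t[5])
{
    double      sorted[5];

    memcpy(sorted, t, sizeof(sorted));
    qsort(sorted, 5, sizeof(double), cmp_double);
    return sorted[2];
}

/* Speedup of "patched" over "head"; values above 1.0 mean patched is faster. */
static double
speedup(const double head[5], const double patched[5])
{
    double      patched_ms = median5(patched);

    assert(patched_ms > 0.0);
    return median5(head) / patched_ms;
}
```

With these numbers, the median-based speedup of v7 over HEAD comes out at roughly 1.15x on the Apple M3 Max, 1.10x on the Intel Core i9-14900K and 1.16x on the AMD Ryzen 9 7950X3D.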