Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. |
Date | |
Msg-id | 32510d71-8c00-4896-9c11-5ea4b3c01ffc@app.fastmail.com Whole thread Raw |
In response to | Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
|
List | pgsql-hackers |
On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote: > Ah yes, I think I was looking at a newer version of the code where I'd > already fixed that bug. Unless you think there are still bugs in any > of the boundary checks, which is entirely possible. Ah, that explains it. And no, I can't find any other bugs in the boundary checks. > Ah, I can see what's going on. It's perhaps best illustrated with a > simple example. Suppose you are multiplying a 4-digit integer x by a > 2-digit integer y (both with dscale=0). Then the terms of the full > product computed by mul_var() or mul_var_small() would look something > like this: > > x0 x1 x2 x3 > * y0 y1 > --------------------------------- > x0*y0 x1*y0 x2*y0 x3*y0 > x0*y1 x1*y1 x2*y1 x3*y1 > > In the reduced-rscale case, it might perform a truncated computation, > computing just the first 3 columns (say), and discarding the last two > columns. Therefore it would skip the 3 rightmost digit products. > > However, in mul_var_int(), y0 and y1 have been combined into a single > integer equal to y0*NBASE+y1, and the terms of full product are > computed as follows: > > x0*(y0*NBASE+y1) x1*(y0*NBASE+y1) x2*(y0*NBASE+y1) x3*(y0*NBASE+y1) > > In the full product, that gives the same result, but if you follow the > same rule in the reduced-rscale case, skipping the last two terms, it > would actually discard 4 digit products, making it less accurate. > > That could be avoided by increasing maxdigits by 1 in mul_var_int() so > it would always be at least as accurate as it was before, but that > might not really be necessary. However, if we implemented > mul_var_int64() in the same way, it would be discarding much > higher-order digit products, and so we probably would have to increase > maxdigits to get sufficiently accurate results. But there's an even > bigger problem: the results would be different between platforms that > did and didn't have 128-bit integers, which I really don't like. We > could avoid that by not using it in reduced-rscale cases, but that > would involve another test condition. > > By contrast, mul_var_small() is intended to replicate the arithmetic > in mul_var() exactly (but in a different order) for all rscales. So if > you've found any cases where they give different results, that's a > bug. > > In light of that, it might be that mul_var_small() is the better > option, rather than mul_var_int(), but it'd be interesting to see how > they compare in terms of performance first. Thanks for explaining, very helpful. I agree on your reasoning about the pros and cons. Not sure yet which version I prefer. Let's see how it evolves. I've done some benchmarks. Haven't tested Intel and AMD yet, but this is what I get on my Apple M3 Max: -- -- varndigits=1 -- -- HEAD SELECT SUM(numeric_mul_head(var1,var2,0)) FROM bench_mul_var_var1ndigits_1; Time: 2976.896 ms (00:02.977) Time: 2984.759 ms (00:02.985) Time: 2970.364 ms (00:02.970) -- mul_var_int() patch SELECT SUM(numeric_mul_patch_int(var1,var2,0)) FROM bench_mul_var_var1ndigits_1; Time: 2790.227 ms (00:02.790) Time: 2786.338 ms (00:02.786) Time: 2784.957 ms (00:02.785) -- mul_var_small() patch SELECT SUM(numeric_mul_patch_small(var1,var2,0)) FROM bench_mul_var_var1ndigits_1; Time: 2770.211 ms (00:02.770) Time: 2760.685 ms (00:02.761) Time: 2773.221 ms (00:02.773) -- -- varndigits=2 -- -- HEAD SELECT SUM(numeric_mul_head(var1,var2,0)) FROM bench_mul_var_var1ndigits_2; Time: 3353.258 ms (00:03.353) Time: 3273.055 ms (00:03.273) Time: 3266.392 ms (00:03.266) -- mul_var_int() patch SELECT SUM(numeric_mul_patch_int(var1,var2,0)) FROM bench_mul_var_var1ndigits_2; Time: 2694.169 ms (00:02.694) Time: 2687.935 ms (00:02.688) Time: 2692.398 ms (00:02.692) -- mul_var_small() patch SELECT SUM(numeric_mul_patch_small(var1,var2,0)) FROM bench_mul_var_var1ndigits_2; Time: 2997.685 ms (00:02.998) Time: 2984.418 ms (00:02.984) Time: 2986.976 ms (00:02.987) -- -- varndigits=3 -- -- HEAD SELECT SUM(numeric_mul_head(var1,var2,0)) FROM bench_mul_var_var1ndigits_3; Time: 3471.391 ms (00:03.471) Time: 3384.114 ms (00:03.384) Time: 3387.031 ms (00:03.387) -- mul_var_int() patch SELECT SUM(numeric_mul_patch_int(var1,var2,0)) FROM bench_mul_var_var1ndigits_3; Time: 3384.428 ms (00:03.384) Time: 3398.044 ms (00:03.398) Time: 3393.727 ms (00:03.394) -- mul_var_small() patch SELECT SUM(numeric_mul_patch_small(var1,var2,0)) FROM bench_mul_var_var1ndigits_3; Time: 3100.567 ms (00:03.101) Time: 3114.225 ms (00:03.114) Time: 3116.137 ms (00:03.116) Interesting, mul_var_small() seems to be the winner for var1ndigits=3 and mul_var_int() to be the winner for var1ndigits=2, and about the same for var1ndigits=1. >> What can be said about mul_var()'s contract with regards to rscale? >> It's the number of decimal digits requested by the caller, and if not >> requesting full precision, then the decimal digits might not be accurate, >> but can something be said about how far off they can be? >> > > I wouldn't expect it to ever be off by more than 1, given that > MUL_GUARD_DIGITS = 2, which corresponds to 8 decimal digits, and the > number of digits in the smaller input (and hence the number of digit > products in each column) is limited to something like 16,000 NBASE > digits. OK, so then the cases I found where it was off by 2 for the mul_var_int() patch are unexpected? Regards, Joel
pgsql-hackers by date: