Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands. - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
Date
Msg-id 252c8b54-3962-4d0a-8479-43f429bd7015@app.fastmail.com
Whole thread Raw
In response to Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
List pgsql-hackers
On Tue, Jul 9, 2024, at 14:01, Dean Rasheed wrote:
> Before considering the other patches to optimise for larger inputs, I
> think it's worth optimising the existing mul_var() code as much as
> possible.
>
> One thing I noticed while testing the earlier patches on this thread
> was that they were significantly faster if they used unsigned integers
> rather than signed integers. I think the reason is that operations
> like "x / 10000" and "x % 10000" use fewer CPU instructions (on every
> platform, according to godbolt.org) if x is unsigned.
>
> In addition, this reduces the number of times the digit array needs to
> be renormalised, which seems to be the biggest factor.
>
> Another small optimisation that seems to be just about worthwhile is
> to pull the first digit of var1 out of the main loop, so that its
> contributions can be set directly in dig[], rather than being added to
> it. This allows palloc() to be used to allocate dig[], rather than
> palloc0(), and only requires the upper part of dig[] to be initialised
> to zeros, rather than all of it.

Nice, really smart!

> Together, these seem to give a decent speed-up:
>
>  NBASE digits |  HEAD rate    | patch rate
> --------------+---------------+---------------
>             5 | 5.8797105e+06 |  6.047134e+06
>            12 |  4.140017e+06 | 4.3429845e+06
>            25 | 2.5711072e+06 | 2.7530615e+06
>            50 | 1.0367389e+06 | 1.3370771e+06
>           100 |      367924.1 |     462437.38
>           250 |      77231.32 |     104593.95
>          2500 |     881.48694 |     1197.4739
>         15000 |     25.064987 |      32.78391
>
> The largest gains are above around 50 NBASE digits, where the time
> spent renormalising dig[] becomes significant.

I added some more ndigits test cases:

/*
 * Intel Core i9-14900K
 */

 NBASE digits |   HEAD rate    |   patch rate   | relative difference
--------------+----------------+----------------+---------------------
            1 |  4.7846890e+07 |  4.7846890e+07 | 0.00%
            2 |  4.9504950e+07 |  4.7393365e+07 | -4.27%
            3 |  4.0816327e+07 |  4.0983607e+07 | 0.41%
            4 |  4.1152263e+07 |  3.9370079e+07 | -4.33%
            5 |  2.2573363e+07 |  2.1978022e+07 | -2.64%
            6 |  2.1739130e+07 |  1.9646365e+07 | -9.63%
            7 |  1.6393443e+07 |  1.6339869e+07 | -0.33%
            8 |  1.6863406e+07 |  1.6778523e+07 | -0.50%
            9 |  1.5105740e+07 |  1.6420361e+07 | 8.70%
           10 |  1.3315579e+07 |  1.5527950e+07 | 16.61%
           11 |  1.2360939e+07 |  1.4124294e+07 | 14.27%
           12 |  1.1764706e+07 |  1.2836970e+07 | 9.11%
           13 |  1.0060362e+07 |  1.1820331e+07 | 17.49%
           14 |  9.0909091e+06 |  1.0000000e+07 | 10.00%
           15 |  7.6923077e+06 |  8.0000000e+06 | 4.00%
           16 |  9.0909091e+06 |  9.4339623e+06 | 3.77%
           17 |  7.2992701e+06 |  9.0909091e+06 | 24.55%
           18 |  7.0921986e+06 |  7.8125000e+06 | 10.16%
           19 |  6.5789474e+06 |  6.6666667e+06 | 1.33%
           20 |  6.2500000e+06 |  6.5789474e+06 | 5.26%
           21 |  5.8479532e+06 |  6.1728395e+06 | 5.56%
           22 |  5.5555556e+06 |  5.9880240e+06 | 7.78%
           24 |  5.2631579e+06 |  5.8823529e+06 | 11.76%
           25 |  5.2083333e+06 |  5.5555556e+06 | 6.67%
           26 |  4.7619048e+06 |  5.2631579e+06 | 10.53%
           27 |  4.5045045e+06 |  5.2083333e+06 | 15.63%
           28 |  4.4247788e+06 |  4.7619048e+06 | 7.62%
           29 |  4.1666667e+06 |  4.5454545e+06 | 9.09%
           30 |  4.0000000e+06 |  4.3478261e+06 | 8.70%
           31 |  3.4482759e+06 |  4.0000000e+06 | 16.00%
           32 |  3.9840637e+06 |  4.2016807e+06 | 5.46%
           50 |  2.0964361e+06 |  2.6595745e+06 | 26.86%
          100 | 666666.67      | 869565.22      | 30.43%
          250 | 142653.35      | 171526.59      | 20.24%
         2500 | 1642.04        | 2197.80        | 33.85%
        15000 | 41.67          | 52.63          | 26.32%
(36 rows)

/*
 * AMD Ryzen 9 7950X3D
 */

 NBASE digits |   HEAD rate    |   patch rate   | relative difference
--------------+----------------+----------------+---------------------
            1 |  3.6900369e+07 |  3.8022814e+07 | 3.04%
            2 |  3.4602076e+07 |  3.5714286e+07 | 3.21%
            3 |  2.8011204e+07 |  2.7777778e+07 | -0.83%
            4 |  2.7932961e+07 |  2.8328612e+07 | 1.42%
            5 |  1.6420361e+07 |  1.7123288e+07 | 4.28%
            6 |  1.4705882e+07 |  1.5313936e+07 | 4.13%
            7 |  1.3192612e+07 |  1.3888889e+07 | 5.28%
            8 |  1.2121212e+07 |  1.2919897e+07 | 6.59%
            9 |  1.1235955e+07 |  1.2135922e+07 | 8.01%
           10 |  1.0000000e+07 |  1.1312217e+07 | 13.12%
           11 |  9.0909091e+06 |  1.0000000e+07 | 10.00%
           12 |  8.1967213e+06 |  8.4033613e+06 | 2.52%
           13 |  7.2463768e+06 |  7.7519380e+06 | 6.98%
           14 |  6.7567568e+06 |  7.1428571e+06 | 5.71%
           15 |  5.5555556e+06 |  5.8823529e+06 | 5.88%
           16 |  6.3291139e+06 |  5.7803468e+06 | -8.67%
           17 |  5.8823529e+06 |  5.9880240e+06 | 1.80%
           18 |  5.5555556e+06 |  5.7142857e+06 | 2.86%
           19 |  5.2356021e+06 |  5.6179775e+06 | 7.30%
           20 |  4.9019608e+06 |  5.1020408e+06 | 4.08%
           21 |  4.5454545e+06 |  4.8543689e+06 | 6.80%
           22 |  4.1841004e+06 |  4.5871560e+06 | 9.63%
           24 |  4.4642857e+06 |  4.4052863e+06 | -1.32%
           25 |  4.1666667e+06 |  4.2194093e+06 | 1.27%
           26 |  4.0000000e+06 |  3.9525692e+06 | -1.19%
           27 |  3.8461538e+06 |  3.8022814e+06 | -1.14%
           28 |  3.9062500e+06 |  3.8759690e+06 | -0.78%
           29 |  3.7878788e+06 |  3.8022814e+06 | 0.38%
           30 |  3.3898305e+06 |  3.7174721e+06 | 9.67%
           31 |  2.7472527e+06 |  2.8571429e+06 | 4.00%
           32 |  3.0395137e+06 |  3.1446541e+06 | 3.46%
           50 |  1.7094017e+06 |  2.0576132e+06 | 20.37%
          100 | 518134.72      | 609756.10      | 17.68%
          250 | 108577.63      | 136612.02      | 25.82%
         2500 | 1264.22        | 1610.31        | 27.38%
        15000 | 34.48          | 43.48          | 26.09%
(36 rows)

/*
 * Apple M3 Max
 */

 NBASE digits |   HEAD rate    |   patch rate   | relative difference
--------------+----------------+----------------+---------------------
            1 |  4.9504950e+07 |  4.9504950e+07 | 0.00%
            2 |  6.1349693e+07 |  5.9171598e+07 | -3.55%
            3 |  5.2631579e+07 |  5.2083333e+07 | -1.04%
            4 |  4.5248869e+07 |  4.5248869e+07 | 0.00%
            5 |  2.1978022e+07 |  2.2727273e+07 | 3.41%
            6 |  1.9920319e+07 |  2.0876827e+07 | 4.80%
            7 |  1.7182131e+07 |  1.8018018e+07 | 4.86%
            8 |  1.5822785e+07 |  1.6051364e+07 | 1.44%
            9 |  1.3368984e+07 |  1.3333333e+07 | -0.27%
           10 |  1.1709602e+07 |  1.1627907e+07 | -0.70%
           11 |  1.0020040e+07 |  1.0526316e+07 | 5.05%
           12 |  9.0909091e+06 |  9.0909091e+06 | 0.00%
           13 |  8.2644628e+06 |  8.2644628e+06 | 0.00%
           14 |  7.6923077e+06 |  7.6335878e+06 | -0.76%
           15 |  7.1428571e+06 |  7.0921986e+06 | -0.71%
           16 |  6.6225166e+06 |  6.5789474e+06 | -0.66%
           17 |  5.9880240e+06 |  6.2111801e+06 | 3.73%
           18 |  5.7803468e+06 |  5.5865922e+06 | -3.35%
           19 |  5.2631579e+06 |  5.2356021e+06 | -0.52%
           20 |  4.6296296e+06 |  4.8543689e+06 | 4.85%
           21 |  4.4444444e+06 |  4.3859649e+06 | -1.32%
           22 |  4.2016807e+06 |  4.0485830e+06 | -3.64%
           24 |  3.7453184e+06 |  3.5714286e+06 | -4.64%
           25 |  3.4843206e+06 |  3.4013605e+06 | -2.38%
           26 |  3.2786885e+06 |  3.2786885e+06 | 0.00%
           27 |  3.0674847e+06 |  3.1055901e+06 | 1.24%
           28 |  2.8818444e+06 |  2.9069767e+06 | 0.87%
           29 |  2.7322404e+06 |  2.7700831e+06 | 1.39%
           30 |  2.5839793e+06 |  2.6246719e+06 | 1.57%
           31 |  2.5062657e+06 |  2.4630542e+06 | -1.72%
           32 |  4.5871560e+06 |  4.6082949e+06 | 0.46%
           50 |  1.7513135e+06 |  1.9880716e+06 | 13.52%
          100 | 714285.71      | 833333.33      | 16.67%
          250 | 124223.60      | 149925.04      | 20.69%
         2500 | 1440.92        | 1760.56        | 22.18%
        15000 | 39.53          | 48.08          | 21.63%
(36 rows)

Regards,
Joel



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: jsonpath: Inconsistency of timestamp_tz() Output
Next
From: "David E. Wheeler"
Date:
Subject: Re: jsonpath: Inconsistency of timestamp_tz() Output