Home > mailing lists

Re: Optimize numeric.c mul_var() using the Karatsuba algorithm - Mailing list pgsql-hackers

From	Dean Rasheed
Subject	Re: Optimize numeric.c mul_var() using the Karatsuba algorithm
Date	July 2, 2024 09:42:06
Msg-id	CAEZATCW9hzVVfoH8-OZL4+-q6PD-ztEkJdxuzC6kBY3f+RhJEA@mail.gmail.com Whole thread Raw
In response to	Re: Optimize numeric.c mul_var() using the Karatsuba algorithm ("Joel Jacobson" <joel@compiler.org>)
List	pgsql-hackers

Tree view

On Sun, 30 Jun 2024 at 11:22, Joel Jacobson <joel@compiler.org> wrote:
>
> On Sat, Jun 29, 2024, at 14:22, Dean Rasheed wrote:
>
> > But why just split it into two pieces? That will just lead
> > to a lot of unnecessary recursion for very unbalanced inputs. Instead,
> > why not split the longer input into N roughly equal sized pieces, each
> > around the same length as the shorter input, multiplying and adding
> > them at the appropriate offsets?
>
> The approach you're describing is implemented by e.g. CPython
> and is called "lopsided" in their code base. It has some different
> performance characteristics, compared to the recursive Half-Karatsuba
> approach.
>
> What I didn't like about lopsided is the degenerate case where the
> last chunk is much shorter than the var1, for example, if we pretend
> we would be doing Karatsuba all the way down to ndigits 2,
> and think about the example var1ndigits = 3 and var2ndigits = 10,
> then lopsided would do
> var1ndigits=3 var2ndigits=3
> var1ndigits=3 var2ndigits=3
> var1ndigits=3 var2ndigits=3
> var1ndigits=3 var2ndigits=1
>
> whereas Half-Karatsuba would do
> var1ndigits=3 var2ndigits=5
> var1ndigits=3 var2ndigits=5
>
> You can find contrary examples too of course where lopsided
> is better than Half-Karatsuba, none of them seem substantially better
> than the other.

Actually, that wasn't quite what I was thinking of. The approach I've
seen (I can't remember where) is to split var2 into roughly
equal-sized chunks as follows:

If var2ndigits >= 2 * var1ndigits, split it into a number chunks where

  nchunks = var2ndigits / var1ndigits

using integer division with truncation. Then call mul_var_chunks()
(say), passing it nchunks, to perform the multiplication using that
many chunks.

mul_var_chunks() would use nchunks to decides on the chunk size according to

  chunk_size = var2ndigits / nchunks

again using integer division with truncation. That has a remainder in
the range [0, nchunks), which is divided up between the chunks so that
some of them end up being chunk_size + 1 digits. I.e., something like:

  chunk_remainder = var2ndigits - chunk_size * nchunks

  for (int start = 0, end = chunk_size;
       start < var2ndigits;
       start = end, end = start + chunk_size)
  {
    /* Distribute remainder over the first few chunks */
    if (chunk_remainder > 0)
    {
      end++;
      chunk_remainder--;
    }

    /* Process chunk between "start" and "end" */
  }

What this does is adapt the shape of the chunks to the region where
the Karatsuba algorithm is most efficient. For example, suppose
var1ndigits = 1000. Then:

For 2000x1000 to 2999x1000 digit products, nchunks will be 2 and
chunk_size will be at most (2999/2)+1 = 1500.

For 3000x1000 to 3999x1000 digit products, nchunks will be 3 and
chunk_size will be at most (3999/3)+1 = 1334.

And so on.

The result is that all the chunks will fall into that region where
var2ndigits / var1ndigits is between 1 and 1.5, for which
KARATSUBA_LOW_RANGE_CONDITION() will almost certainly pass, and
Karatsuba will operate efficiently all the way down to
KARATSUBA_BASE_LIMIT.

Regards,
Dean

pgsql-hackers by date:

From: vignesh C
Date: 02 July 2024, 09:39:59
Subject: Re: Logical Replication of sequences

From: Daniel Gustafsson
Date: 02 July 2024, 09:44:07
Subject: Re: Avoid incomplete copy string (src/backend/access/transam/xlog.c)

Re: Optimize numeric.c mul_var() using the Karatsuba algorithm - Mailing list pgsql-hackers

Previous

Next