Some optimisations for numeric division - Mailing list pgsql-hackers

From Dean Rasheed
Subject Some optimisations for numeric division
Date
Msg-id CAEZATCVwsBi-ND-t82Cuuh1=8ee6jdOpzsmGN+CUZB6yjLg9jw@mail.gmail.com
Whole thread Raw
Responses Re: Some optimisations for numeric division
List pgsql-hackers
Attached are 3 small patches that improve the performance of numeric
division. Patch 0002 seems to have the biggest impact, but I think
they're all worth including, since they're quite simple changes, with
noticeable performance gains.


Patch 0001 vectorises the inner loop of div_var_fast(). This loop is
almost identical to the inner loop of mul_var(), which was vectorised
by commit 8870917623. The thing preventing the div_var_fast() loop
from being vectorised is the assignment to div[qi + i], and simply
replacing that with div_qi[i] where div_qi = &div[qi], in the same
style as mul_var(), fixes that.

One difference between this and the mul_var() code is that it is also
necessary to add an explicit cast to get the compiler to generate
matching output, and give the best results. This is because in
mul_var() the compiler recognises that var1digit is actually 16-bit,
rather than 32-bit, and so it doesn't need to multiply the high part,
but in div_var_fast() it's not obvious to the compiler that qdigit
also fits in 16 bits, hence the cast.

The actual performance gain is typically quite modest, since
div_var_fast() is always only a part of some more complex numeric
computation, but it can be seen in cases like raising numbers to
negative integer powers, e.g.:

CREATE TEMP TABLE num_test(x numeric);
SELECT setseed(0);
INSERT INTO num_test
  SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, '')||x)::numeric
          FROM generate_series(1,100))
  FROM generate_series(1,100000) g(x);

SELECT sum(x^(-2)) FROM num_test;

Time: 112.949 ms  (HEAD)
Time: 98.537 ms   (with patch)


Patch 0002 simplifies the inner loop of div_var() (the guts of the
public-facing numeric division operator) by more closely combining the
multiplication and subtraction operations and folding the separate
"carry" and "borrow" variables into a single "borrow", as suggested by
the old code comment.

IMO, this makes the code easier to understand, as well as giving more
significant performance gains:

CREATE TEMP TABLE div_test(x numeric);
SELECT setseed(0);
INSERT INTO div_test
  SELECT (SELECT ('1.'||string_agg((random()*9)::int::text, ''))::numeric + x
          FROM generate_series(1,50))
  FROM generate_series(1,1000) g(x);

SELECT sum(x/y) FROM div_test t1(x), div_test t2(y);

Time: 1477.340 ms  (HEAD)
Time: 826.748 ms   (with patch)


Patch 0003 replaces some uses of div_var_fast() with div_var().
Specifically, when the divisor has just one or two base-NBASE digits,
div_var() is faster. This is especially true for 1-digit divisors, due
to the "fast path" code in div_var() to handle that. It's also just
about true for 2-digit divisors, and it occurs to me that that case
could potentially be optimised further with similar fast path code in
div_var(), but I haven't tried that yet.

One-digit divisors are quite common. For example, in the Taylor series
expansions in exp_var() and ln_var(), since the number of Taylor
series terms never exceeds a few hundred in practice. Also,
exp_var()'s argument reduction step divides by 2^ndiv2, which is
roughly 100 times the input, rounded up to a power of two, and valid
inputs are less than 6000, so this will always be one or two digits.

Testing this with a bunch of random exp() and ln() computations I saw
a speed-up of 15-20%, and it reduced the run time of the numeric-big
regression test by around 10%, which seems worth having.

Regards,
Dean

Attachment

pgsql-hackers by date:

Previous
From: Nitin Jadhav
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Next
From: Nitin Jadhav
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)