Re: Inaccurate results from numeric ln(), log(), exp() and pow() - Mailing list pgsql-hackers

From: Dean Rasheed
Subject: Re: Inaccurate results from numeric ln(), log(), exp() and pow()
Date:
Msg-id: CAEZATCVit3zjGinEits1ZTmCkdGvXxbbSqJeUXZJYD5HDrtwxg@mail.gmail.com
In response to: Re: Inaccurate results from numeric ln(), log(), exp() and pow() (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses: Re: Inaccurate results from numeric ln(), log(), exp() and pow() (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 16 September 2015 at 15:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, in that particular example I'd happily take the 27ms time to get
> the more accurate answer.  If it were 270ms, maybe not.  I think my
> initial reaction to this patch is "are there any cases where it makes
> things 100x slower ... especially for non-outrageous inputs?"  If not,
> sure, let's go for more accuracy.
>

On 16 September 2015 at 17:03, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> I'll try to do some more comprehensive performance testing over the
> next few days.
>

I've done some more performance testing, and the results are broadly
in line with my initial expectations. There are a couple of cases
where pow() with non-integer powers is hundreds of times slower. This
happens when inputs with only a few significant digits produce
results with thousands of digits, which the code in HEAD doesn't
calculate accurately. However, there do not appear to be any cases
where this happens for "non-outrageous" inputs.
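To put "outrageous" in perspective: the number of decimal digits in
b^x is roughly x * log10(b), so inputs with only a handful of
significant digits can still call for a result with thousands of
digits. The toy C program below just prints that estimate for a
made-up input (illustrative only; these values are not taken from the
attached test script):

/*
 * Rough digit-count estimate for b^x.  Illustrative only -- this is not
 * code from the patch, and the inputs are made up.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  base = 12.3;        /* hypothetical input */
    double  exponent = 1838.5;  /* hypothetical input */

    /* approximate digits to the left of the decimal point in base^exponent */
    double  ndigits = exponent * log10(base);

    printf("%.1f ^ %.1f has roughly %.0f digits before the decimal point\n",
           base, exponent, ndigits);
    return 0;
}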

There are also cases where the new code is hundreds or even thousands
of times faster, mainly due to it making better choices for the local
rscale, and the reduced use of sqrt() in ln_var().
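For context, ln_var() reduces its argument towards 1 with repeated
square roots before summing a fast-converging series, and each of
those square roots is a full-precision operation, so needing fewer of
them is a direct win. The following sketch shows the general shape of
that scheme using plain doubles; it is only an illustration of the
technique, not the numeric code from the patch:

/*
 * Sketch of the classic argument-reduction scheme for ln() -- shown with
 * plain doubles just to illustrate the shape of the algorithm; the real
 * ln_var() works on numeric digits, where every sqrt() in the reduction
 * loop is a full-precision operation.  Not code from the patch.
 */
#include <math.h>
#include <stdio.h>

static double
ln_sketch(double x)
{
    int         nsqrt = 0;
    double      z, zz, term, result;

    /* reduce x towards 1 by repeated square roots (assumes x > 0) */
    while (x > 1.1 || x < 0.9)
    {
        x = sqrt(x);
        nsqrt++;
    }

    /*
     * Fast-converging series: ln(x) = 2 * (z + z^3/3 + z^5/5 + ...)
     * with z = (x - 1) / (x + 1).
     */
    z = (x - 1.0) / (x + 1.0);
    zz = z * z;
    result = 0.0;
    term = z;
    for (int n = 1; fabs(term) > 1e-17; n += 2)
    {
        result += term / n;
        term *= zz;
    }
    result *= 2.0;

    /* undo the reduction: each sqrt halved ln(x) */
    return ldexp(result, nsqrt);    /* result * 2^nsqrt */
}

int
main(void)
{
    printf("ln_sketch(123.456) = %.15g\n", ln_sketch(123.456));
    printf("log(123.456)       = %.15g\n", log(123.456));
    return 0;
}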

I wrote a script to test each function with a range of inputs, some
straightforward, and some intentionally difficult to compute. Attached
is the script's output. The columns in the output are:

* Function being called.
* The input(s) passed to it.
* Number of significant digits in the inputs (not counting trailing zeros).
* Number of significant digits in the output (HEAD vs patched code).
* Number of output digits on the right that differ between the two.
* Average function call time in HEAD.
* Average function call time with the patch.
* How many times faster or slower the patched code is.

There is a huge spread of function call times, both before and after
the patch, and the overall performance profile has changed
significantly, but in general the patched code is faster more often
than it is slower, especially for "non-outrageous" inputs.

All the cases where it is significantly slower are ones where the
result is also significantly more accurate, although it isn't always
slower to generate more accurate results.


These results are based on the attached, updated patch which includes
a few minor improvements. The main changes are:

* In mul_var(), instead of just ripping out the faulty input
truncation code, I've now replaced it with code that correctly
truncates the inputs as much as possible when the exact answer isn't
required. This produces a noticeable speedup in a number of cases; for
example, it reduced the time to compute exp(5999.999) from 27ms to 20ms.
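As a simple illustration of why the truncation has to be done
carefully (plain decimal arithmetic with doubles here, not the
mul_var() change itself): cutting both inputs down to exactly the
precision wanted in the result can change the correctly rounded
answer, so some extra guard digits have to be kept:

/*
 * Illustration only (plain doubles, not the numeric.c change): truncating
 * multiplication inputs to exactly the precision wanted in the result can
 * change the correctly rounded answer, so some guard digits must be kept
 * when truncating.
 */
#include <stdio.h>

int
main(void)
{
    /* target: 4 significant digits of 1.234999 * 2.000001 */
    double  exact = 1.234999 * 2.000001;    /* 2.469999234999 -> 2.470 */
    double  naive = 1.234 * 2.000;          /* inputs cut to 4 digits -> 2.468 */
    double  guarded = 1.23499 * 2.00000;    /* two guard digits kept -> 2.470 */

    printf("exact   : %.12f\n", exact);
    printf("naive   : %.12f\n", naive);
    printf("guarded : %.12f\n", guarded);
    return 0;
}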

* Also in mul_var(), the simple measure of swapping the inputs so that
var1 is always the number with fewer digits produces a worthwhile
benefit. This further reduced the time to compute exp(5999.999) to
17ms.
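The win here presumably comes from the outer loop of the schoolbook
multiplication running over var1's digits, so the per-outer-digit
overhead (e.g. skipping zero digits) is paid fewer times when var1 is
the shorter input, while the number of digit-by-digit products stays
the same. The toy base-10000 multiplier below sketches that loop
structure; it is a simplified illustration, not the mul_var() code:

/*
 * Toy base-10000 schoolbook multiplication, digits stored
 * least-significant-first.  Swapping so the outer loop runs over the
 * shorter operand keeps per-outer-iteration overhead to a minimum.
 * Illustration only, not the mul_var() code.
 */
#include <stdint.h>
#include <stdio.h>

#define NBASE 10000

/* res must have room for n1 + n2 digits */
static void
mul_toy(const int16_t *d1, int n1, const int16_t *d2, int n2, int32_t *res)
{
    /* swap so the outer loop runs over the shorter input */
    if (n1 > n2)
    {
        const int16_t *tmpd = d1;
        int     tmpn = n1;

        d1 = d2;
        n1 = n2;
        d2 = tmpd;
        n2 = tmpn;
    }

    for (int i = 0; i < n1 + n2; i++)
        res[i] = 0;

    for (int i1 = 0; i1 < n1; i1++)
    {
        int32_t carry = 0;

        if (d1[i1] == 0)
            continue;           /* skip the whole inner loop for zero digits */

        for (int i2 = 0; i2 < n2; i2++)
        {
            int32_t prod = res[i1 + i2] + (int32_t) d1[i1] * d2[i2] + carry;

            res[i1 + i2] = prod % NBASE;
            carry = prod / NBASE;
        }
        res[i1 + n2] += carry;
    }
}

int
main(void)
{
    /* 12345678 * 10001, base-10000 digits least-significant-first */
    int16_t a[] = {5678, 1234};
    int16_t b[] = {1, 1};
    int32_t r[4];

    mul_toy(a, 2, b, 2, r);

    /* top digit r[3] is zero for these inputs; prints 123469125678 */
    printf("%d%04d%04d\n", r[2], r[1], r[0]);
    return 0;
}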

There's more that could be done to improve multiplication performance,
but I think that's out of scope for this patch.

Regards,
Dean

Attachment
