Thread: Review: Patch for phypot - Pygmy Hippotause

Review: Patch for phypot - Pygmy Hippotause

From
Andrew Geery
Date:
This is a review of the phypot - Pygmy Hippotause patch:
http://archives.postgresql.org/message-id/4A9897E1.8090703@netspace.net.au
submitted by Paul Matthews.

Contents & Purpose
==================
The purpose of the patch is to compute a hypotenuse with higher
precision than the current implementation (the HYPOT macro in
src/include/utils/geo_decls.h).  The initial impetus for the patch
goes back to this message
[http://archives.postgresql.org/pgsql-hackers/2009-08/msg01579.php].
The new phypot function (in src/backend/utils/adt/geo_ops.c) is well
documented in the function header comments and matches the discussion
on the wikipedia page [http://en.wikipedia.org/wiki/Hypot].  It is
envisioned that the new phypot function will eventually be replaced by
the standard C99 hypot function.  This message
[http://archives.postgresql.org/pgsql-hackers/2009-08/msg01580.php]
discusses why the standard c99 hypot function can't be used
(PostgreSQL targets c89, not c99 -- although other messages in the
thread make it sound like the hypot function is ubiquitous).

Initial Run
===========
The patch is in context diff form and applies cleanly to the current CVS HEAD.
There are no tests.
There is no documentation, outside of the code comments, as this is an
internal function.

A couple of nitpicking items:
(1) the phypot function is declared as static, but it is not defined that way
(2) to better match the style of the rest of the geo_ops.c file:
        (a) put the function return type on its own line
        (b) don't put spaces after a "(" and before a ")" [e.g.,
if-statements, function declaration]
        (c) put a space between the keyword "if" and the opening "("
        (d) put spaces around arithmetic operators

Performance
===========
The two concerns about the patch in the mail archives
[http://archives.postgresql.org/message-id/4B83EE31020000250002F55E@gw.wicourts.gov]
are that
(1) since there is more logic in the new function than in the original
marco, it might be slower; and
(2) despite the fact that it is a better implementation of the
hypotenuse functionality, it might break existing applications which
are depending on the existing computation

For (1), I wrote a small c program that executed the original HYPOT
macro in a loop 100 million times and I did the same with the new
phypot function.  The new phypot function is, on average, about twice
as slow as the original HYPOT macro.  The HYPOT macro executed 100
million times in 11 seconds and the phypot function executed the same
number of times in 22 seconds.  With both -O2 and -O3, the HYPOT macro
executed in 8 seconds and the phypot in 18.

For (2), I wrote a small c program that executed the original HYPOT
macro and the new phypot function in a loop 100 million times on
random numbers and compared at what precision the two calculations
started to differ.  I found that the difference in the two
calculations were always less than 0.000001.  However, about a third
of the calculations differed at one more magnitude of precision (that
is, there were differences in the calculations that were greater than
0.0000001).

Conclusion
==========
I like that the patch provides greater precision.  However, I am
unclear as to how significant the slow down is in the new function
(it's certainly not very significant for small iterations), nor how
significant the difference in the calculations is between the existing
macro and the new function.

Thanks
Andrew


Re: Review: Patch for phypot - Pygmy Hippotause

From
"Kevin Grittner"
Date:
Andrew Geery <andrew.geery@gmail.com> wrote:
> The HYPOT macro executed 100 million times in 11 seconds and the
> phypot function executed the same number of times in 22 seconds.
Or, to put that another way, the new function adds 110 nanoseconds
to each hypotenuse calculation.
> With both -O2 and -O3, the HYPOT macro executed in 8 seconds and
> the phypot in 18.
Or, with these optimizations, it adds 100 nanoseconds to each
hypotenuse calculation.
I don't know about others, but that seems a reasonable price to pay
to eliminate all possible underflows, all overflows except where the
result doesn't fit in the result data type, and make NaN and Inf
processing more standard compliant.
> I found that the difference in the two calculations were always
> less than 0.000001.  However, about a third of the calculations
> differed at one more magnitude of precision (that is, there were
> differences in the calculations that were greater than 0.0000001).
That's harder for me to evaluate in terms of whether it's
acceptable.  It *is* an *approximate* data type, and the differences
are all less than 0.0001%; however, that seems like a pretty weak
guarantee if it's making the answer less accurate.  If we could take
some of these cases with relatively large differences and see which
of the calculations is more accurate, that might help make a
decision.  I'm not sure which technique would tend to be more
accurate.  Since they're algebraically equivalent, and what we're
using now pushes toward underflow and overflow more readily, it
seems possible that the differences will generally reflect a greater
accuracy in the patch's technique.
-Kevin


Re: Review: Patch for phypot - Pygmy Hippotause

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Andrew Geery <andrew.geery@gmail.com> wrote:
>> I found that the difference in the two calculations were always
>> less than 0.000001.  However, about a third of the calculations
>> differed at one more magnitude of precision (that is, there were
>> differences in the calculations that were greater than 0.0000001).
> That's harder for me to evaluate in terms of whether it's
> acceptable.  It *is* an *approximate* data type, and the differences
> are all less than 0.0001%; however, that seems like a pretty weak
> guarantee if it's making the answer less accurate.  If we could take
> some of these cases with relatively large differences and see which
> of the calculations is more accurate, that might help make a
> decision.  I'm not sure which technique would tend to be more
> accurate.  Since they're algebraically equivalent, and what we're
> using now pushes toward underflow and overflow more readily, it
> seems possible that the differences will generally reflect a greater
> accuracy in the patch's technique.

Hm ... it's been twenty years since I did any serious numerical analysis
hacking, but ... offhand this looks to me like it's about as accurate as
the straightforward way, not really better or worse.  Ignoring
overflow/underflow, the basic knock on the naive expression is that if x
is much bigger than y, you lose most or all of the significant digits in
y when you add their squares.  For instance, if x is 1e8 * y, then y*y
fails to affect the sum at all (given typical float8 arithmetic), and
you'll get back sqrt(x*x) even though y should have been able to affect
the result at the 8th place or so.  In the patch's calculation, y/x is
computed accurately but then we'll lose the same precision when we form
1 + yx*yx --- the result will be just 1 if y is lots smaller than x.

If we were feeling tense about this, we could look for an alternate way
of calculating sqrt(1 + yx*yx) that doesn't lose so much accuracy.
In principle I think that's doable since this expression is related to
ln(1+x) which can be calculated accurately even for very small x.
I'm not convinced that it's worth troubling over though, seeing that no
real attention has been paid to numerical stability anywhere else in the
geometric functions.

I think the patch is good in principle; what could be improved about it
is:

1. It should just redefine HYPOT(x,y) as pg_hypot(x,y), rather than
touching all the call sites --- not to mention possibly breaking
third-party code depending on the HYPOT macro.  (That possibility
also leads me to think that the function shouldn't be static, but
should be exported in geo_decls.h.)

2. The comments in the new function leave something to be desired, eg
the discussion of the zero case is as clear as mud IMO.

BTW, the function comment claims that SUS requires a NAN result for
hypot(NAN,INF), but I don't believe it.  If it did say that it would
be contrary to ISO C:
        -- hypot(x,  y) returns +INF if x is infinite, even if y is a           NaN.

Anyway, if you read SUS' HUGE_VAL as meaning INF, that clause comes
before the one about NAN so I think it's saying the same thing as the
other standards.
        regards, tom lane


Re: Review: Patch for phypot - Pygmy Hippotause

From
Dean Rasheed
Date:
On 17 July 2010 20:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ...  For instance, if x is 1e8 * y, then y*y
> fails to affect the sum at all (given typical float8 arithmetic), and
> you'll get back sqrt(x*x) even though y should have been able to affect
> the result at the 8th place or so.  In the patch's calculation, y/x is
> computed accurately but then we'll lose the same precision when we form
> 1 + yx*yx --- the result will be just 1 if y is lots smaller than x.
>

No. If x is 1e8 * y, then y will only affect the result in the 16th
place. You can see this if you do a simple series expansion:

sqrt(1+yx^2) = 1 + 1/2 yx^2 + O(yx^4)

> If we were feeling tense about this, we could look for an alternate way
> of calculating sqrt(1 + yx*yx) that doesn't lose so much accuracy.
> In principle I think that's doable since this expression is related to
> ln(1+x) which can be calculated accurately even for very small x.

This algorithm is about as accurate as it could possibly be. The point
with ln(1+x) is that for small x:

ln(1+x) = x + O(x^2)

so you would loose precision if x were much smaller than 1. This is
not the case with sqrt(1+x).

For most cases, the new algorithm is no more accurate than the old
one. The exception is when *both* x and y are very small. In this
case, it protects against incorrect underflows to 0.

Regards,
Dean


Re: Review: Patch for phypot - Pygmy Hippotause

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> No. If x is 1e8 * y, then y will only affect the result in the 16th
> place. You can see this if you do a simple series expansion:

> sqrt(1+yx^2) = 1 + 1/2 yx^2 + O(yx^4)

Sigh, I went looking for that expansion yesterday and didn't find it.
Should've tried harder.  I was relying on a gut feeling that it would
behave approximately like ln(1+x).

> For most cases, the new algorithm is no more accurate than the old
> one. The exception is when *both* x and y are very small. In this
> case, it protects against incorrect underflows to 0.

Yeah, I think you're right.
        regards, tom lane


Re: Review: Patch for phypot - Pygmy Hippotause

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think the patch is good in principle

Since everyone seems to agree this is a good patch which needed
minor tweaks, and we haven't heard from the author, I just went
ahead and made the changes.

Andrew, could you take another look and see if you agree I've
covered it all before it gets marked ready for a committer?

-Kevin

Attachment

Re: Review: Patch for phypot - Pygmy Hippotause

From
Andrew Geery
Date:
It looks good to me: (0) new patch applies cleanly to CVS HEAD; (1)
the formating of the code was changed; (2) definition of the HYPOT
macro was changed to use phypot rather than being removed; (3) the
phypot function was declared to be extern; (4) the comments to the
phypot function were changed to remove the reference about the SUS
behavior.

I changed the status to "Ready for Committer".

Thanks
Andrew

On Fri, Jul 23, 2010 at 4:01 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> I think the patch is good in principle
>
> Since everyone seems to agree this is a good patch which needed
> minor tweaks, and we haven't heard from the author, I just went
> ahead and made the changes.
>
> Andrew, could you take another look and see if you agree I've
> covered it all before it gets marked ready for a committer?
>
> -Kevin
>


Re: Review: Patch for phypot - Pygmy Hippotause

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the patch is good in principle
> Since everyone seems to agree this is a good patch which needed
> minor tweaks, and we haven't heard from the author, I just went
> ahead and made the changes.

Applied with a bit of further editing of the comments.
        regards, tom lane