Thread: Slaying the HYPOTamus
Like some ancient precursor to the modern hypot()enuse the dreaded ill-tempered HYPOT()amus lurks in the recesses of geo_ops.c and geo_decls.h. And many a line of code has been swallowed by its mighty maw. This patch replaces the HYPOT() macro with a calls to the hypot() function. The hypot() function has been part of the C standard since C99 (ie 10 years ago), and in most UNIX, IBM and GNU libraries longer than that. The function is designed not to fail where the current naive macro would result in overflow. Where available, we might expect to the hypot() function to take advantage of underlying hardware opcodes. In addition the function evaluates its arguments only once. The current macro evaluates its arguments twice. Currently HYPOT(a.x - b.x, a.y - b.y)) becomes: sqrt((a.x - b.x)*(a.x - b.x) + (a.y - b.y) * (a.y - b.y)) The patch passes all test. Patch attached below. (First attempt at using CVS. Hope it's all good) ? GNUmakefile ? config.log ? config.status ? patchfile ? src/Makefile.global ? src/backend/postgres ? src/backend/catalog/postgres.bki ? src/backend/catalog/postgres.description ? src/backend/catalog/postgres.shdescription ? src/backend/snowball/snowball_create.sql ? src/backend/utils/probes.h ? src/backend/utils/mb/conversion_procs/conversion_create.sql ? src/bin/initdb/initdb ? src/bin/pg_config/pg_config ? src/bin/pg_controldata/pg_controldata ? src/bin/pg_ctl/pg_ctl ? src/bin/pg_dump/pg_dump ? src/bin/pg_dump/pg_dumpall ? src/bin/pg_dump/pg_restore ? src/bin/pg_resetxlog/pg_resetxlog ? src/bin/psql/psql ? src/bin/scripts/clusterdb ? src/bin/scripts/createdb ? src/bin/scripts/createlang ? src/bin/scripts/createuser ? src/bin/scripts/dropdb ? src/bin/scripts/droplang ? src/bin/scripts/dropuser ? src/bin/scripts/reindexdb ? src/bin/scripts/vacuumdb ? src/include/pg_config.h ? src/include/stamp-h ? src/interfaces/ecpg/compatlib/exports.list ? src/interfaces/ecpg/compatlib/libecpg_compat.so.3.1 ? src/interfaces/ecpg/compatlib/libecpg_compat.so.3.2 ? src/interfaces/ecpg/ecpglib/exports.list ? src/interfaces/ecpg/ecpglib/libecpg.so.6.2 ? src/interfaces/ecpg/include/ecpg_config.h ? src/interfaces/ecpg/include/stamp-h ? src/interfaces/ecpg/pgtypeslib/exports.list ? src/interfaces/ecpg/pgtypeslib/libpgtypes.so.3.1 ? src/interfaces/ecpg/preproc/ecpg ? src/interfaces/libpq/exports.list ? src/interfaces/libpq/libpq.so.5.3 ? src/port/pg_config_paths.h ? src/test/regress/log ? src/test/regress/pg_regress ? src/test/regress/results ? src/test/regress/testtablespace ? src/test/regress/tmp_check ? src/test/regress/expected/constraints.out ? src/test/regress/expected/copy.out ? src/test/regress/expected/create_function_1.out ? src/test/regress/expected/create_function_2.out ? src/test/regress/expected/largeobject.out ? src/test/regress/expected/largeobject_1.out ? src/test/regress/expected/misc.out ? src/test/regress/expected/tablespace.out ? src/test/regress/sql/constraints.sql ? src/test/regress/sql/copy.sql ? src/test/regress/sql/create_function_1.sql ? src/test/regress/sql/create_function_2.sql ? src/test/regress/sql/largeobject.sql ? src/test/regress/sql/misc.sql ? src/test/regress/sql/tablespace.sql ? src/timezone/zic Index: src/backend/utils/adt/geo_ops.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/geo_ops.c,v retrieving revision 1.103 diff -c -r1.103 geo_ops.c *** src/backend/utils/adt/geo_ops.c 28 Jul 2009 09:47:59 -0000 1.103 --- src/backend/utils/adt/geo_ops.c 23 Aug 2009 03:44:39 -0000 *************** *** 825,831 **** box_cn(&a, box1); box_cn(&b, box2); ! PG_RETURN_FLOAT8(HYPOT(a.x - b.x, a.y - b.y)); } --- 825,831 ---- box_cn(&a, box1); box_cn(&b, box2); ! PG_RETURN_FLOAT8(hypot(a.x - b.x, a.y - b.y)); } *************** *** 1971,1977 **** Point *pt1 = PG_GETARG_POINT_P(0); Point *pt2 = PG_GETARG_POINT_P(1); ! PG_RETURN_FLOAT8(HYPOT(pt1->x - pt2->x, pt1->y - pt2->y)); } double --- 1971,1977 ---- Point *pt1 = PG_GETARG_POINT_P(0); Point *pt2 = PG_GETARG_POINT_P(1); ! PG_RETURN_FLOAT8(hypot(pt1->x - pt2->x, pt1->y - pt2->y)); } double *************** *** 1979,1987 **** { #ifdef GEODEBUG printf("point_dt- segment (%f,%f),(%f,%f) length is %f\n", ! pt1->x, pt1->y, pt2->x, pt2->y, HYPOT(pt1->x - pt2->x, pt1->y - pt2->y)); #endif ! return HYPOT(pt1->x - pt2->x, pt1->y - pt2->y); } Datum --- 1979,1987 ---- { #ifdef GEODEBUG printf("point_dt- segment (%f,%f),(%f,%f) length is %f\n", ! pt1->x, pt1->y, pt2->x, pt2->y, hypot(pt1->x - pt2->x, pt1->y - pt2->y)); #endif ! return hypot(pt1->x - pt2->x, pt1->y - pt2->y); } Datum *************** *** 2444,2450 **** dist_pl_internal(Point *pt, LINE *line) { return (line->A * pt->x + line->B * pt->y + line->C) / ! HYPOT(line->A, line->B); } Datum --- 2444,2450 ---- dist_pl_internal(Point *pt, LINE *line) { return (line->A * pt->x + line->B * pt->y + line->C) / ! hypot(line->A, line->B); } Datum *************** *** 4916,4922 **** PointPGetDatum(point))); result->center.x = p->x; result->center.y = p->y; ! result->radius *= HYPOT(point->x, point->y); PG_RETURN_CIRCLE_P(result); } --- 4916,4922 ---- PointPGetDatum(point))); result->center.x = p->x; result->center.y = p->y; ! result->radius *= hypot(point->x, point->y); PG_RETURN_CIRCLE_P(result); } *************** *** 4936,4942 **** PointPGetDatum(point))); result->center.x = p->x; result->center.y = p->y; ! result->radius /= HYPOT(point->x, point->y); PG_RETURN_CIRCLE_P(result); } --- 4936,4942 ---- PointPGetDatum(point))); result->center.x = p->x; result->center.y = p->y; ! result->radius /= hypot(point->x, point->y); PG_RETURN_CIRCLE_P(result); } Index: src/include/utils/geo_decls.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/utils/geo_decls.h,v retrieving revision 1.55 diff -c -r1.55 geo_decls.h *** src/include/utils/geo_decls.h 1 Jan 2009 17:24:02 -0000 1.55 --- src/include/utils/geo_decls.h 23 Aug 2009 03:44:44 -0000 *************** *** 50,56 **** #define FPge(A,B) ((A) >= (B)) #endif - #define HYPOT(A, B) sqrt((A) * (A) + (B) * (B)) /*--------------------------------------------------------------------- * Point - (x,y) --- 50,55 ----
On Sun, Aug 23, 2009 at 4:54 AM, Paul Matthews<plm@netspace.net.au> wrote: > > The hypot() function has been part of the C standard since C99 (ie 10 > years ago) Postgres targets C89. The date of the standard is when the standard came out, it takes years before it's widely available and then years again before the systems with the old compiler are no longer interesting. If there's a performance advantage then we could add a configure test and define the macro to call hypot(). You said it existed before C99 though, how widespread was it? If it's in all the platforms we support it might be reasonable to just go with it. > The function is designed not to fail where the current naive macro would > result in overflow. The code seems to be blissfully unaware of overflow dangers :( Even if hypot() avoids spurious overflows it's always possible for there to be a legitimate overflow which we ought to detect and handle properly. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > If there's a performance advantage then we could add a configure test > and define the macro to call hypot(). You said it existed before C99 > though, how widespread was it? If it's in all the platforms we support > it might be reasonable to just go with it. For one data point, I see hypot() in HPUX 10.20, released circa 1996. I suspect we would want a configure test and a substitute function anyway. Personally I wouldn't have a problem with the substitute being the naive sqrt(x*x+y*y), particularly if it's replacing existing code that overflows in the same places. regards, tom lane
Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > >> If there's a performance advantage then we could add a configure test >> and define the macro to call hypot(). You said it existed before C99 >> though, how widespread was it? If it's in all the platforms we support >> it might be reasonable to just go with it. >> > > For one data point, I see hypot() in HPUX 10.20, released circa 1996. > I suspect we would want a configure test and a substitute function > anyway. Personally I wouldn't have a problem with the substitute being > the naive sqrt(x*x+y*y), particularly if it's replacing existing code > that overflows in the same places. > > regards, tom lane > > A hypot() substitute might look something like this psudo-code, this is how Python does it if the real hypot() is missing. double hypot( double dx, double dy ) { double yx; if( isinf(dx) || ifinf(dy) ) { return INFINITY; } dx = fabs(dx); dy = fabs(dy); if (dx < dy) { doubletemp = dx; dx = dy; dy = temp; } if (x == 0.) return 0.; else { yx = dy/dx; returndx*sqrt(1.0+yx*yx); } } As the following link shows, a lot of care could be put into getting a substitute hypot() correct. http://gforge.inria.fr/plugins/scmsvn/viewcvs.php/trunk/hypot.c?rev=5677&root=mpfr&view=markup
On Sun, Aug 23, 2009 at 07:42, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> If there's a performance advantage then we could add a configure test >> and define the macro to call hypot(). You said it existed before C99 >> though, how widespread was it? If it's in all the platforms we support >> it might be reasonable to just go with it. > > For one data point, I see hypot() in HPUX 10.20, released circa 1996. > I suspect we would want a configure test and a substitute function > anyway. Personally I wouldn't have a problem with the substitute being > the naive sqrt(x*x+y*y), particularly if it's replacing existing code > that overflows in the same places. For another data point, Microsoft documentation says: "This POSIX function is deprecated beginning in Visual C++ 2005. Use the ISO C++ conformant _hypot instead." _hypot() has been there since Windows 95, so it shouldn't be a problem to use it - it just needs a define, like we have for some other such functions. -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
2009/8/23 Magnus Hagander <magnus@hagander.net>: > For another data point, Microsoft documentation says: > "This POSIX function is deprecated beginning in Visual C++ 2005. Use > the ISO C++ conformant _hypot instead." > > _hypot() has been there since Windows 95, so it shouldn't be a problem > to use it - it just needs a define, like we have for some other such > functions. FYI, according to [1], this warning is bogus (i.e., "hypot" without underscore should be entirely OK) and will be removed in VS 2010. Nicolas [1] <url:http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=289653>
This is to go with the hypot() patch I posted the other day. As other code, such as found in adt/float.c and adt/numeric.c, simply assumes that isnan() exists, despite it being a C99 function, I have assumed the same. The below code should be placed into a file called src/port/hypot.c. Unfortunately I do not know how to drive configure and all the other associated build magics. Could some kind soul please implemented that portion. (Or shove me in the right direction) #include <math.h> #include "c.h" #include "utils/builtins.h" /** Find the hypotenuse. Firstly x and y are swapped, if required, to make* x the larger number. The traditional formulaeof x^2+y^2 is rearranged* to bring x outside the sqrt. This allows computation of the hypotenuse* for much largermagnitudes than otherwise normally possible.** sqrt( x^2 + y^2 ) = sqrt( x^2( 1 + y^2/x^2) )* = x* sqrt( 1 + y^2/x^2 )* = x * sqrt( 1 + y/x * y/x )*/ double hypot( double x, double y ) { double yx; if( isinf(x) || isinf(y) ) return get_float8_infinity(); if( isnan(x) || isnan(y) ) return get_float8_nan(); x = fabs(x); y = fabs(y); if (x < y) { double temp = x; x = y; y = temp; } if (x == 0.0) return 0.0; else { yx = y/x; return x*sqrt(1.0+yx*yx); } }
On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote: > This is to go with the hypot() patch I posted the other day. > > As other code, such as found in adt/float.c and adt/numeric.c, simply > assumes that isnan() exists, despite it being a C99 function, I have > assumed the same. > > The below code should be placed into a file called src/port/hypot.c. > > Unfortunately I do not know how to drive configure and all the other > associated build magics. Could some kind soul please implemented that > portion. (Or shove me in the right direction) Comments below :) > #include <math.h> > #include "c.h" > #include "utils/builtins.h" > > /* > * Find the hypotenuse. Firstly x and y are swapped, if required, to make > * x the larger number. The traditional formulae of x^2+y^2 is rearranged > * to bring x outside the sqrt. This allows computation of the hypotenuse > * for much larger magnitudes than otherwise normally possible. > * > * sqrt( x^2 + y^2 ) = sqrt( x^2( 1 + y^2/x^2) ) > * = x * sqrt( 1 + y^2/x^2 ) > * = x * sqrt( 1 + y/x * y/x ) > */ > double hypot( double x, double y ) > { > double yx; > > if( isinf(x) || isinf(y) ) > return get_float8_infinity(); > > if( isnan(x) || isnan(y) ) > return get_float8_nan(); > > x = fabs(x); > y = fabs(y); > > if (x < y) { > double temp = x; > x = y; > y = temp; > } How about putting: if (x >= 0.0 && y == 0.0) return x; here? These next two lines are a teensy bit baroque. Is there some significant speed increase that would justify them? > if (x == 0.0) > return 0.0; > else { > yx = y/x; > return x*sqrt(1.0+yx*yx); > } > } Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> wrote: > On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote: > These next two lines are a teensy bit baroque. Is there some > significant speed increase that would justify them? > >> if (x == 0.0) >> return 0.0; >> else { >> yx = y/x; >> return x*sqrt(1.0+yx*yx); >> } >> } I think the reason is overflow. From the function comment: >> * The traditional formulae of x^2+y^2 is rearranged >> * to bring x outside the sqrt. This allows computation of the hypotenuse >> * for much larger magnitudes than otherwise normally possible. Although I don't see why the first part isn't: if (y == 0.0) return x; -Kevin
On Mon, Aug 24, 2009 at 12:47:42PM -0500, Kevin Grittner wrote: > David Fetter <david@fetter.org> wrote: > > On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote: > > > These next two lines are a teensy bit baroque. Is there some > > significant speed increase that would justify them? > > > >> if (x == 0.0) > >> return 0.0; > >> else { > >> yx = y/x; > >> return x*sqrt(1.0+yx*yx); > >> } > >> } > > I think the reason is overflow. From the function comment: > > >> * The traditional formulae of x^2+y^2 is rearranged > >> * to bring x outside the sqrt. This allows computation of the > hypotenuse > >> * for much larger magnitudes than otherwise normally possible. > > Although I don't see why the first part isn't: > > if (y == 0.0) > return x; D'oh! Good point :) So the code should read as follows? #include <math.h> #include "c.h" #include "utils/builtins.h" /** Find the hypotenuse. Firstly x and y are swapped, if required, to make* x the larger number. The traditional formulaeof x^2+y^2 is rearranged* to bring x outside the sqrt. This allows computation of the hypotenuse* for much largermagnitudes than otherwise normally possible.** sqrt( x^2 + y^2 ) = sqrt( x^2( 1 + y^2/x^2) )* = x* sqrt( 1 + y^2/x^2 )* = x * sqrt( 1 + y/x * y/x )*/ double hypot( double x, double y ) { double yx; if( isinf(x) || isinf(y) ) return get_float8_infinity(); if( isnan(x) || isnan(y) ) return get_float8_nan(); x = fabs(x); y = fabs(y); if (x < y) { double temp = x; x = y; y = temp; } if (y == 0.0) return x; yx = y/x; returnx*sqrt(1.0+yx*yx); } Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Aug 24, 2009 at 07:07:13AM -0700, David Fetter wrote: > These next two lines are a teensy bit baroque. Is there some > significant speed increase that would justify them? Just noticed with your revised code that the following check: > On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote: > > if (x == 0.0) > > return 0.0; > > else { > > yx = y/x; is preventing a divide by zero on the line above. So it's not a performance hack, it's just allowing it to remain correct as a result of changing the maths around. > > return x*sqrt(1.0+yx*yx); > > } > > } -- Sam http://samason.me.uk/
On Mon, Aug 24, 2009 at 06:59:38PM +0100, Sam Mason wrote: > > On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote: > > > if (x == 0.0) > > > return 0.0; > > > else { > > > yx = y/x; > > is preventing a divide by zero on the line above. So it's not a > performance hack, it's just allowing it to remain correct as a result of > changing the maths around. I've also just realized why it's safe to return zero here; "y" contains the smaller number and so if "x" is zero, "y" must be zero as well. -- Sam http://samason.me.uk/
On Mon, Aug 24, 2009 at 6:52 PM, David Fetter<david@fetter.org> wrote: > double hypot( double x, double y ) > { > double yx; > > if( isinf(x) || isinf(y) ) > return get_float8_infinity(); > > if( isnan(x) || isnan(y) ) > return get_float8_nan(); For what it's worth though, check out the code in float.c to see how postgres handles floating point overflows. I'm not sure whether it's forced by the standard, we discussed and settled on this behaviour, or the person who wrote it just thought it was a good idea but we may as well be consistent. The behaviour we have now is that if you pass Inf or -Inf then we happily return Inf or -Inf (or whatever the result is) as appropriate. But if the operation overflows despite being passed reasonable values then it throws an error. Afaict hypot() can still overflow even with this new coding if you have, say, hypot(MAX_FLOAT, MAX_FLOAT) which will return MAX_FLOAT * sqrt(2). In that case we should throw an error unless either input was Inf. Also, the question arises what should be returned for hypot(Inf,NaN) which your code returns Inf for. Empirically, it seems the normal floating point behaviour is to return NaN so I think the NaN test should be first. Lastly I find the swap kind of confusing and also think it might perform less well than just having one branch and simple logic in each branch. This is just a style question that you could see either way though, the performance difference is probably immeasurable if even if my guess is right. so I would suggest: if (isnan(x) || isnan(y) return float8_get_nan(); else if (isinf(x) || isinf(y)) return float8_get_inf(); else if (x == 0.0 && y == 0.0) return 0.0; x = fabs(x); y = fabs(y); if (x > y) retval = x * sqrt((y/x)*(y/x) + 1); else retval = y * sqrt((x/y)*(x/y) + 1); if (isinf(retval)) ereport(overflow...) return retval; } -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > Also, the question arises what should be returned for hypot(Inf,NaN) > which your code returns Inf for. Empirically, it seems the normal > floating point behaviour is to return NaN so I think the NaN test > should be first. > > See http://www.opengroup.org/onlinepubs/000095399/functions/hypot.html If /x/ or /y/ is ±Inf, +Inf shall be returned (even if one of /x/ or /y/ is NaN). If /x/ or /y/ is NaN, and the other is not ±Inf, a NaN shall be returned. Just trying to implement correct C99 behaviour here.
Paul Matthews <plm@netspace.net.au> writes: > Just trying to implement correct C99 behaviour here. Around here we tend to read the Single Unix Spec before C99, and SUS saith something different: http://www.opengroup.org/onlinepubs/007908799/xsh/hypot.html It would be serious folly for us to suppose that every platform's version of hypot() behaves exactly the same for these corner cases, anyway. If you're proposing to write code that would depend on that, we need to call it something else and not pretend that it's just a fill-in for platforms that lack hypot() entirely. regards, tom lane
On Tue, Aug 25, 2009 at 1:14 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Paul Matthews <plm@netspace.net.au> writes: >> Just trying to implement correct C99 behaviour here. > > Around here we tend to read the Single Unix Spec before C99, and SUS > saith something different: It doesn't seem to anticipate NaN at all. Neither of these seems on-point since we're neither implementing SuS nor a C compiler. In fact we're not even implementing hypot(). We're implementing things like box_distance and point_distance which as it happens will already be doing earlier arithmetic on the doubles, so whatever HYPOT() does had better be consistent with that or the results will be just an inexplicable mishmash. Incidentally, what on earth does it mean to multiply or divide a circle by a point? -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > We're implementing things like box_distance and point_distance which > as it happens will already be doing earlier arithmetic on the doubles, > so whatever HYPOT() does had better be consistent with that or the > results will be just an inexplicable mishmash. > > Let's look at what the HYPOT macro in PostgreSQL does right now: #define HYPOT(A, B) sqrt((A)*(A)+(B)*(B)) And let's see how it is used in point_distance: Datum point_distance(PG_FUNCTION_ARGS) { Point *pt1 = PG_GETARG_POINT_P(0); Point *pt2 = PG_GETARG_POINT_P(1); PG_RETURN_FLOAT8(HYPOT(pt1->x - pt2->x, pt1->y - pt2->y)); } Oh. Surprise. It does not look out for NaN's, InF's, overflows, underflows or anything else. In addition, for the majority of it's input space, it fails to work correctly at all. If x and y are both 1E200 the hypotenuse should be 1.4142135e200, well within the range of the double. However by naively squaring x yields 1E400, outside the range of a double. Currently HYPOT() returns INFINITY, which is not the correct answer. I am trying to introduce the hypot() function (where required) and associated patch, in order to start correcting some of the more obvious defects with the current geometry handling. Maybe my proposed hypot() function is not perfect, but it's so far in front of current the HYPOT macro is not funny. > Incidentally, what on earth does it mean to multiply or divide a > circle by a point? > > Basically it's complex math. This comment, from my new geometry implementation, might help. /*-------------------------------------------------------------------------* Additional Operators** The +, -, * and / operatorstreat IPoint's as complex numbers.* (This would indicate a requirement for a Complex type?)** (a+bi)+(c+di) =((a+c)+(b+d)i)* (a+bi)-(c+di) = ((a-c)+(b-d)i)* (a+bi)*(c+di) = ac + adi + bci + bdi^2* = ac + (ad+bc)i- bd # as i^2 = -1* = ((ac-bd)+(ad+bc)i)* (a+bi)/(c+di) = (a+bi)(c-di) / (c+di)(c-di)* = ((ac+bd) + (bc-ad)i) / (c^2+d^2)* = ((ac + bd)/(c^2+d^2) + ((bc-ad)/(c^2+d^2))i)** translation + IPoint_IPoint_add* translation - IPoint_IPoint_sub* multiplication * IPoint_IPoint_mul* division / IPoint_IPoint_div*-----------------------------------------------------------------------*/