Thread: mingw32 floating point diff
Running the regression tests on mingw32, I get the following diff in circle.out: @@ -111,8 +111,8 @@ WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0) ORDER BY distance, area(c1.f1), area(c2.f1); five | one | two | distance -------+----------------+----------------+------------------ - | <(3,5),0> | <(1,2),3> | 0.60555127546399 +------+----------------+----------------+------------------- + | <(3,5),0> | <(1,2),3> | 0.605551275463989 | <(3,5),0> | <(5,1),3> | 1.47213595499958 | <(100,200),10> | <(100,1),115> | 74 | <(100,200),10> | <(1,2),100> | 111.370729772479 I only get this on master/PG12, but not on PG11, so I suspect that the new floating-point output routines could be the root of the issue. This happens only with the 32-bit build (mingw32), but not with a 64-bit build (mingw64). Any suggestions on how to analyze this further? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-08-20 14:59, Peter Eisentraut wrote: > Running the regression tests on mingw32, I get the following diff in > circle.out: > > @@ -111,8 +111,8 @@ > WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0) > ORDER BY distance, area(c1.f1), area(c2.f1); > five | one | two | distance > -------+----------------+----------------+------------------ > - | <(3,5),0> | <(1,2),3> | 0.60555127546399 > +------+----------------+----------------+------------------- > + | <(3,5),0> | <(1,2),3> | 0.605551275463989 > | <(3,5),0> | <(5,1),3> | 1.47213595499958 > | <(100,200),10> | <(100,1),115> | 74 > | <(100,200),10> | <(1,2),100> | 111.370729772479 > > I only get this on master/PG12, but not on PG11, so I suspect that the > new floating-point output routines could be the root of the issue. > > This happens only with the 32-bit build (mingw32), but not with a 64-bit > build (mingw64). OK, the problem isn't the new output routines. The result of the computations is actually different. The test itself is new in PG12. The difference in output is due to the mingw32 target using -mfpmath=387 by default. If you build with -mfpmath=sse -msee, the tests pass again. Do we care to do anything about this? Pick slightly different test data perhaps? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-08-20 14:59, Peter Eisentraut wrote: >> Running the regression tests on mingw32, I get the following diff in >> circle.out: ... > OK, the problem isn't the new output routines. The result of the > computations is actually different. The test itself is new in PG12. > The difference in output is due to the mingw32 target using -mfpmath=387 > by default. If you build with -mfpmath=sse -msee, the tests pass again. Hm, so presumably we could replicate this on other Intel-oid platforms by changing compiler switches? (I haven't tried.) > Do we care to do anything about this? Pick slightly different test data > perhaps? Picking different test data might be a good "fix". Alternatively, we could try to figure out where the discrepancy is arising and adjust the code --- but that might be a lot more work than it's worth. regards, tom lane
I wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> Do we care to do anything about this? Pick slightly different test data >> perhaps? > Picking different test data might be a good "fix". Alternatively, we > could try to figure out where the discrepancy is arising and adjust > the code --- but that might be a lot more work than it's worth. I poked at this a bit more. I can reproduce the problem by using -mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although I also get half a dozen *other* failures in the core regression tests, mostly around detection of float overflow. So I'm not quite sure that this is comparable. But at any rate, I tracked the core of the problem to pg_hypot: /* Determine the hypotenuse */ yx = y / x; result = x * sqrt(1.0 + (yx * yx)); With -mfpmath=387, these calculations are done to more-than-64-bit precision, yielding a different end result --- note in particular that sqrt() is a hardware instruction on this platform, so it's not rounding either. I experimented with preventing that by using volatile intermediate variables (cf comments in float.c); but it seemed like a mess, and it would likely pessimize the code more than we want for other platforms, and it's kind of hard to argue that deliberately sabotaging the more-accurate computation is an improvement. What I suggest doing is reducing extra_float_digits to -1 for this specific test. Changing the contents of circle_tbl seems like it'd have more consequences than we want, in particular there's no guarantee that we'd not hit similar issues in other tests if they're given different inputs. regards, tom lane
On 2019-08-22 18:19, Tom Lane wrote: > What I suggest doing is reducing extra_float_digits to -1 for this > specific test. Changing the contents of circle_tbl seems like it'd have > more consequences than we want, in particular there's no guarantee that > we'd not hit similar issues in other tests if they're given different > inputs. I agree that reducing the output precision is better than adjusting the code. The circle.sql file already has SET extra_float_digits TO 0, and a few other files have other settings with different values. Are we content to use various numbers until it works in each case, or should we try to use some consistency? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8/20/19 8:59 AM, Peter Eisentraut wrote: > Running the regression tests on mingw32, I get the following diff in > circle.out: > > @@ -111,8 +111,8 @@ > WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0) > ORDER BY distance, area(c1.f1), area(c2.f1); > five | one | two | distance > -------+----------------+----------------+------------------ > - | <(3,5),0> | <(1,2),3> | 0.60555127546399 > +------+----------------+----------------+------------------- > + | <(3,5),0> | <(1,2),3> | 0.605551275463989 > | <(3,5),0> | <(5,1),3> | 1.47213595499958 > | <(100,200),10> | <(100,1),115> | 74 > | <(100,200),10> | <(1,2),100> | 111.370729772479 > > I only get this on master/PG12, but not on PG11, so I suspect that the > new floating-point output routines could be the root of the issue. > > This happens only with the 32-bit build (mingw32), but not with a 64-bit > build (mingw64). > > Any suggestions on how to analyze this further? > I complained about this a year ago: <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@dunslane.net> +1 for fixing it by any reasonable means. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > The circle.sql file already has SET extra_float_digits TO 0, and a few > other files have other settings with different values. Are we content > to use various numbers until it works in each case, or should we try to > use some consistency? The one in rules.sql doesn't count here. Everyplace else seems to be using 0, except for the one in geometry.sql, which seems to be perhaps unreasonably conservative. Might be worth researching why it's -3. regards, tom lane
> I poked at this a bit more. I can reproduce the problem by using > -mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although > I also get half a dozen *other* failures in the core regression tests, > mostly around detection of float overflow. So I'm not quite sure that > this is comparable. But at any rate, I tracked the core of the problem > to pg_hypot: I couldn't test if it helps, but another solution may be is to rip out pg_hypot() in favour of the libc implementation. This was discussed in detail as part of "Improve geometric types" thread. Last comment about it: https://www.postgresql.org/message-id/9223.1507039405%40sss.pgh.pa.us
Emre Hasegeli <emre@hasegeli.com> writes: > I couldn't test if it helps, but another solution may be is to rip out > pg_hypot() in favour of the libc implementation. This was discussed > in detail as part of "Improve geometric types" thread. Hm ... the problem we're trying to fix here is platform-varying results. Surely switching to the libc implementation would make that worse not better? regards, tom lane
On 2019-08-23 15:50, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> The circle.sql file already has SET extra_float_digits TO 0, and a few >> other files have other settings with different values. Are we content >> to use various numbers until it works in each case, or should we try to >> use some consistency? > > The one in rules.sql doesn't count here. Everyplace else seems to be > using 0, except for the one in geometry.sql, which seems to be perhaps > unreasonably conservative. Might be worth researching why it's -3. I can confirm that SET extra_float_digits TO -1 in circle.sql fixes the original complaint. I don't understand this stuff enough to be able to provide a good source code comment or commit message, so I'd appreciate some help or someone else to proceed with this change. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I can confirm that SET extra_float_digits TO -1 in circle.sql fixes the > original complaint. Cool. It did on dromedary, but that doesn't necessarily prove much about other compilers :-( > I don't understand this stuff enough to be able to provide a good source > code comment or commit message, so I'd appreciate some help or someone > else to proceed with this change. Sure, I'll take it. regards, tom lane
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 8/20/19 8:59 AM, Peter Eisentraut wrote: >> Running the regression tests on mingw32, I get the following diff in >> circle.out: >> - | <(3,5),0> | <(1,2),3> | 0.60555127546399 >> + | <(3,5),0> | <(1,2),3> | 0.605551275463989 > I complained about this a year ago: > <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@dunslane.net> > +1 for fixing it by any reasonable means. Now that that fix is in, could we get a buildfarm member running on such a platform? It seems to behave differently from anything else. I tracked down the residual regression failures on dromedary with -mfpmath=387, and found that they occur because check_float8_val gets inlined and then its tests for "isinf(val)" and "val == 0.0" are applied to the 80-bit intermediate results not the 64-bit form. I find it odd that we apparently don't have the same issues on mingw32. I'm very hesitant to apply a volatile-qualification approach to eliminate those issues, for fear of pessimizing performance-critical code on more modern platforms. I wonder whether there is a reasonable way to tell at compile time if we have a platform with 80-bit math. regards, tom lane
I wrote: > I'm very hesitant to apply a volatile-qualification approach to > eliminate those issues, for fear of pessimizing performance-critical > code on more modern platforms. I wonder whether there is a reasonable > way to tell at compile time if we have a platform with 80-bit math. Hmmm ... I find that dromedary's compiler predefines __FLT_EVAL_METHOD__ as 2 not 0 when -mfpmath=387 is given. This seems to be something that was standardized in C99 (without the double underscores), so maybe we could do something like #if __FLT_EVAL_METHOD__ > 0 || FLT_EVAL_METHOD > 0 to conditionalize whether we try to force the evaluation width in check_float8_val and check_float4_val. regards, tom lane
On 8/25/19 4:23 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 8/20/19 8:59 AM, Peter Eisentraut wrote: >>> Running the regression tests on mingw32, I get the following diff in >>> circle.out: >>> - | <(3,5),0> | <(1,2),3> | 0.60555127546399 >>> + | <(3,5),0> | <(1,2),3> | 0.605551275463989 >> I complained about this a year ago: >> <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@dunslane.net> >> +1 for fixing it by any reasonable means. > Now that that fix is in, could we get a buildfarm member running on > such a platform? It seems to behave differently from anything else. > I'm pretty much tapped out for Windows resources, I have one physical and one virtual machine which do nothing but run my 6 Windows based animals. I don't know if the community has spare resources available from those that Amazon donate to us. There is already one animal I manage running there, so maybe another would be feasible. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote: >> I'm very hesitant to apply a volatile-qualification approach to >> eliminate those issues, for fear of pessimizing performance-critical >> code on more modern platforms. I wonder whether there is a reasonable >> way to tell at compile time if we have a platform with 80-bit math. > Hmmm ... I find that dromedary's compiler predefines __FLT_EVAL_METHOD__ > as 2 not 0 when -mfpmath=387 is given. This seems to be something > that was standardized in C99 (without the double underscores), so > maybe we could do something like > #if __FLT_EVAL_METHOD__ > 0 || FLT_EVAL_METHOD > 0 After further poking around, it seems that testing FLT_EVAL_METHOD should be sufficient --- <float.h> appears to define that correctly even in very old C99 installations. However, I'm losing interest in the problem after finding that I can't reproduce it anywhere except on dromedary (with "-mfpmath=387" added). For instance, I have a 32-bit FreeBSD system with what claims to be the same compiler (gcc 4.2.1), but it passes regression just fine, with or without -mfpmath=387. Earlier and later gcc versions also don't show a problem. I suspect that Apple bollixed something with local mods to their version of 4.2.1, or possibly they are allowing inlining of isinf() in a way that nobody else does. Also, using that compiler with "-mfpmath=387", I see that every supported PG version back to 9.4 fails regression due to not detecting float8 multiply overflow. So this isn't a problem that we introduced with v12's changes, as I'd first suspected; and it's not a problem that anyone is hitting in the field, or we'd have heard complaints. So, barring somebody showing that we have an issue on some platform that people actually care about currently, I'm inclined not to do anything more here. regards, tom lane