Thread: mingw32 floating point diff

mingw32 floating point diff

From

Peter Eisentraut

Date:

20 August 2019, 12:59:52

Running the regression tests on mingw32, I get the following diff in
circle.out:

@@ -111,8 +111,8 @@
   WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
   ORDER BY distance, area(c1.f1), area(c2.f1);
  five |      one       |      two       |     distance
-------+----------------+----------------+------------------
-      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
+------+----------------+----------------+-------------------
+      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
       | <(3,5),0>      | <(5,1),3>      | 1.47213595499958
       | <(100,200),10> | <(100,1),115>  |               74
       | <(100,200),10> | <(1,2),100>    | 111.370729772479

I only get this on master/PG12, but not on PG11, so I suspect that the
new floating-point output routines could be the root of the issue.

This happens only with the 32-bit build (mingw32), but not with a 64-bit
build (mingw64).

Any suggestions on how to analyze this further?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

mingw32 floating point diff

From

Peter Eisentraut

Date:

22 August 2019, 12:49:33

On 2019-08-20 14:59, Peter Eisentraut wrote:
> Running the regression tests on mingw32, I get the following diff in
> circle.out:
> 
> @@ -111,8 +111,8 @@
>    WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
>    ORDER BY distance, area(c1.f1), area(c2.f1);
>   five |      one       |      two       |     distance
> -------+----------------+----------------+------------------
> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
> +------+----------------+----------------+-------------------
> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
>        | <(3,5),0>      | <(5,1),3>      | 1.47213595499958
>        | <(100,200),10> | <(100,1),115>  |               74
>        | <(100,200),10> | <(1,2),100>    | 111.370729772479
> 
> I only get this on master/PG12, but not on PG11, so I suspect that the
> new floating-point output routines could be the root of the issue.
> 
> This happens only with the 32-bit build (mingw32), but not with a 64-bit
> build (mingw64).

OK, the problem isn't the new output routines.  The result of the
computations is actually different.  The test itself is new in PG12.

The difference in output is due to the mingw32 target using -mfpmath=387
by default.  If you build with -mfpmath=sse -msee, the tests pass again.

Do we care to do anything about this?  Pick slightly different test data
perhaps?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: mingw32 floating point diff

From

Tom Lane

Date:

22 August 2019, 13:55:03

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-08-20 14:59, Peter Eisentraut wrote:
>> Running the regression tests on mingw32, I get the following diff in
>> circle.out:
...
> OK, the problem isn't the new output routines.  The result of the
> computations is actually different.  The test itself is new in PG12.
> The difference in output is due to the mingw32 target using -mfpmath=387
> by default.  If you build with -mfpmath=sse -msee, the tests pass again.

Hm, so presumably we could replicate this on other Intel-oid platforms
by changing compiler switches?  (I haven't tried.)

> Do we care to do anything about this?  Pick slightly different test data
> perhaps?

Picking different test data might be a good "fix".  Alternatively, we
could try to figure out where the discrepancy is arising and adjust
the code --- but that might be a lot more work than it's worth.

            regards, tom lane

Re: mingw32 floating point diff

From

Tom Lane

Date:

22 August 2019, 16:19:06

I wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Do we care to do anything about this?  Pick slightly different test data
>> perhaps?

> Picking different test data might be a good "fix".  Alternatively, we
> could try to figure out where the discrepancy is arising and adjust
> the code --- but that might be a lot more work than it's worth.

I poked at this a bit more.  I can reproduce the problem by using 
-mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although
I also get half a dozen *other* failures in the core regression tests,
mostly around detection of float overflow.  So I'm not quite sure that
this is comparable.  But at any rate, I tracked the core of the problem
to pg_hypot:

    /* Determine the hypotenuse */
    yx = y / x;
    result = x * sqrt(1.0 + (yx * yx));

With -mfpmath=387, these calculations are done to more-than-64-bit
precision, yielding a different end result --- note in particular
that sqrt() is a hardware instruction on this platform, so it's
not rounding either.

I experimented with preventing that by using volatile intermediate
variables (cf comments in float.c); but it seemed like a mess,
and it would likely pessimize the code more than we want for other
platforms, and it's kind of hard to argue that deliberately sabotaging
the more-accurate computation is an improvement.

What I suggest doing is reducing extra_float_digits to -1 for this
specific test.  Changing the contents of circle_tbl seems like it'd have
more consequences than we want, in particular there's no guarantee that
we'd not hit similar issues in other tests if they're given different
inputs.

            regards, tom lane

Re: mingw32 floating point diff

From

Peter Eisentraut

Date:

23 August 2019, 06:14:06

On 2019-08-22 18:19, Tom Lane wrote:
> What I suggest doing is reducing extra_float_digits to -1 for this
> specific test.  Changing the contents of circle_tbl seems like it'd have
> more consequences than we want, in particular there's no guarantee that
> we'd not hit similar issues in other tests if they're given different
> inputs.

I agree that reducing the output precision is better than adjusting the
code.

The circle.sql file already has SET extra_float_digits TO 0, and a few
other files have other settings with different values.  Are we content
to use various numbers until it works in each case, or should we try to
use some consistency?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: mingw32 floating point diff

From

Andrew Dunstan

Date:

23 August 2019, 12:25:01

On 8/20/19 8:59 AM, Peter Eisentraut wrote:
> Running the regression tests on mingw32, I get the following diff in
> circle.out:
>
> @@ -111,8 +111,8 @@
>    WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
>    ORDER BY distance, area(c1.f1), area(c2.f1);
>   five |      one       |      two       |     distance
> -------+----------------+----------------+------------------
> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
> +------+----------------+----------------+-------------------
> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
>        | <(3,5),0>      | <(5,1),3>      | 1.47213595499958
>        | <(100,200),10> | <(100,1),115>  |               74
>        | <(100,200),10> | <(1,2),100>    | 111.370729772479
> 
> I only get this on master/PG12, but not on PG11, so I suspect that the
> new floating-point output routines could be the root of the issue.
>
> This happens only with the 32-bit build (mingw32), but not with a 64-bit
> build (mingw64).
>
> Any suggestions on how to analyze this further?
>


I complained about this a year ago:
<https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@dunslane.net>


+1 for fixing it by any reasonable means.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: mingw32 floating point diff

From

Tom Lane

Date:

23 August 2019, 13:50:54

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> The circle.sql file already has SET extra_float_digits TO 0, and a few
> other files have other settings with different values.  Are we content
> to use various numbers until it works in each case, or should we try to
> use some consistency?

The one in rules.sql doesn't count here.  Everyplace else seems to be
using 0, except for the one in geometry.sql, which seems to be perhaps
unreasonably conservative.  Might be worth researching why it's -3.

            regards, tom lane

Re: mingw32 floating point diff

From

Emre Hasegeli

Date:

24 August 2019, 12:23:11

> I poked at this a bit more.  I can reproduce the problem by using
> -mfpmath=387 on dromedary's host (fairly old 32-bit macOS); although
> I also get half a dozen *other* failures in the core regression tests,
> mostly around detection of float overflow.  So I'm not quite sure that
> this is comparable.  But at any rate, I tracked the core of the problem
> to pg_hypot:

I couldn't test if it helps, but another solution may be is to rip out
pg_hypot() in favour of the libc implementation.  This was discussed
in detail as part of "Improve geometric types" thread.

Last comment about it:

https://www.postgresql.org/message-id/9223.1507039405%40sss.pgh.pa.us

Re: mingw32 floating point diff

From

Tom Lane

Date:

24 August 2019, 15:18:37

Emre Hasegeli <emre@hasegeli.com> writes:
> I couldn't test if it helps, but another solution may be is to rip out
> pg_hypot() in favour of the libc implementation.  This was discussed
> in detail as part of "Improve geometric types" thread.

Hm ... the problem we're trying to fix here is platform-varying results.
Surely switching to the libc implementation would make that worse not
better?

            regards, tom lane

Re: mingw32 floating point diff

From

Peter Eisentraut

Date:

25 August 2019, 07:02:23

On 2019-08-23 15:50, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> The circle.sql file already has SET extra_float_digits TO 0, and a few
>> other files have other settings with different values.  Are we content
>> to use various numbers until it works in each case, or should we try to
>> use some consistency?
> 
> The one in rules.sql doesn't count here.  Everyplace else seems to be
> using 0, except for the one in geometry.sql, which seems to be perhaps
> unreasonably conservative.  Might be worth researching why it's -3.

I can confirm that SET extra_float_digits TO -1 in circle.sql fixes the
original complaint.

I don't understand this stuff enough to be able to provide a good source
code comment or commit message, so I'd appreciate some help or someone
else to proceed with this change.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: mingw32 floating point diff

From

Tom Lane

Date:

25 August 2019, 15:22:49

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I can confirm that SET extra_float_digits TO -1 in circle.sql fixes the
> original complaint.

Cool.  It did on dromedary, but that doesn't necessarily prove much
about other compilers :-(

> I don't understand this stuff enough to be able to provide a good source
> code comment or commit message, so I'd appreciate some help or someone
> else to proceed with this change.

Sure, I'll take it.

            regards, tom lane

Re: mingw32 floating point diff

From

Tom Lane

Date:

25 August 2019, 20:23:34

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 8/20/19 8:59 AM, Peter Eisentraut wrote:
>> Running the regression tests on mingw32, I get the following diff in
>> circle.out:
>> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
>> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989

> I complained about this a year ago:
> <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@dunslane.net>
> +1 for fixing it by any reasonable means.

Now that that fix is in, could we get a buildfarm member running on
such a platform?  It seems to behave differently from anything else.

I tracked down the residual regression failures on dromedary with
-mfpmath=387, and found that they occur because check_float8_val
gets inlined and then its tests for "isinf(val)" and "val == 0.0"
are applied to the 80-bit intermediate results not the 64-bit form.
I find it odd that we apparently don't have the same issues on
mingw32.

I'm very hesitant to apply a volatile-qualification approach to
eliminate those issues, for fear of pessimizing performance-critical
code on more modern platforms.  I wonder whether there is a reasonable
way to tell at compile time if we have a platform with 80-bit math.

            regards, tom lane

Re: mingw32 floating point diff

From

Tom Lane

Date:

25 August 2019, 20:49:38

I wrote:
> I'm very hesitant to apply a volatile-qualification approach to
> eliminate those issues, for fear of pessimizing performance-critical
> code on more modern platforms.  I wonder whether there is a reasonable
> way to tell at compile time if we have a platform with 80-bit math.

Hmmm ... I find that dromedary's compiler predefines __FLT_EVAL_METHOD__
as 2 not 0 when -mfpmath=387 is given.  This seems to be something
that was standardized in C99 (without the double underscores), so
maybe we could do something like

#if __FLT_EVAL_METHOD__ > 0 || FLT_EVAL_METHOD > 0

to conditionalize whether we try to force the evaluation width in
check_float8_val and check_float4_val.

            regards, tom lane

Re: mingw32 floating point diff

From

Andrew Dunstan

Date:

26 August 2019, 15:13:30

On 8/25/19 4:23 PM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 8/20/19 8:59 AM, Peter Eisentraut wrote:
>>> Running the regression tests on mingw32, I get the following diff in
>>> circle.out:
>>> -      | <(3,5),0>      | <(1,2),3>      | 0.60555127546399
>>> +      | <(3,5),0>      | <(1,2),3>      | 0.605551275463989
>> I complained about this a year ago:
>> <https://postgr.es/m/9f4f22be-f9f1-b350-bc06-521226b87f7a@dunslane.net>
>> +1 for fixing it by any reasonable means.
> Now that that fix is in, could we get a buildfarm member running on
> such a platform?  It seems to behave differently from anything else.
>


I'm pretty much tapped out for Windows resources, I have one physical
and one virtual machine which do nothing but run my 6 Windows based animals.


I don't know if the community has spare resources available from those
that Amazon donate to us. There is already one animal I manage running
there, so maybe another would be feasible.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: mingw32 floating point diff

From

Tom Lane

Date:

26 August 2019, 18:11:09

I wrote:
>> I'm very hesitant to apply a volatile-qualification approach to
>> eliminate those issues, for fear of pessimizing performance-critical
>> code on more modern platforms.  I wonder whether there is a reasonable
>> way to tell at compile time if we have a platform with 80-bit math.

> Hmmm ... I find that dromedary's compiler predefines __FLT_EVAL_METHOD__
> as 2 not 0 when -mfpmath=387 is given.  This seems to be something
> that was standardized in C99 (without the double underscores), so
> maybe we could do something like
> #if __FLT_EVAL_METHOD__ > 0 || FLT_EVAL_METHOD > 0

After further poking around, it seems that testing FLT_EVAL_METHOD
should be sufficient --- <float.h> appears to define that correctly
even in very old C99 installations.

However, I'm losing interest in the problem after finding that I can't
reproduce it anywhere except on dromedary (with "-mfpmath=387" added).

For instance, I have a 32-bit FreeBSD system with what claims to be
the same compiler (gcc 4.2.1), but it passes regression just fine,
with or without -mfpmath=387.  Earlier and later gcc versions also
don't show a problem.  I suspect that Apple bollixed something with
local mods to their version of 4.2.1, or possibly they are allowing
inlining of isinf() in a way that nobody else does.

Also, using that compiler with "-mfpmath=387", I see that every
supported PG version back to 9.4 fails regression due to not
detecting float8 multiply overflow.  So this isn't a problem that
we introduced with v12's changes, as I'd first suspected; and it's
not a problem that anyone is hitting in the field, or we'd have
heard complaints.

So, barring somebody showing that we have an issue on some platform
that people actually care about currently, I'm inclined not to do
anything more here.

            regards, tom lane