Thread: Re: [GENERAL] Trouble with float4 after upgrading from 6.5.3 to 7.0.2
"Romanenko Mikhail" <mikhail@angg.ru> writes: > testfloat=# update tftbl set f1=10.1 where f1=10 and f2=20; > UPDATE 1 > testfloat=# update tftbl set f2=20.2 where f1=10.1 and f2=20; > UPDATE 0 The second update is failing to find any tuple that satisfies f1 = 10.1, because f1 is a float4 variable whereas 10.1 is implicitly a float8 constant. 6.5 also treated 10.1 as float8, but managed to find equality anyway. I think this change in behavior is my fault :-(. About a year ago I cleaned up some ugly coding in float.c and (without thinking about it very carefully) changed float48eq and related operators so that float4-vs-float8 comparisons are done in float8 arithmetic not float4. The old code truncated the double input down to float and did a float equality check, while the new code promotes the float input to double and does a double-precision comparison. This behavior is arguably more correct than the old way from a purely mathematical point of view, but now that I think about it, it's not clear that it's more useful than the old way. In particular, in an example like the above, it's now impossible for any float4 value to be considered exactly equal to the float8 constant 10.1, because the float4 value just hasn't got the right low-order bits after widening. Perhaps the old way of considering equality only to float accuracy is more useful, even though it opens us up to problems like overflow errors in "float4var = 1e100". Comments anyone? A general comment on your table design though: anyone who expects exact equality tests on fractional float values to succeed is going to get burnt sooner or later. If you must use this style of coding then I recommend using numeric fields not float fields, and certainly not float4 fields. regards, tom lane
At 11:35 7/08/00 -0400, Tom Lane wrote: > >Perhaps the old way of considering equality only to float accuracy >is more useful, even though it opens us up to problems like overflow >errors in "float4var = 1e100". Comments anyone? > The following frightened me a little: pjw=# select float4(10.1); float4 -------- 10.1 (1 row) pjw=# select float8(float4(10.1)); float8 ------------------ 10.1000003814697 (1 row) I would have expected the latter to be at worst 10.10000000000000 +/- .00000000000001. Am I missing something? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Re: Re: [GENERAL] Trouble with float4 after upgrading from 6.5.3 to 7.0.2
From
Thomas Lockhart
Date:
> Perhaps the old way of considering equality only to float accuracy > is more useful, even though it opens us up to problems like overflow > errors in "float4var = 1e100". Comments anyone? I would not have anticipated this either. I agree that downconverting to float4 is the right solution. Possible overflow errors can be checked in advance using the macros or routines already there. This may be an example of why those could be A Good Thing in some instances. - Thomas
> I would have expected the latter to be at worst 10.10000000000000 +/- > .00000000000001. > Am I missing something? Well, yes :) 10.1 can't be represented exactly, so the float8 representation has bits set way down at the low end of the mantissa. When converting to float4 those low bits get rounded up or down into the lowest bit of the float4 representation. At that point, you have lost knowledge that this ever was supposed to be *exactly* 10.1. And when converting back to float8, that float4 low bit becomes a middle-range bit in the float8 representation, with all the bits underneath that zeroed. Back in the old days, before printf() implementations settled down, you would be reminded of this any time you did anything, since just assigning 10.1 and then printing it out would give you some goofy 10.099999999998 or 10.10000000001 (don't count the number of digits here too closely, they are only qualitatively correct). - Thomas
Philip Warner <pjw@rhyme.com.au> writes: > pjw=# select float8(float4(10.1)); > float8 > ------------------ > 10.1000003814697 > (1 row) > I would have expected the latter to be at worst 10.10000000000000 +/- > .00000000000001. float4 is good to about 7 decimal digits (24 mantissa bits) on IEEE-standard machines. Thus the above result is actually closer than you have any right to expect. Don't they teach people about float arithmetic in CS 101 anymore? regards, tom lane
At 12:11 7/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> pjw=# select float8(float4(10.1)); >> float8 >> ------------------ >> 10.1000003814697 >> (1 row) > >> I would have expected the latter to be at worst 10.10000000000000 +/- >> .00000000000001. > >float4 is good to about 7 decimal digits (24 mantissa bits) on >IEEE-standard machines. Thus the above result is actually closer >than you have any right to expect. > >Don't they teach people about float arithmetic in CS 101 anymore? > No idea. It's a couple of decades since I did it. I wasn't complaining about the float4 accuracy; I was complaining about the way it was converted to float8. It seems more intuitive to zero-extend base 10 zeros... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 16:12 7/08/00 +0000, Thomas Lockhart wrote: >> I would have expected the latter to be at worst 10.10000000000000 +/- >> .00000000000001. >> Am I missing something? > >10.1 can't be represented exactly, so the float8 representation has bits >set way down at the low end of the mantissa. When converting to float4 >those low bits get rounded up or down into the lowest bit of the float4 >representation. At that point, you have lost knowledge that this ever >was supposed to be *exactly* 10.1. And when converting back to float8, >that float4 low bit becomes a middle-range bit in the float8 >representation, with all the bits underneath that zeroed. > Now I understand, but it doesn't quite make sense given what was displayed. The float4 value is *displayed* as 10.1, not 10.1000001, so I had assumed that there was a level of either accuracy or display rouding happening. When this value is converted to float8, I hoped that the result would be the same as: Cast( Cast(f4val as varchar(32)) as float8) Maybe this hope is naieve, but it it a lot more useful than the current situation. But now that I understand what is happening, I see that (short of varchar conversions!), it is probably quite hard to do since we can't tell the 'correct' value. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Now I understand, but it doesn't quite make sense given what was displayed. > The float4 value is *displayed* as 10.1, not 10.1000001, so I had assumed > that there was a level of either accuracy or display rouding happening. In float4-to-ASCII, yes. Modern printf implementations have some heuristics about the actual accuracy of float4 and float8 and where they ought to round off the printed result accordingly. But float4 to float8 is normally done just by appending zeroes to the mantissa. I suppose we could implement the conversion as "float8in(float4out(x))" instead of "(double) x" but it'd be several orders of magnitude slower, as well as being *less* useful to those who know what they're doing with float math (since the result would actually be a less-precise conversion except in cases where the intended value has a short decimal representation). After thinking about it some more, I'm of the opinion that the current behavior of float48eq and friends is the right thing, and that people who expect 10.1 to be an exact value should be told to use type NUMERIC. We should not kluge up the behavior of the float operations to try to make it look like inexact values are exact. That will just cause failures in other situations. regards, tom lane
At 02:15 AM 8/8/00 +1000, Philip Warner wrote: >I wasn't complaining about the float4 accuracy; I was complaining about the >way it was converted to float8. It seems more intuitive to zero-extend base >10 zeros... You're right, it absolutely is more intuitive - in a base-10 representation, i.e. NUMERIC. A float4 is a binary number, and the only zeros available are binary ones. The same is true of the non-zeros, for that matter. Not that it matters, adding zero to the right of any number doesn't add significance, something they pound into people's heads in the physical sciences. Doing a type conversion from float4 to float8 is in general not a safe thing to do, because you only can depend on 24 bits of mantissa significance afterwards anyway. One such conversion will propagate that lesser significance all throughout the expressions using it. Take great care when you do this. As Tom pointed out you're getting 8 digits of decimal significance in your example (10.100000) due to the particular number involved. You can only expect 24/log2(10) digits, which as he points out is just 7 digits plus change. The basic problem is that we evolved with 10 fingers, rather than 8 or 16 :) - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
At 12:53 7/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> Now I understand, but it doesn't quite make sense given what was displayed. >> The float4 value is *displayed* as 10.1, not 10.1000001, so I had assumed >> that there was a level of either accuracy or display rouding happening. > >I suppose we could implement the conversion as "float8in(float4out(x))" >instead of "(double) x" but it'd be several orders of magnitude slower, >as well as being *less* useful to those who know what they're doing with >float math (since the result would actually be a less-precise conversion >except in cases where the intended value has a short decimal >representation). Would I be right in saying that "those who know what they're doing with float math" would totally avoid mixing float4 & float8? If so, then ISTM that any changes to float4/8 conversions should not affect them. There seem to be a few choices: - go with zero-extending the bits. This is easy, and what would be expected for normal float ops, at least by people who understand float implementations. - do an intermediate text or numeric conversion. This will produce more expected results, but at the expense of speed. If people complain about speed, then they can change all float types to matching precision, or use numeric data types. - take the code from 'printf' or 'numeric' and do the appropriate bit-munging to get the value to use in conversion. No idea if this would work, but it is probably better than doing a text conversion since we won't be at the mercy of the occasional C library that produces 10.1000001. Would it be worth having some kind of DB setting for how it handles float4/8 conversion? Or is it just too much work, when just using all float8 or numeric is an acceptable workaround? Do you know how fast 'numeric' is? >That will just cause >failures in other situations. If there are genuine failures that would be introduced, then clearly it's a bad idea. But, since it will only affect people who compare float4/8, it doesn't seem too likely to produce worse failures than the change you have already made. I ask this mainly out of curiosity - I assume there are more aspects to this issue that I have missed... Bye for now, Philip. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 01:37 PM 8/8/00 +1000, Philip Warner wrote: >- do an intermediate text or numeric conversion. This will produce more >expected results By who? I'm serious. I sure wouldn't. I can't think of any language implementation that does this. The standard approach has the advantage of maintaining a defined significance. The approach you suggest doesn't, you're actually losing significance. It gives the illusion of increasing for the particular example you've chosen, but it is nothing but illusion. >Would it be worth having some kind of DB setting for how it handles >float4/8 conversion? Use type numeric when you need precise decimal results. Your suggested kludge won't give you what you want. >Do you know how fast 'numeric' is? Not as fast as float by any means, but there's a reason why they exist in all languages which include the financial sphere in their presumed application space. The simplest thing is to realize that using float4 leaves you with just over 7 significant digits, and to only print out 7 digits. Then you'll get the answer you expect (10.100000). - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
At 05:50 8/08/00 -0700, Don Baccus wrote: > >The simplest thing is to realize that using float4 leaves you with >just over 7 significant digits, and to only print out 7 digits. >Then you'll get the answer you expect (10.100000). > You may have missed the point; my suggestions are only aimed at changing the results of float4/float8 conversions & comparisons. My (very vague) recollections of this stuff is that the machine representation is only guaranteed to be within a certain machine/language accuracy, so the stored value is within +/-(machine error) of the 'real value'. Further, my recollection is that one or more bits are usually used to provide rounding information so that, eg., the 7 digit representations are consistent. Given this, I have assumed that printf etc use these least significant bits to determine the 'correct' representation. The idea is to do exactly the same in converting float4 to float8, so that: '4.1'::float4 = '4.1'::float8 will be true. Maybe my recollection is false... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Given this, I have assumed that printf etc use these least significant bits > to determine the 'correct' representation. No. What float4-to-text really does is *discard* information, by rounding off the printed result to only 7 digits (when there are actually 7-and-change in there). This means values that are actually distinct float4 values may get printed as the same thing: regression=# select 1.234567 :: float4; ?column? ---------- 1.23457 (1 row) regression=# select 1.234568 :: float4; ?column? ---------- 1.23457 (1 row) regression=# select 1.234567 :: float4 = 1.234568 :: float4; ?column? ---------- f (1 row) regression=# select 1.234567 :: float4 - 1.234568 :: float4; ?column? -------------- -9.53674e-07 (1 row) I don't much care for this behavior (since it means dump and reload of float columns is lossy), and I certainly won't hold still for introducing it into other operations on floats. regards, tom lane
At 10:04 AM 8/8/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> Given this, I have assumed that printf etc use these least significant bits >> to determine the 'correct' representation. > >No. What float4-to-text really does is *discard* information, by >rounding off the printed result to only 7 digits (when there are >actually 7-and-change in there). Which is the standard approach for such conversion routines. If you keep generating digits they're just garbage anyway. As far as the rest, Phil, it is true that well-designed floating-point hardware such as that which follows the IEEE spec (well, at least better-designed compared to most of its predecessors) strictly specifies how extra rounding information is to be used during various mathematical operations (add, multiply, etc). This is done so that error due to rounding can be strictly bounded. The result (for float4) is a 24-bit mantissa with strictly defined significance. However, the float4 itself once stored only consists of that 24-bit mantissa. There's no way to know the history of how that 24th bit was generated, i.e. whether all the bits to the right were exactly zero or whether it was the result of rounding (or truncation if the user specified it and the hardware supports it). Kludging conversion by using decimal conversion will simply lose significance. In your 10.1 case you'll be happy because that 24th bit becomes zero. All you've accomplished, though, is to throw away (at least) one bit. Your float8 now has no more than 23 bits of significance rather than 24. Repeat this process a few times and you could store the result in a boolean, in terms of the bits you could guarantee to be significant ... >I don't much care for this behavior (since it means dump and reload of >float columns is lossy), A good reason for binary backup programs! > and I certainly won't hold still for >introducing it into other operations on floats. No, it flies in the face of not only convention, but a lot of investigation into how to implement floating point arithmetic in a way that's useful to those who have to depend on the results having mathematically definable error bounds. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
At 10:04 8/08/00 -0400, Tom Lane wrote: > >No. What float4-to-text really does is *discard* information, by >rounding off the printed result to only 7 digits (when there are >actually 7-and-change in there). This means values that are actually >distinct float4 values may get printed as the same thing: > Thanks guys for some remarkably patient explanations. I now know more than I want to know about float values. >I don't much care for this behavior (since it means dump and reload of >float columns is lossy), and I certainly won't hold still for >introducing it into other operations on floats. This makes me think that some kind of binary dump in pg_dump is probably not a bad idea. Has anybody looked at doing a cross-platform binary COPY? Or some other way of representing base types - we have <type>in/out maybe <type>exp/imp (export/import) might be useful to get a portable, lossless representation. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 01:40 AM 8/9/00 +1000, Philip Warner wrote: >This makes me think that some kind of binary dump in pg_dump is probably >not a bad idea. Has anybody looked at doing a cross-platform binary COPY? >Or some other way of representing base types - we have <type>in/out maybe ><type>exp/imp (export/import) might be useful to get a portable, lossless >representation. Another way to do it is to dump/restore floats in hex, maintaining the actual binary values. Conversion to hex, unlike conversion to decimal, is exact (16 is a power of 2 while 10 is not, to add to your "more knowledge than you want" about floats!) - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.