Thread: Trouble with float4 after upgrading from 6.5.3 to 7.0.2
After trying to upgrade PostgreSQL from 6.5.3 to 7.0.2 I got into trouble with float4. I'll try to explain it by example. postgres@lee: ~$ createdb -E LATIN1 -e testfloat CREATE DATABASE "testfloat" WITH ENCODING = 'LATIN1' CREATE DATABASE postgres@lee: ~$ psql testfloat testfloat=# create table tftbl (f1 float4, f2 float4); CREATE testfloat=# insert into tftbl values (10, 20); INSERT 212682 1 testfloat=# select * from tftbl; f1 | f2 ----+---- 10 | 20 (1 row) testfloat=# update tftbl set f1=10.1 where f1=10 and f2=20; UPDATE 1 testfloat=# update tftbl set f2=20.2 where f1=10.1 and f2=20; UPDATE 0 testfloat=# select * from tftbl; f1 | f2 ------+---- 10.1 | 20 (1 row) testfloat=# update tftbl set f2=20.2 where f1=float4(10.1) and f2=20; UPDATE 1 testfloat=# select * from tftbl; f1 | f2 ------+------ 10.1 | 20.2 (1 row) In my real client application (Windows 98, Borland C++ Builder 5.0, BDE 5.1.1.1, PostODBC 06.50.0000) I cannot in all casesuse expressions like f1=float4(10.1) instead of simple f1=10.1 because BDE and PostODBC construct queries by themselveswhen I, for example, update tables from BCB components (BDE complains, that another user changed the record, whileI am the only user at the time). They use f1=10.1-like format. With PostgeSQL 6.5.3 I have no problem of the kind. PostgreSQL lives in Debian Linux (woody), kernel 2.2.14, libc6 2.1.3, locales 2.1.3, here is output of locale: lee:~# locale LANG=C LC_CTYPE="C" LC_NUMERIC="C" LC_TIME="C" LC_COLLATE="C" LC_MONETARY="C" LC_MESSAGES="C" LC_ALL= Any help, tip or hint would be appreciated. Thank you, Mikhail.
"Romanenko Mikhail" <mikhail@angg.ru> writes: > testfloat=# update tftbl set f1=10.1 where f1=10 and f2=20; > UPDATE 1 > testfloat=# update tftbl set f2=20.2 where f1=10.1 and f2=20; > UPDATE 0 The second update is failing to find any tuple that satisfies f1 = 10.1, because f1 is a float4 variable whereas 10.1 is implicitly a float8 constant. 6.5 also treated 10.1 as float8, but managed to find equality anyway. I think this change in behavior is my fault :-(. About a year ago I cleaned up some ugly coding in float.c and (without thinking about it very carefully) changed float48eq and related operators so that float4-vs-float8 comparisons are done in float8 arithmetic not float4. The old code truncated the double input down to float and did a float equality check, while the new code promotes the float input to double and does a double-precision comparison. This behavior is arguably more correct than the old way from a purely mathematical point of view, but now that I think about it, it's not clear that it's more useful than the old way. In particular, in an example like the above, it's now impossible for any float4 value to be considered exactly equal to the float8 constant 10.1, because the float4 value just hasn't got the right low-order bits after widening. Perhaps the old way of considering equality only to float accuracy is more useful, even though it opens us up to problems like overflow errors in "float4var = 1e100". Comments anyone? A general comment on your table design though: anyone who expects exact equality tests on fractional float values to succeed is going to get burnt sooner or later. If you must use this style of coding then I recommend using numeric fields not float fields, and certainly not float4 fields. regards, tom lane
At 11:35 7/08/00 -0400, Tom Lane wrote: > >Perhaps the old way of considering equality only to float accuracy >is more useful, even though it opens us up to problems like overflow >errors in "float4var = 1e100". Comments anyone? > The following frightened me a little: pjw=# select float4(10.1); float4 -------- 10.1 (1 row) pjw=# select float8(float4(10.1)); float8 ------------------ 10.1000003814697 (1 row) I would have expected the latter to be at worst 10.10000000000000 +/- .00000000000001. Am I missing something? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Re: [HACKERS] Re: Trouble with float4 after upgrading from 6.5.3 to 7.0.2
From
Thomas Lockhart
Date:
> Perhaps the old way of considering equality only to float accuracy > is more useful, even though it opens us up to problems like overflow > errors in "float4var = 1e100". Comments anyone? I would not have anticipated this either. I agree that downconverting to float4 is the right solution. Possible overflow errors can be checked in advance using the macros or routines already there. This may be an example of why those could be A Good Thing in some instances. - Thomas
> I would have expected the latter to be at worst 10.10000000000000 +/- > .00000000000001. > Am I missing something? Well, yes :) 10.1 can't be represented exactly, so the float8 representation has bits set way down at the low end of the mantissa. When converting to float4 those low bits get rounded up or down into the lowest bit of the float4 representation. At that point, you have lost knowledge that this ever was supposed to be *exactly* 10.1. And when converting back to float8, that float4 low bit becomes a middle-range bit in the float8 representation, with all the bits underneath that zeroed. Back in the old days, before printf() implementations settled down, you would be reminded of this any time you did anything, since just assigning 10.1 and then printing it out would give you some goofy 10.099999999998 or 10.10000000001 (don't count the number of digits here too closely, they are only qualitatively correct). - Thomas
Philip Warner <pjw@rhyme.com.au> writes: > pjw=# select float8(float4(10.1)); > float8 > ------------------ > 10.1000003814697 > (1 row) > I would have expected the latter to be at worst 10.10000000000000 +/- > .00000000000001. float4 is good to about 7 decimal digits (24 mantissa bits) on IEEE-standard machines. Thus the above result is actually closer than you have any right to expect. Don't they teach people about float arithmetic in CS 101 anymore? regards, tom lane
At 12:11 7/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> pjw=# select float8(float4(10.1)); >> float8 >> ------------------ >> 10.1000003814697 >> (1 row) > >> I would have expected the latter to be at worst 10.10000000000000 +/- >> .00000000000001. > >float4 is good to about 7 decimal digits (24 mantissa bits) on >IEEE-standard machines. Thus the above result is actually closer >than you have any right to expect. > >Don't they teach people about float arithmetic in CS 101 anymore? > No idea. It's a couple of decades since I did it. I wasn't complaining about the float4 accuracy; I was complaining about the way it was converted to float8. It seems more intuitive to zero-extend base 10 zeros... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 16:12 7/08/00 +0000, Thomas Lockhart wrote: >> I would have expected the latter to be at worst 10.10000000000000 +/- >> .00000000000001. >> Am I missing something? > >10.1 can't be represented exactly, so the float8 representation has bits >set way down at the low end of the mantissa. When converting to float4 >those low bits get rounded up or down into the lowest bit of the float4 >representation. At that point, you have lost knowledge that this ever >was supposed to be *exactly* 10.1. And when converting back to float8, >that float4 low bit becomes a middle-range bit in the float8 >representation, with all the bits underneath that zeroed. > Now I understand, but it doesn't quite make sense given what was displayed. The float4 value is *displayed* as 10.1, not 10.1000001, so I had assumed that there was a level of either accuracy or display rouding happening. When this value is converted to float8, I hoped that the result would be the same as: Cast( Cast(f4val as varchar(32)) as float8) Maybe this hope is naieve, but it it a lot more useful than the current situation. But now that I understand what is happening, I see that (short of varchar conversions!), it is probably quite hard to do since we can't tell the 'correct' value. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Now I understand, but it doesn't quite make sense given what was displayed. > The float4 value is *displayed* as 10.1, not 10.1000001, so I had assumed > that there was a level of either accuracy or display rouding happening. In float4-to-ASCII, yes. Modern printf implementations have some heuristics about the actual accuracy of float4 and float8 and where they ought to round off the printed result accordingly. But float4 to float8 is normally done just by appending zeroes to the mantissa. I suppose we could implement the conversion as "float8in(float4out(x))" instead of "(double) x" but it'd be several orders of magnitude slower, as well as being *less* useful to those who know what they're doing with float math (since the result would actually be a less-precise conversion except in cases where the intended value has a short decimal representation). After thinking about it some more, I'm of the opinion that the current behavior of float48eq and friends is the right thing, and that people who expect 10.1 to be an exact value should be told to use type NUMERIC. We should not kluge up the behavior of the float operations to try to make it look like inexact values are exact. That will just cause failures in other situations. regards, tom lane
> I suppose we could implement the conversion as "float8in(float4out(x))" > instead of "(double) x" but it'd be several orders of magnitude slower, > as well as being *less* useful to those who know what they're doing with > float math (since the result would actually be a less-precise conversion > except in cases where the intended value has a short decimal > representation). We only need to maintain the lower-order bit(s). Seems this could be done a lot easier than by an ascii in-between. Is there a reason we can't perform the conversion and then copy the low-order bits manually, with some bit-shifting and masking? Ian
<vectro@pipeline.com> writes: > Is there a reason we can't perform the conversion and then copy the > low-order bits manually, with some bit-shifting and masking? *What* low-order bits? The fundamental problem is we don't have 'em. regards, tom lane
> > Is there a reason we can't perform the conversion and then copy the > > low-order bits manually, with some bit-shifting and masking? > > *What* low-order bits? The fundamental problem is we don't have 'em. OK, we represent 10.1 (decimal) with 1010.000110011..., repeating the 0011 pattern. In floating point representation, we say this is 2^3 * 1.010000110011..., repeating the 0011 pattern. But this dosen't work very well, because when we print it out it won't be exact, it will be some decimal number ending in a 0 or a 5. So when we read this out in decimal, we get 10.099999995, with a variable number of 9's depending on the precision we use. When we print this number out, we perform the decimal conversion, and then truncate the last decimal digit and round. So I guess the question is, why can't we perform 4-bit float -> 8-bit float conversion via a decimal conversion, irrespective of storing the thing in ASCII. Mabye the decimal conversion is too costly? Perhaps we could 1) mark 4-bit floats with a flag of some kind indicating whether or not the decimal conversion is necessary, and 2) avoid this conversion wherever possible, including giving people a warning when they use float4s in their tuples. Or mabye I'm just being dumb. Ian
At 12:53 7/08/00 -0400, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> Now I understand, but it doesn't quite make sense given what was displayed. >> The float4 value is *displayed* as 10.1, not 10.1000001, so I had assumed >> that there was a level of either accuracy or display rouding happening. > >I suppose we could implement the conversion as "float8in(float4out(x))" >instead of "(double) x" but it'd be several orders of magnitude slower, >as well as being *less* useful to those who know what they're doing with >float math (since the result would actually be a less-precise conversion >except in cases where the intended value has a short decimal >representation). Would I be right in saying that "those who know what they're doing with float math" would totally avoid mixing float4 & float8? If so, then ISTM that any changes to float4/8 conversions should not affect them. There seem to be a few choices: - go with zero-extending the bits. This is easy, and what would be expected for normal float ops, at least by people who understand float implementations. - do an intermediate text or numeric conversion. This will produce more expected results, but at the expense of speed. If people complain about speed, then they can change all float types to matching precision, or use numeric data types. - take the code from 'printf' or 'numeric' and do the appropriate bit-munging to get the value to use in conversion. No idea if this would work, but it is probably better than doing a text conversion since we won't be at the mercy of the occasional C library that produces 10.1000001. Would it be worth having some kind of DB setting for how it handles float4/8 conversion? Or is it just too much work, when just using all float8 or numeric is an acceptable workaround? Do you know how fast 'numeric' is? >That will just cause >failures in other situations. If there are genuine failures that would be introduced, then clearly it's a bad idea. But, since it will only affect people who compare float4/8, it doesn't seem too likely to produce worse failures than the change you have already made. I ask this mainly out of curiosity - I assume there are more aspects to this issue that I have missed... Bye for now, Philip. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Tom Lane <tgl@sss.pgh.pa.us> writes: > A general comment on your table design though: anyone who expects exact > equality tests on fractional float values to succeed is going to get > burnt sooner or later. If you must use this style of coding then > I recommend using numeric fields not float fields, and certainly not > float4 fields. > > regards, tom lane I try 'numeric', 'decimal', and 'float8' types and only 'float8' works. Both 'decimal' and 'numeric' failed (as 'float4'did) with error message: "Unable to identify an operator '=' for type numeric and 'float8' You will have to retypethis query using an explicite cast". And though my problem is solved, thank you for the help, I thought perhaps this information will be usefull for you. Mikhail.
At 05:50 8/08/00 -0700, Don Baccus wrote: > >The simplest thing is to realize that using float4 leaves you with >just over 7 significant digits, and to only print out 7 digits. >Then you'll get the answer you expect (10.100000). > You may have missed the point; my suggestions are only aimed at changing the results of float4/float8 conversions & comparisons. My (very vague) recollections of this stuff is that the machine representation is only guaranteed to be within a certain machine/language accuracy, so the stored value is within +/-(machine error) of the 'real value'. Further, my recollection is that one or more bits are usually used to provide rounding information so that, eg., the 7 digit representations are consistent. Given this, I have assumed that printf etc use these least significant bits to determine the 'correct' representation. The idea is to do exactly the same in converting float4 to float8, so that: '4.1'::float4 = '4.1'::float8 will be true. Maybe my recollection is false... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Given this, I have assumed that printf etc use these least significant bits > to determine the 'correct' representation. No. What float4-to-text really does is *discard* information, by rounding off the printed result to only 7 digits (when there are actually 7-and-change in there). This means values that are actually distinct float4 values may get printed as the same thing: regression=# select 1.234567 :: float4; ?column? ---------- 1.23457 (1 row) regression=# select 1.234568 :: float4; ?column? ---------- 1.23457 (1 row) regression=# select 1.234567 :: float4 = 1.234568 :: float4; ?column? ---------- f (1 row) regression=# select 1.234567 :: float4 - 1.234568 :: float4; ?column? -------------- -9.53674e-07 (1 row) I don't much care for this behavior (since it means dump and reload of float columns is lossy), and I certainly won't hold still for introducing it into other operations on floats. regards, tom lane
At 10:04 8/08/00 -0400, Tom Lane wrote: > >No. What float4-to-text really does is *discard* information, by >rounding off the printed result to only 7 digits (when there are >actually 7-and-change in there). This means values that are actually >distinct float4 values may get printed as the same thing: > Thanks guys for some remarkably patient explanations. I now know more than I want to know about float values. >I don't much care for this behavior (since it means dump and reload of >float columns is lossy), and I certainly won't hold still for >introducing it into other operations on floats. This makes me think that some kind of binary dump in pg_dump is probably not a bad idea. Has anybody looked at doing a cross-platform binary COPY? Or some other way of representing base types - we have <type>in/out maybe <type>exp/imp (export/import) might be useful to get a portable, lossless representation. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/