Thread: Incorrect rounding of double values at max precision
Hi,
When storing a double in Postgres, it looks like under specific circumstances it can get rounded incorrectly:
select round(cast(float8 '42258656681.38498' as numeric), 2), round(numeric '42258656681.38498', 2);
which returns either 42258656681.38 or 42258656681.39 depending on whether it is float8 or not.
Thanks,
Gilleain
Gilleain Torrance
Consultant
Alfa
Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes:
> When storing a double in Postgres, it looks like under specific circumstances it can get rounded incorrectly:
> select round(cast(float8 '42258656681.38498' as numeric), 2), round(numeric '42258656681.38498', 2);
> which returns either 42258656681.38 or 42258656681.39 depending on whether it is float8 or not.

I think this is behaving as expected.  float8-to-numeric conversion
rounds the float8 to 15 (DBL_DIG) significant decimal digits, since
that's as much precision as you're guaranteed to have.  So what comes
out of the cast is

regression=# select cast(float8 '42258656681.38498' as numeric);
     numeric
-----------------
 42258656681.385
(1 row)

and then that rounds up to 42258656681.39.  In the other case you have
an exact numeric value of 42258656681.38498, so it's unsurprisingly
rounded to 42258656681.38.

You could quibble about whether numeric round() ought to apply round-up
or round-to-nearest-even when dealing with exact halfway cases.  If it
did the latter, this particular case would match up, but other cases
would not, so I don't think it's a helpful proposal for this issue.

The other thing we could conceivably do is ask sprintf for more digits.
But since those extra digit(s) aren't fully precise, I'm afraid that
would likewise introduce as many oddities as it fixes.  Still, it's
somewhat interesting to wonder whether applying the Ryu algorithm would
produce better or worse results on average.
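For concreteness, here is the current halfway behaviour (numeric
round() rounds halfway cases away from zero, not to nearest even):

regression=# select round(numeric '0.5', 0), round(numeric '1.5', 0), round(numeric '2.5', 0);
 round | round | round
-------+-------+-------
     1 |     2 |     3
(1 row)

Round-to-nearest-even would give 0, 2, 2 here instead.

			regards, tom lane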
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes: >> When storing a double in Postgres, it looks like under specific >> circumstances it can get rounded incorrectly: >> select round(cast(float8 '42258656681.38498' as numeric), 2), >> round(numeric '42258656681.38498', 2); >> which returns either 42258656681.38 or 42258656681.39 depending on >> whether it is float8 or not. Tom> I think this is behaving as expected. float8-to-numeric conversion Tom> rounds the float8 to 15 (DBL_DIG) decimal places, since that's as Tom> much precision as you're guaranteed to have. Yes. This came up for discussion in the Ryu patch, but did not get much input; I think some sort of case could be made for making the casts exact, but the cast can't look at a config GUC without losing its immutability, and changing the value could have an effect on functional indexes. So I ended up not touching that at all. Tom> The other thing we could conceivably do is ask sprintf for more Tom> digits. But since those extra digit(s) aren't fully precise, I'm Tom> afraid that would likewise introduce as many oddities as it fixes. Tom> Still, it's somewhat interesting to wonder whether applying the Tom> Ryu algorithm would produce better or worse results on average. Hmm. The Ryu output values will still throw out edge cases similar to the above; for example, 502.15::float8 / 10 = 50.214999999999996 whereas 502.15::numeric / 10 = 50.215, so rounding the result of that to 2 digits will give a different result. Perhaps it would make more sense for the float8 to numeric cast to look at the requested typmod and use that for the conversion? That way we could make casts like fltval::numeric(20,2) or whatever produce the correct result without any double-rounding issues. But the nature of floating point means that this would still throw out occasionally unexpected values (e.g. the 502.15::float8/10 example would still give 50.21 for a 2-digit result rather than 50.22). (502.15::float8 is exactly 502.14999999999997726263245567679405212402343750) I also did consider adding functions to convert a float8 value to the _exact_ numeric that it represents. This is easy enough to write using numeric arithmetic (I have SQL versions that I used extensively when testing the Ryu code) but the performance isn't exceptionally good. Might be good enough for many applications, though. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tom> The other thing we could conceivably do is ask sprintf for more
> Tom> digits. But since those extra digit(s) aren't fully precise, I'm
> Tom> afraid that would likewise introduce as many oddities as it fixes.
> Tom> Still, it's somewhat interesting to wonder whether applying the
> Tom> Ryu algorithm would produce better or worse results on average.

> Hmm.
> The Ryu output values will still throw out edge cases similar to the
> above; for example, 502.15::float8 / 10 = 50.214999999999996 whereas
> 502.15::numeric / 10 = 50.215, so rounding the result of that to 2
> digits will give a different result.

Yeah.  Worse, casting that to numeric currently gives the "correct"
result:

regression=# select (502.15::float8 / 10)::numeric;
 numeric
---------
  50.215
(1 row)

while if we changed float8_numeric to apply Ryu, the result would be
50.214999999999996.  So that's not great.  But there are other cases
where the result would be better than before, such as the OP's example
of 42258656681.38498::float8.  I'd like to get my hands around how
many "better" and "worse" cases there would be, but I'm not sure how
to attack that question.

> Perhaps it would make more sense for the float8 to numeric cast to look
> at the requested typmod and use that for the conversion?

As things stand right now, float8_numeric has no idea what the target
typmod is; any typmod-driven rounding happens in a separate function
call afterwards.  I don't recall whether the parser's casting
infrastructure could support merging those steps, and I'm not sure it
matters in most cases.  Commonly, we don't have a target typmod.
(Still, if we do, having two separate rounding steps isn't nice.)
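One crude way to attack it, perhaps: sample a pile of random float8s
and count how often each scheme agrees with the exactly-rounded
answer.  An untested sketch, leaning on an exact-conversion function
of the sort Andrew mentioned (called float8_exact here) and on v12's
shortest-round-trip text output:

select count(*) filter (where cast15 = exact) as matches_current,
       count(*) filter (where ryu = exact) as matches_ryu,
       count(*) as total
from (select round(v::numeric, 2) as cast15,      -- current 15-digit cast
             round(v::text::numeric, 2) as ryu,   -- Ryu shortest digits
             round(float8_exact(v), 2) as exact   -- true binary value
      from (select random() * 1e6 as v
            from generate_series(1, 100000)) s) t;

That only covers one value range and one rounding scale, of course.

			regards, tom lane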
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> Perhaps it would make more sense for the float8 to numeric cast to >> look at the requested typmod and use that for the conversion? Tom> As things stand right now, float8_numeric has no idea what the Tom> target typmod is; any typmod-driven rounding happens in a separate Tom> function call afterwards. I don't recall whether the parser's Tom> casting infrastructure could support merging those steps, As far as I can tell it does; it looks at whether the cast function takes a typmod parameter, and if it does, it passes the typmod, otherwise it generates a separate typmod coercion and stacks that on top of the cast proper. So changing the function declaration to include a typmod parameter, and using it, should just work... but I've not tested it yet. Tom> and I'm not sure it matters in most cases. Commonly, we don't have Tom> a target typmod. (Still, if we do, having two separate rounding Tom> steps isn't nice.) So we'd still need to decide what to do in the no-typmod case. -- Andrew (irc:RhodiumToad)
> I think this is behaving as expected.  float8-to-numeric conversion
> rounds the float8 to 15 (DBL_DIG) significant decimal digits, since
> that's as much precision as you're guaranteed to have.

Perhaps, but the original path we took to find this was through JDBC,
specifically the driver from https://jdbc.postgresql.org/.  Tracking
the double (42258656681.38498) through the driver shows that it is
converted from a Java double to float8 bytes, and is then stored as
42258656681.39.  Which is definitely not expected, even if it can be
explained by double rounding.  The half-up/half-down behaviour of the
rounding may be a side issue, although also important.

A simple test like this will show the behaviour we see:

    Connection conn = getConnection(); // some db connection
    PreparedStatement pstmt =
        conn.prepareStatement("INSERT INTO mytable VALUES (?, ?)");
    double x = 4.225865668138498E10;
    int id = 123;
    pstmt.setObject(1, id);
    pstmt.setDouble(2, x);  // the value is sent to the server as a float8
    pstmt.execute();

where mytable just has a decimal(13, 2) column and an integer id.
When selected afterwards, we get the 42258656681.39 value instead of
an expected value of 42258656681.38.
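As far as we can tell the driver sends the parameter as float8 bytes,
so on the server side the insert reduces to the same cast as before:

select (float8 '42258656681.38498')::numeric(13,2);
    numeric
----------------
 42258656681.39
(1 row)

thanks

gilleain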
>>>>> "Gilleain" == Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes: Gilleain> A simple test like this will show the behaviour we see: Gilleain> Connection conn = getConnection(); // some db connection Gilleain> PreparedStatement pstmt = conn.prepareStatement("INSERT into mytable VALUES (?, ?)"); Gilleain> double x = 4.225865668138498E10; Gilleain> int id = 123; Gilleain> pstmt.setObject(1, id); Gilleain> pstmt.setDouble(2, x); Gilleain> pstmt.execute(); Gilleain> where mytable just has a decimal(13, 2) column and an integer Gilleain> id. I do not see any way to make that code work as you apparently expect in all cases. Here is a simple counterexample: imagine setting x to 502.215 instead. Currently, that will insert a value of 502.22 into the table, as you would expect. If we fixed your example above by doing the conversion with maximum precision, then 502.215 would instead be inserted as 502.21, because the actual float value that represents 502.215 is equal to 502.21499999999997498889570124447345733642578125. Or for another example, 4.225865668139500E10 currently inserts as 42258656681.40 in your code, but using maximum precision would cause it to insert as 42258656681.39 instead (the true value of a float8 '4.225865668139500E10' is 42258656681.39499664306640625.) So while what we currently do is arguably wrong since it's doing two rounding steps, fixing it wouldn't actually help your problem but would just move the failure cases to different values. -- Andrew (irc:RhodiumToad)
Hi,

Re-found this thread due to
https://postgr.es/m/CH2PR19MB3798B24BCC34D3F9949F629C83000%40CH2PR19MB3798.namprd19.prod.outlook.com

On 2019-10-21 22:41:11 -0400, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > The Ryu output values will still throw out edge cases similar to the
> > above; for example, 502.15::float8 / 10 = 50.214999999999996 whereas
> > 502.15::numeric / 10 = 50.215, so rounding the result of that to 2
> > digits will give a different result.
>
> Yeah.  Worse, casting that to numeric currently gives the "correct"
> result:
>
> regression=# select (502.15::float8 / 10)::numeric;
>  numeric
> ---------
>   50.215
> (1 row)
>
> while if we changed float8_numeric to apply Ryu, the result would be
> 50.214999999999996.  So that's not great.  But there are other cases
> where the result would be better than before, such as the OP's example
> of 42258656681.38498::float8.  I'd like to get my hands around how
> many "better" and "worse" cases there would be, but I'm not sure how
> to attack that question.

The above bug report has, in my opinion, a pretty egregious case of
wrongness. From the bug:

postgres[1296822][1]=# select '1234567'::float4::numeric;
┌─────────┐
│ numeric │
├─────────┤
│ 1234570 │
└─────────┘
(1 row)

It seems crazy that we throw away integer precision in the range where
it's guaranteed to be accurate (that's what, −16777216 to 16777216 for
float4, more than an order of magnitude larger than this value). It
does feel different to me that we're throwing away precision that we
could know is not just the result of floating point imprecision. I
can't really see outputting a bunch of "fake precision"
post-decimal-point digits as being comparably bad?

I didn't find much discussion about how we could deal with pg_upgrade
issues if we were to change the behaviour. Detect indexes involving
such casts, and mark them as invalid?
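Note that float4 output itself (using the shortest-round-trip code
from v12 onward) has no trouble with this value; it's only the cast to
numeric that discards a digit:

postgres[1296822][1]=# select '1234567'::float4, '1234567'::float4::numeric;
┌─────────┬─────────┐
│ float4  │ numeric │
├─────────┼─────────┤
│ 1234567 │ 1234570 │
└─────────┴─────────┘
(1 row)

Greetings,

Andres Freund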
[ btw, thanks for finding this thread; I searched for relevant
discussion earlier today and couldn't find it ]

Andres Freund <andres@anarazel.de> writes:
> It seems crazy that we throw away integer precision in the range where
> it's guaranteed to be accurate (that's what, −16777216 to 16777216 for
> float4, more than an order of magnitude larger than this value). It
> does feel different to me that we're throwing away precision that we
> could know is not just the result of floating point imprecision.

Meh.  Yeah, we could improve the observed results for float4 values
that are integers between 1M and 16M, and some similarly-sized band
for float8; but to what end?  The most likely practical result is just
to postpone the user's discovery that they're Doing It Wrong.  If you
expect exact answers out of float calculations then you are going to
learn an expensive lesson sooner or later.  Better sooner, before
you've stored even more inexact data that you cannot fix.

> I didn't find much discussion about how we could deal with pg_upgrade
> issues if we were to change the behaviour. Detect indexes involving
> such casts, and mark them as invalid?

For what that's worth, I do not think I buy the argument that
float4_numeric's behavior can't ever be changed because it's marked
immutable.  Compare other recent discussions about what "immutable"
really means; or compare the fact that various text search functions
are marked immutable despite being quite dependent on config files
that we don't even track.  If we think this is actually an improvement
I'd be fine with changing it in a major release, and documenting that
users should reindex any indexes whose semantics are affected (of
which there'd be epsilon in the real world, anyway).

But I'm not convinced that changing this is doing anything except
putting more lipstick on the pig.  Float is imprecise, and we do no
one a service by trying to hide that.

			regards, tom lane
Hi,

On 2020-10-20 21:48:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It seems crazy that we throw away integer precision in the range where
> > it's guaranteed to be accurate (that's what, −16777216 to 16777216 for
> > float4, more than an order of magnitude larger than this value). It
> > does feel different to me that we're throwing away precision that we
> > could know is not just the result of floating point imprecision.
>
> Meh.  Yeah, we could improve the observed results for float4 values
> that are integers between 1M and 16M, and some similarly-sized band
> for float8; but to what end?

I'm not actually arguing that we should improve it by relying on
range-based heuristics. Just that throwing away precision that can't
simply be blamed on float representation issues is an indicator that
our current approach is quite wrong.

> The most likely practical result is just to postpone the user's
> discovery that they're Doing It Wrong.  If you expect exact answers
> out of float calculations then you are going to learn an expensive
> lesson sooner or later.  Better sooner, before you've stored even
> more inexact data that you cannot fix.

I don't buy this, not even for a second. Why is FLT_DIG = 6 the right
way to hint at that? Why not a precision of 5, 4, 3? Sure, 6 digits is
guaranteed to round-trip, but that's an *extremely* coarse filter.

And if we like exposing people to floating point imprecision, why is
select (502.15::float8 / 10)::numeric resulting in 50.214999999999996
bad?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2020-10-20 21:48:52 -0400, Tom Lane wrote:
>> Meh.  Yeah, we could improve the observed results for float4 values
>> that are integers between 1M and 16M, and some similarly-sized band
>> for float8; but to what end?

> I'm not actually arguing that we should improve it by relying on
> range-based heuristics. Just that throwing away precision that can't
> simply be blamed on float representation issues is an indicator that
> our current approach is quite wrong.

I think your argument is founded on the unprovable assumption that the
original input value was an integer.  If we see 12345678::float4, the
"true" value could have been anything between 12345677.5 and
12345678.5.  Printing it as an exact integer isn't going to seem too
exact to a person who knows they put in a fraction.  Or for a person
who does know a little bit about float arithmetic, they might
reasonably wonder why 12345.678::float4 is "exactly" converted to
numeric but 1234.5678::float4, with the same number of digits, is not.
(Ryu renders the latter as 1234.5677.)

>> The most likely practical result is just to postpone the user's
>> discovery that they're Doing It Wrong.

> I don't buy this, not even for a second. Why is FLT_DIG = 6 the right
> way to hint at that? Why not a precision of 5, 4, 3? Sure, 6 digits
> is guaranteed to round-trip, but that's an *extremely* coarse filter.

It's also the *highest* precision that is guaranteed to round-trip,
which 5, 4, 3 are not, so that argument is unconvincing too.

> And if we like exposing people to floating point imprecision, why is
> select (502.15::float8 / 10)::numeric resulting in 50.214999999999996
> bad?

The point there was that that's "worse" (seemingly less precise) than
what happens now.  Please notice that I did not say that that one
example means we shouldn't change; any more than I think that isolated
other examples mean we should.  The question I was trying to get at is
how we could decide whether using Ryu for this is an overall win when
some examples will get better and others worse.  No one's stepped up
to that plate yet.
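To illustrate that last asymmetry with v12's shortest-round-trip
output:

regression=# select float4 '12345.678', float4 '1234.5678';
  float4   |  float4
-----------+-----------
 12345.678 | 1234.5677
(1 row)

Both inputs have eight significant digits, but only one of them
happens to sit close enough to its float4 value to survive the round
trip unchanged.

			regards, tom lane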