Re: [HACKERS] Floating point error - Mailing list pgsql-general

From Albe Laurenz
Subject Re: [HACKERS] Floating point error
Date
Msg-id A737B7A37273E048B164557ADEF4A58B057BA6F6@ntex2010a.host.magwien.gv.at
Whole thread Raw
In response to Re: [HACKERS] Floating point error  (Daniel Farina <daniel@heroku.com>)
Responses Re: [HACKERS] Floating point error
List pgsql-general
Daniel Farina wrote:
> On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
>> On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The real difficulty is that there may be more than one storable value
>>> that corresponds to "1.23456" to six decimal digits.  To be certain that
>>> we can reproduce the stored value uniquely, we have to err in the other
>>> direction, and print *more* decimal digits than the underlying precision
>>> justifies, rather than a bit less.  Some of those digits are going to
>>> look like garbage to the naked eye.
>>
>> I think part of the difficulty here is that psql (if I understand this
>> correctly) conflates the wire-format text representations with what
>> should be displayed to the user. E.g., a different driver might parse
>> the wire representation into a native representation, and then format
>> that native representation when it is to be displayed. That's what the
>> JDBC driver does, so it doesn't care about how the wire format
>> actually looks.
>>
>>> pg_dump cares about reproducing values exactly, and not about whether
>>> things are nice-looking, so it cranks up extra_float_digits.  The JDBC
>>> driver might be justified in doing likewise, to ensure that the
>>> identical binary float value is stored on both client and server ---
>>> but that isn't even a valid goal unless you assume that the server's
>>> float implementation is the same as Java's, which is a bit of a leap of
>>> faith, even if IEEE 754 is nigh universal these days.
>>
>> I would hope that any driver cares about reproducing values exactly
>> (or at least as exactly as the semantics of the client and server
>> representations of the data type allow). Once you start talking
>> operations, sure, things get a lot more complicated and you're better
>> off not relying on any particular semantics. But IEEE 754
>> unambiguously defines certain bit patterns to correspond to certain
>> values, no? If both client and server talk IEEE 754 floating point, it
>> should be possible to round-trip values with no fuss and end up with
>> the same bits you started with (and as far as I can tell, it is, as
>> long as extra_float_digits is set to the max), even if the
>> implementations of actual operations on these numbers behave very
>> differently on client and server. I think given that many ORMs can
>> cause UPDATEs on tuple fields that have not changed as part of saving
>> an object, stable round trips seem like a desirable feature.

But all these things are already available:
Any driver that cares can set extra_float_digits=3, and if it
prefers the binary format, the wire protocol supports sending
floating point values as such.

> I also find the rationale for extra_float digits quite mysterious for
> the same reason: why would most programs care about precision less
> than pg_dump does?
>
> If a client wants floating point numbers to look nice, I think the
> rendering should be on them (e.g. psql and pgadmin), and the default
> should be to expose whatever precision is available to clients that
> want an accurate representation of what is in the database.
>
> This kind of change may have many practical problems that may make it
> un-pragmatic to alter at this time (considering the workaround is to
> set the extra float digits), but I can't quite grasp the rationale for
> "well, the only program that cares about the most precision available
> is pg_dump".  It seems like most programs would care just as much.

I don't think that it is about looking nice.
C doesn't promise you more than FLT_DIG or DBL_DIG digits of
precision, so PostgreSQL cannot either.

If you allow more, that would mean that if you store the same
number on different platforms and query it, it might come out
differently.  Among other things, that would be a problem for
the regression tests.

Yours,
Laurenz Albe


pgsql-general by date:

Previous
From: Guy Rouillier
Date:
Subject: Re: Database (Schema) Objects?
Next
From: Albe Laurenz
Date:
Subject: Re: 9.2 timestamp function syntax error