Re: [HACKERS] Floating point error - Mailing list pgsql-general

From Maciek Sakrejda
Subject Re: [HACKERS] Floating point error
Date
Msg-id CAOtHd0BkSD3vAsZ6hs059t6szbVH1jgjEAw7ZVPaLMQywPjvLA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Floating point error  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Floating point error  (Daniel Farina <daniel@heroku.com>)
List pgsql-general
On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real difficulty is that there may be more than one storable value
> that corresponds to "1.23456" to six decimal digits.  To be certain that
> we can reproduce the stored value uniquely, we have to err in the other
> direction, and print *more* decimal digits than the underlying precision
> justifies, rather than a bit less.  Some of those digits are going to
> look like garbage to the naked eye.

I think part of the difficulty here is that psql (if I understand this
correctly) conflates the wire-format text representations with what
should be displayed to the user. E.g., a different driver might parse
the wire representation into a native representation, and then format
that native representation when it is to be displayed. That's what the
JDBC driver does, so it doesn't care about how the wire format
actually looks.

> pg_dump cares about reproducing values exactly, and not about whether
> things are nice-looking, so it cranks up extra_float_digits.  The JDBC
> driver might be justified in doing likewise, to ensure that the
> identical binary float value is stored on both client and server ---
> but that isn't even a valid goal unless you assume that the server's
> float implementation is the same as Java's, which is a bit of a leap of
> faith, even if IEEE 754 is nigh universal these days.

I would hope that any driver cares about reproducing values exactly
(or at least as exactly as the semantics of the client and server
representations of the data type allow). Once you start talking
operations, sure, things get a lot more complicated and you're better
off not relying on any particular semantics. But IEEE 754
unambiguously defines certain bit patterns to correspond to certain
values, no? If both client and server talk IEEE 754 floating point, it
should be possible to round-trip values with no fuss and end up with
the same bits you started with (and as far as I can tell, it is, as
long as extra_float_digits is set to the max), even if the
implementations of actual operations on these numbers behave very
differently on client and server. I think given that many ORMs can
cause UPDATEs on tuple fields that have not changed as part of saving
an object, stable round trips seem like a desirable feature.

> We could have dumbed it down to a boolean "look nice versus reproduce
> the value exactly" switch, but it seemed like there might be
> applications that could use some additional flexibility.  In any case,
> it's not Postgres' fault that there is an issue here; it's fundamental
> to the use of binary rather than decimal stored values.

It seems like getting things to look nice should be the client's job,
no? Why does that factor into wire protocol data representations (and
yes, I know part of the answer here--presumably literals are
intimately tied to the same code paths, so it's not quite so simple)?

Going back to the documentation patch, what should the advice be? How
about something along these lines:

    Due to the nature of floating point numeric values, a faithful
textual representation
    of a <type>real</type> or <type>double precision</type> value
requires some decimal
    digits that are generally insignificant, impairing readability of
    common values. Because of this, Postgres supports a limited output
precision for
    floating point numbers by default. In order to preserve floating
point values more
    exactly, you can use the <xref linkend="guc-extra-float-digits">
to adjust this setting.

Is that reasonable? It still feels like extra_float_digits should be
opt-out rather than opt-in (leaving any formatting issues to clients),
but this could be a start. It doesn't address non-IEEE 754 platforms,
but the note in the other proposed patch is so high-level as to just
be hand-waving.


pgsql-general by date:

Previous
From: Daniel de Oliveira Mantovani
Date:
Subject: Re: Scalable cluster
Next
From: Mason S
Date:
Subject: Re: Scalable cluster