Thread: Function trunc() behaves in unexpected manner with different data types

Function trunc() behaves in unexpected manner with different data types

From

"Nathan M. Davalos"

Date:

24 February 2011, 15:13:42

I ran into something interesting with using trunc() and different data
types:

The following is a simplified from the statement we're using and
produces the same results:

=20

select trunc( ((cast(2183.68 as numeric) - cast(1 as numeric)) )*100)
/100=20

  yields 2184.68

=20

select trunc(((cast(2183.68 as numeric) - cast(1 as double precision)
))*100) /100

  yields 2184.67

=20

select trunc(cast(2184.68 as double precision) *100)=20

  yields 218467 instead of 218468

=20

This only happens on certain ranges of numbers. Doing the same thing
with the number 3183.68 yields the same result in both cases. It only
appears to happen when a number is declared as a double and there is no
number past the last significant digit or the number past the last
significant digit is a zero AND falls within a certain range of numbers.
For instance select trunc(cast(2184.681 as double precision) *100)
yields 218468, but select trunc(cast(2184.680 as double precision) *100)
yields 218467.

=20

I already made sure everything we're using is just defined as numeric to
avoid the issue.

Re: Function trunc() behaves in unexpected manner with different data types

From

Merlin Moncure

Date:

24 February 2011, 15:31:30

On Thu, Feb 24, 2011 at 1:01 PM, Nathan M. Davalos
<n.davalos@sharedmarketing.com> wrote:
> I ran into something interesting with using trunc() and different data
> types:
>
> The following is a simplified from the statement we=92re using and produc=
es
> the same results:
>
> select trunc( ((cast(2183.68 as numeric) - cast(1 as numeric)) )*100) /100
>
> =A0=A0yields 2184.68

the root issue I think here is that the string version of the double
precision math is approximated:
postgres=3D# create table v as select floor(2183.68::float8 * 100) as v;
postgres=3D# select * from v;
   v
--------
 218367
(1 row)

postgres=3D# select floor(v) from v;
 floor
--------
 218367

postgres=3D# insert into v select 218368;
INSERT 0 1
(1 row)

postgres=3D# select distinct v from v;
   v
--------
 218368
 218368
(2 rows)

As you can see, even though the string versions are the same, the
internal representation is different. You could dump the data and
restore it and get different results.  Also the text/binary protocols
would send different data to the client.  I don't know if this is a
bug in postgresql floating point implementation or not: i think the
backend would either have to print 218367.999999999999999ish number or
spend the time to look for these cases and round them internally.
Floating point is a headache :-).

merlin

Re: Function trunc() behaves in unexpected manner with different data types

From

Greg Stark

Date:

24 February 2011, 22:04:28

On Thu, Feb 24, 2011 at 7:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> the root issue I think here is that the string version of the double
> precision math is approximated:

No, it's simpler than that, all double precision math is approximated.
The root issue is that 2183.67 is not representable in a floating
point binary number. Just like 1/3 isn't representable in base 10
(decimal) numbers many fractions aren't representable in base 2
(binary) numbers. The result are repeated decimals like 0.3333... if you
multiply that by three you get 0.99999 and if you truncate that you
get 0 insted of 1.

It's the trunc() that's exposing the imprecision because like "=" it
depends on the precise value of the number down to the last digit.
Though depending on the arithmetic you can always make the precision
expand beyond the last digit anyways -- when you multiply by 100 you
magnify that imprecision too.

--
greg

Re: Function trunc() behaves in unexpected manner with different data types

From

Merlin Moncure

Date:

25 February 2011, 10:46:38

On Thu, Feb 24, 2011 at 8:03 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Thu, Feb 24, 2011 at 7:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> the root issue I think here is that the string version of the double
>> precision math is approximated:
>
> No, it's simpler than that, all double precision math is approximated.
> The root issue is that 2183.67 is not representable in a floating
> point binary number. Just like 1/3 isn't representable in base 10
> (decimal) numbers many fractions aren't representable in base 2
> (binary) numbers. The result are repeated decimals like 0.3333... if you
> multiply that by three you get 0.99999 and if you truncate that you
> get 0 insted of 1.
>
> It's the trunc() that's exposing the imprecision because like "=" it
> depends on the precise value of the number down to the last digit.
> Though depending on the arithmetic you can always make the precision
> expand beyond the last digit anyways -- when you multiply by 100 you
> magnify that imprecision too.

right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.
I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.

put another way, text output from the server should unambiguously
match what sourced the text. in the case of floating point, it does
not...there are N versions of internal data that can match particular
text output.  I am speculating that the rounding is happening in the
wrong place maybe.

merlin

Re: Function trunc() behaves in unexpected manner with different data types

From

Tom Lane

Date:

25 February 2011, 11:21:36

Merlin Moncure <mmoncure@gmail.com> writes:
> right -- in understand how floating point works -- but are you are
> saying that you are ok with the fact that (for example) a table with a
> floating point unique key could dump and not restore? more
> specifically, a binary dump would restore but a text dump would not.

pg_dump takes measures against that (see extra_float_digits).

> I think this is a problem with our implementation -- not all versions
> of 2183.68 as outputted from the server are the same internally.

It's an inherent property of float math.  Yes, we could set the default
value of extra_float_digits high enough that distinct internal values
always had distinct text representations, but trust me, you would not
like it.

            regards, tom lane

Re: Function trunc() behaves in unexpected manner with different data types

From

Merlin Moncure

Date:

25 February 2011, 11:28:33

On Fri, Feb 25, 2011 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> right -- in understand how floating point works -- but are you are
>> saying that you are ok with the fact that (for example) a table with a
>> floating point unique key could dump and not restore? more
>> specifically, a binary dump would restore but a text dump would not.
>
> pg_dump takes measures against that (see extra_float_digits).
>
>> I think this is a problem with our implementation -- not all versions
>> of 2183.68 as outputted from the server are the same internally.
>
> It's an inherent property of float math. =A0Yes, we could set the default
> value of extra_float_digits high enough that distinct internal values
> always had distinct text representations, but trust me, you would not
> like it.

no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern.  The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update?  You would get different results if the client was
using binary or text features of the protocol.  Not saying this is
wrong or needs to be fixed, just pointing it out :-).

update foo set val=3Dval + 1 where val =3D 2183.68;

merlin

Re: Function trunc() behaves in unexpected manner with different data types

From

Alvaro Herrera

Date:

25 February 2011, 11:40:23

Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:

> no I wouldn't, and the pg_dump extra_float_digits setting addresses my
> primary concern.  The client has a similar issue though -- suppose it
> fetches a value from the server and updates it back -- which record
> gets the update?  You would get different results if the client was
> using binary or text features of the protocol.  Not saying this is
> wrong or needs to be fixed, just pointing it out :-).
>
> update foo set val=val + 1 where val = 2183.68;

I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.

--
Ãlvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Function trunc() behaves in unexpected manner with different data types

From

Merlin Moncure

Date:

25 February 2011, 11:48:59

On Fri, Feb 25, 2011 at 9:40 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:
>
>> no I wouldn't, and the pg_dump extra_float_digits setting addresses my
>> primary concern. =A0The client has a similar issue though -- suppose it
>> fetches a value from the server and updates it back -- which record
>> gets the update? =A0You would get different results if the client was
>> using binary or text features of the protocol. =A0Not saying this is
>> wrong or needs to be fixed, just pointing it out :-).
>>
>> update foo set val=3Dval + 1 where val =3D 2183.68;
>
> I think the mere idea of using floating point equality on a
> WHERE clause is bogus, regardless of text or binary format.

That's a bridge to far -- akin to saying floating point should not
support equality operator.  select count(*) from foo where val >=3D
2183.68?  you are ok getting different answers depending on method of
transmission of 2183.68 to the server?

merlin

Re: Function trunc() behaves in unexpected manner with different data types

From

Merlin Moncure

Date:

25 February 2011, 12:30:17

On Fri, Feb 25, 2011 at 9:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Feb 25, 2011 at 9:40 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:
>>
>>> no I wouldn't, and the pg_dump extra_float_digits setting addresses my
>>> primary concern. =A0The client has a similar issue though -- suppose it
>>> fetches a value from the server and updates it back -- which record
>>> gets the update? =A0You would get different results if the client was
>>> using binary or text features of the protocol. =A0Not saying this is
>>> wrong or needs to be fixed, just pointing it out :-).
>>>
>>> update foo set val=3Dval + 1 where val =3D 2183.68;
>>
>> I think the mere idea of using floating point equality on a
>> WHERE clause is bogus, regardless of text or binary format.
>
> That's a bridge to[sic] far -- akin to saying floating point should not
> support equality operator. =A0select count(*) from foo where val >=3D
> 2183.68? =A0you are ok getting different answers depending on method of
> transmission of 2183.68 to the server?

I stand corrected -- I did some digging and Postgres's handling of
this issue is afaict correct: you are supposed to round on
presentation only, and equality matching on floating point in sql
(just like in C) is capricious exercise at best, at least without some
defenses.  So, we can definitely file under 'not a bug'.

merlin