Thread: WIDTH_BUCKET inconsistency

WIDTH_BUCKET inconsistency

From
Martin Visser
Date:
Inconsistent results between numeric and double precision types.

Reproduction:
CREATE TABLE tab( 
    col1 REAL,
    col2 DOUBLE PRECISION,
    col3 NUMERIC
);

INSERT into tab VALUES 
    (10, 10, 10), 
    (20, 20, 20), 
    (30, 30, 30), 
    (40, 40, 40), 
    (50, 50, 50), 
    (60, 60, 60), 
    (70, 70, 70), 
    (80, 80, 80), 
    (90, 90, 90), 
    (100, 100, 100);

SELECT col1, width_bucket(col1, 10, 100, 9), width_bucket(col2, 10, 100, 9), width_bucket(col3, 10, 100, 9) FROM tab;

 col1 | width_bucket | width_bucket | width_bucket
------+--------------+--------------+--------------
   10 |            1 |            1 |            1
   20 |            2 |            2 |            1
   30 |            3 |            3 |            2
   40 |            4 |            4 |            3
   50 |            5 |            5 |            4
   60 |            6 |            6 |            6
   70 |            7 |            7 |            7
   80 |            8 |            8 |            8
   90 |            9 |            9 |            9
  100 |           10 |           10 |           10
(10 rows)

Re: WIDTH_BUCKET inconsistency

From
Tom Lane
Date:
Martin Visser <Martin.Visser@brytlyt.com> writes:
> Inconsistent results between numeric and double precision types.

That's basically roundoff error, which I'm afraid we're not going
to be able to do much about.  You could probably find other examples
where the numeric calculation is "right" and the float8 calculation
is "wrong".  For example, the case

regression=# select width_bucket(20::numeric,10,100,9);
 width_bucket 
--------------
            1
(1 row)

is essentially computing floor() of

regression=# select 9 * (10::numeric / 90::numeric) + 1;
        ?column?        
------------------------
 1.99999999999999999999
(1 row)

and that's really the right answer given

regression=# select 10::numeric / 90::numeric;
        ?column?        
------------------------
 0.11111111111111111111
(1 row)

We could carry that division out to more decimal places, but we'd
still just have a string of ones, which would become a string of
nines after multiplication by 9, and then floor() doesn't have a
choice about what to return.  If we try to fudge that to get the
"right" answer, we'd be getting wrong answers for other cases.

[ thinks for awhile... ]  It's tempting to propose that we take
the next-to-last result (1.99999999999999999999 here) and round
it to one fewer decimal place before applying floor().  It's
still scary to contemplate whether that might make as many cases
worse as it does better, but I suspect the reason why we get a
nicer answer for this case in the float8 code is that the CPU is
doing something equivalent to that in the float calculation.


While comparing the numeric and float8 versions of width_bucket,
I did notice that float8 allows +-Infinity for the first argument:

regression=# select width_bucket('inf'::float8,10,100,9);
 width_bucket 
--------------
           10
(1 row)

while numeric doesn't:

regression=# select width_bucket('inf'::numeric,10,100,9);
ERROR:  operand, lower bound, and upper bound cannot be infinity

That seems like an oversight in my recent patch to allow infinities
in numeric.

            regards, tom lane



Re: WIDTH_BUCKET inconsistency

From
Tom Lane
Date:
I wrote:
> That's basically roundoff error, which I'm afraid we're not going
> to be able to do much about.  You could probably find other examples
> where the numeric calculation is "right" and the float8 calculation
> is "wrong".  For example, the case

> regression=# select width_bucket(20::numeric,10,100,9);
>  width_bucket 
> --------------
>             1
> (1 row)

> is essentially computing floor() of

> regression=# select 9 * (10::numeric / 90::numeric) + 1;
>         ?column?        
> ------------------------
>  1.99999999999999999999
> (1 row)

...

> [ thinks for awhile... ]  It's tempting to propose that we take
> the next-to-last result (1.99999999999999999999 here) and round
> it to one fewer decimal place before applying floor().  It's
> still scary to contemplate whether that might make as many cases
> worse as it does better, but I suspect the reason why we get a
> nicer answer for this case in the float8 code is that the CPU is
> doing something equivalent to that in the float calculation.

No, waitasecond.  Studying the float code a little closer, I realize
that what it's doing is equivalent to

regression=# select (9 * 10::numeric) / 90::numeric + 1;
        ?column?        
------------------------
 2.00000000000000000000
(1 row)

That is, the numeric code is doing the calculations in an order that
must lead to an inexact result, whereas we could easily do them in
an order that always gives an exact result when one is possible.
Independently of whether trying to round a little harder would be
advisable, that seems like a clear win.

There are certain parties around here who would probably bitch
if I were to back-patch this change, but it seems fine to do in
HEAD.

            regards, tom lane