Re: About numeric division again - Mailing list pgsql-hackers

From Tom Lane
Subject Re: About numeric division again
Date
Msg-id 29619.1207271063@sss.pgh.pa.us
Whole thread Raw
In response to Re: About numeric division again  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> The code that's currently in there sometimes has to propagate rounding
>> to the left, meaning that you can never be certain whether all of the
>> digits you have so far are good, and that means that it can sometimes
>> generate an incorrect truncated output.  This leads to the bugs cited
>> in the above message.

> The case I looked into could be traced specifically to the fact that it
> calculates an intermediate value which is the reciprocal of the divisor and
> multiplies by that.

Perhaps, but no matter how you slice it, that algorithm sometimes needs
to go back and "fix" digits it previously emitted.  That's okay for
approximate answers and not okay for exact ones, because you never
know if you're done fixing the last digit you need to return.

I fooled around with the idea of using the existing fast division code
in the sqrt/exp/log/power functions, and got these timings for the
numeric_big regression test:
        Xeon        HPPA

CVS HEAD        5.446s        2m10.97s
Knuth division only    11.574s        3m9.29s
old code for trans. fns    5.477s        2m15.97s

The HPPA is an old machine with excruciatingly slow integer division,
so it's probably about the worst case for the Knuth-based code.

Now you can certainly object that numeric_big might not be
representative of real-world applications, but these results seem to
me to justify implementing both functions.  wc says that numeric.c is

5457 15914 122662    CVS HEAD
5466 16050 123401    Knuth only
5732 17202 130764    both

A couple hundred more lines seems like an acceptable price.

Also, I remembered the reason for proposing an explicit "integer
division" function: you can't always get the right answer from
trunc(x/y).  The / operator will round its output at some number of
fractional digits chosen by itself, and it's possible that it rounds
.999... up to the next integer, whereupon your trunc() result is simply
wrong.  (Indeed this is just the same problem at the SQL level as mod
was facing at the C level in one of the bugs that started this
discussion.)  So if you want to do any high-precision integer math,
this is really not a negotiable feature.

So IMHO the only question left is what to call it at the SQL level.
A few options:

Expose it as a function:
div(x, y)quotient(x, y)

Expose it as an operator:
x // y

There are probably better ideas I haven't thought of.  Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Sam Mason
Date:
Subject: Re: [GENERAL] SHA1 on postgres 8.3
Next
From: Sam Mason
Date:
Subject: Re: COPY Transform support