Bug in numeric multiplication - Mailing list pgsql-hackers

From Dean Rasheed
Subject Bug in numeric multiplication
Date
Msg-id CAEZATCUUEeYOzi-cOt+1p3wo_15q=ffqVc+oV-uMnmBH-rPFGg@mail.gmail.com
Whole thread Raw
Responses Re: Bug in numeric multiplication  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

By chance, while testing the nearby numeric log/exp/pow patch, I came
across the following case which generates an initially puzzling
looking error on HEAD -- (5.6-1e-500) ^ (3.2-1e-200). This computation
actually works OK with that other patch, but only by blind luck. It
turns out that the underlying problem is a bug in the low-level
numeric multiplication function mul_var(). It is possible to trigger
it directly with the following carefully crafted inputs:

select 4790999999999999999999999999999999999999999999999999999999999999999999999999999999999999
* 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999;

Result:

47909999999999999999999999999999999999999999999999999999999999999999999999999999997852304953000000000000000000000000000000000000000000000000000000000000000000000000000000000001

That answer is actually incorrect. Tweaking the input a little, it is
possible to generate a much more obviously nonsensical result:

select 4789999999999999999999999999999999999999999999999999999999999999999999999999999999999999
* 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999;

Result:

4789999999999999999999999999999999999999999999999999999999999999999999999999999999785231+0,*000000000000000000000000000000000000000000000000000000000000000000000000000000000001

Notice those garbage digits in the middle of the number returned.

The problem is that these examples trigger an overflow of the digits
in the accumulator array in mul_var().

The number on the left in the first example consists of 21 copies of
9999, preceded by 4790. Those are chosen so that when added together
they lead to a value for maxdig in mul_var() of 21*9999 + 4790 =
214769, which is exactly equal to INT_MAX / (NBASE - 1). So this
doesn't quite trigger a normalisation of the accumulator array, and
leaves several of the digits in that array a little under INT_MAX at
the end of the main multiplication loop.

The problem then arises in the final carry propagation pass. During
this phase of the computation, the carry from one digit (which can be
a shade under INT_MAX / NBASE) is added to the next digit, and that's
where the overflow happens.

To fix that, the initial value for maxdig needs to be made larger to
leave headroom for the carry. The largest possible carry is INT_MAX /
NBASE, and maxdig is the maximum possible dig value divided by
NBASE-1, so maxdig needs to be initialised to

 (INT_MAX / NBASE) / (NBASE - 1)

which is 21 for NBASE = 10000.

A new corner-case input that doesn't quite trigger an accumulator
normalisation is then 47699999... The worst case inputs are now values
like this for which the sum of a sequence of input digits is INT_MAX /
(NBASE - 1) - 21 = 214769 - 21 = 214748. So in the worst case, the
accumulator's digits can be up to 214748 * 9999 = 2147265252 in the
main multiplication loop. Then, during the carry propagation phase (or
any of the normalisation phases), the carry can be anything up to
INT_MAX / NBASE = 214748. So the maximum value that can be assigned to
any individual digit is now 2147265252 + 214748 = 2147480000, which is
now less than INT_MAX.

Patch attached.

Regards,
Dean

Attachment

pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: [patch] Proposal for \rotate in psql
Next
From: Robert Haas
Date:
Subject: Re: Rework the way multixact truncations work