Re: Bug in numeric multiplication - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bug in numeric multiplication
Date
Msg-id 1120.1447973897@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bug in numeric multiplication  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Bug in numeric multiplication  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 18 November 2015 at 22:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Still, this proves that we are onto a real problem.

> Agreed. I had a go at turning that example into something using
> log(base, num) so that the result would be visible in a pure SQL test,
> but I didn't have any luck.

I experimented with that idea some, and found a test case that would
trigger the Assert during log_var's final div_var_fast call:

select log(

exp(.999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999995050012831598249352382434889825483159817)
,

exp(.99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999009999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999));

However, the emitted answer was the same with or without my proposed
patch.  So I dug deeper by putting in some debug printf's, and found that
the overflow occurs at this point:

div[81] will overflow: div[qi] = 218943 qdigit = 7673 maxdiv = 210375
div[]:0000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
00000000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 -001 9999 9999 9999
50499871 6840 1750 6476 1756 5110 1745 1684 0183 0860 9567 3786 7660 2671 2843 5974 6515 4013 7886 0978 7230 5372 8162
70772013 6185 3746 3095 8528 1595 3071 7541 7920 7950 218943 -2103498897 -2103499629 -2103499629 -2103499629
-2103504578

Note that div[qi+1], and indeed all the remaining dividend digits, are
large negative values.  This is what you'd expect if the carry propagation
step hasn't run for awhile, which is a precondition for div[qi] being
large enough to cause an issue.  When we compute 218943 * 10000, we will
indeed get an overflow, and the result will wrap around to some large
negative value (2^32 less than it should be).  Then we will add that to
div[qi+1], and we'll get *another* overflow, wrapping what nominally
should have been a negative sum around to a positive value (2^32 more than
it should be).  So the two overflows cancel and we get exactly the correct
new value of div[qi+1].

I do not know whether it's possible to devise a test case where you don't
get offsetting overflows.  It may be that there's no operational bug here.
Still, the code is surely not behaving per face value.

> I had a go at trying to find a simpler approach and came up with the
> attached, which computes the value intended for div[qi+1] near the top
> of the loop (using doubles) so that it can detect if it will overflow,
> and if so it enters the normalisation block. It still needs some work
> to prove that the normalised value for fnextdigit won't overflow, but
> it feels like a promising, easier-to-follow approach to me. What do
> you think?

I'll look at this later ...
        regards, tom lane



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Using quicksort for every external sort run
Next
From: David Fetter
Date:
Subject: ROLES and ALTER DEFAULT PRIVILEGES