Re: Bug in numeric multiplication - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bug in numeric multiplication
Date
Msg-id 3056.1447885183@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bug in numeric multiplication  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bug in numeric multiplication  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
I wrote:
> I'm kind of stuck on that too.  I did some experimentation by tracking
> maximum values of outercarry in the regression tests (including
> numeric_big) and did not see it get larger than a couple hundred thousand,
> ie more or less INT_MAX/NBASE.  But I don't have an argument as to why
> that would be the limit.

A bit of progress on this: I've found a test case, namely

select
sqrt(99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999.0099999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999);

If one inserts no-overflow Asserts into div_var_fast, this case will trip
them, specifically this:

        /*
         * The dividend digit we are about to replace might still be nonzero.
         * Fold it into the next digit position.  We don't need to worry about
         * overflow here since this should nearly cancel with the subtraction
         * of the divisor.
         */
+       Assert(Abs(div[qi]) <= INT_MAX/NBASE);
        div[qi + 1] += div[qi] * NBASE;

Unfortunately, there isn't any SQL-visible misbehavior in this example,
because the loop in sqrt_var is more or less self-correcting for minor
errors (and this example produces bogus results only in very low-order
digits).  Most of the other calls of div_var_fast give it inputs that are
even harder to control than sqrt_var's, so finding something that did
produce visibly wrong results might take a whole lot of trial and error.

Still, this proves that we are onto a real problem.

> Another issue here is that with outercarry added into the qdigit
> computation, it's no longer clear what the bounds of qdigit itself are,

I concluded that that particular issue is a red herring: qdigit should
always be a fairly accurate estimate of the next output digit, so it
cannot fall very far outside the range 0..NBASE-1.  Testing confirms that
the range seen in the regression tests is exactly -1 to 10000, which is
what you'd expect since there can be some roundoff error.

Also, after further thought I've been able to construct an argument why
outercarry stays bounded.  See what you think of the comments in the
attached patch, which incorporates your ideas about postponing the div[qi]
calculation.

            regards, tom lane

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index dcdc5cf..2c6a3f9 100644
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
*************** div_var_fast(NumericVar *var1, NumericVa
*** 6186,6192 ****
      double        fdividend,
                  fdivisor,
                  fdivisorinverse,
!                 fquotient;
      int            qi;
      int            i;

--- 6186,6193 ----
      double        fdividend,
                  fdivisor,
                  fdivisorinverse,
!                 fquotient,
!                 outercarry;
      int            qi;
      int            i;

*************** div_var_fast(NumericVar *var1, NumericVa
*** 6274,6289 ****
       * To avoid overflow in maxdiv itself, it represents the max absolute
       * value divided by NBASE-1, ie, at the top of the loop it is known that
       * no div[] entry has an absolute value exceeding maxdiv * (NBASE-1).
       */
      maxdiv = 1;

      /*
!      * Outer loop computes next quotient digit, which will go into div[qi]
       */
      for (qi = 0; qi < div_ndigits; qi++)
      {
          /* Approximate the current dividend value */
!         fdividend = (double) div[qi];
          for (i = 1; i < 4; i++)
          {
              fdividend *= NBASE;
--- 6275,6297 ----
       * To avoid overflow in maxdiv itself, it represents the max absolute
       * value divided by NBASE-1, ie, at the top of the loop it is known that
       * no div[] entry has an absolute value exceeding maxdiv * (NBASE-1).
+      *
+      * Note that maxdiv only includes digit positions that are still part of
+      * the dividend, ie, strictly speaking the above holds only for div[i]
+      * where i >= qi, at the top of the loop.
       */
      maxdiv = 1;

      /*
!      * Outer loop computes next quotient digit, which will go into div[qi].
!      * "outercarry" represents high-order dividend digits carried across
!      * iterations.
       */
+     outercarry = 0;
      for (qi = 0; qi < div_ndigits; qi++)
      {
          /* Approximate the current dividend value */
!         fdividend = outercarry * NBASE + (double) div[qi];
          for (i = 1; i < 4; i++)
          {
              fdividend *= NBASE;
*************** div_var_fast(NumericVar *var1, NumericVa
*** 6320,6340 ****
                          carry = 0;
                      div[i] = newdig;
                  }
!                 newdig = div[qi] + carry;
!                 div[qi] = newdig;

                  /*
                   * All the div[] digits except possibly div[qi] are now in the
!                  * range 0..NBASE-1.
                   */
!                 maxdiv = Abs(newdig) / (NBASE - 1);
!                 maxdiv = Max(maxdiv, 1);

                  /*
                   * Recompute the quotient digit since new info may have
                   * propagated into the top four dividend digits
                   */
!                 fdividend = (double) div[qi];
                  for (i = 1; i < 4; i++)
                  {
                      fdividend *= NBASE;
--- 6328,6347 ----
                          carry = 0;
                      div[i] = newdig;
                  }
!                 div[qi] += carry;

                  /*
                   * All the div[] digits except possibly div[qi] are now in the
!                  * range 0..NBASE-1; and we don't care about including div[qi]
!                  * anymore, so we can reset maxdiv to 1.
                   */
!                 maxdiv = 1;

                  /*
                   * Recompute the quotient digit since new info may have
                   * propagated into the top four dividend digits
                   */
!                 fdividend = outercarry * NBASE + (double) div[qi];
                  for (i = 1; i < 4; i++)
                  {
                      fdividend *= NBASE;
*************** div_var_fast(NumericVar *var1, NumericVa
*** 6348,6370 ****
                  maxdiv += Abs(qdigit);
              }

!             /* Subtract off the appropriate multiple of the divisor */
              if (qdigit != 0)
              {
                  int            istop = Min(var2ndigits, div_ndigits - qi + 1);

!                 for (i = 0; i < istop; i++)
                      div[qi + i] -= qdigit * var2digits[i];
              }
          }

          /*
           * The dividend digit we are about to replace might still be nonzero.
!          * Fold it into the next digit position.  We don't need to worry about
!          * overflow here since this should nearly cancel with the subtraction
!          * of the divisor.
           */
!         div[qi + 1] += div[qi] * NBASE;

          div[qi] = qdigit;
      }
--- 6355,6412 ----
                  maxdiv += Abs(qdigit);
              }

!             /*
!              * Subtract off the appropriate multiple of the divisor.
!              *
!              * In principle we should update div[qi] along with the other
!              * digits, but since we're about to replace div[qi] with qdigit,
!              * we prefer to fold that subtraction step into the outercarry
!              * update below.  That way we can do it in float arithmetic and
!              * avoid any worry about overflow due to div[qi] possibly being
!              * larger than the other digits after a carry propagation.
!              */
              if (qdigit != 0)
              {
                  int            istop = Min(var2ndigits, div_ndigits - qi + 1);

!                 for (i = 1; i < istop; i++)
                      div[qi + i] -= qdigit * var2digits[i];
              }
          }

          /*
           * The dividend digit we are about to replace might still be nonzero.
!          * Carry it forward to future division steps in "outercarry", being
!          * sure to apply the last divisor-subtraction step too.
!          *
!          * Although it might appear that outercarry could grow large enough to
!          * lose precision, this is not actually possible because the
!          * higher-order terms in qdigit are
!          *
!          * qdigit ~= (outercarry * NBASE + div[qi]) / var2digits[0]
!          *
!          * So the outercarry calculation would yield zero were it not for the
!          * lower-order terms and the fact that qdigit was truncated to an int.
!          * Truncation could let the result range from 0 to var2digits[0] - 1.
!          * As for the lower-order terms: including additional digits from the
!          * divisor cannot increase qdigit compared to the above approximation,
!          * and can decrease it by at most a factor of two (this occurs when
!          * var2digits[0] = 1 and var2digits[1] is NBASE-1).  So this
!          * adjustment may mean that outercarry is not decreased all the way to
!          * zero, but it won't be increased either.  The effect of including
!          * additional dividend digits is large only if outercarry is zero, and
!          * becomes negligible as outercarry increases.  If outercarry is more
!          * than INT_MAX/NBASE then it's impossible for div[qi + 1] to create
!          * more than 1/NBASE relative change in qdigit, ie qdigit is at most
!          * off by 1, so that the error is comparable to truncation error.
!          * Testing suggests that the maximum observed outercarry is somewhat
!          * less than INT_MAX/NBASE.  (However, since outercarry will only be
!          * used in floating-point calculations, we might as well compute it as
!          * a double and avoid any worry about intermediate overflows in this
!          * expression.)
           */
!         outercarry = outercarry * NBASE + (double) div[qi]
!             - (double) (qdigit * var2digits[0]);

          div[qi] = qdigit;
      }
*************** div_var_fast(NumericVar *var1, NumericVa
*** 6372,6378 ****
      /*
       * Approximate and store the last quotient digit (div[div_ndigits])
       */
!     fdividend = (double) div[qi];
      for (i = 1; i < 4; i++)
          fdividend *= NBASE;
      fquotient = fdividend * fdivisorinverse;
--- 6414,6420 ----
      /*
       * Approximate and store the last quotient digit (div[div_ndigits])
       */
!     fdividend = outercarry * NBASE + (double) div[qi];
      for (i = 1; i < 4; i++)
          fdividend *= NBASE;
      fquotient = fdividend * fdivisorinverse;

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Additional role attributes && superuser review
Next
From: Andres Freund
Date:
Subject: Re: Trivial heap_finish_speculative() error message inaccuracy