Re: pgsql: Fix numeric_mul() overflow due to too many digits after decimal - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: pgsql: Fix numeric_mul() overflow due to too many digits after decimal |
Date | |
Msg-id | CAEZATCXAHZBr8hbTAyCdtAmuSHh5CporSn-mDGad34T=xn-JXQ@mail.gmail.com Whole thread Raw |
In response to | Re: pgsql: Fix numeric_mul() overflow due to too many digits after decimal (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Sat, 10 Jul 2021 at 18:30, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > [ moving to pghackers for wider visibility ] > > Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > On Sat, 10 Jul 2021 at 16:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> In general, I'm disturbed that we just threw away the previous > >> promise that numeric multiplication results were exact. That > >> seems like a pretty fundamental property --- which is stated > >> in so many words in the manual, btw --- and I'm not sure I want > >> to give it up. > > > Perhaps we should amend the statement about numeric multiplication to > > say that it's exact within the limits of the numeric type's supported > > scale, which we also document in the manual as 16383. > > That seems a lot better than throwing an overflow error for a result > > that isn't very big, which limits what's possible with numeric > > multiplication to much less than 16383 digits. > > TBH, I don't agree. I think this is strictly worse than what we > did before, and we should just revert it. It's no longer possible > to reason about what numeric multiplication will do. I think > throwing an error if we can't represent the result exactly is a > preferable behavior. If you don't want exact results, use float8. Actually, I think it makes it a lot easier to reason about what it will do -- it'll return the exact result, or the exact result correctly rounded to 16383 digits. That seems perfectly reasonable to me, since almost every numeric operation ends up having to round at some point, and almost all of them don't produce exact results. The previous behaviour might seem like a principled stance to take, from a particular perspective, but it's pretty hard to work with in practice. For example, say you had 2 numbers, each with 10000 digits after the decimal point, that you wanted to multiply. If it throws an overflow error for results with more than 16383 digits after the decimal point, then you'd have to round one or both input numbers before multiplying. To get the most accurate result possible, you'd actually have to round each number to have the same number of *significant digits*, rather than the same number of digits after the decimal point. In general, that's quite complicated -- suppose, after rounding, you had x = [x1 digits] . [x2 digits] y = [y1 digits] . [y2 digits] where x1+x2 = y1+y2 to minimise the error in the final result, and x2+y2 = 16383 to maximise the result scale. x1 and y1 are known (though we don't have a convenient way to obtain them), so that's a pair of simultaneous equations to solve to decide the optimal amount of rounding to apply before multiplying. And after all that, the result will only be accurate to around x1+x2 (or y1+y2) significant digits, and around x1+y1 of those will be before the decimal point, so even though it will return a result with 16383 digits after the decimal point, lots of those digits will be completely wrong. With the new code, you just multiply the numbers and the result is correctly rounded to 16383 digits. In general, I'd argue that using numeric isn't about getting exact results, since nearly all real computations don't have an exact result. Really, it's about getting high-precision results that would be impossible with float8. (And if you don't care about numbers with 16383 digits after the decimal point, this change won't affect you.) Regards, Dean
pgsql-hackers by date: