Thread: numeric/decimal docs bug?
In datatype.sgml: The type numeric can store numbers of practically unlimited size and precision,... I think this is simply wrong since the current implementation of numeric and decimal data types limit the precision up to 1000. #define NUMERIC_MAX_PRECISION 1000 Comments? -- Tatsuo Ishii
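As an illustration of the limit Tatsuo quotes, a minimal sketch of how it surfaces at the SQL level (the table names are made up and the exact error wording is taken from a later PostgreSQL release, so it may differ on a 7.2-era server):

    -- a declared precision above NUMERIC_MAX_PRECISION is rejected outright
    CREATE TABLE too_precise (x numeric(1001, 0));
    -- ERROR:  NUMERIC precision 1001 must be between 1 and 1000

    -- the largest declarable precision is accepted
    CREATE TABLE just_fits (x numeric(1000, 0));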
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > In datatype.sgml: > The type numeric can store numbers of practically > unlimited size and precision,... > I think this is simply wrong since the current implementation of > numeric and decimal data types limit the precision up to 1000. > #define NUMERIC_MAX_PRECISION 1000 I was thinking just the other day that there's no reason for that limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so? (Not that I'd care to do heavy arithmetic on such numbers, or that I believe there's any practical use for them ... but why set the limit lower than we must?) regards, tom lane
Are there other cases where the pgsql docs say unlimited where they might not be? I remember when the FAQ stated unlimited columns per table (it's been corrected now, so that's good). I'm not asking for every limit to be documented, but while documentation is being written, if one does not yet know (or remember) the actual (or even rough/estimated) limit, it's better to leave it for later than to falsely say "unlimited". Better to have no signal than noise in this case. Regards, Link. At 11:14 PM 02-03-2002 +0900, Tatsuo Ishii wrote: >In datatype.sgml: > > The type numeric can store numbers of practically > unlimited size and precision,... > >I think this is simply wrong since the current implementation of >numeric and decimal data types limit the precision up to 1000. > >#define NUMERIC_MAX_PRECISION 1000 > >Comments?
Tom Lane writes: > > #define NUMERIC_MAX_PRECISION 1000 > > I was thinking just the other day that there's no reason for that > limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so? Why have an arbitrary limit at all? Set it to INT_MAX, or whatever the index variables have for a type. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: > #define NUMERIC_MAX_PRECISION 1000 >> >> I was thinking just the other day that there's no reason for that >> limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so? > Why have an arbitrary limit at all? Set it to INT_MAX, The hard limit is certainly no more than 64K, since we store these numbers in half of an atttypmod. In practice I suspect the limit may be less; Jan would be more likely to remember... regards, tom lane
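Tom's 64K figure follows from how numeric packs precision and scale into the 32-bit atttypmod: roughly ((precision << 16) | scale) plus the 4-byte VARHDRSZ offset, which leaves only 16 bits for the precision. A sketch of decoding it on a current server (the demo table is made up):

    CREATE TABLE typmod_demo (x numeric(10, 2));

    SELECT atttypmod,
           ((atttypmod - 4) >> 16) & 65535 AS precision,
           (atttypmod - 4) & 65535         AS scale
    FROM pg_attribute
    WHERE attrelid = 'typmod_demo'::regclass AND attname = 'x';
    --  atttypmod | precision | scale
    -- -----------+-----------+-------
    --     655366 |        10 |     2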
Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane writes: > > #define NUMERIC_MAX_PRECISION 1000 > >> > >> I was thinking just the other day that there's no reason for that > >> limit to be so low. Jan, couldn't we bump it up to 8 or 16K or so? > > > Why have an arbitrary limit at all? Set it to INT_MAX, > > The hard limit is certainly no more than 64K, since we store these > numbers in half of an atttypmod. In practice I suspect the limit may > be less; Jan would be more likely to remember... It is arbitrary of course. I don't recall completely, have to dig into the code, but there might be some side effect when mucking with it. The NUMERIC code increases the actual internal precision when doing multiply and divide, what happens a gazillion times when doing higher functions like trigonometry. I think there was some connection between the max precision and how high this internal precision can grow, so increasing the precision might affect the computational performance of such higher functions significantly. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck wrote: > > The hard limit is certainly no more than 64K, since we store these > > numbers in half of an atttypmod. In practice I suspect the limit may > > be less; Jan would be more likely to remember... > > It is arbitrary of course. I don't recall completely, have to > dig into the code, but there might be some side effect when > mucking with it. > > The NUMERIC code increases the actual internal precision when > doing multiply and divide, what happens a gazillion times > when doing higher functions like trigonometry. I think there > was some connection between the max precision and how high > this internal precision can grow, so increasing the precision > might affect the computational performance of such higher > functions significantly. Oh, interesting, maybe we should just leave it alone. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > Jan Wieck wrote: > > > The hard limit is certainly no more than 64K, since we store these > > > numbers in half of an atttypmod. In practice I suspect the limit may > > > be less; Jan would be more likely to remember... > > > > It is arbitrary of course. I don't recall completely, have to > > dig into the code, but there might be some side effect when > > mucking with it. > > > > The NUMERIC code increases the actual internal precision when > > doing multiply and divide, what happens a gazillion times > > when doing higher functions like trigonometry. I think there > > was some connection between the max precision and how high > > this internal precision can grow, so increasing the precision > > might affect the computational performance of such higher > > functions significantly. > > Oh, interesting, maybe we should just leave it alone. As said, I have to look at the code. I'm pretty sure that it currently will not use hundreds of digits internally if you use only a few digits in your schema. So changing it isn't that dangerous. But who's going to write and run a regression test, ensuring that the new high limit can really be supported? I didn't even run the numeric_big test lately, which tests with 500 digits precision at least ... and therefore takes some time (yawn). Increasing the number of digits used you first have to have some other tool to generate the test data (I originally used bc(1) with some scripts). Based on that we still claim that our system deals correctly with up to 1,000 digits precision. I don't like the idea of bumping up that number to some higher nonsense, claiming we support 32K digits precision on exact numeric, and no one ever tested if natural log really returns its result in that precision instead of a 30,000 digit precise approximation. I missed some of the discussion, because I considered the 1,000 digits already being complete nonsense and dropped the thread. So could someone please enlighten me what the real reason for increasing our precision is? AFAIR it had something to do with the docs. If it's just because the docs and the code aren't in sync, I'd vote for changing the docs. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> Jan Wieck wrote: > > > The hard limit is certainly no more than 64K, since we store these > > > numbers in half of an atttypmod. In practice I suspect the limit may > > > be less; Jan would be more likely to remember... > > > > It is arbitrary of course. I don't recall completely, have to > > dig into the code, but there might be some side effect when > > mucking with it. > > > > The NUMERIC code increases the actual internal precision when > > doing multiply and divide, what happens a gazillion times > > when doing higher functions like trigonometry. I think there > > was some connection between the max precision and how high > > this internal precision can grow, so increasing the precision > > might affect the computational performance of such higher > > functions significantly. > > Oh, interesting, maybe we should just leave it alone. So are we going to just fix the docs? -- Tatsuo Ishii
Jan Wieck wrote: > Bruce Momjian wrote: > > Jan Wieck wrote: > > > > The hard limit is certainly no more than 64K, since we store these > > > > numbers in half of an atttypmod. In practice I suspect the limit may > > > > be less; Jan would be more likely to remember... > > > > > > It is arbitrary of course. I don't recall completely, have to > > > dig into the code, but there might be some side effect when > > > mucking with it. > > > > > > The NUMERIC code increases the actual internal precision when > > > doing multiply and divide, what happens a gazillion times > > > when doing higher functions like trigonometry. I think there > > > was some connection between the max precision and how high > > > this internal precision can grow, so increasing the precision > > > might affect the computational performance of such higher > > > functions significantly. > > > > Oh, interesting, maybe we should just leave it alone. > > As said, I have to look at the code. I'm pretty sure that it > currently will not use hundreds of digits internally if you > use only a few digits in your schema. So changing it isn't > that dangerous. > > But who's going to write and run a regression test, ensuring > that the new high limit can really be supported? I didn't > even run the numeric_big test lately, which tests with 500 > digits precision at least ... and therefore takes some time > (yawn). Increasing the number of digits used you first have > to have some other tool to generate the test data (I > originally used bc(1) with some scripts). Based on that we > still claim that our system deals correctly with up to 1,000 > digits precision. > > I don't like the idea of bumping up that number to some > higher nonsense, claiming we support 32K digits precision on > exact numeric, and no one ever tested if natural log really > returns its result in that precision instead of a 30,000 > digit precise approximation. > > I missed some of the discussion, because I considered the > 1,000 digits already being complete nonsense and dropped the > thread. So could someone please enlighten me what the real > reason for increasing our precision is? AFAIR it had > something to do with the docs. If it's just because the docs > and the code aren't in sync, I'd vote for changing the docs. I have done a little more research on this. If you create a numeric with no precision: CREATE TABLE test (x numeric); You can insert numerics that are greater in length than 1000 digits: INSERT INTO test values ('1111(continues 1010 times)'); You can even do computations on it: SELECT x+1 FROM test; 1000 is pretty arbitrary. If we can handle 1000, I can't see how larger values somehow could fail. Also, the numeric regression test takes much longer than the other tests. I don't see why a test of that length is required, compared to the other tests. Probably time to pair it back a little. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
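Bruce's experiment can be reproduced without typing a thousand digits by hand. A sketch (it assumes the repeat() function, which may not be available on a 7.2-era server, so treat it as illustrative):

    CREATE TABLE test (x numeric);                       -- no precision declared

    -- build a 1010-digit integer literal instead of typing it out
    INSERT INTO test VALUES (repeat('1', 1010)::numeric);

    -- the value round-trips and arithmetic on it still works
    SELECT length(x::text) AS digits, x + 1 > x AS arithmetic_ok FROM test;
    --  digits | arithmetic_ok
    -- --------+---------------
    --    1010 | t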
Bruce Momjian wrote: > Jan Wieck wrote: > > > > I missed some of the discussion, because I considered the > > 1,000 digits already being complete nonsense and dropped the > > thread. So could someone please enlighten me what the real > > reason for increasing our precision is? AFAIR it had > > something to do with the docs. If it's just because the docs > > and the code aren't in sync, I'd vote for changing the docs. > > I have done a little more research on this. If you create a numeric > with no precision: > > CREATE TABLE test (x numeric); > > You can insert numerics that are greater in length than 1000 digits: > > INSERT INTO test values ('1111(continues 1010 times)'); > > You can even do computations on it: > > SELECT x+1 FROM test; > > 1000 is pretty arbitrary. If we can handle 1000, I can't see how larger > values somehow could fail. And I can't see what more than 1,000 digits would be good for. Bruce, your research is neat, but IMHO wasted time. Why do we need to change it now? Is the more important issue (doing the internal storage representation in base 10,000) done yet? If not, we can open up for unlimited precision at that time. Please, adjust the docs for now, drop the issue and let's do something useful. > Also, the numeric regression test takes much longer than the other > tests. I don't see why a test of that length is required, compared to > the other tests. Probably time to pair it back a little. What exactly do you mean with "pair it back"? Shrinking the precision of the test or reducing its coverage of functionality? For the former, it only uses 10 of the possible 1,000 digits after the decimal point. Run the numeric_big test (which uses 800) at least once and you'll see what kind of difference precision makes. And on functionality, it is absolutely insufficient for numerical functionality that has possible carry, rounding etc. issues, to check a function just for one single known value, and if it computes that result correctly, consider it OK for everything. I thought the actual test is sloppy already ... but it's still too much for you ... hmmmm. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
... > Also, the numeric regression test takes much longer than the other > tests. I don't see why a test of that length is required, compared to > the other tests. Probably time to pair it back a little. The numeric types are inherently slow. You might look at what effect you can achieve by restructuring that regression test to more closely resemble the other tests. In particular, it defines several source tables, each of which contains similar initial values. And it defines a results table, into which intermediate results are placed, which are then immediately queried for display and comparison to obtain a test result. If handling the values is slow, we could certainly remove these intermediate steps and still get most of the test coverage. On another related topic: I've been wanting to ask: we have in a few cases moved aggregate calculations from small, fast data types to using numeric as the accumulator. It would be nice imho to allow, say, an int8 accumulator for an int4 data type, rather than requiring numeric. But not all platforms (I assume) have an int8 data type. So we would need to be able to fall back to numeric for those platforms which need to use it. What would it take to make some of the catalogs configurable or sensitive to configuration results? - Thomas
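A sketch of the restructuring Thomas suggests, using hypothetical table and column names rather than the real ones in the regression script: instead of inserting each intermediate result into a results table and then selecting it back, compute and compare in a single statement.

    -- assumed layout: num_data(id, val1, val2) holds the inputs,
    -- num_exp_add(id, expected) holds the precomputed reference results
    SELECT d.id, d.val1 + d.val2 AS computed, e.expected
    FROM num_data d
    JOIN num_exp_add e USING (id)
    WHERE d.val1 + d.val2 <> e.expected;
    -- an empty result set means every addition matched its reference value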
Thomas Lockhart <lockhart@fourpalms.org> writes: > I've been wanting to ask: we have in a few cases moved aggregate > calculations from small, fast data types to using numeric as the > accumulator. Which ones are you concerned about? As of 7.2, the only ones that use numeric accumulators for non-numeric input types are

 aggname  | basetype | aggtransfn | transtype
----------+----------+------------+-----------
 avg      | int8     | int8_accum | _numeric
 sum      | int8     | int8_sum   | numeric
 stddev   | int2     | int2_accum | _numeric
 stddev   | int4     | int4_accum | _numeric
 stddev   | int8     | int8_accum | _numeric
 variance | int2     | int2_accum | _numeric
 variance | int4     | int4_accum | _numeric
 variance | int8     | int8_accum | _numeric

All of these seem to have good precision/range arguments for using numeric accumulators, or to be enough off the beaten track that it's not worth much angst to optimize them. regards, tom lane
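The table above can be regenerated from the system catalogs. The catalog layout has changed since 7.2, so the following is only a sketch of the equivalent query on a modern server, not necessarily the one Tom ran:

    SELECT p.proname                           AS aggname,
           format_type(p.proargtypes[0], NULL) AS basetype,
           a.aggtransfn,
           format_type(a.aggtranstype, NULL)   AS transtype
    FROM pg_aggregate a
    JOIN pg_proc p ON p.oid = a.aggfnoid
    WHERE p.proname IN ('avg', 'sum', 'stddev', 'variance')
    ORDER BY aggname, basetype;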
> Which ones are you concerned about? As of 7.2, the only ones that use > numeric accumulators for non-numeric input types are ... OK, I did imply that I've been wanting to ask this for some time. I should have asked during the 7.1 era, when this was true for more cases. :) > All of these seem to have good precision/range arguments for using > numeric accumulators, or to be enough off the beaten track that it's > not worth much angst to optimize them. Well, they *are* on the beaten track for someone, just not you! ;) I'd think that things like stddev might be OK with 52 bits of accumulation, so could be done with doubles. Were they implemented that way at one time? Do we have a need to provide precision greater than that, or to guard against the (unlikely) case of having so many values that a double-based accumulator overflows its ability to see the next value? I'll point out that for the case of accumulating so many integers that they can't work with a double, the alternative implementation of using numeric may approach infinite computation time. But in any case, I can ask the same question, only reversed: We now have some aggregate functions which use, say, int4 to accumulate int4 values, if the target platform does *not* support int8. What would it take to make the catalogs configurable or able to respond to configuration results so that, for example, platforms without int8 support could instead use numeric or double values as a substitute? - Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes: >> All of these seem to have good precision/range arguments for using >> numeric accumulators, or to be enough off the beaten track that it's >> not worth much angst to optimize them. > Well, they *are* on the beaten track for someone, just not you! ;) > I'd think that things like stddev might be OK with 52 bits of > accumulation, so could be done with doubles. ISTM that people who are willing to have it done in a double can simply write stddev(x::float8). Of course you will rejoin that if they want it done in a numeric, they can write stddev(x::numeric) ... but since we are talking about exact inputs, I would prefer that the default behavior be to carry out the summation without loss of precision. The stddev calculation *is* subject to problems if you don't do the summation as accurately as you can. > Do we have a need to provide precision greater than > that, or to guard against the (unlikely) case of having so many values > that a double-based accumulator overflows its ability to see the next > value? You don't see the cancellation problems inherent in N*sum(x^2) - sum(x)^2? You're likely to be subtracting bignums even with not all that many input values; they just have to be large input values. > But in any case, I can ask the same question, only reversed: > We now have some aggregate functions which use, say, int4 to accumulate > int4 values, if the target platform does *not* support int8. What would > it take to make the catalogs configurable or able to respond to > configuration results so that, for example, platforms without int8 > support could instead use numeric or double values as a substitute? Haven't thought hard about it. I will say that I don't like the idea of changing the declared output type of the aggregates across platforms. Changing the internal implementation (ie, transtype) would be acceptable --- but I doubt it's worth the trouble. In most other arguments that touch on this point, I seem to be one of the few holdouts for insisting that we worry about int8-less platforms anymore at all ;-). For those few old platforms, the 7.2 behavior of avg(int) and sum(int) is no worse than it was for everyone in all pre-7.1 versions; I am not excited about expending significant effort to make it better. regards, tom lane
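A sketch of the cancellation Tom is describing, with made-up data: a large common offset makes the two terms of the textbook formula nearly equal, so in float8 their difference is lost to rounding, while summing in numeric keeps it exact.

    CREATE TEMP TABLE offset_demo (x float8);
    INSERT INTO offset_demo VALUES (1e9 + 4), (1e9 + 5), (1e9 + 6);

    -- the true value of N*sum(x^2) - sum(x)^2 for this data is exactly 6,
    -- but both terms are about 9e18, far beyond float8's 15-16 digits
    SELECT count(*) * sum(x * x) - sum(x) * sum(x)        AS float8_result,
           count(*) * sum(x::numeric * x::numeric)
               - sum(x::numeric) * sum(x::numeric)        AS numeric_result
    FROM offset_demo;
    -- the float8 column comes back as rounding noise (possibly zero),
    -- while the numeric column is exactly 6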
Jan Wieck wrote: > Bruce Momjian wrote: > > Jan Wieck wrote: > > > > I missed some of the discussion, because I considered the > > > 1,000 digits already being complete nonsense and dropped the > > > thread. So could someone please enlighten me what the real > > > reason for increasing our precision is? AFAIR it had > > > something to do with the docs. If it's just because the docs > > > and the code aren't in sync, I'd vote for changing the docs. > > > > I have done a little more research on this. If you create a numeric > > with no precision: > > > > CREATE TABLE test (x numeric); > > > > You can insert numerics that are greater in length than 1000 digits: > > > > INSERT INTO test values ('1111(continues 1010 times)'); > > > > You can even do computations on it: > > > > SELECT x+1 FROM test; > > > > 1000 is pretty arbitrary. If we can handle 1000, I can't see how larger > > values somehow could fail. > > And I can't see what more than 1,000 digits would be good > for. Bruce, your research is neat, but IMHO wasted time. > > Why do we need to change it now? Is the more important issue > (doing the internal storage representation in base 10,000) > done yet? If not, we can open up for unlimited precision at > that time. I certainly would like the 10,000 change done, but few of us are capable of doing it. :-( > Please, adjust the docs for now, drop the issue and let's do > something useful. That's how I got started. The problem is that the limit isn't 1,000. Looking at NUMERIC_MAX_PRECISION, I see it used in gram.y to prevent creation of NUMERIC columns that exceed the maximum length, and I see it used in numeric.c to prevent exponents that exceed the maximum length, but I don't see anything that would actually enforce the limit on INSERT and in other code paths. Remember how people complained when I said "unlimited" in the FAQ for some items that actually had a limit. Well, in this case, we have a limit that is only enforced in some places. I would like to see this cleared up one way or the other so the docs would be correct. Jan, any chance of doing the 10,000 change in your spare time? ;-) > > Also, the numeric regression test takes much longer than the other > > tests. I don't see why a test of that length is required, compared to > > the other tests. Probably time to pair it back a little. > > What exactly do you mean with "pair it back"? Shrinking the > precision of the test or reducing its coverage of > functionality? > > For the former, it only uses 10 of the possible 1,000 digits > after the decimal point. Run the numeric_big test (which > uses 800) at least once and you'll see what kind of > difference precision makes. > > And on functionality, it is absolutely insufficient for > numerical functionality that has possible carry, rounding > etc. issues, to check a function just for one single known > value, and if it computes that result correctly, consider it > OK for everything. > > I thought the actual test is sloppy already ... but it's > still too much for you ... hmmmm. Well, our regression tests are not intended to test every possible NUMERIC combination, just a reasonable subset. As it is now, I often think the regression tests have hung because numeric takes so much longer than any of the other tests. We have had this code in there for a while now, and it is not OS-specific stuff, so I think we should just pair it back so we know it is working. We already have bignumeric for a larger test.
-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
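For contrast with the unconstrained column shown earlier: where a precision has been declared, the stored value is checked against it on INSERT (via numeric's typmod handling). A short sketch, with a made-up table and an error text taken from a later release:

    CREATE TABLE paycheck (amount numeric(5, 2));
    INSERT INTO paycheck VALUES (12345.67);
    -- ERROR:  numeric field overflow
    -- (precision 5, scale 2 only allows absolute values below 10^3)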
Bruce Momjian wrote: > Well, our regression tests are not intended to test every possible > NUMERIC combination, just a reasonable subset. As it is now, I often > think the regression tests have hung because numeric takes so much > longer than any of the other tests. We have had this code in there for > a while now, and it is not OS-specific stuff, so I think we should just > pair it back so we know it is working. We already have bignumeric for a > larger test. Bruce, have you even taken one single look at the test? It does 100 of each add, sub, mul and div, these are the fast operations that don't really take much time. Then it does 10 of each sqrt(), ln(), log10(), pow10() and 10 combined power(ln()). These are the time consuming operations, working iterative alas Newton, Taylor and McLaurin. All that is done with 10 digits after the decimal point only! So again, WHAT exactly do you mean with "pair it back"? Sorry, I don't get it. Do you want to remove the entire test? Reduce it to an INSERT, one SELECT (so that we know the input- and output functions work) and the four basic operators used once? Well, that's a hell of a test, makes me really feel comfortable. Like the mechanic kicking against the tire then saying "I ain't see noth'n wrong with the brakes, ya sure can make a trip in the mountains". Yeah, at least once! Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
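A quick way to see the split Jan describes between the cheap operators and the iterative functions, with made-up sample data (generate_series and psql's \timing are assumed here and postdate some of the servers discussed in this thread):

    CREATE TEMP TABLE num_sample AS
        SELECT (random() * 1000)::numeric(30, 10) AS x
        FROM generate_series(1, 10000);

    \timing on
    -- the four basic operators: cheap even over many rows
    SELECT count(x + x), count(x - x), count(x * x), count(x / (x + 1)) FROM num_sample;
    -- the iterative functions: each call runs a Newton/Taylor-style loop internally
    SELECT count(sqrt(x)), count(ln(x + 1)), count(exp(x / 1000)) FROM num_sample;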
Jan Wieck wrote: > Bruce Momjian wrote: > > Well, our regression tests are not intended to test every possible > > NUMERIC combination, just a resonable subset. As it is now, I often > > think the regression tests have hung because numeric takes so much > > longer than any of the other tests. We have had this code in there for > > a while now, and it is not OS-specific stuff, so I think we should just > > pair it back so we know it is working. We already have bignumeric for a > > larger test. > > Bruce, > > have you even taken one single look at the test? It does 100 > of each add, sub, mul and div, these are the fast operations > that don't really take much time. > > Then it does 10 of each sqrt(), ln(), log10(), pow10() and 10 > combined power(ln()). These are the time consuming > operations, working iterative alas Newton, Taylor and > McLaurin. All that is done with 10 digits after the decimal > point only! > > So again, WHAT exactly do you mean with "pair it back"? > Sorry, I don't get it. Do you want to remove the entire test? > Reduce it to an INSERT, one SELECT (so that we know the > input- and output functions work) and the four basic > operators used once? Well, that's a hell of a test, makes me > really feel comfortable. Like the mechanic kicking against > the tire then saying "I ain't see noth'n wrong with the > brakes, ya sure can make a trip in the mountains". Yeah, at > least once! Jan, regression is not a test of the level a developer would use to make sure his code works. It is merely to make sure the install works on a limited number of cases. Having seen zero reports of any numeric failures since we installed it, and seeing it takes >10x times longer than the other tests, I think it should be paired back. Do we really need 10 tests of each complex function? I think one would do the trick. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > Jan Wieck wrote: > > Bruce Momjian wrote: > > > Well, our regression tests are not intended to test every possible > > > NUMERIC combination, just a reasonable subset. As it is now, I often > > > think the regression tests have hung because numeric takes so much > > > longer than any of the other tests. We have had this code in there for > > > a while now, and it is not OS-specific stuff, so I think we should just > > > pair it back so we know it is working. We already have bignumeric for a > > > larger test. > > > > Bruce, > > > > have you even taken one single look at the test? It does 100 > > of each add, sub, mul and div, these are the fast operations > > that don't really take much time. > > > > Then it does 10 of each sqrt(), ln(), log10(), pow10() and 10 > > combined power(ln()). These are the time consuming > > operations, working iterative alas Newton, Taylor and > > McLaurin. All that is done with 10 digits after the decimal > > point only! > > > > So again, WHAT exactly do you mean with "pair it back"? > > Sorry, I don't get it. Do you want to remove the entire test? > > Reduce it to an INSERT, one SELECT (so that we know the > > input- and output functions work) and the four basic > > operators used once? Well, that's a hell of a test, makes me > > really feel comfortable. Like the mechanic kicking against > > the tire then saying "I ain't see noth'n wrong with the > > brakes, ya sure can make a trip in the mountains". Yeah, at > > least once! > > Jan, regression is not a test of the level a developer would use to make > sure his code works. It is merely to make sure the install works on a > limited number of cases. Having seen zero reports of any numeric > failures since we installed it, and seeing it takes >10x times longer > than the other tests, I think it should be paired back. Do we really > need 10 tests of each complex function? I think one would do the trick. You forgot who wrote that code originally. I feel a lot better WITH the tests in place :-) And if it's merely to make sure the install worked, man who is doing source installations these days and runs the regression tests anyway? Most people throw in an RPM or the like, only a few serious users install from sources, and only a fistful of them then runs regression. Isn't it mostly developers and distro-maintainers who use that directory? I think your entire point isn't just weak, IMNSVHO you don't really have a point. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck wrote: > You forgot who wrote that code originally. I feel a lot > better WITH the tests in place :-) > > And if it's merely to make sure the install worked, man who > is doing source installations these days and runs the > regression tests anyway? Most people throw in an RPM or the > like, only a few serious users install from sources, and only > a fistful of them then runs regression. > > Isn't it mostly developers and distro-maintainers who use > that directory? I think your entire point isn't just weak, > IMNSVHO you don't really have a point. It is my understanding that RPM does run that test. My main issue is why does numeric have to be so much larger than the other tests? I have not heard that explained. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
... > Jan, regression is not a test of the level a developer would use to make > sure his code works. It is merely to make sure the install works on a > limited number of cases. Having seen zero reports of any numeric > failures since we installed it, and seeing it takes >10x times longer > than the other tests, I think it should be paired back. Do we really > need 10 tests of each complex function? I think one would do the trick. Whoops. We rely on the regression tests to make sure that previous behaviors continue to be valid behaviors. Another use is to verify that a particular installation can reproduce this same test. But regression testing is a fundamental and essential development tool, precisely because it covers cases outside the range you might be thinking of testing as you do development. As a group, we might tend to underestimate the value of this, which could be evidenced by the fact that our regression test suite has not grown substantially over the years. It could have many more tests within each module, and bug reports *could* be fed back into regression updates to make sure that failures do not reappear. All imho of course ;) - Thomas
... > It is my understanding that RPM does run that test. My main issue is > why does numeric have to be so much larger than the other tests? I have > not heard that explained. afaict it is not larger. It *does* take more time, but the number of tests is relatively small, or at least comparable to the number of tests which appear, or should appear, in other tests of data types covering a large problem space (e.g. date/time). It does illustrate that BCD-like encodings are expensive, and that machine-supported math is usually a win. If it is a big deal, jump in and widen the internal math operations! - Thomas
Bruce Momjian wrote: > Jan Wieck wrote: > > You forgot who wrote that code originally. I feel a lot > > better WITH the tests in place :-) > > > > And if it's merely to make sure the install worked, man who > > is doing source installations these days and runs the > > regression tests anyway? Most people throw in an RPM or the > > like, only a few serious users install from sources, and only > > a fistful of them then runs regression. > > > > Isn't it mostly developers and distro-maintainers who use > > that directory? I think your entire point isn't just weak, > > IMNSVHO you don't really have a point. > > It is my understanding that RPM does run that test. My main issue is > why does numeric have to be so much larger than the other tests? I have > not heard that explained. Well, I heard Thomas commenting that it's implemented horribly slowly (or so; I don't recall his exact wording). But he's right. I think the same test done with float8 would run in less than a tenth of that time. This is only an explanation of "why it takes so long"; it is no argument pro or con the test itself. I think I made my point clear enough, that I consider calling these functions just once is plain sloppy. But that's just my opinion. What do others think? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@yahoo.com> writes: > I think I made my point clear enough, that I consider calling > these functions just once is plain sloppy. But that's just > my opinion. What do others think? I don't have a problem with the current length of the numeric test. The original form of it (now shoved over to bigtests) did seem excessively slow to me ... but I can live with this one. I do agree that someone ought to reimplement numeric using base10k arithmetic ... but it's not bugging me so much that I'm likely to get around to it anytime soon myself ... Bruce, why is there no TODO item for that project? regards, tom lane
Thomas Lockhart wrote: > ... > > It is my understanding that RPM does run that test. My main issue is > > why does numeric have to be so much larger than the other tests? I have > > not heard that explained. > > afaict it is not larger. It *does* take more time, but the number of > tests is relatively small, or at least comparable to the number of > tests which appear, or should appear, in other tests of data types > covering a large problem space (e.g. date/time). > > It does illustrate that BCD-like encodings are expensive, and that > machine-supported math is usually a win. If it is a big deal, jump in > and widen the internal math operations! OK, as long as everyone else is fine with the tests, we can leave it alone. The concept that the number of tests is realistic, and that they are just slower than other data types, makes sense. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > Jan Wieck <janwieck@yahoo.com> writes: > > I think I made my point clear enough, that I consider calling > > these functions just once is plain sloppy. But that's just > > my opinion. What do others think? > > I don't have a problem with the current length of the numeric test. > The original form of it (now shoved over to bigtests) did seem > excessively slow to me ... but I can live with this one. > > I do agree that someone ought to reimplement numeric using base10k > arithmetic ... but it's not bugging me so much that I'm likely > to get around to it anytime soon myself ... > > Bruce, why is there no TODO item for that project? Not sure. I was aware of it for a while. Added: * Change NUMERIC data type to use base 10,000 internally -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tatsuo Ishii wrote: > > Jan Wieck wrote: > > > > The hard limit is certainly no more than 64K, since we store these > > > > numbers in half of an atttypmod. In practice I suspect the limit may > > > > be less; Jan would be more likely to remember... > > > > > > It is arbitrary of course. I don't recall completely, have to > > > dig into the code, but there might be some side effect when > > > mucking with it. > > > > > > The NUMERIC code increases the actual internal precision when > > > doing multiply and divide, what happens a gazillion times > > > when doing higher functions like trigonometry. I think there > > > was some connection between the max precision and how high > > > this internal precision can grow, so increasing the precision > > > might affect the computational performance of such higher > > > functions significantly. > > > > Oh, interesting, maybe we should just leave it alone. > > So are we going to just fix the docs? OK, I have updated the docs. Patch attached. I have also added this to the TODO list: * Change NUMERIC to enforce the maximum precision, and increase it -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Index: datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.87
diff -c -r1.87 datatype.sgml
*** datatype.sgml	3 Apr 2002 05:39:27 -0000	1.87
--- datatype.sgml	13 Apr 2002 01:26:54 -0000
***************
*** 506,518 ****
  <title>Arbitrary Precision Numbers</title>

  <para>
! The type <type>numeric</type> can store numbers of practically
! unlimited size and precision, while being able to store all
! numbers and carry out all calculations exactly.  It is especially
! recommended for storing monetary amounts and other quantities
! where exactness is required.  However, the <type>numeric</type>
! type is very slow compared to the floating-point types described
! in the next section.
  </para>

  <para>
--- 506,517 ----
  <title>Arbitrary Precision Numbers</title>

  <para>
! The type <type>numeric</type> can store numbers with up to 1,000
! digits of precision and perform calculations exactly.  It is
! especially recommended for storing monetary amounts and other
! quantities where exactness is required.  However, the
! <type>numeric</type> type is very slow compared to the
! floating-point types described in the next section.
  </para>

  <para>
Jan Wieck wrote: > > Oh, interesting, maybe we should just leave it alone. > > As said, I have to look at the code. I'm pretty sure that it > currently will not use hundreds of digits internally if you > use only a few digits in your schema. So changing it isn't > that dangerous. > > But who's going to write and run a regression test, ensuring > that the new high limit can really be supported? I didn't > even run the numeric_big test lately, which tests with 500 > digits precision at least ... and therefore takes some time > (yawn). Increasing the number of digits used you first have > to have some other tool to generate the test data (I > originally used bc(1) with some scripts). Based on that we > still claim that our system deals correctly with up to 1,000 > digits precision. > > I don't like the idea of bumping up that number to some > higher nonsense, claiming we support 32K digits precision on > exact numeric, and no one ever tested if natural log really > returns its result in that precision instead of a 30,000 > digit precise approximation. > > I missed some of the discussion, because I considered the > 1,000 digits already being complete nonsense and dropped the > thread. So could someone please enlighten me what the real > reason for increasing our precision is? AFAIR it had > something to do with the docs. If it's just because the docs > and the code aren't in sync, I'd vote for changing the docs. Jan, if the numeric code works on 100 or 500 digits, could it break with 10,000 digits? Is there a reason to believe more digits could cause problems not present in shorter tests? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Jan, regression is not a test of the level a developer would use to make > sure his code works. It is merely to make sure the install works on a > limited number of cases. News to me! If anything, I don't think a lot of the current regression tests are comprehensive enough! For the SET/DROP NOT NULL patch I submitted, I included a regression test that tests every one of the preconditions in my code - that way if anything gets changed or broken, we'll find out very quickly. I personally don't have a problem with the time taken to regression test - and I think that trimming the numeric test _might_ be a false economy. Who knows what's going to turn around and bite us one day? > Having seen zero reports of any numeric > failures since we installed it, and seeing it takes >10x times longer > than the other tests, I think it should be paired back. Do we really > need 10 tests of each complex function? I think one would do the trick. A good point tho, I didn't submit a regression test that tries to ALTER 3 different non-existent tables to check for failures - one test was enough... Chris
Christopher Kings-Lynne wrote: > > Having seen zero reports of any numeric > > failures since we installed it, and seeing it takes >10x times longer > > than the other tests, I think it should be paired back. Do we really > > need 10 tests of each complex function? I think one would do the trick. > > A good point tho, I didn't submit a regression test that tries to ALTER 3 > different non-existent tables to check for failures - one test was enough... That was my point. Is there much value in testing each function ten times? Anyway, seems only I care so I will drop it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > Christopher Kings-Lynne wrote: > > > Having seen zero reports of any numeric > > > failures since we installed it, and seeing it takes >10x times longer > > > than the other tests, I think it should be paired back. Do we really > > > need 10 tests of each complex function? I think one would do the trick. > > > > A good point tho, I didn't submit a regression test that tries to ALTER 3 > > different non-existent tables to check for failures - one test was enough... > > That was my point. Is there much value in testing each function ten > times? Anyway, seems only I care so I will drop it. Yes, there is value in it. There is conditional code in it that depends on the values. I wrote that before (I said there are possible carry, rounding etc. issues), and it looked to me that you simply ignored these facts. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #