Thread: regression bigtest needs a very long time
Hi,

I always run 'regression test' and 'regression bigtest' whenever PostgreSQL
is updated.  However, 'regression bigtest' takes a very long time in
PostgreSQL 6.5.  On my computer it takes about 1 hour.

The processing time is so long because 1000-digit values are calculated
with the 'LOG' and 'POWER' functions.  The actual statement in
"postgresql-6.5/src/test/regress/sql/numeric_big.sql" is the following:

  INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,1000))))
      FROM num_data
      WHERE val != '0.0';

But the processing finishes in a few minutes when "LN(ABS(round(val,1000)))"
is changed to "LN(ABS(round(val,30)))".

INSERT and SELECT must be tested with 1000-digit values, because the NUMERIC
and DECIMAL data types can handle up to 1000 digits.  However, I think there
is no need to calculate 1000-digit values in the 'LOG' function.

Comments?

--
Regards.

SAKAIDA Masaaki <sakaida@psn.co.jp>
Personal Software, Inc.   Osaka, Japan
> Hi,
>
> I always run 'regression test' and 'regression bigtest' whenever PostgreSQL
> is updated.  However, 'regression bigtest' takes a very long time in
> PostgreSQL 6.5.  On my computer it takes about 1 hour.
>
> The processing time is so long because 1000-digit values are calculated
> with the 'LOG' and 'POWER' functions.  The actual statement in
> "postgresql-6.5/src/test/regress/sql/numeric_big.sql" is the following:
>
>   INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,1000))))
>       FROM num_data
>       WHERE val != '0.0';
>
> But the processing finishes in a few minutes when "LN(ABS(round(val,1000)))"
> is changed to "LN(ABS(round(val,30)))".
>
> INSERT and SELECT must be tested with 1000-digit values, because the NUMERIC
> and DECIMAL data types can handle up to 1000 digits.  However, I think there
> is no need to calculate 1000-digit values in the 'LOG' function.

numeric/decimal is a new type for this release.  I assume this extra
processing will be removed once we are sure it works.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> wrote:
>
> > SAKAIDA wrote:
> > However, I think there is no need to calculate 1000-digit values in
> > the 'LOG' function.
>
> numeric/decimal is a new type for this release.  I assume this extra
> processing will be removed once we are sure it works.

Thank you for your reply.  I hope that in the next version
'regression test/bigtest' will finish in a short time.

The patch below is an example of what I have in mind.  With this patch
applied, the processing that currently takes 1.5 hours finishes in about
5 minutes.

--
Regards.

SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan


*** postgresql-6.5/src/test/regress/sql/numeric.sql.orig    Fri Jun 11 02:49:31 1999
--- postgresql-6.5/src/test/regress/sql/numeric.sql         Wed Jun 16 13:46:41 1999
***************
*** 626,632 ****
  -- * POWER(10, LN(value)) check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,300))))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
--- 626,632 ----
  -- * POWER(10, LN(value)) check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,30))))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected

*** postgresql-6.5/src/test/regress/sql/numeric_big.sql.orig    Thu Jun 17 19:22:53 1999
--- postgresql-6.5/src/test/regress/sql/numeric_big.sql         Thu Jun 17 19:27:36 1999
***************
*** 602,608 ****
  -- * Natural logarithm check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LN(ABS(val))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
--- 602,608 ----
  -- * Natural logarithm check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LN(round(ABS(val),30))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
***************
*** 614,620 ****
  -- * Logarithm base 10 check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LOG('10'::numeric, ABS(val))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
--- 614,620 ----
  -- * Logarithm base 10 check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, LOG('10'::numeric, round(ABS(val),30))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
***************
*** 626,632 ****
  -- * POWER(10, LN(value)) check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,1000))))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
--- 626,632 ----
  -- * POWER(10, LN(value)) check
  -- ******************************
  DELETE FROM num_result;
! INSERT INTO num_result SELECT id, 0, POWER('10'::numeric, LN(ABS(round(val,30))))
      FROM num_data
      WHERE val != '0.0';
  SELECT t1.id1, t1.result, t2.expected
> Bruce Momjian <maillist@candle.pha.pa.us> wrote:
> >
> > > SAKAIDA wrote:
> > > However, I think there is no need to calculate 1000-digit values in
> > > the 'LOG' function.
> >
> > numeric/decimal is a new type for this release.  I assume this extra
> > processing will be removed once we are sure it works.
>
> Thank you for your reply.  I hope that in the next version
> 'regression test/bigtest' will finish in a short time.
>
> The patch below is an example of what I have in mind.  With this patch
> applied, the processing that currently takes 1.5 hours finishes in about
> 5 minutes.

Just don't run bigtest.  It is only for people who are having trouble
with the new numeric type.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Just don't run bigtest.  It is only for people who are having trouble
> with the new numeric type.

I don't mind too much that bigtest takes forever --- as you say,
it shouldn't be run except by people who want a thorough test.

But I *am* unhappy that the regular numeric test takes much longer than
all the other regression tests put together.  That's an unreasonable
amount of effort spent on one feature, and it gets really annoying for
someone like me who's in the habit of running the regress tests after
any update.  Is there anything this test is likely to catch that
wouldn't get caught with a much narrower field width (say 10 digits
instead of 30)?

			regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Just don't run bigtest.  It is only for people who are having trouble
> > with the new numeric type.
>
> I don't mind too much that bigtest takes forever --- as you say,
> it shouldn't be run except by people who want a thorough test.
>
> But I *am* unhappy that the regular numeric test takes much longer than
> all the other regression tests put together.  That's an unreasonable
> amount of effort spent on one feature, and it gets really annoying for
> someone like me who's in the habit of running the regress tests after
> any update.  Is there anything this test is likely to catch that
> wouldn't get caught with a much narrower field width (say 10 digits
> instead of 30)?

Oh, I didn't realize this.  We certainly should think about reducing the
time spent on it, though it is kind of lame to be testing numeric in a
precision that is less than the standard int4 type.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
SAKAIDA wrote:
>
> Bruce Momjian <maillist@candle.pha.pa.us> wrote:
> >
> > > SAKAIDA wrote:
> > > However, I think there is no need to calculate 1000-digit values in
> > > the 'LOG' function.
> >
> > numeric/decimal is a new type for this release.  I assume this extra
> > processing will be removed once we are sure it works.
>
> Thank you for your reply.  I hope that in the next version
> 'regression test/bigtest' will finish in a short time.
>
> The patch below is an example of what I have in mind.  With this patch
> applied, the processing that currently takes 1.5 hours finishes in about
> 5 minutes.

The test was intended to check the internal low level functions of the
NUMERIC datatype against MANY possible values.  That's the reason for
the high precision resulting in this runtime.  It was an intended side
effect, not a bug!


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Hi,

> > > Bruce Momjian wrote:
> > > Just don't run bigtest.  It is only for people who are having trouble
> > > with the new numeric type.
> >
> > Tom Lane wrote:
> > I don't mind too much that bigtest takes forever --- as you say,
> > it shouldn't be run except by people who want a thorough test.

At the end of the normal regression test, the following message is
displayed:

    "To run the optional huge test(s) too type 'make bigtest'"

Many users, especially those who install PostgreSQL for the first time,
may type 'make bigtest' and may feel that PostgreSQL is unstable,
because the bigtest prints no messages for a long time and appears to
have stopped at the numeric tests.

Therefore, if it is not necessary for the general user to run
"regression bigtest", I think the 'make bigtest' message should either
be removed or be changed to something like "it takes several hours ...".

> > But I *am* unhappy that the regular numeric test takes much longer than
> > all the other regression tests put together.  That's an unreasonable
> > amount of effort spent on one feature, and it gets really annoying for
> > someone like me who's in the habit of running the regress tests after
> > any update.

I think so too.

> > Is there anything this test is likely to catch that
> > wouldn't get caught with a much narrower field width (say 10 digits
> > instead of 30)?
>
> Bruce Momjian wrote:
> Oh, I didn't realize this.  We certainly should think about reducing the
> time spent on it, though it is kind of lame to be testing numeric in a
> precision that is less than the standard int4 type.

--
Regards.

SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan
Bruce Momjian wrote:

> Oh, I didn't realize this.  We certainly should think about reducing the
> time spent on it, though it is kind of lame to be testing numeric in a
> precision that is less than the standard int4 type.

We certainly should think about a general speedup of NUMERIC.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
> Bruce Momjian wrote:
>
> > Oh, I didn't realize this.  We certainly should think about reducing the
> > time spent on it, though it is kind of lame to be testing numeric in a
> > precision that is less than the standard int4 type.
>
> We certainly should think about a general speedup of NUMERIC.

How would we do that?  I assumed it was already pretty optimized.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> At the end of the normal regression test, the following message is
> displayed:
>
>     "To run the optional huge test(s) too type 'make bigtest'"
>
> Many users, especially those who install PostgreSQL for the first time,
> may type 'make bigtest' and may feel that PostgreSQL is unstable,
> because the bigtest prints no messages for a long time and appears to
> have stopped at the numeric tests.
>
> Therefore, if it is not necessary for the general user to run
> "regression bigtest", I think the 'make bigtest' message should either
> be removed or be changed to something like "it takes several hours ...".

Warning added:

	These big tests can take over an hour to complete

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > Bruce Momjian wrote:
> >
> > > Oh, I didn't realize this.  We certainly should think about reducing the
> > > time spent on it, though it is kind of lame to be testing numeric in a
> > > precision that is less than the standard int4 type.
> >
> > We certainly should think about a general speedup of NUMERIC.
>
> How would we do that?  I assumed it was already pretty optimized.

By reimplementing the entire internals from scratch again :-)

For now the db storage format is something like packed decimal.  Two
digits fit into one byte.  Sign, scale and precision are stored in a
header.  For computations, this gets unpacked so every digit is stored
in one byte, and all the computations are performed at the digit level
in base 10.

Computers are good at performing computations in other bases (hex,
octal etc.), and we can assume that any architecture where PostgreSQL
can be installed supports 32 bit integers.  Thus, a good choice for an
internal base would be 10000, with the digits(10000) stored in small
integers.

    1.  Converting between decimal (base 10) and base 10000 is
        relatively simple.  One digit(10000) holds 4 digits(10).

    2.  Computations using a 32 bit integer for carry/borrow are safe,
        because the biggest result of a one digit(10000)
        add/subtract/multiply cannot exceed the 32 bits.

The speedup (I expect) results from the fact that the inner loops of
add, subtract and multiply will then handle 4 decimal digits per cycle
instead of one!  Doing a 1234.5678 + 2345.6789 then needs 2 internal
cycles instead of 8.  And 100.123 + 12030.12345 needs 4 cycles instead
of 10 (because the decimal point has the same meaning in base 10000,
the last value is stored internally as short ints 1, 2030, 1234, 5000).
This is the worst case and it still saves 60% of the innermost cycles!

Rounding and checking for overflow will get a little more difficult,
but I think it's worth the effort.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
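To make the base-10000 idea concrete, here is a minimal C sketch of the
inner addition loop described above.  It is not the PostgreSQL source;
the function name (add_nbase), the NBASE constant, and the fixed-length,
pre-aligned digit arrays are assumptions made purely for illustration.
Each int16 element holds one base-10000 digit (four decimal digits), and
a 32-bit accumulator carries safely between elements.

    #include <stdio.h>
    #include <stdint.h>

    #define NBASE 10000                 /* one element = 4 decimal digits */

    /*
     * Add two base-10000 digit arrays of equal length, most significant
     * digit first, both already aligned on the same decimal-point
     * position (as in the example above: 12030.12345 -> 1, 2030, 1234, 5000).
     * Returns the final carry (0 or 1).
     */
    static int
    add_nbase(const int16_t *a, const int16_t *b, int16_t *result, int ndigits)
    {
        int32_t carry = 0;

        for (int i = ndigits - 1; i >= 0; i--)
        {
            int32_t sum = (int32_t) a[i] + (int32_t) b[i] + carry;  /* fits easily in 32 bits */

            carry = sum / NBASE;
            result[i] = (int16_t) (sum % NBASE);
        }
        return (int) carry;
    }

    int
    main(void)
    {
        /* 1234.5678 + 2345.6789: two base-10000 digits on each side */
        int16_t a[2] = {1234, 5678};
        int16_t b[2] = {2345, 6789};
        int16_t r[2];
        int     carry = add_nbase(a, b, r, 2);

        if (carry)
            printf("%d%04d.%04d\n", carry, r[0], r[1]);
        else
            printf("%d.%04d\n", r[0], r[1]);    /* prints 3580.2467 */
        return 0;
    }

The loop touches each pair of elements exactly once, so the
1234.5678 + 2345.6789 example runs in two iterations where a base-10
digit loop would need eight.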
> needs 4 cycles instead of 10 (because the decimal point has
> the same meaning in base 10000, the last value is stored
> internally as short ints 1, 2030, 1234, 5000).  This is the
> worst case and it still saves 60% of the innermost cycles!

Interesting.  How do other DBs do it internally?  Anyone know?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> needs 4 cycles instead of 10 (because the decimal point has
>> the same meaning in base 10000, the last value is stored
>> internally as short ints 1, 2030, 1234, 5000).  This is the
>> worst case and it still saves 60% of the innermost cycles!

> Interesting.  How do other DBs do it internally?  Anyone know?

Probably the same way, if they want to be portable.  What Jan is
describing is a *real* standard technique (it's recommended in Knuth).
AFAIK the only other way to speed up a digit-at-a-time implementation
is to drop down to the assembly level and use packed-decimal
instructions ... if your machine has any ...

One thing worth thinking about is whether the storage format shouldn't
be made the same as the calculation format, so as to eliminate the
conversion costs.  At four decimal digits per int2, it wouldn't cost
us anything to do so.

			regards, tom lane

PS: BTW, Jan, if you do not have a copy of Knuth's volume 2, I'd
definitely recommend laying your hands on it for this project.
His description of multiprecision arithmetic is the best I've seen
anywhere.

If we thought that the math functions (sqrt, exp, etc) for numerics
were really getting used for anything, it might also be fun to try
to put in some better algorithms for them.  I've got a copy of Cody
and Waite, which has been the bible for such things for twenty years.
But my guess is that it wouldn't be worth the trouble, except to the
extent that it speeds up the regression tests ;-)
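To illustrate the unified-format idea raised here (the same
four-decimal-digits-per-int2 representation on disk and in memory), the
following is a hypothetical C struct sketch.  The type name
NumericSketch, the field names, and the layout are invented for this
example; they are not the actual PostgreSQL on-disk format.

    #include <stdint.h>

    #define NBASE 10000

    /*
     * Hypothetical unified NUMERIC layout: the int16 base-10000 digits
     * are stored on disk exactly as the arithmetic routines use them,
     * so a value read from a tuple needs no unpack/repack step before
     * the add/subtract/multiply loops run over digits[].
     */
    typedef struct NumericSketch
    {
        int32_t  vl_len;      /* total length in bytes (varlena-style header)  */
        int16_t  weight;      /* power of NBASE of the first (most significant)
                               * digit, i.e. where the decimal point falls     */
        uint8_t  sign;        /* 0 = positive, 1 = negative                    */
        uint8_t  dscale;      /* decimal digits requested after the point      */
        int16_t  digits[1];   /* base-10000 digits, most significant first     */
    } NumericSketch;

Storing the value this way removes the conversion cost mentioned
earlier: the calculation format and the storage format are literally
the same bytes.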
Tom Lane wrote:

> One thing worth thinking about is whether the storage format shouldn't
> be made the same as the calculation format, so as to eliminate the
> conversion costs.  At four decimal digits per int2, it wouldn't cost
> us anything to do so.

That's an extra bonus point from the described internal format.

> 			regards, tom lane
>
> PS: BTW, Jan, if you do not have a copy of Knuth's volume 2, I'd
> definitely recommend laying your hands on it for this project.
> His description of multiprecision arithmetic is the best I've seen
> anywhere.

I don't have one so far - thanks for the hint.

> If we thought that the math functions (sqrt, exp, etc) for numerics
> were really getting used for anything, it might also be fun to try
> to put in some better algorithms for them.  I've got a copy of Cody
> and Waite, which has been the bible for such things for twenty years.
> But my guess is that it wouldn't be worth the trouble, except to the
> extent that it speeds up the regression tests ;-)

They are based on the standard Taylor/Maclaurin series for those
functions.

Most times I need trigonometric functions or the like, one of my slide
rules still has enough precision, because I'm unable to draw more
precisely than 0.1 mm with a pencil on paper.  YES, I love to USE slide
rules (I have a dozen now: some regular ones, some circular ones, some
pocket sized, and one circular pocket sized one that looks more like a
stopwatch than a slide rule).

Thus, usually the precision of float8 should be more than enough for
those calculations.  Making NUMERIC able to handle these functions at
its extreme precision shouldn't really be that time critical.

Remember:

    The lack of mathematical knowledge never shows up better than in
    inappropriate precision of numerical calculations.

                                        C. F. Gauss
                                        (Sorry for the poor translation)

What Gauss (born 1777, and the first who knew how to take the square
root of negative numbers) meant by that is that it is stupid to
calculate with a precision of 10 or more digits after the decimal point
if you're only able to measure with 4 digits.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
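As a small illustration of the series approach referred to above, the
sketch below computes a natural logarithm from the artanh series
ln(x) = 2(z + z^3/3 + z^5/5 + ...), with z = (x-1)/(x+1).  It uses
double precision only to keep the example short and runnable; the
NUMERIC functions apply this kind of expansion digit by digit at
arbitrary precision, and the function name ln_series is made up for
this example.

    #include <stdio.h>
    #include <math.h>

    /*
     * Illustrative natural logarithm via the artanh series:
     *   ln(x) = 2 * (z + z^3/3 + z^5/5 + ...),   z = (x - 1) / (x + 1)
     * Valid for x > 0; convergence is slow for x far from 1, which is
     * why production implementations reduce the argument first.
     */
    static double
    ln_series(double x)
    {
        double z = (x - 1.0) / (x + 1.0);
        double zsq = z * z;
        double term = z;            /* current odd power of z */
        double sum = 0.0;

        for (int k = 1; k <= 399; k += 2)
        {
            sum += term / k;
            term *= zsq;
        }
        return 2.0 * sum;
    }

    int
    main(void)
    {
        printf("ln_series(10) = %.15f\n", ln_series(10.0));
        printf("log(10)       = %.15f\n", log(10.0));   /* libm, for comparison */
        return 0;
    }

At 30 or 1000 NUMERIC digits the same principle holds, but the number
of terms and the width of every multiplication grow with the requested
precision, which is exactly why the regression statements discussed in
this thread are so slow.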
Hi,

wieck@debis.com (Jan Wieck) wrote:
>
> Tom Lane wrote:
> > If we thought that the math functions (sqrt, exp, etc) for numerics
> > were really getting used for anything, it might also be fun to try
> > to put in some better algorithms for them.  I've got a copy of Cody
> > and Waite, which has been the bible for such things for twenty years.
> > But my guess is that it wouldn't be worth the trouble, except to the
> > extent that it speeds up the regression tests ;-)
>
(snip)
>
> Thus, usually the precision of float8 should be more than enough for
> those calculations.  Making NUMERIC able to handle these functions at
> its extreme precision shouldn't really be that time critical.

There is no problem concerning the NUMERIC tests of INSERT/SELECT and
add/subtract/multiply/divide.  The only problem is the processing time.

One solution to this problem is to change the argument into *float8*.
If the following change is made, the processing becomes about 10 times
faster than before.

    File     : "src/regress/sql/numeric.sql"
    Statement: "INSERT INTO num_result SELECT id, 0,
                POWER('10'::numeric, LN(ABS(round(val,300)))) ..."

    Change: "LN(ABS(round(val,300)))"
    to    : "LN(float8(ABS(round(val,300))))"

# Another solution is to automatically convert the argument of the LOG
# function into the double precision data type *internally*.  (But I do
# not know what kind of effect this solution would cause.)

--
Regards.

SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan
SAKAIDA Masaaki wrote:

> There is no problem concerning the NUMERIC tests of INSERT/SELECT and
> add/subtract/multiply/divide.  The only problem is the processing time.
>
> One solution to this problem is to change the argument into *float8*.
> If the following change is made, the processing becomes about 10 times
> faster than before.
>
>     File     : "src/regress/sql/numeric.sql"
>     Statement: "INSERT INTO num_result SELECT id, 0,
>                 POWER('10'::numeric, LN(ABS(round(val,300)))) ..."
>
>     Change: "LN(ABS(round(val,300)))"
>     to    : "LN(float8(ABS(round(val,300))))"
>
> # Another solution is to automatically convert the argument of the LOG
> # function into the double precision data type *internally*.  (But I do
> # not know what kind of effect this solution would cause.)

The complex functions (LN, LOG, EXP, etc.) were added to NUMERIC for
the case where someone really needs higher precision than float8.  The
numeric_big test simply ensures that someone really gets the CORRECT
result when computing a logarithm up to hundreds of digits.  All the
expected results fed into the tables are computed by scripts using
bc(1) with a precision 200 digits higher than that used in the test
itself.  So I'm pretty sure NUMERIC returns a VERY GOOD approximation
if I ask for the square root of 2 with 1000 digits.

One thing in mathematics that is silently forbidden is to present a
result with digits that aren't significant!  But it is the user who
decides where the significance of his INPUT ends, not the database.
So it is up to the user to decide when to lose precision by switching
to float.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Hi,

wieck@debis.com (Jan Wieck) wrote:
>
> The complex functions (LN, LOG, EXP, etc.) were added to NUMERIC for
> the case where someone really needs higher precision than float8.  The
> numeric_big test simply ensures that someone really gets the CORRECT
> result when computing a logarithm up to hundreds of digits.  All the
> expected results fed into the tables are computed by scripts using
> bc(1) with a precision 200 digits higher than that used in the test
> itself.  So I'm pretty sure NUMERIC returns a VERY GOOD approximation
> if I ask for the square root of 2 with 1000 digits.

I was able to understand the specification of the NUMERIC data type.
But I cannot yet understand the specification of the normal regression
test.

    File     : "src/regress/sql/numeric.sql"
    Function : LN(ABS(round(val,300)))  ---->  LN(ABS(round(val,30)))  <---- my hope

Please tell me:

    Is there a difference in the calculation algorithm between 30 and
    300 digits?

    Is there something like a CPU dependence or a compiler dependence
    that differs between 30 and 300 digits?

# If the answer is "NO", I think the 300-digit case is not necessary
# once you are sure that it works, because
#
#   1. the 30-digit case is equivalent to the 300-digit case,
#   2. the 300-digit case is slow, and
#   3. 30 digits is already a sufficiently large value.

--
Regards.

SAKAIDA Masaaki <sakaida@psn.co.jp>
Osaka, Japan