Thread: Re: [HACKERS] regression bigtest needs very long time
wieck@debis.com (Jan Wieck) writes:
>> > We certainly should think about a general speedup of NUMERIC.
>>
>> How would we do that?  I assumed it was already pretty optimized.
>
> The speedup (I expect) results from the fact that the inner
> loops of add, subtract and multiply will then handle 4
> decimal digits per cycle instead of one!  Doing a
>
>     1234.5678 + 2345.6789
>
> then needs 2 internal cycles instead of 8.  And
>
>     100.123 + 12030.12345
>
> needs 4 cycles instead of 10 (because the decimal point has
> the same meaning in base 10000, the last value is stored
> internally as short ints 1, 2030, 1234, 5000).  This is the
> worst case, and it still saves 60% of the innermost cycles!

The question, though, becomes what percentage of operations on a NUMERIC
field are arithmetic, and what percentage are storage/retrieval.

For databases that simply store/retrieve data, your "optimization" will
have the effect of significantly increasing format conversion overhead.
With a 512-byte lookup table, four packed-decimal digits can be converted
in two primitive operations, but base-10000 will require three divisions,
three subtractions, four additions, plus miscellaneous data shuffling.

	-Michael Robinson
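[To make Jan's cycle counts concrete, here is a minimal sketch in C of
the kind of base-10000 addition loop he describes.  The names (NBASE,
add_digits) and the digit layout are illustrative assumptions, not the
actual PostgreSQL implementation:]

	#include <stddef.h>

	#define NBASE 10000		/* one short holds 4 decimal digits */

	/*
	 * Add two aligned, equal-length base-10000 digit arrays (most
	 * significant digit first), propagating the carry from right to
	 * left.  Returns the final carry.  Each iteration disposes of 4
	 * decimal digits, which is where the claimed speedup comes from.
	 */
	static int
	add_digits(const short *a, const short *b, short *result,
			   size_t ndigits)
	{
		int		carry = 0;
		size_t	i = ndigits;

		while (i-- > 0)
		{
			int		sum = a[i] + b[i] + carry;

			carry = (sum >= NBASE);
			result[i] = (short) (sum - (carry ? NBASE : 0));
		}
		return carry;
	}

[In this layout 12030.12345 is the four shorts 1, 2030, 1234, 5000, and
100.123 padded to the same weights is 0, 100, 1230, 0 -- hence Jan's
count of 4 cycles instead of 10.]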
Michael Robinson <robinson@netrinsics.com> writes:
> The question, though, becomes what percentage of operations on a
> NUMERIC field are arithmetic, and what percentage are storage/retrieval.

Good point.

> For databases that simply store/retrieve data, your "optimization" will
> have the effect of significantly increasing format conversion overhead.
> With a 512-byte lookup table, four packed-decimal digits can be converted
> in two primitive operations, but base-10000 will require three divisions,
> three subtractions, four additions, plus miscellaneous data shuffling.

That is something to worry about, but I think the present implementation
unpacks the storage format into calculation format before converting
to text.  Getting rid of the unpack step by making storage and calc
formats the same would probably buy enough speed to pay for the extra
conversion arithmetic.

			regards, tom lane
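[For reference, the per-digit text conversion Michael is counting, and
the "extra conversion arithmetic" Tom expects to pay for, looks roughly
like this.  An illustrative sketch only, not PostgreSQL source:]

	/*
	 * Emit one base-10000 digit (0..9999) as four ASCII characters:
	 * three divisions plus three remainders (each a subtraction after
	 * its division) and four additions of '0', versus a single table
	 * lookup per pair of packed-decimal digits.
	 */
	static void
	digit_to_text(int d, char *out)
	{
		out[0] = (char) ('0' + d / 1000);
		out[1] = (char) ('0' + d / 100 % 10);
		out[2] = (char) ('0' + d / 10 % 10);
		out[3] = (char) ('0' + d % 10);
	}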
> > The question, though, becomes what percentage of operations on a
> > NUMERIC field are arithmetic, and what percentage are storage/retrieval.
> Good point.

We assume that most data stays inside the database on every query.
That is, one should optimize for comparison/calculation speed, not
formatting speed.  If you are comparing a bunch of rows to return one,
you will be much happier if the comparison happens quickly, as opposed
to doing that slowly but formatting the single output value quickly.
An RDBMS can't really try to optimize for the opposite case, since
that isn't how it is usually used...

                      - Thomas

-- 
Thomas Lockhart				lockhart@alumni.caltech.edu
South Pasadena, California
Tom Lane wrote:
>
> Michael Robinson <robinson@netrinsics.com> writes:
> > The question, though, becomes what percentage of operations on a
> > NUMERIC field are arithmetic, and what percentage are storage/retrieval.
>
> Good point.
>
> > For databases that simply store/retrieve data, your "optimization" will
> > have the effect of significantly increasing format conversion overhead.
> > With a 512-byte lookup table, four packed-decimal digits can be converted
> > in two primitive operations, but base-10000 will require three divisions,
> > three subtractions, four additions, plus miscellaneous data shuffling.
>
> That is something to worry about, but I think the present implementation
> unpacks the storage format into calculation format before converting
> to text.  Getting rid of the unpack step by making storage and calc
> formats the same would probably buy enough speed to pay for the extra
> conversion arithmetic.

What I'm actually wondering about is why the hell one would use the
NUMERIC data type for fields the database isn't supposed to calculate
on.  Why not use TEXT in that case?

OTOH, I don't think the base 10000 -> base 10 format conversion
overhead will be that significant compared with everything else that
must happen before one tuple is ready to be sent to the frontend.
And ALL our output functions allocate memory for the string
representation and at least copy the result there.  How many
arithmetic operations are performed internally to create the output
of an int4 or float8 via sprintf()?

Jan

-- 

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
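[Jan's closing question refers to the general shape of a type output
function.  A minimal sketch, with malloc standing in for the backend's
palloc and the function name invented for illustration:]

	#include <stdio.h>
	#include <stdlib.h>

	/*
	 * Typical shape of an int4 output function: allocate a buffer,
	 * format into it, hand it back.  sprintf itself typically costs a
	 * division per decimal digit produced, so a few extra divisions
	 * in NUMERIC's base-10000 conversion are small by comparison.
	 */
	char *
	int4_out_sketch(int val)
	{
		char   *result = malloc(12);	/* sign + 10 digits + NUL */

		if (result != NULL)
			sprintf(result, "%d", val);
		return result;
	}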
wieck@debis.com (Jan Wieck) writes:
> What I'm actually wondering about is why the hell one would use the
> NUMERIC data type for fields the database isn't supposed to calculate
> on.  Why not use TEXT in that case?

He didn't say his application would be *all* I/O; he was just concerned
about whether the change would be a net loss if he did more I/O than
calculation.  Seems like a reasonable concern to me.

> OTOH, I don't think the base 10000 -> base 10 format conversion
> overhead will be that significant compared with everything else that
> must happen before one tuple is ready to be sent to the frontend.

I agree, but it's still good if you can avoid slowing it down.

Meanwhile, I'd still like to see the runtime of the 'numeric' regression
test brought down to something comparable to one of the other regression
tests.  How about cutting the precision it uses from (300,100) down to
something sane, like say (30,10)?  I do not believe for a moment that
there are any portability bugs that will be uncovered by the 300-digit
case but not by a 30-digit case.

			regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> > The question, though, becomes what percentage of operations on a
>> > NUMERIC field are arithmetic, and what percentage are storage/retrieval.
>> Good point.
>
> We assume that most data stays inside the database on every query.
> That is, one should optimize for comparison/calculation speed, not
> formatting speed.  If you are comparing a bunch of rows to return one,
> you will be much happier if the comparison happens quickly, as opposed
> to doing that slowly but formatting the single output value quickly.
> An RDBMS can't really try to optimize for the opposite case, since
> that isn't how it is usually used...

The optimizations under discussion will not significantly affect
comparison speed one way or the other, so comparison speed is a moot
issue.  The question, really, is how often do you do this:

	select bignum from table where key = condition

versus this:

	select bignum1/bignum2 from table where key = condition

or this:

	select * from table where bignum1/bignum2 = condition

	-Michael Robinson
> I do not believe for a moment that there are any portability bugs
> that will be uncovered by the 300-digit case but not by a 30-digit
> case.

Yeah, just gratuitous showmanship ;)

And think about those poor 486 machines.  Maybe Jan is trying to burn
them out so they get replaced with something reasonable...

                      - Thomas

-- 
Thomas Lockhart				lockhart@alumni.caltech.edu
South Pasadena, California
Michael Robinson <robinson@netrinsics.com> writes:
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> We assume that most data stays inside the database on every query.
>> That is, one should optimize for comparison/calculation speed, not
>> formatting speed.  If you are comparing a bunch of rows to return one,
>> you will be much happier if the comparison happens quickly, as opposed
>> to doing that slowly but formatting the single output value quickly.

> The optimizations under discussion will not significantly affect
> comparison speed one way or the other, so comparison speed is a moot
> issue.

On what do you base that assertion?  I'd expect comparisons to be sped
up significantly: no need to unpack the storage format, and the inner
loop handles four digits per iteration instead of one.

			regards, tom lane
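[A sketch of the comparison loop Tom has in mind, under the same
illustrative base-10000 layout as the earlier sketches; if storage and
calculation formats are identical, no unpacking step precedes it:]

	#include <stddef.h>

	/*
	 * Compare two aligned, equal-length base-10000 digit arrays,
	 * most significant digit first.  Each iteration settles 4 decimal
	 * digits at once, and a mismatch in the leading digits ends the
	 * loop immediately.
	 */
	static int
	cmp_digits(const short *a, const short *b, size_t ndigits)
	{
		for (size_t i = 0; i < ndigits; i++)
		{
			if (a[i] != b[i])
				return (a[i] < b[i]) ? -1 : 1;
		}
		return 0;
	}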