Thread: Re: [HACKERS] regression bigtest needs very long time

Re: [HACKERS] regression bigtest needs very long time

From: Michael Robinson
wieck@debis.com (Jan Wieck) writes:
>> >     We certainly should think about a general speedup of NUMERIC.
>>
>> How would we do that?  I assumed it was already pretty optimized.
>
>    The speedup (I expect) results from the fact that the inner
>    loops of add, subtract and multiply will then handle 4
>    decimal digits per cycle instead of one! Doing a
>
>        1234.5678 + 2345.6789
>
>    then needs 2 internal cycles instead of 8. And
>
>        100.123 + 12030.12345
>
>    needs 4 cycles instead of 10 (because the decimal point must
>    fall on a base-10000 digit boundary, the last value is stored
>    internally as the short ints 1, 2030, 1234, 5000). This is
>    the worst case, and it still saves 60% of the innermost cycles!
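
For illustration, here is a minimal sketch of the inner loop Jan
describes (the names and layout are invented for this sketch, not the
actual PostgreSQL code): each short holds one base-10000 digit, so one
pass through the carry loop settles four decimal digits.

    typedef short NumericDigit;     /* each element holds 0..9999 */
    #define NBASE 10000

    /*
     * Add two aligned, equal-length base-10000 digit arrays stored
     * most-significant digit first.  One iteration handles four
     * decimal digits; the returned value is the final carry.
     */
    static int
    add_digits(const NumericDigit *a, const NumericDigit *b,
               NumericDigit *result, int ndigits)
    {
        int carry = 0;
        int i;

        for (i = ndigits - 1; i >= 0; i--)
        {
            int sum = a[i] + b[i] + carry;

            if (sum >= NBASE)
            {
                sum -= NBASE;
                carry = 1;
            }
            else
                carry = 0;
            result[i] = (NumericDigit) sum;
        }
        return carry;
    }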

The question, though, becomes what percentage of operations on a 
NUMERIC field are arithmetic, and what percentage are storage/retrieval.

For databases that simply store/retrieve data, your "optimization" will
significantly increase format conversion overhead.  With a 512-byte
lookup table, four packed-decimal digits can be converted in two primitive
operations, but base 10000 will require three divisions, three
subtractions, and four additions, plus miscellaneous data shuffling.
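
To make the operation counts concrete, here is a hypothetical sketch of
the two conversions (invented names; not code from either
implementation):

    static char bcd_table[256][2];      /* the 512-byte lookup table */

    /* Fill the table once: byte 0xHL maps to characters 'H','L'. */
    static void
    fill_bcd_table(void)
    {
        int hi, lo;

        for (hi = 0; hi < 10; hi++)
            for (lo = 0; lo < 10; lo++)
            {
                bcd_table[(hi << 4) | lo][0] = (char) ('0' + hi);
                bcd_table[(hi << 4) | lo][1] = (char) ('0' + lo);
            }
    }

    /* Packed decimal: four decimal digits live in two BCD bytes,
     * so converting them is just two table lookups. */
    static void
    bcd_pair_to_chars(unsigned char b0, unsigned char b1, char *out)
    {
        out[0] = bcd_table[b0][0];
        out[1] = bcd_table[b0][1];
        out[2] = bcd_table[b1][0];
        out[3] = bcd_table[b1][1];
    }

    /* Base 10000: one short digit must be split arithmetically:
     * three divisions, three subtractions, four additions of '0'. */
    static void
    base10000_digit_to_chars(int d, char *out)  /* d in 0..9999 */
    {
        int q;

        q = d / 1000;  out[0] = (char) ('0' + q);  d -= q * 1000;
        q = d / 100;   out[1] = (char) ('0' + q);  d -= q * 100;
        q = d / 10;    out[2] = (char) ('0' + q);  d -= q * 10;
        out[3] = (char) ('0' + d);
    }
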
-Michael Robinson



Re: [HACKERS] regression bigtest needs very long time

From: Tom Lane
Michael Robinson <robinson@netrinsics.com> writes:
> The question, though, becomes what percentage of operations on a 
> NUMERIC field are arithmetic, and what percentage are storage/retrieval.

Good point.

> For databases that simply store/retrieve data, your "optimization" will
> significantly increase format conversion overhead.  With a 512-byte
> lookup table, four packed-decimal digits can be converted in two primitive
> operations, but base 10000 will require three divisions, three
> subtractions, and four additions, plus miscellaneous data shuffling.

That is something to worry about, but I think the present implementation
unpacks the storage format into calculation format before converting
to text.  Getting rid of the unpack step by making storage and calc
formats the same would probably buy enough speed to pay for the extra
conversion arithmetic.
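
To make the idea concrete, a sketch of what a unified
storage/calculation representation might look like (hypothetical field
names, not the actual PostgreSQL struct):

    /*
     * Hypothetical unified format: the stored digits are already
     * base-10000 shorts, so arithmetic can run on them in place and
     * output conversion pays only the digit-splitting cost.
     */
    typedef short NumericDigit;

    typedef struct
    {
        int          varlen;    /* total size (varlena-style header) */
        short        weight;    /* weight of first digit, in NBASE^weight */
        short        dscale;    /* display scale, in decimal digits */
        short        sign;
        NumericDigit digits[1]; /* variable-length digit array */
    } NumericSketch;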
        regards, tom lane


Re: [HACKERS] regression bigtest needs very long time

From: Thomas Lockhart
> > The question, though, becomes what percentage of operations on a
> > NUMERIC field are arithmetic, and what percentage are storage/retrieval.
> Good point.

We assume that most data stays inside the database on every query.
That is, one should optimize for comparison/calculation speed, not
formatting speed. If you are comparing a bunch of rows to return one,
you will be much happier if the comparison happens quickly, as opposed
to doing that slowly but formatting the single output value quickly.
An RDBMS can't really try to optimize for the opposite case, since
that isn't how it is usually used...
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] regression bigtest needs very long time

From: wieck@debis.com (Jan Wieck)
Tom Lane wrote:

>
> Michael Robinson <robinson@netrinsics.com> writes:
> > The question, though, becomes what percentage of operations on a
> > NUMERIC field are arithmetic, and what percentage are storage/retrieval.
>
> Good point.
>
> > For databases that simply store/retrieve data, your "optimization" will
> > significantly increase format conversion overhead.  With a 512-byte
> > lookup table, four packed-decimal digits can be converted in two primitive
> > operations, but base 10000 will require three divisions, three
> > subtractions, and four additions, plus miscellaneous data shuffling.
>
> That is something to worry about, but I think the present implementation
> unpacks the storage format into calculation format before converting
> to text.  Getting rid of the unpack step by making storage and calc
> formats the same would probably buy enough speed to pay for the extra
> conversion arithmetic.

    What I'm actually wondering about is why the hell anyone would use
    the NUMERIC data type for fields the database isn't supposed to
    calculate on.  Why not use TEXT in that case?

    OTOH, I don't think that the base-10000-to-base-10 format
    conversion overhead will be that significant compared with
    everything else that must happen before one tuple is ready to be
    sent to the frontend.  After all, ALL our output functions
    allocate memory for the string representation and at least copy
    the result there.  How many arithmetic operations are performed
    internally to create the output of an int4 or float8 via
    sprintf()?
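
For context, the pattern being alluded to looks roughly like this (a
simplified sketch in the style of an output function, not the actual
int4out source; palloc's real declaration lives in the backend
headers):

    #include <stdio.h>
    #include <stddef.h>

    extern void *palloc(size_t size);  /* PostgreSQL's allocator */

    /*
     * Simplified sketch of an int4 output function: palloc a buffer,
     * then let sprintf() do the repeated division by 10 internally.
     */
    char *
    int4out_sketch(int l)
    {
        char *result = (char *) palloc(12); /* sign + 10 digits + '\0' */

        sprintf(result, "%d", l);
        return result;
    }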


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] regression bigtest needs very long time

From: Tom Lane
wieck@debis.com (Jan Wieck) writes:
>     What I'm actually wondering about is why the hell anyone would use
>     the NUMERIC data type for fields the database isn't supposed to
>     calculate on.  Why not use TEXT in that case?

He didn't say his application would be *all* I/O; he was just concerned
about whether the change would be a net loss if he did more I/O than
calculation.  Seems like a reasonable concern to me.

>     OTOH, I don't think that the base-10000-to-base-10 format
>     conversion overhead will be that significant compared with
>     everything else that must happen before one tuple is ready to be
>     sent to the frontend.

I agree, but it's still good if you can avoid slowing it down.

Meanwhile, I'd still like to see the runtime of the 'numeric'
regression test brought down to something comparable to one
of the other regression tests.  How about cutting the precision
it uses from (300,100) down to something sane, like say (30,10)?
I do not believe for a moment that there are any portability bugs
that will be uncovered by the 300-digit case but not by a 30-digit
case.
        regards, tom lane


Re: [HACKERS] regression bigtest needs very long time

From: Michael Robinson
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> > The question, though, becomes what percentage of operations on a
>> > NUMERIC field are arithmetic, and what percentage are storage/retrieval.
>> Good point.
>We assume that most data stays inside the database on every query.
>That is, one should optimize for comparison/calculation speed, not
>formatting speed. If you are comparing a bunch of rows to return one,
>you will be much happier if the comparison happens quickly, as opposed
>to doing that slowly but formatting the single output value quickly.
>An RDBMS can't really try to optimize for the opposite case, since
>that isn't how it is usually used...

The optimizations under discussion will not significantly affect comparison
speed one way or the other, so comparison speed is a moot issue.

The question, really, is how often do you do this:
   select bignum from table where key = condition

versus this:
   select bignum1/bignum2 from table where key = condition

or this:
   select * from table where bignum1/bignum2 = condition

-Michael Robinson



Re: [HACKERS] regression bigtest needs very long time

From: Thomas Lockhart
> I do not believe for a moment that there are any portability bugs
> that will be uncovered by the 300-digit case but not by a 30-digit
> case.

Yeah, just gratuitous showmanship ;)

And think about those poor 486 machines. Maybe Jan is trying to burn
them out so they get replaced with something reasonable...
                 - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] regression bigtest needs very long time

From: Tom Lane
Michael Robinson <robinson@netrinsics.com> writes:
> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> We assume that most data stays inside the database on every query.
>> That is, one should optimize for comparison/calculation speed, not
>> formatting speed. If you are comparing a bunch of rows to return one,
>> you will be much happier if the comparison happens quickly, as opposed
>> to doing that slowly but formatting the single output value quickly.

> The optimizations under discussion will not significantly affect comparison
> speed one way or the other, so comparison speed is a moot issue.

On what do you base that assertion?  I'd expect comparisons to be sped
up significantly: no need to unpack the storage format, and the inner
loop handles four digits per iteration instead of one.
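
A sketch of why (invented names, not the actual code): with both values
aligned on the same base-10000 weight and digits stored
most-significant first, the comparison runs directly on the stored
representation, and each compare settles four decimal digits at once.

    typedef short NumericDigit;     /* one base-10000 digit, 0..9999 */

    /*
     * Compare two aligned, equal-length digit arrays stored
     * most-significant digit first.  No unpacking step is needed,
     * and each iteration decides four decimal digits.
     */
    static int
    cmp_digits(const NumericDigit *a, const NumericDigit *b, int ndigits)
    {
        int i;

        for (i = 0; i < ndigits; i++)
        {
            if (a[i] != b[i])
                return (a[i] < b[i]) ? -1 : 1;
        }
        return 0;
    }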
        regards, tom lane