Thread: Faster NUMERIC implementation

Faster NUMERIC implementation

From
Tom Lane
Date:
I've been amusing myself the last several evenings by working on a
reimplementation of the NUMERIC datatype, along the lines of previous
discussion (use base-10000 digits instead of base-10 so that the number
of iterations of the inner loops decreases by a factor of about 4).

It's not ready to commit yet, but I've got it passing the regression
tests, and I find that it runs the 'numeric' test about a factor of five
faster than CVS tip; so it seems worth doing.  A couple questions for
the group:

1. Has anyone got a problem with changing the on-disk representation of
NUMERIC for 7.4?  The only objection I can think of is that it'd prevent
"pg_upgrade" from working ... but we don't have pg_upgrade capability
right now anyway, and I've not heard that anyone is planning to make it
happen for 7.4.

2. The numeric regression test probably isn't a good benchmark for this,
since it spends most of its time pushing around numerics with hundreds of
digits.  I doubt that's representative of common usage.  Can anyone
offer a more real-world benchmark test?
        regards, tom lane
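
To illustrate the base-10000 idea from the message above, here is a minimal sketch in C. It is not the patch under discussion: the names `NBASE`, `pack_base10000`, and `add_digits`, the use of `short` for a stored digit, and the restriction to non-negative integers are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBASE      10000	/* assumed: each stored digit holds 4 decimal digits */
#define DEC_DIGITS 4

/*
 * Pack a string of decimal digits (non-negative integer, no sign or
 * decimal point) into base-NBASE digits, most significant first.
 * Returns the number of base-NBASE digits written to out[].
 */
static size_t
pack_base10000(const char *dec, short *out, size_t out_cap)
{
	size_t		len = strlen(dec);
	size_t		ndigits;
	size_t		first;
	size_t		pos = 0;

	assert(len > 0);
	ndigits = (len + DEC_DIGITS - 1) / DEC_DIGITS;
	assert(ndigits <= out_cap);
	/* the leading group may be shorter than DEC_DIGITS */
	first = len - (ndigits - 1) * DEC_DIGITS;

	for (size_t i = 0; i < ndigits; i++)
	{
		size_t		width = (i == 0) ? first : DEC_DIGITS;
		short		d = 0;

		for (size_t j = 0; j < width; j++)
		{
			assert(dec[pos] >= '0' && dec[pos] <= '9');
			d = (short) (d * 10 + (dec[pos++] - '0'));
		}
		out[i] = d;
	}
	return ndigits;
}

/*
 * Add two equal-length digit arrays in the given base (most significant
 * digit first).  result[] must have room for n + 1 digits.  The loop runs
 * once per stored digit, so larger digits mean fewer iterations.
 */
static void
add_digits(const short *a, const short *b, size_t n, int base, short *result)
{
	int			carry = 0;

	for (size_t i = n; i-- > 0;)
	{
		int			sum = a[i] + b[i] + carry;

		result[i + 1] = (short) (sum % base);
		carry = sum / base;
	}
	result[0] = (short) carry;
}
```

With this layout the nine-digit input "123456789" packs into three stored digits (1, 2345, 6789), so the addition loop runs three times where a base-10 layout would need nine.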


Re: Faster NUMERIC implementation

From
Bruce Momjian
Date:
Sounds great!

---------------------------------------------------------------------------

Tom Lane wrote:
> I've been amusing myself the last several evenings by working on a
> reimplementation of the NUMERIC datatype, along the lines of previous
> discussion (use base-10000 digits instead of base-10 so that the number
> of iterations of the inner loops decreases by a factor of about 4).
> 
> It's not ready to commit yet, but I've got it passing the regression
> tests, and I find that it runs the 'numeric' test about a factor of five
> faster than CVS tip; so it seems worth doing.  A couple questions for
> the group:
> 
> 1. Has anyone got a problem with changing the on-disk representation of
> NUMERIC for 7.4?  The only objection I can think of is that it'd prevent
> "pg_upgrade" from working ... but we don't have pg_upgrade capability
> right now anyway, and I've not heard that anyone is planning to make it
> happen for 7.4.
> 
> 2. The numeric regression test probably isn't a good benchmark for this,
> since it spends most of its time pushing around numerics with hundreds of
> digits.  I doubt that's representative of common usage.  Can anyone
> offer a more real-world benchmark test?
> 
>             regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Faster NUMERIC implementation

From
Michael Meskes
Date:
On Wed, Mar 19, 2003 at 10:51:32PM -0500, Tom Lane wrote:
> I've been amusing myself the last several evenings by working on a
> reimplementation of the NUMERIC datatype, along the lines of previous
> discussion (use base-10000 digits instead of base-10 so that the number
> of iterations of the inner loops decreases by a factor of about 4).
> ...

Tom, I do like it since I really believe it will be faster.

But I wonder if we could arrange things so the Numeric stuff moves out
of the backend. As you surely noticed I created a pgtypes lib to make
our special types available to the outside world. So far Numeric is the
only one, but we are working on date (partly finished) and timestamp. I
think it shouldn't be too difficult to make the backend call the
functions inside the dynamic library instead of keeping it inside. That
way we would have to code these functions only once and no double work
is required. 

Of course the lib should then move out of interfaces, but ecpg could
still call it. Any comments?

Michael
-- 
Michael Meskes
Email: Michael@Fam-Meskes.De
ICQ: 179140304
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!


Re: Faster NUMERIC implementation

From
Tom Lane
Date:
Michael Meskes <meskes@postgresql.org> writes:
> But I wonder if we could arrange things so the Numeric stuff moves out
> of the backend.

With suitable #define hacking you could perhaps take care of the code's
dependencies on palloc/pfree ... but elog is harder, and I don't see any
realistic way to handle the backend's function-call conventions as
opposed to conventions that would make sense as a library API.

I don't want to clutter the code by having to support two sets of error
conventions and two APIs.  If you can figure a way around that, great...
        regards, tom lane
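
The "#define hacking" mentioned above could look roughly like the following sketch. `NUMERIC_IN_BACKEND` is a hypothetical build flag and `dup_digit_string` is a made-up helper; neither comes from the thread. The shim only covers allocation: malloc() can return NULL where palloc() would raise an error instead, which hints at why the elog dependency is the harder part.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Assumption: NUMERIC_IN_BACKEND is a hypothetical flag set only when the
 * file is compiled as part of the backend, where the real palloc/pfree
 * are available.  Outside the backend, map them onto malloc/free.
 */
#ifndef NUMERIC_IN_BACKEND
#define palloc(sz)	malloc(sz)
#define pfree(ptr)	free(ptr)
#endif

/*
 * Hypothetical helper shared by both builds: copy a digit string.
 * The NULL check matters only with the malloc shim.
 */
static char *
dup_digit_string(const char *src)
{
	size_t		len;
	char	   *copy;

	assert(src != NULL);
	len = strlen(src);
	copy = palloc(len + 1);
	if (copy == NULL)
		return NULL;
	memcpy(copy, src, len + 1);
	return copy;
}
```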


Re: Faster NUMERIC implementation

From
Michael Meskes
Date:
On Thu, Mar 20, 2003 at 09:49:30AM -0500, Tom Lane wrote:
> With suitable #define hacking you could perhaps take care of the code's
> dependencies on palloc/pfree ... but elog is harder, and I don't see any
> realistic way to handle the backend's function-call conventions as
> opposed to conventions that would make sense as a library API.
> 
> I don't want to clutter the code by having to support two sets of error
> conventions and two APIs.  If you can figure a way around that, great...

How about some wrapper functions in the backend that just call their
helper functions in the lib? Let's be honest, maintaining all this code
twice will be very hard to do too. I'd prefer looking for a way to
integrate things. I have no problem with special backend syntax for some
functions. It's not that the API has to be identical. We could have an
open API and a backend API calling the same functions. 

Michael
-- 
Michael Meskes
Email: Michael@Fam-Meskes.De
ICQ: 179140304
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!


Re: Faster NUMERIC implementation

From
Tom Lane
Date:
Michael Meskes <meskes@postgresql.org> writes:
> How about some wrapper functions in the backend that just call their
> helper functions in the lib?

I'm not willing to do that for any very large number of functions; the
code clutter and runtime overhead would become significant.

I had some visions, back when we were first doing the v1-call-convention
stuff, that it might be possible to make a script that automatically
interprets 

Datum
numeric_add(PG_FUNCTION_ARGS)
{
    Numeric        num1 = PG_GETARG_NUMERIC(0);
    Numeric        num2 = PG_GETARG_NUMERIC(1);
    ...
    PG_RETURN_NUMERIC(res);
}

and generates a derived version like

Numeric
numeric_add(Numeric num1, Numeric num2)
{
    ...
    return res;
}

We'd probably have to tighten the consistency of formatting a little
to make that workable, but it seems more attractive than manually
maintaining either two sets of code or a wrapper layer.


But before you get too excited about that, there's also the
error-handling issue --- and I'm definitely not interested in changing
all the subroutines away from elog to funny-return-value conventions.
        regards, tom lane


Re: Faster NUMERIC implementation

From
Michael Meskes
Date:
On Thu, Mar 20, 2003 at 11:48:33AM -0500, Tom Lane wrote:
> I'm not willing to do that for any very large number of functions; the
> code clutter and runtime overhead would become significant.

But then maintaining the same stuff twice is also a pretty hefty job,
though not during runtime of course.

Actually I only see the special types routines so far.

> I had some visions, back when we were first doing the v1-call-convention
> stuff, that it might be possible to make a script that automatically
> interprets 
> ...
> We'd probably have to tighten the consistency of formatting a little
> to make that workable, but it seems more attractive than manually
> maintaining either two sets of code or a wrapper layer.

Maybe we could find a compromise and just move the helper functions
outside. For instance numeric_in() calls get_var_from_str(). The very same
get_var_from_str() is implemented in pgtypes and called by
PGTYPESnumeric_ntoa() as I called it for now. Moving this does not mean
we have to write a wrapper except for the elog() function. But I'm
willing to change all functions to return error codes that are mapped to
elog() calls. Doesn't look like a big deal IMO.

And this has next to no runtime performance penalty since it is only
used in case of an error, which stops computation anyway.
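
A minimal sketch of the error-code mapping described above, under stated assumptions: `num_status`, `lib_parse_uint`, `backend_parse_uint`, and `report_error` are hypothetical names, the parser handles only unsigned integers, and `report_error` merely records a message as a stand-in for elog(). The real elog(ERROR) does not return to its caller, so the stand-in differs from it on that point.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes returned by the shared library code. */
typedef enum
{
	NUM_OK = 0,
	NUM_BAD_SYNTAX,
	NUM_OVERFLOW
} num_status;

/* Library side: no backend dependencies, errors reported by return code. */
static num_status
lib_parse_uint(const char *s, int *out)
{
	int			v = 0;

	if (*s == '\0')
		return NUM_BAD_SYNTAX;
	for (; *s != '\0'; s++)
	{
		int			d;

		if (*s < '0' || *s > '9')
			return NUM_BAD_SYNTAX;
		d = *s - '0';
		if (v > (INT_MAX - d) / 10)		/* v * 10 + d would overflow */
			return NUM_OVERFLOW;
		v = v * 10 + d;
	}
	*out = v;
	return NUM_OK;
}

/* Stand-in for elog(): just remembers the last message. */
static char last_error[64];

static void
report_error(const char *msg)
{
	snprintf(last_error, sizeof(last_error), "%s", msg);
}

/* Backend side: map library status codes to error reports. */
static int
backend_parse_uint(const char *s, int *out)
{
	switch (lib_parse_uint(s, out))
	{
		case NUM_OK:
			return 1;
		case NUM_BAD_SYNTAX:
			report_error("invalid input syntax");
			return 0;
		case NUM_OVERFLOW:
			report_error("value out of range");
			return 0;
	}
	return 0;
}
```

The mapping switch runs on every call, but the error branches are taken only on failure, so the cost on the success path is one extra comparison of the return code.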

> But before you get too excited about that, there's also the
> error-handling issue --- and I'm definitely not interested in changing
> all the subroutines away from elog to funny-return-value conventions.

Since I already did this for numeric, it's no big deal. Okay, granted,
it would need a few more error codes, but frankly I'd guess this takes
less than five minutes.

I already have to manually sync code (preproc.y <=> gram.y) and don't
like the idea of having to do it with a lot more code.

Michael
-- 
Michael Meskes
Email: Michael@Fam-Meskes.De
ICQ: 179140304
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!


Re: Faster NUMERIC implementation

From
Tom Lane
Date:
[ very off topic ]

Michael Meskes <meskes@postgresql.org> writes:
> I already have to manually sync code (preproc.y <=> gram.y) and don't
> like the idea of having to do it with a lot more code.

I've been wondering for quite a while if we couldn't find a way to avoid
manually duplicating the backend grammar in preproc.y.  Seems like a
script could handle 99% of the conversion, generating the very stylized
actions you need.
        regards, tom lane