Thread: Faster NUMERIC implementation
I've been amusing myself the last several evenings by working on a reimplementation of the NUMERIC datatype, along the lines of previous discussion (use base-10000 digits instead of base-10 so that the number of iterations of the inner loops decreases by a factor of about 4).

It's not ready to commit yet, but I've got it passing the regression tests, and I find that it runs the 'numeric' test about a factor of five faster than CVS tip; so it seems worth doing. A couple questions for the group:

1. Has anyone got a problem with changing the on-disk representation of NUMERIC for 7.4? The only objection I can think of is that it'd prevent "pg_upgrade" from working ... but we don't have pg_upgrade capability right now anyway, and I've not heard that anyone is planning to make it happen for 7.4.

2. The numeric regression test probably isn't a good benchmark for this, since it spends most of its time pushing around numerics with hundreds of digits. I doubt that's representative of common usage. Can anyone offer a more real-world benchmark test?

regards, tom lane
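To make the "factor of about 4" concrete, here is a minimal sketch of digit-wise addition over base-10000 digits. It is an illustration only, not the code under discussion: the name `NBASE`, the function `add_digits`, and the most-significant-first array layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NBASE 10000     /* assumed name for the digit base */

/*
 * Add two unsigned magnitudes stored as arrays of base-NBASE digits,
 * most significant digit first, both ndigits long.  result must have
 * room for ndigits + 1 digits (the extra leading slot holds the carry).
 *
 * The loop body runs once per base-10000 digit, that is, once per four
 * decimal digits, which is where the inner-loop saving comes from.
 */
static void
add_digits(const short *a, const short *b, size_t ndigits, short *result)
{
    int     carry = 0;
    size_t  i;

    assert(a != NULL && b != NULL && result != NULL);

    for (i = ndigits; i > 0; i--)
    {
        int sum = a[i - 1] + b[i - 1] + carry;

        if (sum >= NBASE)
        {
            result[i] = (short) (sum - NBASE);
            carry = 1;
        }
        else
        {
            result[i] = (short) sum;
            carry = 0;
        }
    }
    result[0] = (short) carry;
}
```

Adding the eight-decimal-digit values 12345678 and 87654322 this way takes two loop iterations (digits {1234, 5678} and {8765, 4322}) where a base-10 representation would take eight.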
Sounds great!

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us
On Wed, Mar 19, 2003 at 10:51:32PM -0500, Tom Lane wrote:
> I've been amusing myself the last several evenings by working on a
> reimplementation of the NUMERIC datatype, along the lines of previous
> discussion (use base-10000 digits instead of base-10 so that the number
> of iterations of the inner loops decreases by a factor of about 4).
> ...

Tom, I do like it, since I really believe it will be faster.

But I wonder if we could arrange things so the Numeric stuff moves out of the backend. As you surely noticed, I created a pgtypes lib to make our special types available to the outside world. So far Numeric is the only one, but we are working on date (partly finished) and timestamp.

I think it shouldn't be too difficult to make the backend call the functions in the dynamic library instead of keeping its own copy. That way we would have to code these functions only once, with no duplicated work. Of course the lib should then move out of interfaces, but ecpg could still call it.

Any comments?

Michael
--
Michael Meskes
Email: Michael@Fam-Meskes.De
ICQ: 179140304
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
Michael Meskes <meskes@postgresql.org> writes:
> But I wonder if we could arrange things so the Numeric stuff moves out
> of the backend.

With suitable #define hacking you could perhaps take care of the code's dependencies on palloc/pfree ... but elog is harder, and I don't see any realistic way to handle the backend's function-call conventions as opposed to conventions that would make sense as a library API.

I don't want to clutter the code by having to support two sets of error conventions and two APIs. If you can figure a way around that, great...

regards, tom lane
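For readers unfamiliar with the "#define hacking" idea, a minimal sketch follows. The build switch `NUMERIC_STANDALONE` and the helper `alloc_digits` are hypothetical names invented for this example; only `palloc` and `pfree` come from the discussion.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical build switch: defined when the shared code is compiled
 * as a standalone library rather than as part of the backend.
 */
#define NUMERIC_STANDALONE 1

#ifdef NUMERIC_STANDALONE
/* Outside the backend there are no memory contexts; fall back to libc. */
#define palloc(sz)  malloc(sz)
#define pfree(ptr)  free(ptr)
#endif

/*
 * Shared code keeps calling palloc/pfree unchanged.  Hypothetical helper
 * that allocates space for ndigits digits.
 */
static short *
alloc_digits(size_t ndigits)
{
    assert(ndigits > 0);

    /* With the macro above this may return NULL, unlike backend palloc. */
    return (short *) palloc(ndigits * sizeof(short));
}
```

The sketch also shows why the allocation macros alone are not enough: backend palloc reports out-of-memory through the error machinery and never returns NULL, whereas malloc does return NULL, so the error-convention problem reappears immediately.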
On Thu, Mar 20, 2003 at 09:49:30AM -0500, Tom Lane wrote:
> With suitable #define hacking you could perhaps take care of the code's
> dependencies on palloc/pfree ... but elog is harder, and I don't see any
> realistic way to handle the backend's function-call conventions as
> opposed to conventions that would make sense as a library API.
>
> I don't want to clutter the code by having to support two sets of error
> conventions and two APIs. If you can figure a way around that, great...

How about some wrapper functions in the backend that just call their helper functions in the lib? Let's be honest, maintaining all this code twice will be very hard to do too. I'd prefer looking for a way to integrate things.

I have no problem with special backend syntax for some functions. It's not that the API has to be identical. We could have an open API and a backend API calling the same functions.

Michael
Michael Meskes <meskes@postgresql.org> writes:
> How about some wrapper functions in the backend that just call their
> helper functions in the lib?

I'm not willing to do that for any very large number of functions; the code clutter and runtime overhead would become significant.

I had some visions, back when we were first doing the v1-call-convention stuff, that it might be possible to make a script that automatically interprets

    Datum
    numeric_add(PG_FUNCTION_ARGS)
    {
        Numeric num1 = PG_GETARG_NUMERIC(0);
        Numeric num2 = PG_GETARG_NUMERIC(1);
        ...
        PG_RETURN_NUMERIC(res);
    }

and generates a derived version like

    Numeric
    numeric_add(Numeric num1, Numeric num2)
    {
        ...
        return res;
    }

We'd probably have to tighten the consistency of formatting a little to make that workable, but it seems more attractive than manually maintaining either two sets of code or a wrapper layer.

But before you get too excited about that, there's also the error-handling issue --- and I'm definitely not interested in changing all the subroutines away from elog to funny-return-value conventions.

regards, tom lane
On Thu, Mar 20, 2003 at 11:48:33AM -0500, Tom Lane wrote:
> I'm not willing to do that for any very large number of functions; the
> code clutter and runtime overhead would become significant.

But then maintaining the same stuff twice is also a pretty hefty job, though not at runtime of course. Actually, so far I only see this affecting the special type routines.

> I had some visions, back when we were first doing the v1-call-convention
> stuff, that it might be possible to make a script that automatically
> interprets
> ...
> We'd probably have to tighten the consistency of formatting a little
> to make that workable, but it seems more attractive than manually
> maintaining either two sets of code or a wrapper layer.

Maybe we could find a compromise and just move the helper functions outside. For instance, numeric_in() calls get_var_from_str(). The very same get_var_from_str() is implemented in pgtypes and called by PGTYPESnumeric_ntoa(), as I called it for now. Moving this does not mean we have to write a wrapper, except for the elog() calls. But I'm willing to change all functions to return error codes that are mapped to elog() calls. Doesn't look like a big deal IMO. And this has next to no runtime performance penalty, since the mapping is only used in case of an error, which stops computation anyway.

> But before you get too excited about that, there's also the
> error-handling issue --- and I'm definitely not interested in changing
> all the subroutines away from elog to funny-return-value conventions.

Since I already did this for numeric, it's no big deal. Okay, granted, it would need a few more error codes, but frankly I'd guess this takes less than five minutes.

I already have to manually sync code (preproc.y <=> gram.y) and don't like the idea of having to do it with a lot more code.

Michael
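The "error codes mapped to elog() calls" proposal could look roughly like the sketch below. Everything in it is hypothetical and chosen for illustration: the `NumErr` codes, the toy parser `lib_parse_uint`, and `report_error`, which merely stands in for the backend's elog() (the real elog at ERROR level does not return to the caller).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical error codes returned by the shared library code. */
typedef enum
{
    NUM_OK = 0,
    NUM_ERR_BAD_FORMAT
} NumErr;

/*
 * Library-side helper (hypothetical): parse an unsigned decimal integer.
 * Problems are reported through the return value, never through elog.
 * Overflow is not checked, to keep the sketch short.
 */
static NumErr
lib_parse_uint(const char *str, long *result)
{
    long val = 0;

    assert(str != NULL && result != NULL);

    if (*str == '\0')
        return NUM_ERR_BAD_FORMAT;
    for (; *str != '\0'; str++)
    {
        if (!isdigit((unsigned char) *str))
            return NUM_ERR_BAD_FORMAT;
        val = val * 10 + (*str - '0');
    }
    *result = val;
    return NUM_OK;
}

/* Stand-in for elog(): records the message instead of aborting. */
static const char *last_error = NULL;

static void
report_error(const char *msg)
{
    last_error = msg;
}

/* Backend-side wrapper: maps library error codes to error reports. */
static long
backend_parse_uint(const char *str)
{
    long val = 0;

    switch (lib_parse_uint(str, &val))
    {
        case NUM_OK:
            break;
        case NUM_ERR_BAD_FORMAT:
            report_error("bad numeric input format");
            break;
    }
    return val;
}
```

On the success path the wrapper costs one extra call and one switch on the return code; the mapping to an error report only runs when the library has already failed.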
[ very off topic ]

Michael Meskes <meskes@postgresql.org> writes:
> I already have to manually sync code (preproc.y <=> gram.y) and don't
> like the idea of having to do it with a lot more code.

I've been wondering for quite a while if we couldn't find a way to avoid manually duplicating the backend grammar in preproc.y. Seems like a script could handle 99% of the conversion, generating the very stylized actions you need.

regards, tom lane