Re: Inlining comparators as a performance optimisation - Mailing list pgsql-hackers
From | Pierre C |
---|---|
Subject | Re: Inlining comparators as a performance optimisation |
Date | |
Msg-id | op.v70n7uhgeorkce@apollo13 |
In response to | Re: Inlining comparators as a performance optimisation (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Inlining comparators as a performance optimisation |
List | pgsql-hackers |
On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> On 21.09.2011 18:46, Tom Lane wrote:
>>> The idea that I was toying with was to allow the regular SQL-callable
>>> comparison function to somehow return a function pointer to the
>>> alternate comparison function,
>
>> You could have a new function with a pg_proc entry, that just returns a
>> function pointer to the qsort-callback.
>
> Yeah, possibly. That would be a much more invasive change, but cleaner
> in some sense. I'm not really prepared to do all the legwork involved
> in that just to get to a performance-testable patch though.

A few years ago I looked for a way to speed up COPY operations, and it turned out that COPY TO had a good optimization opportunity.

At that time, for each datum, COPY TO would:

- test for nullness
- call an outfunc through fmgr
- the outfunc pallocs a bytea or text, fills it with data, and returns it (sometimes it uses an extensible string buffer, which may be repalloc()'d several times)
- COPY memcpy()s the returned data into a buffer and eventually flushes the buffer to the client socket.

I introduced a special write buffer with an on-flush callback (i.e., a close relative of the existing string buffer); in this case the callback was "flush to client socket". I also added several outfuncs (one per type) which took that buffer as an argument, besides the datum to output, and simply put the datum into the buffer with the appropriate transformations (like converting to bytea or text), flushing if needed.

Then the COPY TO BINARY of a constant-size datum would turn into:

- one test for nullness
- one C function call
- one test to ensure enough space is available in the buffer (flush if needed)
- one htonl() and a memcpy() of constant size, which the compiler turns into a couple of simple instructions

I recall measuring speedups of 2x - 8x on COPY BINARY, less for text, but still large gains.
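For a concrete picture of the COPY TO BINARY path described above, here is a minimal C sketch. It is not the original patch: WriteBuf, wbuf_reserve(), wbuf_put_uint32(), int4_send_direct(), copy_field_int4() and count_and_discard_flush() are hypothetical names, the 8192-byte buffer size is arbitrary, and a byte counter stands in for the client socket. Field framing follows the COPY BINARY convention of a 4-byte length word before each value, with -1 marking NULL.

```c
#include <arpa/inet.h>   /* htonl() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical write buffer with an on-flush callback. */
typedef struct WriteBuf WriteBuf;
typedef void (*FlushFn)(WriteBuf *buf);

struct WriteBuf
{
    char    data[8192];
    size_t  len;        /* bytes currently buffered */
    FlushFn on_flush;   /* must drain the buffer and reset len to 0 */
    void   *arg;        /* callback state (a socket, in the real case) */
};

/* Stand-in for "flush to client socket": count the bytes and drop them. */
static void
count_and_discard_flush(WriteBuf *buf)
{
    *(size_t *) buf->arg += buf->len;
    buf->len = 0;
}

/* The single space check: flush only when the next write would not fit. */
static inline void
wbuf_reserve(WriteBuf *buf, size_t need)
{
    if (buf->len + need > sizeof(buf->data))
        buf->on_flush(buf);
    assert(buf->len + need <= sizeof(buf->data));
}

/* Append a 4-byte value in network byte order (caller has reserved space).
 * The size is a compile-time constant, so once inlined the memcpy()
 * is the kind of copy compilers typically lower to a single store. */
static inline void
wbuf_put_uint32(WriteBuf *buf, uint32_t host_value)
{
    uint32_t be = htonl(host_value);

    memcpy(buf->data + buf->len, &be, sizeof(be));
    buf->len += sizeof(be);
}

/* Direct binary outfunc for int4: no fmgr call, no palloc. */
static inline void
int4_send_direct(WriteBuf *buf, int32_t value)
{
    wbuf_reserve(buf, 8);
    wbuf_put_uint32(buf, 4);                 /* field length word */
    wbuf_put_uint32(buf, (uint32_t) value);  /* field value */
}

/* Per-field driver: one null test, then one plain C call. */
static void
copy_field_int4(WriteBuf *buf, int32_t value, bool isnull)
{
    if (isnull)
    {
        wbuf_reserve(buf, 4);
        wbuf_put_uint32(buf, UINT32_MAX);    /* length -1 marks NULL */
        return;
    }
    int4_send_direct(buf, value);
}
```

The callback is the only place that knows where the bytes go, so the same outfunc can feed a socket or, with a different callback, a growable in-memory buffer.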
Although eliminating the fmgr call and palloc overhead was an important part of it, another large part was getting rid of memcpy() calls, which the compiler turned into simple movs for known-size types, a transformation that can be done only if the buffer write functions are inlined into the outfuncs. Compilers love constants...

Additionally, code size growth was minimal since I moved the old outfuncs' code into the new outfuncs, and replaced the old fmgr-callable outfuncs with "create buffer with on-full callback=extend_and_repalloc() - pass to new outfunc(buffer,datum) - return buffer". That is basically equivalent to the previous palloc()-based code, maybe with a few extra instructions.

When I submitted the patch for review, Tom rightfully pointed out that my way of obtaining the C function pointer sucked very badly (I don't remember how I did it, only that it was butt-ugly), but the idea was to get a quick measurement of what could be gained, and the result was positive. Unfortunately I had no time available to finish it and turn it into a real patch; I'm sorry about that.

So why do I post in this sorting topic? It seems that by bypassing fmgr for functions which are small, simple, and called lots of times, there is a large gain to be made, not only because of the fmgr overhead but also because of the opportunity for new compiler optimizations, palloc removal, etc.

However, in my experiment the arguments and return types of the new functions were DIFFERENT from those of the old functions: the new ones do the same thing, but in a different manner. One manner was suited to SQL-callable functions (i.e., palloc and return a bytea) and the other to writing large amounts of data (direct buffer write). Since the two have very different requirements, it is impossible for the same function to be fast at both.
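The wrapper trick from the paragraphs above ("create buffer with on-full callback, pass to new outfunc, return buffer") can be sketched as follows, again under assumptions: plain malloc()/realloc() stand in for palloc()/repalloc(), extend_and_realloc() is a hypothetical analogue of the extend_and_repalloc() callback, and OutBuf, int4_out_direct() and int4_out_wrapper() are illustrative names. The formatting code lives in one place; the old-style entry point only supplies a buffer whose on-full callback grows it instead of flushing it.

```c
#include <assert.h>
#include <inttypes.h>   /* PRId32 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct OutBuf OutBuf;
typedef void (*OnFullFn)(OutBuf *buf, size_t need);

/* Hypothetical buffer whose on-full callback decides what "full" means:
 * flush to a socket (COPY) or grow the allocation (fmgr wrapper). */
struct OutBuf
{
    char    *data;
    size_t   len;
    size_t   cap;
    OnFullFn on_full;
};

static void
buf_put(OutBuf *buf, const char *src, size_t n)
{
    if (buf->len + n > buf->cap)
        buf->on_full(buf, n);
    assert(buf->len + n <= buf->cap);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Analogue of extend_and_repalloc(), using realloc() instead of repalloc(). */
static void
extend_and_realloc(OutBuf *buf, size_t need)
{
    size_t newcap = buf->cap ? buf->cap : 16;
    char  *p;

    while (buf->len + need > newcap)
        newcap *= 2;
    p = realloc(buf->data, newcap);
    if (p == NULL)
        abort();
    buf->data = p;
    buf->cap = newcap;
}

/* Shared code: the text output of an int4, written straight into the buffer. */
static void
int4_out_direct(OutBuf *buf, int32_t value)
{
    char tmp[12];   /* "-2147483648" plus NUL */
    int  n = snprintf(tmp, sizeof(tmp), "%" PRId32, value);

    buf_put(buf, tmp, (size_t) n);
}

/* Old-style entry point as a thin wrapper: create a growable buffer, pass it
 * to the new outfunc, return the buffer (malloc'd and NUL-terminated). */
static char *
int4_out_wrapper(int32_t value)
{
    OutBuf buf = { NULL, 0, 0, extend_and_realloc };

    int4_out_direct(&buf, value);
    buf_put(&buf, "", 1);   /* terminating NUL */
    return buf.data;
}
```

Compared with the direct path, the wrapper adds roughly one buffer setup and one extra append, which fits the "few extra instructions" estimate above.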
Anyway, all that rant boils down to this: some functions could benefit from having two versions (while sharing almost all the code between them):

- a user-callable (fmgr) version (the current one)
- a C-callable version, usually with different parameters and return type

And it would be cool to have a way to grab a bare function pointer to the second one.

Maybe an extra column in pg_proc would do (but then proargtypes and friends would describe only the SQL-callable version)? Or an extra table, pg_cproc? Or an in-memory hash: hashtable[ fmgr-callable function ] => C version. But then, what happens if a C function has no SQL-callable equivalent? Or (ugly) introduce an extra per-type function type_get_function_ptr( function_kind ) which returns the requested function ptr.

If one of those happens, I'll dust off my old COPY-optimization patch ;)

Hmm... just my 2c

Regards
Pierre
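One of the options above, the in-memory map from an fmgr-callable function to its C version, might look roughly like this sketch. Everything in it is hypothetical: Datum and SqlFn are simplified stand-ins rather than PostgreSQL's real fmgr types, the "map" is a linear table instead of a hash table, and the C version is a qsort()-style comparator to tie it back to the sorting discussion. A lookup miss returns NULL so the caller can stay on the regular fmgr path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins; these are NOT PostgreSQL's real fmgr types. */
typedef int64_t Datum;
typedef Datum (*SqlFn)(Datum a, Datum b);   /* "SQL-callable" version */
typedef void (*AnyCFn)(void);               /* generic holder for C versions */
typedef int (*QsortCmp)(const void *a, const void *b);

/* SQL-callable int4 comparator: boxed arguments and result. */
static Datum
int4_cmp_sql(Datum a, Datum b)
{
    int32_t x = (int32_t) a;
    int32_t y = (int32_t) b;

    return (x > y) - (x < y);
}

/* An SQL-callable function with no registered C version. */
static Datum
int4_eq_sql(Datum a, Datum b)
{
    return (int32_t) a == (int32_t) b;
}

/* C-callable version of the comparator, directly usable by qsort(). */
static int
int4_cmp_qsort(const void *a, const void *b)
{
    int32_t x = *(const int32_t *) a;
    int32_t y = *(const int32_t *) b;

    return (x > y) - (x < y);
}

/* The "map": SQL-callable function -> C version. A hash table keyed the
 * same way could replace this linear scan. */
typedef struct
{
    SqlFn  sql;
    AnyCFn c;
} CVersionEntry;

static const CVersionEntry c_versions[] = {
    { int4_cmp_sql, (AnyCFn) int4_cmp_qsort },
};

static AnyCFn
lookup_c_version(SqlFn sql)
{
    for (size_t i = 0; i < sizeof(c_versions) / sizeof(c_versions[0]); i++)
        if (c_versions[i].sql == sql)
            return c_versions[i].c;
    return NULL;    /* no C version: caller stays on the fmgr path */
}

/* Sort with the bare C comparator if one is registered; false on a miss. */
static bool
sort_int4_fast(int32_t *vals, size_t n, SqlFn cmp)
{
    AnyCFn c = lookup_c_version(cmp);

    if (c == NULL)
        return false;
    qsort(vals, n, sizeof(int32_t), (QsortCmp) c);  /* cast back to real type */
    return true;
}
```

The C versions are stored behind a generic function pointer and cast back to their real type before being called, because calling through a mismatched function pointer type is undefined behavior in C. In this sketch sort_int4_fast() returns false for int4_eq_sql, which has no entry, so a caller would keep using the fmgr-callable function in that case.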