Re: MemSet inline for newNode - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: MemSet inline for newNode |
Date | |
Msg-id | 200211111606.gABG6eS29372@candle.pha.pa.us Whole thread Raw |
In response to | Re: MemSet inline for newNode (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: MemSet inline for newNode
|
List | pgsql-patches |
OK, here is a newer version of the macro. I tested it on catalog/index.c and saw the optimizer remove the MemSetTest conditional for the newNode/palloc0 call in that file: pushl $1 <--------- pushl $108 movl CurrentMemoryContext,%eax pushl %eax call MemoryContextAllocZero I still test the palloc pointer alignment in MemSetLoop because that is not a constant. I just ran a test and now see a 7% difference for the following test: j = 32; for (i = 0; i < 1000000; i++) MemSet(buf, 0, j /* choose 32 or j here */); return 0; It used to be a 14% difference. However, this looks more like an optimizer problem than actual coding and with the new macros, MemSet will be called with a constant third parameter anyway in almost all cases, including newNode and the merge of palloc/MemSet into palloc0 patch. --------------------------------------------------------------------------- Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > >> But your patch as committed does NOT inline newNode, in fact it does the > > >> exact opposite: the MemSet macro is removed from the callsite. > > > > > Yes, there were actually two patches; the first inlined newNode by > > > calling palloc0. Without palloc, there was no way to inline newNode. > > > > But you have *not* inlined newNode: the memset is still done at a location > > that cannot know the size at compile time. > > I am sorry. I inlined newNode, but by pushing MemSet into palloc0, the > MemSet conditional tests can not be optimized away. Is that clearer? > > #define newNode(size, tag) \ > ( \ > AssertMacro((size) >= sizeof(Node)), /* need the tag, at least */ \ > \ > newNodeMacroHolder = (Node *) palloc0(size), \ > newNodeMacroHolder->type = (tag), \ > newNodeMacroHolder \ > ) > > In this case, I needed palloc0 to make newNode inlined, and the old > function version of newNode couldn't optimize away the conditional test > either because it was called with various sizes. > > > > > The second was a more general one to merge palloc/MemSet into a single > > > palloc0 call. That has been backed out while I research it. > > > > That part could be sold just on the basis of making the code easier > > to read and less error-prone; particularly for call sites where the > > length is a runtime computation anyway. I think that newNode() is the > > Yes, that was my thought. > > > principal case where the length is knowable at compile time. So my > > feeling is that the general change to invent a palloc0() call is fine > > ... but if you want performance then newNode() should *not* be using the > > generic palloc0() call. > > OK. I am still struggling about how to do that. > > > > My new idea is to add a third boolean argument to > > > MemoryContextAllocZero() which will control whether the MemSet > > > assignment loop is used, or memset(). > > > > But we are *trying to eliminate any runtime test whatever*. Short > > of that, you aren't going to get the speedup. Certainly passing a third > > argument to the alloc subroutine will eat enough cycles to negate any > > savings from simplifying the runtime test slightly. > > My idea is that the conditional tests will be optimized away before the > palloc0 call is made. > > Attached is a patch that implements the MemSet split and allows the size > tests to be optimized away _before_ at the call location. It isn't done > yet, but you can get the idea. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: src/backend/utils/mmgr/mcxt.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/utils/mmgr/mcxt.c,v retrieving revision 1.35 diff -c -c -r1.35 mcxt.c *** src/backend/utils/mmgr/mcxt.c 10 Nov 2002 02:17:25 -0000 1.35 --- src/backend/utils/mmgr/mcxt.c 11 Nov 2002 15:56:09 -0000 *************** *** 460,466 **** * function is called too many times, so we have a separate version. */ void * ! MemoryContextAllocZero(MemoryContext context, Size size) { void *ret; --- 460,466 ---- * function is called too many times, so we have a separate version. */ void * ! MemoryContextAllocZero(MemoryContext context, Size size, bool use_loop) { void *ret; *************** *** 471,477 **** (unsigned long) size); ret = (*context->methods->alloc) (context, size); ! MemSet(ret, 0, size); return ret; } --- 471,477 ---- (unsigned long) size); ret = (*context->methods->alloc) (context, size); ! MemSetLoop(ret, 0, size, use_loop); return ret; } Index: src/include/c.h =================================================================== RCS file: /cvsroot/pgsql-server/src/include/c.h,v retrieving revision 1.131 diff -c -c -r1.131 c.h *** src/include/c.h 11 Nov 2002 03:02:19 -0000 1.131 --- src/include/c.h 11 Nov 2002 15:56:12 -0000 *************** *** 577,601 **** * memset() functions. More research needs to be done, perhaps with * platform-specific MEMSET_LOOP_LIMIT values or tests in configure. * * bjm 2002-10-08 */ ! #define MemSet(start, val, len) \ do \ { \ ! int32 * _start = (int32 *) (start); \ ! int _val = (val); \ ! Size _len = (len); \ ! \ ! if ((( ((long) _start) | _len) & INT_ALIGN_MASK) == 0 && \ ! _val == 0 && \ ! _len <= MEMSET_LOOP_LIMIT) \ { \ ! int32 * _stop = (int32 *) ((char *) _start + _len); \ while (_start < _stop) \ *_start++ = 0; \ } \ else \ ! memset((char *) _start, _val, _len); \ } while (0) #define MEMSET_LOOP_LIMIT 1024 --- 577,613 ---- * memset() functions. More research needs to be done, perhaps with * platform-specific MEMSET_LOOP_LIMIT values or tests in configure. * + * MemSet has been split into two parts so MemSetTest can be optimized + * away for constant 'val' and 'len'. This is used by palloc0(). + * + * Note, arguments are evaluated more than once. + * * bjm 2002-10-08 */ ! #define MemSetTest(val, len) \ ! ( ((len) & INT_ALIGN_MASK) == 0 && \ ! (len) <= MEMSET_LOOP_LIMIT && \ ! (val) == 0 ) ! ! #define MemSetLoop(start, val, len, use_loop) \ do \ { \ ! if (use_loop && ((long)(start) & INT_ALIGN_MASK) == 0 ) \ { \ ! int32 * _start = (int32 *) (start); \ ! int32 * _stop = (int32 *) ((char *) _start + (len)); \ ! \ while (_start < _stop) \ *_start++ = 0; \ } \ else \ ! memset((char *)(start), (val), (len)); \ ! } while (0) ! ! #define MemSet(start, val, len) \ ! do \ ! { \ ! MemSetLoop(start, val, len, MemSetTest(val, len)); \ } while (0) #define MEMSET_LOOP_LIMIT 1024 Index: src/include/utils/palloc.h =================================================================== RCS file: /cvsroot/pgsql-server/src/include/utils/palloc.h,v retrieving revision 1.22 diff -c -c -r1.22 palloc.h *** src/include/utils/palloc.h 10 Nov 2002 02:17:25 -0000 1.22 --- src/include/utils/palloc.h 11 Nov 2002 15:56:14 -0000 *************** *** 46,56 **** * Fundamental memory-allocation operations (more are in utils/memutils.h) */ extern void *MemoryContextAlloc(MemoryContext context, Size size); ! extern void *MemoryContextAllocZero(MemoryContext context, Size size); #define palloc(sz) MemoryContextAlloc(CurrentMemoryContext, (sz)) ! #define palloc0(sz) MemoryContextAllocZero(CurrentMemoryContext, (sz)) extern void pfree(void *pointer); --- 46,57 ---- * Fundamental memory-allocation operations (more are in utils/memutils.h) */ extern void *MemoryContextAlloc(MemoryContext context, Size size); ! extern void *MemoryContextAllocZero(MemoryContext context, Size size, bool use_loop); #define palloc(sz) MemoryContextAlloc(CurrentMemoryContext, (sz)) ! #define palloc0(sz) MemoryContextAllocZero(CurrentMemoryContext, (sz), \ ! MemSetTest(0, (sz))) extern void pfree(void *pointer);
pgsql-patches by date: