Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> But your patch as committed does NOT inline newNode, in fact it does the
>> exact opposite: the MemSet macro is removed from the callsite.
> Yes, there were actually two patches; the first inlined newNode by
> calling palloc0. Without palloc0, there was no way to inline newNode.
But you have *not* inlined newNode: the memset is still done at a location
that cannot know the size at compile time.
> The second was a more general one to merge palloc/MemSet into a single
> palloc0 call. That has been backed out while I research it.
That part could be sold just on the basis of making the code easier
to read and less error-prone; particularly for call sites where the
length is a runtime computation anyway. I think that newNode() is the
principal case where the length is knowable at compile time. So my
feeling is that the general change to invent a palloc0() call is fine
... but if you want performance then newNode() should *not* be using the
generic palloc0() call.
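To make the compile-time point concrete, here is a minimal sketch. It uses plain malloc() in place of palloc(), and every name in it (ZERO_LOOP_LIMIT, ZERO_TEST, alloc_zero_aligned, alloc_zero, ALLOC0_FAST, NEW_NODE, DemoNode) is hypothetical, not PostgreSQL's actual API. The idea is that the "can I use the word loop?" test lives in a macro at the call site. When the size is sizeof(some struct), as it always is for newNode(), the test is a constant expression and the compiler can drop the branch entirely. A generic palloc0(size) only sees the size as a run-time parameter, so it must repeat the test on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size above which plain memset() is assumed to be faster. */
#define ZERO_LOOP_LIMIT 1024

/* True when len is a multiple of sizeof(long) and small enough for the
 * word loop. If len is a compile-time constant, this whole expression is
 * a constant and the compiler can fold the branch that uses it. */
#define ZERO_TEST(len) \
    (((len) & (sizeof(long) - 1)) == 0 && (len) <= ZERO_LOOP_LIMIT)

/* Fast path: zero word-at-a-time. malloc() returns memory suitably
 * aligned for long, and callers only get here when ZERO_TEST passed. */
static void *alloc_zero_aligned(size_t len)
{
    long *p = malloc(len);

    assert(len % sizeof(long) == 0);
    if (p != NULL) {
        long *stop = (long *) ((char *) p + len);
        for (long *q = p; q < stop; q++)
            *q = 0;
    }
    return p;
}

/* General path: defer to memset() for odd or large sizes. */
static void *alloc_zero(size_t len)
{
    void *p = malloc(len);

    if (p != NULL)
        memset(p, 0, len);
    return p;
}

/* Path selection happens at the call site, where sz may be constant. */
#define ALLOC0_FAST(sz) \
    (ZERO_TEST(sz) ? alloc_zero_aligned(sz) : alloc_zero(sz))

/* newNode-style allocator: sizeof(type) is always a compile-time constant,
 * so no run-time size test survives in the generated code. */
#define NEW_NODE(type) ((type *) ALLOC0_FAST(sizeof(type)))

typedef struct DemoNode {
    int  tag;
    long fields[6];
} DemoNode;
```

With NEW_NODE(DemoNode), the selection collapses to a direct call of the word-loop path. With a length computed at run time, ALLOC0_FAST still works but pays for the test, which is why the generic palloc0() is fine for readability but not where speed matters.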
> My new idea is to add a third boolean argument to
> MemoryContextAllocZero() which will control whether the MemSet
> assignment loop is used, or memset().
But we are *trying to eliminate any runtime test whatever*. Short
of that, you aren't going to get the speedup. Certainly passing a third
argument to the alloc subroutine will eat enough cycles to negate any
savings from simplifying the runtime test slightly.
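For contrast, a minimal sketch of the proposed third-argument design, again with malloc() standing in for the real allocator and a hypothetical name (alloc_zero_flag). Because the function is an out-of-line subroutine, the flag is only known inside it at run time: the caller pays to pass the extra argument, and the callee still executes a branch on every call (unless the compiler happens to inline it, which a call into a separately compiled memory-context module generally prevents).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Rejected design, sketched: the caller chooses the zeroing strategy via a
 * flag, but the decision is still a run-time test inside the callee. */
static void *alloc_zero_flag(size_t len, bool use_loop)
{
    void *p = malloc(len);

    if (p == NULL)
        return NULL;
    if (use_loop) {                      /* this branch runs on every call */
        long *q = p;
        long *stop = (long *) ((char *) p + len);

        assert(len % sizeof(long) == 0);
        while (q < stop)
            *q++ = 0;
    } else {
        memset(p, 0, len);
    }
    return p;
}
```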
regards, tom lane