Thread: Is it really such a good thing for newNode() to be a macro?

Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
I happened to be looking at nodes.h and started wondering just how
sane this coding really is:

extern PGDLLIMPORT Node *newNodeMacroHolder;

#define newNode(size, tag) \
( \   AssertMacro((size) >= sizeof(Node)),        /* need the tag, at least */ \   newNodeMacroHolder = (Node *)
palloc0fast(size),\   newNodeMacroHolder->type = (tag), \   newNodeMacroHolder \
 
)

Given that we're calling palloc, it's not clear that saving one level of
function call is really buying much; and what it's costing us is a store
to a global variable that the compiler has no way to optimize away.
On a lot of platforms, accessing global variables isn't especially
cheap.  Also, considering that palloc0fast is a nontrivial macro, and
that there are a LOT of uses of newNode(), we're paying rather a lot of
code space for a pretty dubious savings.

So I'm tempted to get rid of this and just make newNode() an out-of-line
function.

Thoughts?
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
"Stephen R. van den Berg"
Date:
Tom Lane wrote:
>I happened to be looking at nodes.h and started wondering just how
>sane this coding really is:

>extern PGDLLIMPORT Node *newNodeMacroHolder;

>#define newNode(size, tag) \
>( \
>    AssertMacro((size) >= sizeof(Node)),        /* need the tag, at least */ \
>    newNodeMacroHolder = (Node *) palloc0fast(size), \
>    newNodeMacroHolder->type = (tag), \
>    newNodeMacroHolder \
>)

>Given that we're calling palloc, it's not clear that saving one level of
>function call is really buying much; and what it's costing us is a store
>to a global variable that the compiler has no way to optimize away.
>On a lot of platforms, accessing global variables isn't especially
>cheap.  Also, considering that palloc0fast is a nontrivial macro, and
>that there are a LOT of uses of newNode(), we're paying rather a lot of
>code space for a pretty dubious savings.

Correct analysis, I'd say.

>So I'm tempted to get rid of this and just make newNode() an out-of-line
>function.

Getting rid of the global variable is imperative.
However, for the rest you'd have two alternate options (besides making
it a normal function):

a. Use macros like:
 #define makeNode(_type_,_variable_)   \     newNode(sizeof(_type_), T_##_type_, _variable_) #define newNode(size, tag,
variable)             \   do {                          \   Node * newNodeMacroHolder;                  \
AssertMacro((size)>= sizeof(Node));        /* need the tag, at least */ \   newNodeMacroHolder = (Node *)
palloc0fast(size); \   newNodeMacroHolder->type = (tag);              \   _variable_ = newNodeMacroHolder;
\  } while(0)
 

b. Create a function newNode() which is declared as inline, which  basically gives you the same code as under (a).
-- 
Sincerely,          Stephen R. van den Berg.

"Good moaning!"


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
> b. Create a function newNode() which is declared as inline, which
>    basically gives you the same code as under (a).

I considered that one, but since part of my argument is that inlining
this is a waste of code space, it seems like a better inlining
technology isn't really the answer.

The other two alternatives would force notational changes on all the
callers, which doesn't seem appealing (there are close to 1400 calls
of makeNode() in the backend...)
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> I happened to be looking at nodes.h and started wondering just how
> sane this coding really is:
> 
> extern PGDLLIMPORT Node *newNodeMacroHolder;
> 
> #define newNode(size, tag) \
> ( \
>     AssertMacro((size) >= sizeof(Node)),        /* need the tag, at least */ \
>     newNodeMacroHolder = (Node *) palloc0fast(size), \
>     newNodeMacroHolder->type = (tag), \
>     newNodeMacroHolder \
> )
> 
> Given that we're calling palloc, it's not clear that saving one level of
> function call is really buying much; and what it's costing us is a store
> to a global variable that the compiler has no way to optimize away.
> On a lot of platforms, accessing global variables isn't especially
> cheap.  Also, considering that palloc0fast is a nontrivial macro, and
> that there are a LOT of uses of newNode(), we're paying rather a lot of
> code space for a pretty dubious savings.

Note that the MemSetLoop macro used in palloc0fast is supposed to be 
evaluated at compile time, so the code space taken by that macro isn't 
that big. Turning newNode into function would force it to be evaluated 
at run-time instead.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Tom Lane wrote:
>> I happened to be looking at nodes.h and started wondering just how
>> sane this coding really is:

> Note that the MemSetLoop macro used in palloc0fast is supposed to be 
> evaluated at compile time,

Oooh, good point, I had forgotten about that little detail.  Yeah,
we'll lose that optimization if we move the code out-of-line.

So I guess a fallback position is #if __gcc__ use a "static inline"
function, else the existing code.  That would at least let us get
rid of the global-variable assignment in gcc-based builds.
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Note that the MemSetLoop macro used in palloc0fast is supposed to be 
>> evaluated at compile time,
> 
> Oooh, good point, I had forgotten about that little detail.  Yeah,
> we'll lose that optimization if we move the code out-of-line.

Well, we could still have the MemSetTest outside the function, and 
evaluated at compile-time, if we provided an aligned and unaligned 
version of newNode:

#define newNode(size, tag) \
( \AssertMacro((size) >= sizeof(Node)),        /* need the tag, at least */ \( MemSetTest(0, sz) ? \
newNodeAligned(CurrentMemoryContext,sz, tag) : \    newNodeNotAligned(CurrentMemoryContext, sz, tag))
 

And if you're worried about the function call overhead, 
newNode(Not)Aligned could have the contents of MemoryContextAllocZero 
inlined into it. Not sure how much optimization is worthwhile here..

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Well, we could still have the MemSetTest outside the function, and 
> evaluated at compile-time, if we provided an aligned and unaligned 
> version of newNode:

Yeah, that should work fine, since we expect MemSetTest to reduce to
a compile-time constant.  I'll have a go at this later ... unless
you want to do it?

I suppose it could cause funny behavior due to the size being
evaluated more than once, but the existing macro already does that
when Asserts are on, so in practice I don't see an objection there.
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> I'll have a go at this later ... unless you want to do it?

Nah, I don't care enough.

BTW, it would be nice to have a test case that shows it's worth messing 
with that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Is it really such a good thing for newNode() to be a macro?

From
Peter Eisentraut
Date:
Tom Lane wrote:
> I considered that one, but since part of my argument is that inlining
> this is a waste of code space, it seems like a better inlining
> technology isn't really the answer.

The compiler presumably has the intelligence and the command-line options to 
control how much inlining one wants to do.  But without any size vs. 
performance measurements it is an idle discussion.  Getting rid of a global 
variable and macro ugliness is a worthwhile goal of its own.


Re: Is it really such a good thing for newNode() to be a macro?

From
"Greg Stark"
Date:
On Wed, Aug 27, 2008 at 4:05 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
> Tom Lane wrote:
> BTW, it would be nice to have a test case that shows it's worth messing with
> that.

At a guess the parser would be a good place to look. Perhaps a
benchmark of a parsing a very large query?

One thing I wonder is if you (or someone else) gets around to doing
the simple stack allocator for the parser if it will make
microoptimizations like this more visible.

-- 
greg


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> I considered that one, but since part of my argument is that inlining
>> this is a waste of code space, it seems like a better inlining
>> technology isn't really the answer.

> The compiler presumably has the intelligence and the command-line options to 
> control how much inlining one wants to do.  But without any size vs. 
> performance measurements it is an idle discussion.  Getting rid of a global 
> variable and macro ugliness is a worthwhile goal of its own.

I got around to doing some experiments.  The method suggested by Heikki
(out-of-line subroutine for everything except the MemSetTest) reduces
the size of the backend executable by about 0.5% (about 20K) in CVS HEAD
on Fedora 9 x86_64, in a non-assert-enabled build.  However it also
makes it measurably slower.  I couldn't detect any difference in a
regular pgbench run, so instead I timed iterations of this:

explain select * from  tenk1 a join tenk1 b using(unique1) join tenk1 c on a.unique1 = c.unique2 join tenk1 d on
a.unique1= d.thousand join tenk1 e on a.unique1 = e.ten join tenk1 f on a.unique1 = f.tenthous join tenk1 g on
a.unique1= g.unique2
 
where exists(select 1 from int4_tbl where f1 = b.unique2);

in the regression database.  Put the above (as a single line!) into
"explainjoin.sql" and try

pgbench -c 1 -t 100 -n -f explainjoin.sql regression

This is mostly stressing the planner, which is pretty newNode-heavy.
I get consistently about 14.1 tps on straight CVS HEAD and about 13.8
with the partially out-of-line implementation.

I also tried the "static inline" implementation, but that doesn't work
at all: gcc refuses to inline it, which makes the palloc0fast call a
dead loss.

So indeed what we need here is a better inlining technology.  I looked
into using gcc's "a compound statement enclosed in parentheses is an
expression" extension, thus:

#define newNode(size, tag) \
({  Node *newNodeMacroHolder; \   AssertMacro((size) >= sizeof(Node));        /* need the tag, at least */ \
newNodeMacroHolder= (Node *) palloc0fast(size); \   newNodeMacroHolder->type = (tag); \   newNodeMacroHolder; \
 
})

This gets rid of the global, but incredibly, it's even slower: 13.5 tps
on the explain test.  I do not understand that result.  I looked at the
generated machine code to verify that it was what I expected, and indeed
it's about the same as CVS HEAD except that there's no store-and-fetch
into a global.

Getting rid of the global variable accesses reduces the size of the
backend by about 12K on this architecture, and the only theory I can
think of is that that moves things around enough to make the instruction
cache less efficient on some code path that this test happens to
exercise heavily.

In theory the above implementation of newNode should be a clear win,
so I'm thinking this result must be an artifact of some kind.  I'm
going to go try it on PPC and HPPA machines next; does anyone want to
try it on something else?
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
I wrote:
> In theory the above implementation of newNode should be a clear win,
> so I'm thinking this result must be an artifact of some kind.  I'm
> going to go try it on PPC and HPPA machines next; does anyone want to
> try it on something else?

Repeating the explain test on several machines, I get:

HPPA/HPUX: gcc-specific macro is about 1.5% faster than CVS HEAD
PPC/Darwin: gcc-specific macro is about 1% faster
x86/Darwin: seems to be a dead heat

So it seems that getting rid of the global doesn't really help on
current Intel hardware, but it does help on other architectures.
I'm pretty well convinced that the slowdown on my x86_64 machine is
an artifact that will disappear as soon as anybody changes the backend
code materially.

Accordingly, I'm going to go ahead with this:

#ifdef __GNUC__

/* With GCC, we can use a compound statement within an expression */
#define newNode(size, tag) \
({    Node   *__result__; \AssertMacro((size) >= sizeof(Node));        /* need the tag, at least */ \__result__ = (Node
*)palloc0fast(size); \__result__->type = (tag); \__result__; \
 
})

#else

old form of macro here

#endif   /* __GNUC__ */

I'm not entirely sure whether ICC will take this, but the buildfarm
will tell us soon enough.

It's tempting to consider whether we should use this construct in place
of "static inline" functions elsewhere, such as list_length().
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
"Alex Hunsaker"
Date:
On Fri, Aug 29, 2008 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In theory the above implementation of newNode should be a clear win,
> so I'm thinking this result must be an artifact of some kind.  I'm
> going to go try it on PPC and HPPA machines next; does anyone want to
> try it on something else?

Hrm, I tried it on my x86_64 (quad core 2.66ghz, sorry no exotic
machines here :)) and did not see any real noticeable difference
between the two...

Here is what I tried:
(all with ./configure --enable-debug and make clean in between)

CVS HEAD:
tps = 30.375794
tps = 31.138078
tps = 30.928565

#define newNode(size, tag) \
({  Node *newNodeMacroHolder; \  AssertMacro((size) >= sizeof(Node));        /* need the tag, at least */ \
newNodeMacroHolder= (Node *) palloc0fast(size); \  newNodeMacroHolder->type = (tag); \  newNodeMacroHolder; \
 
})

tps = 30.814628
tps = 30.706080
tps = 31.10788

static inline Node *newNode(Size size, NodeTag tag)
{Node *newNode;Assert(size >= sizeof(Node));newNode = (Node *) palloc0(size);newNode->type = tag;return newNode;
}

tps = 30.317978
tps = 30.786187
tps = 30.747112


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
"Alex Hunsaker" <badalex@gmail.com> writes:
> Hrm, I tried it on my x86_64 (quad core 2.66ghz, sorry no exotic
> machines here :)) and did not see any real noticeable difference
> between the two...

Thanks.  That confirms my feeling that I was seeing some weird artifact
on my own x86_64 box.
        regards, tom lane


Re: Is it really such a good thing for newNode() to be a macro?

From
"Stephen R. van den Berg"
Date:
Tom Lane wrote:
>Accordingly, I'm going to go ahead with this:

>#ifdef __GNUC__

>/* With GCC, we can use a compound statement within an expression */
>#define newNode(size, tag) \
>({    Node   *__result__; \

Please avoid identifiers starting with __ .
ANSI reserves those for the implementation (the compiler and lib).
Either use the old global name or something like newMode_result.
-- 
Sincerely,          Stephen R. van den Berg.
Several ways to kill a programmer:  kill -15, fair trial; kill -1, death by
hanging; kill -2, suicide; kill -3, euthanasia; kill -9, execution.


Re: Is it really such a good thing for newNode() to be a macro?

From
Tom Lane
Date:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Tom Lane wrote:
>> ({    Node   *__result__; \

> Please avoid identifiers starting with __ .

Yeah, I had misremembered which way that rule went.  It's "_result"
as committed.
        regards, tom lane