Thread: inlining

inlining

From: Bruce Momjian
Date:
Here is a list of usenet articles about inlining that just appeared in
comp.compilers.


--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
From: Mark Sanvitale <sanvitam@std.teradyne.com>
Newsgroups: comp.compilers
Subject: why not inline all functions?
Date: 9 Jun 1998 12:03:34 -0400
Organization: Teradyne, Incorporated
Lines: 57
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-06-032@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: performance
Xref: readme1.op.net comp.compilers:5583

Functions are great for making written code (C, C++, etc.) more
readable and structured, however, they do not seem to make much sense
when you get down to the raw machine code which actually is executed
by a processor.

As far as my understanding of the matter goes, the most basic way to
slow down a processor is to make it execute an instruction other than
the one immediately following the current instruction; thus, why not make
a compiler which turns every function into an inline function?  This
would save you the overhead inherent in a traditional function call
(push everything defining the current state of the processor on the
stack, make fresh copies of the parameters for the function, and,
afterwards, pop things off the stack to return the processor to the
pre-function state, not to mention losing the chance to take advantage
of any instruction prefetching the processor might do).
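
As a rough sketch of the difference being described (the names are made
up, and a real compiler need not emit exactly this), compare these two
C++ versions of the same loop:

    // square() as an ordinary function: each use pays for the call,
    // the argument passing, and the return.
    static int square(int x) { return x * x; }

    int sum_of_squares_call(const int *v, int n)
    {
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += square(v[i]);      // call/return on every iteration
        return s;
    }

    // The same loop after the body has been inlined: no call, no
    // return, and the multiply is optimized together with the loop.
    int sum_of_squares_inlined(const int *v, int n)
    {
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += v[i] * v[i];
        return s;
    }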

The output of such a compiler would be larger binary files (since
every call to a function would expand to the entire function body)
however the execution time for such a program should be improved
(relative to a non-inlining compiler) by a factor proportional to the
number of function calls in the program.

Now, a "inline everything" scheme might run into some roadblocks when
it comes to external functions which are resolved at link time and the
notion of dynamic linking is not compatible with such a method.
Still, I think compilers should try to inline every function they can
without depending on the programmer to specify a function as "inline"
(C++).

Perhaps compilers already take advantage of the idea I have outlined or
perhaps there are some problems with the idea which I don't know about
(an old C++ book I have says, "Compiler limits prevent complicated
functions from being inlined," but no further explanation is given.)

What do you all think?

NOTE - During my quest for a CS degree I did not have the chance to take
a compilers course (it was a tech elective which never fit in my
schedule) so if my assertion is totally pointless please just point out
the reason I am wrong and spare me the "Are you stupid/crazy/lost"
comments.  In the area of compilers I admit to being fairly ignorant (as
far as professional programmers go).

Mark S
"computer geek and movie freak"
[Well, if you ignore recursive and mutually recursive functions which
can't be in-lined, the main problem is code bloat.  Remember that since
computers have fairly small caches and finite main memory, big code can
run slower due to cache reloads and page faults, even though you avoid
relatively slow procedure calls and returns.  Aggressive optimizing
compilers (usually for Fortran) do indeed do a lot of in-lining already,
though. -John]

--
Send compilers articles to compilers@iecc.com, meta-mail to
compilers-request@iecc.com.  Archives at http://www.iecc.com/compilers

From: Joerg Schoen <f81@ix.urz.uni-heidelberg.de>
Newsgroups: comp.compilers
Subject: Re: why not inline all functions?
Date: 11 Jun 1998 16:11:47 -0400
Organization: University of Heidelberg, Germany
Lines: 64
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-06-050@comp.compilers>
References: <98-06-032@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: performance, practice
Xref: readme1.op.net comp.compilers:5604

Mark Sanvitale <sanvitam@std.teradyne.com> wrote:
: Functions are great for making written code (C, C++, etc.) more
: readable and structured, however, they do not seem to make much sense
: when you get down to the raw machine code which actually is executed
: by a processor.

: As far as my understanding of the matter goes, the most basic way to
: slow down a processor is to make it execute an instruction other than
: the one immediately following the current instruction; thus, why not make
: a compiler which turns every function into an inline function?  This
: would save you the overhead inherent in a traditional function call
: (push everything defining the current state of the processor on the

In my experience "pushing the current state on the stack" refers only
to variables that reside in processor registers. If the function is
small, it will be inlined (at sufficiently high optimization level)
and no pushs are necessary. If the function is big and a function call
is done, it is more likely that some time is spent in the function and
the processors registers are used therein for performance. Thus it is
reasonable to "free" them for reusage in the function by pushing away
them.
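
For what it's worth, this is easy to see with gcc or clang (a sketch
only; the always_inline/noinline attributes are gcc/clang extensions
and the names are made up):

    // The same small helper written two ways so the generated code can
    // be compared.
    __attribute__((always_inline)) static inline int
    add3_inl(int a, int b, int c) { return a + b + c; }

    __attribute__((noinline)) static int
    add3_call(int a, int b, int c) { return a + b + c; }

    int demo(int x, int y, int z)
    {
        // Look at the assembly (e.g. g++ -O2 -S): the first use becomes
        // a couple of adds, while the second stays a real call with
        // whatever register saving the calling convention requires.
        return add3_inl(x, y, z) + add3_call(x, y, z);
    }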

: stack, make fresh copies of the parameters for the function, and,
: afterwards, pop things off the stack to return the processor to the
: pre-function state, not to mention losing the chance to take advantage
: of any instruction prefetching the processor might do).

: The output of such a compiler would be larger binary files (since
: every call to a function would expand to the entire function body)
: however the execution time for such a program should be improved
: (relative to a non-inlining compiler) by a factor proportional to the
: number of function calls in the program.

No, that's not true.  Consider a long function, or a short one with a
loop that is executed a number of times.  The cost of calling the
function is then negligible compared with the time spent in the
function itself.

: Now, a "inline everything" scheme might run into some roadblocks when
: it comes to external functions which are resolved at link time and the
: notion of dynamic linking is not compatible with such a method.

I know that some compilers have a "ucode" format that is different
from the usual object file format (which is used in libraries).  As far
as my understanding goes, compilers can do much more with the ucode
format at the linking stage; I think they can also do inlining there.

: Still, I think compilers should try to inline every function they can
: without depending on the programmer to specify a function as "inline"
: (C++).

As our moderator pointed out, you have to consider the cost of
loading new instructions into the cache.  If the function is separate
code, it will be in the cache after the first call and will probably
stay there.  That improves performance compared with inlining, where
each expanded copy of the function is a separate block of code that
has to be loaded into the cache on its own.

        Joerg Schoen
E-mail: Joerg.Schoen AT tc DOT pci DOT uni-heidelberg DOT de
Web-Page: http://www.pci.uni-heidelberg.de/tc/usr/joerg
--
Send compilers articles to compilers@iecc.com, meta-mail to
compilers-request@iecc.com.  Archives at http://www.iecc.com/compilers

From: Ben Elliston <bje@cygnus.com>
Newsgroups: comp.compilers
Subject: Re: why not inline all functions?
Date: 11 Jun 1998 16:13:27 -0400
Organization: Cygnus Solutions
Lines: 52
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-06-052@comp.compilers>
References: <98-06-032@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: optimize, practice
Xref: readme1.op.net comp.compilers:5602

Mark Sanvitale <sanvitam@std.teradyne.com> writes:

> Functions are great for making written code (C, C++, etc.) more
> readable and structured, however, they do not seem to make much sense
> when you get down to the raw machine code which actually is executed
> by a processor.

There is still one purpose for retaining this structure in the
executable.  I'll get to this below.

> The output of such a compiler would be larger binary files (since
> every call to a function would expand to the entire function body)
> however the execution time for such a program should be improved
> (relative to a non-inlining compiler) by a factor proportional to the
> number of function calls in the program.

I remember reading somewhere that, in fact, by inlining functions like
this, the potential for optimisations is greater.  My intuition can
see why.  But code bloat would still be significant.

> Perhaps compilers already take advantage of the idea I have outlined or
> perhaps there are some problems with the idea which I don't know about
> (an old C++ book I have says, "Compiler limits prevent complicated
> functions from being inlined," but no further explanation is given.)

In my opinion, the major drawback to inline functions is that they are
much harder to debug.  Suppose you have code which frequently calls:

    my_sqrt(..)

And you suspect a bug in your square root function.  If my_sqrt() were
inlined at every point that it was used, it would be impossible to set
a breakpoint at the start of my_sqrt() and to examine an invocation of
this function.  It becomes a real headache.
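
A common workaround (sketched here with the gcc/clang noinline
attribute; the body of my_sqrt() is made up) is to keep the suspect
function out of line in debug builds only:

    #include <cmath>

    // In debug builds, force my_sqrt() to stay a real function so one
    // breakpoint at its entry catches every caller; in release builds,
    // let the compiler inline it freely.
    #ifndef NDEBUG
    #  define DEBUG_NOINLINE __attribute__((noinline))
    #else
    #  define DEBUG_NOINLINE
    #endif

    DEBUG_NOINLINE double my_sqrt(double x)
    {
        return std::sqrt(x);    // placeholder for the suspect code
    }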

Furthermore, it helps a great deal to know the calling sequence at
runtime in case, say, an assertion fails and you need to know how your
function was called and what the arguments were.

---
Ben Elliston
bje@cygnus.com
[That's not a very persuasive argument.  There's no reason the debugger
can't know all the places the inline function was called and put breakpoints
on all of them.  It already has to do something similar for compilers
that unroll loops. -John]


--
Send compilers articles to compilers@iecc.com, meta-mail to
compilers-request@iecc.com.  Archives at http://www.iecc.com/compilers

From: Andy Ayers <ayers@incert.com>
Newsgroups: comp.compilers
Subject: Re: why not inline all functions?
Date: 11 Jun 1998 16:15:13 -0400
Organization: MIT Laboratory for Computer Science
Lines: 59
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-06-055@comp.compilers>
References: <98-06-032@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: optimize, performance
Xref: readme1.op.net comp.compilers:5603

Mark Sanvitale <sanvitam@std.teradyne.com> writes:

> ... why not make
> a compiler which turns every function into an inline function?  This
> would save you the overhead inherent in a traditional function call
> (push everything defining the current state of the processor on the
> stack, make fresh copies of the parameters for the function, and,
> afterwards, pop things off the stack to return the processor to the
> pre-function state, not to mention losing the chance to take advantage
> of any instruction prefetching the processor might do).
> ...
> Perhaps compilers already take advantage of the idea I have outlined or
> perhaps there are some problems with the idea which I don't know about
> (an old C++ book I have says, "Compiler limits prevent complicated
> functions from being inlined," but no further explanation is given.)
>
> What do you all think?

While working on high-level optimization at HP, some colleagues and I
essentially pursued a similar idea. Our goal was not so much to remove
call overhead as it was to open up larger expanses of code to
aggressive intraprocedural optimization. Some of the results are
summarized in a paper "Aggressive Inlining" which we presented at PLDI
last year. HP-UX compilers from 9.x onwards are capable of aggressive,
profile-driven, cross-module inlining.

Interprocedural optimization was (and is) often hampered by the
complexity and scale of the supporting interprocedural analyses, and
whole-program information is often required to get good results. Once
you've done the analysis you often end up duplicating callee code to
take advantage of some specialized calling context. Inlining does this
duplication and specialization for you without requiring much
heavyweight analysis.
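
A tiny made-up example of the duplication-and-specialization effect
described above (not taken from the paper):

    // A general routine with a mode flag.
    static double scale_sum(const double *v, int n, bool normalize)
    {
        double max = 1.0;
        if (normalize) {
            max = 0.0;
            for (int i = 0; i < n; ++i)
                if (v[i] > max) max = v[i];
        }
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += v[i] / max;
        return sum;
    }

    double caller(const double *v, int n)
    {
        // Once scale_sum() is inlined here, the compiler sees that
        // normalize is the constant false, deletes the first loop, and
        // folds away the division by 1.0 -- specialization that would
        // otherwise require interprocedural analysis to prove.
        return scale_sum(v, n, false);
    }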

We saw very good results on most codes -- benchmarks, certainly, but
also many user and production codes and some parts of the HP-UX
kernel. Fortran was actually trickier than C or C++ because the formal
parameter aliasing properties are difficult to describe accurately
once a subroutine is inlined. The only program I remember us actually
"flattening" completely was the SPEC benchmark fpppp.

On the flip side, getting good results with inlining requires
profiling, but this is quickly becoming a must for any aggressive
optimizing compiler. The code placement tool (a la Pettis & Hansen)
needs to be inlining-aware. Code growth is not that big of a problem
in many codes. Many very large codes have relatively small dynamic hot
spots. Database codes are a notable exception. Another big downside
is that aggressive cross-module inlining builds in cross-module
dependences, so you lose the quick build-time benefits of separate
compilation. Debugging runtime failures in heavily inlined code is a
huge challenge. And finally, many user codes are not interprocedurally
clean, making it hard to inline at some sites and preserve whatever
twisted semantics were implied by the original call.

    -- Andy Ayers
--
Send compilers articles to compilers@iecc.com, meta-mail to
compilers-request@iecc.com.  Archives at http://www.iecc.com/compilers

From: mcdirmid@beaver.cs.washington.edu (Sean McDirmid)
Newsgroups: comp.compilers
Subject: Re: why not inline all functions?
Date: 11 Jun 1998 16:59:26 -0400
Organization: Computer Science & Engineering, U of Washington, Seattle
Lines: 46
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-06-063@comp.compilers>
References: <98-06-032@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: optimize
Xref: readme1.op.net comp.compilers:5607

Mark Sanvitale (sanvitam@std.teradyne.com) wrote:

: Now, a "inline everything" scheme might run into some roadblocks when
: it comes to external functions which are resolved at link time and the
: notion of dynamic linking is not compatible with such a method.
: Still, I think compilers should try to inline every function they can
: without depending on the programmer to specify a function as "inline"
: (C++).

Hmm, you forgot to mention recursive functions and virtual methods in
C++.  In C++, the method a call actually executes sometimes can only be
determined at run time.  That's why, if you "inline" a method that can
be invoked virtually, the compiler will probably still emit an
out-of-line copy of it for the calls that go through virtual dispatch.
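
A made-up C++ sketch of the situation:

    struct Shape {
        virtual double area() const = 0;
        virtual ~Shape() {}
    };

    struct Circle : Shape {
        double r;
        explicit Circle(double radius) : r(radius) {}
        double area() const { return 3.14159265358979 * r * r; }
    };

    double total_area(const Shape *const *shapes, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) {
            // Target known only at run time, so this call cannot be
            // inlined without whole-program type analysis.
            sum += shapes[i]->area();
        }
        return sum;
    }

    double one_circle(const Circle &c)
    {
        // Static type known, so this call is easily inlined -- but an
        // out-of-line copy of Circle::area() must still exist for the
        // virtual calls made elsewhere.
        return c.area();
    }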

Of course, there are tools like Vortex that are attempting to solve this
problem through extreme global analysis. See:

http://www.cs.washington.edu/research/projects/cecil/

: Perhaps compilers already take advantage of the idea I have outlined or
: perhaps there are some problems with the idea which I don't know about
: (an old C++ book I have says, "Compiler limits prevent complicated
: functions from being inlined," but no further explanation is given.)

Over "inlining" leads to greater code bloat. This could adversely affect
your instruction cache (or maybe not...). Whenever you optimize here, you
might deoptimize somewhere else (isn't computer science fun!).

: What do you all think?

: NOTE - During my quest for a CS degree I did not have the chance to take
: a compilers course (it was a tech elective which never fit in my
: schedule) so if my assertion is totally pointless please just point out
: the reason I am wrong and spare me the "Are you stupid/crazy/lost"
: comments.  In the area of compilers I admit to being fairly ignorant (as
: far as professional programmers go).

A compilers course probably would not have gone into these issues. Maybe
an architecture course...

Sean
--
Send compilers articles to compilers@iecc.com, meta-mail to
compilers-request@iecc.com.  Archives at http://www.iecc.com/compilers

From: Thomas Niemann <portland@uswest.net>
Newsgroups: comp.compilers
Subject: Re: why not inline all functions?
Date: 11 Jun 1998 17:00:10 -0400
Organization: Compilers Central
Lines: 26
Sender: johnl@iecc.com
Approved: compilers@ivan.iecc.com
Message-ID: <98-06-065@comp.compilers>
References: <98-06-032@comp.compilers>
NNTP-Posting-Host: ivan.iecc.com
Keywords: optimize, practice
Xref: readme1.op.net comp.compilers:5613

I worked on compilers for Prime and Apollo in the 80's, and implemented
procedure inlining for both.  For testing, I tried to inline
everything.  I recall one program that seemingly "hung" the compiler.
Being persistent, I let it compile overnight.  It finally finished, and
the result executed properly.  On inspection, the call graph for the
code resembled a binary tree, and inlining it all quickly produced a
huge program that took hours to compile.  The final released product
tried to determine good candidates for inlining (another topic).  As I
recall, we didn't expand recursive routines, though we could have
inlined them a few levels deep.
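
Roughly speaking (a made-up sketch), inlining a recursive routine "a
few times" just means expanding its body into itself for a bounded
number of levels:

    // The original recursive routine.
    static long fact(long n)
    {
        return n <= 1 ? 1 : n * fact(n - 1);
    }

    // The same routine with one level of the recursion expanded inline;
    // the remaining recursive call is left as a real call.
    static long fact_expanded_once(long n)
    {
        if (n <= 1)
            return 1;
        long m = n - 1;                             // inlined copy of fact(m)
        long inner = (m <= 1) ? 1 : m * fact(m - 1);
        return n * inner;
    }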

Large programs not only take a long time to compile, but may incur
excessive paging, as there may be insufficient memory to hold the
executable.  The big win obtained from inlining is due to optimization.
Inlining a procedure exposes the procedure's body to the surrounding
code, allowing classical optimizations to take place.  One of the
benchmarks in our test suite terminated with a divide-by-zero exception
after inlining.  A matrix-multiplication function was inlined.  The
optimizer could then determine that the result of the calculation was
never used, so the code was eliminated.  Thus, it took zero time to do
the calculation.  The divide-by-zero error occurred when the elapsed
time was divided into a constant.  At any rate, we had qualms about
reporting benchmark figures for such a result.
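
The failure mode is easy to reproduce in miniature (a made-up example,
not the actual benchmark): once the inlined work is seen to be unused,
the timed region collapses and the ops-per-second division blows up.

    #include <cstdio>
    #include <ctime>

    static double work(int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += i * 0.5;
        return s;
    }

    int main()
    {
        std::clock_t t0 = std::clock();
        for (int rep = 0; rep < 100000; ++rep)
            work(1000);     // result never used: once inlined, the
                            // optimizer is free to delete all of this
        double elapsed = double(std::clock() - t0) / CLOCKS_PER_SEC;

        // If elapsed is 0, this is the divide-by-zero the benchmark hit
        // (on IEEE machines it yields infinity rather than trapping).
        std::printf("ops/sec = %f\n", 100000.0 / elapsed);
        return 0;
    }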
--
Send compilers articles to compilers@iecc.com, meta-mail to
compilers-request@iecc.com.  Archives at http://www.iecc.com/compilers

Re: [HACKERS] inlining

From: dg@illustra.com (David Gould)
Date:
>
> Here is a list of usenet articles about inlining that just appeared in
> comp.compilers.

Good discussion, and I am happy to see you post it.  I follow comp.arch
regularly and there are often very interesting hints there too amid
the dross.  Actually it is not a high-traffic group except for the
occasional "sunspot cycle".

> optimizing compiler. The code placement tool (ala Pettis & Hanson)
> needs to be inlining-aware. Code growth is not that big of a problem
> in many codes. Many very large codes have relatively small dynamic hot
> spots. Database codes are a notable exception. Another big downside
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Database codes are the mother's heartbreak of both the compiler-design
and hardware-architecture communities.  They blow up caches; there are
never 5 instructions in a row before a branch; they whack at the whole
working set (which blows up the TLB and the bus); and they have poor
locality, so when they miss the cache you can't fix it with bandwidth.
Everything depends on everything, so you can't parallelize at small
scales.  Hopeless, really.

Btw, I sure wish someone would comment on the S_LOCK analysis even if only
to tell me not to make such long posts as it wastes bandwidth. Or was it just
too long to read?

-dg


David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Don't worry about people stealing your ideas.  If your ideas are any
 good, you'll have to ram them down people's throats." -- Howard Aiken

RE: [HACKERS] inlining

From: "Stupor Genius"
Date:
> Btw, I sure wish someone would comment on the S_LOCK analysis even if only
> to tell me not to make such long posts as it wastes bandwidth. Or
> was it just too long to read?

I read it all! Great analysis of the situation and not a waste, IMHO.

One comment...when you ran the tests in succession, could the cache be
responsible for the timing groupings in the same test?  Should a
little program be run in between to "flush" the cache full of garbage
so each real run will miss?  Seem to recall a little program, in CUJ,
I think, that set up a big array and then iterated over it to trash
the cache.
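
Something along the lines of that CUJ program (a rough reconstruction
of the idea, with guessed sizes) would be:

    #include <cstdlib>

    // Touch an array much larger than a typical data cache so that
    // whatever the previous test left in the cache is evicted before
    // the next timed run.  The size is a guess; make it several times
    // the cache size of the machine under test.
    void trash_cache()
    {
        enum { N = 4 * 1024 * 1024 };   // 4 MB of chars
        static volatile char junk[N];
        long sum = 0;
        for (int i = 0; i < N; ++i)
            sum += junk[i];
        if (sum == 42)                  // keep the loop from being
            std::abort();               // optimized away entirely
    }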

Darren  aka  stuporg@erols.com

Re: [HACKERS] inlining

From: Bruce Momjian
Date:
>
> > Btw, I sure wish someone would comment on the S_LOCK analysis even if only
> > to tell me not to make such long posts as it wastes bandwidth. Or
> > was it just too long to read?
>
> I read it all! Great analysis of the situation and not a waste, IMHO.
>
> One comment...when you ran the tests in succession, could the cache be
> responsible for the timing groupings in the same test?  Should a
> little program be run in between to "flush" the cache full of garbage
> so each real run will miss?  Seem to recall a little program, in CUJ,
> I think, that set up a big array and then iterated over it to trash
> the cache.

Yes, that is a good point.  When testing in a loop, the function is in
the cache, while in normal use, the function may not be in the cache
because of intervening instructions.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

RE: [HACKERS] inlining

From: "Henry B. Hotz"
Date:
At 4:50 AM -0700 6/12/98, Stupor Genius wrote:
>One comment...when you ran the tests in succession, could the cache be
>responsible for the timing groupings in the same test?  Should a
>little program be run in between to "flush" the cache full of garbage
>so each real run will miss?  Seem to recall a little program, in CUJ,
>I think, that set up a big array and then iterated over it to trash
>the cache.

Obviously I'm commenting at second hand, and perhaps this problem is
handled properly, but:

Many CPU's have independent data and instruction caches.  Setting up a big
array and moving through it will flush the data cache, but most benchmark
anomalies are likely to be due to the instruction cache, aren't they?

Also, if a process (program) stops and then restarts, is the OS smart
enough to reconnect the VM state in such a way that the cache isn't
flushed anyway?  Can it even preserve cache contents through a fork
(when the VM state is mostly preserved)?  I doubt it.

That said, if you are testing multiple SQL statements within a single
connection (so the backend doesn't fork a new process), then I could
see some anomalies.  Otherwise I doubt it.

Anyone know better?

Signature failed Preliminary Design Review.
Feasibility of a new signature is currently being evaluated.
h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu