Thread: Profile-guided optimization with GCC
Profile-guided optimization (PGO) is a relatively new GCC feature that improves the quality of generated code by:

- compiling a copy of the source program with some profiling hooks
- running this copy of the program on some representative input data
- recompiling the program using the profiling data produced by the previous stage; the profiling data lets GCC's optimizer generate more efficient code

I think it would be cool to add support for PGO to PostgreSQL's build system (for 8.1). There are a lot of situations where PostgreSQL is compiled once and then used for weeks or months (compilation for inclusion in a distro being the extreme example). In that kind of situation, trading some additional compile time for even a small improvement in run-time performance is worthwhile, IMHO.

I've attached a proof-of-concept patch that implements this. Caveats:

- you'll need to re-run autoconf
- the libgcov.a stuff is a temporary hack; you may need to adjust it for where libgcov.a is on your system
- I've only bothered adding support for GCC 3.4 (IIRC profile-guided optimization was introduced in GCC 3.3, but 3.4 adds a simpler interface to using it); by the time 8.1 is out, I think GCC 3.4+ will be pretty prevalent anyway
- the patch should remove the .gcda and .gcno files that are produced by GCC; I haven't done that yet

The patch adds a new make target ("profile-opt") that does the PGO steps outlined above -- the "representative input data" is the regression tests running in serial mode. I haven't run any benchmarks yet (if someone wants to try that, I'd be very curious to see the results).

Comments?

-Neil
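For reference, the three stages map onto GCC 3.4's simplified interface roughly as follows. This is a minimal sketch against a single-file program; -fprofile-generate and -fprofile-use are the GCC 3.4 flags in question, while the file names and input are illustrative, not taken from the patch:

    % gcc -O2 -fprofile-generate -o pgm pgm.c   # stage 1: build with profiling hooks
    % ./pgm < representative-input              # stage 2: run; writes pgm.gcda profile data
    % gcc -O2 -fprofile-use -o pgm pgm.c        # stage 3: recompile using the recorded profile

In the patch, stage 2 corresponds to running the serial regression tests, and the libgcov.a adjustment is presumably needed because the instrumented objects must link against GCC's profiling support library.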
Neil Conway wrote:
> The patch adds a new make target ("profile-opt") that does the PGO
> steps outlined above -- the "representative input data" is the
> regression tests running in serial mode. I haven't run any benchmarks
> yet (if someone wants to try that, I'd be very curious to see the
> results).

I doubt that the regression tests are anywhere near representative input data. They run a proportion of borderline and error cases that is much higher than I would expect in normal use.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2004-09-30 at 19:49, Peter Eisentraut wrote:
> I doubt that the regression tests are anywhere near representative input
> data. They run a proportion of borderline and error cases that is much
> higher than I would expect in normal use.

That's definitely true. At first glance, though, the regression tests don't seem to be *too* badly skewed:

[src/test/regress/expected]% grep ERROR *.out | wc -l
867
[src/test/regress/expected]% grep -i "^SELECT" *.out | wc -l
2924
[src/test/regress/expected]% grep -i "^INSERT" *.out | wc -l
2714
[src/test/regress/expected]% grep -i "^UPDATE" *.out | wc -l
122
[src/test/regress/expected]% grep -i "^DELETE" *.out | wc -l
110
[src/test/regress/expected]% grep -i "^CREATE" *.out | wc -l
848
[src/test/regress/expected]% grep -i "^COPY" *.out | wc -l
46

I guess it depends on how closely the test data needs to match "normal" input data for the gcc optimizer to be able to make valid decisions. My intuition is that the regression tests are sufficiently close to normal input that it won't be an issue, but I'm not sure.

-Neil
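The same counts can be gathered in one pass; this is just a convenience loop over the statement types grepped for above, not part of the patch:

    % for kw in SELECT INSERT UPDATE DELETE CREATE COPY; do
    >     echo "$kw: $(grep -i "^$kw" *.out | wc -l)"    # lines starting with the keyword
    > done
    % echo "ERROR: $(grep ERROR *.out | wc -l)"          # ERROR anywhere in the line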
On Thu, Sep 30, 2004 at 07:07:27PM +1000, Neil Conway wrote:
> I think it would be cool to add support for PGO to PostgreSQL's build
> system (for 8.1). There are a lot of situations where PostgreSQL is
> compiled once, and then used for weeks or months (compilations for
> inclusion in a distro being the extreme example). In that kind of
> situation, trading some additional compile-time for even a small
> improvement in run-time performance is worthwhile, IMHO.

It's some time ago now, but a group at the Universitat Politecnica de Catalunya (including my thesis advisor Alex Ramirez and our databases specialist Josep Larriba-Pey) did research on something similar, using PostgreSQL as their case study. What they investigated was a code-reordering algorithm that minimized both taken branches and I-cache conflicts. The scheme was quite aggressive, even going so far as to colocate code in some functions with code in their most frequent callers.

The study also includes a characterization of the I-miss and execution intensities of the backend, in a neat matrix with the major functions on one axis and the stage from which they're invoked on the other. The paper may be enlightening. Just a moment while I google for it...

...Got it. Here's the paper:
http://research.ac.upc.es/CAP/hpc/Papers/1999/aramirez1999aC.pdf

And here's the Citeseer entry:
http://citeseer.ist.psu.edu/context/163268/0

Jeroen
Peter Eisentraut <peter_e@gmx.net> writes:
> Neil Conway wrote:
>> The patch adds a new make target ("profile-opt") that does the PGO
>> steps outlined above -- the "representative input data" is the
>> regression tests running in serial mode. I haven't run any benchmarks
>> yet (if someone wants to try that, I'd be very curious to see the
>> results).

> I doubt that the regression tests are anywhere near representative input
> data. They run a proportion of borderline and error cases that is much
> higher than I would expect in normal use.

Also, the serial regression tests provide absolutely zero exercise for any of the code paths associated with concurrent behavior. At minimum I'd suggest using the parallel tests instead.

It might be interesting to compare the results from PGO using the regression tests to PGO using pgbench. pgbench probably goes overboard in the other direction of not exercising enough stuff, but it would give us some kind of data point about the consequences of different profiling loads.

			regards, tom lane
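For concreteness, a pgbench-based profiling run might look like this; the database name, scale factor, client count, and transaction count are illustrative, not recommendations from the thread:

    % createdb pgo_bench
    % pgbench -i -s 10 pgo_bench       # populate the test database at scale factor 10
    % pgbench -c 8 -t 1000 pgo_bench   # 8 concurrent clients, 1000 transactions each

Unlike the serial regression tests, a run like this at least exercises the concurrency-related code paths, though pgbench's TPC-B-style transaction mix is indeed narrow.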