Thread: Profile-guided optimization with GCC
Profile-guided optimization (PGO) is a relatively new GCC feature that improves the quality of generated code by:

- compiling a copy of the source program with some profiling hooks
- running this copy of the program on some representative input data
- recompiling the program using the profiling data produced by the previous stage; the profiling data lets GCC's optimizer generate more efficient code

I think it would be cool to add support for PGO to PostgreSQL's build system (for 8.1). There are a lot of situations where PostgreSQL is compiled once and then used for weeks or months (compilation for inclusion in a distro being the extreme example). In that kind of situation, trading some additional compile time for even a small improvement in run-time performance is worthwhile, IMHO.

I've attached a proof-of-concept patch that implements this. Caveats:

- you'll need to re-run autoconf
- the libgcov.a stuff is a temporary hack; you may need to adjust it for where libgcov.a is on your system
- I've only bothered adding support for GCC 3.4 (IIRC profile-guided optimization was introduced in GCC 3.3, but 3.4 adds a simpler interface to using it); by the time 8.1 is out, I think GCC 3.4+ will be pretty prevalent anyway
- the patch should remove the .gcda and .gcno files that are produced by GCC; I haven't done that yet

The patch adds a new make target ("profile-opt") that does the PGO steps outlined above -- the "representative input data" is the regression tests running in serial mode. I haven't run any benchmarks yet (if someone wants to try that, I'd be very curious to see the results).

Comments?

-Neil
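For reference, the three stages map onto GCC 3.4's simplified interface roughly as follows. This is a minimal sketch against a single-file program; -fprofile-generate and -fprofile-use are the GCC 3.4 flags in question, while the file names and input are illustrative, not taken from the patch:

    % gcc -O2 -fprofile-generate -o pgm pgm.c   # stage 1: build with profiling hooks
    % ./pgm < representative-input              # stage 2: run; writes pgm.gcda profile data
    % gcc -O2 -fprofile-use -o pgm pgm.c        # stage 3: recompile using the recorded profile

In the patch, stage 2 corresponds to running the serial regression tests, and the libgcov.a adjustment is presumably needed because the instrumented objects must link against GCC's profiling support library.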
Neil Conway wrote:
> The patch adds a new make target ("profile-opt") that does the PGO
> steps outlined above -- the "representative input data" is the
> regression tests running in serial mode. I haven't run any benchmarks
> yet (if someone wants to try that, I'd be very curious to see the
> results).

I doubt that the regression tests are anywhere near representative input data. They run a proportion of borderline and error cases that is much higher than I would expect in normal use.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2004-09-30 at 19:49, Peter Eisentraut wrote:
> I doubt that the regression tests are anywhere near representative input
> data. They run a proportion of borderline and error cases that is much
> higher than I would expect in normal use.

That's definitely true. At first glance, though, the regression tests don't seem to be *too* badly skewed:

[src/test/regress/expected]% grep ERROR *.out | wc -l
867
[src/test/regress/expected]% grep -i "^SELECT" *.out | wc -l
2924
[src/test/regress/expected]% grep -i "^INSERT" *.out | wc -l
2714
[src/test/regress/expected]% grep -i "^UPDATE" *.out | wc -l
122
[src/test/regress/expected]% grep -i "^DELETE" *.out | wc -l
110
[src/test/regress/expected]% grep -i "^CREATE" *.out | wc -l
848
[src/test/regress/expected]% grep -i "^COPY" *.out | wc -l
46

I guess it depends on how closely the test data needs to match "normal" input data for the gcc optimizer to be able to make valid decisions. My intuition is that the regression tests are sufficiently close to normal input that it won't be an issue, but I'm not sure.

-Neil
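The same counts can be gathered in one pass; this is just a convenience loop over the statement types grepped for above, not part of the patch:

    % for kw in SELECT INSERT UPDATE DELETE CREATE COPY; do
    >     echo "$kw: $(grep -i "^$kw" *.out | wc -l)"    # lines starting with the keyword
    > done
    % echo "ERROR: $(grep ERROR *.out | wc -l)"          # ERROR anywhere in the line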
On Thu, Sep 30, 2004 at 07:07:27PM +1000, Neil Conway wrote:
> I think it would be cool to add support for PGO to PostgreSQL's build
> system (for 8.1). There are a lot of situations where PostgreSQL is
> compiled once, and then used for weeks or months (compilations for
> inclusion in a distro being the extreme example). In that kind of
> situation, trading some additional compile-time for even a small
> improvement in run-time performance is worthwhile, IMHO.

It's some time ago now, but a group at the Universitat Politecnica de Catalunya (including my thesis advisor Alex Ramirez and our databases specialist Josep Larriba-Pey) did research on something similar, using PostgreSQL as their case study. What they investigated was a code-reordering algorithm that minimized both taken branches and I-cache conflicts. The scheme was quite aggressive, even going so far as to colocate code in some functions with code in their most frequent callers.

The study also includes a characterization of the I-miss and execution intensities of the backend, in a neat matrix with the major functions on one axis and the stage from which they're invoked on the other. The paper may be enlightening. Just a moment while I google for it...

...Got it. Here's the paper:
http://research.ac.upc.es/CAP/hpc/Papers/1999/aramirez1999aC.pdf

And here's the Citeseer entry:
http://citeseer.ist.psu.edu/context/163268/0

Jeroen
Peter Eisentraut <peter_e@gmx.net> writes:
> Neil Conway wrote:
>> The patch adds a new make target ("profile-opt") that does the PGO
>> steps outlined above -- the "representative input data" is the
>> regression tests running in serial mode. I haven't run any benchmarks
>> yet (if someone wants to try that, I'd be very curious to see the
>> results).

> I doubt that the regression tests are anywhere near representative input
> data. They run a proportion of borderline and error cases that is much
> higher than I would expect in normal use.

Also, the serial regression tests provide absolutely zero exercise for any of the code paths associated with concurrent behavior. At minimum I'd suggest using the parallel tests instead.

It might be interesting to compare the results from PGO using the regression tests to PGO using pgbench. pgbench probably goes overboard in the other direction of not exercising enough stuff, but it would give us some kind of data point about the consequences of different profiling loads.

			regards, tom lane
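For concreteness, a pgbench-based profiling run might look like this; the database name, scale factor, client count, and transaction count are illustrative, not recommendations from the thread:

    % createdb pgo_bench
    % pgbench -i -s 10 pgo_bench       # populate the test database at scale factor 10
    % pgbench -c 8 -t 1000 pgo_bench   # 8 concurrent clients, 1000 transactions each

Unlike the serial regression tests, a run like this at least exercises the concurrency-related code paths, though pgbench's TPC-B-style transaction mix is indeed narrow.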