Re: strict aliasing - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: strict aliasing
Date
Msg-id CA+CSw_s=Pe3CiK-tfD9fpmVSBorCiV64WeRcma3dw6ZYnMv1CA@mail.gmail.com
In response to Re: strict aliasing  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: strict aliasing
List pgsql-hackers
On Tue, Nov 15, 2011 at 9:02 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> From my reading, it appears that if we get safe code in terms of
> strict aliasing, we might be able to use the "restrict" keyword to
> get further optimizations which bring it to a net win, but I think
> there is currently lower-hanging fruit than monkeying with these
> compiler options.  I'm letting this go, although I still favor the
> const-ifying which started this discussion, on the grounds of API
> clarity.

Speaking of lower-hanging fruit...

I ran a series of tests to see how different optimization flags
affect performance. I was particularly interested in what effect
link time optimization has. The results are somewhat interesting.

Benchmark machine is my laptop, Intel Core i5 M 540 @ 2.53GHz.
2 cores + hyperthreading for a total of 4 threads. Ubuntu 11.10.
Compiled with GCC 4.6.1-9ubuntu3.

I ran the pgbench read-only test with scale factor 10 and default
options, except for shared_buffers = 256MB. The dataset fits fully
in shared buffers.

I tried the following configurations:

default: plain old ./configure; make; make install
-O3: what it says on the label
lto: CFLAGS="-O3 -flto". This should do some global optimizations
    at link time.
PGO: compiled with CFLAGS="-O3 -fprofile-generate", then ran
    pgbench -T 30 on a scale factor 100 database (IO bound rw load
    to mix the profile up a bit). Then did
    # sed -i s/-fprofile-generate/-fprofile-use/ src/Makefile.global
    and recompiled and installed.
lto + PGO: same as previous, but with added -flto.

Median tps of three 5-minute runs at different concurrency levels:

-c   default       -O3       lto       PGO   lto + PGO
======================================================
 1   6753.40   6689.76   6498.37   6614.73     5918.65
 2  11600.87  11659.33  12074.63  12957.81    13353.54
 4  18852.86  18918.32  19008.89  20006.49    20652.93
 8  15232.30  15762.70  14568.06  15880.19    16091.24
16  15693.93  15625.87  16563.91  17088.28    18223.02

Percentage increase from default flags:

-c   default       -O3       lto       PGO   lto + PGO
======================================================
 1   6753.40    -0.94%    -3.78%    -2.05%     -12.36%
 2  11600.87     0.50%     4.08%    11.70%      15.11%
 4  18852.86     0.35%     0.83%     6.12%       9.55%
 8  15232.30     3.48%    -4.36%     4.25%       5.64%
16  15693.93    -0.43%     5.54%     8.88%      16.12%
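
For anyone who wants to check the arithmetic, the percentages can be
recomputed from the median tps figures above. This is only a small
sketch: the tps values for the default and lto + PGO columns are copied
from the first table, while the names (pct_change, default_tps,
lto_pgo_tps) are made up for illustration and are not part of any
benchmark tooling.

```python
# Median tps per concurrency level, copied from the first table.
clients = [1, 2, 4, 8, 16]
default_tps = [6753.40, 11600.87, 18852.86, 15232.30, 15693.93]
lto_pgo_tps = [5918.65, 13353.54, 20652.93, 16091.24, 18223.02]


def pct_change(baseline, measured):
    """Relative change of measured over baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


# One line per concurrency level, signed so regressions stand out.
for c, base, new in zip(clients, default_tps, lto_pgo_tps):
    print(f"-c {c:2d}: {pct_change(base, new):+.2f}%")
```

The same function applies to the -O3, lto and PGO columns by swapping in
the corresponding row values.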

Concurrency 8 results should probably be ignored - variance was huge,
definitely more than the differences. For other results, variance was
~1%.
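
One rough way to express "variance larger than the differences" is to
compare the gap between two medians against the run-to-run spread. The
sketch below is an assumption about how such a check could look, not the
procedure used for the numbers above: it uses the max-min range as a
stand-in for variance, and the function names are hypothetical.

```python
from statistics import median


def summarize_runs(tps_runs):
    """Return (median tps, max-min spread as a percentage of the median)."""
    mid = median(tps_runs)
    spread_pct = (max(tps_runs) - min(tps_runs)) / mid * 100.0
    return mid, spread_pct


def difference_is_meaningful(baseline_runs, candidate_runs):
    """True only if the median difference exceeds the larger run-to-run spread."""
    base_mid, base_spread = summarize_runs(baseline_runs)
    cand_mid, cand_spread = summarize_runs(candidate_runs)
    diff_pct = abs(cand_mid - base_mid) / base_mid * 100.0
    return diff_pct > max(base_spread, cand_spread)
```

Each argument is the list of per-run tps values for one build at one
concurrency level. With only three runs per configuration this is a
coarse filter at best; more runs would be needed for a proper
significance test.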

I don't know what to make of the single-client results, or why they
go in the opposite direction from all the other results. Other than
that, both profile guided optimization and link time optimization give
a pretty respectable boost. If anyone can suggest some more diverse
workloads to test, I could try to see if the PGO results persist when
profiling and benchmark loads differ more. These results suggest that
giving the compiler information about hot and cold paths results in a
significant improvement in generated code quality.

--
Ants Aasma

