Re: strict aliasing - Mailing list pgsql-hackers
From | Ants Aasma |
---|---|
Subject | Re: strict aliasing |
Date | |
Msg-id | CA+CSw_s=Pe3CiK-tfD9fpmVSBorCiV64WeRcma3dw6ZYnMv1CA@mail.gmail.com |
In response to | Re: strict aliasing ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses | Re: strict aliasing |
List | pgsql-hackers |
On Tue, Nov 15, 2011 at 9:02 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> From my reading, it appears that if we get safe code in terms of
> strict aliasing, we might be able to use the "restrict" keyword to
> get further optimizations which bring it to a net win, but I think
> there is currently lower-hanging fruit than monkeying with these
> compiler options. I'm letting this go, although I still favor the
> const-ifying which started this discussion, on the grounds of API
> clarity.

Speaking of lower-hanging fruit... I ran a series of tests to see how different optimization flags affect performance. I was particularly interested in what effect link time optimization has. The results are somewhat interesting.

Benchmark machine is my laptop, Intel Core i5 M 540 @ 2.53GHz. 2 cores + hyperthreading for a total of 4 threads. Ubuntu 11.10. Compiled with GCC 4.6.1-9ubuntu3.

I ran the pgbench read-only test with scale factor 10, default options except for shared_buffers = 256MB. The dataset fits fully in shared buffers.

I tried the following configurations:

default: plain old ./configure; make; make install
-O3: what it says on the label
lto: CFLAGS="-O3 -flto". This should do some global optimizations at link time.
PGO: compiled with CFLAGS="-O3 -fprofile-generate", then ran pgbench -T 30 on a
  scale factor 100 database (I/O-bound read-write load to mix the profile up a bit). Then did
  # sed -i s/-fprofile-generate/-fprofile-use/ src/Makefile.global
  and recompiled and installed.
lto + PGO: same as previous, but with added -flto.

Median tps of 3 5-minute runs at different concurrency levels:

 -c   default       -O3       lto       PGO  lto + PGO
======================================================
  1   6753.40   6689.76   6498.37   6614.73    5918.65
  2  11600.87  11659.33  12074.63  12957.81   13353.54
  4  18852.86  18918.32  19008.89  20006.49   20652.93
  8  15232.30  15762.70  14568.06  15880.19   16091.24
 16  15693.93  15625.87  16563.91  17088.28   18223.02

Percentage increase from default flags:

 -c   default       -O3       lto       PGO  lto + PGO
======================================================
  1   6753.40    -0.94%    -3.78%    -2.05%    -12.36%
  2  11600.87     0.50%     4.08%    11.70%     15.11%
  4  18852.86     0.35%     0.83%     6.12%      9.55%
  8  15232.30     3.48%    -4.36%     4.25%      5.64%
 16  15693.93    -0.43%     5.54%     8.88%     16.12%

Concurrency 8 results should probably be ignored - variance was huge, definitely more than the differences. For other results, variance was ~1%.

I don't know what to make of the single client results, why they seem to be going in the opposite direction of all other results. Other than that both profile guided optimization and link time optimization give a pretty respectable boost.

If anyone can suggest some more diverse workloads to test, I could try to see if the PGO results persist when profiling and benchmark loads differ more. These results suggest that giving the compiler information about hot and cold paths results in a significant improvement in generated code quality.

--
Ants Aasma
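
For anyone wanting to check the arithmetic, the percentage figures can be recomputed from the median tps numbers. The following is only a minimal sketch in Python, not part of the original test setup; the names CONFIGS, MEDIAN_TPS, percent_change and changes_vs_default are made up for illustration, and the data is copied from the median tps table in the message.

```python
# Median tps copied from the message, keyed by pgbench client count (-c).
# Column order: default, -O3, lto, PGO, lto + PGO.
# All names here are hypothetical helpers, not part of any PostgreSQL tool.
CONFIGS = ["default", "-O3", "lto", "PGO", "lto + PGO"]
MEDIAN_TPS = {
    1: [6753.40, 6689.76, 6498.37, 6614.73, 5918.65],
    2: [11600.87, 11659.33, 12074.63, 12957.81, 13353.54],
    4: [18852.86, 18918.32, 19008.89, 20006.49, 20652.93],
    8: [15232.30, 15762.70, 14568.06, 15880.19, 16091.24],
    16: [15693.93, 15625.87, 16563.91, 17088.28, 18223.02],
}


def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def changes_vs_default(tps_by_clients):
    """Map each client count to {config name: percent change vs. default}."""
    result = {}
    for clients, row in tps_by_clients.items():
        baseline = row[0]  # the "default" column is the reference
        result[clients] = {
            name: percent_change(baseline, tps)
            for name, tps in zip(CONFIGS[1:], row[1:])
        }
    return result


if __name__ == "__main__":
    for clients, row in changes_vs_default(MEDIAN_TPS).items():
        cells = "  ".join(f"{name}: {pct:+.2f}%" for name, pct in row.items())
        print(f"-c {clients:>2}  {cells}")
```

Each value is (tps - default tps) / default tps, so a negative number means the configuration was slower than the default build at that client count.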