Thread: Promising results with Intel Linux x86 compiler

Promising results with Intel Linux x86 compiler

From
Kyle
Date:
I've been playing around with Intel's x86 C++ compiler (icc) for
linux.  The compiler is very good for optimizing x86 code.  With some
struggle, I managed to get postgresql compiled with it.  I've listed
below what I had to do to get postgres compiled, along with some
results from pgbench.


Compilation

icc can compile C code as well as C++.  WRT C, icc is binary
compatible with gcc.  It aims to be compatible with gcc extensions,
but has a ways to go.

Note: I only compiled the backend with icc.  Targets such as psql/
pgbench were compiled with gcc.

* The first problem I encountered was that icc couldn't produce the
SUBSYS.o targets for the backend.  The workaround was to ignore the
SUBSYS.o targets and link each individual object file directly.

* Linking doesn't appear to work with icc's "-ipo" optimization.  The
goal of ipo is to perform inlining of functions between source files.
This is a bummer, since -ipo can produce very good code.

* Since icc can't handle inline assembly, a number of files need to be
compiled with gcc.  These include:
    access/transam/xlog.c
    storage/ipc/shmem.c
    storage/lmgr/proc.c
    storage/lmgr/lwlock.c
    storage/lmgr/s_lock.c
    utils/adt/pg_lzcompress.c
Hopefully intel will add inline assembly to icc....
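
For readers wondering what kind of code trips icc here: the files above depend on gcc-style inline assembly, most visibly the test-and-set primitive behind the spinlocks.  The following is only a sketch loosely modelled on that idea, not the actual PostgreSQL source.  The names toy_slock_t and toy_tas are made up for illustration, and the non-x86 fallback uses a GCC/Clang builtin that is an assumption of this sketch.

```c
/* Hypothetical lock type: one byte, 0 = free, 1 = held. */
typedef unsigned char toy_slock_t;

/*
 * Atomically set *lock to 1 and return its previous value.
 * A return of 0 means the caller acquired the lock.
 */
static inline int
toy_tas(volatile toy_slock_t *lock)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    toy_slock_t res = 1;

    /* gcc extended asm: swap res with *lock in one atomic instruction. */
    __asm__ __volatile__(
        "lock; xchgb %0, %1"
        : "+q"(res), "+m"(*lock)
        :
        : "memory");
    return (int) res;
#else
    /* Fallback for other targets, using a GCC/Clang builtin. */
    return (int) __sync_lock_test_and_set(lock, 1);
#endif
}
```

A compiler that does not understand the __asm__ block cannot build a file containing something like this, which is why those files had to go through gcc.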

* I used the gcc frontend to ld instead of using icc directly.  ld
seems to hang (3+ hrs cpu time, no output) when invoked from icc on
postgresql for some reason.


Results:

I produced two different backend executables.  One with gcc and one
with icc.  I ran each executable on the same database and benchmarked
with pgbench.

(1) gcc 2.96 (yeah, RedHat 7.1) options:
        -Wall -O3 -fomit-frame-pointer -fforce-addr -fforce-mem
        -funroll-loops -malign-loops=2 -malign-functions=2 -malign-jumps=2
(2) icc 5.0 options:
        -O3 -tpp6 -xK -unroll -ip

                     pgbench{i}

         [-t 5000] [-t 500 -c 10] [-t 200 -c 25]  [-t 25 -c 50]
(1)gcc      94.06       94.74          86.77           149.38
(2)icc     102.08      100.31          91.10           155.15

{i}: results are tps excluding connection establishing, average of 3 runs.
     A full vacuum/analyze was performed between runs.
 

The results indicate a roughly 4% to 9% increase in transactions per
second for pgbench.  I've seen improvement more like 20% on some very
cpu intensive programs (e.g. the lame mp3 encoder).
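
The per-column gain can be worked out directly from the tps table.  The small sketch below does that arithmetic.  The function and array names are made up for illustration; the numbers are the ones reported above.  It gives roughly 8.5%, 5.9%, 5.0% and 3.9% for the four pgbench configurations.

```c
#include <stdio.h>

/* tps figures copied from the table above, in column order. */
static const double toy_gcc_tps[4] = {94.06, 94.74, 86.77, 149.38};
static const double toy_icc_tps[4] = {102.08, 100.31, 91.10, 155.15};

/* Relative gain of new_tps over baseline_tps, in percent. */
static double
toy_gain_percent(double baseline_tps, double new_tps)
{
    return (new_tps / baseline_tps - 1.0) * 100.0;
}

/* Print one line per pgbench configuration. */
static void
toy_print_gains(void)
{
    for (int i = 0; i < 4; i++)
        printf("column %d: %.2f%%\n", i + 1,
               toy_gain_percent(toy_gcc_tps[i], toy_icc_tps[i]));
}
```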

If there are some other benchmarks easily run let me know and I'll
give them a go.

Side note: it seems difficult to get consistent results out of
pgbench.  I ended up dropping/recreating/repopulating the database
between runs.  I also modified pgbench to have a constant seed to the
random number generator (attempting to get more consistent results).
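
As a rough illustration of the constant-seed change (this is not the actual pgbench patch, and the names below are hypothetical), seeding the generator with a fixed value instead of something like the current time makes every run pick the same sequence of accounts:

```c
#include <stdlib.h>

/* Arbitrary constant, chosen only for this sketch. */
#define TOY_FIXED_SEED 12345u

/*
 * Fill out[0..n-1] with pseudo-random account ids in [1, naccounts].
 * Calling this twice with the same seed yields the same ids, so two
 * benchmark runs touch the same rows in the same order.
 */
static void
toy_pick_accounts(unsigned int seed, int naccounts, int *out, int n)
{
    srand(seed);
    for (int i = 0; i < n; i++)
        out[i] = rand() % naccounts + 1;
}
```

This only removes the variation that comes from the random account choice; it does nothing about table bloat or caching effects between runs.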


Conclusion

The Intel compiler appears to produce code better than gcc 2.96 when
testing with pgbench.  My experience has been that icc excels at
cpu-intensive processes, which might not be reflected in the pgbench
results.  Since postgresql can require lots of disk I/O, performance
versus gcc will not be significant on processes already I/O bound.

The build process currently requires lots of hand tweaking and isn't
entirely possible without gcc.  Future versions of icc should improve
upon this.  As such, it may currently be impractical to give postgres
support for icc out of the box, but it should be doable if there is
interest.

My understanding is that the intel evaluation license is ok for
hobbyists, but without purchasing an actual license ($500) the
compiled code cannot be distributed.

link: http://www.intel.com/software/products/compilers/c50/linux/noncom.htm


Regards,
Kyle
kaf@_nwlink_._com_


Re: Promising results with Intel Linux x86 compiler

From
Tom Lane
Date:
Kyle <kaf@nwlink.com> writes:
> * Linking doesn't appear to work with icc's "-ipo" optimization.  The
> goal of ipo is to perform inlining of functions between source files.
> This is a bummer, since -ipo can produce very good code.

You should be quite wary of that one.

The reason is that accesses to shared memory are typically protected by
LWLockAcquire/LWLockRelease call pairs.  It's absolutely critical that
no operations get relocated into or out of the code segments between
such call pairs.  With interprocedural optimizations turned on, I think
it's quite likely for a compiler to blow this --- which would lead to
extremely nasty, low-probability, hard-to-debug failures during
concurrent operation.

Having recently tracked down some similar nastiness *within*
LWLockAcquire (AIX's compiler feels no compunction about rearranging
volatile-object operations w.r.t. non-volatile ones) the prospect of
any compiler deciding to interleave LWLockAcquire/LWLockRelease code
with calling code scares me to death.

AFAIK the only way we could prevent such problems is for *all* pointers
to shared memory to be marked volatile --- which would doubtless blow a
good proportion of the speedup one might otherwise hope to get.  Within
an LWLockAcquire'd segment, shared memory is *not* volatile and we don't
want to completely defeat optimization of routines such as the lock and
buffer managers.

Possibly you could avoid the issue by arranging for lwlock.c to be
compiled at a lower optimization level that doesn't expose its routines
for merging with callers.
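
To make the shape of the hazard concrete, here is a toy sketch.  It is not PostgreSQL code: ToyShared, toy_acquire and toy_release are hypothetical, and the lock is built on GCC/Clang __sync builtins purely as an assumption of this example.  Those builtins are documented as acquire and release barriers, so the compiler keeps the protected accesses between them even when everything is inlined.  A lock written as ordinary C stores with no barrier carries no such guarantee once the compiler can see into it, which is the situation interprocedural optimization creates.

```c
/* Hypothetical stand-in for a structure living in shared memory. */
typedef struct
{
    volatile int lock;      /* 0 = free, 1 = held */
    long         counter;   /* protected by lock; deliberately not volatile */
} ToyShared;

/* Spin until the lock is obtained.  The builtin is an acquire barrier. */
static void
toy_acquire(ToyShared *s)
{
    while (__sync_lock_test_and_set(&s->lock, 1))
        ;                   /* busy-wait */
}

/* Give the lock back.  The builtin is a release barrier. */
static void
toy_release(ToyShared *s)
{
    __sync_lock_release(&s->lock);
}

/*
 * Inside the locked region the compiler may keep counter in a register
 * and merge the two increments, which is the optimization one wants to
 * keep.  What must never happen is a load or store of counter moving
 * above toy_acquire() or below toy_release().
 */
static long
toy_bump_twice(ToyShared *s)
{
    long result;

    toy_acquire(s);
    s->counter++;
    s->counter++;
    result = s->counter;
    toy_release(s);
    return result;
}
```

Marking counter itself volatile would also pin the accesses in place, but it would force a separate load and store for each increment, which is the cost described above.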

> Side note: it seems difficult to get consistent results out of
> pgbench.

Yeah, I've noticed that too.  You really have to do a complete vacuum
between runs to get any semblance of stable results.
        regards, tom lane


Re: Promising results with Intel Linux x86 compiler

From
Justin Clift
Date:
Hi Kyle,

Would you like to try the icc optimised version of PostgreSQL with the
OSDB (Open Source Database Benchmark)?

It's based on the AS3AP database benchmark, which I feel is a lot more
recognised than pgbench.

Its URL is http://osdb.sourceforge.net

The latest released version (0.12) has a problem with hash indexes in
PostgreSQL (a PostgreSQL bug which Neil Conway has put up his hand to
fix), but the latest CVS commit of OSDB has a workaround for that.

*If* you don't mind downloading the latest CVS version (it's not a real
big program) and compiling that, it would be interesting to see the
throughput differences between the gcc compiled and icc compiled
versions of PostgreSQL.

If you need the dataset generation utility for OSDB, I have that too. 
Just ask me for it and I'll email it to you.  It's a DOS executable, but
runs fine with Wine (the Windows emulator).

:-)

Regards and best wishes,

Justin Clift


-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."    - Indira Gandhi