Well folks, I've been trying to track down why this Athlon 2800
(2.1GHz) has been handing my 2.5GHz G5 its cake. I have a query (it
does no IO - the dataset fits easily in RAM) that takes about 700ms
on the Athlon and about 10 seconds on the G5.
Tracking it down a bit, timestamp_cmp_internal (the btree was made of
a timestamp and an int) was taking a large amount of time -
specifically, all the calls it makes to isnan(x). 14.1% is in
__isnand (the libSystem implementation, which according to the darwin
source copies the double to memory and accesses it as 2 ints, looking
for a specific bit pattern). (For reference, the other top functions
are _bt_checkkeys at 30%, FunctionCall2 at 15.8%, _bt_step at 9% and
_bt_first at 7%.)
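For the curious, the store-and-inspect test described above can be sketched roughly like this. This is my own reconstruction, not the actual darwin source - it reads the bits back as one 64-bit word rather than 2 ints, but the idea is the same: a double is NaN when its exponent bits are all ones and its mantissa is nonzero.

```c
#include <stdint.h>
#include <string.h>

/* Rough sketch of a store-and-inspect NaN test (reconstruction, not
 * the darwin code): copy the double's bits to an integer and check
 * for exponent all-ones with a nonzero mantissa. */
static int isnan_by_bits(double x)
{
    uint64_t bits;

    /* spill the double to memory and reinterpret its bit pattern */
    memcpy(&bits, &x, sizeof bits);

    return (bits & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL
        && (bits & 0x000fffffffffffffULL) != 0;
}
```

Note that the round-trip through memory is exactly the sort of thing that hurts on a deeply pipelined chip like the G5.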
Talking to some of the mac super gurus on irc, they said the problem
is how the Mach-O ABI works: basically, you get kicked in the nuts
for accessing global or static data (like those constants __isnand
uses). (You can read http://www.unsanity.org/archives/000044.php for
a touch of info on it.)
I think the function-call-rich architecture of PG may keep its
performance on OSX permanently lower than on other platforms -
especially with things like that __isnand call.
I'm going to run a couple of experiments: 1. making an inline version
of isnan to see how much that improves performance, and 2. trying it
out on Linux PPC to see how it runs there. It may be worth noting
these in the docs or FAQ somewhere.
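For experiment 1, one candidate (just a sketch - I haven't tried it on the G5 yet) avoids the stored constants entirely, using the fact that NaN is the only IEEE 754 value that compares unequal to itself:

```c
/* Inline NaN test: NaN is the only value that compares unequal to
 * itself, so no stored constants (and thus no Mach-O indirect data
 * access) are required. */
static inline int isnan_inline(double x)
{
    return x != x;
}
```

One thing to check is whether aggressive float options keep this honest - -ffast-math, for instance, assumes NaN never occurs and would break it.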
Also, two things to note, one of which is quite important: on Tiger
(10.4), PG compiles with NO OPTIMIZATION. Probably a template file
needs to be updated. Panther seems to compile with -O2, though.
If you want to profile PG on Tiger, do not use gprof - it seems to be
broken. I get function call counts, but no timing data. Instead you
can do something even better - compile PG normally, attach to it with
Shark (comes with the CHUD tools), and check out its profile. Quite
slick, actually :)
I'll keep people updated on my progress, but I just wanted to get
these issues out in the air.
Jeff Trout <>