Well, I feel like a fool, because I failed to notice that the total
runtime shown in that profile wasn't anywhere close to the actual wall
clock time. gprof is indeed simply not counting the time spent in
dynamically-linked code. With tsearch2 statically linked into the
backend, a more believable picture emerges:
% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
98.96 1495.93 1495.93 33035195 0.00 0.00 hemdistsign
0.27 1500.01 4.08 10030581 0.00 0.00 makesign
0.11 1501.74 1.73 588976 0.00 0.00 gistchoose
0.10 1503.32 1.58 683471 0.00 0.00 XLogInsert
0.05 1504.15 0.83 246579 0.00 0.00 sizebitvec
0.05 1504.93 0.78 446399 0.00 0.00 gtsvector_union
0.03 1505.45 0.52 3576475 0.00 0.00 LWLockRelease
0.03 1505.92 0.47 1649 0.00 0.00 gtsvector_picksplit
0.03 1506.38 0.47 3572572 0.00 0.00 LWLockAcquire
0.02 1506.74 0.36 444817 0.00 0.00 gtsvector_same
0.02 1507.09 0.35 4077089 0.00 0.00 AllocSetAlloc
0.02 1507.37 0.28 236984 0.00 0.00 gistdoinsert
0.02 1507.63 0.26 874195 0.00 0.00 hash_search
0.02 1507.89 0.26 9762101 0.00 0.00 gtsvector_penalty
0.01 1508.08 0.19 236984 0.00 0.00 gistmakedeal
0.01 1508.27 0.19 841754 0.00 0.00 UnpinBuffer
0.01 1508.45 0.18 22985469 0.00 0.00 hemdistcache
0.01 1508.63 0.18 3998572 0.00 0.00 LockBuffer
0.01 1508.81 0.18 686681 0.00 0.00 gtsvector_compress
0.01 1508.98 0.17 11514275 0.00 0.00 gistdentryinit
So we gotta fix hemdistsign ...
regards, tom lane