I did some profiling of pgbench -j 36 -c 36 -T 500 banging on my
two-core desktop box - with synchronous_commit turned off to keep the
fsyncs from dominating the profile - and got these results:

samples  %       image name      symbol name
29634    4.7124  postgres        base_yyparse
27906    4.4377  postgres        AllocSetAlloc
17751    2.8228  postgres        SearchCatCache
13029    2.0719  postgres        hash_search_with_hash_value
11253    1.7895  postgres        core_yylex
9957     1.5834  libc-2.11.2.so  memcmp
9237     1.4689  libc-2.11.2.so  __strcmp_sse2
8628     1.3720  postgres        ScanKeywordLookup
7984     1.2696  postgres        GetSnapshotData
7144     1.1361  postgres        MemoryContextAllocZeroAligned
6898     1.0969  postgres        XLogInsert
6440     1.0241  postgres        LWLockAcquire

Full report and call graph attached. The opannotate results are 2MB
bzip'd, so I'm not attaching those; send me an email off-list if you
want them.

I'd like to get access to a box with (a lot) more cores, to see
whether the lock stuff moves up in the profile. A big chunk of that
hash_search_with_hash_value overhead is coming from
LockAcquireExtended. The __strcmp_sse2 is almost entirely parsing
overhead. In general, I'm not sure there's much hope for reducing the
parsing overhead, although ScanKeywordLookup() can certainly be done
better. XLogInsert() is spending a lot of time doing CRCs.
LWLockAcquire() is dropping cycles in many different places.
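
As a rough illustration of one direction the keyword lookup could take,
here is a minimal sketch of a hash-based lookup that needs one hash
computation and usually a single memcmp per word, instead of a series of
string comparisons. Everything in it is an assumption made for
illustration: the keyword list is a tiny hypothetical subset, the names
demo_keywords and demo_keyword_lookup are invented, and the FNV-1a hash
is just a convenient choice. It is not how ScanKeywordLookup() works, and
whether it would actually be faster would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical keyword subset for illustration; NOT PostgreSQL's real list. */
static const char *const demo_keywords[] = {
    "begin", "commit", "delete", "from", "insert", "into",
    "select", "set", "update", "values", "where"
};
#define N_KEYWORDS (sizeof(demo_keywords) / sizeof(demo_keywords[0]))
#define TABLE_SIZE 32           /* power of two, well above N_KEYWORDS */

static int16_t slots[TABLE_SIZE];   /* keyword index, or -1 if empty */
static int table_ready = 0;

/* FNV-1a over len bytes (an arbitrary choice of hash for this sketch). */
static uint32_t hash_bytes(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char) s[i];
        h *= 16777619u;
    }
    return h;
}

/* Fill the open-addressing table once, using linear probing. */
static void build_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        slots[i] = -1;
    for (size_t k = 0; k < N_KEYWORDS; k++) {
        const char *kw = demo_keywords[k];
        uint32_t pos = hash_bytes(kw, strlen(kw)) & (TABLE_SIZE - 1);
        while (slots[pos] != -1)
            pos = (pos + 1) & (TABLE_SIZE - 1);
        slots[pos] = (int16_t) k;
    }
    table_ready = 1;
}

/*
 * Return the index of word in demo_keywords, or -1 if it is not a keyword.
 * The caller is assumed to have lower-cased the word already.
 */
int demo_keyword_lookup(const char *word, size_t len)
{
    assert(word != NULL);
    if (!table_ready)
        build_table();
    uint32_t pos = hash_bytes(word, len) & (TABLE_SIZE - 1);
    while (slots[pos] != -1) {
        const char *kw = demo_keywords[slots[pos]];
        if (strlen(kw) == len && memcmp(kw, word, len) == 0)
            return slots[pos];
        pos = (pos + 1) & (TABLE_SIZE - 1);
    }
    return -1;                  /* hit an empty slot: not a keyword */
}
```

With this table, demo_keyword_lookup("select", 6) returns the index of
"select" in demo_keywords, and a non-keyword such as "pgbench_accounts"
returns -1 once the probe reaches an empty slot. A production version
would more likely use a precomputed perfect hash so that no table has to
be built at startup.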
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company