I think we need to come up with some benchmarking queries that get more work done per round trip to the database. They should be built into the binary, because people won't use them as much as they should if they have to pass "-f" files around on mailing lists and in blog postings. For example, we could enclose the 5 statements of the TPC-B-like transaction in a single function that takes aid, bid, tid, and delta as arguments. Presumably we could also drop the other two statements (BEGIN and COMMIT) and rely on autocommit to get that job done. So we could go from 7 statements to 1.
Here is an implementation of that. I've included the calling code as a patch to pgbench, because if I make it a separate -f file then it is a pain to get the correct scale and settings of naccounts, etc., into it.
The create script could be integrated into pgbench -i if this is something we might want to commit.
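For anyone reading without the attachment, a create script along these lines might look roughly as follows. This is only a sketch: the function name pgbench_transaction and the arg_ prefixed parameter names are illustrative assumptions and may not match the attached patch. The five statements are the ones from the standard TPC-B-like script, run against the usual pgbench_* tables.

```sql
-- Hypothetical name and signature; the attached patch may differ.
CREATE OR REPLACE FUNCTION pgbench_transaction(
    arg_aid int, arg_bid int, arg_tid int, arg_delta int)
RETURNS int AS $$
DECLARE
    abal int;  -- account balance read back after the update
BEGIN
    -- The five statements of the TPC-B-like transaction, in order.
    UPDATE pgbench_accounts SET abalance = abalance + arg_delta
        WHERE aid = arg_aid;
    SELECT abalance INTO abal FROM pgbench_accounts
        WHERE aid = arg_aid;
    UPDATE pgbench_tellers SET tbalance = tbalance + arg_delta
        WHERE tid = arg_tid;
    UPDATE pgbench_branches SET bbalance = bbalance + arg_delta
        WHERE bid = arg_bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (arg_tid, arg_bid, arg_aid, arg_delta, CURRENT_TIMESTAMP);
    RETURN abal;
END;
$$ LANGUAGE plpgsql;
```

With a function like this in place, the per-transaction work sent by the client would shrink to a single statement of the form SELECT pgbench_transaction(:aid, :bid, :tid, :delta); run under autocommit, so the function call commits as one transaction without an explicit BEGIN and COMMIT.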
The patch gives me an almost 3-fold increase in performance on a system with fsync turned off: