It looks like the workload here is bulk loading of data into unindexed tables. How good is that as a target for optimization? I can see several ways to make bulk loading into unindexed tables faster (all quite difficult to code and maintain), but they would not speed up the more general cases.
I played around a little with this: parallel bulk loads into an unindexed, very skinny table. If I hacked XLogInsert so that it did nothing but take the WALInsertLock, release it, and then return a fake RecPtr, it scaled better but still not very well. So giant leaps in throughput would need to involve calling XLogInsert less often (or at least taking the WALInsertLock less often). You could nibble around the edges by tweaking what happens under the WALInsertLock, but I don't think that will get you big wins for this case. But again, how important is this case? Are bulk loads into skinny unindexed tables the best test-bed for improving XLogInsert?
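To make the "take the lock less often" point concrete, here is a toy sketch. It is not PostgreSQL code: ToyLog, load_per_row, load_batched and run are hypothetical names, and the single Python lock only stands in for the WALInsertLock. It counts lock acquisitions rather than measuring throughput, so it shows only how batching changes the number of trips through the lock, under the assumption that several rows could be logged in one call.

```python
import threading


class ToyLog:
    """Toy stand-in for a WAL: one lock guards a shared insert position."""

    def __init__(self):
        self.lock = threading.Lock()
        self.position = 0           # next free byte in the toy log
        self.lock_acquisitions = 0  # how many times the lock was taken

    def insert(self, nbytes):
        """Reserve nbytes in the log and return the start position."""
        with self.lock:
            self.lock_acquisitions += 1
            start = self.position
            self.position += nbytes
        return start


def load_per_row(log, row_sizes):
    """One insert call (and one lock acquisition) per row."""
    for size in row_sizes:
        log.insert(size)


def load_batched(log, row_sizes, batch_rows):
    """One insert call per batch of rows (hypothetical batching scheme)."""
    for i in range(0, len(row_sizes), batch_rows):
        log.insert(sum(row_sizes[i:i + batch_rows]))


def run(loader, workers, *args):
    """Run the loader in several threads against a fresh ToyLog."""
    log = ToyLog()
    threads = [
        threading.Thread(target=loader, args=(log, *args))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


rows = [32] * 1000  # 1000 skinny rows of 32 bytes per worker
per_row = run(load_per_row, 4, rows)
batched = run(load_batched, 4, rows, 100)
print(per_row.lock_acquisitions, batched.lock_acquisitions)
```

With 4 workers, the per-row loader takes the lock 4000 times and the batched loader takes it 40 times, while both reserve the same 128000 bytes. Whether anything like this batching is practical inside the real insert path is exactly the open question above.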
(Sorry, I think I forgot to change the subject on my previous message. Digests are great if you only read, but for contributing I guess I have to switch to receiving each message individually.)