I'm currently playing with Andrew's parallel restore patch, and it seems
that pg is far from taking advantage of the hardware I have for testing
(dual quad-core Xeon attached to a NetApp with 68 spindles). With a
concurrency of 4, I see iowait hovering around 1%, CPU load peaking at
20%, and around 150000 context switches/s. The load seems to progress at
around the same rate as a single-backend restore does on the same box. A
profile during the load (a fair number of tables sized ~10-30M rows
each) looks fairly similar to:

samples   %        symbol name
1933314   21.8884  LWLockAcquire
1677808   18.9957  XLogInsert
848227     9.6034  LWLockRelease
414179     4.6892  DoCopy
332633     3.7660  CopyReadLine
266580     3.0181  UnpinBuffer
221693     2.5099  heap_formtuple
176939     2.0033  .plt
171842     1.9455  PinBuffer
160470     1.8168  GetNewObjectId
154095     1.7446  heap_insert
151813     1.7188  s_lock
117849     1.3343  LockBuffer
109530     1.2401  hash_search_with_hash_value
102169     1.1567  PageAddItem
91151      1.0320  pg_verify_mbstr_len
82538      0.9345  CopyGetData
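
For a rough reading of the numbers above, here is a small Python sketch
that tallies how much of the sampled time falls into lock-related symbols
versus XLogInsert. It was not part of the measurement itself; the grouping
of LWLockAcquire, LWLockRelease, s_lock and LockBuffer as "lock-related"
is an assumption made only for this tally, and the helper names are
hypothetical.

```python
# Profile rows copied from the listing above: samples, percent, symbol.
PROFILE = """\
1933314 21.8884 LWLockAcquire
1677808 18.9957 XLogInsert
848227 9.6034 LWLockRelease
414179 4.6892 DoCopy
332633 3.7660 CopyReadLine
266580 3.0181 UnpinBuffer
221693 2.5099 heap_formtuple
176939 2.0033 .plt
171842 1.9455 PinBuffer
160470 1.8168 GetNewObjectId
154095 1.7446 heap_insert
151813 1.7188 s_lock
117849 1.3343 LockBuffer
109530 1.2401 hash_search_with_hash_value
102169 1.1567 PageAddItem
91151 1.0320 pg_verify_mbstr_len
82538 0.9345 CopyGetData
"""

# Assumption: which symbols count as "lock-related" for this tally.
LOCK_SYMBOLS = {"LWLockAcquire", "LWLockRelease", "s_lock", "LockBuffer"}


def parse_profile(text):
    """Parse 'samples percent symbol' lines into a list of tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        samples, percent, symbol = parts
        rows.append((int(samples), float(percent), symbol))
    return rows


def share(rows, symbols):
    """Sum the percent column over the given set of symbols."""
    return sum(percent for _, percent, symbol in rows if symbol in symbols)


rows = parse_profile(PROFILE)
lock_share = share(rows, LOCK_SYMBOLS)
xlog_share = share(rows, {"XLogInsert"})
print(f"lock-related: {lock_share:.1f}%  XLogInsert: {xlog_share:.1f}%")
```

By that grouping, roughly a third of the samples land in lock
acquire/release paths and close to another fifth in XLogInsert.
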
Using --truncate-before-load seems to help a bit, but the restore still
seems to only barely utilize the available resources.
Stefan