Well, the results are in, and at least in this particular case the
conventional wisdom is overturned. The benefit is not huge (roughly 16%
more tps), but throughput is definitely higher with HT enabled and
nthreads >> ncores (48 client threads on 6 cores):
HT off:
bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 2435711
tps = 4058.667332 (including connections establishing)
tps = 4058.796309 (excluding connections establishing)
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
           52.50    0.00   14.79    5.07    0.00   27.64
Device:  rrqm/s   wrqm/s    r/s       w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00  5700.30   0.10  13843.50   0.00  74.78     11.06     48.46   3.50   0.05  65.21
HT on:
bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 48
number of threads: 48
duration: 600 s
number of transactions actually processed: 2832463
tps = 4720.668984 (including connections establishing)
tps = 4720.750477 (excluding connections establishing)
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
           40.61    0.00   12.71    3.09    0.00   43.59
Device:  rrqm/s   wrqm/s    r/s       w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00  6197.10  14.80  16389.50   0.14  86.53     10.82     54.11   3.30   0.05  82.35
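
For anyone who wants to put a number on the difference, here is a minimal Python sketch that pulls the "excluding connections establishing" tps figure out of the two pgbench summaries above and computes the relative gain. The helper names (parse_tps, relative_gain) are my own and not part of pgbench, and the regex assumes the 9.2-style summary line shown in this post.

```python
import re

# Summary lines copied from the two pgbench runs above.
HT_OFF = """tps = 4058.667332 (including connections establishing)
tps = 4058.796309 (excluding connections establishing)"""

HT_ON = """tps = 4720.668984 (including connections establishing)
tps = 4720.750477 (excluding connections establishing)"""


def parse_tps(pgbench_output: str) -> float:
    """Return the tps figure that excludes connection setup time.

    Hypothetical helper: assumes the summary line format printed by
    the pgbench runs in this post.
    """
    match = re.search(
        r"tps = ([0-9.]+) \(excluding connections establishing\)",
        pgbench_output,
    )
    if match is None:
        raise ValueError("no 'excluding connections establishing' tps line found")
    return float(match.group(1))


def relative_gain(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


tps_off = parse_tps(HT_OFF)
tps_on = parse_tps(HT_ON)
gain = relative_gain(tps_off, tps_on)

print(f"HT off: {tps_off:.1f} tps, HT on: {tps_on:.1f} tps, gain: {gain:.1f}%")
# prints "HT off: 4058.8 tps, HT on: 4720.8 tps, gain: 16.3%"
```

This is a single run per configuration, so the figure says nothing about run-to-run variance.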
System details:
E5-2620 (6 cores + HT, 15 MB LLC), 64 GB RAM (4 channels with 16 GB 1333 modules),
Intel 710 300 GB (note that this is faster than the smaller drives),
Supermicro X9SRi-F motherboard.
CentOS 6.3 64-bit, PG 9.2.1 from the PGDG RPM repository. pgbench
was run locally on the server.