Thread: HT on or off for E5-26xx ?
I'm bringing up a new type of server using Intel E5-2620 (unisocket) which was selected for good SpecIntRate performance vs cost/power (201 for $410 and 95W).

Was assuming it was 6-core but I just noticed it has HT which is currently enabled since I see 12 cores in /proc/cpuinfo.

Question for the performance experts: is it better to have HT enabled or disabled for this generation of Xeon?

Workload will be moderately concurrent, small OLTP type transactions. We'll also run a few low-load VMs (using KVM) and a big Java application.

Any thoughts welcome.

Thanks.
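A quick way to confirm whether HT is active is to compare the number of logical CPUs with the number of distinct physical cores listed in /proc/cpuinfo. The following is only a minimal sketch: it assumes the usual x86 Linux field names (`processor`, `physical id`, `core id`), and the `count_cpus` helper and the trimmed `SAMPLE` text are hypothetical, made up for illustration.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Assumption: if topology fields are missing (e.g. in some VMs),
        # treat every logical CPU as its own core.
        core = fields.get("core id", "cpu" + fields["processor"])
        cores.add((fields.get("physical id", "0"), core))
    return logical, len(cores)


# Hypothetical, heavily trimmed sample: 1 socket, 2 cores, 2 threads per core.
SAMPLE = (
    "processor\t: 0\nphysical id\t: 0\ncore id\t: 0\n\n"
    "processor\t: 1\nphysical id\t: 0\ncore id\t: 1\n\n"
    "processor\t: 2\nphysical id\t: 0\ncore id\t: 0\n\n"
    "processor\t: 3\nphysical id\t: 0\ncore id\t: 1\n"
)

logical, physical = count_cpus(SAMPLE)
print(f"{logical} logical CPUs on {physical} physical cores; HT on: {logical > physical}")
```

On a real machine the text would come from `open("/proc/cpuinfo").read()`; on the E5-2620 described above with HT enabled, one would expect 12 logical CPUs on 6 physical cores.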
On 07/11/12 16:31, David Boreham wrote:
> I'm bringing up a new type of server using Intel E5-2620 (unisocket)
> which was selected for good SpecIntRate performance vs cost/power (201
> for $410 and 95W).
>
> Was assuming it was 6-core but I just noticed it has HT which is
> currently enabled since I see 12 cores in /proc/cpuinfo
>
> Question for the performance experts : is it better to have HT enabled
> or disabled for this generation of Xeon ?
> Workload will be moderately concurrent, small OLTP type transactions.
> We'll also run a few low-load VMs (using KVM) and a big Java application.

I've been benchmarking an E5-4640 (4 socket) and hyperthreading off gave much better scaling behaviour in pgbench (gentle rise and flatten off), whereas with hyperthreading on there was a dramatic falloff after approx number of clients = number of (hyperthreaded) cpus. The box is intended to be a pure db server, so we are running with hyperthreading off.

Cheers

Mark
On 11/6/2012 9:16 PM, Mark Kirkwood wrote:
> I've been benchmarking a E5-4640 (4 socket) and hyperthreading off
> gave much better scaling behaviour in pgbench (gentle rise and flatten
> off), whereas with hyperthreading on there was a dramatic falloff
> after approx number clients = number of (hyperthreaded) cpus. The box
> is intended to be a pure db server, so we are running with
> hyperthreading off.

It looks like this syndrome is not observed on my box, likely due to the much lower number of cores system-wide (12).

I see pgbench tps increase nicely until #threads/clients == #cores, then plateau. I tested up to 96 threads btw.

We're waiting on more memory modules to arrive. I'll post some test results once we have all 4 memory banks populated.
Hi,

On Tue, 2012-11-06 at 20:31 -0700, David Boreham wrote:
> Was assuming it was 6-core but I just noticed it has HT which is
> currently enabled since I see 12 cores in /proc/cpuinfo
>
> Question for the performance experts : is it better to have HT enabled
> or disabled for this generation of Xeon ? Workload will be moderately
> concurrent, small OLTP type transactions. We'll also run a few
> low-load VMs (using KVM) and a big Java application.

HT should be good for file servers, or say many of the app servers, or small web/mail servers. PostgreSQL relies on the CPU power, and since the HT CPUs don't have the same power as the original CPU, when the OS submits a job to that particular HTed CPU, the query will run significantly slower. To avoid issues, I would suggest turning HT off on all PostgreSQL servers. If you can throw some more money at it, another 6-core CPU would give more benefit.

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
On 11/7/2012 6:37 AM, Devrim GÜNDÜZ wrote:
> HT should be good for file servers, or say many of the app servers, or
> small web/mail servers. PostgreSQL relies on the CPU power, and since
> the HT CPUs don't have the same power as the original CPU, when the OS
> submits a job to that particular HTed CPU, the query will run
> significantly slower. To avoid issues, I would suggest turning HT off
> on all PostgreSQL servers. If you can throw some more money at it,
> another 6-core CPU would give more benefit.

I realize this is the "received knowledge" but it is not supported by the evidence before me (which is that I get nearly 2x the throughput from pgbench using nthreads == nhtcores vs nthreads == nfullcores). Intel's latest HT implementation seems to suffer less from the kinds of resource sharing contention issues seen in older generations.

Once I have the machine's full memory installed I'll run pgbench with HT disabled in the BIOS and post the results.
On 08/11/12 02:33, David Boreham wrote:
> On 11/6/2012 9:16 PM, Mark Kirkwood wrote:
>> I've been benchmarking an E5-4640 (4 socket) and hyperthreading off
>> gave much better scaling behaviour in pgbench (gentle rise and
>> flatten off), whereas with hyperthreading on there was a dramatic
>> falloff after approx number clients = number of (hyperthreaded) cpus.
>> The box is intended to be a pure db server, so we are running with
>> hyperthreading off.
>
> It looks like this syndrome is not observed on my box, likely due to
> the much lower number of cores system-wide (12).
> I see pgbench tps increase nicely until #threads/clients == #cores,
> then plateau. I tested up to 96 threads btw.
>
> We're waiting on more memory modules to arrive. I'll post some test
> results once we have all 4 memory banks populated.

Interesting - I was wondering if a single socket board would behave differently (immediately after posting of course)... I've got an i3 home system that scales nicely even with hyperthreading on (2 cores, 4 hyperthreads).

Cheers

Mark
Well, the results are in and at least in this particular case conventional wisdom is overturned. Not a huge benefit, but throughput is definitely higher with HT enabled and nthreads >> ncores:

HT off:

    bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 100
    query mode: simple
    number of clients: 48
    number of threads: 48
    duration: 600 s
    number of transactions actually processed: 2435711
    tps = 4058.667332 (including connections establishing)
    tps = 4058.796309 (excluding connections establishing)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              52.50    0.00   14.79    5.07    0.00   27.64

    Device:  rrqm/s   wrqm/s    r/s       w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
    sda        0.00  5700.30   0.10  13843.50   0.00  74.78    11.06    48.46   3.50   0.05  65.21

HT on:

    bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
    starting vacuum...end.
    transaction type: TPC-B (sort of)
    scaling factor: 100
    query mode: simple
    number of clients: 48
    number of threads: 48
    duration: 600 s
    number of transactions actually processed: 2832463
    tps = 4720.668984 (including connections establishing)
    tps = 4720.750477 (excluding connections establishing)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              40.61    0.00   12.71    3.09    0.00   43.59

    Device:  rrqm/s   wrqm/s    r/s       w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
    sda        0.00  6197.10  14.80  16389.50   0.14  86.53    10.82    54.11   3.30   0.05  82.35

System details: E5-2620 (6 core + HT, 15MB LLC), 64G (4 channels with 16G 1333 modules), Intel 710 300G (which is faster than the smaller drives, note), Supermicro X9SRi-F Motherboard. CentOS 6.3 64-bit, PG 9.2.1 from the PGDG RPM repository. pgbench running locally on the server.
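To put a number on "not a huge benefit", the tps lines from the two runs above can be compared directly. This is a minimal sketch, assuming the pgbench 9.2 output wording shown in this message (other pgbench versions may word the tps line differently); the helper names `tps_excluding_connections` and `relative_gain_percent` are hypothetical.

```python
import re


def tps_excluding_connections(pgbench_output):
    """Extract the 'excluding connections establishing' tps figure."""
    match = re.search(
        r"tps = ([0-9.]+) \(excluding connections establishing\)",
        pgbench_output,
    )
    if match is None:
        raise ValueError("no 'excluding connections establishing' tps line found")
    return float(match.group(1))


def relative_gain_percent(baseline_tps, candidate_tps):
    """Percentage change of candidate_tps relative to baseline_tps."""
    return (candidate_tps - baseline_tps) / baseline_tps * 100.0


# tps lines copied from the two TPC-B-like runs reported above.
HT_OFF = """tps = 4058.667332 (including connections establishing)
tps = 4058.796309 (excluding connections establishing)"""
HT_ON = """tps = 4720.668984 (including connections establishing)
tps = 4720.750477 (excluding connections establishing)"""

gain = relative_gain_percent(
    tps_excluding_connections(HT_OFF),
    tps_excluding_connections(HT_ON),
)
print(f"HT on vs HT off: {gain:+.1f}% tps")
```

For these two runs the gain works out to roughly 16%. A single 600 s run per configuration says nothing about run-to-run variance, so the figure is indicative only.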
On 11/07/2012 09:16 PM, David Boreham wrote:
> bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48

Unfortunately without -S, you're not really testing the processors. A regular pgbench can fluctuate more than that due to writing and checkpoints.

For what it's worth, our X5675's perform about 40-50% better with HT enabled. Not the 2x you might expect by doubling the amount of "processors", but it definitely didn't make things worse.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
On 11/8/2012 6:58 AM, Shaun Thomas wrote:
> On 11/07/2012 09:16 PM, David Boreham wrote:
>
>> bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48
>
> Unfortunately without -S, you're not really testing the processors. A
> regular pgbench can fluctuate more than that due to writing and
> checkpoints.

Hmm...my goal was to test with a workload close to our application's (which is heavy OLTP, small transactions and hence sensitive to I/O commit rate). The hypothesis I was testing was that enabling HT positively degrades performance (which in my case it does not). To be honest, I wasn't really testing the additional benefit from HT, rather observing that it is non-negative :)

If I have time I can run the select-only test for you and post the results. The DB fits into memory so it will be a good CPU test.
Here are the SELECT only pgbench test results from my E5-2620 machine, with HT on and off:

HT off:

    bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48 -S
    starting vacuum...end.
    transaction type: SELECT only
    scaling factor: 100
    query mode: simple
    number of clients: 48
    number of threads: 48
    duration: 600 s
    number of transactions actually processed: 25969680
    tps = 43281.392273 (including connections establishing)
    tps = 43282.476955 (excluding connections establishing)

All 6 cores saturated:

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              81.42    0.00   18.21    0.00    0.00    0.37

HT on:

    bash-4.1$ /usr/pgsql-9.2/bin/pgbench -T 600 -j 48 -c 48 -S
    starting vacuum...end.
    transaction type: SELECT only
    scaling factor: 100
    query mode: simple
    number of clients: 48
    number of threads: 48
    duration: 600 s
    number of transactions actually processed: 29934601
    tps = 49888.697225 (including connections establishing)
    tps = 49889.570754 (excluding connections establishing)

12% of CPU showing as idle (whether that's true or not I'm not sure):

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              71.09    0.00   16.99    0.00    0.00   11.92

So for this particular test HT gives us the equivalent of about one extra core. It does not reduce performance, rather increases performance slightly.
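The "about one extra core" estimate can be reproduced from the two SELECT-only tps figures. The sketch below rests on a loudly stated assumption that is not in the original post: throughput with HT off is taken to scale linearly with the 6 physical cores, so one "core" is worth one sixth of the HT-off tps. The function name `extra_core_equivalent` is hypothetical.

```python
def extra_core_equivalent(tps_ht_off, tps_ht_on, physical_cores):
    """Express the HT gain as a number of additional physical cores.

    Assumption (illustrative only): HT-off throughput scales linearly
    with core count, so each core contributes tps_ht_off / physical_cores.
    """
    tps_per_core = tps_ht_off / physical_cores
    return (tps_ht_on - tps_ht_off) / tps_per_core


# "excluding connections establishing" figures from the SELECT-only runs above.
TPS_HT_OFF = 43282.476955
TPS_HT_ON = 49889.570754
PHYSICAL_CORES = 6  # E5-2620

gain_percent = (TPS_HT_ON - TPS_HT_OFF) / TPS_HT_OFF * 100.0
extra_cores = extra_core_equivalent(TPS_HT_OFF, TPS_HT_ON, PHYSICAL_CORES)
print(f"HT gain: {gain_percent:.1f}% tps, roughly {extra_cores:.2f} extra cores")
```

Under that linear-scaling assumption the gain is about 15%, or a little over 0.9 of a core, which is consistent with the "about one extra core" reading.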