Thread: Amazon EC2 CPU Utilization
I have deployed PostgreSQL 8.4.1 on a Fedora 9 c1.xlarge (8x1 cores) instance
in the Amazon EC2 cloud. When I run pgbench in read-only mode (-S) on a small
database, I am unable to peg the CPUs no matter how many clients I throw at it.
In fact, the CPU utilization never drops below 60% idle. I also tried this on
Fedora 12 (kernel 2.6.31) and got the same basic result. What's going on here?
Am I really only utilizing 40% of the CPUs? Is this to be expected on virtual
(xen) instances?

[root@domU-12-31-39-0C-88-C1 ~]# uname -a
Linux domU-12-31-39-0C-88-C1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

-bash-4.0# pgbench -S -c 16 -T 30 -h domU-12-31-39-0C-88-C1 -U postgres
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 590508
tps = 19663.841772 (including connections establishing)
tps = 19710.041020 (excluding connections establishing)

top - 15:55:05 up 1:33, 2 users, load average: 2.44, 0.98, 0.44
Tasks: 123 total, 11 running, 112 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.9%us, 8.8%sy, 0.0%ni, 70.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 7348132k total, 1886912k used, 5461220k free, 34432k buffers
Swap: 0k total, 0k used, 0k free, 1456472k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2834 postgres 15 0 191m 72m 70m S 16 1.0 0:00.66 postmaster
2838 postgres 15 0 191m 66m 64m R 15 0.9 0:00.62 postmaster
2847 postgres 15 0 191m 70m 68m S 15 1.0 0:00.59 postmaster
2837 postgres 15 0 191m 72m 70m S 14 1.0 0:00.47 postmaster
2842 postgres 15 0 191m 66m 64m R 14 0.9 0:00.48 postmaster
2835 postgres 15 0 191m 69m 67m S 14 1.0 0:00.54 postmaster
2839 postgres 15 0 191m 69m 67m R 14 1.0 0:00.60 postmaster
2840 postgres 15 0 191m 68m 67m R 14 1.0 0:00.58 postmaster
2833 postgres 15 0 191m 68m 66m R 14 1.0 0:00.50 postmaster
2845 postgres 15 0 191m 70m 68m R 14 1.0 0:00.50 postmaster
2846 postgres 15 0 191m 67m 65m R 14 0.9 0:00.51 postmaster
2836 postgres 15 0 191m 66m 64m S 12 0.9 0:00.43 postmaster
2844 postgres 15 0 191m 68m 66m R 11 1.0 0:00.40 postmaster
2841 postgres 15 0 191m 65m 64m R 11 0.9 0:00.43 postmaster
2832 postgres 15 0 191m 67m 65m S 10 0.9 0:00.38 postmaster
2843 postgres 15 0 191m 67m 66m S 10 0.9 0:00.43 postmaster

[root@domU-12-31-39-0C-88-C1 ~]# iostat -d 2 -x
Linux 2.6.21.7-2.ec2.v1.2.fc8xen (domU-12-31-39-0C-88-C1) 01/27/10

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.57 15.01 1.32 3.56 34.39 148.57 37.52 0.28 57.35 3.05 1.49
sdb1 0.03 112.38 5.50 12.11 87.98 995.91 61.57 1.88 106.61 2.23 3.93

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.00 0.00 0.00 1.79 0.00 28.57 16.00 0.00 2.00 1.50 0.27
sdb1 0.00 4.46 0.00 14.29 0.00 150.00 10.50 0.37 26.00 2.56 3.66

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.00 3.57 0.00 0.79 0.00 34.92 44.00 0.00 3.00 3.00 0.24
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
On Wed, Jan 27, 2010 at 3:59 PM, Mike Bresnahan <mike.bresnahan@bestbuy.com> wrote:
I have deployed PostgreSQL 8.4.1 on a Fedora 9 c1.xlarge (8x1 cores) instance
in the Amazon EC2 cloud. When I run pgbench in read-only mode (-S) on a small
database, I am unable to peg the CPUs no matter how many clients I throw at it.
In fact, the CPU utilization never drops below 60% idle. I also tried this on
Fedora 12 (kernel 2.6.31) and got the same basic result. What's going on here?
Am I really only utilizing 40% of the CPUs? Is this to be expected on virtual
(xen) instances?
I have seen behavior like this in the past on EC2. I believe your bottleneck may be pulling the data out of cache. I benchmarked this a while back and found that memory speeds are not much faster than disk speeds on EC2. I am not sure if that is true of Xen in general or if it's just limited to the cloud.
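
[A rough, hedged way to sanity-check that claim on your own instance; the sizes are arbitrary, and sysbench is only relevant if the distro package happens to be installed:]

# Crude in-memory throughput check: copy ~2 GB from /dev/zero to /dev/null,
# entirely in memory, and note the rate dd reports at the end.
dd if=/dev/zero of=/dev/null bs=1M count=2048

# If sysbench is available, its dedicated memory test gives a cleaner number:
sysbench --test=memory --memory-block-size=1M --memory-total-size=4G run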
[root@domU-12-31-39-0C-88-C1 ~]# uname -a
Linux domU-12-31-39-0C-88-C1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
-bash-4.0# pgbench -S -c 16 -T 30 -h domU-12-31-39-0C-88-C1 -U postgres
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 590508
tps = 19663.841772 (including connections establishing)
tps = 19710.041020 (excluding connections establishing)
top - 15:55:05 up 1:33, 2 users, load average: 2.44, 0.98, 0.44
Tasks: 123 total, 11 running, 112 sleeping, 0 stopped, 0 zombie
Cpu(s): 18.9%us, 8.8%sy, 0.0%ni, 70.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 7348132k total, 1886912k used, 5461220k free, 34432k buffers
Swap: 0k total, 0k used, 0k free, 1456472k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2834 postgres 15 0 191m 72m 70m S 16 1.0 0:00.66 postmaster
2838 postgres 15 0 191m 66m 64m R 15 0.9 0:00.62 postmaster
2847 postgres 15 0 191m 70m 68m S 15 1.0 0:00.59 postmaster
2837 postgres 15 0 191m 72m 70m S 14 1.0 0:00.47 postmaster
2842 postgres 15 0 191m 66m 64m R 14 0.9 0:00.48 postmaster
2835 postgres 15 0 191m 69m 67m S 14 1.0 0:00.54 postmaster
2839 postgres 15 0 191m 69m 67m R 14 1.0 0:00.60 postmaster
2840 postgres 15 0 191m 68m 67m R 14 1.0 0:00.58 postmaster
2833 postgres 15 0 191m 68m 66m R 14 1.0 0:00.50 postmaster
2845 postgres 15 0 191m 70m 68m R 14 1.0 0:00.50 postmaster
2846 postgres 15 0 191m 67m 65m R 14 0.9 0:00.51 postmaster
2836 postgres 15 0 191m 66m 64m S 12 0.9 0:00.43 postmaster
2844 postgres 15 0 191m 68m 66m R 11 1.0 0:00.40 postmaster
2841 postgres 15 0 191m 65m 64m R 11 0.9 0:00.43 postmaster
2832 postgres 15 0 191m 67m 65m S 10 0.9 0:00.38 postmaster
2843 postgres 15 0 191m 67m 66m S 10 0.9 0:00.43 postmaster
[root@domU-12-31-39-0C-88-C1 ~]# iostat -d 2 -x
Linux 2.6.21.7-2.ec2.v1.2.fc8xen (domU-12-31-39-0C-88-C1) 01/27/10
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.57 15.01 1.32 3.56 34.39 148.57 37.52 0.28 57.35 3.05 1.49
sdb1 0.03 112.38 5.50 12.11 87.98 995.91 61.57 1.88 106.61 2.23 3.93

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.00 0.00 0.00 1.79 0.00 28.57 16.00 0.00 2.00 1.50 0.27
sdb1 0.00 4.46 0.00 14.29 0.00 150.00 10.50 0.37 26.00 2.56 3.66

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda1 0.00 3.57 0.00 0.79 0.00 34.92 44.00 0.00 3.00 3.00 0.24
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
--
Jim Mlodgenski
EnterpriseDB (http://www.enterprisedb.com)
Jim Mlodgenski <jimmy76 <at> gmail.com> writes:
> I have seen behavior like this in the past on EC2. I believe your
> bottleneck may be pulling the data out of cache. I benchmarked this a
> while back and found that memory speeds are not much faster than disk
> speeds on EC2. I am not sure if that is true of Xen in general or if
> it's just limited to the cloud.

When the CPU is waiting for a memory read, are the CPU cycles not charged
to the currently running process?
Mike Bresnahan wrote:
> top - 15:55:05 up 1:33, 2 users, load average: 2.44, 0.98, 0.44
> Tasks: 123 total, 11 running, 112 sleeping, 0 stopped, 0 zombie
> Cpu(s): 18.9%us, 8.8%sy, 0.0%ni, 70.6%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
> Mem: 7348132k total, 1886912k used, 5461220k free, 34432k buffers
> Swap: 0k total, 0k used, 0k free, 1456472k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 2834 postgres 15 0 191m 72m 70m S 16 1.0 0:00.66 postmaster
> 2838 postgres 15 0 191m 66m 64m R 15 0.9 0:00.62 postmaster

Could you try this again with "top -c", which will label these postmaster
processes usefully, and include the pgbench client itself in what you post?
It's hard to sort out what's going on in these situations without that style
of breakdown.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
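
[For anyone reproducing this, a hedged sketch of capturing that breakdown non-interactively while the benchmark runs; batch-mode top is standard procps, and the grep filter is only illustrative:]

# Take 5 snapshots, 2 seconds apart, with full command lines, and keep the
# CPU summary plus the postgres backends and the pgbench client.
top -c -b -n 5 -d 2 | grep -E 'Cpu\(s\)|postgres|pgbench'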
> I have seen behavior like this in the past on EC2. I believe your
> bottleneck may be pulling the data out of cache. I benchmarked this a
> while back and found that memory speeds are not much faster than disk
> speeds on EC2. I am not sure if that is true of Xen in general or if
> its just limited to the cloud.

That doesn't make much sense. More likely, he's disk I/O bound, but it's hard
to say as that iostat output only showed a couple of 2-second slices of work.
The first output, which shows the average since system startup, seems to show
the system has had relatively high average wait times of around 100 ms, yet
the samples below only show 0, 2, 3 ms await.
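
[One hedged way to settle that is to let iostat sample for the whole benchmark rather than a couple of slices; the log file name and sample count are only illustrative:]

# Collect extended device stats every 2 seconds for the full 30 s run.
iostat -d -x 2 20 > iostat-during-pgbench.log &
pgbench -S -c 16 -T 30 -U postgres
wait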
John R Pierce <pierce <at> hogranch.com> writes:
> More likely, he's disk I/O bound, but it's hard to say as that iostat
> output only showed a couple of 2-second slices of work. The first output,
> which shows the average since system startup, seems to show the system
> has had relatively high average wait times of around 100 ms, yet the
> samples below only show 0, 2, 3 ms await.

I don't think the problem is disk I/O. The database easily fits in the
available RAM (in fact there is a ton of RAM free) and iostat does not show
a heavy load.
> Could you try this again with "top -c", which will label these
> postmaster processes usefully, and include the pgbench client itself in
> what you post? It's hard to sort out what's going on in these
> situations without that style of breakdown.

I had run pgbench on a separate instance last time, but this time I ran it on
the same machine. With the -c option, top(1) reports that many of the postgres
processes are idle.

top - 18:25:23 up 8 min, 2 users, load average: 1.52, 1.32, 0.55
Tasks: 218 total, 15 running, 203 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.3%us, 17.5%sy, 0.0%ni, 49.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.6%st
Mem: 7358492k total, 1620500k used, 5737992k free, 11144k buffers
Swap: 0k total, 0k used, 0k free, 1248388k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1323 postgres 20 0 50364 2192 1544 R 56.7 0.0 0:03.19 pgbench -S -c 16 -T 30
1337 postgres 20 0 197m 114m 112m R 25.4 1.6 0:01.35 postgres: postgres postgres [local] SELECT
1331 postgres 20 0 197m 113m 111m R 24.4 1.6 0:01.16 postgres: postgres postgres [local] idle
1335 postgres 20 0 197m 114m 112m R 24.1 1.6 0:01.30 postgres: postgres postgres [local] SELECT
1340 postgres 20 0 197m 113m 112m R 22.7 1.6 0:01.28 postgres: postgres postgres [local] idle
1327 postgres 20 0 197m 114m 113m R 22.1 1.6 0:01.26 postgres: postgres postgres [local] idle
1328 postgres 20 0 197m 114m 113m R 21.8 1.6 0:01.32 postgres: postgres postgres [local] SELECT
1332 postgres 20 0 197m 114m 112m R 21.8 1.6 0:01.11 postgres: postgres postgres [local] SELECT
1326 postgres 20 0 197m 112m 110m R 21.4 1.6 0:01.10 postgres: postgres postgres [local] idle
1325 postgres 20 0 197m 112m 110m R 20.8 1.6 0:01.28 postgres: postgres postgres [local] SELECT
1330 postgres 20 0 197m 113m 111m R 20.4 1.6 0:01.21 postgres: postgres postgres [local] idle
1339 postgres 20 0 197m 113m 111m R 20.4 1.6 0:01.10 postgres: postgres postgres [local] idle
1333 postgres 20 0 197m 114m 112m S 20.1 1.6 0:01.08 postgres: postgres postgres [local] SELECT
1336 postgres 20 0 197m 113m 111m S 19.8 1.6 0:01.10 postgres: postgres postgres [local] SELECT
1329 postgres 20 0 197m 113m 111m S 19.1 1.6 0:01.21 postgres: postgres postgres [local] idle
1338 postgres 20 0 197m 114m 112m R 19.1 1.6 0:01.28 postgres: postgres postgres [local] SELECT
1334 postgres 20 0 197m 114m 112m R 18.8 1.6 0:01.00 postgres: postgres postgres [local] idle
1214 root 20 0 14900 1348 944 R 0.3 0.0 0:00.41 top -c
Greg Smith <greg <at> 2ndquadrant.com> writes:
> Could you try this again with "top -c", which will label these
> postmaster processes usefully, and include the pgbench client itself in
> what you post? It's hard to sort out what's going on in these
> situations without that style of breakdown.

As a further experiment, I ran 8 pgbench processes in parallel. The result is
about the same.

top - 18:34:15 up 17 min, 2 users, load average: 0.39, 0.40, 0.36
Tasks: 217 total, 8 running, 209 sleeping, 0 stopped, 0 zombie
Cpu(s): 22.2%us, 8.9%sy, 0.0%ni, 68.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 7358492k total, 1611148k used, 5747344k free, 11416k buffers
Swap: 0k total, 0k used, 0k free, 1248408k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1506 postgres 20 0 197m 134m 132m S 29.4 1.9 0:09.27 postgres: postgres postgres [local] idle
1524 postgres 20 0 197m 134m 132m R 29.4 1.9 0:05.13 postgres: postgres postgres [local] idle
1509 postgres 20 0 197m 134m 132m R 27.1 1.9 0:08.58 postgres: postgres postgres [local] SELECT
1521 postgres 20 0 197m 134m 132m R 26.4 1.9 0:05.77 postgres: postgres postgres [local] SELECT
1512 postgres 20 0 197m 134m 132m S 26.1 1.9 0:07.62 postgres: postgres postgres [local] idle
1520 postgres 20 0 197m 134m 132m R 25.8 1.9 0:05.31 postgres: postgres postgres [local] idle
1515 postgres 20 0 197m 134m 132m S 23.8 1.9 0:06.94 postgres: postgres postgres [local] SELECT
1527 postgres 20 0 197m 134m 132m S 21.8 1.9 0:04.46 postgres: postgres postgres [local] SELECT
1517 postgres 20 0 49808 2012 1544 R 5.3 0.0 0:01.02 pgbench -S -c 1 -T 30
1507 postgres 20 0 49808 2012 1544 R 4.6 0.0 0:01.70 pgbench -S -c 1 -T 30
1510 postgres 20 0 49808 2008 1544 S 4.3 0.0 0:01.32 pgbench -S -c 1 -T 30
1525 postgres 20 0 49808 2012 1544 S 4.3 0.0 0:00.79 pgbench -S -c 1 -T 30
1516 postgres 20 0 49808 2016 1544 S 4.0 0.0 0:01.00 pgbench -S -c 1 -T 30
1504 postgres 20 0 49808 2012 1544 R 3.3 0.0 0:01.81 pgbench -S -c 1 -T 30
1513 postgres 20 0 49808 2016 1544 S 3.0 0.0 0:01.07 pgbench -S -c 1 -T 30
1522 postgres 20 0 49808 2012 1544 S 3.0 0.0 0:00.86 pgbench -S -c 1 -T 30
1209 postgres 20 0 63148 1476 476 S 0.3 0.0 0:00.11 postgres: stats collector process
On Wed, Jan 27, 2010 at 6:37 PM, Mike Bresnahan <mike.bresnahan@bestbuy.com> wrote:
Greg Smith <greg <at> 2ndquadrant.com> writes:
> Could you try this again with "top -c", which will label these
> postmaster processes usefully, and include the pgbench client itself in
> what you post? It's hard to sort out what's going on in these
> situations without that style of breakdown.

As a further experiment, I ran 8 pgbench processes in parallel. The result is
about the same.

Let's start from the beginning. Have you tuned your postgresql.conf file? What do you have shared_buffers set to? That would have the biggest effect on a test like this.
top - 18:34:15 up 17 min, 2 users, load average: 0.39, 0.40, 0.36
Tasks: 217 total, 8 running, 209 sleeping, 0 stopped, 0 zombie
Cpu(s): 22.2%us, 8.9%sy, 0.0%ni, 68.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 7358492k total, 1611148k used, 5747344k free, 11416k buffers
Swap: 0k total, 0k used, 0k free, 1248408k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1506 postgres 20 0 197m 134m 132m S 29.4 1.9 0:09.27 postgres: postgres postgres [local] idle
1524 postgres 20 0 197m 134m 132m R 29.4 1.9 0:05.13 postgres: postgres postgres [local] idle
1509 postgres 20 0 197m 134m 132m R 27.1 1.9 0:08.58 postgres: postgres postgres [local] SELECT
1521 postgres 20 0 197m 134m 132m R 26.4 1.9 0:05.77 postgres: postgres postgres [local] SELECT
1512 postgres 20 0 197m 134m 132m S 26.1 1.9 0:07.62 postgres: postgres postgres [local] idle
1520 postgres 20 0 197m 134m 132m R 25.8 1.9 0:05.31 postgres: postgres postgres [local] idle
1515 postgres 20 0 197m 134m 132m S 23.8 1.9 0:06.94 postgres: postgres postgres [local] SELECT
1527 postgres 20 0 197m 134m 132m S 21.8 1.9 0:04.46 postgres: postgres postgres [local] SELECT
1517 postgres 20 0 49808 2012 1544 R 5.3 0.0 0:01.02 pgbench -S -c 1 -T 30
1507 postgres 20 0 49808 2012 1544 R 4.6 0.0 0:01.70 pgbench -S -c 1 -T 30
1510 postgres 20 0 49808 2008 1544 S 4.3 0.0 0:01.32 pgbench -S -c 1 -T 30
1525 postgres 20 0 49808 2012 1544 S 4.3 0.0 0:00.79 pgbench -S -c 1 -T 30
1516 postgres 20 0 49808 2016 1544 S 4.0 0.0 0:01.00 pgbench -S -c 1 -T 30
1504 postgres 20 0 49808 2012 1544 R 3.3 0.0 0:01.81 pgbench -S -c 1 -T 30
1513 postgres 20 0 49808 2016 1544 S 3.0 0.0 0:01.07 pgbench -S -c 1 -T 30
1522 postgres 20 0 49808 2012 1544 S 3.0 0.0 0:00.86 pgbench -S -c 1 -T 30
1209 postgres 20 0 63148 1476 476 S 0.3 0.0 0:00.11 postgres: stats collector process
--
Jim Mlodgenski
EnterpriseDB (http://www.enterprisedb.com)
Jim Mlodgenski <jimmy76 <at> gmail.com> writes:
> Let's start from the beginning. Have you tuned your postgresql.conf file?
> What do you have shared_buffers set to? That would have the biggest effect
> on a test like this.

shared_buffers = 128MB
maintenance_work_mem = 256MB
checkpoint_segments = 20
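
[For reference, a pgbench scale-64 database is roughly 1 GB, so with shared_buffers at 128MB most reads come out of the OS page cache rather than PostgreSQL's own buffers. A hedged sketch of raising it for this test; the data directory path and target value are assumptions, and on older kernels SHMMAX may need raising first:]

# Illustrative only: raise shared_buffers so the whole dataset fits, then restart.
echo "shared_buffers = 1GB" >> /var/lib/pgsql/data/postgresql.conf
sysctl -w kernel.shmmax=2147483648    # allow a larger shared memory segment
pg_ctl -D /var/lib/pgsql/data restart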
Mike Bresnahan wrote:
> I have deployed PostgreSQL 8.4.1 on a Fedora 9 c1.xlarge (8x1 cores) instance
> in the Amazon EC2 cloud. When I run pgbench in read-only mode (-S) on a small
> database, I am unable to peg the CPUs no matter how many clients I throw at it.
> In fact, the CPU utilization never drops below 60% idle. I also tried this on
> Fedora 12 (kernel 2.6.31) and got the same basic result. What's going on here?
> Am I really only utilizing 40% of the CPUs? Is this to be expected on virtual
> (xen) instances?
> tps = 19663.841772 (including connections establishing)

Looks to me like you're running into a general memory bandwidth issue here,
possibly one that's made a bit worse by how pgbench works. It's a somewhat
funky workload Linux systems aren't always happy with, although one of your
tests had the right configuration to sidestep the worst of the problems there.
I don't see any evidence that pgbench itself is a likely suspect for the
issue, but it does shuffle a lot of things around in memory relative to
transaction time when running this small select-only test, and clients can
get stuck waiting for it when that happens.

To put your results in perspective, I would expect to get around 25K TPS
running the pgbench setup/test you're doing on a recent 4-core/single
processor system, and around 50K TPS is normal for an 8-core server doing
this type of test. And those numbers are extremely sensitive to the speed of
the underlying RAM even with the CPU staying the same.

I would characterize your results as "getting about 1/2 of the CPU+memory
performance of an install on a dedicated 8-core system". That's not horrible,
as long as you have reasonable expectations here, which is really the case
for any virtualized install I think. I'd actually like to launch a more
thorough investigation into this particular area, exactly how the PostgreSQL
bottlenecks shift around on EC2 compared to similar dedicated hardware, if I
found a sponsor for it one day. A bit too much work to do it right just for
fun.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
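
[Since pgbench's own per-transaction overhead is mentioned here, one hedged variation worth trying, assuming a pgbench build new enough to accept -M (it is documented for 8.4), is prepared-statement mode, which removes the repeated parse/plan work from each SELECT:]

# Same read-only test, but with prepared statements; compare the TPS against
# the simple-mode runs quoted above.
pgbench -S -M prepared -c 16 -T 30 -U postgres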
Greg Smith <greg <at> 2ndquadrant.com> writes:
> Looks to me like you're running into a general memory bandwidth issue
> here, possibly one that's made a bit worse by how pgbench works. It's a
> somewhat funky workload Linux systems aren't always happy with, although
> one of your tests had the right configuration to sidestep the worst of
> the problems there. I don't see any evidence that pgbench itself is a
> likely suspect for the issue, but it does shuffle a lot of things around
> in memory relative to transaction time when running this small
> select-only test, and clients can get stuck waiting for it when that
> happens.
>
> To put your results in perspective, I would expect to get around 25K TPS
> running the pgbench setup/test you're doing on a recent 4-core/single
> processor system, and around 50K TPS is normal for an 8-core server
> doing this type of test. And those numbers are extremely sensitive to
> the speed of the underlying RAM even with the CPU staying the same.
>
> I would characterize your results as "getting about 1/2 of the
> CPU+memory performance of an install on a dedicated 8-core system".
> That's not horrible, as long as you have reasonable expectations here,
> which is really the case for any virtualized install I think. I'd
> actually like to launch a more thorough investigation into this
> particular area, exactly how the PostgreSQL bottlenecks shift around on
> EC2 compared to similar dedicated hardware, if I found a sponsor for it
> one day. A bit too much work to do it right just for fun.

I can understand that I will not get as much performance out of an EC2
instance as a dedicated server, but I don't understand why top(1) is showing
50% CPU utilization. If it were a memory speed problem wouldn't top(1) report
100% CPU utilization? Does the kernel really do a context switch when waiting
for a response from RAM? That would surprise me, because to do a context
switch it might need to read from RAM, which would then also block.

I still worry it is a lock contention or scheduling problem, but I am not
sure how to diagnose it. I've seen some references to using dtrace to analyze
PostgreSQL locks, but it looks like it might take a lot of ramp-up time for
me to learn how to use dtrace.

Note that I can peg the CPU by running 8 infinite loops inside or outside the
database. I have only seen the utilization problem when running queries (with
pgbench and my application) against PostgreSQL.

In any case, assuming this is an EC2 memory speed thing, it is going to be
difficult to diagnose application bottlenecks when I cannot rely on top(1)
reporting meaningful CPU stats.

Thank you for your help.
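
[A hedged sketch of both checks without dtrace: spin one shell loop per core to confirm all eight CPUs can reach 100%, and peek at ungranted heavyweight locks via pg_locks. Lightweight-lock contention will not show up there, so this only rules out the coarse case:]

# One busy loop per core; watch top/mpstat, then kill the background jobs.
for i in $(seq 8); do ( while :; do :; done ) & done

# Heavyweight lock waits, if any, are visible in pg_locks.
psql -U postgres -c "SELECT locktype, mode, count(*) FROM pg_locks WHERE NOT granted GROUP BY 1, 2;"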
I have a problem with fetching from cursors sometimes taking an extremely
long time to run. I am attempting to use the statement_timeout parameter to
limit the runtime on these.

PostgreSQL 8.2.4
Linux 2.6.22.14-72.fc6 #1 SMP Wed Nov 21 13:44:07 EST 2007 i686 i686 i386 GNU/Linux

begin;
set search_path = testdb;
declare cur_rep cursor for select * from accounts, individual;
set statement_timeout = 1000;
fetch forward 1000000 from cur_rep;

The open join, 1000 ms, and 1000000 count are all intentional. Normally those
values would be 300000 and 10000. The accounts and individual tables have
around 100 fields and 500k records each.

Nested Loop (cost=21992.28..8137785497.71 rows=347496704100 width=8)
  -> Seq Scan on accounts (cost=0.00..30447.44 rows=623844 width=8)
  -> Materialize (cost=21992.28..29466.53 rows=557025 width=0)
       -> Seq Scan on individual (cost=0.00..19531.25 rows=557025 width=0)

I tried moving the SET statement before the cursor declaration and outside
the transaction with the same results. I thought possibly it was getting
bogged down in I/O but the timeout seems to work fine if not using a cursor.
What am I missing here?

Thanks,
Joe
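
[One thing worth ruling out is whether the setting is actually live in the session that runs the FETCH. A hedged sketch of checking it in-line from a shell; the connection details are assumptions:]

psql <<'SQL'
BEGIN;
SET search_path = testdb;
SET statement_timeout = 1000;
SHOW statement_timeout;    -- confirm the value is in effect before the long fetch
DECLARE cur_rep CURSOR FOR SELECT * FROM accounts, individual;
FETCH FORWARD 1000000 FROM cur_rep;
SQL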
On Thu, 2010-01-28 at 22:45 +0000, Mike Bresnahan wrote:
> I can understand that I will not get as much performance out of an EC2
> instance as a dedicated server, but I don't understand why top(1) is
> showing 50% CPU utilization.

One possible cause is lock contention, but I don't know if that explains your
problem. Perhaps there's something about the handling of shared memory or
semaphores on EC2 that makes it slow enough that it's causing lock contention.
You could try testing on a Xen instance and see if you have the same problem.

Regards,
    Jeff Davis
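
[A hedged way to look at the shared memory and semaphore side from inside the guest, using standard util-linux/procps tools, nothing EC2-specific:]

ipcs -m                          # shared memory segments; the large one is PostgreSQL's
ipcs -s                          # semaphore arrays used by the backends
sysctl kernel.shmmax kernel.sem  # the kernel limits behind them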
Mike Bresnahan wrote:
> I can understand that I will not get as much performance out of an EC2
> instance as a dedicated server, but I don't understand why top(1) is
> showing 50% CPU utilization. If it were a memory speed problem wouldn't
> top(1) report 100% CPU utilization?

A couple of points: top is not the be-all and end-all of analysis tools. I'm
sure you know that, but it bears repeating. More importantly, in a
virtualised environment the tools on the inside of the guest don't have a
full picture of what's really going on.

I've not done any real work with Xen; most of my experience is with zVM and
KVM. It's pretty normal on a heavily loaded server to see tools like top (and
vmstat, sar, et al) reporting less than 100% use while the box is running
flat-out, leaving nothing left for the guest to get. I had this last night
doing a load on a guest - 60-70% CPU at peak, with no more available. You
*should* see steal and 0% idle time in this case, but I *have* seen zVM Linux
guests reporting ample idle time while the zVM level monitoring tools
reported the LPAR as a whole running at 90-95% utilisation (which is when an
LPAR will usually run out of steam).

A secondary effect is that sometimes the scheduling of guests on and off the
hypervisor will cause skewing in the timekeeping of the guest; it's not
uncommon in our loaded-up zVM environment to see discrepancies of 5-20%
between the guest's view of how much CPU time it thinks it's getting and how
much time the hypervisor knows it's getting (this is why companies like
Velocity make money selling hypervisor-aware tools that auto-correct those
stats).

> In any case, assuming this is an EC2 memory speed thing, it is going to be
> difficult to diagnose application bottlenecks when I cannot rely on top(1)
> reporting meaningful CPU stats.

It's going to be even harder from inside the guests, since you're getting an
incomplete view of the system as a whole. You could try c2cbench
(http://sourceforge.net/projects/c2cbench/), which is designed to benchmark
memory cache performance, but it'll still be subject to the caveats I
outlined above: it may give you something indicative if you think it's a
cache problem, but it may also simply tell you that the virtual CPUs are fine
while the real processors are pegged for cache from running a bunch of
workloads with high memory pressure.

If you were running a newer kernel you could look at perf_counters or
something similar to get more detail from what the guest thinks it's doing,
but, again, there are going to be inaccuracies.
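
[In that spirit, the one figure the guest does get from the hypervisor is steal time, so a hedged sketch is simply to watch it over a whole run; sar requires the sysstat package:]

# The "st" column in vmstat and the "%steal" column in sar show CPU taken
# by the hypervisor; sample every 2 seconds for the length of a run.
vmstat 2 15
sar -u 2 15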
In an attempt to determine whether top(1) is lying about the CPU utilization,
I did an experiment. I fired up an EC2 c1.xlarge instance and ran pgbench and
a tight loop in parallel.

-bash-4.0$ uname -a
Linux domu-12-31-39-00-8d-71.compute-1.internal 2.6.31-302-ec2 #7-Ubuntu SMP Tue Oct 13 19:55:22 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux

-bash-4.0$ pgbench -S -T 30 -c 16 -h localhost
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 804719
tps = 26787.949376 (including connections establishing)
tps = 26842.193411 (excluding connections establishing)

While pgbench was running I ran a tight loop at the bash prompt.

-bash-4.0# time for i in {1..10000000}; do true; done

real    0m36.660s
user    0m33.100s
sys     0m2.040s

Then I ran each alone.

-bash-4.0$ pgbench -S -T 30 -c 16 -h localhost
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 64
query mode: simple
number of clients: 16
duration: 30 s
number of transactions actually processed: 964639
tps = 32143.595223 (including connections establishing)
tps = 32208.347194 (excluding connections establishing)

-bash-4.0# time for i in {1..10000000}; do true; done

real    0m32.811s
user    0m31.330s
sys     0m1.470s

Running the loop caused pgbench to lose about 12.5% (1/8), which is exactly
what I would expect on an 8-core machine. So it seems that top(1) is lying.
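
[A hedged follow-up to the same experiment: look at per-CPU figures rather than the aggregate Cpu(s) line, and optionally pin the loop so its cost is easy to attribute. mpstat and taskset come from sysstat and util-linux respectively:]

# Per-CPU utilization, including %steal, sampled while the benchmark runs.
mpstat -P ALL 2 15

# Pin the tight loop to CPU 0 so only one core's numbers should move.
taskset -c 0 bash -c 'time for i in {1..10000000}; do true; done'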
> top is not the be-all and end-all of analysis tools. I'm sure you
> know that, but it bears repeating.
> More importantly, in a virtualised environment the tools on the inside
> of the guest don't have a full picture of what's really going on.

Indeed, you have hit the nail on the head.

Does anyone know what the ACTUAL hardware is that EC2 uses? And does anyone
know how much over-subscribing they do? E.g., if you're paying for 8 cores,
do you actually have 8 dedicated cores, or will they put several "8 virtual
core" domU's on the same physical cores?

OOOOH.... I'm reading http://aws.amazon.com/ec2/instance-types/

As I'm interpreting that, an "XL" instance is FOUR /virtual/ cores, each
allocated the horsepower equivalent of two 1.0 GHz Core 2 Duo style cores, or
1.7 GHz P4 style processors. So we've been WAY off base here, the XL is
*FOUR*, not EIGHT cores. This XL is nominally equivalent to a dual-socket
dual-core 2 GHz Xeon 3050 "Conroe". Does this better fit the observations?
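
[One hedged way to settle the four-versus-eight question from inside the instance itself rather than from the pricing page:]

# How many virtual CPUs does the guest actually see, and what does the
# hypervisor present them as?
grep -c ^processor /proc/cpuinfo
grep 'model name' /proc/cpuinfo | sort | uniq -c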