Re: Amazon EC2 CPU Utilization - Mailing list pgsql-general

From Rodger Donaldson
Subject Re: Amazon EC2 CPU Utilization
Date
Msg-id 4B629739.7080303@diaspora.gen.nz
In response to Re: Amazon EC2 CPU Utilization  (Mike Bresnahan <mike.bresnahan@bestbuy.com>)
Responses Re: Amazon EC2 CPU Utilization  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
Mike Bresnahan wrote:
>
> I can understand that I will not get as much performance out of a EC2 instance
> as a dedicated server, but I don't understand why top(1) is showing 50% CPU
> utilization. If it were a memory speed problem wouldn't top(1) report 100% CPU
> utilization?

A couple of points:

top is not the be-all and end-all of analysis tools.  I'm sure you know
that, but it bears repeating.

More importantly, in a virtualised environment the tools on the inside
of the guest don't have a full picture of what's really going on.  I've
not done any real work with Xen; most of my experience is with zVM and
KVM.

It's pretty normal on a heavily loaded host to see tools like top (and
vmstat, sar, et al) inside a guest reporting less than 100% use while
the physical box is running flat-out, with nothing more for the guest to
get.  I had this
last night doing a load on a guest - 60-70% CPU at peak, with no more
available.  You *should* see steal and 0% idle time in this case, but I
*have* seen zVM Linux guests reporting ample idle time while the zVM
level monitoring tools reported the LPAR as a whole running at 90-95%
utilisation (which is when an LPAR will usually run out of steam).
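
For anyone wanting to check the steal figure directly rather than through top, here is a minimal sketch in Python. It assumes a Linux guest whose kernel exposes the steal counter as the eighth value on the aggregate "cpu" line of /proc/stat; the function names are mine, purely for illustration. As noted above, the guest's own accounting may still be skewed, so treat the result as indicative only.

```python
import time

# Counter order on the aggregate "cpu" line of /proc/stat.
# The trailing guest fields are skipped on purpose: guest time is
# already included in "user", so adding it would double count.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Kernels without steal accounting report fewer columns; pad with zeros.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_steal(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice and return the percentages over the interval."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    return cpu_percentages(before, after)
```

Calling sample_steal() on the guest and looking at the "steal" and "idle" entries gives the same numbers top derives its %st and %id columns from, over whatever interval you choose.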

A secondary effect is that sometimes the scheduling of guests on and off
the hypervisor will cause skewing in the timekeeping of the guest; it's
not uncommon in our loaded-up zVM environment to see discrepancies of
5-20% between the guest's view of how much CPU time it thinks it's
getting and how much time the hypervisor knows it's getting (this is why
companies like Velocity make money selling hypervisor-aware tools that
auto-correct those stats).
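
To make the size of that skew concrete, here is a small sketch of the arithmetic in Python. The function name and the sample figures are hypothetical, chosen only to illustrate the calculation; real values would have to come from the guest's own accounting and from the hypervisor's monitoring, which you won't have access to on EC2.

```python
def discrepancy_pct(guest_cpu_seconds, hypervisor_cpu_seconds):
    """Relative error of the guest's CPU accounting, in percent.

    Negative means the guest under-reports compared with the hypervisor;
    positive means it over-reports.
    """
    if hypervisor_cpu_seconds <= 0:
        raise ValueError("hypervisor CPU time must be positive")
    difference = guest_cpu_seconds - hypervisor_cpu_seconds
    return 100.0 * difference / hypervisor_cpu_seconds


# Hypothetical figures: the guest believes it used 54 CPU-seconds in an
# interval where the hypervisor recorded 60 CPU-seconds for it.
skew = discrepancy_pct(54.0, 60.0)
```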

> In any case, assuming this is a EC2 memory speed thing, it is going to be
> difficult to diagnose application bottlenecks when I cannot rely on top(1)
> reporting meaningful CPU stats.

It's going to be even harder from inside the guests, since you're
getting an incomplete view of the system as a whole.

You could try c2cbench (http://sourceforge.net/projects/c2cbench/),
which is designed to benchmark memory cache performance, but it'll still
be subject to the caveats I outlined above: it may give you something
indicative if you think it's a cache problem, but it may also simply
tell you that the virtual CPUs are fine while the real processors are
pegged for cache from running a bunch of workloads with high memory
pressure.


If you were running a newer kernel you could look at perf_counters or
something similar to get more detail from what the guest thinks it's
doing, but, again, there are going to be inaccuracies.
