On 2015-09-01 14:11:21 -0400, Robert Haas wrote:
> On Tue, Sep 1, 2015 at 2:04 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> > Memory bandwidth, for example. It's quite difficult to spot, because the
> > intuition is that memory is fast, but thanks to improvements in storage (and
> > stagnation in RAM bandwidth), this is becoming a significant issue.
>
> I'd appreciate any tips on how to spot problems of this type. But
> it's my impression that perf, top, vmstat, and other Linux performance
> tools will count time spent waiting for memory as CPU time, not idle
> time. If that's correct, that wouldn't explain workloads where CPU
> utilization doesn't reach 100%. Rather, it would show up as CPU time
> hitting 100% while tps remains low.
Yea.
perf's -e bus-cycles event is a good start for measuring where bus traffic is relevant. Depending on the individual CPU, other events can
be helpful.
> > Process-management overhead is another thing we tend to ignore, but once you
> > get to many processes all willing to work at the same time, you need to
> > account for that.
>
> Any tips on spotting problems in that area?
Not perfect, but -e context-switches (general context switches) and -e
syscalls:sys_enter_semop (for postgres-enforced context switches) are
rather useful when combined with --call-graph dwarf (the 'fp' mode
sometimes can't unwind through libc, which most of the time is not
compiled with -fno-omit-frame-pointer).
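For a rough per-process view of the voluntary/involuntary distinction without perf, the getrusage(2) counters can be read from Python. This is a minimal sketch, assuming Linux (or another Unix where the resource module exists). It is a substitute technique, not the perf events above, and blocking_workload/spinning_workload are hypothetical stand-ins for real work. A process that blocks, as a backend sleeping in semop() does, accrues voluntary switches (ru_nvcsw). A process preempted while still runnable accrues involuntary ones (ru_nivcsw).

```python
import resource
import time


def count_context_switches(workload):
    """Run workload() and return the (voluntary, involuntary) context
    switches this process incurred meanwhile, as reported by getrusage(2)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    voluntary = after.ru_nvcsw - before.ru_nvcsw      # process blocked/slept
    involuntary = after.ru_nivcsw - before.ru_nivcsw  # process was preempted
    return voluntary, involuntary


def blocking_workload(iterations=50):
    # Hypothetical stand-in: each sleep blocks the process, much like a
    # backend waiting in semop(), so the kernel records voluntary switches.
    for _ in range(iterations):
        time.sleep(0.001)


def spinning_workload(seconds=0.2):
    # Hypothetical stand-in: pure CPU work, so switches here are mostly
    # involuntary preemptions by the scheduler.
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n


if __name__ == "__main__":
    for name, fn in [("blocking", blocking_workload),
                     ("spinning", spinning_workload)]:
        vol, invol = count_context_switches(fn)
        print(f"{name}: voluntary={vol} involuntary={invol}")
```

A high voluntary count relative to useful work hints at time lost waiting on locks or semaphores; perf with --call-graph dwarf is still what shows where those waits come from.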
Greetings,
Andres Freund