On Fri, 28 Dec 2001, Tom Lane wrote:
> Hmm. And what was your actual *throughput*?
Throughput is down noticeably, but unfortunately I don't have any
controlled instrumentation to demonstrate that. A process which took
approximately 9 minutes to complete with 7.1 now takes approximately 17
minutes to complete, for the same amount of work. Further, I can't really
compare the two versions directly, because the required dump/reload 1) may
be changing something that isn't obvious, and 2) takes too long.
> The backtraces you've shown us all correspond to places where 7.1
> would have busy-waited rather than blocking on a semaphore.
But the behavior seems to me to be more like busy-waiting than blocking on
locks. For example, both of my CPUs are now pegged whenever the database
is busy, and I never used to see tens of thousands of context switches
per second. I also never used to spend half of my CPU time in the kernel,
but now I do.
Also the straces show control bouncing back and forth between processes
stuck in semop(). In a trace of 576000 syscalls, 79% were semop().
I'd believe that the problem is in the kernel's scheduler or its SysV IPC
code, but the people on linux-kernel didn't seem to think so.
> Reduction of nominal CPU load is exactly what I'd expect, and is not
> in itself bad. The real question is how many transactions can you
> process per second.
What I meant to say was that the CPU load on my application machine was
much lower, which is to say that machine is just waiting around all the
time for the database machine to do something. The CPUs on the database
machine are pinned all the time.
-jwb