Re: Excessive context switching on SMP Xeons - Mailing list pgsql-performance

From Bill Montgomery
Subject Re: Excessive context switching on SMP Xeons
Msg-id 41630D50.3020308@lulu.com
In response to Re: Excessive context switching on SMP Xeons  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Excessive context switching on SMP Xeons  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Thanks for the helpful response.

Josh Berkus wrote:

>First off, the good news: Gavin Sherry and OSDL may have made some progress
>on this.   We'll be testing as soon as OSDL gets the Scalable Test Platform
>running again.   If you have the CS problem (which I don't think you do, see
>below) and a test box, I'd be thrilled to have you test it.
>

I'd be thrilled to test it too, if for no other reason than to determine
whether what I'm experiencing really is the "CS problem".

>1) I don't really consider a CS of 30,000 to 60,000 on Xeon to be excessive.
>People demonstrating the problem on dual or quad Xeon reported CS levels of
>150,000 or more.    So you probably don't have this issue at all -- depending
>on the load, your level could be considered "normal".
>

Fair enough. I never see nearly this much context switching on my dual
Xeon boxes running dozens (sometimes hundreds) of concurrent apache
processes, but I'll concede this could just be due to the more parallel
nature of a bunch of independent apache workers.

>>I am experiencing said symptom on two different dual-Xeon boxes, both
>>Dells with ServerWorks chipsets, running the latest RH9 and RHEL3
>>kernels, respectively. The databases are 90% read, 10% write, and are
>>small enough to fit entirely into main memory, between pg shared buffers
>>and kernel buffers.
>>
>
>Ah.  Well, you do have the worst possible architecture for PostgreSQL-SMP
>performance.   The ServerWorks chipset is badly flawed (the company is now, I
>believe, bankrupt from recalled products) and Xeons have several performance
>issues on databases based on online tests.
>

Hence my desire for recommendations on alternate architectures ;-)

>AthlonMP appears to be less susceptible to the CS bug than Xeon, and the
>effect of the bug is not as severe.   However, a quad-Opteron box can be
>built for less than $6000; what's your standard for "expensive"?   If you
>don't have that much money, then you may be stuck for options.
>

Being a 24x7x365 shop, and these servers being mission critical, I
require vendors that can offer 24x7 4-hour part replacement, like Dell
or IBM. I haven't seen 4-way 64-bit boxes meeting that requirement for
less than $20,000, and that's for a very minimally configured box. A
suitably configured pair will likely end up costing $50,000 or more. I
would like to avoid an unexpected expense of that size, unless there's
no other good alternative. That said, I'm all ears for a cheaper
alternative that meets my support and performance requirements.

>Overall, though, I'm not convinced that you have the CS bug and I think it's
>more likely that you have a few "bad queries" which are dragging down the
>whole system.    Troubleshoot those and your CPU-bound problems may go away.
>

You may be right, but to compare apples to apples, here's some vmstat
output from a pgbench run:

[billm@xxx billm]$ pgbench -i -s 20 pgbench
<snip>
[billm@xxx billm]$ pgbench -s 20 -t 500 -c 100 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 20
number of clients: 100
number of transactions per client: 500
number of transactions actually processed: 50000/50000
tps = 369.717832 (including connections establishing)
tps = 370.852058 (excluding connections establishing)

and some of the vmstat output...

[billm@poe billm]$ vmstat 1
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
 0  1      0 863108 220620 1571924    0    0     4    64   34    50  1  0  0 98
 0  1      0 863092 220620 1571932    0    0     0  3144  171  2037  3  3 47 47
 0  1      0 863084 220620 1571956    0    0     0  5840  202  3702  6  3 46 45
 1  1      0 862656 220620 1572420    0    0     0 12948  631 42093 69 22  5  5
11  0      0 862188 220620 1572828    0    0     0 12644  531 41330 70 23  2  5
 9  0      0 862020 220620 1573076    0    0     0  8396  457 28445 43 17 17 22
 9  0      0 861620 220620 1573556    0    0     0 13564  726 44330 72 22  2  5
 8  1      0 861248 220620 1573980    0    0     0 12564  660 43667 65 26  2  7
 3  1      0 860704 220624 1574236    0    0     0 14588  646 41176 62 25  5  8
 0  1      0 860440 220624 1574476    0    0     0 42184  865 31704 44 23 15 18
 8  0      0 860320 220624 1574628    0    0     0 10796  403 19971 31 10 29 29
 0  1      0 860040 220624 1574884    0    0     0 23588  654 36442 49 20 13 17
 0  1      0 859984 220624 1574932    0    0     0  4940  229  3884  5  3 45 46
 0  1      0 859940 220624 1575004    0    0     0 12140  355 13454 20 10 35 35
 0  1      0 859904 220624 1575044    0    0     0  5044  218  6922 11  5 41 43
 1  1      0 859868 220624 1575052    0    0     0  4808  199  2029  3  3 47 48
 0  1      0 859720 220624 1575180    0    0     0 21596  485 18075 28 13 29 30
11  1      0 859372 220624 1575532    0    0     0 24520  609 41409 62 33  2  3

While pgbench does not generate quite as high a number of CS as our app,
it is an apples-to-apples comparison, and rules out the possibility of
poorly written queries in our app. Still, 40k CS/sec seems high to me.
While pgbench is just a synthetic benchmark, and not necessarily the
best benchmark, yada yada, 370 tps seems like pretty poor performance.
I've benchmarked the IO subsystem at 70MB/s of random 8k writes, yet
pgbench typically doesn't use more than 10MB/s of that bandwidth (a
little more at checkpoints).
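
For anyone who wants to check the "40k CS/sec" figure against the vmstat sample above rather than eyeballing it, here is a rough Python sketch that parses those rows and summarizes the cs column for the busy intervals. The helper names (parse_vmstat, summarize) and the 50% us+sy cutoff for "busy" are my own assumptions for illustration, not part of vmstat or pgbench.

```python
# Rows copied from the vmstat output above, with wrapped lines rejoined.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy wa id".split()

VMSTAT_ROWS = """
 0  1      0 863108 220620 1571924    0    0     4    64   34    50  1  0  0 98
 0  1      0 863092 220620 1571932    0    0     0  3144  171  2037  3  3 47 47
 0  1      0 863084 220620 1571956    0    0     0  5840  202  3702  6  3 46 45
 1  1      0 862656 220620 1572420    0    0     0 12948  631 42093 69 22  5  5
11  0      0 862188 220620 1572828    0    0     0 12644  531 41330 70 23  2  5
 9  0      0 862020 220620 1573076    0    0     0  8396  457 28445 43 17 17 22
 9  0      0 861620 220620 1573556    0    0     0 13564  726 44330 72 22  2  5
 8  1      0 861248 220620 1573980    0    0     0 12564  660 43667 65 26  2  7
 3  1      0 860704 220624 1574236    0    0     0 14588  646 41176 62 25  5  8
 0  1      0 860440 220624 1574476    0    0     0 42184  865 31704 44 23 15 18
 8  0      0 860320 220624 1574628    0    0     0 10796  403 19971 31 10 29 29
 0  1      0 860040 220624 1574884    0    0     0 23588  654 36442 49 20 13 17
 0  1      0 859984 220624 1574932    0    0     0  4940  229  3884  5  3 45 46
 0  1      0 859940 220624 1575004    0    0     0 12140  355 13454 20 10 35 35
 0  1      0 859904 220624 1575044    0    0     0  5044  218  6922 11  5 41 43
 1  1      0 859868 220624 1575052    0    0     0  4808  199  2029  3  3 47 48
 0  1      0 859720 220624 1575180    0    0     0 21596  485 18075 28 13 29 30
11  1      0 859372 220624 1575532    0    0     0 24520  609 41409 62 33  2  3
"""


def parse_vmstat(text):
    """Return one dict per data row, mapping column name to an int."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # not a data row (e.g. the group header line)
        try:
            values = [int(f) for f in fields]
        except ValueError:
            continue  # column-name header or other non-numeric line
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def summarize(rows, busy_threshold=50):
    """Summarize cs for samples where us + sy >= busy_threshold.

    The threshold is an arbitrary cutoff (an assumption) used to separate
    intervals where the CPUs were working from intervals spent mostly
    idle or in I/O wait.
    """
    busy = [r for r in rows if r["us"] + r["sy"] >= busy_threshold]
    cs = [r["cs"] for r in busy]
    return {
        "samples": len(busy),
        "mean_cs": sum(cs) / len(cs) if cs else 0.0,
        "max_cs": max(cs) if cs else 0,
    }


if __name__ == "__main__":
    print(summarize(parse_vmstat(VMSTAT_ROWS)))
```

With that cutoff, nine of the eighteen samples count as busy, and the highest cs value among them is 44330, which is in line with the 40k figure I quoted. Changing busy_threshold changes which intervals are included, so treat the mean as indicative only.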

So I guess the question is this: now that I've opened up the IO
bottleneck that exists on most database servers, am I really truly CPU
bound now, and not just suffering from poorly handled spinlocks on my
Xeon/ServerWorks platform? If so, is the expense of a 64-bit system
worth it, or is the price/performance for PostgreSQL still better on an
alternative 32-bit platform, like AthlonMP?

Best Regards,

Bill Montgomery
