Thread: Toooo many context switches (maybe SLES8?)

Toooo many context switches (maybe SLES8?)

From
Dirk Lutzebäck
Date:
Hi,

we have a complex mod_perl database application using PostgreSQL 7.4.1 on
a new dual Xeon MP machine running SLES 8. It seems to generate too many
context switches (well over 100,000 per second in vmstat) under higher
load (meaning a system load > 2), and system response times then slow
down significantly. We have been tuning parameters for weeks but could
not get better results. We seem to have had better performance on an
older dual Xeon DP machine running Red Hat 7.3.

Here is the config:

database machine on SuSE SLES 8:

   F-S Primergy RX600
   2x XEON MP 2.5GHz
   8GB RAM
   Hardware Raid 1+0 140GB
   Kernel 2.4.21-169-smp

   Postgresql 7.4.1 (self compiled) with
   max_connections = 170
   shared_buffers = 40000
   effective_cache_size = 800000
   sort_mem = 30000
   vacuum_mem = 420000
   max_fsm_relations = 2000
   max_fsm_pages = 200000
   random_page_cost = 4
   checkpoint_segments = 24
   wal_buffers = 32

modperl application machine on RH 7.3:

   F-S Primergy RX200
   2x XEON DP 2.4 GHz
   4 GB RAM
   Kernel 2.4.18-10smp, RedHat 7.3
   Apache 1.3.27 setup:
   MinSpareServers 15
   MaxSpareServers 30
   StartServers 15
   MaxClients 80
   MaxRequestsPerChild 100

vmstat 1 excerpt:

procs -----------memory----------- ---swap-- -----io---- ---system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in     cs us sy id wa
 1  0   4868 242372 179488 6942316    0    0    12     8   18      9  6  2 92  0
 2  1   4868 242204 179488 6942500    0    0    64   500  701 117921 35 18 48  0
 0  1   4868 242032 179392 6941560    0    0    16   316  412 132295 28 25 47  0
 1  0   4872 242396 179164 6933776    0    0   128   276  474  69708 21 24 56  0
 3  0   4872 242536 179164 6933808    0    0     0   240  412 113643 27 27 46  0
 2  0   4872 242872 179092 6931708    0    0    48  1132  521 127916 24 24 53  0
 0  0   4876 242876 179092 6927512    0    0    48   532  504 117868 32 21 47  0
 0  0   4876 242504 179096 6927560    0    0     0   188  412 127147 34 20 47  0
 1  0   4876 242152 179096 6927856    0    0    96   276  529 117684 28 23 49  0
 2  0   4876 242864 179096 6928384    0    0    88   560  507 135717 38 19 43  0
 1  0   4876 242848 179096 6928520    0    0    64   232  433 151380 32 20 48  0
 4  0   4876 242832 179144 6928916    0    0    16 10380 2913 112583 28 20 52  0
 4  0   4876 242720 179144 6929240    0    0   196     0  329 154821 32 18 50  0
 3  2   4876 243576 179144 6929408    0    0     0   460  451 160287 29 18 52  0
 3  0   4876 243292 179180 6929468    0    0    16   436  614  51894 15  5 80  0
 0  0   4876 243884 179180 6929580    0    0     0   236  619 154168 29 21 49  0
 2  1   4876 243864 179180 6929860    0    0   128   380  493 155903 31 19 50  0
 2  0   4876 244720 179180 6930276    0    0    16  1208  561 129336 27 16 56  0
 2  0   4876 247204 179180 6930300    0    0     0     0  361 146268 33 20 47  0
 3  0   4876 248620 179180 6930372    0    0     0   168  346 155915 32 12 56  0
 2  0   4876 250476 179180 6930436    0    0     0   184  328 163842 35 20 46  0
 0  0   4876 250496 179180 6930652    0    0    48   260  450 144930 31 15 53  0
 1  0   4876 252236 179180 6930732    0    0    16   244  577 167259 35 15 50  0
 0  0   4876 252236 179180 6930780    0    0     0   464  622 165488 31 15 54  0
 1  0   4876 252268 179180 6930812    0    0     0   132  460 153381 34 15 52  0
 2  0   4876 252268 179180 6930964    0    0     0   216  312 141009 31 19 50  0
 1  0   4876 252264 179180 6930980    0    0     0    56  275 153143 33 20 47  0
 2  0   4876 252212 179180 6931212    0    0    96   296  400 133982 32 18 50  0
 1  0   4876 252264 179180 6931332    0    0     0   300  416 136034 32 18 50  0
 1  1   4876 252264 179180 6931332    0    0     0   236  377 143300 34 22 44  0
 4  0   4876 254876 179180 6931372    0    0     0   124  446 118117 34 20 45  0
 1  0   4876 254876 179180 6931492    0    0    16   144  462 140499 38 16 46  0
 2  0   4876 255860 179180 6931572    0    0    16   144  674 126250 33 20 47  0
 1  0   4876 255860 179180 6931788    0    0    48   264  964 115679 36 13 51  0
 3  0   4876 255864 179180 6931804    0    0     0   100  597 127619 36 19 46  0
 5  1   4876 255864 179180 6931924    0    0    72   352  559 151620 34 18 48  0
 2  0   4876 255860 179184 6932100    0    0    96   120  339 137821 34 20 47  0
 0  0   4876 255860 179184 6932156    0    0     8   168  469 125281 36 21 43  0
 2  0   4876 256092 179184 6932444    0    0   112   328  446 137939 34 19 48  0
 2  0   4876 256092 179184 6932484    0    0    16   184  382 141800 35 16 49  0
 3  0   4876 256464 179184 6932716    0    0    16   356  448 134238 30 18 51  0
 5  0   4876 256464 179184 6932892    0    0    96   600  476 142838 34 20 46  0
 1  0   4876 256464 179184 6933012    0    0    16   176  589 138546 35 22 43  0
 2  0   4876 256436 179184 6933096    0    0    60    76  396  93110 42 17 41  0
 1  0   4876 256464 179184 6933484    0    0   212   276  442  83060 45 11 44  0
 5  0   4876 257612 179184 6933604    0    0     0   472  548  94158 39 17 45  0
 0  0   4876 257560 179184 6933708    0    0    96    96  518 116764 38 19 43  0
 1  0   4876 257612 179184 6933796    0    0     0  1768  729 139013 29 19 53  0
 4  0   4876 257612 179184 6934188    0    0   296   108  332 134703 31 21 48  0
 0  1   4876 258584 179184 6934380    0    0     0   492  405 141198 34 18 48  0
 1  0   4876 258584 179184 6934492    0    0     0   176  575 134771 37 16 48  0
 4  1   4876 257796 179184 6935724    0    0  1176   176  438 151240 33 20 48  0
 1  0   4876 261448 179184 6935836    0    0     0   252  489 134348 29 19 51  0
 2  0   4876 261448 179184 6935852    0    0     0   512  639 130875 34 16 49  0
 2  1   4876 261724 179184 6935924    0    0     0    80  238 144970 33 20 47  0
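
To put a number on an excerpt like the one above, a small C helper can pull the cs (context switches per second) column out of each sample and average it. This is only a sketch under stated assumptions: it expects the 16-column vmstat layout shown in this post, with each sample on a single line (rows wrapped by a mail client have to be rejoined first), and the names vmstat_cs and mean_cs are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define VMSTAT_COLS   16  /* r b swpd free buff cache si so bi bo in cs us sy id wa */
#define VMSTAT_CS_COL 11  /* zero-based index of the cs column in that layout */

/* Parse one vmstat data row. Returns 1 and stores the cs value on success,
 * 0 if the row does not hold 16 numeric fields (e.g. a header line). */
int vmstat_cs(const char *row, long *cs_out)
{
    const char *p = row;
    long cs = 0;

    for (int col = 0; col < VMSTAT_COLS; col++) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)
            return 0;            /* no number here: header or truncated row */
        if (col == VMSTAT_CS_COL)
            cs = v;
        p = end;
    }
    *cs_out = cs;                /* only written once the whole row parsed */
    return 1;
}

/* Average cs over all rows that parse; rows that do not parse are skipped. */
double mean_cs(const char *const rows[], size_t n)
{
    long sum = 0;
    size_t used = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        long cs;
        if (vmstat_cs(rows[i], &cs)) {
            sum += cs;
            used++;
        }
    }
    return used ? (double) sum / (double) used : 0.0;
}
```

Header lines can be passed in along with the samples, since they do not parse as numbers and are simply ignored. It is usually worth dropping the first vmstat sample by hand, because it reports averages since boot rather than the last interval.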





Re: Toooo many context switches (maybe SLES8?)

From
Joe Conway
Date:
Dirk Lutzebäck wrote:
> postgresql 7.4.1

> a new Dual Xeon MP

> too much context switches (way more than 100.000) on higher load (meaning system
> load > 2).

I believe this was fixed in 7.4.2, although I can't seem to find it in
the release notes.

Joe

Re: Toooo many context switches (maybe SLES8?)

From
Dirk Lutzebäck
Date:
Joe, do you know where I should look in the 7.4.2 code to find this out?

Dirk


Joe Conway wrote:

> Dirk Lutzebäck wrote:
>> postgresql 7.4.1
>> a new Dual Xeon MP
>> too much context switches (way more than 100.000) on higher load
>> (meaning system load > 2).
>
> I believe this was fixed in 7.4.2, although I can't seem to find it in
> the release notes.
>
> Joe



Re: Toooo many context switches (maybe SLES8?)

From
Joe Conway
Date:
Dirk Lutzebäck wrote:
> Joe, do you know where I should look in the 7.4.2 code to find this out?

I think I was wrong. I just looked in CVS and found the commit I was
thinking about:

http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/storage/lmgr/s_lock.c.diff?r1=1.22&r2=1.23
http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/storage/s_lock.h.diff?r1=1.123&r2=1.124

=========================
Revision 1.23, Sat Dec 27 20:58:58 2003 UTC (3 months, 2 weeks ago) by tgl
Changes since 1.22: +5 -1 lines

Improve spinlock code for recent x86 processors: insert a PAUSE
instruction in the s_lock() wait loop, and use test before test-and-set
in TAS() macro to avoid unnecessary bus traffic.  Patch from Manfred
Spraul, reworked a bit by Tom.
=========================

I thought this had been committed to the 7.4 stable branch as well, but
it appears not.

Joe
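
For readers unfamiliar with the technique named in the commit message above, here is a rough C11 sketch of a spinlock that tests before it test-and-sets and issues a PAUSE hint in its wait loop. It is not the PostgreSQL code from s_lock.c or s_lock.h: the names demo_spinlock, cpu_relax and demo_spin_* are hypothetical, and the sketch assumes a compiler that provides <stdatomic.h> and GCC-style inline assembly on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration only; not PostgreSQL's slock_t or TAS(). */
typedef struct {
    atomic_int locked;              /* 0 = free, 1 = held */
} demo_spinlock;

/* Spin-wait hint: PAUSE on x86, a no-op on other architectures. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#endif
}

static inline void demo_spin_init(demo_spinlock *l)
{
    atomic_init(&l->locked, 0);
}

/* One acquisition attempt; returns true if the lock was taken. */
static inline bool demo_spin_trylock(demo_spinlock *l)
{
    /* Test first with a plain load: while the lock is held, waiters only
     * read the cache line instead of issuing locked writes on the bus. */
    if (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
        return false;
    /* The lock looked free, so now do the actual test-and-set. */
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Busy-wait until the lock is acquired, pausing between attempts. */
static inline void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        cpu_relax();
}

static inline void demo_spin_unlock(demo_spinlock *l)
{
    /* Releasing a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Note that this version spins forever under contention. A production spinlock would normally give up the CPU after a bounded number of spins, and that back-off is a separate design choice not shown here.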


Re: Toooo many context switches (maybe SLES8?)

From
Josh Berkus
Date:
Joe,

> I believe this was fixed in 7.4.2, although I can't seem to find it in
> the release notes.

Depends on the cause of the issue.  If it's the same issue that I'm currently
struggling with, it's not fixed.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Toooo many context switches (maybe SLES8?)

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
>> Improve spinlock code for recent x86 processors: insert a PAUSE
>> instruction in the s_lock() wait loop, and use test before test-and-set
>> in TAS() macro to avoid unnecessary bus traffic.  Patch from Manfred
>> Spraul, reworked a bit by Tom.

> I thought this had been committed to the 7.4 stable branch as well, but
> it appears not.

I am currently chasing what seems to be the same issue: massive context
swapping on a dual Xeon system.  I tried back-patching the above-mentioned
patch ... it helps a little but by no means solves the problem ...

            regards, tom lane

Re: Toooo many context switches (maybe SLES8?)

From
Josh Berkus
Date:
Folks,

> I am currently chasing what seems to be the same issue: massive context
> swapping on a dual Xeon system.  I tried back-patching the above-mentioned
> patch ... it helps a little but by no means solves the problem ...

BTW, I'm currently pursuing the possibility that this has something to do with
the ServerWorks chipset on those motherboards.   If anyone knows a high-end
hardware+linux kernel geek I can corner, I'd appreciate it.

Maybe I should contact OSDL ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Toooo many context switches (maybe SLES8?)

From
Dave Cramer
Date:
Isn't this a Linux kernel issue?

My understanding is that the scheduler doesn't know that two of the CPUs
are actually the same underlying hardware, so sometimes two contexts end
up fighting for the same underlying chip.

--dc--

On Thu, 2004-04-15 at 16:37, Josh Berkus wrote:
> Folks,
>
> > I am currently chasing what seems to be the same issue: massive context
> > swapping on a dual Xeon system.  I tried back-patching the above-mentioned
> > patch ... it helps a little but by no means solves the problem ...
>
> BTW, I'm currently pursuing the possibility that this has something to do with
> the ServerWorks chipset on those motherboards.   If anyone knows a high-end
> hardware+linux kernel geek I can corner, I'd appreciate it.
>
> Maybe I should contact OSDL ...
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: Toooo many context switches (maybe SLES8?)

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
Could this be related to the O(1) scheduler backports from 2.6 into the
2.4 kernels shipped by newer 2.4-based distros (RedHat, SuSE)?


Tom Lane wrote:

> Joe Conway <mail@joeconway.com> writes:
>>> Improve spinlock code for recent x86 processors: insert a PAUSE
>>> instruction in the s_lock() wait loop, and use test before test-and-set
>>> in TAS() macro to avoid unnecessary bus traffic.  Patch from Manfred
>>> Spraul, reworked a bit by Tom.
>
>> I thought this had been committed to the 7.4 stable branch as well, but
>> it appears not.
>
> I am currently chasing what seems to be the same issue: massive context
> swapping on a dual Xeon system.  I tried back-patching the above-mentioned
> patch ... it helps a little but by no means solves the problem ...
>
>             regards, tom lane



Re: Toooo many context switches (maybe SLES8?)

From
Dave Cramer
Date:
Don't think so; mine is a vanilla kernel from kernel.org.

Dave
On Thu, 2004-04-15 at 16:03, Dirk Lutzebaeck wrote:
> Could this be related to the O(1) scheduler backpatches from 2.6 to 2.4
> kernel on newer 2.4er distros (RedHat, SuSE)?
--
Dave Cramer
519 939 0336
ICQ # 14675561