Context-switch storm in 8.1.15 - Mailing list pgsql-admin

From Iñigo Martinez Lasala
Subject Context-switch storm in 8.1.15
Msg-id 1FBF68E78578447EAD1570BC0DDF1041@DEIMOS
List pgsql-admin
Hi everybody.

Our company was recently awarded a maintenance contract for an online
store.
The website was developed with J2EE, with Postgres 8.1 as the database
backend. The system worked without problems for several months,
but with Christmas, traffic to the web portal has increased a lot.
The database has a performance problem under high load: a large number of
context switches occur, reaching up to 200,000 cs per second.
The system has 16GB of RAM and 4 Intel Xeon MP CPUs with HT enabled, with
RAID10 iSCSI storage and kernel 2.4.21 (RHAS 3).

About half of the CPU power is lost to system time, as you can see.

Vmstat under high load:
19  0      0 281852 150316 13732396    0    0    32    80 1071 128209 41 43 16  0
75  0      0 282040 150316 13732396    0    0     0     0  719 148023 40 38 22  0
 3  0      0 284208 150324 13732412    0    0    16   484  728 145371 39 40 21  0
12  0      0 278364 150324 13732508    0    0    80    56  660 157533 35 42 23  1
 6  0      0 284972 150324 13732580    0    0    32   200  685 142014 39 41 20  0
 8  0      0 296424 150324 13732624    0    0    40   136  554 139601 41 39 20  0
85  0      0 265004 150324 13732664    0    0    32    48  642 142437 48 32 20  0
32  0      0 267432 150324 13732680    0    0     0   788 1003 144409 37 42 21  0
13  0      0 270468 150324 13732676    0    0     0    24  724 146663 42 40 19

Vmstat 20 seconds after stopping the portal:
 8  0      0 962388 206744 13771548    0    0     0     0  131 199784 11 38 51  0
 3  0      0 970212 206744 13771548    0    0     0  1856  305 203639 12 40 48  0
10  0      0 975036 206744 13771588    0    0     0   128  212 201899 11 36 52  0
 3  0      0 970272 206744 13771652    0    0    16   232  685 202672 14 41 44  0
 6  0      0 1008320 206744 13771656    0    0     0    40  198 196298 14 46 39  0
 3  0      0 1034836 206744 13771656    0    0     0     0  147 202731 12 39 50  0
 3  0      0 1037764 206752 13771656    0    0     0   952  202 202933 11 39 50  0
 5  0      0 1078132 206752 13771656    0    0     0     0  154 203408 18 35 47  0
 6  0      0 1110572 206752 13771656    0    0     0     0  153 196864 18 41 41  0
 4  0      0 1105440 206752 13771824    0    0    16   592  461 207538 12 37 51  1
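
For anyone who wants to compare the two vmstat samples numerically, here is a minimal Python sketch that averages the context-switch rate and the system-time share. It assumes the usual 16-column vmstat layout (r b swpd free buff cache si so bi bo in cs us sy id wa), which the output above does not show explicitly. Only the first three rows of each sample are used, and the helper names are made up for this illustration.

```python
# Assumed column order (standard vmstat layout; the header was not in the post).
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# First three rows of each sample, copied from the output above.
HIGH_LOAD = """
19 0 0 281852 150316 13732396 0 0 32 80 1071 128209 41 43 16 0
75 0 0 282040 150316 13732396 0 0 0 0 719 148023 40 38 22 0
3 0 0 284208 150324 13732412 0 0 16 484 728 145371 39 40 21 0
"""

AFTER_STOP = """
8 0 0 962388 206744 13771548 0 0 0 0 131 199784 11 38 51 0
3 0 0 970212 206744 13771548 0 0 0 1856 305 203639 12 40 48 0
10 0 0 975036 206744 13771588 0 0 0 128 212 201899 11 36 52 0
"""


def parse(text):
    """Turn whitespace-separated vmstat rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        values = [int(v) for v in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def mean(rows, column):
    """Arithmetic mean of one column over all rows."""
    return sum(row[column] for row in rows) / len(rows)


def summarize(label, rows):
    """Print average cs/s and the share of busy CPU time spent in the kernel."""
    us = mean(rows, "us")
    sy = mean(rows, "sy")
    sys_share = sy / (us + sy)
    print(f"{label}: cs/s={mean(rows, 'cs'):.0f} "
          f"us={us:.1f}% sy={sy:.1f}% sy/(us+sy)={sys_share:.2f}")


summarize("high load", parse(HIGH_LOAD))
summarize("after stop", parse(AFTER_STOP))
```

Under that column assumption, the rows show system time staying near 40% in both samples while user time drops sharply once the portal is stopped, so the context switching does not go away with the web load.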


I've read about this problem in versions prior to 8.2. However, at the
moment it is not possible to migrate to 8.2 because of the number of stored
procedures: we don't have enough time to test ALL of them before
migrating to 8.2 (or 8.3).
We have, however, run light tests with 8.2 under high load, and there the
problem is solved or at least mitigated.

Now the question: is there any backported patch for 8.1 that fixes the
context-switch storm?

The patch I'm looking for is this one, or a similar one (this one is for 8.2):
---
A Itagaki Takahiro/Tom Lane patch which arranges for GetSnapshotData
  to copy live-subtransaction XIDs from the PGPROC array into
  snapshots, and use this information to avoid visits to pg_subtrans
  in HeapTupleSatisfiesSnapshot.  This appears to solve the
  pg_subtrans-related context swap storm problem that's been reported
  by several people for 8.1.  While at it, modify GetSnapshotData to
  not take an exclusive lock on ProcArrayLock, as closer analysis
  shows that shared lock is always sufficient.
---
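
As a rough illustration of the idea in that commit message, the toy Python model below contrasts the two ways of deciding whether an XID is still in progress: resolving a subtransaction to its top-level parent through a shared lookup structure (standing in for pg_subtrans), versus checking a list of live subtransaction XIDs copied into the snapshot. This is not PostgreSQL code. All class and function names are hypothetical, and locking, XID wraparound and any limits on how many subtransaction XIDs can be cached are left out.

```python
class SubtransLog:
    """Toy stand-in for pg_subtrans: maps a subtransaction XID to its parent."""

    def __init__(self):
        self.parent = {}
        self.lookups = 0  # counts visits, i.e. the contended work

    def set_parent(self, xid, parent_xid):
        self.parent[xid] = parent_xid

    def top_level(self, xid):
        """Walk parent links until a top-level XID is reached."""
        while xid in self.parent:
            self.lookups += 1
            xid = self.parent[xid]
        return xid


class Snapshot:
    """Toy snapshot: running top-level XIDs plus optional cached sub-XIDs."""

    def __init__(self, running_top_xids, running_subxids=None):
        self.xip = frozenset(running_top_xids)
        # None models the old behaviour: no sub-XIDs stored in the snapshot.
        self.subxip = None if running_subxids is None else frozenset(running_subxids)


def xid_in_progress(xid, snapshot, subtrans):
    """Return True if xid belongs to a transaction still running in the snapshot."""
    if snapshot.subxip is not None:
        # Patched idea: sub-XIDs were copied into the snapshot, no log visit.
        return xid in snapshot.xip or xid in snapshot.subxip
    # Old idea: map the XID to its top-level parent via the shared log first.
    return subtrans.top_level(xid) in snapshot.xip


# Transaction 100 is running with nested subtransactions 101 and 102.
log = SubtransLog()
log.set_parent(101, 100)
log.set_parent(102, 101)

old_snapshot = Snapshot(running_top_xids=[100])
new_snapshot = Snapshot(running_top_xids=[100], running_subxids=[101, 102])

old_result = xid_in_progress(102, old_snapshot, log)
old_lookups = log.lookups

log.lookups = 0
new_result = xid_in_progress(102, new_snapshot, log)
new_lookups = log.lookups

print(old_result, old_lookups, new_result, new_lookups)
```

In this toy both checks give the same answer, but only the old path touches the shared lookup structure. With many backends running visibility checks at once, those visits are what the commit message says the patch avoids.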

Thanks in advance.

