Thread: High Context-Switches on Linux 8.1.4 Server

High Context-Switches on Linux 8.1.4 Server

From: "Donald C. Sumbry ]["
Date: 07 August 2006, 04:59:58

Postgres 8.1.4
Slony 1.1.5
Linux manny 2.6.12-10-k7-smp #1 SMP Fri Apr 28 14:17:26 UTC 2006 i686 GNU/Linux

We're seeing an average of 30,000 context-switches a sec.  This problem
was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.
Any ideas?

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 8  2      0 392184  40248 3040628    0    0 10012  2300 3371 43436 60 25 11  4
10  2      0 334772  40256 3043340    0    0  2672  1892 3252 10073 84 14  1  1
 9  2      0 338492  40280 3051272    0    0  7960  1612 3548 22013 77 16  4  3
11  2      0 317040  40304 3064576    0    0 13172  1616 3870 42729 61 21 11  7
 7  0      0 291496  40320 3078704    0    0 14192   504 3139 52200 58 24 12  7
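A small sketch for summarizing the vmstat samples in this post: it averages one column (here `cs`, context switches per second) over the five data rows quoted above.  The column order is taken from the vmstat header line; the helper name `column_average` is hypothetical.  Note that vmstat's first report normally gives averages since boot, so you may want to drop that row from a live capture.

```python
# Mean of one vmstat column over the data rows quoted in the post.
VMSTAT_ROWS = """\
 8  2      0 392184  40248 3040628    0    0 10012  2300 3371 43436 60 25 11  4
10  2      0 334772  40256 3043340    0    0  2672  1892 3252 10073 84 14  1  1
 9  2      0 338492  40280 3051272    0    0  7960  1612 3548 22013 77 16  4  3
11  2      0 317040  40304 3064576    0    0 13172  1616 3870 42729 61 21 11  7
 7  0      0 291496  40320 3078704    0    0 14192   504 3139 52200 58 24 12  7
"""

# Field order of the vmstat header: r b swpd free buff cache si so bi bo in cs us sy id wa
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def column_average(text: str, name: str) -> float:
    """Return the mean of column `name` over all non-blank rows in `text`."""
    idx = COLUMNS.index(name)
    values = [int(line.split()[idx]) for line in text.splitlines() if line.strip()]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"mean cs/s: {column_average(VMSTAT_ROWS, 'cs'):,.0f}")
    print(f"mean us%:  {column_average(VMSTAT_ROWS, 'us'):.1f}")
    print(f"mean sy%:  {column_average(VMSTAT_ROWS, 'sy'):.1f}")
```

With these five rows the mean is about 34,090 context switches per second, with roughly 68% user and 20% system CPU.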

The machine has 4 gigs of RAM, shared_buffers = 32768, max_connections =
400, and currently does around 300-500 queries a second.  I can provide
more info if needed.

--
Sumbry][

Re: High Context-Switches on Linux 8.1.4 Server

From: Tom Lane
Date: 07 August 2006, 09:52:29

> We're seeing an average of 30,000 context-switches a sec.  This problem
> was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.

Is this from LWLock or spinlock contention?  strace'ing a few backends
could tell the difference: look to see how many select(0,...) you see
compared to semop()s.  Also, how many of these compared to real work
(such as read/write calls)?
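One way to do the comparison described above is to capture raw strace output for a backend (for example `strace -p <pid> -o trace.txt`) and tally the calls.  The sketch below assumes the default strace line format (optionally prefixed by a PID, as with `strace -f`), counts `select(0, ...)` separately from other `select()` calls, and applies a deliberately crude heuristic; `count_syscalls` and `lock_wait_hint` are illustrative names, not part of any tool.  strace's own `-c` option prints a per-syscall summary, but it cannot tell `select(0, ...)` apart from other selects.

```python
import re
from collections import Counter

# Syscall name and first argument at the start of a raw strace line.
# An optional leading PID (as printed by `strace -f`) is skipped;
# timestamp options such as -t or -tt are NOT handled by this sketch.
LINE_RE = re.compile(r"^(?:\d+\s+)?([a-z_][a-z0-9_]*)\(([^,)]*)")


def count_syscalls(lines):
    """Count syscalls by name, tracking select(0, ...) separately."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # signal lines, "+++ exited", "<... resumed>" etc.
        name, first_arg = m.group(1), m.group(2).strip()
        if name == "select" and first_arg == "0":
            name = "select(0)"  # select() with no fds, used purely as a sleep
        counts[name] += 1
    return counts


def lock_wait_hint(counts):
    """Rough heuristic from this thread: semop() points at LWLock waits,
    select(0, ...) sleeps point at spinlock back-off."""
    semops, sleeps = counts["semop"], counts["select(0)"]
    if semops == 0 and sleeps == 0:
        return "no lock-wait syscalls seen"
    return "mostly LWLock" if semops >= sleeps else "mostly spinlock"


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the thread.
    sample = [
        "semop(32769, 0xbfffd0e8, 1)             = 0",
        "semop(32769, 0xbfffd0e8, 1)             = 0",
        "select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)",
        'read(7, "\\0\\0\\0\\0"..., 8192)            = 8192',
    ]
    c = count_syscalls(sample)
    print(dict(c), lock_wait_hint(c))
```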

Do you have any long-running transactions, and if so does shutting
them down help?  There's been some discussion about thrashing of the
pg_subtrans buffers being a problem, and that's mainly a function of
the age of the oldest open transaction.

            regards, tom lane

Re: High Context-Switches on Linux 8.1.4 Server

From: "Donald C. Sumbry ]["
Date: 07 August 2006, 12:47:25

Tom Lane wrote:
>> We're seeing an average of 30,000 context-switches a sec.  This problem
>> was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.
>
> Is this from LWLock or spinlock contention?  strace'ing a few backends
> could tell the difference: look to see how many select(0,...) you see
> compared to semop()s.  Also, how many of these compared to real work
> (such as read/write calls)?

Over a 20-second interval, I've got about 85 select()s and 6,230
semop()s, plus 2,604 read()s vs. 16 write()s.

> Do you have any long-running transactions, and if so does shutting
> them down help?  There's been some discussion about thrashing of the
> pg_subtrans buffers being a problem, and that's mainly a function of
> the age of the oldest open transaction.

Not long-running.  We do have a badly behaving legacy app that is
leaving some backends "idle in transaction".  They're gone pretty quickly,
so I can't kill them fast enough, but querying pg_stat_activity will
always show at least a handful.  Could this be contributing?

Based on the number of semop()s we're getting, it does look like
shared memory may be getting thrashed.  Any suggestions?  We did try
cutting shared memory usage in half the previous day, but that did
little to help (it didn't make performance any worse, and we still saw
the high context-switch rate, but it didn't make it any better either).

--
Sumbry][

Re: High Context-Switches on Linux 8.1.4 Server

From: Tom Lane
Date: 07 August 2006, 15:34:47

>> Is this from LWLock or spinlock contention?

> Over a 20-second interval, I've got about 85 select()s and 6,230
> semop()s, plus 2,604 read()s vs. 16 write()s.

OK, so mostly LWLocks then.
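To make that conclusion concrete, the counts reported above can be turned into per-second rates and ratios.  This is only arithmetic on the numbers given in the thread; the dictionary layout and the `per_second` helper are illustrative choices.

```python
# Syscall counts reported in the thread over a 20-second strace interval.
INTERVAL_S = 20
COUNTS = {"select": 85, "semop": 6230, "read": 2604, "write": 16}


def per_second(counts, seconds):
    """Convert raw counts over an interval into per-second rates."""
    return {name: n / seconds for name, n in counts.items()}


if __name__ == "__main__":
    for name, rate in per_second(COUNTS, INTERVAL_S).items():
        print(f"{name:>6}: {rate:7.2f}/s")
    # semop() (LWLock waits) vs select(0, ...) (spinlock sleeps)
    print(f"semop:select ~ {COUNTS['semop'] / COUNTS['select']:.0f}:1")
    # semop() vs read() as a proxy for lock waits vs real work
    print(f"semop:read   ~ {COUNTS['semop'] / COUNTS['read']:.1f}:1")
```

That works out to about 311 semop()s per second against about 4 select()s, roughly 73:1, and more than twice as many semaphore operations as reads.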

>> Do you have any long-running transactions,

> Not long-running.  We do have a badly behaving legacy app that is
> leaving some backends "idle in transaction".  They're gone pretty quickly,
> so I can't kill them fast enough, but querying pg_stat_activity will
> always show at least a handful.  Could this be contributing?

Sorry, I was unclear: it's the age of your oldest transaction that
counts (measured by how many xacts started since it), not how many
cycles it's consumed or not.
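As a rough illustration of "age" in this sense: it is the number of transaction IDs handed out since the transaction began, and because xids are 32-bit counters that wrap around, the difference is taken modulo 2^32.  The sketch below is a simplified model (it ignores PostgreSQL's reserved xids 0 to 2 and assumes the two values are less than 2^31 apart); `xid_age` is a hypothetical helper, and the xid values would have to be obtained from the server.

```python
# 32-bit transaction IDs wrap around, so "age" is a modular difference.
XID_MODULUS = 2 ** 32


def xid_age(current_xid: int, oldest_xid: int) -> int:
    """Transactions started since `oldest_xid`, modulo 2^32.

    Simplified: ignores reserved xids and assumes |difference| < 2^31.
    """
    return (current_xid - oldest_xid) % XID_MODULUS


if __name__ == "__main__":
    print(xid_age(120000, 100000))          # plain case
    print(xid_age(5000, 2 ** 32 - 1000))    # across the wraparound point
```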

With the 8.1 code it's possible for performance to degrade pretty badly
once the age of your oldest transaction exceeds 16K transactions.  You
were not specific enough about the behavior of this legacy app to let
me guess where you are on that scale ...

> Based on the number of semop()s we're getting, it does look like
> shared memory may be getting thrashed.  Any suggestions?  We did try
> cutting shared memory usage in half the previous day,

Unlikely to help --- if it is the pg_subtrans problem, the number of
buffers involved is set by a compile-time constant.
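If the 16K figure does come from the pg_subtrans buffer count, it can be reproduced from compile-time constants: pg_subtrans stores one parent xid per transaction, so each page covers BLCKSZ / sizeof(TransactionId) transactions.  The defaults below are assumptions about a stock 8.1 build (8 pg_subtrans buffers, 8 KB blocks, 4-byte TransactionId); check them against your own source tree.  `cached_xact_span` is a hypothetical name.

```python
def cached_xact_span(num_buffers: int = 8, blcksz: int = 8192, xid_size: int = 4) -> int:
    """Number of consecutive xids whose pg_subtrans entries fit in the
    buffer pool at once (assumed defaults: 8 buffers, 8 KB pages, 4-byte xids)."""
    xacts_per_page = blcksz // xid_size  # one parent-xid slot per transaction
    return num_buffers * xacts_per_page


if __name__ == "__main__":
    print(cached_xact_span())                # assumed stock build
    print(cached_xact_span(num_buffers=32))  # what-if: more buffers compiled in
```

Under these assumptions a stock build covers 16,384 transactions, which matches the "16K" above; since the buffer count is compiled in, changing shared_buffers does not move this limit.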

            regards, tom lane

Re: High Context-Switches on Linux 8.1.4 Server

From: "Donald C. Sumbry ]["
Date: 07 August 2006, 16:24:01

Tom Lane wrote:
> Sorry, I was unclear: it's the age of your oldest transaction that
> counts (measured by how many xacts started since it), not how many
> cycles it's consumed or not.

> With the 8.1 code it's possible for performance to degrade pretty badly
> once the age of your oldest transaction exceeds 16K transactions.  You
> were not specific enough about the behavior of this legacy app to let
> me guess where you are on that scale ...

Understood.  This legacy app wraps every single transaction (even
read-only ones) inside BEGIN; END; blocks.  Roughly 90+ percent of our
database traffic is reads, and at 300+ queries a second that could
quickly add up.
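A back-of-the-envelope check of how quickly that adds up: at a steady rate of new transactions, an open ("idle in transaction") backend becomes 16K transactions old after threshold / rate seconds.  The rates below are the 300 to 500 queries per second quoted in the thread; treating each query as exactly one transaction is an assumption, and `seconds_to_threshold` is a hypothetical helper.

```python
THRESHOLD_XACTS = 16 * 1024  # the "16K transactions" figure from the thread


def seconds_to_threshold(xacts_per_second: float, threshold: int = THRESHOLD_XACTS) -> float:
    """Seconds a transaction can stay open before it is `threshold` xacts old,
    assuming new transactions start at a constant rate."""
    if xacts_per_second <= 0:
        raise ValueError("rate must be positive")
    return threshold / xacts_per_second


if __name__ == "__main__":
    for rate in (300, 500):  # queries/s from the thread, one xact each (assumed)
        print(f"{rate} xacts/s -> {seconds_to_threshold(rate):.0f} s")
```

At these rates a transaction left open for more than roughly 33 to 55 seconds would already be past the 16K mark.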

Does this sound like we should investigate this area more?

>> Based on the number of semop()s we're getting, it does look like
>> shared memory may be getting thrashed.  Any suggestions?  We did try
>> cutting shared memory usage in half the previous day,
>
> Unlikely to help --- if it is the pg_subtrans problem, the number of
> buffers involved is set by a compile-time constant.

Interesting.  One other thing to note: this application accounts for
only 4 percent of total queries, and if we disable it the database runs
like a champ.  The only other big variable I can think of is this app's
gratuitous use of cursors.

I haven't read much about Postgres performance, especially when it
comes to cursors, but could this be a factor?  We are considering
modifying the app to remove all use of cursors, and we're wondering
whether we'd be wasting our time.

Thanks for the help.

--
Sumbry][