Thread: High Context-Switches on Linux 8.1.4 Server
Postgres 8.1.4
Slony 1.1.5
Linux manny 2.6.12-10-k7-smp #1 SMP Fri Apr 28 14:17:26 UTC 2006 i686 GNU/Linux

We're seeing an average of 30,000 context-switches a sec. This problem
was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.
Any ideas?

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 8  2      0 392184  40248 3040628    0    0 10012  2300 3371 43436 60 25 11  4
10  2      0 334772  40256 3043340    0    0  2672  1892 3252 10073 84 14  1  1
 9  2      0 338492  40280 3051272    0    0  7960  1612 3548 22013 77 16  4  3
11  2      0 317040  40304 3064576    0    0 13172  1616 3870 42729 61 21 11  7
 7  0      0 291496  40320 3078704    0    0 14192   504 3139 52200 58 24 12  7

The machine has 4 gigs of RAM, shared_buffers = 32768,
max_connections = 400, and currently does around 300-500 queries a
second. I can provide more info if needed.

--
Sumbry][
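As a quick cross-check of the "30,000 context-switches a sec" figure, the sketch below averages the cs column over the five vmstat samples quoted above. The helper name column_values is made up for illustration, and the column order is assumed to match the header line shown in the post.

```python
# Sketch: average the "cs" (context switches/sec) column of the vmstat
# samples quoted in the post. Column order is taken from the vmstat header.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

VMSTAT_ROWS = """\
8 2 0 392184 40248 3040628 0 0 10012 2300 3371 43436 60 25 11 4
10 2 0 334772 40256 3043340 0 0 2672 1892 3252 10073 84 14 1 1
9 2 0 338492 40280 3051272 0 0 7960 1612 3548 22013 77 16 4 3
11 2 0 317040 40304 3064576 0 0 13172 1616 3870 42729 61 21 11 7
7 0 0 291496 40320 3078704 0 0 14192 504 3139 52200 58 24 12 7
"""


def column_values(text, name):
    """Return the integer values of one named vmstat column (hypothetical helper)."""
    idx = COLUMNS.index(name)
    return [int(line.split()[idx]) for line in text.strip().splitlines()]


cs = column_values(VMSTAT_ROWS, "cs")
avg_cs = sum(cs) / len(cs)
print(f"cs samples: {cs}")
print(f"average cs/sec: {avg_cs:.0f}")
```

Over these five samples the average comes out a little above 34,000, which is in line with the figure reported, though individual samples range from about 10,000 to over 52,000.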
> We're seeing an average of 30,000 context-switches a sec. This problem
> was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.

Is this from LWLock or spinlock contention? strace'ing a few backends
could tell the difference: look to see how many select(0,...) you see
compared to semop()s. Also, how many of these compared to real work
(such as read/write calls)?

Do you have any long-running transactions, and if so does shutting
them down help? There's been some discussion about thrashing of the
pg_subtrans buffers being a problem, and that's mainly a function of
the age of the oldest open transaction.

regards, tom lane
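One way to do the comparison suggested above is to save the strace output of a backend to a file and tally the calls. The sketch below is only an illustration: the function name tally and the sample lines are invented, and it assumes the usual strace line shape of "name(args) = result", optionally prefixed with "[pid N]".

```python
import re
from collections import Counter

# Matches "name(args..." with an optional "[pid 1234] " prefix.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\((.*)")


def tally(lines):
    """Count syscalls by name, splitting select(0,...) from other selects."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line.strip())
        if not m:
            continue  # signal lines, "resumed" fragments, blank lines
        name, args = m.groups()
        if name == "select":
            # select(0, ...) is a pure sleep; other selects wait on real fds
            name = "select(0,...)" if args.startswith("0,") else "select(other)"
        counts[name] += 1
    return counts


# Invented sample lines, for illustration only (not from the real server).
SAMPLE = """\
semop(360449, 0xbfd3e4b0, 1) = 0
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
read(12, "..."..., 8192) = 8192
semop(360449, 0xbfd3e4b0, 1) = 0
write(9, "..."..., 8192) = 8192
""".splitlines()

counts = tally(SAMPLE)
for name, n in counts.most_common():
    print(f"{name:15s} {n}")
```

On a real capture, a semop() count far above the select(0,...) count would point at LWLock waits rather than spinlock backoff, and the read/write counts give the "real work" baseline to compare against.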
Tom Lane wrote:
>> We're seeing an average of 30,000 context-switches a sec. This problem
>> was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.
>
> Is this from LWLock or spinlock contention? strace'ing a few backends
> could tell the difference: look to see how many select(0,...) you see
> compared to semop()s. Also, how many of these compared to real work
> (such as read/write calls)?

Over a 20 second interval, I've got about 85 select()s and 6,230
semop()s. 2604 read()s vs 16 write()s.

> Do you have any long-running transactions, and if so does shutting
> them down help? There's been some discussion about thrashing of the
> pg_subtrans buffers being a problem, and that's mainly a function of
> the age of the oldest open transaction.

Not long-running. We do have a badly behaving legacy app that is
leaving some backends "idle in transaction". They're gone pretty quickly
so I can't kill them fast enough, but running a pg_stat_activity will
always show at least a handful. Could this be contributing?

Based on the number of semop's we're getting it does look like
shared_memory may be getting thrashed - any suggestions? We did try
lowering shared_memory usage in half the previous day, but that did
little to help (it didn't make performance any worse and we still saw
the high context-switches, but it didn't make it any better either).

--
Sumbry][
>> Is this from LWLock or spinlock contention?

> Over a 20 second interval, I've got about 85 select()s and 6,230
> semop()s. 2604 read()s vs 16 write()s.

OK, so mostly LWLocks then.

>> Do you have any long-running transactions,

> Not long-running. We do have a badly behaving legacy app that is
> leaving some backends "idle in transaction". They're gone pretty quickly
> so I can't kill them fast enough, but running a pg_stat_activity will
> always show at least a handful. Could this be contributing?

Sorry, I was unclear: it's the age of your oldest transaction that
counts (measured by how many xacts started since it), not how many
cycles it's consumed or not.

With the 8.1 code it's possible for performance to degrade pretty badly
once the age of your oldest transaction exceeds 16K transactions. You
were not specific enough about the behavior of this legacy app to let
me guess where you are on that scale ...

> Based on the number of semop's we're getting it does look like
> shared_memory may be getting thrashed - any suggestions? We did try
> lowering shared_memory usage in half the previous day,

Unlikely to help --- if it is the pg_subtrans problem, the number of
buffers involved is set by a compile-time constant.

regards, tom lane
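To put the 16K figure on a time scale, here is a rough back-of-the-envelope sketch using the 300-500 queries a second from the original post. It assumes that "16K" means 16 * 1024 and that every query starts its own transaction; both are assumptions made for illustration, and the function name is made up.

```python
# Assumption: "16K transactions" is read as 16 * 1024 = 16384.
XACT_AGE_LIMIT = 16 * 1024


def seconds_to_reach(limit, xacts_per_second):
    """Seconds until `limit` new transactions have started at a steady rate."""
    return limit / xacts_per_second


# Assumption: each query is its own transaction, so the query rate
# reported in the thread doubles as the transaction start rate.
for rate in (300, 500):
    secs = seconds_to_reach(XACT_AGE_LIMIT, rate)
    print(f"{rate} xacts/sec -> oldest xact is 16K behind after {secs:.1f} s")
```

Under those assumptions, a transaction left open for somewhere between half a minute and a minute would already be more than 16K transactions old, so "gone pretty quickly" may still be long enough to matter.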
Tom Lane wrote:
> Sorry, I was unclear: it's the age of your oldest transaction that
> counts (measured by how many xacts started since it), not how many
> cycles it's consumed or not.

> With the 8.1 code it's possible for performance to degrade pretty badly
> once the age of your oldest transaction exceeds 16K transactions. You
> were not specific enough about the behavior of this legacy app to let
> me guess where you are on that scale ...

Understood. This legacy app wraps every single transaction (even
read-only ones) inside of BEGIN; END; blocks. We do about 90+ percent
reads to our database, and at 300+ queries a second that could quickly
add up. Does this sound like we should investigate this area more?

>> Based on the number of semop's we're getting it does look like
>> shared_memory may be getting thrashed - any suggestions? We did try
>> lowering shared_memory usage in half the previous day,
>
> Unlikely to help --- if it is the pg_subtrans problem, the number of
> buffers involved is set by a compile-time constant.

Interesting. One other thing to note: this application in particular
accounts for only 4 percent of total queries, and if we disable the
application the database runs like a champ. The only other huge
variable I can think of is this app's gratuitous use of cursors.

I haven't read too much about Postgres performance, especially when
dealing with cursors, but could this be a variable? We are considering
modifying the app and removing all use of cursors and wonder if we're
wasting our time or not.

Thanks for the help.

--
Sumbry][