Re: Excessive context switching on SMP Xeons - Mailing list pgsql-performance
From | Josh Berkus |
---|---|
Subject | Re: Excessive context switching on SMP Xeons |
Date | |
Msg-id | 200410050947.36174.josh@agliodbs.com |
In response to | Excessive context switching on SMP Xeons (Bill Montgomery <billm@lulu.com>) |
Responses |
Re: Excessive context switching on SMP Xeons
IBM P-series machines (was: Excessive context switching on SMP Xeons) |
List | pgsql-performance |
Bill,

> I realize the excessive-context-switching-on-xeon issue has been
> discussed at length in the past, but I wanted to follow up and verify my
> conclusion from those discussions:

First off, the good news: Gavin Sherry and OSDL may have made some progress on this. We'll be testing as soon as OSDL gets the Scalable Test Platform running again. If you have the CS problem (which I don't think you do, see below) and a test box, I'd be thrilled to have you test it.

> On a 2-way or 4-way Xeon box, there is no way to avoid excessive
> (30,000-60,000 per second) context switches when using PostgreSQL 7.4.5
> to query a data set small enough to fit into main memory under a
> significant load.

Hmmm ... some clarification:

1) I don't really consider a CS of 30,000 to 60,000 on Xeon to be excessive. People demonstrating the problem on dual or quad Xeon reported CS levels of 150,000 or more. So you probably don't have this issue at all -- depending on the load, your level could be considered "normal".

2) The problem is not limited to Xeon, Linux, or x86 architecture. It has been demonstrated, for example, on 8-way Solaris machines. It's just worse (and thus more noticeable) on Xeon.

> I am experiencing said symptom on two different dual-Xeon boxes, both
> Dells with ServerWorks chipsets, running the latest RH9 and RHEL3
> kernels, respectively. The databases are 90% read, 10% write, and are
> small enough to fit entirely into main memory, between pg shared buffers
> and kernel buffers.

Ah. Well, you do have the worst possible architecture for PostgreSQL-SMP performance. The ServerWorks chipset is badly flawed (the company is now, I believe, bankrupt from recalled products) and, judging by online tests, Xeons have several performance issues with databases.

> We recently invested in a solid-state storage device
> (http://www.superssd.com/products/ramsan-320/) to help write
> performance. Our entire pg data directory is stored on it. Regrettably
> (and in retrospect, unsurprisingly) we found that opening up the I/O
> bottleneck does little for write performance when the server is under
> load, due to the bottleneck created by excessive context switching.

Well, if you're CPU-bound, improved I/O won't help you, no.

> Is
> the only solution then to move to a different SMP architecture such as
> Itanium 2 or Opteron? If so, should we expect to see an additional
> benefit from running PostgreSQL on a 64-bit architecture, versus 32-bit,
> context switching aside?

Your performance will almost certainly be better for a variety of reasons on Opteron/Itanium. However, I'm still not convinced that you have the CS bug.

> Alternatively, are there good 32-bit SMP
> architectures to consider other than Xeon, given the high cost of
> Itanium 2 and Opteron systems?

Athlon MP appears to be less susceptible to the CS bug than Xeon, and the effect of the bug is not as severe. However, a quad-Opteron box can be built for less than $6000; what's your standard for "expensive"? If you don't have that much money, then you may be stuck for options.

> More generally, how have others scaled "up" their PostgreSQL
> environments? We will eventually have to invent some "outward"
> scalability within the logic of our application (e.g. do read-only
> transactions against a pool of Slony-I subscribers), but in the short
> term we still have an urgent need to scale upward. Thoughts? General
> wisdom?

As long as you're on x86, scaling outward is the way to go. If you want to continue to scale upwards, ask Andrew Sullivan about his experiences running PostgreSQL on big IBM boxes. But if you consider a quad-Opteron server expensive, I don't think that's an option for you.

Overall, though, I'm not convinced that you have the CS bug, and I think it's more likely that you have a few "bad queries" which are dragging down the whole system. Troubleshoot those and your CPU-bound problems may go away.
--
Josh Berkus
Aglio Database Solutions
San Francisco
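
The thread compares context-switch rates (30,000-60,000 per second versus 150,000 or more), so it can help to measure the rate directly. The following is a minimal sketch, not something from the original message: it assumes a Linux host, where `/proc/stat` exposes a cumulative `ctxt` counter (the same quantity `vmstat` reports per second in its `cs` column). The function names `parse_ctxt`, `switches_per_second`, and `sample_rate` are hypothetical helpers introduced here for illustration.

```python
import time


def parse_ctxt(proc_stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # The counter appears on a line of the form: "ctxt <count>"
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(before: int, after: int, elapsed_s: float) -> float:
    """Convert two cumulative counter readings into a per-second rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (after - before) / elapsed_s


def sample_rate(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Sample the system-wide context-switch rate over interval_s seconds.

    Assumes a Linux-style /proc/stat; the path is a parameter so that the
    parsing can be exercised elsewhere against a saved copy of the file.
    """
    with open(path) as f:
        before = parse_ctxt(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return switches_per_second(before, after, time.monotonic() - start)
```

Calling `sample_rate(5.0)` while the database is under load gives a system-wide figure that can be compared with the numbers quoted in the thread. It counts switches for every process on the machine, not only PostgreSQL backends.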