Re: performance config help - Mailing list pgsql-performance
From | Kevin Grittner |
---|---|
Subject | Re: performance config help |
Date | |
Msg-id | 4B4B177B020000250002E194@gw.wicourts.gov Whole thread Raw |
In response to | performance config help (Bob Dusek <redusek@gmail.com>) |
Responses |
Re: performance config help
Re: performance config help |
List | pgsql-performance |
Bob Dusek <redusek@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Bob Dusek <redusek@gmail.com> wrote: >> Anyway, my benchmarks tend to show that best throughput occurs at >> about (CPU_count * 2) plus effective_spindle_count. Since you >> seem to be fully cached, effective_spindle_count would be zero, >> so I would expect performance to start to degrade when you have >> more than about 32 sessions active. >> > That's a little disheartening for a single or dual CPU system. Not at all. You only have so many resources to keep busy at any one moment. It is generally more efficient to only context switch between as many processes as can keep those resources relatively busy; otherwise valuable resources are spent switching among the various processes rather than doing useful work. [Regular readers of this list might want to skip ahead while I run through my usual "thought experiment on the topic. ;-) ] Imagine this hypothetical environment -- you have one CPU running requests. There are no other resources to worry about and no latency to the clients. Let's say that the requests take one second each. The client suddenly has 100 requests to run. Assuming context switching is free, you could submit all at once, and 100 seconds later, you get 100 responses, with an average response time of 100 seconds. Let's put a (again free) connection pooler in there. You submit those 100 requests, but they are fed to the database one at a time. You get one response back in one second, the next in two seconds, the last in 100 seconds. No request took any longer, and the average response time was 50.5 seconds -- almost a 50% reduction. Now context switching is not free, and you had tens of thousands of them per second. Besides the hit on CPU availability during each switch, you're reducing the value of the L1 and L2 caches. So in reality, you could expect your "request storm" to perform significantly worse in comparison to the connection pooled configuration. In reality, you have more than one resource to keep busy, so the pool should be sized greater than one; but it's still true that there is some point at which getting a request to the database server delays the response to that request more than queuing it for later execution would. Some database products build in a way to manage this; in PostgreSQL it's on you to do so. >> Your vmstat output suggests that context switches are becoming a >> problem, and I wouldn't be surprised if I heard that the network >> is an issue. You might want to have someone take a look at the >> network side to check. >> > This is all happening on a LAN, and network throughput doesn't > seem to be an issue. It may be a busy network, but I'm not sure > about a problem. Can you elaborate on your suspicion, based on > the vmstat? I haven't used vmstat much. It was simply this: all that CPU idle time while it was swamped with requests suggests that there might be a bottleneck outside the database server. That could be, as another post suggests, the client software. It could also be the network. (It could also be contention on locks within PostgreSQL from the large number of requests, but that's covered by the connection pooling suggestion.) > The problem with connection pooling is that we actually have to > achieve more than 40 per second, which happens to be the sweet > spot with our current config. Well, if you're considering a connection pool which can only submit one request per second, you're looking at the wrong technology. We use a custom connection pool built into our software, so I'm not very familiar with the "drop in" packages out there, but we start the next queued request based on the completion of a request -- there's no polling involved. Between the RAID 0, fsync = off, and full_page_writes = off -- you really had better not be staking anything important on this data. This configuration would make The Flying Wallendas break out in a sweat. It suggests to me that you might want to look into a better RAID controller -- a high quality controller with battery-backup (BBU) cache, configured for write-back, might allow you to change all these to safe settings. If you also switch to a RAID configuration with some redundancy, you'll be much safer.... -Kevin
pgsql-performance by date: