Thread: pgbench could not send data to client: Broken pipe
Howdy,

I'm running pgbench with a fairly large # of clients and getting this error
in my PG log file. Here's the command:

    ./pgbench -c 1100 testdb -l

I get:

    LOG:  could not send data to client: Broken pipe

(I had to modify the pgbench.c file to make it go that high; I changed
MAXCLIENTS = 2048.)

I thought maybe I was running out of resources, so I checked my ulimits:

    ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) unlimited
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 2048
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) unlimited
    real-time priority              (-r) 0
    stack size              (kbytes, -s) unlimited
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) unlimited
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

This is Pg 8.3.10 on 64-bit Red Hat; the system has 48 cores and 256G of RAM.

Any idea what would be causing the error?

thanks

Dave
David Kerr <dmk@mr-paradox.net> writes:
> I'm running pgbench with a fairly large # of clients and getting this
> error in my PG log file.
> LOG:  could not send data to client: Broken pipe

That error suggests that pgbench dropped the connection. You might be
running into some bug or internal limitation in pgbench. Did you check to
make sure pgbench isn't crashing?

> (I had to modify the pgbench.c file to make it go that high; I changed
> MAXCLIENTS = 2048.)

Hm, you can't just arbitrarily change that number; it has to be less than
whatever number of open files select(2) supports. A look on my Fedora 13
box suggests that 1024 is the limit there; I'm not sure which Red Hat
variant you're using, but I suspect it might have the same limit.

As of the 9.0 release, it's possible to run pgbench in a "multi thread"
mode, and if you forced the subprocess rather than thread model, it looks
like the select() limit would be per subprocess rather than global. So I
think you could get above the FD_SETSIZE limit with a bit of hacking if
you were using 9.0's pgbench. No chance with 8.3, though.

(This suggests, BTW, that we might want to expose the thread-versus-fork
choice in a slightly more user-controllable fashion, rather than assuming
that threads are always better.)

			regards, tom lane
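For reference, FD_SETSIZE is the compile-time cap on how many descriptors
a single fd_set (and therefore a select()-based loop like pgbench's) can
track. A minimal, untested sketch to print it on a given box (assuming a
Linux/glibc system where the macro comes in via <sys/select.h>):

    /* fdsetsize.c -- print this system's select() descriptor limit.
       Hypothetical helper, not part of pgbench; build with: cc fdsetsize.c */
    #include <stdio.h>
    #include <sys/select.h>

    int main(void)
    {
        /* Descriptors numbered >= FD_SETSIZE cannot be tracked in an fd_set,
           so a single select() loop is effectively capped at this value. */
        printf("FD_SETSIZE = %d\n", (int) FD_SETSIZE);
        return 0;
    }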
Tom Lane wrote:
> As of the 9.0 release, it's possible to run pgbench in a "multi thread"
> mode, and if you forced the subprocess rather than thread model it looks
> like the select() limit would be per subprocess rather than global.
> So I think you could get above the FD_SETSIZE limit with a bit of
> hacking if you were using 9.0's pgbench. No chance with 8.3 though.

I believe David can do this easily enough by compiling a 9.0 source code
tree with the "--disable-thread-safety" option. That's the simplest way to
force the pgbench client to build itself using the multi-process model
rather than the multi-threaded one.

It's kind of futile to run pgbench simulating much more than a hundred or
two clients before 9.0 anyway. Without multiple workers, you're likely to
just run into the process-switching limitations within pgbench itself
rather than testing server performance usefully. I've watched the older
pgbench program fail to come close to saturating an 8-core server without
running into its own limitations first.

You might run a 9.0 pgbench client against an 8.3 server, though, if you
did the whole thing starting from pgbench database initialization over
again--the built-in tables like "accounts" changed to "pgbench_accounts"
in 8.4. That might work; I can't recall any changes that would prevent it,
but as I haven't tested it I can't say for sure.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> writes:
> Tom Lane wrote:
>> So I think you could get above the FD_SETSIZE limit with a bit of
>> hacking if you were using 9.0's pgbench. No chance with 8.3 though.

> I believe David can do this easily enough by compiling a 9.0 source code
> tree with the "--disable-thread-safety" option.

It would take a bit more work than that, because the code still tries to
limit the client count based on FD_SETSIZE. He'd need to hack it so that
in non-thread mode, the limit is FD_SETSIZE per subprocess. I was
suggesting that an official patch to that effect would be a good thing.

> It's kind of futile to run pgbench simulating much more than a hundred
> or two clients before 9.0 anyway.

Yeah ...

			regards, tom lane
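Purely to illustrate the idea (this is not the actual pgbench source, and
every name below is made up), the change Tom describes amounts to applying
the FD_SETSIZE ceiling to each forked subprocess rather than to the total
client count:

    /* Hypothetical sketch only -- not pgbench code.  In fork mode, each
       subprocess runs its own select() loop over its own fd_set, so the
       FD_SETSIZE limit can be checked per subprocess. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/select.h>

    int main(void)
    {
        int nclients = 2000;     /* total simulated clients (made-up figure) */
        int nprocs   = 4;        /* forked pgbench workers (made-up figure)  */
        int per_proc = (nclients + nprocs - 1) / nprocs;

        if (per_proc >= FD_SETSIZE)
        {
            fprintf(stderr, "too many clients per subprocess: %d (limit %d)\n",
                    per_proc, FD_SETSIZE - 1);
            exit(1);
        }
        printf("%d clients across %d subprocesses (%d each) fits under "
               "FD_SETSIZE = %d\n", nclients, nprocs, per_proc,
               (int) FD_SETSIZE);
        return 0;
    }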
On Wed, Sep 08, 2010 at 03:27:34PM -0400, Greg Smith wrote:
- Tom Lane wrote:
- >As of the 9.0 release, it's possible to run pgbench in a "multi thread"
- >mode, and if you forced the subprocess rather than thread model it looks
- >like the select() limit would be per subprocess rather than global.
- >So I think you could get above the FD_SETSIZE limit with a bit of
- >hacking if you were using 9.0's pgbench. No chance with 8.3 though.
- >
- I believe David can do this easily enough by compiling a 9.0 source code
- tree with the "--disable-thread-safety" option. That's the simplest way
- to force the pgbench client to build itself using the multi-process
- model rather than the multi-threaded one.
-
- It's kind of futile to run pgbench simulating much more than a hundred
- or two clients before 9.0 anyway. Without multiple workers, you're
- likely to just run into the process-switching limitations within pgbench
- itself rather than testing server performance usefully. I've watched
- the older pgbench program fail to come close to saturating an 8-core
- server without running into its own limitations first.
-
- You might run a 9.0 pgbench client against an 8.3 server, though, if you
- did the whole thing starting from pgbench database initialization over
- again--the built-in tables like "accounts" changed to "pgbench_accounts"
- in 8.4. That might work; I can't recall any changes that would prevent
- it, but as I haven't tested it I can't say for sure.

Thanks. I compiled the 9.0 RC1 branch with the --disable-thread-safety
option and ran pgbench against my 8.3 DB; it seemed to work fine. However,
MAXCLIENTS is still 1024, and if I hack it up to 2048 I get this:

    starting vacuum...end.
    select failed: Bad file descriptor  <---------------
    transaction type: TPC-B (sort of)
    scaling factor: 1
    query mode: simple
    number of clients: 1900
    number of threads: 1
    number of transactions per client: 10
    number of transactions actually processed: 3723/19000
    tps = 52.007642 (including connections establishing)
    tps = 82.579077 (excluding connections establishing)

I'm not sure what Tom is referring to with the select(2) limitation; maybe
I'm running into it. (Where do I find that? /usr/include/sys/select.h?)

Should I be running pgbench differently? I tried increasing the # of
threads, but that didn't increase the number of backends, and I'm trying
to simulate 2000 physical backend processes.

thanks

Dave
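The limit in question is FD_SETSIZE, pulled in through <sys/select.h> on
Linux. An fd_set is a fixed-size bitmap of FD_SETSIZE bits, so descriptors
at or above that number simply cannot be represented, and FD_SET() on them
writes past the end of the structure -- one plausible way to end up with
errors like "select failed: Bad file descriptor" once ~1900 client sockets
are in play. A small, untested illustration (assumes Linux/glibc):

    /* Show how big an fd_set actually is; anything beyond this bitmap is
       out of bounds for select().  Illustrative sketch, not pgbench code. */
    #include <stdio.h>
    #include <sys/select.h>

    int main(void)
    {
        printf("FD_SETSIZE     = %d descriptors\n", (int) FD_SETSIZE);
        printf("sizeof(fd_set) = %zu bytes (%zu bits)\n",
               sizeof(fd_set), sizeof(fd_set) * 8);
        return 0;
    }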
On Wed, Sep 08, 2010 at 03:44:36PM -0400, Tom Lane wrote:
- Greg Smith <greg@2ndquadrant.com> writes:
- > Tom Lane wrote:
- >> So I think you could get above the FD_SETSIZE limit with a bit of
- >> hacking if you were using 9.0's pgbench. No chance with 8.3 though.
-
- > I believe David can do this easily enough by compiling a 9.0 source code
- > tree with the "--disable-thread-safety" option.
-
- It would take a bit more work than that, because the code still tries to
- limit the client count based on FD_SETSIZE. He'd need to hack it so
- that in non-thread mode, the limit is FD_SETSIZE per subprocess. I was
- suggesting that an official patch to that effect would be a good thing.

Yeah, that might be beyond me =)

Dave
David Kerr <dmk@mr-paradox.net> writes:
> Should I be running pgbench differently? I tried increasing the # of
> threads, but that didn't increase the number of backends, and I'm trying
> to simulate 2000 physical backend processes.

The odds are good that if you did get up that high, what you'd find is
pgbench itself being the bottleneck, not the server. What I'd suggest is
running several copies of pgbench *on different machines*, all beating on
the one database server. Collating the results will be a bit more of a
PITA than if there were only one pgbench instance, but it'd be a truer
picture of real-world behavior.

It's probably also worth pointing out that 2000 backend processes is
likely to be a loser anyhow. If you're just doing this for academic
purposes, fine, but if you're trying to set up a real system for 2000
clients, you almost certainly want to stick some connection pooling in
there.

			regards, tom lane
On Wed, Sep 08, 2010 at 04:35:28PM -0400, Tom Lane wrote:
- David Kerr <dmk@mr-paradox.net> writes:
- > Should I be running pgbench differently? I tried increasing the # of
- > threads, but that didn't increase the number of backends, and I'm
- > trying to simulate 2000 physical backend processes.
-
- The odds are good that if you did get up that high, what you'd find is
- pgbench itself being the bottleneck, not the server. What I'd suggest
- is running several copies of pgbench *on different machines*, all
- beating on the one database server. Collating the results will be a bit
- more of a PITA than if there were only one pgbench instance, but it'd
- be a truer picture of real-world behavior.
-
- It's probably also worth pointing out that 2000 backend processes is
- likely to be a loser anyhow. If you're just doing this for academic
- purposes, fine, but if you're trying to set up a real system for 2000
- clients, you almost certainly want to stick some connection pooling in
- there.
-
- 	regards, tom lane

Ah, that's a good idea; I'll have to give that a shot.

Actually, this is real: that's 2000 connections, connection-pooled out to
20k or so (although I'm pushing for closer to 1000 connections).

I know that's not the ideal way to go, but it's what I've got to work
with. It IS a huge box, though...

Thanks

Dave
David Kerr <dmk@mr-paradox.net> wrote:

> Actually, this is real: that's 2000 connections, connection-pooled out
> to 20k or so (although I'm pushing for closer to 1000 connections).
>
> I know that's not the ideal way to go, but it's what I've got to work
> with.
>
> It IS a huge box, though...

FWIW, my benchmarks (and I've had a couple of people tell me this is
consistent with what they've seen) show best throughput and best response
time when the connection pool is sized such that the number of active
PostgreSQL connections is limited to about twice the number of CPU cores
plus the number of effective spindles. Either you've got one heck of a
machine, or your "sweet spot" for the connection pool will be well under
1000 connections.

It is important that your connection pool queues requests when things are
maxed out, and quickly submits a new request when completion brings the
number of busy connections below the maximum.

-Kevin
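As a rough sketch of that rule of thumb (the core count is the box from
this thread; the effective spindle count is an assumption for illustration,
since it depends on the storage and on how much of the working set is
cached):

    /* Pool-sizing rule of thumb described above: roughly
       (2 * CPU cores) + effective spindles.  Illustrative numbers only. */
    #include <stdio.h>

    int main(void)
    {
        int cores = 48;               /* the machine described in this thread */
        int effective_spindles = 0;   /* assumption: working set fully cached */

        printf("suggested active-connection ceiling: %d\n",
               2 * cores + effective_spindles);   /* prints 96 */
        return 0;
    }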
On Wed, Sep 08, 2010 at 03:56:24PM -0500, Kevin Grittner wrote:
- David Kerr <dmk@mr-paradox.net> wrote:
-
- > Actually, this is real: that's 2000 connections, connection-pooled out
- > to 20k or so (although I'm pushing for closer to 1000 connections).
- >
- > I know that's not the ideal way to go, but it's what I've got to work
- > with.
- >
- > It IS a huge box, though...
-
- FWIW, my benchmarks (and I've had a couple of people tell me this is
- consistent with what they've seen) show best throughput and best
- response time when the connection pool is sized such that the number
- of active PostgreSQL connections is limited to about twice the
- number of CPU cores plus the number of effective spindles. Either
- you've got one heck of a machine, or your "sweet spot" for the
- connection pool will be well under 1000 connections.
-
- It is important that your connection pool queues requests when
- things are maxed out, and quickly submits a new request when
- completion brings the number of busy connections below the maximum.
-
- -Kevin

Hmm, I'm not following you. I've got 48 cores, so that means my sweet-spot
active connections would be 96 (i.e., less than the default max_connections
shipped with PG), and this is a very, very expensive machine.

Now if I were to connection-pool that out to 15 people per connection,
that's 1440 users "total" able to use my app at one time (with only 96
actually doing anything). Not really great for a web-based app that will
have millions of users accessing it when we're fully ramped up.

I've got a few plans to spread the load out across multiple machines, but
at 1440 users per machine this wouldn't be sustainable. I know that other
people are hosting more than that on larger machines, so I hope I'm OK.

Thanks

Dave
David Kerr <dmk@mr-paradox.net> wrote:

> Hmm, I'm not following you. I've got 48 cores, so that means my
> sweet-spot active connections would be 96.

Plus your effective spindle count. That can be hard to calculate, but you
could start by just counting spindles on your drive array.

> Now if I were to connection-pool that out to 15 people per connection,

Where did you get that number? We routinely route hundreds of requests per
second (many of them with 10 or 20 joins) from five or ten thousand
connected users through a pool of 30 connections. It started out bigger;
we kept shrinking it until we hit our sweet spot. The reason we started
bigger is that we've got 40 spindles to go with the 16 cores, but the
active portion of the database is cached, which reduces our effective
spindle count to zero.

> that's 1440 users "total" able to use my app at one time (with only 96
> actually doing anything). Not really great for a web-based app that
> will have millions of users accessing it when we're fully ramped up.

Once you have enough active connections to saturate the resources, adding
more connections just adds contention for resources and context-switching
cost -- it does nothing to help you service more concurrent users. The
key is, as I mentioned before, to have the pooler queue requests above the
limit and promptly get them running as slots are freed.

-Kevin
On Wed, Sep 08, 2010 at 04:51:17PM -0500, Kevin Grittner wrote:
- David Kerr <dmk@mr-paradox.net> wrote:
-
- > Hmm, I'm not following you. I've got 48 cores, so that means my
- > sweet-spot active connections would be 96.
-
- Plus your effective spindle count. That can be hard to calculate,
- but you could start by just counting spindles on your drive array.

We've got this weird LPAR thing at our hosting center, so that's tough for
me to do.

- > Now if I were to connection-pool that out to 15 people per connection,
-
- Where did you get that number? We routinely route hundreds of
- requests per second (many of them with 10 or 20 joins) from five or
- ten thousand connected users through a pool of 30 connections. It
- started out bigger; we kept shrinking it until we hit our sweet
- spot. The reason we started bigger is that we've got 40 spindles to
- go with the 16 cores, but the active portion of the database is
- cached, which reduces our effective spindle count to zero.

That's encouraging. I don't remember where I got the number from, but my
pooler will be Geronimo, so I think it came up in that context.

- > that's 1440 users "total" able to use my app at one time (with only
- > 96 actually doing anything). Not really great for a web-based app
- > that will have millions of users accessing it when we're fully
- > ramped up.
-
- Once you have enough active connections to saturate the resources,
- adding more connections just adds contention for resources and
- context-switching cost -- it does nothing to help you service more
- concurrent users. The key is, as I mentioned before, to have the
- pooler queue requests above the limit and promptly get them running
- as slots are freed.

Right, I understand that. My assertion/hope is that the saturation point
on this machine should be higher than most.

Dave
David Kerr <dmk@mr-paradox.net> wrote:

> My assertion/hope is that the saturation point on this machine should
> be higher than most.

Here's another way to think about it -- how long do you expect your
average database request to run? (Our top 20 transaction functions
average about 3 ms per execution.) What does that work out to in
transactions per second? That's the TPS you can achieve *on each
connection* if your pooler is efficient. If you've determined a connection
pool size based on hardware resources, divide your anticipated requests
per second by that pool size. If the result is less than the TPS each
connection can handle, you're in good shape. If it's higher, you may need
more hardware to satisfy the load.

Of course, the only way to really know some of these numbers is to test
your actual application on the real hardware under realistic load; but
sometimes you can get a reasonable approximation from early tests or "gut
feel" based on experience with similar applications. I strongly recommend
trying incremental changes to various configuration parameters once you
have real load, and monitoring the impact. The optimal settings are often
not what you expect.

And if the pooling isn't producing the results you expect, you should look
at its configuration, or (if you can) try other pooler products.

-Kevin
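A back-of-the-envelope version of that check, with illustrative numbers
plugged in (the 3 ms figure is the example above; the pool size and the
anticipated load are assumptions, not measurements from this system):

    /* Capacity check described above: per-connection TPS from the average
       request time, pool capacity from the pool size, compared against the
       anticipated load.  All inputs are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double avg_request_ms  = 3.0;     /* example: ~3 ms per request          */
        double pool_size       = 96.0;    /* hypothetical pool sized to hardware */
        double anticipated_tps = 5000.0;  /* made-up target load                 */

        double per_conn_tps = 1000.0 / avg_request_ms;   /* ~333 tps each      */
        double capacity_tps = pool_size * per_conn_tps;  /* ~32000 tps overall */

        printf("per-connection tps: %.0f\n", per_conn_tps);
        printf("pool capacity:      %.0f tps\n", capacity_tps);
        printf("%s\n", anticipated_tps <= capacity_tps
                       ? "anticipated load fits"
                       : "anticipated load exceeds capacity -- more hardware needed");
        return 0;
    }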
On Wed, Sep 08, 2010 at 05:27:24PM -0500, Kevin Grittner wrote:
- David Kerr <dmk@mr-paradox.net> wrote:
-
- > My assertion/hope is that the saturation point on this machine should
- > be higher than most.
-
- Here's another way to think about it -- how long do you expect your
- average database request to run? (Our top 20 transaction functions
- average about 3 ms per execution.) What does that work out to in
- transactions per second? That's the TPS you can achieve *on each
- connection* if your pooler is efficient. If you've determined a
- connection pool size based on hardware resources, divide your
- anticipated requests per second by that pool size. If the result is
- less than the TPS each connection can handle, you're in good shape.
- If it's higher, you may need more hardware to satisfy the load.
-
- Of course, the only way to really know some of these numbers is to
- test your actual application on the real hardware under realistic
- load; but sometimes you can get a reasonable approximation from
- early tests or "gut feel" based on experience with similar
- applications. I strongly recommend trying incremental changes to
- various configuration parameters once you have real load, and
- monitoring the impact. The optimal settings are often not what you
- expect.
-
- And if the pooling isn't producing the results you expect, you
- should look at its configuration, or (if you can) try other pooler
- products.
-
- -Kevin

Thanks for the insight. We're currently in performance testing of the app.
Currently, the JVM is the bottleneck; once we get past that, I'm sure it
will be the database, at which point I'll have the kind of data you're
talking about.

Dave
Excerpts from David Kerr's message of Wed Sep 08 18:29:59 -0400 2010:

> Thanks for the insight. We're currently in performance testing of the
> app. Currently, the JVM is the bottleneck; once we get past that, I'm
> sure it will be the database, at which point I'll have the kind of data
> you're talking about.

Hopefully you're not running the JVM stuff on the same machine.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Sep 09, 2010 at 10:38:16AM -0400, Alvaro Herrera wrote:
- Excerpts from David Kerr's message of Wed Sep 08 18:29:59 -0400 2010:
-
- > Thanks for the insight. We're currently in performance testing of the
- > app. Currently, the JVM is the bottleneck; once we get past that, I'm
- > sure it will be the database, at which point I'll have the kind of
- > data you're talking about.
-
- Hopefully you're not running the JVM stuff on the same machine.

Nope, this server is 100% allocated to the database.

Dave
Kevin Grittner wrote:
> Of course, the only way to really know some of these numbers is to
> test your actual application on the real hardware under realistic
> load; but sometimes you can get a reasonable approximation from
> early tests or "gut feel" based on experience with similar
> applications.

And that latter part only works if your gut is as accurate as Kevin's.
For most people, even a rough direct measurement is much more useful than
any estimate.

Anyway, Kevin's point--that ultimately you cannot really be executing more
things at once than you have CPUs--is an accurate one to remember here.
One reason to put connection pooling in front of your database is that it
cannot handle thousands of active connections at once without switching
between them very frequently, which wastes both CPU and other resources on
contention that could be avoided.

If you expect, say, 1000 simultaneous users and you have 48 CPUs, there is
only 48 ms worth of CPU time available to each user per second on average.
If you drop that to 100 users using a pooler, they'll each get 480 ms
worth of it. But no matter what, when the CPUs are busy enough to always
have a queued backlog, they will clear at best 48 * 1 second = 48000 ms of
work from that queue each second, no matter how you set up the ratios
here.

Now, imagine that the average query takes 24 ms. The two scenarios work
out like this:

Without pooler: the query takes 24 / 48 = 0.5 seconds to execute while
running in parallel with 999 other processes.

With pooler: worst case, the pooler queue is full and there are 900 users
ahead of this one, representing 21600 ms worth of work to clear before
this request becomes active. The query waits 21600 / 48000 = 0.45 seconds
to get runtime on the CPUs. Once it starts, though, it's only contending
with 99 other processes, so it gets 1/100 of the available resources:
480 ms of CPU time executes per second for this query, so it runs in 0.05
seconds at that rate. Total runtime: 0.45 + 0.05 = 0.5 seconds!

So the incoming query in this not completely contrived case (I just picked
the numbers to make the math even) takes the same amount of time to
deliver a result either way. It's just a matter of whether it spends that
time waiting for a clear slice of CPU time or fighting with a lot of other
processes the whole way.

Once the incoming connections exceed the CPUs by enough of a margin that a
pooler can expect to keep all the CPUs busy, it delivers results at the
same speed as using a larger number of connections. And since the
"without pooler" case assumes perfect slicing of time into units, it's the
unrealistic one; contention among the 1000 processes will actually make it
slower than the pooled version in the real world. You won't see anywhere
close to 48000 ms worth of work delivered per second anymore if the server
is constantly losing its CPU cache, swapping among an average of 21
connections per CPU. Whereas if it's only slightly more than 2 connections
per CPU, each CPU should alternate between its two processes easily
enough.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
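The same arithmetic, spelled out; the inputs are the assumed numbers from
the example above (48 CPUs, 24 ms of CPU per query, 1000 clients versus a
pool of 100 with 900 requests queued ahead), not measurements:

    /* Re-running the arithmetic from the example above. */
    #include <stdio.h>

    int main(void)
    {
        double cpus = 48.0, query_ms = 24.0;
        double clients = 1000.0, pool = 100.0, queued_ahead = 900.0;

        /* Without a pooler: each of the 1000 processes gets cpus/clients
           of a CPU, so 24 ms of work takes 0.024 / 0.048 = 0.5 s. */
        double unpooled_s = (query_ms / 1000.0) / (cpus / clients);

        /* With a pooler: wait for 900 queued queries (21600 ms of work,
           cleared at 48000 CPU-ms per second), then run while sharing the
           CPUs with only 99 other processes. */
        double wait_s = (queued_ahead * query_ms) / (cpus * 1000.0);
        double run_s  = (query_ms / 1000.0) / (cpus / pool);

        printf("without pooler: %.2f s\n", unpooled_s);              /* 0.50 */
        printf("with pooler:    %.2f s (%.2f wait + %.2f run)\n",
               wait_s + run_s, wait_s, run_s);            /* 0.50 = 0.45 + 0.05 */
        return 0;
    }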
Greg Smith <greg@2ndquadrant.com> wrote:

> Kevin Grittner wrote:
>> Of course, the only way to really know some of these numbers is
>> to test your actual application on the real hardware under
>> realistic load; but sometimes you can get a reasonable
>> approximation from early tests or "gut feel" based on experience
>> with similar applications.
>
> And that latter part only works if your gut is as accurate as
> Kevin's. For most people, even a rough direct measurement is much
> more useful than any estimate.

:-)

Indeed, when I talk about "'gut feel' based on experience with similar
applications" I'm thinking of something like, "When I had a query with the
same number of joins against tables of about this size, with the same
number and types of key columns, metrics showed that it took n ms and was
CPU-bound; this new CPU and RAM hardware benchmarks twice as fast, so I'll
ballpark this at 2/3 the runtime as a gut feel and follow up with
measurements as soon as practical." That may not have been entirely
clear....

> So the incoming query in this not completely contrived case (I
> just picked the numbers to make the math even) takes the same
> amount of time to deliver a result either way.

I'm gonna quibble with you here. Even if it gets done with the last
request at the same time either way (which discounts the very real
contention and context-switch costs), if you release the thundering herd
of requests all at once, they will all finish at about the same time as
that last request, while a queue allows a stream of responses throughout.
Since results start coming back almost immediately and stream through
evenly, your *average response time* is nearly cut in half with the queue.
And that's without figuring in the network congestion issues of having all
those requests complete at the same time.

In my experience you can expect the response-time benefit of reducing the
size of your connection pool to match available resources to be more
noticeable than the throughput improvements. This directly contradicts
many people's intuition, revealing the downside of "gut feel".

-Kevin
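A crude model of that average-response-time effect, using the same assumed
numbers as the earlier example (1000 requests, 48 CPUs, 24 ms of CPU each,
pool of 100); it ignores contention and network effects entirely:

    /* Crude model: herd vs. queue average response time.  Same assumed
       numbers as the earlier example; contention and network costs ignored. */
    #include <stdio.h>

    int main(void)
    {
        int    n = 1000;
        double cpus = 48.0, query_ms = 24.0, pool = 100.0;

        /* Thundering herd: all 1000 time-slice together and finish at
           ~0.5 s, so the average response time is also ~0.5 s. */
        double herd_avg = (query_ms / 1000.0) / (cpus / n);

        /* Queue: requests run in waves of 100; each wave is 100 * 24 ms of
           work cleared at 48000 CPU-ms/s = 0.05 s, and results stream back
           as each wave completes. */
        double wave_s = pool * query_ms / (cpus * 1000.0);
        double queue_avg = 0.0;
        for (int i = 0; i < n; i++)
            queue_avg += ((i / (int) pool) + 1) * wave_s;
        queue_avg /= n;

        printf("herd  average response: %.3f s\n", herd_avg);    /* ~0.500 */
        printf("queue average response: %.3f s\n", queue_avg);   /* ~0.275 */
        return 0;
    }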
Kevin Grittner wrote:
> In my experience you can expect the response-time benefit of
> reducing the size of your connection pool to match available
> resources to be more noticeable than the throughput improvements.
> This directly contradicts many people's intuition, revealing the
> downside of "gut feel".

This is why I focused on showing there won't actually be a significant
throughput reduction, because that part is the most counterintuitive, I
think. Accurately modeling the latency improvements of pooling requires
much harder math, and it depends quite a bit on whether incoming traffic
is even or comes in bursts. It's easier in many cases to just swallow
expectations and estimates and try it instead.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us