Thread: Question on pgbench output
Hello! Sorry for the wall of text here.

I'm working on a performance POC using pgbench and could use some advice.
Mostly I want to ensure that my test is valid and that I'm using pgbench
properly.

The story behind the POC is that my developers want to pull web items from
the database (not too strange), however our environment is fairly unique in
that the item size is between 50k and 1.5 megs and I need to retrieve the
data in less than a second. Oh, and we're talking about a minimum of 400
concurrent users.

My intuition tells me that this is nuts, for a number of reasons, but to
convince everyone I need to get some performance numbers. (So right now I'm
just focused on how much time it takes to pull this record from the DB, not
memory usage, http caching, contention, etc.)

What I did was create a table "temp" with "id (pk)" and "content (bytea)"
[going to compare bytea vs. large objects in this POC as well, even though I
know that large objects are better for this]. I loaded the table with
approximately 50k items that were 1.2 megs in size.

Here is my transaction file:

\setrandom iid 1 50000
BEGIN;
SELECT content FROM test WHERE item_id = :iid;
END;

and then I executed:

pgbench -c 400 -t 50 -f trans.sql -l

trying to simulate 400 concurrent users performing 50 operations each, which
is consistent with my needs.

The results actually have surprised me; the database isn't really tuned and
I'm not working on great hardware. But still I'm getting:

scaling factor: 1
number of clients: 400
number of transactions per client: 50
number of transactions actually processed: 20000/20000
tps = 51.086001 (including connections establishing)
tps = 51.395364 (excluding connections establishing)

I'm not really sure how to evaluate the tps. I've read in this forum that
some folks are getting 2k tps, so this wouldn't appear to be good to me.

However, when I look at the logfile generated:

head -5 pgbench_log.7205
0 0 15127082 0 1238784175 660088
1 0 15138079 0 1238784175 671205
2 0 15139007 0 1238784175 672180
3 0 15141097 0 1238784175 674357
4 0 15142000 0 1238784175 675345

(I wrote a script to average the total transaction time for every record in
the file)

avg_times.ksh pgbench_log.7205
Avg tx time seconds: 7

That's not too bad. It seems like with real hardware + actually tuning the
DB, I might be able to meet my requirement.

So the question is - can anyone see a flaw in my test so far? (Considering
that I'm just focused on the performance of pulling the 1.2M record from the
table.) And if so, any suggestions to further nail it down?

Thanks

Dave Kerr
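(The averaging script itself never appears in the thread; a minimal
equivalent, assuming the third column of pgbench's per-transaction log is the
elapsed time in microseconds, would look something like this:)

    #!/bin/ksh
    # avg_times.ksh -- hypothetical sketch, not the original script.
    # Averages column 3 of a "pgbench -l" logfile (per-transaction latency
    # in microseconds) and reports the result in whole seconds.
    awk '{ sum += $3; n++ }
         END { if (n > 0) printf "Avg tx time seconds: %d\n", sum / n / 1000000 }' "$1"

(Usage would be as shown above: avg_times.ksh pgbench_log.7205.)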
David Kerr <dmk@mr-paradox.net> writes:
> The results actually have surprised me; the database isn't really tuned
> and I'm not working on great hardware. But still I'm getting:

> scaling factor: 1
> number of clients: 400
> number of transactions per client: 50
> number of transactions actually processed: 20000/20000
> tps = 51.086001 (including connections establishing)
> tps = 51.395364 (excluding connections establishing)

> I'm not really sure how to evaluate the tps. I've read in this forum that
> some folks are getting 2k tps, so this wouldn't appear to be good to me.

Well, you're running a custom transaction definition, so comparing your
number to anyone else's is 100% meaningless. All you know here is that
your individual transactions are more expensive than the default pgbench
transaction (which we could'a told you without testing...)

> (I wrote a script to average the total transaction time for every record
> in the file)
> avg_times.ksh pgbench_log.7205
> Avg tx time seconds: 7

That squares with your previous results: if you're completing 50
transactions per sec, then it takes about 8 seconds to do 400 of 'em.
So any one of the clients ought to see about 8 second response time.
I think that your test is probably valid.

> That's not too bad. It seems like with real hardware + actually tuning
> the DB, I might be able to meet my requirement.

How much more "real" is the target hardware than what you have?
You appear to need about a factor of 10 better disk throughput than
you have, and that's not going to be too cheap. I suspect that the
thing is going to be seek-limited, and seek rate is definitely the
most expensive number to increase.

If the items you're pulling are static files, you should consider
storing them as plain files and just using the DB as an index.
Megabyte-sized fields aren't the cheapest things to push around.

			regards, tom lane
On Fri, Apr 03, 2009 at 04:43:29PM -0400, Tom Lane wrote:
- > I'm not really sure how to evaluate the tps. I've read in this forum that
- > some folks are getting 2k tps, so this wouldn't appear to be good to me.
-
- Well, you're running a custom transaction definition, so comparing your
- number to anyone else's is 100% meaningless. All you know here is that
- your individual transactions are more expensive than the default pgbench
- transaction (which we could'a told you without testing...)

That makes sense. I guess I included it in case there was a community-defined
sense of what a good TPS is for a highly responsive web app (like, if you're
getting 1000 tps on your web app, then your users are happy). But from the
sounds of it, yeah, that would probably be difficult to really measure.

- > (I wrote a script to average the total transaction time for every record
- > in the file)
- > avg_times.ksh pgbench_log.7205
- > Avg tx time seconds: 7
-
- That squares with your previous results: if you're completing 50
- transactions per sec, then it takes about 8 seconds to do 400 of 'em.
- So any one of the clients ought to see about 8 second response time.
- I think that your test is probably valid.

Ok, great. Thanks!

- > That's not too bad. It seems like with real hardware + actually tuning
- > the DB, I might be able to meet my requirement.
-
- How much more "real" is the target hardware than what you have?
- You appear to need about a factor of 10 better disk throughput than
- you have, and that's not going to be too cheap. I suspect that the
- thing is going to be seek-limited, and seek rate is definitely the
- most expensive number to increase.

The hardware I'm using is a 5 or 6 year old POS IBM Blade. We haven't
specced the new hardware yet, but I would say that it will be significantly
better.

- If the items you're pulling are static files, you should consider
- storing them as plain files and just using the DB as an index.
- Megabyte-sized fields aren't the cheapest things to push around.

I agree 100%, and of course the memory allocation, etc. from being able to
cache the items in httpd vs. in the DB is a major consideration.

Thanks again.

Dave Kerr
On Fri, Apr 3, 2009 at 1:53 PM, David Kerr <dmk@mr-paradox.net> wrote:
> Here is my transaction file:
>
> \setrandom iid 1 50000
> BEGIN;
> SELECT content FROM test WHERE item_id = :iid;
> END;
>
> and then I executed:
>
> pgbench -c 400 -t 50 -f trans.sql -l
>
> The results actually have surprised me; the database isn't really tuned
> and I'm not working on great hardware. But still I'm getting:
>
> scaling factor: 1
> number of clients: 400
> number of transactions per client: 50
> number of transactions actually processed: 20000/20000
> tps = 51.086001 (including connections establishing)
> tps = 51.395364 (excluding connections establishing)

Not bad. With an average record size of 1.2 Meg, you're reading ~60 Meg per
second (plus overhead) off of your drive(s).

> So the question is - can anyone see a flaw in my test so far?
> (Considering that I'm just focused on the performance of pulling
> the 1.2M record from the table.) And if so, any suggestions to further
> nail it down?

You can either get more memory (enough to hold your whole dataset in RAM),
get faster drives and aggregate them with RAID-10, or look into something
like memcached servers, which can cache db queries for your app layer.
On Fri, 3 Apr 2009, David Kerr wrote:

> Here is my transaction file:
>
> \setrandom iid 1 50000
> BEGIN;
> SELECT content FROM test WHERE item_id = :iid;
> END;

Wrapping a SELECT in a BEGIN/END block is unnecessary, and it will
significantly slow things down for two reasons: the transaction overhead
and the time pgbench spends parsing/submitting those additional lines.
Your script should be two lines long: the \setrandom one and the SELECT.

> trying to simulate 400 concurrent users performing 50 operations each,
> which is consistent with my needs.

pgbench is extremely bad at simulating large numbers of clients. The
pgbench client operates as a single thread that handles parsing the
input files, sending things to clients, and processing their responses.
It's very easy to end up in a situation where that bottlenecks at the
pgbench client long before getting to 400 concurrent connections.

That said, if you're in the hundreds of transactions per second range,
that probably isn't biting you yet. I've seen it more once you get
around 5000+ things per second going on.

> I'm not really sure how to evaluate the tps. I've read in this forum that
> some folks are getting 2k tps, so this wouldn't appear to be good to me.

You can't compare what you're doing to what anybody else is doing, because
your item size is so big. The standard pgbench transactions all involve
very small rows.

The thing that's really missing from your comments so far is the cold
vs. hot cache issue: at the point when you're running pgbench, is a lot
of the data already in the PostgreSQL or OS buffer cache? If you're
starting without any data in there, 50 TPS is completely reasonable--each
SELECT could potentially be pulling both data and some number of index
blocks, and the tests I was just doing yesterday (with a single disk
drive) started at about 40 TPS. By the time the test was finished running
and the caches were all full of useful data, it was 17K TPS instead.

> (I wrote a script to average the total transaction time for every record
> in the file)

Wait until Monday; I'm announcing some pgbench tools at PG East this
weekend that will take care of all this as well as things like graphing.
It pushes all the info pgbench returns, including the latency information,
into a database and generates a big stack of derived reports. I'd rather
see you help improve that than reinvent this particular wheel.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
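(Following that suggestion, the trimmed transaction file would be just the two
lines already posted, minus the BEGIN/END, invoked with the same options as
before; a sketch, not re-run here:)

    $ cat trans.sql
    \setrandom iid 1 50000
    SELECT content FROM test WHERE item_id = :iid;

    $ pgbench -c 400 -t 50 -f trans.sql -l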
David Kerr <dmk@mr-paradox.net> writes:
> On Fri, Apr 03, 2009 at 04:43:29PM -0400, Tom Lane wrote:
> - How much more "real" is the target hardware than what you have?
> - You appear to need about a factor of 10 better disk throughput than
> - you have, and that's not going to be too cheap.

> The hardware I'm using is a 5 or 6 year old POS IBM Blade. We haven't
> specced the new hardware yet, but I would say that it will be significantly
> better.

The point I was trying to make is that it's the disk subsystem, not
the CPU, that is going to make or break you.

			regards, tom lane
Greg Smith <gsmith@gregsmith.com> writes:
> pgbench is extremely bad at simulating large numbers of clients. The
> pgbench client operates as a single thread that handles parsing the
> input files, sending things to clients, and processing their responses.
> It's very easy to end up in a situation where that bottlenecks at the
> pgbench client long before getting to 400 concurrent connections.

Yeah, good point.

> That said, if you're in the hundreds of transactions per second range,
> that probably isn't biting you yet. I've seen it more once you get
> around 5000+ things per second going on.

However, I don't think anyone else has been pgbench'ing transactions
where client-side libpq has to absorb (and then discard) a megabyte of
data per xact. I wouldn't be surprised that that eats enough CPU to
make it an issue. David, did you pay any attention to how busy the
pgbench process was?

Another thing that strikes me as a bit questionable is that your stated
requirements involve being able to pump 400MB/sec from the database
server to your various client machines (presumably those 400 people
aren't running their client apps directly on the DB server). What's the
network fabric going to be, again? Gigabit Ethernet won't cut it...

			regards, tom lane
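(Ballpark arithmetic behind that remark: 400 clients each pulling a ~1.2 MB
item within the same second is roughly 480 MB/s of payload, while gigabit
Ethernet delivers on the order of 110-120 MB/s. These are rough figures for
the worst case, not measurements from this test.)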
On Fri, Apr 03, 2009 at 06:52:26PM -0400, Tom Lane wrote:
- Greg Smith <gsmith@gregsmith.com> writes:
- > pgbench is extremely bad at simulating large numbers of clients. The
- > pgbench client operates as a single thread that handles parsing the
- > input files, sending things to clients, and processing their responses.
- > It's very easy to end up in a situation where that bottlenecks at the
- > pgbench client long before getting to 400 concurrent connections.
-
- Yeah, good point.

Hmmm, ok, I didn't realize that pgbouncer wasn't threaded. I've got a Plan B
that doesn't use pgbouncer that I'll try.

- > That said, if you're in the hundreds of transactions per second range,
- > that probably isn't biting you yet. I've seen it more once you get
- > around 5000+ things per second going on.
-
- However, I don't think anyone else has been pgbench'ing transactions
- where client-side libpq has to absorb (and then discard) a megabyte of
- data per xact. I wouldn't be surprised that that eats enough CPU to
- make it an issue. David, did you pay any attention to how busy the
- pgbench process was?

I can run it again and have a look, no problem.

- Another thing that strikes me as a bit questionable is that your stated
- requirements involve being able to pump 400MB/sec from the database
- server to your various client machines (presumably those 400 people
- aren't running their client apps directly on the DB server). What's the
- network fabric going to be, again? Gigabit Ethernet won't cut it...

Yes, sorry, I'm not trying to be confusing, but I didn't want to bog
everyone down with a ton of details.

400 concurrent users doesn't mean that they're pulling 1.5 megs/second
every second. Just that they could potentially pull 1.5 megs at any one
second. Most likely there is a 6 (minimum) to 45 second (average) gap
between each individual user's pull. My Plan B above emulates that, but
I was using pgbouncer to try to emulate a "worst case" scenario.

- The point I was trying to make is that it's the disk subsystem, not
- the CPU, that is going to make or break you.

Makes sense. I definitely want to avoid I/Os.

On Fri, Apr 03, 2009 at 05:51:50PM -0400, Greg Smith wrote:
- Wrapping a SELECT in a BEGIN/END block is unnecessary, and it will
- significantly slow things down for two reasons: the transaction overhead
- and the time pgbench spends parsing/submitting those additional lines.
- Your script should be two lines long: the \setrandom one and the SELECT.

Oh, perfect, I can try that too. Thanks.

- The thing that's really missing from your comments so far is the cold
- vs. hot cache issue: at the point when you're running pgbench, is a lot

I'm testing with a cold cache because, most likely, with the way the items
are spread out, of those 400 users only a few at a time might access
similar items.

- Wait until Monday; I'm announcing some pgbench tools at PG East this
- weekend that will take care of all this as well as things like graphing.
- It pushes all the info pgbench returns, including the latency information,
- into a database and generates a big stack of derived reports. I'd rather
- see you help improve that than reinvent this particular wheel.

Ah, very cool. Wish I could go (but I'm on the west coast).

Thanks again guys.

Dave Kerr
Gah - sorry, I've been setting up pgbouncer for my Plan B. I meant -pgbench-.

Dave Kerr

On Fri, Apr 03, 2009 at 04:34:58PM -0700, David Kerr wrote:
- On Fri, Apr 03, 2009 at 06:52:26PM -0400, Tom Lane wrote:
- - Greg Smith <gsmith@gregsmith.com> writes:
- - > pgbench is extremely bad at simulating large numbers of clients. The
- - > pgbench client operates as a single thread that handles parsing the
- - > input files, sending things to clients, and processing their responses.
- - > It's very easy to end up in a situation where that bottlenecks at the
- - > pgbench client long before getting to 400 concurrent connections.
- -
- - Yeah, good point.
-
- Hmmm, ok, I didn't realize that pgbouncer wasn't threaded. I've got a
- Plan B that doesn't use pgbouncer that I'll try.
On Fri, 3 Apr 2009, Tom Lane wrote:

> However, I don't think anyone else has been pgbench'ing transactions
> where client-side libpq has to absorb (and then discard) a megabyte of
> data per xact. I wouldn't be surprised that that eats enough CPU to
> make it an issue. David, did you pay any attention to how busy the
> pgbench process was?

I certainly haven't ever tried that. David, the thing you want to do here
is run "top -c" when pgbench is going. You should see the pgbench process
and a bunch of postmaster ones, with "-c" (or by hitting "c" while top is
running) you can even see what they're all doing. If the pgbench process
is consuming close to 100% of a CPU's time, that means the results it's
giving are not valid--what you're seeing in that case are the limitations
of the testing program instead.

You can even automate collection of that with something like this:

top -b -d 10 -c -n 10000 > top.log &
TOPPID=$!
(run test)
kill $TOPPID

That will save a snapshot every 10 seconds of what's happening during
your test.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, Apr 03, 2009 at 10:35:58PM -0400, Greg Smith wrote:
- On Fri, 3 Apr 2009, Tom Lane wrote:
-
- and a bunch of postmaster ones, with "-c" (or by hitting "c" while top is
- running) you can even see what they're all doing. If the pgbench process
- is consuming close to 100% of a CPU's time, that means the results it's
- giving are not valid--what you're seeing in that case are the limitations
- of the testing program instead.

Looks pretty good to me. Not too much mem or CPU.

 PID  USER     PR NI  VIRT  RES SHR  S %CPU %MEM   TIME+ COMMAND
4241 postgres 20  0  9936 2652 1680 S    0  0.1 0:00.02 pgbench
4241 postgres 20  0 22948  10m 1708 R    4  0.3 0:00.46 pgbench
4241 postgres 20  0 26628  14m 1708 R    5  0.3 0:00.96 pgbench
4241 postgres 20  0 29160  15m 1708 R    5  0.4 0:01.44 pgbench
4241 postgres 20  0 30888  16m 1708 R    4  0.4 0:01.86 pgbench
4241 postgres 20  0 31624  17m 1708 R    5  0.4 0:02.34 pgbench
4241 postgres 20  0 32552  18m 1708 R    5  0.5 0:02.82 pgbench
4241 postgres 20  0 33160  18m 1708 R    5  0.5 0:03.28 pgbench
4241 postgres 20  0 33608  18m 1708 R    4  0.5 0:03.70 pgbench
4241 postgres 20  0 34056  19m 1708 R    4  0.5 0:04.08 pgbench
4241 postgres 20  0 34056  19m 1708 R    4  0.5 0:04.52 pgbench
4241 postgres 20  0 34376  19m 1708 R    4  0.5 0:04.98 pgbench
4241 postgres 20  0 34536  19m 1708 R    4  0.5 0:05.42 pgbench
4241 postgres 20  0 34536  19m 1708 R    5  0.5 0:05.88 pgbench
4241 postgres 20  0 34664  19m 1708 R    5  0.5 0:06.34 pgbench
4241 postgres 20  0 34664  19m 1708 R    5  0.5 0:06.82 pgbench
4241 postgres 20  0 34664  19m 1708 R    4  0.5 0:07.26 pgbench
4241 postgres 20  0 34664  20m 1708 R    4  0.5 0:07.72 pgbench
4241 postgres 20  0 34664  20m 1708 R    4  0.5 0:08.12 pgbench

Dave
On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:

> 400 concurrent users doesn't mean that they're pulling 1.5 megs/second
> every second. Just that they could potentially pull 1.5 megs at any one
> second. Most likely there is a 6 (minimum) to 45 second (average) gap
> between each individual user's pull.

There's a world of difference between 400 connected and 400 concurrent
users. You've been testing 400 concurrent users, yet without measuring
data transfer. The think time will bring the number of users right down
again, but you really need to include the much higher than normal data
transfer into your measurements, and pgbench won't help there.

--
Simon Riggs  www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
>> 400 concurrent users doesn't mean that they're pulling 1.5 megs/second
>> every second.

> There's a world of difference between 400 connected and 400 concurrent
> users. You've been testing 400 concurrent users, yet without measuring
> data transfer. The think time will bring the number of users right down
> again, but you really need to include the much higher than normal data
> transfer into your measurements, and pgbench won't help there.

Actually pgbench can simulate think time perfectly well: use its \sleep
command in your script. I think you can even set it up to randomize the
sleep time.

I agree that it seems David has been measuring a case far beyond what
his real problem is.

			regards, tom lane
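(A sketch of what that could look like, modeling the 6 to 45 second think
time mentioned earlier; this assumes \sleep accepts a variable set by
\setrandom plus a seconds unit, which is how the pgbench documentation
describes it, but it has not been run here:)

    \setrandom iid 1 50000
    \setrandom think 6 45
    SELECT content FROM test WHERE item_id = :iid;
    \sleep :think s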
Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
>>> 400 concurrent users doesn't mean that they're pulling 1.5 megs/second
>>> every second.
>
>> There's a world of difference between 400 connected and 400 concurrent
>> users. You've been testing 400 concurrent users, yet without measuring
>> data transfer. The think time will bring the number of users right down
>> again, but you really need to include the much higher than normal data
>> transfer into your measurements, and pgbench won't help there.
>
> Actually pgbench can simulate think time perfectly well: use its \sleep
> command in your script. I think you can even set it up to randomize the
> sleep time.
>
> I agree that it seems David has been measuring a case far beyond what
> his real problem is.
>
> 			regards, tom lane

Fortunately the network throughput issue is not mine to solve.

Would it be fair to say, with the pgbench output I've given so far, that if
all my users clicked "go" at the same time (i.e., worst case scenario), I
could expect (from the database) about 8 second response time?

Thanks

Dave Kerr
David Kerr <dmk@mr-paradox.net> writes:
> Fortunately the network throughput issue is not mine to solve.

> Would it be fair to say, with the pgbench output I've given so far, that if
> all my users clicked "go" at the same time (i.e., worst case scenario), I
> could expect (from the database) about 8 second response time?

For the hardware you've got, and ignoring the network bandwidth issue,
that appears to be a fair estimate of the worst case response.

			regards, tom lane