Thread: Question on pgbench output

From:
David Kerr
Date:

Hello!

Sorry for the wall of text here.

I'm working on a performance POC using pgbench, and I could use some
advice. Mostly I want to ensure that my test is valid and that I'm
using pgbench properly.

The story behind the POC is that my developers want to pull web items
from the database (not too strange); however, our environment is fairly
unusual in that the item size is between 50k and 1.5 megs, and I need
to retrieve the data in less than a second. Oh, and we're talking about
a minimum of 400 concurrent users.

My intuition tells me that this is nuts, for a number of reasons, but
to convince everyone I need to get some performance numbers.
(So right now I'm just focused on how much time it takes to pull this
record from the DB, not memory usage, http caching, contention, etc.)

What I did was create a table "temp" with "id(pk)" and "content(bytea)"
[ going to compare bytea vs. large objects in this POC as well, even
though I know that large objects are better for this ]

I loaded the table with approximately 50k items that were 1.2 megs in size.

Here is my transaction file:
\setrandom iid 1 50000
BEGIN;
SELECT content FROM test WHERE item_id = :iid;
END;

and then I executed:
pgbench -c 400 -t 50 -f trans.sql -l

trying to simulate 400 concurrent users performing 50 operations each,
which is consistent with my needs.

The results actually surprised me; the database isn't really tuned
and I'm not working on great hardware. But still I'm getting:

scaling factor: 1
number of clients: 400
number of transactions per client: 50
number of transactions actually processed: 20000/20000
tps = 51.086001 (including connections establishing)
tps = 51.395364 (excluding connections establishing)

I'm not really sure how to evaluate the tps. I've read in this forum that
some folks are getting 2k tps, so this wouldn't appear to be good to me.

However, when I look at the logfile generated:

head -5 pgbench_log.7205
0 0 15127082 0 1238784175 660088
1 0 15138079 0 1238784175 671205
2 0 15139007 0 1238784175 672180
3 0 15141097 0 1238784175 674357
4 0 15142000 0 1238784175 675345

(I wrote a script to average the total transaction time for every record
in the file)
avg_times.ksh pgbench_log.7205
Avg tx time seconds: 7
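
For reference, the gist of that script is just an average over the third
column of the log, which (if I'm reading the pgbench log format right) is
the per-transaction latency in microseconds:

awk '{ sum += $3 } END { printf "Avg tx time seconds: %d\n", sum/NR/1000000 }' \
    pgbench_log.7205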

That's not too bad; it seems like with real hardware + actually tuning
the DB I might be able to meet my requirement.

So the question is - Can anyone see a flaw in my test so far?
(considering that I'm just focused on the performance of pulling
the 1.2M record from the table) and if so any suggestions to further
nail it down?

Thanks

Dave Kerr

From:
Tom Lane
Date:

David Kerr <> writes:
> The results actually surprised me; the database isn't really tuned
> and I'm not working on great hardware. But still I'm getting:

> scaling factor: 1
> number of clients: 400
> number of transactions per client: 50
> number of transactions actually processed: 20000/20000
> tps = 51.086001 (including connections establishing)
> tps = 51.395364 (excluding connections establishing)

> I'm not really sure how to evaluate the tps. I've read in this forum that
> some folks are getting 2k tps, so this wouldn't appear to be good to me.

Well, you're running a custom transaction definition so comparing your
number to anyone else's is 100% meaningless.  All you know here is that
your individual transactions are more expensive than the default pgbench
transaction (which we could'a told you without testing...)

> (I wrote a script to average the total transaction time for every record
> in the file)
> avg_times.ksh pgbench_log.7205
> Avg tx time seconds: 7

That squares with your previous results: if you're completing 50
transactions per sec then it takes about 8 seconds to do 400 of 'em.
So any one of the clients ought to see about 8 second response time.
I think that your test is probably valid.
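(Roughly: 400 clients / ~51 tps is about 7.8 seconds, which squares with
the ~7 second average your script reported.)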

> That's not too bad; it seems like with real hardware + actually tuning
> the DB I might be able to meet my requirement.

How much more "real" is the target hardware than what you have?
You appear to need about a factor of 10 better disk throughput than
you have, and that's not going to be too cheap.  I suspect that the
thing is going to be seek-limited, and seek rate is definitely the
most expensive number to increase.

If the items you're pulling are static files, you should consider
storing them as plain files and just using the DB as an index.
Megabyte-sized fields aren't the cheapest things to push around.

            regards, tom lane

From:
David Kerr
Date:

On Fri, Apr 03, 2009 at 04:43:29PM -0400, Tom Lane wrote:
- > I'm not really sure how to evaluate the tps. I've read in this forum that
- > some folks are getting 2k tps, so this wouldn't appear to be good to me.
-
- Well, you're running a custom transaction definition so comparing your
- number to anyone else's is 100% meaningless.  All you know here is that
- your individual transactions are more expensive than the default pgbench
- transaction (which we could'a told you without testing...)

That makes sense. I guess I included it in case there was a community-defined
sense of what a good TPS for a highly responsive web app looks like
(like, if you're getting 1000 tps on your web app then your users are
happy).

But from the sounds of it, yeah, that would probably be difficult to
really measure.

- > (I wrote a script to average the total transaction time for every record
- > in the file)
- > avg_times.ksh pgbench_log.7205
- > Avg tx time seconds: 7
-
- That squares with your previous results: if you're completing 50
- transactions per sec then it takes about 8 seconds to do 400 of 'em.
- So any one of the clients ought to see about 8 second response time.
- I think that your test is probably valid.

Ok, great. Thanks!

- > That's not too bad; it seems like with real hardware + actually tuning
- > the DB I might be able to meet my requirement.
-
- How much more "real" is the target hardware than what you have?
- You appear to need about a factor of 10 better disk throughput than
- you have, and that's not going to be too cheap.  I suspect that the
- thing is going to be seek-limited, and seek rate is definitely the
- most expensive number to increase.

The hardware I'm using is a 5 or 6 year old POS IBM blade. We haven't
specced the new hardware yet, but I would say that it will be significantly
better.

- If the items you're pulling are static files, you should consider
- storing them as plain files and just using the DB as an index.
- Megabyte-sized fields aren't the cheapest things to push around.

I agree 100%, and of course the memory allocation, etc., from being able
to cache the items in httpd vs. in the DB is a major consideration.

Thanks again.

Dave Kerr

From:
Scott Marlowe
Date:

On Fri, Apr 3, 2009 at 1:53 PM, David Kerr <> wrote:
> Here is my transaction file:
> \setrandom iid 1 50000
> BEGIN;
> SELECT content FROM test WHERE item_id = :iid;
> END;
>
> and then i executed:
> pgbench -c 400 -t 50 -f trans.sql -l
>
> The results actually surprised me; the database isn't really tuned
> and I'm not working on great hardware. But still I'm getting:
>
> scaling factor: 1
> number of clients: 400
> number of transactions per client: 50
> number of transactions actually processed: 20000/20000
> tps = 51.086001 (including connections establishing)
> tps = 51.395364 (excluding connections establishing)

Not bad.  With an average record size of 1.2Meg you're reading ~60 Meg
per second (plus overhead) off of your drive(s).
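(That's just 51 tps x ~1.2 MB per transaction, i.e. about 61 MB/s.)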

> So the question is - Can anyone see a flaw in my test so far?
> (considering that I'm just focused on the performance of pulling
> the 1.2M record from the table) and if so any suggestions to further
> nail it down?

You can either get more memory (enough to hold your whole dataset in
ram), get faster drives and aggregate them with RAID-10, or look into
something like memcached servers, which can cache db queries for your
app layer.

From:
Greg Smith
Date:

On Fri, 3 Apr 2009, David Kerr wrote:

> Here is my transaction file:
> \setrandom iid 1 50000
> BEGIN;
> SELECT content FROM test WHERE item_id = :iid;
> END;

Wrapping a SELECT in a BEGIN/END block is unnecessary, and it will
significantly slow things down for two reasons: the transaction overhead
and the time pgbench spends parsing/submitting those additional lines.
Your script should be two lines long: the \setrandom one and the SELECT.
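
In other words, trans.sql would just be:

\setrandom iid 1 50000
SELECT content FROM test WHERE item_id = :iid;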

> trying to simulate 400 concurrent users performing 50 operations each,
> which is consistent with my needs.

pgbench is extremely bad at simulating large numbers of clients.  The
pgbench client operates as a single thread that handles both parsing the
input files, sending things to clients, and processing their responses.
It's very easy to end up in a situation where that bottlenecks at the
pgbench client long before getting to 400 concurrent connections.

That said, if you're in the hundreds of transactions per second range that
probably isn't biting you yet.  I've seen it more once you get around
5000+ things per second going on.

> I'm not really sure how to evaluate the tps. I've read in this forum that
> some folks are getting 2k tps, so this wouldn't appear to be good to me.

You can't compare what you're doing to what anybody else gets, because your
item size is so big.  The standard pgbench transactions all involve very
small rows.

The thing that's really missing from your comments so far is the cold vs.
hot cache issue:  at the point when you're running pgbench, is a lot of
the data already in the PostgreSQL or OS buffer cache?  If you're starting
without any data in there, 50 TPS is completely reasonable--each SELECT
could potentially be pulling both data and some number of index blocks,
and the tests I was just doing yesterday (with a single disk drive)
started at about 40TPS.  By the time the test was finished running and the
caches were all full of useful data, it was 17K TPS instead.

> (I wrote a script to average the total transaction time for every record
> in the file)

Wait until Monday, I'm announcing some pgbench tools at PG East this
weekend that will take care of all this as well as things like graphing.
It pushes all the info pgbench returns, including the latency information,
into a database and generates a big stack of derived reports.  I'd rather
see you help improve that than reinvent this particular wheel.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Tom Lane
Date:

David Kerr <> writes:
> On Fri, Apr 03, 2009 at 04:43:29PM -0400, Tom Lane wrote:
> - How much more "real" is the target hardware than what you have?
> - You appear to need about a factor of 10 better disk throughput than
> - you have, and that's not going to be too cheap.

> The hardware I'm using is a 5 or 6 year old POS IBM blade. We haven't
> specced the new hardware yet, but I would say that it will be significantly
> better.

The point I was trying to make is that it's the disk subsystem, not
the CPU, that is going to make or break you.

            regards, tom lane

From:
Tom Lane
Date:

Greg Smith <> writes:
> pgbench is extremely bad at simulating large numbers of clients.  The
> pgbench client operates as a single thread that handles both parsing the
> input files, sending things to clients, and processing their responses.
> It's very easy to end up in a situation where that bottlenecks at the
> pgbench client long before getting to 400 concurrent connections.

Yeah, good point.

> That said, if you're in the hundreds of transactions per second range that
> probably isn't biting you yet.  I've seen it more once you get around
> 5000+ things per second going on.

However, I don't think anyone else has been pgbench'ing transactions
where client-side libpq has to absorb (and then discard) a megabyte of
data per xact.  I wouldn't be surprised that that eats enough CPU to
make it an issue.  David, did you pay any attention to how busy the
pgbench process was?

Another thing that strikes me as a bit questionable is that your stated
requirements involve being able to pump 400MB/sec from the database
server to your various client machines (presumably those 400 people
aren't running their client apps directly on the DB server).  What's the
network fabric going to be, again?  Gigabit Ethernet won't cut it...
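
(Back of the envelope: 400 clients each pulling a ~1 MB item within the same
second is about 400 MB/s, while a single gigabit link tops out around
120 MB/s.)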

            regards, tom lane

From:
David Kerr
Date:

On Fri, Apr 03, 2009 at 06:52:26PM -0400, Tom Lane wrote:
- Greg Smith <> writes:
- > pgbench is extremely bad at simulating large numbers of clients.  The
- > pgbench client operates as a single thread that handles both parsing the
- > input files, sending things to clients, and processing their responses.
- > It's very easy to end up in a situation where that bottlenecks at the
- > pgbench client long before getting to 400 concurrent connections.
-
- Yeah, good point.

Hmmm, ok, I didn't realize that pgbouncer wasn't threaded.  I've got a Plan B
that doesn't use pgbouncer that I'll try.

- > That said, if you're in the hundreds of transactions per second range that
- > probably isn't biting you yet.  I've seen it more once you get around
- > 5000+ things per second going on.
-
- However, I don't think anyone else has been pgbench'ing transactions
- where client-side libpq has to absorb (and then discard) a megabyte of
- data per xact.  I wouldn't be surprised that that eats enough CPU to
- make it an issue.  David, did you pay any attention to how busy the
- pgbench process was?
I can run it again and have a look, no problem.

- Another thing that strikes me as a bit questionable is that your stated
- requirements involve being able to pump 400MB/sec from the database
- server to your various client machines (presumably those 400 people
- aren't running their client apps directly on the DB server).  What's the
- network fabric going to be, again?  Gigabit Ethernet won't cut it...

Yes, sorry, I'm not trying to be confusing, but I didn't want to bog
everyone down with a ton of details.

400 concurrent users doesn't mean that they're pulling 1.5 megs / second
every second. Just that they could potentially pull 1.5 megs at any one
second. Most likely there is a 6 (minimum) to 45 second (average) gap
between each individual user's pull. My Plan B above emulates that, but
I was using pgbouncer to try to emulate the "worst case" scenario.

- The point I was trying to make is that it's the disk subsystem, not
- the CPU, that is going to make or break you.

Makes sense, I definitely want to avoid I/Os.


On Fri, Apr 03, 2009 at 05:51:50PM -0400, Greg Smith wrote:
- Wrapping a SELECT in a BEGIN/END block is unnecessary, and it will
- significantly slow things down for two reasons: the transaction overhead
- and the time pgbench spends parsing/submitting those additional lines.
- Your script should be two lines long: the \setrandom one and the SELECT.
-

Oh, perfect, I can try that too. Thanks.

- The thing that's really missing from your comments so far is the cold
- vs. hot cache issue:  at the point when you're running pgbench, is a lot

I'm testing with a cold cache because, given the way the items are most
likely spread out, of those 400 users only a few at a time might access
similar items.

- Wait until Monday, I'm announcing some pgbench tools at PG East this
- weekend that will take care of all this as well as things like
- graphing. It pushes all the info pgbench returns, including the latency
- information, into a database and generates a big stack of derived reports.
- I'd rather see you help improve that than reinvent this particular wheel.

Ah, very cool; wish I could go (but I'm on the west coast).


Thanks again guys.

Dave Kerr


From:
David Kerr
Date:

Gah - sorry, I've been setting up pgbouncer for my Plan B, so it was on
the brain.

Where I said pgbouncer above, I meant -pgbench-.

Dave Kerr


From:
Greg Smith
Date:

On Fri, 3 Apr 2009, Tom Lane wrote:

> However, I don't think anyone else has been pgbench'ing transactions
> where client-side libpq has to absorb (and then discard) a megabyte of
> data per xact.  I wouldn't be surprised that that eats enough CPU to
> make it an issue.  David, did you pay any attention to how busy the
> pgbench process was?

I certainly haven't ever tried that.  David, the thing you want to do here
is run "top -c" when pgbench is going.  You should see the pgbench process
and a bunch of postmaster ones, with "-c" (or by hitting "c" while top is
running) you can even see what they're all doing.  If the pgbench process
is consuming close to 100% of a CPU's time, that means the results it's
giving are not valid--what you're seeing in that case are the limitations
of the testing program instead.

You can even automate collection of that with something like this:

top -b -d 10 -c -n 10000 > top.log &
TOPPID=$!
# ... run the pgbench test here ...
kill $TOPPID

That will save a snapshot every 10 seconds of what's happening during your
test.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
David Kerr
Date:

On Fri, Apr 03, 2009 at 10:35:58PM -0400, Greg Smith wrote:
- On Fri, 3 Apr 2009, Tom Lane wrote:
-
- and a bunch of postmaster ones, with "-c" (or by hitting "c" while top is
- running) you can even see what they're all doing.  If the pgbench process
- is consuming close to 100% of a CPU's time, that means the results it's
- giving are not valid--what you're seeing in that case are the limitations
- of the testing program instead.

Looks pretty good to me. Not too much mem or CPU.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4241 postgres  20   0  9936 2652 1680 S    0  0.1   0:00.02 pgbench
 4241 postgres  20   0 22948  10m 1708 R    4  0.3   0:00.46 pgbench
 4241 postgres  20   0 26628  14m 1708 R    5  0.3   0:00.96 pgbench
 4241 postgres  20   0 29160  15m 1708 R    5  0.4   0:01.44 pgbench
 4241 postgres  20   0 30888  16m 1708 R    4  0.4   0:01.86 pgbench
 4241 postgres  20   0 31624  17m 1708 R    5  0.4   0:02.34 pgbench
 4241 postgres  20   0 32552  18m 1708 R    5  0.5   0:02.82 pgbench
 4241 postgres  20   0 33160  18m 1708 R    5  0.5   0:03.28 pgbench
 4241 postgres  20   0 33608  18m 1708 R    4  0.5   0:03.70 pgbench
 4241 postgres  20   0 34056  19m 1708 R    4  0.5   0:04.08 pgbench
 4241 postgres  20   0 34056  19m 1708 R    4  0.5   0:04.52 pgbench
 4241 postgres  20   0 34376  19m 1708 R    4  0.5   0:04.98 pgbench
 4241 postgres  20   0 34536  19m 1708 R    4  0.5   0:05.42 pgbench
 4241 postgres  20   0 34536  19m 1708 R    5  0.5   0:05.88 pgbench
 4241 postgres  20   0 34664  19m 1708 R    5  0.5   0:06.34 pgbench
 4241 postgres  20   0 34664  19m 1708 R    5  0.5   0:06.82 pgbench
 4241 postgres  20   0 34664  19m 1708 R    4  0.5   0:07.26 pgbench
 4241 postgres  20   0 34664  20m 1708 R    4  0.5   0:07.72 pgbench
 4241 postgres  20   0 34664  20m 1708 R    4  0.5   0:08.12 pgbench

Dave

From:
Simon Riggs
Date:

On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
> 400 concurrent users doesn't mean that they're pulling 1.5 megs /
> second every second. Just that they could potentially pull 1.5 megs at
> any one second. Most likely there is a 6 (minimum) to 45 second
> (average) gap between each individual user's pull.

There's a world of difference between 400 connected and 400 concurrent
users. You've been testing 400 concurrent users, yet without measuring
data transfer. The think time will bring the number of users right down
again, but you really need to include the much higher than normal data
transfer into your measurements and pgbench won't help there.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


From:
Tom Lane
Date:

Simon Riggs <> writes:
> On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
>> 400 concurrent users doesn't mean that they're pulling 1.5 megs /
>> second every second.

> There's a world of difference between 400 connected and 400 concurrent
> users. You've been testing 400 concurrent users, yet without measuring
> data transfer. The think time will bring the number of users right down
> again, but you really need to include the much higher than normal data
> transfer into your measurements and pgbench won't help there.

Actually pgbench can simulate think time perfectly well: use its \sleep
command in your script.  I think you can even set it up to randomize the
sleep time.
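
For instance (an untested sketch, and the exact \sleep syntax may vary by
version), David's script could become something like:

\setrandom iid 1 50000
\setrandom think 6 45
SELECT content FROM test WHERE item_id = :iid;
\sleep :think s

which would give each simulated client a random 6-45 second pause between
pulls.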

I agree that it seems David has been measuring a case far beyond what
his real problem is.

            regards, tom lane

From:
David Kerr
Date:

Tom Lane wrote:
> Simon Riggs <> writes:
>> On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
>>> 400 concurrent users doesn't mean that they're pulling 1.5 megs /
>>> second every second.
>
>> There's a world of difference between 400 connected and 400 concurrent
>> users. You've been testing 400 concurrent users, yet without measuring
>> data transfer. The think time will bring the number of users right down
>> again, but you really need to include the much higher than normal data
>> transfer into your measurements and pgbench won't help there.
>
> Actually pgbench can simulate think time perfectly well: use its \sleep
> command in your script.  I think you can even set it up to randomize the
> sleep time.
>
> I agree that it seems David has been measuring a case far beyond what
> his real problem is.
>
>             regards, tom lane
>

Fortunately the network throughput issue is not mine to solve.

Would it be fair to say, with the pgbench output I've given so far, that if
all my users clicked "go" at the same time (i.e., the worst case scenario), I
could expect about an 8 second response time from the database?

Thanks

Dave Kerr

From:
Tom Lane
Date:

David Kerr <> writes:
> Fortunately the network throughput issue is not mine to solve.

> Would it be fair to say, with the pgbench output I've given so far, that if
> all my users clicked "go" at the same time (i.e., the worst case scenario), I
> could expect about an 8 second response time from the database?

For the hardware you've got, and ignoring the network bandwidth issue,
that appears to be a fair estimate of the worst case response.

            regards, tom lane