Re: Question on pgbench output - Mailing list pgsql-performance

From Greg Smith
Subject Re: Question on pgbench output
Date
Msg-id alpine.GSO.2.01.0904031721570.5502@westnet.com
In response to Question on pgbench output  (David Kerr <dmk@mr-paradox.net>)
Responses Re: Question on pgbench output  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, 3 Apr 2009, David Kerr wrote:

> Here is my transaction file:
> \setrandom iid 1 50000
> BEGIN;
> SELECT content FROM test WHERE item_id = :iid;
> END;

Wrapping a SELECT in a BEGIN/END block is unnecessary, and it will
slow things down significantly for two reasons:  the transaction overhead
and the time pgbench spends parsing and submitting those additional
lines.  Your script should be two lines long:  the \setrandom line and the
SELECT.
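
A minimal sketch of that two-line script, held in a small Python helper
that also assembles a matching pgbench command line.  The file name and
database name are hypothetical; the client and transaction counts come
from the numbers quoted below.  Note that \setrandom is the syntax of the
pgbench releases of this era; PostgreSQL 9.6 and later use
\set iid random(1, 50000) instead.

```python
import shlex

# The two-line script: the \setrandom line and the SELECT, no BEGIN/END.
SCRIPT = (
    "\\setrandom iid 1 50000\n"
    "SELECT content FROM test WHERE item_id = :iid;\n"
)


def pgbench_command(script_path, dbname, clients=400, transactions=50):
    """Build a pgbench argument list for a custom script.

    -n skips the vacuum of the standard pgbench tables, -c is the number
    of clients, -t the transactions per client, -f the script file.
    """
    return [
        "pgbench", "-n",
        "-c", str(clients),
        "-t", str(transactions),
        "-f", script_path,
        dbname,
    ]


# Hypothetical names: save SCRIPT as select_only.sql, database is testdb.
command_line = shlex.join(pgbench_command("select_only.sql", "testdb"))
print(command_line)
```

Save the contents of SCRIPT to the script file before running the printed
command.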

> trying to simulate 400 concurrent users performing 50 operations each
> which is consistant with my needs.

pgbench is extremely bad at simulating large numbers of clients.  The
pgbench client operates as a single thread that handles parsing the
input files, sending commands on each client connection, and processing
the responses.  It's very easy to end up in a situation where pgbench
itself becomes the bottleneck long before getting to 400 concurrent
connections.

That said, if you're in the hundreds of transactions per second range that
probably isn't biting you yet.  I've mostly seen it once you get to around
5000+ transactions per second.

> I'm not really sure how to evaulate the tps, I've read in this forum that
> some folks are getting 2k tps so this wouldn't appear to be good to me.

You can't compare what you're doing to what anybody else is doing because
your item size is so big.  The standard pgbench transactions all involve
very small rows.

The thing that's really missing from your comments so far is the cold vs.
hot cache issue:  at the point when you're running pgbench, is a lot of
the data already in the PostgreSQL or OS buffer cache?  If you're starting
without any data in there, 50 TPS is completely reasonable--each SELECT
could potentially be pulling both data and some number of index blocks
from disk, and the tests I was just doing yesterday (with a single disk
drive) started at about 40 TPS.  By the time the test was finished running
and the caches were all full of useful data, it was 17K TPS instead.
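
One way to see that cold-to-hot ramp in a run is to bucket the pgbench
per-transaction log (pgbench -l) by time.  The sketch below assumes each
log line has the six whitespace-separated fields
"client_id transaction_no time file_no time_epoch time_us"; the function
name and the 10 second bucket size are illustrative choices only.

```python
from collections import Counter


def tps_by_interval(log_lines, interval_s=10):
    """Return (seconds_from_first_bucket, tps) pairs, one per time bucket.

    Assumes field 5 (index 4) of each line is the Unix epoch second at
    which the transaction completed.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        counts[int(fields[4]) // interval_s] += 1
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Buckets with no transactions report 0.0 rather than being dropped.
    return [
        ((bucket - first) * interval_s, counts[bucket] / interval_s)
        for bucket in range(first, last + 1)
    ]
```

On a cold start the first buckets should sit near the disk-bound rate and
the later ones climb as the caches fill.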

> (I wrote a script to average the total transaction time for every record
> in the file)

Wait until Monday:  I'm announcing some pgbench tools at PG East this
weekend that will take care of all this as well as things like graphing.
They push all the info pgbench returns, including the latency information,
into a database and generate a big stack of derived reports.  I'd rather
see you help improve that than reinvent this particular wheel.
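
For reference, the averaging described in the quoted text amounts to
something like the sketch below.  It is not the tool set mentioned above;
it assumes the pgbench -l log layout
"client_id transaction_no time file_no time_epoch time_us", with the third
field being the elapsed transaction time in microseconds, and the function
name is hypothetical.

```python
def average_latency_ms(log_lines):
    """Average per-transaction latency in milliseconds, or None if empty.

    Assumes field 3 (index 2) of each line is the elapsed transaction
    time in microseconds.
    """
    total_us = 0
    transactions = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        total_us += int(fields[2])
        transactions += 1
    if transactions == 0:
        return None
    return total_us / transactions / 1000.0
```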

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD