Thread: Best way to load test a PostgreSQL server
Hi,
I’m about to set up a large instance on Amazon EC2 to be our DB server.
Before we switch to using it in production, I would like to simulate some load on it so that I know what it can handle and can make sure I have the optimal settings in the config file.
What is the best strategy out there for doing this? Does anyone know of some resource that talks about doing this?
Thanks,
Peter
Disclaimer: I'm very much a newbie here! But I am on the path in my new job to figure this stuff out as well, and went to PGCon here in Ottawa 2 weeks ago and attended quite a few lectures on this topic.

Have a look at: http://wiki.postgresql.org/wiki/PgCon_2009

In particular:
- "Database Hardware Benchmarking" by Greg Smith
- "Visualizing Postgres" by Michael Glaesmann
- "Performance Whack-a-Mole" by Josh Berkus

--
"Mother Nature doesn't do bailouts." - Glenn Prickett
Hi, "Peter Sheats" <psheats@pbpost.com> writes: > I’m about to set up a large instance on Amazon EC2 to be our DB server. > > Before we switch to using it in production I would like to simulate some load on it so that I know what it can handle and so that I can make sure I have the > optimal settings in the config file. > > What is the best strategy out there for doing this? Does anyone know of some resource that talks about doing this? I'd recommand having a look at tsung which will be able to replay a typical application scenario with as many concurrent users as you want to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php http://tsung.erlang-projects.org/ http://pgfouine.projects.postgresql.org/tsung.html If you want to replay your logs at the current production speed and concurrency, see Playr. https://area51.myyearbook.com/trac.cgi/wiki/Playr Regards, -- dim
On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> I'd recommend having a look at Tsung, which will be able to replay a
> typical application scenario with as many concurrent users as you want:
> http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
> http://tsung.erlang-projects.org/
> http://pgfouine.projects.postgresql.org/tsung.html

I am having a look at Tsung and not getting very far yet. Have you had luck with it, and do you really mean as many concurrent users as you want? I was hoping to use it to simulate my current load while tuning and making improvements. So far Tsung doesn't appear well suited to my needs:

- I use persistent connections; each Tsung session uses a new connection.
- I have multiple applications with very different usage patterns (some web-based and largely idle, some not web-based and almost saturated); Tsung has virtual users choosing a session based on a probability, with think times.
- I know many programming languages; Tsung (and its error messages) is in Erlang.

> If you want to replay your logs at the current production speed and
> concurrency, see Playr:
> https://area51.myyearbook.com/trac.cgi/wiki/Playr

Thanks for this tip. It seems worth a look.

Regards,
Ken
Hi Peter,
I was looking for the same recently, and my answer is as follows:
1. If you want to test the H/W and configuration of your DBMS then you can use the pgbench tool, which uses a specific built-in DB+schema following the TPC benchmark (see the sketch after this list).
2. If you want to load test your own specific DB then I am unaware of any such tools. I ended up using JMeter with the JDBC connector for PostgreSQL. It took me a while to get it configured and running, but I now think JMeter is excellent. I suggest you use JMeter 2.3.2; I upgraded to 2.3.3 and it seems to have a bug with its JDBC connection to Postgres.
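A minimal sketch of option 1's pgbench route (a sketch only; the database name "bench" and the numbers are arbitrary examples), using the pgbench that ships in PostgreSQL's contrib:

  createdb bench
  pgbench -i -s 10 bench
  pgbench -c 10 -t 1000 bench

The -i run populates the built-in schema at scale factor 10 (roughly a million rows in the accounts table); the second run drives it with 10 concurrent clients for 1000 transactions each and reports the resulting transactions per second.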
-- Shaul
"Kenneth Cox" <kenstir@gmail.com> writes: > On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine > <dfontaine@hi-media.com> wrote: >> I'd recommand having a look at tsung which will be able to replay a >> typical application scenario with as many concurrent users as you want >> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php >> http://tsung.erlang-projects.org/ >> http://pgfouine.projects.postgresql.org/tsung.html > > I am having a look at tsung and not getting very far yet. Have you had luck > with it and do you really mean as many concurrent users as you want? Last time I used it it was in the context of a web application and to compare PostgreSQL against Informix after a migration. So I used the HTTP protocol support of the injector. Tsung is based on erlang and can be run from more than one node at any time, last time I checked you could run 600 to 800 concurrent clients from each node. Recent versions of erlang allow a much greater number per node, one or two orders of magnitude greater, as I've been told by Tsung's main developer. > I was > hoping to use it to simulate my current load while tuning and making > improvements. So far tsung doesn't appear well suited to my needs. I use > persistent connections; each tsung session uses a new connection. I have > multiple applications that have very usage patterns (some web and largely > idle, some non web and almost saturated); tsung has virtual users choosing > a session based on a probability with think times. I know many programming > languages; tsung (and its error messages) is in erlang. Tsung can be setup as an http or postgresql proxy: in this mode it'll prepare session files for you while you use your application as usual. The thinktime it sees will then get randomized at run time to better reflect real usage. You can define several user arrival phases to see what happens when the load raises then get back to normal traffic. Lots of options, really. Tsung generates statistics and comes with tools to analyze them and provide graphs organized into a web page, one of those tools allow to draw graphs from different simulations onto the same chart, with the same scaling, in order to easily compare results. It seems to me tsung is a good tool for your use case. Regards, -- dim
I considered Tsung myself but haven't tried it. If you intend to, I suggest you read this excellent tutorial on using Tsung for load-testing PostgreSQL. While impressed, I decided the procedure was too daunting and went with JMeter :-) It too can run tests from multiple clients, has built-in tables and graphs, and can save results as CSV or XML etc. In particular I recommend adding the extension "listener" (JMeter term for anything that captures and portrays test results) called Statistical Aggregate Report.
May the force be with you,
-- Shaul
On Wed, 03 Jun 2009 05:29:02 -0400, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> Last time I used it, it was in the context of a web application, to
> compare PostgreSQL against Informix after a migration, so I used the
> HTTP protocol support of the injector.

Tsung seems well suited for that.

> Tsung is based on Erlang ... you could run 600 to 800 concurrent clients
> from each node.

But each Tsung session (virtual user) uses a separate PG connection, and I need 30k virtual users. I can't imagine 30k PG connections. I could imagine using pgbouncer in statement pooling mode, but that doesn't characterize my load well, where different PG connections have different profiles. I have about 500 connections:

- ~450 from web servers, often idle, various work loads, no prepared statements
- 50 from another client, mostly idle, small set of prepared statements
- 10 from another client, extremely active, small set of prepared statements

I know a Tsung session doesn't have to exactly mimic a user, and I tried to coerce a Tsung session to represent a DB client instead, with loops and multiple CSV files. I wasn't so successful there, and was nagged by the assignment of sessions by probability, when I wanted a fixed number running each session.

I do appreciate the suggestions, and I agree Tsung has lots of nifty features. I used pgfouine to generate Tsung sessions, and I love the graph generation, but for me it comes down to simulating my DB load so that I can profile and tune the DB, and I am not seeing how to get Tsung to fit my case. Next up I will try JMeter (thanks, Shaul Dar, for the suggestion).

Regards, Ken
On Tue, 2 Jun 2009, Shaul Dar wrote:
> If you want to test the H/W and configuration of your DBMS then you can
> use the pgbench tool (which uses a specific built-in DB+schema,
> following the TPC benchmark).

There are a lot of TPC benchmarks. pgbench simulates TPC-B (badly), which is a benchmark from 1990. It's not at all representative of the current TPC benchmarks.

> If you want to load test your own specific DB then I am unaware of any
> such tools.

pgbench will run against any schema and queries; the built-in set are just the easiest to use. I just released a bunch of slides and a package I named pgbench-tools that show some of the possibilities here; links to everything are at:
http://notemagnet.blogspot.com/2009/05/bottom-up-postgresql-benchmarking-and.html

I'd mentioned working on this before on this list, but the code just got stable enough to release recently. Anybody who is running lots of pgbench tests at different database sizes and client loads might benefit from using my toolset to automate running the tests and reporting on the results.

The last few slides of my pgbench presentation show how you might write a custom test that measures how fast rows of various sizes can be inserted into your database at various client counts.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
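As a rough illustration of that kind of custom insert test (a sketch, not Greg's actual script; the table, file name, and numbers are invented), a pgbench script file mixes pgbench meta-commands with plain SQL. Assuming a throwaway table created first with CREATE TABLE ins_test (payload text), a file insert_test.sql could contain:

  \setrandom size 10 1000
  INSERT INTO ins_test (payload) VALUES (repeat('x', :size));

Each transaction then inserts one row whose payload varies between 10 and 1000 characters; run it at various client counts with something like pgbench -n -f insert_test.sql -c 8 -t 10000 mydb and compare the reported TPS.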
On Thu, Jun 4, 2009 at 2:01 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> pgbench will run against any schema and queries; the built-in set are just
> the easiest to use.

Greg,

Have you actually run pgbench against your own schema? Can you point me to an example? I also had the same impression reading the documentation. But when I tried it with the proper flags to use my own DB and query file, I got an error that it couldn't find one of the tables mentioned in the built-in test! I concluded that I cannot use any schema; I could only supply my own DB, but with the same set of tables pgbench expects. Maybe I missed something or made a mistake?

Thanks,
-- Shaul
Shaul Dar <shauldar@gmail.com> writes:
> Have you actually run pgbench against your own schema? Can you point me to
> an example? ... when I tried it with the proper flags to use my own DB and
> query file I got an error that it couldn't find one of the tables mentioned
> in the built-in test! I concluded that I cannot use any schema,

No, you just need to read the documentation. There's a switch that prevents the default action of trying to vacuum the "standard" tables. I think -N, but too lazy to look ...

regards, tom lane
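For what it's worth, a minimal sketch of that usage (the file, table, and numbers below are invented for illustration; in the pgbench versions I've used, the switch that skips vacuuming the standard tables is lowercase -n, so check your version's --help). A custom script file, say my_queries.sql:

  \setrandom id 1 100000
  SELECT * FROM my_table WHERE id = :id;

Run it against your own database with:

  pgbench -n -f my_queries.sql -c 10 -t 1000 mydb

pgbench substitutes :id with a fresh random value per transaction and reports the transaction rate at the end.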
Technically you can then use pgbench on that set of statements, but I usually just use Perl's "Benchmark" module... (I'm sure Ruby or Java or whatever has a similar tool.)

(First, I log statements by loading the application or web server with statement logging turned on, so I'm not "guessing" what SQL will be called. Usually doing this exposes a flotilla of inefficiencies in the code...)

On Tue, Jun 9, 2009 at 9:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shaul Dar <shauldar@gmail.com> writes:
>> Have you actually run pgbench against your own schema? Can you point me to
>> an example? ... I concluded that I cannot use any schema,
>
> No, you just need to read the documentation. There's a switch that
> prevents the default action of trying to vacuum the "standard" tables.
> I think -N, but too lazy to look ...
Suppose I have a large table with a small-cardinality CATEGORY column (say, categories 1..5). I need to sort by an arbitrary (i.e. user-specified) mapping of CATEGORY, something like this:

  1 => 'z'
  2 => 'a'
  3 => 'b'
  4 => 'w'
  5 => 'h'

So when I get done, the sort order should be 2,3,5,4,1.

I could create a temporary table with the category-to-key mapping, but is there any way to do this in a single SQL statement?

Thanks, Craig
On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_james@emolecules.com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say,
> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
> mapping of CATEGORY ...
> I could create a temporary table with the category-to-key mapping, but is
> there any way to do this in a single SQL statement?

You can create a translation table, join it, and sort by its key.

-- GJ
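A sketch of that translation-table approach, using the mapping from the original post (the table and column names here are invented; the large table is called my_table for illustration):

  CREATE TEMP TABLE category_order (category int PRIMARY KEY, sort_key text);
  INSERT INTO category_order VALUES (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h');

  SELECT t.*
    FROM my_table t
    JOIN category_order o USING (category)
   ORDER BY o.sort_key;   -- 'a','b','h','w','z' yields categories 2, 3, 5, 4, 1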
Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
> On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_james@emolecules.com> wrote:
>> Suppose I have a large table with a small-cardinality CATEGORY column (say,
>> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
>> mapping of CATEGORY, something like this:
> you can create a translation table, join it, and sort by its key.

Much easier to

  ORDER BY CASE category WHEN 2 THEN 1 WHEN 3 THEN 2 ... END

Actually, consider putting the CASE into a function and doing

  ORDER BY sort_order(category)

regards, tom lane
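A minimal sketch of that function approach for the mapping in the original post (sort_order is the name from Tom's suggestion; my_table is an invented table name):

  CREATE FUNCTION sort_order(category int) RETURNS int AS $$
      SELECT CASE $1
                 WHEN 2 THEN 1  -- 'a'
                 WHEN 3 THEN 2  -- 'b'
                 WHEN 5 THEN 3  -- 'h'
                 WHEN 4 THEN 4  -- 'w'
                 WHEN 1 THEN 5  -- 'z'
             END;
  $$ LANGUAGE sql IMMUTABLE;

  SELECT * FROM my_table ORDER BY sort_order(category);

Changing the user-specified ordering then means replacing just this one function (CREATE OR REPLACE FUNCTION) rather than editing every query.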
Craig James <craig_james@emolecules.com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY
> column (say, categories 1..5). I need to sort by an arbitrary
> (i.e. user-specified) mapping of CATEGORY

There was a recent thread discussing ways to do that:
http://archives.postgresql.org/pgsql-admin/2009-07/msg00016.php

-Kevin
On Thu, Jul 9, 2009 at 6:26 PM, Craig James <craig_james@emolecules.com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say,
> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
> mapping of CATEGORY ...
> So when I get done, the sort order should be 2,3,5,4,1.

If the object is to avoid a separate table, you can do it with a "case" statement:

  select ... from ...
  order by case category
               when 1 then 'z'
               when 2 then 'a'
               when 3 then 'b'
               when 4 then 'w'
               when 5 then 'h'
           end

If this sounds slow, you're right. But it might perform well enough for your use case.

A.
On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column
> (say, categories 1..5). I need to sort by an arbitrary (i.e.
> user-specified) mapping of CATEGORY ...
> I could create a temporary table with the category-to-key mapping, but
> is there any way to do this in a single SQL statement?

You can do it like this:

  select c.*
    from categories c
    join ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering)
      on c.id = o.id
   order by o.ordering;

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
> On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
> You can do it like this:
> select c.*
>   from categories c
>   join ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering)
>     on c.id = o.id
>  order by o.ordering;

Another option would be:

  select c.*
    from categories c
   order by case c.category
                when 1 then 'z'
                when 2 then 'a'
                when 3 then 'b'
                when 4 then 'w'
                when 5 then 'h'
            end;

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
2009/7/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
>> you can create a translation table, join it, and sort by its key.
>
> Much easier to
> ORDER BY CASE category WHEN 2 THEN 1 WHEN 3 THEN 2 ... END
>
> Actually, consider putting the CASE into a function and doing
> ORDER BY sort_order(category)

I suppose a table is handy when you have a lot of items as keys...

-- GJ