Thread: Best way to load test a PostgreSQL server
Hi,
I’m about to set up a large instance on Amazon EC2 to be our DB server.
Before we switch to using it in production, I would like to simulate some load on it so that I know what it can handle and can make sure I have the optimal settings in the config file.
What is the best strategy out there for doing this? Does anyone know of some resource that talks about doing this?
Thanks,
Peter
Disclaimer: I'm very much a newbie here! But I am on the path in my new job to figure this stuff out as well, and went to PGCon here in Ottawa 2 weeks ago and attended quite a few lectures on this topic.

Have a look at: http://wiki.postgresql.org/wiki/PgCon_2009

In particular:
- "Database Hardware Benchmarking" by Greg Smith
- "Visualizing Postgres" by Michael Glaesmann
- "Performance Whack-a-Mole" by Josh Berkus

--
"Mother Nature doesn't do bailouts." - Glenn Prickett
Hi, "Peter Sheats" <psheats@pbpost.com> writes: > I’m about to set up a large instance on Amazon EC2 to be our DB server. > > Before we switch to using it in production I would like to simulate some load on it so that I know what it can handle and so that I can make sure I have the > optimal settings in the config file. > > What is the best strategy out there for doing this? Does anyone know of some resource that talks about doing this? I'd recommand having a look at tsung which will be able to replay a typical application scenario with as many concurrent users as you want to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php http://tsung.erlang-projects.org/ http://pgfouine.projects.postgresql.org/tsung.html If you want to replay your logs at the current production speed and concurrency, see Playr. https://area51.myyearbook.com/trac.cgi/wiki/Playr Regards, -- dim
On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> I'd recommend having a look at Tsung, which will be able to replay a
> typical application scenario with as many concurrent users as you want:
> http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
> http://tsung.erlang-projects.org/
> http://pgfouine.projects.postgresql.org/tsung.html

I am having a look at Tsung and not getting very far yet. Have you had luck with it, and do you really mean as many concurrent users as you want? I was hoping to use it to simulate my current load while tuning and making improvements. So far Tsung doesn't appear well suited to my needs:

- I use persistent connections; each Tsung session uses a new connection.
- I have multiple applications with very different usage patterns (some web-based and largely idle, some not web-based and almost saturated); Tsung has virtual users choosing a session based on a probability, with think times.
- I know many programming languages; Tsung (and its error messages) is in Erlang.

> If you want to replay your logs at the current production speed and
> concurrency, see Playr:
> https://area51.myyearbook.com/trac.cgi/wiki/Playr

Thanks for this tip. It seems worth a look.

Regards,
Ken
Hi Peter,
I was looking for the same recently, and my answer is as follows:
1. If you want to test the H/W and configuration of your DBMS then you can use the pgbench tool, which uses a specific built-in DB+schema following the TPC benchmark (see the sketch after this list).
2. If you want to load test your own specific DB then I am unaware of any such tools. I ended up using JMeter with the JDBC connector for PostgreSQL. It took me a while to get it configured and running, but I now think JMeter is excellent. I suggest you use JMeter 2.3.2; I upgraded to 2.3.3 and it seems to have a bug with its JDBC connection to Postgres.
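A minimal sketch of option 1's pgbench route (a sketch only; the database name "bench" and the numbers are arbitrary examples), using the pgbench that ships in PostgreSQL's contrib:

  createdb bench
  pgbench -i -s 10 bench
  pgbench -c 10 -t 1000 bench

The -i run populates the built-in schema at scale factor 10 (roughly a million rows in the accounts table); the second run drives it with 10 concurrent clients for 1000 transactions each and reports the resulting transactions per second.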
-- Shaul
"Kenneth Cox" <kenstir@gmail.com> writes: > On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine > <dfontaine@hi-media.com> wrote: >> I'd recommand having a look at tsung which will be able to replay a >> typical application scenario with as many concurrent users as you want >> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php >> http://tsung.erlang-projects.org/ >> http://pgfouine.projects.postgresql.org/tsung.html > > I am having a look at tsung and not getting very far yet. Have you had luck > with it and do you really mean as many concurrent users as you want? Last time I used it it was in the context of a web application and to compare PostgreSQL against Informix after a migration. So I used the HTTP protocol support of the injector. Tsung is based on erlang and can be run from more than one node at any time, last time I checked you could run 600 to 800 concurrent clients from each node. Recent versions of erlang allow a much greater number per node, one or two orders of magnitude greater, as I've been told by Tsung's main developer. > I was > hoping to use it to simulate my current load while tuning and making > improvements. So far tsung doesn't appear well suited to my needs. I use > persistent connections; each tsung session uses a new connection. I have > multiple applications that have very usage patterns (some web and largely > idle, some non web and almost saturated); tsung has virtual users choosing > a session based on a probability with think times. I know many programming > languages; tsung (and its error messages) is in erlang. Tsung can be setup as an http or postgresql proxy: in this mode it'll prepare session files for you while you use your application as usual. The thinktime it sees will then get randomized at run time to better reflect real usage. You can define several user arrival phases to see what happens when the load raises then get back to normal traffic. Lots of options, really. Tsung generates statistics and comes with tools to analyze them and provide graphs organized into a web page, one of those tools allow to draw graphs from different simulations onto the same chart, with the same scaling, in order to easily compare results. It seems to me tsung is a good tool for your use case. Regards, -- dim
I considered Tsung myself but haven't tried it. If you intend to, I suggest you read this excellent tutorial on using Tsung for load-testing PostgreSQL. While impressed, I decided the procedure was too daunting and went with JMeter :-) It too can run tests from multiple clients, has built-in tables and graphs, and can save results as CSV or XML etc. In particular I recommend adding the extension "listener" (JMeter term for anything that captures and portrays test results) called Statistical Aggregate Report.
May the force be with you,
-- Shaul
On Wed, 03 Jun 2009 05:29:02 -0400, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> Last time I used it, it was in the context of a web application, to
> compare PostgreSQL against Informix after a migration, so I used the
> HTTP protocol support of the injector.

Tsung seems well suited for that.

> Tsung is based on Erlang ... you could run 600 to 800 concurrent clients
> from each node.

But each Tsung session (virtual user) uses a separate PG connection, and I need 30k virtual users. I can't imagine 30k PG connections. I could imagine using pgbouncer in statement pooling mode, but that doesn't characterize my load well, where different PG connections have different profiles. I have about 500 connections:

- ~450 from web servers, often idle, various work loads, no prepared statements
- 50 from another client, mostly idle, small set of prepared statements
- 10 from another client, extremely active, small set of prepared statements

I know a Tsung session doesn't have to exactly mimic a user, and I tried to coerce a Tsung session to represent a DB client instead, with loops and multiple CSV files. I wasn't so successful there, and was nagged by the assignment of sessions by probability, when I wanted a fixed number running each session.

I do appreciate the suggestions, and I agree Tsung has lots of nifty features. I used pgfouine to generate Tsung sessions, and I love the graph generation, but for me it comes down to simulating my DB load so that I can profile and tune the DB, and I am not seeing how to get Tsung to fit my case. Next up I will try JMeter (thanks, Shaul Dar, for the suggestion).

Regards, Ken
On Tue, 2 Jun 2009, Shaul Dar wrote:
> If you want to test the H/W and configuration of your DBMS then you can
> use the pgbench tool (which uses a specific built-in DB+schema,
> following the TPC benchmark).

There are a lot of TPC benchmarks. pgbench simulates TPC-B (badly), which is a benchmark from 1990. It's not at all representative of the current TPC benchmarks.

> If you want to load test your own specific DB then I am unaware of any
> such tools.

pgbench will run against any schema and queries; the built-in set are just the easiest to use. I just released a bunch of slides and a package I named pgbench-tools that show some of the possibilities here; links to everything are at:
http://notemagnet.blogspot.com/2009/05/bottom-up-postgresql-benchmarking-and.html

I'd mentioned working on this before on this list, but the code just got stable enough to release recently. Anybody who is running lots of pgbench tests at different database sizes and client loads might benefit from using my toolset to automate running the tests and reporting on the results.

The last few slides of my pgbench presentation show how you might write a custom test that measures how fast rows of various sizes can be inserted into your database at various client counts.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
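As a rough illustration of that kind of custom insert test (a sketch, not Greg's actual script; the table, file name, and numbers are invented), a pgbench script file mixes pgbench meta-commands with plain SQL. Assuming a throwaway table created first with CREATE TABLE ins_test (payload text), a file insert_test.sql could contain:

  \setrandom size 10 1000
  INSERT INTO ins_test (payload) VALUES (repeat('x', :size));

Each transaction then inserts one row whose payload varies between 10 and 1000 characters; run it at various client counts with something like pgbench -n -f insert_test.sql -c 8 -t 10000 mydb and compare the reported TPS.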
On Thu, Jun 4, 2009 at 2:01 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> pgbench will run against any schema and queries; the built-in set are just
> the easiest to use.

Greg,

Have you actually run pgbench against your own schema? Can you point me to an example? I also had the same impression reading the documentation. But when I tried it with the proper flags to use my own DB and query file, I got an error that it couldn't find one of the tables mentioned in the built-in test! I concluded that I cannot use any schema; I could only supply my own DB, but with the same set of tables pgbench expects. Maybe I missed something or made a mistake?

Thanks,
-- Shaul
Shaul Dar <shauldar@gmail.com> writes:
> Have you actually run pgbench against your own schema? Can you point me to
> an example? ... when I tried it with the proper flags to use my own DB and
> query file I got an error that it couldn't find one of the tables mentioned
> in the built-in test! I concluded that I cannot use any schema,

No, you just need to read the documentation. There's a switch that prevents the default action of trying to vacuum the "standard" tables. I think -N, but too lazy to look ...

regards, tom lane
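For what it's worth, a minimal sketch of that usage (the file, table, and numbers below are invented for illustration; in the pgbench versions I've used, the switch that skips vacuuming the standard tables is lowercase -n, so check your version's --help). A custom script file, say my_queries.sql:

  \setrandom id 1 100000
  SELECT * FROM my_table WHERE id = :id;

Run it against your own database with:

  pgbench -n -f my_queries.sql -c 10 -t 1000 mydb

pgbench substitutes :id with a fresh random value per transaction and reports the transaction rate at the end.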
Technically you can then use pgbench on that set of statements, but I usually just use Perl's "Benchmark" module... (I'm sure Ruby or Java or whatever has a similar tool.)

(First, I log statements by loading the application or web server with statement logging turned on, so I'm not "guessing" what SQL will be called. Usually doing this exposes a flotilla of inefficiencies in the code...)

On Tue, Jun 9, 2009 at 9:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Shaul Dar <shauldar@gmail.com> writes:
>> Have you actually run pgbench against your own schema? Can you point me to
>> an example? ... I concluded that I cannot use any schema,
>
> No, you just need to read the documentation. There's a switch that
> prevents the default action of trying to vacuum the "standard" tables.
> I think -N, but too lazy to look ...
Suppose I have a large table with a small-cardinality CATEGORY column (say, categories 1..5). I need to sort by an arbitrary (i.e. user-specified) mapping of CATEGORY, something like this:

  1 => 'z'
  2 => 'a'
  3 => 'b'
  4 => 'w'
  5 => 'h'

So when I get done, the sort order should be 2,3,5,4,1.

I could create a temporary table with the category-to-key mapping, but is there any way to do this in a single SQL statement?

Thanks, Craig
On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_james@emolecules.com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say,
> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
> mapping of CATEGORY ...
> I could create a temporary table with the category-to-key mapping, but is
> there any way to do this in a single SQL statement?

You can create a translation table, join it, and sort by its key.

-- GJ
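A sketch of that translation-table approach, using the mapping from the original post (the table and column names here are invented; the large table is called my_table for illustration):

  CREATE TEMP TABLE category_order (category int PRIMARY KEY, sort_key text);
  INSERT INTO category_order VALUES (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h');

  SELECT t.*
    FROM my_table t
    JOIN category_order o USING (category)
   ORDER BY o.sort_key;   -- 'a','b','h','w','z' yields categories 2, 3, 5, 4, 1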
Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
> On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_james@emolecules.com> wrote:
>> Suppose I have a large table with a small-cardinality CATEGORY column (say,
>> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
>> mapping of CATEGORY, something like this:
> you can create a translation table, join it, and sort by its key.

Much easier to

  ORDER BY CASE category WHEN 2 THEN 1 WHEN 3 THEN 2 ... END

Actually, consider putting the CASE into a function and doing

  ORDER BY sort_order(category)

regards, tom lane
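A minimal sketch of that function approach for the mapping in the original post (sort_order is the name from Tom's suggestion; my_table is an invented table name):

  CREATE FUNCTION sort_order(category int) RETURNS int AS $$
      SELECT CASE $1
                 WHEN 2 THEN 1  -- 'a'
                 WHEN 3 THEN 2  -- 'b'
                 WHEN 5 THEN 3  -- 'h'
                 WHEN 4 THEN 4  -- 'w'
                 WHEN 1 THEN 5  -- 'z'
             END;
  $$ LANGUAGE sql IMMUTABLE;

  SELECT * FROM my_table ORDER BY sort_order(category);

Changing the user-specified ordering then means replacing just this one function (CREATE OR REPLACE FUNCTION) rather than editing every query.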
Craig James <craig_james@emolecules.com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY
> column (say, categories 1..5). I need to sort by an arbitrary
> (i.e. user-specified) mapping of CATEGORY

There was a recent thread discussing ways to do that:
http://archives.postgresql.org/pgsql-admin/2009-07/msg00016.php

-Kevin
On Thu, Jul 9, 2009 at 6:26 PM, Craig James <craig_james@emolecules.com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say,
> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
> mapping of CATEGORY ...
> So when I get done, the sort order should be 2,3,5,4,1.

If the object is to avoid a separate table, you can do it with a "case" statement:

  select ... from ...
  order by case category
               when 1 then 'z'
               when 2 then 'a'
               when 3 then 'b'
               when 4 then 'w'
               when 5 then 'h'
           end

If this sounds slow, you're right. But it might perform well enough for your use case.

A.
On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column
> (say, categories 1..5). I need to sort by an arbitrary (i.e.
> user-specified) mapping of CATEGORY ...
> I could create a temporary table with the category-to-key mapping, but
> is there any way to do this in a single SQL statement?

You can do it like this:

  select c.*
    from categories c
    join ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering)
      on c.id = o.id
   order by o.ordering;

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
> On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
> You can do it like this:
> select c.*
>   from categories c
>   join ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering)
>     on c.id = o.id
>  order by o.ordering;

Another option would be:

  select c.*
    from categories c
   order by case c.category
                when 1 then 'z'
                when 2 then 'a'
                when 3 then 'b'
                when 4 then 'w'
                when 5 then 'h'
            end;

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
2009/7/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Grzegorz Jaśkiewicz <gryzman@gmail.com> writes:
>> you can create a translation table, join it, and sort by its key.
>
> Much easier to
> ORDER BY CASE category WHEN 2 THEN 1 WHEN 3 THEN 2 ... END
>
> Actually, consider putting the CASE into a function and doing
> ORDER BY sort_order(category)

I suppose a table is handy when you have a lot of items as keys...

-- GJ