Thread: Any tool/script available which can be used to measure scalability of an application's database.

Dear @,

Is there any tool or script available for PostgreSQL that can be used to measure the scalability of an application's database? Or is there any guideline on how to do this?

I am a bit confused about the concept of measuring scalability of an application's database.

How is the scalability measured?

Is it like loading the DB with a bulk data volume and then doing performance testing with tools like JMeter?

Could anyone kindly help me with this?

Thanks,
        Sreejith.


On Tue, Jul 10, 2012 at 12:21 PM, Sreejith Balakrishnan
<sreejith.balakrishnan@tcs.com> wrote:
> Dear @,
>
> Is there any tool or some sort of script available, for PostgreSQL, which
> can be used to measure scalability of an application's database. Or is there
> any guideline on how to do this.

"scalability of an application's database" can be understood either
as the relation of transactions per second to database size, or as
the ability of the database to be sharded/partitioned, or perhaps as
something else.

Could you please describe the original task more specifically?
What is its goal?

> I am a bit confused about the concept of measuring scalability of an
> application's database.
>
> How is the scalability measured?
>
> Is it like loading the DB with a bulk data volume and then do performance
> testing by using tools like JMeter?
>
> Could any one kindly help me on this..
>
> Thanks,
>         Sreejith.
>



--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +79160686204

On Tue, Jul 10, 2012 at 10:21 AM, Sreejith Balakrishnan
<sreejith.balakrishnan@tcs.com> wrote:
> Is there any tool or some sort of script available, for PostgreSQL, which
> can be used to measure scalability of an application's database. Or is there
> any guideline on how to do this.
>
> I am a bit confused about the concept of measuring scalability of an
> application's database.

You cannot measure the scalability of a database as such.  You need to
know the nature of the load (i.e. the operations executed against the
DB: how many INSERTs, UPDATEs, DELETEs and SELECTs, against which
tables, and with what frequency and criteria).  And then, as Sergey
said, you need to define whether you want to scale up the load or the
size - or both.
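
For example, assuming the pg_stat_statements contrib module is
installed and preloaded, a rough picture of that statement mix can be
pulled from the server after it has run under normal traffic for a
while (a sketch only; in recent releases the time column is named
total_exec_time):

    -- requires shared_preload_libraries = 'pg_stat_statements'
    -- and CREATE EXTENSION pg_stat_statements;
    SELECT query, calls, total_time
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 20;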

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Dear Sergey,

We have around 15 to 18 separate products. What we are told to do is to check the scalability of the underlying DB of each product (application).
That's the requirement. Nothing more was explained to us. That's why I said earlier that I am confused about how to approach this.

Regards,
Sreejith.


Dear Robert,

We need to scale up both size and load.
Could you please provide the steps I need to follow?

Warm regards,
Sreejith.

On 07/14/2012 09:21 AM, B Sreejith wrote:

Dear Sergey,

We have around 15 to 18 separate products. What we are told to do is to check the scalability of the underlying DB of each product (application).
That's the requirement. Nothing more was explained to us. That's why I said earlier that I am confused about how to approach this.


Sounds like your client / boss has a case of buzz-word-itis. "Scalability" means lots of different things:

- How well it copes with growth of data sizes
- How well it copes with growth of query rates / activity
- How well it copes with larger user counts (may not be the same as prior)
- Whether it's easily sharded onto multiple systems
- Whether it has any locking choke-points that serialize common operations
- ....

Perhaps most importantly, your database is only as scalable as your application's use of it. Two apps can use exactly the same database structure, but one of them can struggle massively under a load the other barely notices. For example, if one app does this (pseudocode):

SELECT id FROM customer WHERE ....
FOR attribute IN customer
   SELECT :attribute.name FROM customer WHERE id = :customer.id
   IF attribute.is_changed THEN
       UPDATE customer SET :attribute.name = :attribute.new_value WHERE id = :customer.id
   END IF
END FOR

and another just does:

UPDATE customer
SET attribute1 = value1, attribute2 = value2, attribute3 = value3
WHERE ....


The first will totally melt down under a load that, as far as the second one is concerned, is barely different from idle.

That's a ridiculously bad example for the first app, but real examples that aren't much better arise from badly tuned or badly written object-relational mapping (ORM) systems. The "N+1 selects" problem and massive, inefficient stacks of left outer joins are classics.

Thus, you can't really evaluate the scalability of the database under load separately from the application that's using it and the workload.

--
Craig Ringer
On 07/14/2012 09:26 AM, B Sreejith wrote:

Dear Robert,

We need to scale up both size and load.
Could you please provide the steps I need to follow?


For load, first you need to build a representative sample of your application's querying patterns by logging queries and analysing the logs. Produce a load generator based on that data, set up a test copy of your database, and start pushing the query rate up to see what happens.
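
As a rough sketch of the logging side (sample settings only; logging every statement is itself a noticeable load on a busy server, so you may prefer a threshold of a few hundred milliseconds):

    # postgresql.conf - capture the workload for later analysis (sample values)
    log_min_duration_statement = 0                 # log every statement with its runtime
    log_line_prefix = '%t [%p] user=%u,db=%d '     # timestamp, pid, user, database
    log_lock_waits = on                            # report waits longer than deadlock_timeout
    log_checkpoints = on

A log analyser such as pgBadger or pgFouine can then turn those logs into a per-query workload profile to base the load generator on.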

For simpler loads you can write a transaction script for pgbench based on your queries.
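
To give an idea of what such a script can look like, here is a minimal hypothetical transaction (the table and columns are invented; pgbench of this era sets variables with \setrandom, newer releases use \set with random()):

    -- bench_customer.sql: one synthetic transaction against a hypothetical customer table
    \setrandom cust_id 1 100000
    BEGIN;
    SELECT * FROM customer WHERE id = :cust_id;
    UPDATE customer SET last_seen = now() WHERE id = :cust_id;
    COMMIT;

Run it with something like "pgbench -n -f bench_customer.sql -c 8 -j 4 -T 60 mydb" and raise -c until transactions per second stop improving.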

For size: Copy your data set, then start duplicating it with munged copies. Repeat, then use the load generator you wrote for the first part to see how scaling the data up affects your queries. See if anything is unacceptably slow (the "auto_explain" module is useful here) and examine it.
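
As a hedged illustration of the "munged copies" idea (the table and columns here are hypothetical), the data volume can be doubled repeatedly with statements along these lines:

    -- duplicate existing rows while keeping unique keys distinct (hypothetical schema)
    INSERT INTO customer (name, email, created_at)
    SELECT name || ' copy', 'copy+' || id || '@example.invalid', created_at
    FROM customer;

and auto_explain can be enabled to log the plans of anything that becomes slow as the volume grows:

    # postgresql.conf - log plans of statements slower than 500 ms
    shared_preload_libraries = 'auto_explain'
    auto_explain.log_min_duration = '500ms'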

The truth is that predicting how complex database driven apps will scale is insanely hard, because access patterns change as data sizes and user counts grow. You're likely to land up tuning for a scenario that's quite different to the one that you actually face when you start hitting scaling limitations. This doesn't mean you should not investigate, it just means your trials don't prove anything and the optimisations you make based on what you learn may not gain you much.

--
Craig Ringer
Hammerora is a good start, but it does have some issues when you first try to get it running. You can also try pgbench. As someone said, there is a plethora of choices. It all depends on what you want to measure or accomplish.

John Jones


Dear All,
Thanks a lot for all the invaluable comments.
Regards,
  Sreejith.

On Sat, Jul 14, 2012 at 11:50 AM, B Sreejith <bsreejithin@gmail.com> wrote:
> Dear All,
> Thanks alot for all the invaluable comments.

In addition to Craig's excellent advice on measurements, there is
something else you can do: with knowledge of the queries your
application fires against the database, you can evaluate your schema
and index definitions.  While there is no guarantee that your
application will scale well if all the indexes you believe are needed
are present, based on that inspection you can pretty easily identify
tables which can be improved.  These are tables which a) are known to
grow large and b) either have no indexes at all, or have no indexes
which support the queries your application runs against them, so that
those queries result in full table scans.  Any database which grows in
size will sooner or later hit a point where full table scans of these
large tables become extremely slow.  If such queries are run during
regular operation (and not in nightly maintenance windows, for
example), then you have pretty surely identified a show stopper.
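
One way to hunt for such candidates, as a sketch (the cut-offs are up
to you), is to compare sequential scan counts with table sizes in the
standard statistics views:

    SELECT relname, seq_scan, idx_scan, n_live_tup,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_stat_user_tables
    ORDER BY seq_scan DESC
    LIMIT 20;

Large tables near the top of that list whose seq_scan dwarfs idx_scan
are the first ones to check against the queries your application runs.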

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

> We have around 15 to 18 separate products.What we are told to do is to check
> the scalability of the underlying DB of each product (application).
>
>> Sounds like your client / boss has a case of buzz-word-itis. "Scalability"
>> means lots of different things:

Yes, it is still not clear what exactly they want from you, but here
is what I would do...

I would take the metrics Craig described:

> - How well it copes with growth of data sizes
> - How well it copes with growth of query rates / activity
> - How well it copes with larger user counts (may not be the same as prior)
- Also hard drive activity, CPU, etc.

And start collecting these statistics using monitoring tools like
http://www.cacti.net/, for example.

After a week/month/quarter, as time passes and the database activity
and size change, you will see how the measurements change (usually
degrade). You will then be able to draw conclusions about whether
your environment meets the current requirements and to forecast
critical points.

As Craig mentioned, you may also try to simulate your database
activity with pgbench. I would just like to point you to this article
http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm
where you will find some hints for your case.
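
As a rough illustration of the kind of runs described there (the
scale factors and client counts below are arbitrary examples), the
stock pgbench benchmark can be initialised at several sizes and
driven at several concurrency levels:

    # initialise and run the built-in benchmark at increasing scales
    for s in 10 100 1000; do
        pgbench -i -s $s benchdb
        for c in 4 8 16 32; do
            pgbench -c $c -j 4 -T 120 benchdb
        done
    done

Plotting the reported transactions per second against scale factor
and client count shows where throughput stops growing.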

Also look at the playback tools
http://wiki.postgresql.org/wiki/Statement_Playback.

--
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +79160686204

On 07/14/2012 08:17 PM, Robert Klemme wrote:
> On Sat, Jul 14, 2012 at 11:50 AM, B Sreejith <bsreejithin@gmail.com> wrote:
>> Dear All,
>> Thanks alot for all the invaluable comments.
> In addition to Craig's excellent advice on measurements, there is
> something else you can do: with knowledge of the queries your
> application fires against the database, you can evaluate your schema
> and index definitions.  While there is no guarantee that your
> application will scale well if all the indexes you believe are needed
> are present
Don't forget that sometimes it's better to DROP an index that isn't used
much, or that only helps occasional queries that aren't time-sensitive.
Every index has a cost to maintain - it slows down your inserts and
updates and it competes for disk cache with things that might be more
beneficial.
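
As a sketch of how to look for such candidates (standard statistics views, nothing extra to install), pg_stat_user_indexes shows how often each index has actually been used since the statistics were last reset:

    -- rarely used indexes, biggest first
    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;

Indexes with idx_scan at or near zero and a large size are worth a second look, after checking that they don't back a unique constraint or an occasional but important query.
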
> b) either have no indexes at all, or have no indexes which
> support the queries your application runs against them, so that
> those queries result in full table scans.
A full table scan is not inherently a bad thing, even for a huge table.
Sometimes you just need to examine every row, and the fastest way to do
that is without a doubt a full table scan.

Remember, a full table scan won't tend to push everything out of
shared_buffers, so it can also avoid competition for cache.

(If anyone ever wants concurrent scans badly enough to implement them,
full table scans with effective_io_concurrency > 1 will become a *lot*
faster for some types of query).

--
Craig Ringer