Thread: Logarithmic change (decrease) in performance

Logarithmic change (decrease) in performance

From: Matthew Nuzum
Something interesting is going on. I wish I could show you the graphs,
but I'm sure this will not be a surprise to the seasoned veterans.

A particular application server I have has been running for over a
year now. I've been logging CPU load since mid-April.
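
The logging itself doesn't have to be anything fancy; something along
these lines, run from cron every few minutes, is enough to build up a
usable history (the file path here is just a placeholder, not my actual
setup):

# Minimal sketch: append the 1/5/15-minute load averages to a CSV file.
# The log path is a placeholder; run this from cron every few minutes.
import os
import time

LOGFILE = "/var/log/loadavg.csv"  # placeholder path

def log_load():
    one, five, fifteen = os.getloadavg()  # Unix-only
    with open(LOGFILE, "a") as f:
        f.write("%d,%.2f,%.2f,%.2f\n" % (time.time(), one, five, fifteen))

if __name__ == "__main__":
    log_load()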

It took 8 months or more to fall from excellent performance to
"acceptable." Then, over the course of about 5 weeks it fell from
"acceptable" to "so-so." Then, in the last four weeks it's gone from
"so-so" to alarming.

I've been working on this performance drop since Friday but it wasn't
until I replied to Arnau's post earlier today that I remembered I'd
been logging the server load. I grabbed the data and charted it in
Excel and to my surprise, the graph of the server's load average looks
kind of like the graph of y=x^2.
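
For anyone who wants to do the same sanity check on their own numbers
without Excel, a quick sketch like this (the CSV layout and file name are
assumptions, not my real log format) will tell you whether a quadratic
fits the data noticeably better than a straight line:

# Quick sketch: fit the logged load averages with a quadratic and with a
# straight line and compare the fit errors. Assumes a CSV of "timestamp,load".
import csv
import numpy as np

times, loads = [], []
with open("loadavg.csv") as f:          # assumed log file name
    for row in csv.reader(f):
        times.append(float(row[0]))
        loads.append(float(row[1]))

t = np.array(times) - times[0]          # seconds since the first sample
y = np.array(loads)

quad = np.polyfit(t, y, 2)              # coefficients of a*t^2 + b*t + c
line = np.polyfit(t, y, 1)

quad_err = float(np.sum((np.polyval(quad, t) - y) ** 2))
line_err = float(np.sum((np.polyval(line, t) - y) ** 2))
print("quadratic fit error: %.2f  linear fit error: %.2f" % (quad_err, line_err))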

I've got to make a recommendation for a solution to the PHB and my
analysis is showing that as the dataset becomes larger, the amount of
time the disk spends seeking is increasing. This causes processes to
take longer to finish, which causes more processes to pile up, which
causes processes to take longer to finish, which causes more processes
to pile up etc. It is this growing dataset that seems to be the source
of the sharp decrease in performance.
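
To put the pile-up effect in concrete terms with a toy model (the numbers
here are made up, not measured from my server): if you treat the server
as a single queue, the average backlog is roughly
utilization / (1 - utilization), so a small increase in per-request disk
time near capacity produces a huge jump in the number of waiting
processes:

# Toy model of the feedback loop, M/M/1-style: as the dataset grows, each
# request needs more disk time, utilization creeps toward 1.0, and the
# average backlog explodes. All numbers are invented for illustration.
arrival_rate = 50.0  # requests per second (assumed)

for service_ms in (10, 15, 18, 19, 19.5):
    utilization = arrival_rate * (service_ms / 1000.0)
    if utilization >= 1.0:
        backlog = float("inf")                       # queue grows without bound
    else:
        backlog = utilization / (1.0 - utilization)  # avg requests in system
    print("service time %5.1f ms -> utilization %.3f -> avg backlog %6.1f"
          % (service_ms, utilization, backlog))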

I knew this day would come, but I'm actually quite surprised that when
it came, there was little time between the warning and the grand
finale. I guess this message is being sent to the list to serve as a
warning to other data warehouse admins that when you reach your
capacity, the downward spiral happens rather quickly.

Crud... Outlook just froze while composing the PHB memo. I've been
working on that for an hour. What a bad day.
--
Matthew Nuzum
www.bearfruit.org

Re: Logarithmic change (decrease) in performance

From: Ron Peacetree
>From: Matthew Nuzum <mattnuzum@gmail.com>
>Sent: Sep 28, 2005 4:02 PM
>Subject: [PERFORM] Logarithmic change (decrease) in performance
>
Small nit-pick:  A "logarithmic decrease" in performance would be
a relatively good thing, being better than either a linear or
exponential decrease in performance.  What you are describing is
the worst kind: an _exponential_ decrease in performance.
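
To put rough numbers on the distinction (a throwaway illustration,
nothing more): logarithmic growth flattens out, linear growth is steady,
and quadratic or exponential growth is exactly the "fine for months, then
suddenly awful" shape you're describing:

# Throwaway comparison of growth curves; x is an arbitrary "time" axis.
import math

print("%4s %10s %8s %10s %14s" % ("x", "log(x)", "x", "x^2", "e^x"))
for x in (1, 2, 4, 8, 16, 32):
    print("%4d %10.2f %8d %10d %14.3g" % (x, math.log(x), x, x * x, math.exp(x)))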

>Something interesting is going on. I wish I could show you the graphs,
>but I'm sure this will not be a surprise to the seasoned veterans.
>
>A particular application server I have has been running for over a
>year now. I've been logging CPU load since mid-April.
>
>It took 8 months or more to fall from excellent performance to
>"acceptable." Then, over the course of about 5 weeks it fell from
>"acceptable" to "so-so." Then, in the last four weeks it's gone from
>"so-so" to alarming.
>
>I've been working on this performance drop since Friday but it wasn't
>until I replied to Arnau's post earlier today that I remembered I'd
>been logging the server load. I grabbed the data and charted it in
>Excel and to my surprise, the graph of the server's load average looks
>kind of like the graph of y=x^2.
>
>I've got to make a recommendation for a solution to the PHB and my
>analysis is showing that as the dataset becomes larger, the amount of
>time the disk spends seeking is increasing. This causes processes to
>take longer to finish, which causes more processes to pile up, which
>causes processes to take longer to finish, which causes more processes
>to pile up etc. It is this growing dataset that seems to be the source
>of the sharp decrease in performance.
>
>I knew this day would come, but I'm actually quite surprised that when
>it came, there was little time between the warning and the grand
>finale. I guess this message is being sent to the list to serve as a
>warning to other data warehouse admins that when you reach your
>capacity, the downward spiral happens rather quickly.
>
Yep, definitely been where you are.  Bottom line: you have to reduce
the sequential seeking behavior of the system to within an acceptable
window and then keep it there.

1= keep more of the data set in RAM
2= increase the size of your HD IO buffers
3= make your RAID sets wider (more parallel vs sequential IO)
4= reduce the atomic latency of your RAID sets
(time for Fibre Channel 15Krpm HD's vs 7.2Krpm SATA ones?)
5= make sure your data is as unfragmented as possible
6= change your DB schema to minimize the problem
a= overall good schema design
b= partitioning the data so that the system only has to manipulate a
reasonable chunk of it at a time (rough sketch below).
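
As a very rough sketch of what 6b can look like (table, column, and date
ranges are invented, and this assumes the inheritance-based partitioning
PostgreSQL supports, driven here from Python with psycopg2):

# Rough sketch of range-partitioning a large fact table by month using
# PostgreSQL table inheritance. Table and column names are invented;
# adapt to your own schema. Assumes the psycopg2 driver is available.
import psycopg2

conn = psycopg2.connect("dbname=warehouse")   # assumed connection string
cur = conn.cursor()

months = [("2005_08", "2005-08-01", "2005-09-01"),
          ("2005_09", "2005-09-01", "2005-10-01")]

for suffix, start, end in months:
    # Each child table carries a CHECK constraint describing its date range,
    # so queries with a matching date predicate only have to touch a few
    # reasonably sized tables instead of one enormous one.
    cur.execute("""
        CREATE TABLE fact_sales_%s (
            CHECK (sale_date >= DATE '%s' AND sale_date < DATE '%s')
        ) INHERITS (fact_sales)
    """ % (suffix, start, end))

conn.commit()
cur.close()
conn.close()

The exact mechanism matters less than the result: the working set the
system has to touch for a typical query stays bounded even as the total
dataset keeps growing.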

In many cases, there's a number of ways to accomplish the above.
Unfortunately, most of them require CapEx.

Also, ITRW (in the real world) such systems tend to have this as a
chronic problem.  This is not a "fix it once and it goes away forever"
situation; it is part of the regular maintenance and upgrade plan(s).

Good Luck,
Ron

Re: Logarithmic change (decrease) in performance

From: "Jim C. Nasby"
On Wed, Sep 28, 2005 at 06:03:03PM -0400, Ron Peacetree wrote:
> 1= keep more of the data set in RAM
> 2= increase the size of your HD IO buffers
> 3= make your RAID sets wider (more parallel vs sequential IO)
> 4= reduce the atomic latency of your RAID sets
> (time for Fibre Channel 15Krpm HD's vs 7.2Krpm SATA ones?)
> 5= make sure your data is as unfragmented as possible
> 6= change your DB schema to minimize the problem
> a= overall good schema design
> b= partitioning the data so that the system only has to manipulate a
> reasonable chunk of it at a time.

Note that 6 can easily swamp the rest of these tweaks. A poor schema
design will absolutely kill any system. Also of great importance is how
you're using the database. IE: are you doing any row-by-row operations?
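
For example (table and column names invented, just to show the difference
in shape), there's a world of difference between doing the work one row
at a time from the client and letting the database do it as a single set
operation:

# Row-by-row vs. set-based processing. Table/column names are invented;
# assumes the psycopg2 driver and an "orders" table for illustration.
import psycopg2

conn = psycopg2.connect("dbname=warehouse")   # assumed connection string
cur = conn.cursor()

# Row-by-row: one round trip (and one index lookup) per order. This is the
# pattern that quietly falls apart as the dataset grows.
cur.execute("SELECT order_id, amount FROM orders WHERE shipped = false")
for order_id, amount in cur.fetchall():
    cur.execute("UPDATE orders SET total = %s WHERE order_id = %s",
                (float(amount) * 1.05, order_id))

# Set-based: the same work expressed as one statement the planner can
# optimize as a whole.
cur.execute("UPDATE orders SET total = amount * 1.05 WHERE shipped = false")

conn.commit()
cur.close()
conn.close()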

> In many cases, there's a number of ways to accomplish the above.
> Unfortunately, most of them require CapEx.
>
> Also, ITRW (in the real world) such systems tend to have this as a
> chronic problem.  This is not a "fix it once and it goes away forever"
> situation; it is part of the regular maintenance and upgrade plan(s).

And why DBAs typically make more money than other IT folks. :)
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461