Thread: Postgres performance slowly gets worse over a month
Back on July 23, I posted on our performance problem. At that time, I
got several suggestions about what to do to try to fix it. Well, it's
happening again...

To recap, we have a web-based application that uses a postgres backend
for handling orders and scheduling. Gradually, the average load on the
system climbs until we dump the database, drop it, and reload it from
the dumps. This happens slowly (over weeks, rather than hours) and
eventually gets so bad that no work can get done. Another point is that
the progression of the slowdown appears to be geometric, rather than
linear. I also notice that the size of the data directory starts at
about 4.5 GB and climbs throughout this process. It's currently at
6.3 GB.

We do a full vacuum every night. We have adjusted max_fsm_pages to
1000000 and max_fsm_relations to 10000 (we were seeing deleted tuples in
the 50K range on some of our tables). We are using ADODB 2.12 in PHP on
Apache 1.3.26 to access the database.

We've tried rebuilding all of the indexes, and that didn't help. We're
going to try it again, because last time we wound up corrupting the
indexes on the system tables while trying to drop and recreate a table
that got partially created before a system crash, and we suspect we may
have had other problems at the time as well. A contractor did the
reindex for us, so I'm not sure of the exact process at this point. I
suspect that it's something like:

1. shut down postgres
2. run 'postgres -O -P' to start a single-user instantiation of the
   engine
3. type 'reindex' to get it to do it
4. exit postgres and restart the DB with pg_ctl

Does that sound about right?

Any other suggestions? We're looking to do something soon, before the
load gets out of hand. We have done full DB dumps and restores to fix
this in the past, and it takes 4-5 hours in the middle of the night,
since we can't take the system down during the day: we'd have about
1100 people sitting on their hands while we did it.

One of the suggestions was to use 'pgmonitor' to keep watch over what's
happening, but transactions go through too quickly and nothing seems to
tie the system up for any period of time. We also notice that while the
load keeps increasing, the actual CPU time is very small. We'll see
loads above 3 with each CPU (we have two) sitting with 80%+ idle time.

As a recap, this is happening on a Compaq ProLiant 3500 system with a
five-disk hardware RAID 5.

--
Robert M. Meyer
Sr. Network Administrator
DigiVision Satellite Services
14 Lafayette Sq, Ste 410
Buffalo, NY 14203-1904
(716)332-1451
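As an aside, the per-table dead-tuple counts mentioned above can be read
straight out of the nightly vacuum if it is run table-by-table with the
VERBOSE option from psql. A minimal sketch, where "orders" is a
hypothetical table name rather than one from this thread:

	-- VERBOSE makes vacuum report, per table, how many dead
	-- (deleted or updated) tuples it found and removed
	VACUUM VERBOSE ANALYZE orders;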
On 08 Aug 2002 14:06:57 -0400, "Robert M. Meyer" <rmeyer@installs.com> wrote:

> Back on July 23, I posted on our performance problem. At that time, I
> got several suggestions about what to do to try to fix it. Well, it's
> happening again...
> [snip]
> Any other suggestions? We're looking to do something soon, before the
> load gets out of hand.

With all due respect, the depth of your problem indicates to me that you
need a database administrator: someone who understands the database, the
hardware, and performance/tuning principles. (In case you think I'm
hinting at something, I'm not available for work right now -- you really
do need to hire or contract a DBA sort.)

> One of the suggestions was to use 'pgmonitor' to keep watch over
> what's happening, but transactions go through too quickly and nothing
> seems to tie the system up for any period of time. We also notice
> that while the load keeps increasing, the actual CPU time is very
> small. We'll see loads above 3 with each CPU (we have two) sitting
> with 80%+ idle time.

First guess: analyze your queries. It's gotten me through a number of
problems that sound just like this. As the data set grows, a poorly
formed query will chew more and more I/O time.

--
Tim Ellis
Senior Database Architect
Gamet, Inc.
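To make Tim's suggestion concrete: EXPLAIN shows the plan the optimizer
picks for a query, and a sequential scan over a large, growing table is
the classic signature of the I/O creep he describes. A minimal sketch
with made-up table and column names:

	-- a hypothetical query; substitute your own slowest ones.
	-- Watch for "Seq Scan" nodes on big tables where an index
	-- scan was expected.
	EXPLAIN
	SELECT * FROM orders WHERE customer_id = 42;

On 7.2 and later, EXPLAIN ANALYZE also executes the query and reports
actual run times alongside the planner's estimates.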
"Robert M. Meyer" <rmeyer@installs.com> writes: > Back on July 23, I posted on our performance problem. At that time, I > got several suggestions about what to do to try to fix it. Well, it's > happening again... Given that you've increased the FSM size *and* are doing a full vacuum every night (I hope with analyze option as well!), it seems unlikely that your tables are physically growing --- I suspect there may be an index growth problem instead. But you should try to check that. Try doing select relkind,sum(relpages) from pg_class group by relkind; every so often (preferably just after a vacuum run) to see how the space totals change over time. > 1. shut down postgres > 2. run 'postgres -O -P' to start a single user instantiation of the > engine > 3. type 'reindex' to get it to do it > 4. exit postgres and restart the DB with pg_ctl IIRC, that only rebuilds indexes on the system tables. You mentioned that you'd rebuilt user indexes too --- how did you go about that exactly? > We also notice that while the load keeps increasing, > the actual CPU time is very small. We'll see loads above 3 with each > CPU (we have two) sitting with 80%+ idle time. The load must be all disk I/O then. Do you have adequate RAM in this thing? What have you set shared_buffers to? More generally, what's your typical query mix? Have you checked for inefficient plans on the most common queries? regards, tom lane
On Thu, Aug 08, 2002 at 02:06:57PM -0400, Robert M. Meyer wrote:

> have a contractor that did the reindex for us so I'm not sure of the
> process at this point. I suspect that it's something like:
> 1. shut down postgres
> 2. run 'postgres -O -P' to start a single-user instantiation of the
>    engine
> 3. type 'reindex' to get it to do it
> 4. exit postgres and restart the DB with pg_ctl
>
> Does that sound about right?

What you are describing there is reindexing the system tables. But
"recreating the indexes" as a cure for file growth and performance
problems is not that. Instead, what you do is drop each index and
recreate it. At least, that's what I have gathered from the "reindex"
scripts that were sent to the -general list a little while ago.

> Any other suggestions?

Yes. As someone else suggested, it sounds like you need a full-time
DBA, at least for a little while.

A

--
----
Andrew Sullivan                          87 Mowat Avenue
Liberty RMS                              Toronto, Ontario Canada
<andrew@libertyrms.info>                 M6K 3E3
                                         +1 416 646 3304 x110
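A minimal sketch of the drop-and-recreate approach Andrew describes,
with hypothetical index and table names. Note that both statements lock
the table (CREATE INDEX blocks writes while it runs), so this belongs
in a quiet window:

	BEGIN;
	DROP INDEX orders_customer_idx;
	-- rebuilding from scratch writes a compact index with none of
	-- the accumulated dead entries
	CREATE INDEX orders_customer_idx ON orders (customer_id);
	COMMIT;

Wrapping the pair in one transaction means other sessions never see the
table without its index definition; they simply queue behind the locks
until the COMMIT.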
Robert,

It sounds to me like you have some kind of fragmentation causing the
slow degradation of performance, and a dump/restore, which is basically
a very drastic defragmentation, is fixing it. It happens slowly, and
there seems not to be a single big cause -- more likely the sum of many
little things. You also report that there are a lot of deleted tuples
(rows).

I don't know a quick and easy fix. Maybe it's caused by how your
application is designed: inserting and deleting many rows, probably
frequent updates, some rows using lots of space and some very little.
Anyway, something is causing very inefficient space use. So I would
agree with the other respondents that you need a DBA to look at your
database -- especially the design itself.

Regards,
Nikolaus Dilger

"Robert M. Meyer" wrote:
> [original message snipped]
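One rough way to put numbers on the inefficient space use Nikolaus
describes is to compare each table's page count against its tuple
count. These figures come from the last VACUUM/ANALYZE, so run this
just after the nightly vacuum; a tuples-per-page value that keeps
falling week over week is the bloat signature. A sketch:

	SELECT relname, relpages, reltuples,
	       CASE WHEN relpages > 0
	            THEN round(reltuples / relpages)
	            ELSE 0 END AS tuples_per_page
	FROM pg_class
	WHERE relkind = 'r'
	ORDER BY relpages DESC
	LIMIT 10;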
Robert,

I got your email, but I am currently at the Mayo Clinic with my wife,
so I can't give you much help. Download:

	ftp://china.maxbaud.net/pub/PostgreSQL/fixtable.pl

This is my perl script, which will do a live/hot reindex of your
tables. You can run the command and it will give you a really short
description of the options, and it should not let you run the script
unless you give it a group of options that actually makes sense. I
usually run it something like:

	fixtable.pl -t foo -I bar

This will recreate ALL indexes (-I) on table foo (-t foo) in the bar
database. I use this thing more often than I would like, because I get
as many as 1 million records changed on a daily basis in one of my
tables, and the indexes make a big difference in performance (I assume
because of disk fragmentation at that file size).

- brian

On 8 Aug 2002, Robert M. Meyer wrote:
> [original message snipped]
Wm. Brian McCane                     | Life is full of doors that won't open
Search  http://recall.maxbaud.net/   | when you knock, equally spaced amid those
Usenet  http://freenews.maxbaud.net/ | that open when you don't want them to.
Auction http://www.sellit-here.com/  |   - Roger Zelazny "Blood of Amber"
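A script like Brian's presumably starts by capturing the definitions of
the indexes it is about to drop. A sketch of that step in plain SQL,
using his example table name 'foo':

	-- indexdef holds a complete CREATE INDEX statement that can
	-- be replayed after the corresponding DROP INDEX
	SELECT indexname, indexdef
	FROM pg_indexes
	WHERE tablename = 'foo';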