Thread: Postgres performance slowly gets worse over a month
Back on July 23, I posted on our performance problem. At that time, I
got several suggestions about what to do to try to fix it. Well, it's
happening again...

To recap, we have a web-based application that uses a postgres backend
for handling orders and scheduling. Gradually, the average load on the
system climbs until we dump the database, drop it, and reload it from
the dumps. This happens slowly (over weeks, rather than hours) and
eventually gets so bad that no work can get done. Another point is that
the progression of the slowdown appears to be geometric, rather than
linear. I also notice that the size of the data directory starts at
about 4.5 GB and climbs throughout this process. It's currently at
6.3 GB.

We do a full vacuum every night. We have adjusted max_fsm_pages to
1000000 and max_fsm_relations to 10000 (we were seeing deleted tuples in
the 50K range on some of our tables). We are using ADODB 2.12 in PHP on
Apache 1.3.26 to access the database.

We've tried rebuilding all of the indexes, and that didn't help. We're
going to try it again, because last time we wound up corrupting the
indexes on the system tables while trying to drop and recreate a table
that got partially created before a system crash, and we suspect we may
have had other problems at the time as well. A contractor did the
reindex for us, so I'm not sure of the exact process at this point. I
suspect that it's something like:

1. shut down postgres
2. run 'postgres -O -P' to start a single-user instantiation of the
   engine
3. type 'reindex' to get it to do it
4. exit postgres and restart the DB with pg_ctl

Does that sound about right?

Any other suggestions? We're looking to do something soon, before the
load gets out of hand. We have done full DB dumps and restores to fix
this in the past, and it takes 4-5 hours in the middle of the night,
since we can't take the system down during the day: we'd have about
1100 people sitting on their hands while we did it.

One of the suggestions was to use 'pgmonitor' to keep watch over what's
happening, but transactions go through too quickly and nothing seems to
tie the system up for any period of time. We also notice that while the
load keeps increasing, the actual CPU time is very small. We'll see
loads above 3 with each CPU (we have two) sitting with 80%+ idle time.

As a recap, this is happening on a Compaq ProLiant 3500 system with a
five-disk hardware RAID 5.

--
Robert M. Meyer
Sr. Network Administrator
DigiVision Satellite Services
14 Lafayette Sq, Ste 410
Buffalo, NY 14203-1904
(716)332-1451
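As an aside, the per-table dead-tuple counts mentioned above can be read
straight out of the nightly vacuum if it is run table-by-table with the
VERBOSE option from psql. A minimal sketch, where "orders" is a
hypothetical table name rather than one from this thread:

	-- VERBOSE makes vacuum report, per table, how many dead
	-- (deleted or updated) tuples it found and removed
	VACUUM VERBOSE ANALYZE orders;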
On 08 Aug 2002 14:06:57 -0400, "Robert M. Meyer" <rmeyer@installs.com> wrote:

> Back on July 23, I posted on our performance problem. At that time, I
> got several suggestions about what to do to try to fix it. Well, it's
> happening again...
> [snip]
> Any other suggestions? We're looking to do something soon, before the
> load gets out of hand.

With all due respect, the depth of your problem indicates to me that you
need a database administrator: someone who understands the database, the
hardware, and performance/tuning principles. (In case you think I'm
hinting at something, I'm not available for work right now -- you really
do need to hire or contract a DBA sort.)

> One of the suggestions was to use 'pgmonitor' to keep watch over
> what's happening, but transactions go through too quickly and nothing
> seems to tie the system up for any period of time. We also notice
> that while the load keeps increasing, the actual CPU time is very
> small. We'll see loads above 3 with each CPU (we have two) sitting
> with 80%+ idle time.

First guess: analyze your queries. It's gotten me through a number of
problems that sound just like this. As the data set grows, a poorly
formed query will chew more and more I/O time.

--
Tim Ellis
Senior Database Architect
Gamet, Inc.
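To make Tim's suggestion concrete: EXPLAIN shows the plan the optimizer
picks for a query, and a sequential scan over a large, growing table is
the classic signature of the I/O creep he describes. A minimal sketch
with made-up table and column names:

	-- a hypothetical query; substitute your own slowest ones.
	-- Watch for "Seq Scan" nodes on big tables where an index
	-- scan was expected.
	EXPLAIN
	SELECT * FROM orders WHERE customer_id = 42;

On 7.2 and later, EXPLAIN ANALYZE also executes the query and reports
actual run times alongside the planner's estimates.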
"Robert M. Meyer" <rmeyer@installs.com> writes: > Back on July 23, I posted on our performance problem. At that time, I > got several suggestions about what to do to try to fix it. Well, it's > happening again... Given that you've increased the FSM size *and* are doing a full vacuum every night (I hope with analyze option as well!), it seems unlikely that your tables are physically growing --- I suspect there may be an index growth problem instead. But you should try to check that. Try doing select relkind,sum(relpages) from pg_class group by relkind; every so often (preferably just after a vacuum run) to see how the space totals change over time. > 1. shut down postgres > 2. run 'postgres -O -P' to start a single user instantiation of the > engine > 3. type 'reindex' to get it to do it > 4. exit postgres and restart the DB with pg_ctl IIRC, that only rebuilds indexes on the system tables. You mentioned that you'd rebuilt user indexes too --- how did you go about that exactly? > We also notice that while the load keeps increasing, > the actual CPU time is very small. We'll see loads above 3 with each > CPU (we have two) sitting with 80%+ idle time. The load must be all disk I/O then. Do you have adequate RAM in this thing? What have you set shared_buffers to? More generally, what's your typical query mix? Have you checked for inefficient plans on the most common queries? regards, tom lane
On Thu, Aug 08, 2002 at 02:06:57PM -0400, Robert M. Meyer wrote:

> have a contractor that did the reindex for us so I'm not sure of the
> process at this point. I suspect that it's something like:
> 1. shut down postgres
> 2. run 'postgres -O -P' to start a single-user instantiation of the
>    engine
> 3. type 'reindex' to get it to do it
> 4. exit postgres and restart the DB with pg_ctl
>
> Does that sound about right?

What you are describing there is reindexing the system tables. But
"recreating the indexes" as a cure for file growth and performance
problems is not that. Instead, what you do is drop each index and
recreate it. At least, that's what I have gathered from the "reindex"
scripts that were sent to the -general list a little while ago.

> Any other suggestions?

Yes. As someone else suggested, it sounds like you need a full-time
DBA, at least for a little while.

A

--
----
Andrew Sullivan                          87 Mowat Avenue
Liberty RMS                              Toronto, Ontario Canada
<andrew@libertyrms.info>                 M6K 3E3
                                         +1 416 646 3304 x110
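A minimal sketch of the drop-and-recreate approach Andrew describes,
with hypothetical index and table names. Note that both statements lock
the table (CREATE INDEX blocks writes while it runs), so this belongs
in a quiet window:

	BEGIN;
	DROP INDEX orders_customer_idx;
	-- rebuilding from scratch writes a compact index with none of
	-- the accumulated dead entries
	CREATE INDEX orders_customer_idx ON orders (customer_id);
	COMMIT;

Wrapping the pair in one transaction means other sessions never see the
table without its index definition; they simply queue behind the locks
until the COMMIT.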
Robert,

It sounds to me like you have some kind of fragmentation causing the
slow degradation of performance, and a dump/restore, which is basically
a very drastic defragmentation, is fixing it. It happens slowly, and
there seems not to be a single big cause -- more likely the sum of many
little things. You also report that there are a lot of deleted tuples
(rows).

I don't know a quick and easy fix. Maybe it's caused by how your
application is designed: inserting and deleting many rows, probably
frequent updates, some rows using lots of space and some very little.
Anyway, something is causing very inefficient space use. So I would
agree with the other respondents that you need a DBA to look at your
database -- especially the design itself.

Regards,
Nikolaus Dilger

"Robert M. Meyer" wrote:
> [original message snipped]
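One rough way to put numbers on the inefficient space use Nikolaus
describes is to compare each table's page count against its tuple
count. These figures come from the last VACUUM/ANALYZE, so run this
just after the nightly vacuum; a tuples-per-page value that keeps
falling week over week is the bloat signature. A sketch:

	SELECT relname, relpages, reltuples,
	       CASE WHEN relpages > 0
	            THEN round(reltuples / relpages)
	            ELSE 0 END AS tuples_per_page
	FROM pg_class
	WHERE relkind = 'r'
	ORDER BY relpages DESC
	LIMIT 10;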
Robert,

I got your email, but I am currently at the Mayo Clinic with my wife,
so I can't give you much help. Download:

	ftp://china.maxbaud.net/pub/PostgreSQL/fixtable.pl

This is my perl script, which will do a live/hot reindex of your
tables. You can run the command and it will give you a really short
description of the options, and it should not let you run the script
unless you give it a group of options that actually makes sense. I
usually run it something like:

	fixtable.pl -t foo -I bar

This will recreate ALL indexes (-I) on table foo (-t foo) in the bar
database. I use this thing more often than I would like, because I get
as many as 1 million records changed on a daily basis in one of my
tables, and the indexes make a big difference in performance (I assume
because of disk fragmentation at that file size).

- brian

On 8 Aug 2002, Robert M. Meyer wrote:
> [original message snipped]
Wm. Brian McCane                     | Life is full of doors that won't open
Search  http://recall.maxbaud.net/   | when you knock, equally spaced amid those
Usenet  http://freenews.maxbaud.net/ | that open when you don't want them to.
Auction http://www.sellit-here.com/  |   - Roger Zelazny "Blood of Amber"
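A script like Brian's presumably starts by capturing the definitions of
the indexes it is about to drop. A sketch of that step in plain SQL,
using his example table name 'foo':

	-- indexdef holds a complete CREATE INDEX statement that can
	-- be replayed after the corresponding DROP INDEX
	SELECT indexname, indexdef
	FROM pg_indexes
	WHERE tablename = 'foo';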