Thread: postgres eating CPU on HP9000
Hello fellow PostgreSQL users. We've been working on this interesting issue for some time now, and we're hoping that someone can help. We've recently integrated postgres into an existing mature app. Its a time sensitive 24x7 system. It runs on HP9000, a K370 Dual Processor system. Postgres is version 7.3.2. Its spawned as a child from a parent supervisory process, and they communicate to eachother via shared memory. We preform 9-12K selects per hour 6-8K inserts per hour (a few updates here as well) 1-1.5K Deletes per hour. It maintains 48hours of data, so its not a large database; roughly <600mbs. We do this by running a housekeeping program in a cron job. It deletes all data older then 48hours, then vaccuum analyzes. It will also preform a reindex if the option is set before it vaccuum's. Postgres initially worked wonderfully, fast and solid. It preformed complex joins in 0.01secs, and was able to keep up with our message queue. It stayed this way for almost a year during our development. Recently it started eating up the cpu, and cannot keepup with the system like it used to. The interesting thing here is that it still runs great on an older system with less ram, one slower cpu, and an older disk. We tried the following with no success: running VACCUUM FULL dropping all tables and staring anew reinstalling postgres tweaking kernel parameters (various combos) tweaking postgres parameters (various combos) a number of other ideas A final note, we have our app on two systems ready for hot backup. The hot backup system is that older slower system that I mentioned before. The two communicate with eachother via rpc's. Any help anyone can give to steer us in the right direction would be much appreciated. Thanks again Fabio E. Just in case: vmstat procs memory page faults cpu r b w avm free re at pi po fr de sr in sy cs us sy id 1 0 0 7631 124955 30 31 1 0 0 0 1 566 964 138 25 2 73 top System: prokyon Tue Mar 23 19:12:54 2004 Load averages: 0.36, 0.33, 0.31 170 processes: 169 sleeping, 1 running Cpu states: CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS 0 0.07 8.9% 0.0% 0.0% 91.1% 0.0% 0.0% 0.0% 0.0% 1 0.72 71.3% 0.0% 1.0% 27.7% 0.0% 0.0% 0.0% 0.0% 2 0.29 29.7% 1.0% 5.0% 64.4% 0.0% 0.0% 0.0% 0.0% --- ---- ----- ----- ----- ----- ----- ----- ----- ----- avg 0.36 36.3% 1.0% 2.0% 60.8% 0.0% 0.0% 0.0% 0.0% Memory: 33180K (22268K) real, 38868K (28840K) virtual, 499708K free Page# 1/17 CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND 0 pty/ttyp1 18631 am 154 20 6096K 2412K sleep 3:17 93.84 93.68 postg 0 rroot 18622 am 154 20 1888K 1192K sleep 0:01 0.78 0.78 amcodecon ipcs IPC status from /dev/kmem as of Tue Mar 23 19:19:19 2004 T ID KEY MODE OWNER GROUP Message Queues: q 0 0x3c180239 -Rrw--w--w- root root q 1 0x3e180239 --rw-r--r-- root root Shared Memory: m 0 0x2f100002 --rw------- root sys m 1 0x4118020d --rw-rw-rw- root root m 2 0x4e0c0002 --rw-rw-rw- root root m 3 0x4114006c --rw-rw-rw- root root m 4 0x4118387e --rw-rw-rw- am am m 3805 0x0052e2c1 --rw------- postgres postgres m 8606 0x0c6629c9 --rw-r----- root sys m 407 0x06347849 --rw-rw-rw- root sys Semaphores: s 0 0x2f100002 --ra-ra-ra- root sys s 1 0x4118020d --ra-ra-ra- root root s 2 0x4e0c0002 --ra-ra-ra- root root s 3 0x4114006c --ra-ra-ra- root root s 4 0x00446f6e --ra-r--r-- root root s 5 0x00446f6d --ra-r--r-- root root s 6 0x01090522 --ra-r--r-- root root s 7 0x61142e7c --ra-ra-ra- root root s 8 0x73142e7c --ra-ra-ra- root root s 9 0x70142e7c --ra-ra-ra- root root s 10 0x69142e7c --ra-ra-ra- root root s 11 0x75142e7c --ra-ra-ra- root root s 12 0x63142e7c --ra-ra-ra- root root s 13 0x64142e7c --ra-ra-ra- root root s 14 0x66142e7c --ra-ra-ra- root root s 15 0x6c142e7c --ra-ra-ra- root root s 1168 0x0052e2c1 --ra------- postgres postgres s 401 0x0052e2c2 --ra------- postgres postgres s 402 0x0052e2c3 --ra------- postgres postgres
Fabio, > Postgres initially worked wonderfully, fast and solid. It > preformed complex joins in 0.01secs, and was able to keep up with our > message queue. It stayed this way for almost a year during our > development. > > Recently it started eating up the cpu, and cannot keepup with the system > like it used to. The interesting thing here is that it still runs great > on an older system with less ram, one slower cpu, and an older disk. This really points to a maintenance problem. How often do you run VACUUM ANALYZE? You have a very high rate of data turnover, and should need to VACUUM frequently. Also, what's you max_fsm_pages setting. -- -Josh Berkus Aglio Database Solutions San Francisco
On Fri, 26 Mar 2004, Fabio Esposito wrote: > > On Fri, 26 Mar 2004, scott.marlowe wrote: > > > > It maintains 48hours of data, so its not a large database; roughly > > > <600mbs. We do this by running a housekeeping program in a cron job. > > > It deletes all data older then 48hours, then vaccuum analyzes. It will > > > also preform a reindex if the option is set before it vaccuum's. > > > > > This almost sounds like a problem (fixed in 7.4 I believe) where some > > system catalog indexes would get huge over time, and couldn't be vacuumed > > or reindexed while the database was up in multi-user mode. > > > > I'll defer to Tom or Bruce or somebody to say if my guess is even close... > > > We haven't tried 7.4, I will experiment with it next week, I hope it > will be that simple. In the meantime, a simple dump - reload into a test box running your current version may provide some insight. If it fixes the problem, then you likely do have some kind of issue with index / table growth that isn't being addressed by vacuuming.
Fabio, > I'll have to get back to you on that, but I'm 90% sure its the default out > of the box. Raise it, a lot. Perhaps to 30,000 or 50,000. VACUUM VERBOSE ANALYZE should show you how many data pages are being reclaimed between vacuums. Because of your very high rate of updates and deletes, you need to hold a lot of data pages open. You would also benefit a great deal by upgrading to 7.4. 7.3 will require you to to REINDEXes several times a day with your current setup; 7.4 will not. -- -Josh Berkus Aglio Database Solutions San Francisco
Fabio Esposito <nfesposi@sourceweave.net> writes: >> Did you start from a fresh initdb, or just drop and recreate user >> tables? I'm wondering about index bloat on the system tables ... > I don't think I re initdb it, just dropped. We did try a reindex command > in the interactive editor, with no success. Reindex of what? I'd suggest looking to see the actual sizes of all the indexes on system tables. If my guess is right, some of them may be way out of line (like larger than their associated tables). regards, tom lane
On Tue, 23 Mar 2004, Fabio Esposito wrote: > > Hello fellow PostgreSQL users. > > We've been working on this interesting issue for some time now, and we're > hoping that someone can help. > > We've recently integrated postgres into an existing mature app. Its a > time sensitive 24x7 system. It runs on HP9000, a K370 Dual Processor > system. Postgres is version 7.3.2. Its spawned as a child from a parent > supervisory process, and they communicate to eachother via shared memory. > > We preform 9-12K selects per hour > 6-8K inserts per hour (a few updates here as well) > 1-1.5K Deletes per hour. > > It maintains 48hours of data, so its not a large database; roughly > <600mbs. We do this by running a housekeeping program in a cron job. > It deletes all data older then 48hours, then vaccuum analyzes. It will > also preform a reindex if the option is set before it vaccuum's. > > Postgres initially worked wonderfully, fast and solid. It > preformed complex joins in 0.01secs, and was able to keep up with our > message queue. It stayed this way for almost a year during our > development. > > Recently it started eating up the cpu, and cannot keepup with the system > like it used to. The interesting thing here is that it still runs great > on an older system with less ram, one slower cpu, and an older disk. > > We tried the following with no success: > > running VACCUUM FULL > dropping all tables and staring anew > reinstalling postgres > tweaking kernel parameters (various combos) > tweaking postgres parameters (various combos) > a number of other ideas This almost sounds like a problem (fixed in 7.4 I believe) where some system catalog indexes would get huge over time, and couldn't be vacuumed or reindexed while the database was up in multi-user mode. I'll defer to Tom or Bruce or somebody to say if my guess is even close...
Fabio Esposito <nfesposi@sourceweave.net> writes: > We've recently integrated postgres into an existing mature app. Its a > time sensitive 24x7 system. It runs on HP9000, a K370 Dual Processor > system. Postgres is version 7.3.2. Its spawned as a child from a parent > supervisory process, and they communicate to eachother via shared memory. You would be well advised to update to 7.3.6, though I'm not sure if any of the post-7.3.2 fixes have anything to do with your speed problem. > Recently it started eating up the cpu, and cannot keepup with the system > like it used to. The interesting thing here is that it still runs great > on an older system with less ram, one slower cpu, and an older disk. > We tried the following with no success: > running VACCUUM FULL > dropping all tables and staring anew Did you start from a fresh initdb, or just drop and recreate user tables? I'm wondering about index bloat on the system tables ... regards, tom lane
<snip> We are experiencing exactly the same problem here, and we use 7.4 on Linux/i386 SMP (2 processors). Our databases does even more access: about 30k selects per hour, 10k updates and inserts per hour Vacuum analyze is done daily. We migrated our database to a new server. Initially, everything was fine, and pretty fast. In a week or so, Vacuum performance is pretty slow. What was done in 15 minutes now takes 2 hours. Postgres is consuming a lot of CPU power and, when the system is in peak period, it's even worse. Sure, we have a large database. 3 tables have more than 10M records, but more or less suddenly, we're having a heavy performance prejudice. > > This almost sounds like a problem (fixed in 7.4 I believe) where some > system catalog indexes would get huge over time, and couldn't be > vacuumed or reindexed while the database was up in multi-user mode. > > I'll defer to Tom or Bruce or somebody to say if my guess is even > close... > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Marcus, > We are experiencing exactly the same problem here, and we use 7.4 on > Linux/i386 SMP (2 processors). Our databases does even more access: > about 30k selects per hour, 10k updates and inserts per hour > > Vacuum analyze is done daily. What is your max_fsm_pages setting? If you are getting 10,000 updates per hour, daily VACUUM ANALYZE may not be enough. Also do you run VACUUM ANALYZE as a superuser, or as a regular user? -- Josh Berkus Aglio Database Solutions San Francisco
> Marcus, > >> We are experiencing exactly the same problem here, and we use 7.4 on >> Linux/i386 SMP (2 processors). Our databases does even more access: >> about 30k selects per hour, 10k updates and inserts per hour >> >> Vacuum analyze is done daily. > > What is your max_fsm_pages setting? If you are getting 10,000 > updates per hour, daily VACUUM ANALYZE may not be enough. > max_fsm_pages is set to 500000 > Also do you run VACUUM ANALYZE as a superuser, or as a regular user? > As a regular user (database owner). Is thery any difference when vacuuming as a super user? > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if > your > joining column's datatypes do not match
On Fri, 26 Mar 2004, Josh Berkus wrote: > Fabio, > > > Recently it started eating up the cpu, and cannot keepup with the system > > like it used to. The interesting thing here is that it still runs great > > on an older system with less ram, one slower cpu, and an older disk. > > This really points to a maintenance problem. How often do you run VACUUM > ANALYZE? You have a very high rate of data turnover, and should need to > VACUUM frequently. > > Also, what's you max_fsm_pages setting. > We run VACUUM ANALYZE after we remove about 1000 rows every hour on the halh hour. Our max_fsm_pages is set to 10000 Thanks again Fabio
"Marcus Andree S. Magalhaes" <marcus.magalhaes@vlinfo.com.br> writes: >> Also do you run VACUUM ANALYZE as a superuser, or as a regular user? > As a regular user (database owner). Is thery any difference when vacuuming > as a super user? That's your problem. A regular user won't have permissions to vacuum any tables but his own ... in particular, not the system tables. regards, tom lane
I'm sorry all, when you say regular user as opposed to superuser are you talking about the user that postgres is installed and running as? Should this be done as the os's root? Fabio On Mon, 29 Mar 2004, Tom Lane wrote: > "Marcus Andree S. Magalhaes" <marcus.magalhaes@vlinfo.com.br> writes: > >> Also do you run VACUUM ANALYZE as a superuser, or as a regular user? > > > As a regular user (database owner). Is thery any difference when vacuuming > > as a super user? > > That's your problem. A regular user won't have permissions to vacuum > any tables but his own ... in particular, not the system tables. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
On Mar 29, 2004, at 9:36 AM, Tom Lane wrote: > "Marcus Andree S. Magalhaes" <marcus.magalhaes@vlinfo.com.br> writes: >>> Also do you run VACUUM ANALYZE as a superuser, or as a regular user? > >> As a regular user (database owner). Is thery any difference when >> vacuuming >> as a super user? > > That's your problem. A regular user won't have permissions to vacuum > any tables but his own ... in particular, not the system tables. > > regards, tom lane If I vacuum as the superuser, are the system tables automatically vacuumed? Or, does using -a from the vacuumdb command accomplish this? Or, is there something else I have to specify on the vacuumdb command line? Thanks! Mark
On Mon, Mar 29, 2004 at 12:00:16 -0500, Fabio Esposito <nfesposi@sourceweave.net> wrote: > > I'm sorry all, when you say regular user as opposed to superuser are you > talking about the user that postgres is installed and running as? Should > this be done as the os's root? The os user used for creating the cluster with initdb is a superuser. Any accounts created with the permission to create more users are also superusers.
Fabio, > We run VACUUM ANALYZE after we remove about 1000 rows every hour on the > halh hour. Our max_fsm_pages is set to 10000 Have you checked how long these vacuums take? If they are starting to overlap, that would explain your high CPU usage and poor performance. You might want to consider raising FSM_pages and vacuuming less frequently. -- -Josh Berkus Aglio Database Solutions San Francisco
The Vacuum's don't take too long, 10 minutes at most. I can tell from ps -ef | grep and top that its the selects/inserts/updates from the postgres related to our app that take all that time up. If we rerun initdb and reload the data, it works great for about two days, then goes bad again. We are in the process of trying out 7.4.2 right now, just waiting on the reload of pg_dump. Fabio > Fabio, > > > We run VACUUM ANALYZE after we remove about 1000 rows every hour on the > > halh hour. Our max_fsm_pages is set to 10000 > > Have you checked how long these vacuums take? If they are starting to > overlap, that would explain your high CPU usage and poor performance. You > might want to consider raising FSM_pages and vacuuming less frequently. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
Fabio, > The Vacuum's don't take too long, 10 minutes at most. I can tell from ps > -ef | grep and top that its the selects/inserts/updates from the postgres > related to our app that take all that time up. If we rerun initdb and > reload the data, it works great for about two days, then goes bad again. > > We are in the process of trying out 7.4.2 right now, just waiting on the > reload of pg_dump. Well, test running VACUUM ANALYZE as the "postgres" superuser and see if that fixes the issue. -- -Josh Berkus Aglio Database Solutions San Francisco