Thread: Help with performance problems
I need some help. I have 5 db servers running our database servers, and they all are having various degrees of performance problems. The problems we are experiencing are:

1. General slowness
2. High loads

All of our db's are running on Dell PowerEdge 2650s with 2 P4 Xeons (2.8 - 3.06 GHz) and 8 to 12 GB of memory. The databases are on attached Dell PowerVault 220S arrays running RAID 5. The databases were created and taken into production before I started working here and are very flat. Most of the major tables have a combined primary key using an int field and a single char field. There are some additional indexes on some tables. Most queries I see in the logs are running at less than .01 seconds, with many significantly slower.

We are trying to narrow down the performance problem to either the db or the hardware. As the dba, I need to get these db's tuned as well as possible given their current state. We are at the beginning of a complete db redesign and application re-write, but the completion and deployment of the new db and app are quite a ways off.

Anyway, we are running the following:

PE 2650 w/ 2 cpus (2.8-3.06 GHz) - HT on
8-12 GB memory
OS on RAID 0
DB's on PowerVault 220S using RAID 5 (over 6 disks)
Each PostgreSQL cluster has from 2 dbs up to almost 170 dbs (a project to level out the number of db's per cluster is being started)
DB's are no bigger than a few GB in size (largest is about 11GB according to a du -h)
Running RH ES 2.1

Here is the postgresql.conf from the server with the 11GB db:

max_connections = 64
shared_buffers = 32768           # 256MB=32768(buffs)*8192(bytes/buff)
max_fsm_relations = 1000         # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000            # min 1000, fsm is free space map, ~6 bytes
sort_mem = 4096                  # 256MB=4096(bytes/proc)*64(procs or conns)
checkpoint_segments = 16         # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30          # range 30-3600, in seconds
effective_cache_size = 131072    # typically 8KB each
log_connections = true
log_pid = true
log_statement = true
log_duration = true
log_timestamp = true
stats_start_collector = true
stats_reset_on_server_start = true
stats_command_string = true
stats_row_level = true
stats_block_level = true
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'

Here is top (server running pretty good right now):

 9:28am up 25 days, 16:02, 2 users, load average: 0.54, 0.33, 0.22
94 processes: 91 sleeping, 3 running, 0 zombie, 0 stopped
CPU0 states: 64.0% user, 0.1% system, 0.0% nice, 34.0% idle
CPU1 states: 29.0% user, 9.0% system, 0.0% nice, 60.0% idle
CPU2 states: 2.0% user, 0.1% system, 0.0% nice, 96.0% idle
CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
Mem: 7720072K av, 7711648K used, 8424K free, 265980K shrd, 749888K buff
Swap: 2096440K av, 22288K used, 2074152K free, 6379304K cached

Here is top from another server (with the most db's):

 9:31am up 25 days, 16:05, 5 users, load average: 2.34, 3.39, 4.28
147 processes: 145 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 6.0% user, 1.0% system, 0.0% nice, 91.0% idle
CPU1 states: 9.0% user, 4.0% system, 0.0% nice, 85.0% idle
CPU2 states: 9.0% user, 3.0% system, 0.0% nice, 86.0% idle
CPU3 states: 9.0% user, 4.0% system, 0.0% nice, 85.0% idle
Mem: 7721096K av, 7708040K used, 13056K free, 266132K shrd, 3151336K buff
Swap: 2096440K av, 24208K used, 2072232K free, 3746596K cached

Thanks for any help/advice,

Chris
Your second server shows queuing (the load averages are highish), only 2 processes running, and almost all CPU cycles idle. You need to track down your bottleneck. Have you looked at iostat/vmstat? I think it would be useful to post these, ideally both before and after a full vacuum analyze.

/Aaron
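For reference, something along these lines would capture the numbers Aaron is asking for, assuming the sysstat package is installed. The interval, sample count, and output file names are arbitrary choices for the example, not anything specified in the thread; the idea is simply to collect one set during a normal busy period and another during/after the vacuum run.

    # Sample disk activity and memory/CPU pressure every 5 seconds for 10 minutes
    iostat -x 5 120 > iostat_busy.txt &
    vmstat 5 120 > vmstat_busy.txt &
    wait
    # Repeat with different file names while the nightly vacuum is running

High iowait/blocked-process numbers in vmstat, or saturated %util on the RAID device in iostat, would point at the disk subsystem rather than the database settings.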
Chris,

> I need some help. I have 5 db servers running our database servers, and
> they all are having various degrees of performance problems. The problems
> we are experiencing are:

I'm confused. You're saying "general slowness" but say that most queries run in under .01 seconds. And you say "high loads" but the top snapshots you provide show servers with 2 CPUs idle.

Are you sure you actually *have* a performance issue?

--
Josh Berkus
Aglio Database Solutions
San Francisco
I know the numbers look ok, but we are definitely suffering. Also, if I try to run any sort of vacuum or other db activity during normal business hours, the load goes through the roof. I have seen loads of over 10 when trying to vacuum the larger cluster and have had to kill the vacuums due to complaints.

I think this is probably related to the hardware configuration, but I want to make sure that there are no configuration changes I could make to the db that might lessen the problem.

I especially want to make sure that I have the memory parameters set to good numbers for my db's so that I can minimize thrashing between the postgres memory pools and the hard drive. I am thinking that this may be a big issue here?

Thanks for any help,

Chris

On Friday 23 April 2004 12:42, Josh Berkus wrote:
> Chris,
>
> > I need some help. I have 5 db servers running our database servers, and
> > they all are having various degrees of performance problems. The
> > problems we are experiencing are:
>
> I'm confused. You're saying "general slowness" but say that most queries
> run in under .01 seconds. And you say "high loads" but the top snapshots
> you provide show servers with 2 CPUs idle.
>
> Are you sure you actually *have* a performance issue?
On Fri, 23 Apr 2004, Chris Hoover wrote:

> DB's on Powervaults 220S using raid 5 (over 6 disks)

What controller is this, the Adaptec? We've found it to be slower than the LSI megaraid based controller, but YMMV.

> Running RH ES 2.1

Are you running the latest kernel for ES 2.1? Early 2.4 kernels are pretty pokey and have some odd behaviour under load that later 2.4 kernels seem to fix.

> Here is the postgresql.conf from the server with the 11GB db:
>
> max_connections = 64
> shared_buffers = 32768           # 256MB=32768(buffs)*8192(bytes/buff)
> max_fsm_relations = 1000         # min 10, fsm is free space map, ~40 bytes
> max_fsm_pages = 10000            # min 1000, fsm is free space map, ~6 bytes

If you're doing lots of updates and such, you might want these higher. Have you vacuumed full the databases since taking over?

> sort_mem = 4096                  # 256MB=4096(bytes/proc)*64(procs or conns)

Sorry, that comment's wrong. sort_mem is measured in kbytes, i.e. 8192 means 8 megs of sort_mem per sort. Try setting it a bit higher (you've got LOTS of ram in these boxes), to something like 16 or 32 meg.

> checkpoint_segments = 16         # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 30          # range 30-3600, in seconds
> effective_cache_size = 131072    # typically 8KB each

effective_cache_size still looks low. On one machine you're showing kernel cache of about .7 gig, on the other it's 6 gig. 6 gigs of kernel cache would be a setting of 800000. It's more of a nudge factor than an exact science, so don't worry too much.

If you've got fast I/O, look at lowering random_page_cost to something between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast drives under them.

I'd use vmstat to see if you're I/O bound.

Also, look for index bloat. Before 7.4 it was a serious problem. With 7.4, regular vacuuming should reclaim most lost space, but there are corner cases where you still might need to re-index.
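Pulling Scott's numbers together, a revised postgresql.conf for one of these boxes might look roughly like the excerpt below. These are illustrative values rather than measured recommendations: effective_cache_size should come from the kernel cache actually observed in top, and the fsm values are placeholders pending the VACUUM VERBOSE output discussed later in the thread.

    shared_buffers = 32768           # keep ~256MB; let the kernel do the bulk of the caching
    sort_mem = 16384                 # sort_mem is in KB, so this is 16MB per sort
    max_fsm_relations = 10000
    max_fsm_pages = 1000000          # placeholder; size from VACUUM VERBOSE output
    effective_cache_size = 800000    # ~6GB of kernel cache / 8KB pages
    random_page_cost = 1.4           # only if the I/O subsystem is genuinely fast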
Sorry for the confusion here. I can't run any sort of vacuum during the day due to performance hits. However, I have run vacuums at night. Several nights a week I run a vacuumdb -f -z on all of the clusters. It can take several hours to complete, but it does complete.

During the day, I have tried to run a vacuumdb -v and a vacuumdb -z -v, since I read it is supposed to help performance, but as I said, it causes too much stress on the system. I did change the vacuumdb script to set vacuum_mem to 512MB while vacuuming to try and help the situation (from the script:

${PATHNAME}psql $PSQLOPT $ECHOOPT -c "SET vacuum_mem=524288;SET autocommit TO 'on';VACUUM $full $verbose $analyze $table" -d $db

), and I reset it to 8192 at the end.

Anyway, thank you for the ideas so far, and any additional ones will be greatly appreciated.

Chris

On Friday 23 April 2004 13:44, Kevin Barnard wrote:
> Chris Hoover wrote:
> > I know the numbers look ok, but we are definitely suffering. Also, if I
> > try to run any sort of vacuum or other db activity during normal business
> > hours, load goes through the roof. I have seen loads of over 10 when
> > trying to vacuum the larger cluster and would have to kill the vacuums
> > due to complaints.
>
> This is your problem then. You have to regularly vacuum the DB. You
> might want to dump and reload or schedule a vacuum full. If you don't,
> it doesn't matter what you do, you will never get decent performance.
> Make sure you vacuum as a superuser; this way you get system tables as well.
>
> Killing a vacuum is bad; it tends to make the situation worse. If you
> need to, vacuum one table at a time.
>
> > I think this is probably related to the hardware configuration, but I want
> > to make sure that there are no changes I could make configuration wise to
> > the db that might lighten the problem.
> >
> > I especially want to make sure that I have the memory parameters set to
> > good numbers for my db's so that I can minimize thrashing between the
> > postgres memory pools and the hard drive. I am thinking that this may be
> > a big issue here?
>
> Get the vacuum done and don't worry about the hardware or the settings
> until afterwards.
On Friday 23 April 2004 13:21, scott.marlowe wrote:
> On Fri, 23 Apr 2004, Chris Hoover wrote:
> > DB's on Powervaults 220S using raid 5 (over 6 disks)
>
> What controller is this, the Adaptec? We've found it to be slower than
> the LSI megaraid based controller, but YMMV.

We are using the PERC3/Di controller. I believe it is using the megaraid driver.

> > Running RH ES 2.1
>
> Are you running the latest kernel for ES 2.1? Early 2.4 kernels are
> pretty pokey and have some odd behaviour under load that later 2.4
> kernels seem to fix.

I'm not sure we are at the latest and greatest for 2.1, but I am trying to get there. Management won't let me do the upgrade without first testing/proving it will not cause any more issues. Due to all of the current issues, and the criticality of these systems to our bottom line, they are being very careful with any change that may impact our users further. We are waiting on our datacenter to plug in our test server and powervault so that we can test the upgrade to the latest RH 2.1 kernel.

> > checkpoint_segments = 16         # in logfile segments, min 1, 16MB each
> > checkpoint_timeout = 30          # range 30-3600, in seconds
> > effective_cache_size = 131072    # typically 8KB each
>
> effective_cache_size still looks low. On one machine you're showing kernel
> cache of about .7 gig, on the other it's 6 gig. 6 gigs of kernel cache would
> be a setting of 800000. It's more of a nudge factor than an exact science,
> so don't worry too much.

I believe changing this requires a restart of the cluster (correct?). If so, I'll try bumping up the effective_cache_size over the weekend.

Also, with all of the memory available to these machines, should I be running with larger shared_buffers? It seems like 256M is a bit small.

> If you've got fast I/O, look at lowering random_page_cost to something
> between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast
> drives under them.
>
> I'd use vmstat to see if you're I/O bound.

If we end up being I/O bound, should the random page cost be set higher?

> Also, look for index bloat. Before 7.4 it was a serious problem. With
> 7.4, regular vacuuming should reclaim most lost space, but there are corner
> cases where you still might need to re-index.

Thanks for the help,

Chris
Chris,

> Sorry for the confusion here. I can't run any sort of vacuum during the day
> due to performance hits. However, I have run vacuums at night. Several
> nights a week I run a vacuumdb -f -z on all of the clusters. It can take
> several hours to complete, but it does complete.

Well, here's your first problem: since your max_fsm_pages is low, and you're only vacuuming once a day, you've got to have some serious table and index bloat. So you're going to need to do VACUUM FULL on all of your databases, and then REINDEX all of your indexes.

After that, raise your max_fsm_pages to something useful, like 1,000,000. Of course, data on your real rate of updates would help more.

If you're getting severe disk choke when you vacuum, you probably are I/O bound. You may want to try something which allows you to vacuum one table at a time, either pg_autovacuum or a custom script.

--
Josh Berkus
Aglio Database Solutions
San Francisco
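As a rough illustration of the "custom script" approach Josh mentions, something along these lines would spread the load out by vacuuming one table at a time with a pause in between. The database name and the pause length are placeholders, not part of Josh's advice, and the sketch deliberately uses plain VACUUM ANALYZE rather than VACUUM FULL so it can run outside the nightly window.

    #!/bin/sh
    # Vacuum one table at a time, sleeping between tables so the I/O load
    # never builds up the way a whole-database "vacuumdb -f" does.
    DB="mydb"                      # hypothetical database name
    TABLES=`psql -t -A -d "$DB" -c \
      "SELECT tablename FROM pg_tables WHERE schemaname = 'public';"`
    for t in $TABLES; do
        echo "vacuuming $t"
        psql -d "$DB" -c "VACUUM ANALYZE $t;"
        sleep 60                   # pause to let other queries breathe
    done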
Josh Berkus wrote:

> Chris,
>
> > Sorry for the confusion here. I can't run any sort of vacuum during the day
> > due to performance hits. However, I have run vacuums at night. Several
> > nights a week I run a vacuumdb -f -z on all of the clusters. It can take
> > several hours to complete, but it does complete.
>
> Well, here's your first problem: since your max_fsm_pages is low, and you're
> only vacuuming once a day, you've got to have some serious table and index
> bloat. So you're going to need to do VACUUM FULL on all of your databases,
> and then REINDEX all of your indexes.
>
> After that, raise your max_fsm_pages to something useful, like 1,000,000. Of
> course, data on your real rate of updates would help more.
>
> If you're getting severe disk choke when you vacuum, you probably are I/O
> bound. You may want to try something which allows you to vacuum one table
> at a time, either pg_autovacuum or a custom script.

Tom and Josh recently gave me some help about setting the fsm settings which was quite useful. The full message is at http://archives.postgresql.org/pgsql-performance/2004-04/msg00229.php and the most interesting portion was:

    Actually, since he's running 7.4, there's an even better way. Do a
    "VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
    ANALYZE or not). At the end of the very voluminous output, you'll see
    something like

    INFO: free space map: 240 relations, 490 pages stored; 4080 total pages needed
    DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

    Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
    exactly cover the present freespace needs of my system. I concur with
    the suggestion to bump that up a good deal, of course, but that gives
    you a real number to start from.

    The DETAIL part of the message shows my current settings (which are the
    defaults) and what the FSM is costing me in shared memory space.

Good luck

Ron
On Friday 23 April 2004 14:57, Ron St-Pierre wrote:

Does this apply to 7.3.4 also?

> Actually, since he's running 7.4, there's an even better way. Do a
> "VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
> ANALYZE or not). At the end of the very voluminous output, you'll see
> something like
>
> INFO: free space map: 240 relations, 490 pages stored; 4080 total pages needed
> DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
>
> Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
> exactly cover the present freespace needs of my system. I concur with
> the suggestion to bump that up a good deal, of course, but that gives
> you a real number to start from.
>
> The DETAIL part of the message shows my current settings (which are the
> defaults) and what the FSM is costing me in shared memory space.
>
> Good luck
> Ron
On Fri, 23 Apr 2004, Chris Hoover wrote:

> On Friday 23 April 2004 13:21, scott.marlowe wrote:
> > On Fri, 23 Apr 2004, Chris Hoover wrote:
> > > DB's on Powervaults 220S using raid 5 (over 6 disks)
> >
> > What controller is this, the Adaptec? We've found it to be slower than
> > the LSI megaraid based controller, but YMMV.
>
> We are using the PERC3/Di controller. I believe it is using the megaraid
> driver.

No, that's the Adaptec; the PERC3/DC is the LSI megaraid. See if there are newer drivers for the RAID card. In terms of performance, the Adaptec and LSI drivers have improved considerably in later versions. In terms of stability they've largely gotten better, with a few in-between releases on the megaraid getting poor grades. The latest/greatest from Dell is pretty up to date.

> > > Running RH ES 2.1
> >
> > Are you running the latest kernel for ES 2.1? Early 2.4 kernels are
> > pretty pokey and have some odd behaviour under load that later 2.4
> > kernels seem to fix.
>
> I'm not sure we are at the latest and greatest for 2.1, but I am trying to
> get there. Management won't let me do the upgrade without first
> testing/proving it will not cause any more issues. Due to all of the
> current issues, and the criticality of these systems to our bottom line,
> they are being very careful with any change that may impact our users
> further.

Understood. It's why my production box is still running a 2.4 kernel on rh 7.2 with pg 7.2. They just work, and for us stability AND performance are both good with our load.

You can install a new kernel and set up the machine to still boot off of the old one, and test on the weekend to see how it behaves under simulated load. Mining the logs for slow queries is a good way to build one.

While we don't upgrade our production servers' applications to the latest and greatest all the time (i.e. php or postgresql or openldap), we always run the latest security patches, and I think the latest kernels had security fixes for ES 2.1, so NOT upgrading is dangerous. Late model linux kernels (the 2.0.x and 2.2.x where x>20) tend to be VERY stable and very conservatively backported and upgraded, so running a new one isn't usually a big risk.

> > > effective_cache_size = 131072    # typically 8KB each
> >
> > effective_cache_size still looks low. On one machine you're showing kernel
> > cache of about .7 gig, on the other it's 6 gig. 6 gigs of kernel cache
> > would be a setting of 800000. It's more of a nudge factor than an exact
> > science, so don't worry too much.
>
> I believe changing this requires a restart of the cluster (correct?). If so,
> I'll try bumping up the effective_cache_size over the weekend.
> Also, with all of the memory available to these machines, should I be running
> with larger shared_buffers? It seems like 256M is a bit small.

No, you probably shouldn't. PostgreSQL doesn't "cache" in the classical sense. If all backends close, the stuff they had in their buffers disappears in a flash. So it's generally considered better to let the kernel do the bulk of the caching, and to have the buffer area be large enough to hold a large portion, if not all, of your working set of data. But the buffer cache management is dirt simple and, while it works, it seems to have performance issues with large numbers of buffers, and all the memory in it disappears when the last backend using it exits, so very large shared_buffers settings usually don't pay off.

For instance, in doing the following seq scan select:

explain analyze select * from test;

where test is a ~10 megabyte table, the first time I ran it it took 5 seconds to run. The second time it took 2.5, the third 1.9, and it levelled out around there. Starting up another backend and running the same query got a 1.9 second response also. Shutting down both connections and running the query again, with only the kernel doing the caching, I got 1.9. That's on a 2.4.2[2-4] kernel.

> > If you've got fast I/O, look at lowering random_page_cost to something
> > between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast
> > drives under them.
> >
> > I'd use vmstat to see if you're I/O bound.
>
> If we end up being I/O bound, should the random page cost be set higher?

Not necessarily. Often, on a machine with a lot of memory, you are better off using index scans where disk seek time would be expensive; with the indexes in RAM, the cost of a random page compared to a sequential page is almost 1, with only a slight overhead. So lowering the random page cost favors indexes, generally.

If your I/O subsystem is doing a lot of seq scans when only part of the data set is ever really being worked on, this tends to flush out the kernel cache, and we wind up going back to disk over and over. On the other hand, if your data is normally going to be sequentially accessed, then you'll have to invest in better RAID hardware / more drives etc... But with 12 gigs on one box, and an already reasonably fast I/O subsystem in place, I'd think a lower random page cost would help, not hurt, performance.

Have you explain analyzed your slower queries?
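To make Scott's last suggestion concrete, a session like the following could be used to compare plans and timings under different random_page_cost settings. The database name and the query against clmhdr are placeholders loosely based on the vacuum output later in the thread, not known slow queries, and the SET is session-local, so nothing in postgresql.conf changes.

    psql -d mydb <<'EOF'
    -- Current plan and actual timing for a suspect query
    EXPLAIN ANALYZE SELECT * FROM clmhdr WHERE hdr_user_id = 'someuser';

    -- Same query with a lower random_page_cost, to see whether the planner
    -- switches to an index scan and whether that is actually faster.
    SET random_page_cost = 1.4;
    EXPLAIN ANALYZE SELECT * FROM clmhdr WHERE hdr_user_id = 'someuser';
    EOF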
Chris Hoover wrote:

> On Friday 23 April 2004 14:57, Ron St-Pierre wrote:
> Does this apply to 7.3.4 also?

No it doesn't. I didn't look back through the thread far enough to see what you were running. I tried it on 7.3.4 and none of the summary info listed below was returned.

FWIW, one of our DBs was slowing down considerably on an update (30+ minutes), and after I changed max_fsm_pages from the 7.4 default of 20,000 to 50,000, it completed in about eight minutes.

Ron

> > Actually, since he's running 7.4, there's an even better way. Do a
> > "VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
> > ANALYZE or not). At the end of the very voluminous output, you'll see
> > something like
> >
> > INFO: free space map: 240 relations, 490 pages stored; 4080 total pages needed
> > DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
> >
> > Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
> > exactly cover the present freespace needs of my system. I concur with
> > the suggestion to bump that up a good deal, of course, but that gives
> > you a real number to start from.
> >
> > The DETAIL part of the message shows my current settings (which are the
> > defaults) and what the FSM is costing me in shared memory space.
"Chris Hoover" <revoohc@sermonaudio.com> writes: > Here is the postgresql.conf from the server with the 11GB db: > max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes It's unlikely that that's enough for an 11Gb database, especially if you're only vacuuming a few times a week. You should make your next run be a "vacuum verbose" and look at the output to get an idea of what sort of table bloat you're seeing, but I'll bet it's bad ... regards, tom lane
scott.marlowe wrote:

> On Fri, 23 Apr 2004, Chris Hoover wrote:
>
> > DB's on Powervaults 220S using raid 5 (over 6 disks)
>
> What controller is this, the Adaptec? We've found it to be slower than
> the LSI megaraid based controller, but YMMV.

Wow, really? Have you got any more details of the chipset, mobo and kernel driver?

I've been driven to my wits' end wrestling with an LSI MegaRAID 320-1 controller on a Supermicro board all weekend. I just couldn't get anything more than 10MB/sec out of it with megaraid driver v1 OR v2 in Linux 2.4.26, nor with the version in 2.6.6-rc2. After 2 days of humming the Adaptec mantra I gave in and switched the array straight onto the onboard Adaptec 160 controller (same cable and everything). Software RAID 5 gets me over 40MB/sec for a nominal cpu hit - more than 4 times what I could get out of the MegaRAID controller :( Even the 2nd SCSI-2 channel gets 40MB/sec max (pg_xlog :)

And HOW LONG does it take to detect drives during POST... ohhhh never mind ... I really just wanna rant :) There should be a free counseling service for enraged sysops.

--
Rob Fielding
rob@dsvr.net
www.dsvr.co.uk
Development Designer Servers Ltd
Ok, I was able to run a vacuumdb -f -v on my largest db over the weekend. However, I am having trouble reading the results of the table portion. Here are a couple of tables; what should I be looking at? The first table is the key table in the db, and the second is the largest table in the db.

Thanks

Chris

INFO: --Relation public.clmhdr--
INFO: Pages 32191: Changed 0, reaped 5357, Empty 0, New 0; Tup 339351: Vac 48358, Keep/VTL 0/0, UnUsed 129, MinLen 560, MaxLen 696; Re-using: Free/Avail. Space 42011004/32546120; EndEmpty/Avail. Pages 0/5310. CPU 0.53s/0.09u sec elapsed 0.61 sec.
INFO: Index clmhdr_pkey: Pages 1429; Tuples 339351: Deleted 48358. CPU 0.06s/0.28u sec elapsed 4.54 sec.
INFO: Index clmhdr_hdr_user_id_idx: Pages 1711; Tuples 339351: Deleted 48358. CPU 0.09s/0.31u sec elapsed 2.40 sec.
INFO: Index clmhdr_hdr_clm_status_idx: Pages 1237; Tuples 339351: Deleted 48358. CPU 0.03s/0.26u sec elapsed 1.66 sec.
INFO: Index clmhdr_hdr_create_dt_idx: Pages 1475; Tuples 339351: Deleted 48358. CPU 0.05s/0.24u sec elapsed 1.96 sec.
INFO: Index clmhdr_inv_idx: Pages 1429; Tuples 339351: Deleted 48358. CPU 0.08s/0.22u sec elapsed 1.20 sec.
INFO: Index clmhdr_userid_status_idx: Pages 2161; Tuples 339351: Deleted 48358. CPU 0.05s/0.18u sec elapsed 3.02 sec.
INFO: Rel clmhdr: Pages: 32191 --> 28247; Tuple(s) moved: 8257. CPU 0.37s/1.81u sec elapsed 16.24 sec.
INFO: Index clmhdr_pkey: Pages 1429; Tuples 339351: Deleted 8257. CPU 0.00s/0.03u sec elapsed 0.03 sec.
INFO: Index clmhdr_hdr_user_id_idx: Pages 1743; Tuples 339351: Deleted 8257. CPU 0.00s/0.05u sec elapsed 0.04 sec.
INFO: Index clmhdr_hdr_clm_status_idx: Pages 1265; Tuples 339351: Deleted 8257. CPU 0.00s/0.04u sec elapsed 0.03 sec.
INFO: Index clmhdr_hdr_create_dt_idx: Pages 1503; Tuples 339351: Deleted 8257. CPU 0.00s/0.04u sec elapsed 0.12 sec.
INFO: Index clmhdr_inv_idx: Pages 1429; Tuples 339351: Deleted 8257. CPU 0.00s/0.04u sec elapsed 0.03 sec.
INFO: Index clmhdr_userid_status_idx: Pages 2203; Tuples 339351: Deleted 8257. CPU 0.01s/0.03u sec elapsed 0.04 sec.

INFO: --Relation public.sent837--
INFO: Pages 463552: Changed 0, reaped 6690, Empty 0, New 0; Tup 27431539: Vac 204348, Keep/VTL 0/0, UnUsed 2801, MinLen 107, MaxLen 347; Re-using: Free/Avail. Space 54541468/34925860; EndEmpty/Avail. Pages 0/70583. CPU 10.68s/2.18u sec elapsed 188.32 sec.
INFO: Index sent837_pkey: Pages 124424; Tuples 27431539: Deleted 204348. CPU 4.24s/3.45u sec elapsed 144.79 sec.
INFO: Rel sent837: Pages: 463552 --> 459954; Tuple(s) moved: 91775. CPU 1.12s/9.36u sec elapsed 20.13 sec.
INFO: Index sent837_pkey: Pages 124424; Tuples 27431539: Deleted 91775. CPU 3.51s/2.03u sec elapsed 6.13 sec.
On Mon, 26 Apr 2004, Rob Fielding wrote:

> scott.marlowe wrote:
> > On Fri, 23 Apr 2004, Chris Hoover wrote:
> >
> > > DB's on Powervaults 220S using raid 5 (over 6 disks)
> >
> > What controller is this, the Adaptec? We've found it to be slower than
> > the LSI megaraid based controller, but YMMV.
>
> Wow, really? Have you got any more details of the chipset, mobo and kernel
> driver?

We're running on a Dell 2650; the controller is the U320 LSI megaraid 2 channel (they only make the one that I know of right now). I don't know my mobo chipset offhand, but I might be able to find out what one Dell includes on the 2650. The kernel driver is the latest megaraid2 driver as of about Feb this year.

> I've been driven to my wits' end wrestling with an LSI MegaRAID 320-1
> controller on a Supermicro board all weekend. I just couldn't get
> anything more than 10MB/sec out of it with megaraid driver v1 OR v2 in
> Linux 2.4.26, nor with the version in 2.6.6-rc2. After 2 days of humming
> the Adaptec mantra I gave in and switched the array straight onto the
> onboard Adaptec 160 controller (same cable and everything). Software
> RAID 5 gets me over 40MB/sec for a nominal cpu hit - more than 4 times
> what I could get out of the MegaRAID controller :( Even the 2nd SCSI-2
> channel gets 40MB/sec max (pg_xlog :)
>
> And HOW LONG does it take to detect drives during POST... ohhhh never
> mind ... I really just wanna rant :) There should be a free counseling
> service for enraged sysops.

I wonder if your controller is broken or something? Or maybe it's in a PCI slot that has to share IRQs or something. I've had great luck with SuperMicro mobos in the past (we're talking dual PPro 200 mobos, so seriously, IN THE PAST here...). Hell, my dual PPro 200 with an old MegaRAID 428 got an 18 megs a second transfer rate, no problem.

Have you tried that LSI card in another machine / mobo combo? Can you disable the onboard Adaptec? We have on our Dell 2650s; the only active controllers are the onboard IDE and the add-in LSI-320-2 controller. We're running ours with 128 Meg cache (I think it could be 64) set to write back.

I think our throughput on a RAID-1 pair was somewhere around 40+ megs a second for reads with bonnie++. With RAID-5 it was not really much faster at reads (something like 60 megs a second) but was much more scalable under heavy parallel read/write access for PostgreSQL.

Have you updated the BIOS on the mobo to see if that helps? I'm just throwing darts at the wall here.
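For anyone wanting to reproduce the kind of raw-throughput comparison Scott and Rob are trading here, a crude sequential read/write check with dd, sized well past RAM so the kernel cache doesn't flatter the numbers, is a reasonable first pass before reaching for bonnie++. The mount point and file size below are placeholders for the example.

    # Sequential write: ~16 GB so the result isn't just the kernel cache
    dd if=/dev/zero of=/db/ddtest bs=8k count=2000000
    sync

    # Sequential read of the same file
    dd if=/db/ddtest of=/dev/null bs=8k

    rm /db/ddtest

Numbers far below what the drives should deliver (as in Rob's 10MB/sec case) point at the controller, driver, or slot rather than PostgreSQL.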