Thread: Tuning Help - What did I do wrong?
We have a pretty busy linux server running postgres 8.1.4, waiting to upgrade until 8.3 to avoid dump/restoring twice. # cat /proc/meminfo total: used: free: shared: buffers: cached: Mem: 3704217600 3592069120 112148480 0 39460864 2316271616 Swap: 2516918272 270336 2516647936 # cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 3 cpu MHz : 2992.795 The postgresql.conf was basically the default so I decided to increase the cache size and a couple paramaters to make more use of that memory - here's what I did: shared_buffers = 16384 (was 1000) work_mem = 16384 (was 1024) wal_buffers = 24 (was 8) checkpoint_segments = 5 (was 3) effective_cache_size = 10000 (was 1000) stats_command_string = on (was off) stats_block_level = on (was off) stats_row_level = on (was off) In order to do this I had to change /proc/sys/kernel/shmmax to 536870912 (don't have /etc/sysctl) Also, the entire cluster gets vacuumed analyzed nightly. After making these changes, the performance on the server actually worsened. I slowly backed off on some of the paramaters but didn't seem to help. Wondering if those changes are silly? For a server this size I didn't think this would be problematic. Thank you, Josh
On 10/4/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > We have a pretty busy linux server running postgres 8.1.4, waiting to > upgrade until 8.3 to avoid dump/restoring twice. You should immediate update your version to 8.1.whateverislatest. That requires no dump / restore and it is a bug fix update. I doubt this problem is because you're out of date on patches, but who knows... > # cat /proc/meminfo > total: used: free: shared: buffers: cached: > Mem: 3704217600 3592069120 112148480 0 39460864 2316271616 > Swap: 2516918272 270336 2516647936 Well, you've got plenty of memory, and a large chunk is being used as cache. > The postgresql.conf was basically the default so I decided to > increase the cache size and a couple paramaters to make more use of > that memory - here's what I did: > > shared_buffers = 16384 (was 1000) > work_mem = 16384 (was 1024) > wal_buffers = 24 (was 8) > checkpoint_segments = 5 (was 3) > effective_cache_size = 10000 (was 1000) > stats_command_string = on (was off) > stats_block_level = on (was off) > stats_row_level = on (was off) Your changes seem reasonable. > Also, the entire cluster gets vacuumed analyzed nightly. You should look into running the autovacuum daemon. for heavily used databases nightly vacuuming may not be enough. > After making these changes, the performance on the server actually > worsened. I slowly backed off on some of the paramaters but didn't > seem to help. Most likely turning on stats collection slowed you down a bit. We need to see examples of what's slow, including explain analyze output for slow queries. Also a brief explanation of the type of load your database server is seeing. I.e. is it a lot of little transactions, mostly read, batch processing, lots of users, one user, etc... Right now we don't have enough info to really help you.
Oh, and in addition to my previous message, you should use tools like vmstat, iostat and top to get an idea of what your server is doing. What kind of drive subsystem do you have? What kind of raid controller? etc...
On Thu, 4 Oct 2007 11:19:22 -0500 "Scott Marlowe" <scott.marlowe@gmail.com> wrote: > We need to see examples of what's slow, including explain analyze > output for slow queries. Also a brief explanation of the type of > load your database server is seeing. I.e. is it a lot of little > transactions, mostly read, batch processing, lots of users, one > user, etc... Right now we don't have enough info to really help > you. Sorry, this server is for a few (100+?) websites so it's running along site apache, php. All connections to postgresql (except for the occaional psql console login) are done from php requests, using the same user (basically there are two users, the one php uses and postgres). The bulk of the activity would be reads, but certainly inesrts/updates/deletes would be interspersed in there. Most of the activity is done via auto-commits, not many long transactions. From your followup email: > ... you should use tools like vmstat, iostat and top to get an idea > of what your server is doing. # vmstat procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 3 1 0 268 68332 39016 2201436 0 0 3 3 4 2 3 4 2 sorry about the wrapping... iostat is not found - will see if I can download it. top typically shows postmaster as the top process with 10-15% of the CPU, followed by apache threads. 12:01pm up 104 days, 12:05, 2 users, load average: 9.75, 9.30, 7.70 215 processes: 214 sleeping, 1 running, 0 zombie, 0 stopped CPU states: 0.1% user, 0.0% system, 0.0% nice, 0.4% idle Mem: 3617400K av, 3552784K used, 64616K free, 0K shrd, 37456K buff Swap: 2457928K av, 264K used, 2457664K free 2273664K cached PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND 31797 postgres 17 0 28836 28M 1784 S 0 8.5 0.7 10:15 postmaster > What kind of drive subsystem do you have? What kind of raid > controller? etc... Gathering more information on this - Raid is a software RAID-1. Some information: I believe itI believe it # df -h Filesystem Size Used Avail Use% Mounted on /dev/md0 66G 50G 16G 76% / /dev/sda1 15M 6.6M 8.5M 44% /boot # cat /proc/mdstat Personalities : [raid0] [raid1] read_ahead 1024 sectors md0 : active raid1 sdb3[0] sdc3[1] 70573440 blocks [2/2] [UU] unused devices: <none> Thanks for your help, I'm more of a developer guy so let me know what else is useful. Josh
On 10/4/07, Josh Trutwin <josh@trutwins.homeip.net> wrote: > On Thu, 4 Oct 2007 11:19:22 -0500 > "Scott Marlowe" <scott.marlowe@gmail.com> wrote: > > > We need to see examples of what's slow, including explain analyze > > output for slow queries. Also a brief explanation of the type of > > load your database server is seeing. I.e. is it a lot of little > > transactions, mostly read, batch processing, lots of users, one > > user, etc... Right now we don't have enough info to really help > > you. > > Sorry, this server is for a few (100+?) websites so it's running > along site apache, php. All connections to postgresql (except for > the occaional psql console login) are done from php requests, using > the same user (basically there are two users, the one php uses and > postgres). The bulk of the activity would be reads, but > certainly inesrts/updates/deletes would be interspersed in there. > Most of the activity is done via auto-commits, not many long > transactions. So, are there certain queries that are much slower than the others? Run them from psql with explain analyze in front of them and post the query and the output here. > From your followup email: > > > ... you should use tools like vmstat, iostat and top to get an idea > > of what your server is doing. > > # vmstat > procs memory swap io > system cpu > r b w swpd free buff cache si so bi bo in cs > us sy id > 3 1 0 268 68332 39016 2201436 0 0 3 3 4 > 2 3 4 2 vmstat needs to be run for a while to give you useful numbers. try: vmstat 5 and let it run for a few minutes. The first line won't count so much, but after that you'll get more reasonable numbers. > iostat is not found - will see if I can download it. top typically > shows postmaster as the top process with 10-15% of the CPU, followed > by apache threads. What OS are you on? > 12:01pm up 104 days, 12:05, 2 users, load average: 9.75, 9.30, > 7.70 That's pretty heavy load. I notice there's no wait % listed for CPU, so I assume it's not a late model Linux kernel or anything. > 215 processes: 214 sleeping, 1 running, 0 zombie, 0 stopped > CPU states: 0.1% user, 0.0% system, 0.0% nice, 0.4% idle > Mem: 3617400K av, 3552784K used, 64616K free, 0K shrd, 37456K > buff > Swap: 2457928K av, 264K used, 2457664K free > 2273664K cached > > PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME > COMMAND > 31797 postgres 17 0 28836 28M 1784 S 0 8.5 0.7 10:15 > postmaster Are the postmasters using most of the CPU? OR the other processes? > > What kind of drive subsystem do you have? What kind of raid > > controller? etc... > > Gathering more information on this - Raid is a software > RAID-1. Some information: OK, given that it's read mostly, it's likely not a problem that a faster RAID controller would help. Possibly more drives in a RAID 10 would help a little, but let's look at optimizing your query and postmaster first. Do you have the postmaster configured to log long running queries? That's a good starting point. also google pg_fouine (I think I spelt it right) for analyzing your logs. It's quite likely the issue here is one long running query that chewing all your I/O or CPU and making everything else slow. Once we find that query things should get better and we can worry about performance tuning in a more leisurely manner.
>>> On Thu, Oct 4, 2007 at 10:28 AM, in message <20071004102804.3418912e@joplin.trutwins.homeip.net>, Josh Trutwin <josh@trutwins.homeip.net> wrote: > running postgres 8.1.4 > # cat /proc/meminfo > total: used: free: shared: buffers: cached: > Mem: 3704217600 3592069120 112148480 0 39460864 2316271616 > shared_buffers = 16384 (was 1000) > effective_cache_size = 10000 (was 1000) It's kind of silly to tell PostgreSQL that its total cache space is 10000 pages when you've got more than that in shared buffers plus all that OS cache space. Try something around 285000 pages for effective_cache_size. > stats_command_string = on (was off) > stats_block_level = on (was off) > stats_row_level = on (was off) > After making these changes, the performance on the server actually > worsened. I slowly backed off on some of the paramaters but didn't > seem to help. Did you try turning off the collection of those additional statistics? That isn't free. You didn't get specific about what you saw in performance problems. If you are seeing occasional "freezes" of all queries, you are likely looking at a known issue with "spikiness" of disk output. For some this can be corrected by using very aggressive background writer settings. Some have solved it by disabling OS write delays. Some haven't found a solution and are waiting for 8.3; there have been some serious changes made to attempt to resolve this issue. -Kevin
On Thu, 04 Oct 2007 14:03:07 -0500 "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > It's kind of silly to tell PostgreSQL that its total cache space is > 10000 pages when you've got more than that in shared buffers plus > all that OS cache space. Try something around 285000 pages for > effective_cache_size. Good point. > > stats_command_string = on (was off) > > stats_block_level = on (was off) > > stats_row_level = on (was off) > > > After making these changes, the performance on the server actually > > worsened. I slowly backed off on some of the paramaters but > > didn't seem to help. > > Did you try turning off the collection of those additional > statistics? That isn't free. I turned off all but row level since I decided to try turning autovacuum on. > You didn't get specific about what you saw in performance > problems. If you are seeing occasional "freezes" of all queries, > you are likely looking at a known issue with "spikiness" of disk > output. For some this can be corrected by using very aggressive > background writer settings. Some have solved it by disabling OS > write delays. Some haven't found a solution and are waiting for > 8.3; there have been some serious changes made to attempt to > resolve this issue. Thanks - I put some additional information in replies to Scott, but mainly the performance of the web sites that are talking to postgres is the problem - people calling in, etc. I turned on slow query logging to see if I can find if it's particular queries or something else? Josh