Thread: Several optimization options (config/hardware)
Hi,

we want to see if we can gain better performance with our postgresql
database. In the last year the amount of data grew from ~25G to now
~140G and we're currently developing a new feature that needs to get
data faster from the database. The system is both read and write heavy.

At first I want to give you an overview over the hardware, software and
configuration and the changes that I see we could check out. I'd be very
happy if you could review and tell if one or the other is nonsense.

Hardware:
- CPU: 4x4 Cores Intel Xeon L5630 @ 2.13GHz
- RAM: 64GB
- RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
  (I don't know the actual number of discs)
- A single partition for data and wal-files

Software:
- RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
- postgresql90-server-9.0.6-1PGDG.rhel6.x86_64

Configuration (selected from settings):

------------------------------+-----------+--------+-------------------
 name                         | setting   | unit   | source
------------------------------+-----------+--------+-------------------
 autovacuum                   | on        | [NULL] | configuration file
 checkpoint_completion_target | 0.5       | [NULL] | default
 checkpoint_segments          | 16        |        | configuration file
 checkpoint_timeout           | 300       | s      | default
 commit_delay                 | 0         |        | default
 default_statistics_target    | 100       |        | default
 effective_cache_size         | 16384     | 8kB    | default
 fsync                        | on        | [NULL] | default
 log_min_duration_statement   | 250       | ms     | configuration file
 log_temp_files               | -1        | kB     | default
 maintenance_work_mem         | 16384     | kB     | default
 max_connections              | 2000      |        | configuration file
 random_page_cost             | 4         | [NULL] | default
 shared_buffers               | 1310720   | 8kB    | configuration file
 synchronous_commit           | on        | [NULL] | default
 wal_buffers                  | 256       | 8kB    | configuration file
 wal_sync_method              | fdatasync | [NULL] | default
 wal_writer_delay             | 200       | ms     | default
 work_mem                     | 1024      | kB     | default
------------------------------+-----------+--------+-------------------

Some stats:

$ free -m
             total       used       free     shared    buffers     cached
Mem:         64413      63764        649          0         37      60577
-/+ buffers/cache:       3148      61264
Swap:         8191        333       7858

iostat shows nearly all the time ~100% I/O utilization of the disc
serving the pg data / wal files.

I'd suggest the following changes:

(Improve query planning)
1) Increase effective_cache_size to 48GB
2) Increase work_mem to 10MB (alternatively, first activate
   log_temp_files to see if this is really needed)
3) Reduce random_page_cost to 1

(WAL / I/O)
4) Set synchronous_commit=off
5) Increase checkpoint_segments to 32
6) Increase wal_buffers to 16M
7) Add new discs (RAID) for wal files / pg_xlog

(Misc)
8) Increase maintenance_work_mem to 1GB

In parallel I'd review statistics like long-running queries, index usage
(which ones can be dropped), etc.

At first I'd like to try out 1) to 3) as they affect the query planner,
so that some indices that are not used right now might be used then.
After this change I'd review index usage and clean up those / improve
queries. Then, finally, I'd test the WAL / I/O related changes.

Do you think this makes sense? Do you see other improvements, or do you
need some more information?

Thanx in advance,
cheers,
Martin
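For reference, changes 1) to 3) would look like this in postgresql.conf
(a sketch; 9.0 has no ALTER SYSTEM, so the file is edited and then
reloaded, e.g. with `pg_ctl reload` -- log_temp_files = 0 is included
here to verify whether work_mem is actually being exhausted):

```
# postgresql.conf -- sketch of planner-related changes 1) to 3)
effective_cache_size = 48GB  # 1) planner hint: data the OS page cache can hold
work_mem = 10MB              # 2) per-sort/per-hash memory, per backend
random_page_cost = 1.0       # 3) makes random I/O as cheap as sequential
log_temp_files = 0           # log all temp files, to check if work_mem suffices
```

All four of these can be changed with a reload; no restart is required.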
Hi,

On 2 May 2012, 15:19, Martin Grotzke wrote:
> Hi,
>
> we want to see if we can gain better performance with our postgresql
> database. In the last year the amount of data grew from ~25G to now
> ~140G and we're currently developing a new feature that needs to get
> data faster from the database. The system is both read and write heavy.

What does read/write heavy mean? How much data and how many
transactions do you need to handle, how many clients, etc.?

> At first I want to give you an overview over the hardware, software and
> configuration and the changes that I see we could check out. I'd be very
> happy if you could review and tell if one or the other is nonsense.
>
> Hardware:
> - CPU: 4x4 Cores Intel Xeon L5630 @ 2.13GHz
> - RAM: 64GB
> - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
>   (I don't know the actual number of discs)
> - A single partition for data and wal-files

Have you done any benchmarks with that hardware to verify the
performance? Can you do that now (i.e. stopping the database so that you
can run them)?
> Software
> - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
> - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64
>
> Configuration (selected from settings):
> [configuration table quoted above snipped]
>
> Some stats:
>
> $ free -m
>              total       used       free     shared    buffers     cached
> Mem:         64413      63764        649          0         37      60577
> -/+ buffers/cache:       3148      61264
> Swap:         8191        333       7858
>
> iostat shows nearly all the time ~100% io utilization of the disc
> serving the pg data / wal files.

That's a rather useless value, especially if you don't know details
about the RAID array. With multiple spindles, the array may be 100%
utilized (ratio of time it spent servicing requests) yet it may absorb
more. Imagine a RAID with 2 drives, each 50% utilized. The array may
report 100% utilization yet it's actually 50% utilized ...
> I'd suggest the following changes:
>
> (Improve query planning)
> 1) Increase effective_cache_size to 48GB
> 2) Increase work_mem to 10MB (alternatively, first activate
>    log_temp_files to see if this is really needed)
> 3) Reduce random_page_cost to 1
>
> (WAL / I/O)
> 4) Set synchronous_commit=off
> 5) Increase checkpoint_segments to 32
> 6) Increase wal_buffers to 16M
> 7) Add new discs (RAID) for wal files / pg_xlog
>
> (Misc)
> 8) Increase maintenance_work_mem to 1GB
>
> In parallel I'd review statistics like long-running queries, index usage
> (which ones can be dropped), etc.

Reviewing stats on long-running queries is a good starting point - you
need to find out where the bottleneck is (I/O, CPU, ...) and this may be
helpful. Dropping unused indexes is quite difficult - most of the time I
see the case of multiple similar indexes where all of them are used, but
it's possible to remove some of them with minimal performance impact.

> At first I'd like to try out 1) to 3) as they affect the query planner,
> so that some indices that are not used right now might be used then.

If you don't know where the issue is, it's difficult to give any advice.
But in general, I'd say this:

1) setting effective_cache_size to 48GB - seems like a good idea, a
   better match for your environment

2) increasing work_mem - might help, but you should check the slow
   queries first (enabling log_temp_files is a good idea)

3) setting random_page_cost to 1 is a really bad idea IMHO, especially
   with spinners, a rather weak controller and unknown details about
   the array

So do (1), maybe (2), and I'd definitely vote against (3).

Regarding the other options:

4) synchronous_commit=off - well, this may improve the performance, but
   it won't fix the underlying issues and it may introduce other
   application-level issues (expecting the transaction to be committed,
   etc.)

5) Increase checkpoint_segments to 32 - do you see a lot of
   checkpoint-related warnings in the log? If not, this probably won't
   fix anything.
If you actually do have issues with checkpoints, I'd recommend
increasing checkpoint_timeout (e.g. to 30 minutes), significantly
increasing the number of segments (e.g. to 64 or more) and tuning the
completion target (e.g. to 0.9).

6) Increase wal_buffers to 16M - may help, but I would not expect a
   tremendous improvement.

7) Add new discs (RAID) for wal files / pg_xlog - good idea, moving
   those to separate spindles may help a lot.

> After this change I'd review index usage and clean up those / improve
> queries.
>
> Then, finally I'd test WAL / I/O related changes.

Why do you want to do this last? Chances are that writes are causing
many of the I/O issues (because the database needs to actually fsync the
data). Tuning this will improve the general I/O performance etc.

> Do you think this makes sense? Do you see other improvements, or do you
> need some more information?

First of all, find out more about the RAID array. Do some basic I/O
tests (with dd etc.).

Moreover, I've noticed you have max_connections=2000. That's insanely
high in most cases, unless you're using commit_delay/commit_siblings. A
reasonable value is usually something like "number of CPUs + number of
drives", although that's just a rough estimate. Given that you have 16
cores, I'd expect ~100 or something like that. If you need more, I'd
recommend a connection pooler (e.g. pgpool).

Tomas
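Such a dd test could look like this (a sketch with toy sizes -- in
practice, write a file at least twice the size of RAM on the filesystem
holding the data directory, so the page cache can't absorb it):

```shell
# Crude sequential-write check with dd; conv=fdatasync forces a flush
# to disk before dd reports, so the figure reflects the storage, not
# the page cache.
TESTFILE=$(mktemp /tmp/ddtest.XXXXXX)
dd if=/dev/zero of="$TESTFILE" bs=8k count=1024 conv=fdatasync 2>&1
SIZE=$(stat -c %s "$TESTFILE")   # 1024 x 8kB = 8388608 bytes written
rm -f "$TESTFILE"
echo "wrote $SIZE bytes"
```

The throughput line dd prints on stderr is the number of interest;
compare it between the current array and any new WAL spindles.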
Martin Grotzke wrote:
> we want to see if we can gain better performance with our postgresql
> database. In the last year the amount of data grew from ~25G to now
> ~140G and we're currently developing a new feature that needs to get
> data faster from the database. The system is both read and write heavy.
>
> At first I want to give you an overview over the hardware, software and
> configuration and the changes that I see we could check out. I'd be very
> happy if you could review and tell if one or the other is nonsense.
>
> Hardware:
> - CPU: 4x4 Cores Intel Xeon L5630 @ 2.13GHz
> - RAM: 64GB
> - RAID 1 (1+0) HP Company Smart Array G6 controllers, P410i
>   (I don't know the actual number of discs)
> - A single partition for data and wal-files
>
> Software
> - RHEL 6, Kernel 2.6.32-220.4.1.el6.x86_64
> - postgresql90-server-9.0.6-1PGDG.rhel6.x86_64

You could try different kernel I/O elevators and see if that improves
something. I have made good experiences with elevator=deadline and
elevator=noop.

Yours,
Laurenz Albe
Hi Laurenz,

On 05/03/2012 09:26 AM, Albe Laurenz wrote:
> Martin Grotzke wrote:
>> we want to see if we can gain better performance with our postgresql
>> database. [...]
>
> You could try different kernel I/O elevators and see if that improves
> something.
>
> I have made good experiences with elevator=deadline and elevator=noop.

Ok, great info.

I'm honestly not sure at which device to look to check the current
configuration.

mount/fstab shows the device /dev/mapper/VG01-www for the relevant
partition. When I check iostat, high utilization is reported for the
devices dm-4 and sda (showing nearly the same numbers for util always),
so I suspect that dm-4 is mapped onto sda.

This is the current config:

$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
$ cat /sys/block/dm-4/queue/scheduler
none

Which of them should be changed? I'll discuss this also with our hosting
provider next week, he'll know what has to be done.

Cheers,
Martin
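As a side note, the dm-4/sda suspicion can be checked without root,
since sysfs exposes the mapping (a sketch; the dm/name attribute is
present on RHEL 6 kernels):

```shell
# Map each device-mapper block device (dm-N) to its LV name and the
# physical devices beneath it; sysfs is world-readable, no root needed.
for dm in /sys/block/dm-*; do
    [ -d "$dm" ] || continue            # no dm devices on this machine
    name=$(cat "$dm/dm/name" 2>/dev/null)
    slaves=$(ls "$dm/slaves" 2>/dev/null | tr '\n' ' ')
    printf '%s (%s) -> %s\n' "${dm##*/}" "${name:-unknown}" "$slaves"
done
MAPPED=yes
```

On the machine above, dm-4 should report VG01-www with sda (or its
partitions) listed as slaves if the suspicion is right.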
Martin Grotzke wrote:
>> You could try different kernel I/O elevators and see if that improves
>> something.
>>
>> I have made good experiences with elevator=deadline and elevator=noop.

> Ok, great info.
>
> I'm not sure at which device to look honestly to check the current
> configuration.
>
> mount/fstab shows the device /dev/mapper/VG01-www for the relevant
> partition. When I check iostat, high utilization is reported for the
> devices dm-4 and sda (showing nearly the same numbers for util always),
> so I suspect that dm-4 is mapped onto sda.

Use the option -N of "iostat" to see long device names. You can use
"lvm" to figure out the mapping.

> This is the current config:
> $ cat /sys/block/sda/queue/scheduler
> noop anticipatory deadline [cfq]
> $ cat /sys/block/dm-4/queue/scheduler
> none

Do you mean a literal "none", or is the file empty?

> Which of them should be changed?
> I'll discuss this also with our hosting provider next week, he'll know
> what has to be done.

I'd just add "elevator=deadline" to the kernel line in /etc/grub.conf
and reboot, at least if it is a dedicated database machine.

But of course you want to change it on the fly first to test - not
knowing the answer to your question, I would change it on both devices
if I can.

Yours,
Laurenz Albe
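Sketched, the /etc/grub.conf change is just one extra parameter appended
to the existing kernel line (the root= value below is a placeholder --
keep whatever the line already contains and only add the elevator
option):

```
# /etc/grub.conf -- kernel line only, with elevator=deadline appended
kernel /vmlinuz-2.6.32-220.4.1.el6.x86_64 ro root=<existing root device> elevator=deadline
```

This sets the default scheduler for all block devices at boot.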
On 05/04/2012 09:57 AM, Albe Laurenz wrote:
> Martin Grotzke wrote:
>>> You could try different kernel I/O elevators and see if that
>>> improves something.
>>>
>>> I have made good experiences with elevator=deadline and
>>> elevator=noop.
>
>> Ok, great info.
>> [...]
>> so I suspect that dm-4 is mapped onto sda.
>
> Use the option -N of "iostat" to see long device names. You can use
> "lvm" to figure out the mapping.

iostat with -N shows VG01-www for dm-4. For lvm/lvdisplay/dmsetup I get
"Permission denied" as I have no root/sudo permissions. I need to check
this with our hosting provider (hopefully we have a call today).

>> This is the current config:
>> $ cat /sys/block/sda/queue/scheduler
>> noop anticipatory deadline [cfq]
>> $ cat /sys/block/dm-4/queue/scheduler
>> none
>
> Do you mean literal "none" or do you mean that the file is empty?

"none" was the output of `cat /sys/block/dm-4/queue/scheduler`.

>> Which of them should be changed? I'll discuss this also with our
>> hosting provider next week, he'll know what has to be done.
>
> I'd just add "elevator=deadline" to the kernel line in /etc/grub.conf
> and reboot. At least if it is a dedicated database machine.
>
> But of course you want to change it on the fly first to test - not
> knowing the answer to your question, I would change it on both devices
> if I can.

Ok, makes sense.

Cheers,
Martin
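For the on-the-fly test, the active scheduler (the one in brackets) can
be read per device from sysfs; root can switch it by writing to the same
file -- a sketch:

```shell
# List the scheduler choices for every block device; the active one is
# shown in [brackets]. Switching needs root, e.g.:
#   echo deadline > /sys/block/sda/queue/scheduler
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue   # glob may not match if no devices visible
    printf '%s: %s\n' "$f" "$(cat "$f" 2>/dev/null)"
done
DONE=yes
```

A runtime switch made this way does not survive a reboot, which is why
the grub.conf change is still needed afterwards.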