Re: Large memory machine and PG 9.2.9 - Mailing list pgsql-admin
From | Lacey Powers |
---|---|
Subject | Re: Large memory machine and PG 9.2.9 |
Date | |
Msg-id | 54ED049F.1060506@gmail.com |
In response to | Large memory machine and PG 9.2.9 (jesper@krogh.cc) |
List | pgsql-admin |
On 02/24/2015 12:19 PM, jesper@krogh.cc wrote:
> Hi.
>
> We have just moved our 9.2.9 instance onto new beefy iron. The machine has
> 48 Intel cores and 3TB of memory. Running linux 3.13.0-43 (Ubuntu 12.04).
>
> The problem is a bit hard to describe, but I do suspect it is related to
> the large memory and probably kernel og pg-kernel interaction.
>
> Performance is nice up and until the point where the memory is full at
> which point sluggish behaviour comes up. sar -B output is.
>
> 17:30:01 29513.79 140026.51 634164.09 0.00 160093.42 0.00 0.00 0.00 0.00
> 17:35:01 31351.13 154801.18 638880.17 0.12 184323.22 0.00 0.00 0.00 0.00
> 17:40:01 38269.69 128701.23 652375.35 0.23 176369.40 0.00 0.00 0.00 0.00
> 17:45:01 34834.14 135371.82 627765.26 0.00 169779.58 0.00 0.00 0.00 0.00
> 17:50:01 34039.17 134630.64 627500.30 0.00 174259.04 0.00 0.00 0.00 0.00
> 17:55:01 30318.70 150791.13 612534.75 0.00 163425.51 0.00 0.00 0.00 0.00
> 18:05:01 28446.80 122103.38 549891.01 0.26 141756.52 0.00 612.77 556.60 90.83
> 18:10:01 12944.16 39222.64 332848.27 4.27 82317.48 0.00 3037.11 2725.14 89.73
> 18:15:01 12955.10 47841.51 453714.33 3.95 106397.71 1018.39 4811.64 5421.81 93.00
> 18:20:01 16393.43 64063.10 548341.21 2.48 149489.93 6447.13 2238.06 8537.87 98.30
> 18:25:01 15725.89 59096.20 502043.56 0.27 152932.96 5197.78 2783.81 7782.02 97.50
> 18:30:01 12735.95 50460.08 394507.90 0.09 143488.71 4645.20 2141.35 6621.27 97.56
> 18:35:01 11995.37 52743.57 414669.31 0.02 134363.87 5096.32 1708.52 6668.57 98.00
> 18:40:01 11448.30 43185.84 373441.27 0.35 109712.93 3247.41 1772.93 4902.79 97.66
> 18:45:01 10959.95 44993.48 402033.19 0.04 115914.63 3157.24 2393.26 5378.58 96.90
> 18:50:01 11270.25 50853.00 431117.15 0.30 105697.26 3951.30 1951.31 5778.41 97.90
> 18:55:01 10086.69 59206.44 362027.12 0.70 104760.65 6928.91 1684.04 8497.73 98.66
>
> Sluggish'ness starts at 18:10'ish and continues. Load drops and IO is very
> small..
>
> All good suggestions welcome.
>
> sysctl changes
> $ grep '^vm' /etc/sysctl.conf
> vm.swappiness = 0
> vm.dirty_background_ratio = 3
> vm.dirty_ratio = 15
> vm.dirty_expire_centisecs = 500
> vm.dirty_writeback_centisecs = 100
>
> Thanks
>
> Jesper

Hello Jesper,

At first glance, looking at your sysctl config and noting that you have 48 cores and 3TB of RAM, you might consider setting vm.dirty_bytes and vm.dirty_background_bytes instead of vm.dirty_ratio and vm.dirty_background_ratio.

Your current settings start background writing at about 92GB of dirty RAM (3% of 3TB) and force IO to be synchronous at about 460GB (15% of 3TB), which is a crazy amount of data to push onto a controller or disks. =(

Even if you set dirty_background_ratio to 1% and dirty_ratio to 2%, the numbers would still be 31GB and 62GB respectively, so something lower than 1% seems most useful, which is why you have the dirty_bytes and dirty_background_bytes controls available.

Capturing the output from /proc/meminfo (in a while loop or with watch) during the stalls you note would also be useful for checking your hypothesis regarding the large RAM. If the stalls start when the dirty data reaches about 460GB (with your current settings), that should let you know that the stalls are from writing back all that dirty RAM. Otherwise, you'll probably need to look elsewhere.

I would keep vm.dirty_bytes and vm.dirty_background_bytes lower than the amount of cache on your RAID controller, maybe 50% and 25% of the total size, respectively? That should be a reasonable starting point for testing, and you can adjust the values up and down as needed to get the performance you're looking for.

Hope this is helpful. =)

Regards,

Lacey
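
For anyone who wants to redo the arithmetic in this reply, here is a rough Python sketch. It assumes the ratios apply to the full 3TB of RAM (the kernel actually bases them on available memory, so the real thresholds will be somewhat lower). The helper names and the 1 GiB controller cache are made up for illustration; substitute your own figures.

```python
# Rough writeback-threshold arithmetic for the settings discussed in this thread.
# Assumption: ratios are applied to total RAM; the kernel really uses
# "available" memory, so the true thresholds are somewhat lower.

GIB = 1024 ** 3


def ratio_threshold_bytes(total_ram_bytes, ratio_percent):
    """Bytes of dirty memory at which a vm.dirty_*_ratio setting triggers."""
    return total_ram_bytes * ratio_percent // 100


def suggested_dirty_bytes(controller_cache_bytes):
    """Starting points from the reply: 25% (background) and 50% of the RAID cache."""
    return {
        "vm.dirty_background_bytes": controller_cache_bytes // 4,
        "vm.dirty_bytes": controller_cache_bytes // 2,
    }


def parse_meminfo(text):
    """Return {field: kB} for the Dirty and Writeback lines of /proc/meminfo."""
    wanted = {"Dirty", "Writeback"}
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            values[name] = int(rest.split()[0])
    return values


total_ram = 3 * 1024 * GIB  # 3 TiB, as reported for this machine
for label, percent in (("background writeback", 3), ("synchronous IO", 15)):
    threshold = ratio_threshold_bytes(total_ram, percent)
    print("%s starts at roughly %.0f GiB" % (label, threshold / GIB))

# Hypothetical 1 GiB controller cache; substitute the real size.
for key, value in suggested_dirty_bytes(1 * GIB).items():
    print("%s = %d" % (key, value))
```

To watch the stalls, something like `parse_meminfo(open("/proc/meminfo").read())` can be called every few seconds and logged with a timestamp, then lined up against the sar output.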