Re: postgresql latency & bgwriter not doing its job - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: postgresql latency & bgwriter not doing its job |
Date | |
Msg-id | alpine.DEB.2.10.1408281947350.28571@sto Whole thread Raw |
In response to | Re: postgresql latency & bgwriter not doing its job (Claudio Freire <klaussfreire@gmail.com>) |
List | pgsql-hackers |
>> I tried that by setting: >> vm.dirty_expire_centisecs = 100 >> vm.dirty_writeback_centisecs = 100 >> >> So it should start writing returned buffers at most 2s after they are >> returned, if I understood the doc correctly, instead of at most 35s. >> >> The result is that with a 5000s 25tps pretty small load (the system can do >> 300tps with the default configuration), I lost 2091 (1.7%) of transactions, >> that is they were beyond the 200ms schedule limit. Another change is that >> overall the lost transactions are more spread than without this setting, >> although there still is stretches of unresponsiveness. >> >> So although the situation is significantly better, it is still far from good >> with the much reduced value I tried. > > What do the checkpoint logs look like now? (especially interested in sync times) Here is an extract: LOG: checkpoint starting: xlog LOG: checkpoint complete: wrote 4893 buffers (4.8%); 0 transaction log file(s) added,0 removed, 0 recycled; write=128.430 s, sync=0.378 s, total=128.921 s; sync files=11, longest=0.375 s, average=0.034s LOG: checkpoint starting: xlog LOG: checkpoint complete: wrote 6200 buffers (6.1%); 0 transaction logfile(s) added, 3 removed, 0 recycled; write=128.934 s, sync=0.240 s, total=129.280 s; sync files=7, longest=0.132s, average=0.034 s LOG: checkpoint starting: xlog LOG: checkpoint complete: wrote 6177 buffers (6.0%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=130.146 s, sync=0.322 s, total=130.592 s; sync files=7,longest=0.185 s, average=0.046 s LOG: checkpoint starting: xlog LOG: checkpoint complete: wrote 6185 buffers(6.0%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=132.673 s, sync=0.143 s, total=132.909s; sync files=5, longest=0.078 s, average=0.028 s LOG: checkpoint starting: xlog LOG: checkpoint complete:wrote 6188 buffers (6.0%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=130.415 s, sync=0.170s, total=130.676 s; sync files=5, longest=0.132 s, average=0.034 s LOG: checkpoint starting: xlog LOG: checkpointcomplete: wrote 6203 buffers (6.1%); 0 transaction log file(s) added, 3 removed, 0 recycled; write=131.726s, sync=0.444 s, total=132.256 s; sync files=5, longest=0.441 s, average=0.088 s ... Basically sync is between 0.1-0.5 seconds. I had one particulary bad stretch of unresponsiveness, which is not clearly related to a slow checkpoint sync: progress: 4065.0 s, 28.7 tps, lat 11.886 ms stddev 48.470, lag 18.851 ms, 0 skipped progress: 4066.2 s, 1.6 tps, lat 952.830ms stddev 280.673, lag 155.310 ms, 25 skipped progress: 4067.0 s, 2.7 tps, lat 1067.730 ms stddev 316.321, lag 171.766ms, 14 skipped progress: 4068.4 s, 1.4 tps, lat 1147.392 ms stddev 240.337, lag 132.297 ms, 35 skipped progress:4069.7 s, 2.3 tps, lat 1034.543 ms stddev 213.786, lag 154.453 ms, 35 skipped progress: 4070.2 s, 1.8 tps, lat562.715 ms stddev 0.000, lag 182.833 ms, 12 skipped progress: 4071.3 s, 3.6 tps, lat 600.929 ms stddev 108.232, lag 162.723ms, 25 skipped progress: 4072.5 s, 1.7 tps, lat 1077.187 ms stddev 77.654, lag 168.849 ms, 31 skipped progress:4073.3 s, 1.3 tps, lat 1298.093 ms stddev 0.000, lag 168.882 ms, 21 skipped progress: 4074.0 s, 2.7 tps, lat 920.704ms stddev 183.587, lag 165.333 ms, 24 skipped progress: 4075.3 s, 2.4 tps, lat 756.051 ms stddev 118.241, lag 176.863ms, 29 skipped progress: 4076.1 s, 1.3 tps, lat 1424.548 ms stddev 0.000, lag 0.000 ms, 17 skipped progress: 4077.3s, 2.4 tps, lat 791.090 ms stddev 338.729, lag 155.403 ms, 26 skipped progress: 4078.1 s, 27.4 tps, lat 46.834 msstddev 198.812, lag 3.915 ms, 0 skipped -- Fabien
pgsql-hackers by date: