Re: checkpoint spikes - Mailing list pgsql-general
From | Janning |
---|---|
Subject | Re: checkpoint spikes |
Date | |
Msg-id | 201006101849.45712.ml@planwerk6.de Whole thread Raw |
In response to | checkpoint spikes (Janning <ml@planwerk6.de>) |
Responses |
Re: checkpoint spikes
Re: checkpoint spikes |
List | pgsql-general |
Hi again, nobody answered my question :-(, so i did some research. I was convinced to set: checkpoint_segments = 16 checkpoint_timeout = 60min and echo 2 > /proc/sys/vm/dirty_background_ratio this helped a lot. Our freeze time was reduced from 10 seconds to 5 seconds. But this is still way too long. I know the best fix would be good disks, but at the moment it is not affordable. Behalf of my research I have lots of questions concerning Checkpoints: 1. With raising checkpoint_timeout, is there any downgrade other than slower after-crash recovery? 2. Is there a way to calculate the after-crash recovery time with a certain checkpoint_timeout? How long would be approx. for a checkpoint_timeout of 60 minutes? 3. Is it sane to set checkpoint_timeout to 120min or even to 600min? 4. We have checkpoint_completion_target set to 0.9. The checkpoint is always ready much earlier. Is it ought to be like this? I thought the work would be spread across 90% of 60 minutes? Instead it takes only 10 Minutes for the checkpoint to be finished. 5. Does anybody know if I can set dirty_background_ratio to 0.5? As we have 12 GB RAM and rather slow disks 0,5% would result in a maximum of 61MB dirty pages. Your help is very appreciated? kind regards Janning PS: Do I need to post this question on pgsql-perfomance? If so, please let me know. On Wednesday 09 June 2010 18:53:23 Janning wrote: > Hi, > > we currently encounter an increasing load on our website. With the > increasing load we see some problems on our database. so we checked what > happens and we saw spikes in our load when checkpoints are about to finish. > > Our configuration: > > max_connections = 125 > ssl = false > shared_buffers = 500MB > work_mem = 15MB > maintenance_work_mem = 250MB > synchronous_commit = off > full_page_writes = off > > checkpoint_segments = 10 > checkpoint_timeout = 10min > checkpoint_completion_target = 0.9 > > random_page_cost = 2 > effective_cache_size = 5000MB > > autovacuum = off > > (we put autovacuum to off because we suspected it). > > all the other parameters are set to default (beside log parameters and so > on). > > Our machine has 12 GB RAM, i7-975 CPU and a SW-Raid-1 for datadir and > another one for the rest of the server (including postgresql logfiles). > Disk are "Barracuda 7200.11 SATA 3Gb/s 1.5-TB" and we are running debian > lenny. > > these are our checkpoint log statements: > > 2010-06-09 17:24:27 CEST [6375]: [1-1] LOG: checkpoint starting: time > 2010-06-09 17:28:09 CEST [6375]: [2-1] LOG: checkpoint complete: wrote > 2861 buffers (4.5%); 0 transaction log file(s) added, 0 removed, 1 > recycled; write=193.057 s, sync=29.259 s, total=222.353 s > 2010-06-09 17:34:27 CEST [6375]: [3-1] LOG: checkpoint starting: time > 2010-06-09 17:39:09 CEST [6375]: [4-1] LOG: checkpoint complete: wrote > 3247 buffers (5.1%); 0 transaction log file(s) added, 0 removed, 2 > recycled; write=255.255 s, sync=26.911 s, total=282.177 s > 2010-06-09 17:44:27 CEST [6375]: [5-1] LOG: checkpoint starting: time > 2010-06-09 17:49:41 CEST [6375]: [6-1] LOG: checkpoint complete: wrote > 2746 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 2 > recycled; write=280.743 s, sync=33.392 s, total=314.147 s > 2010-06-09 17:54:27 CEST [6375]: [7-1] LOG: checkpoint starting: time > 2010-06-09 17:58:59 CEST [6375]: [8-1] LOG: checkpoint complete: wrote > 3118 buffers (4.9%); 0 transaction log file(s) added, 0 removed, 1 > recycled; write=253.293 s, sync=18.585 s, total=271.892 s > 2010-06-09 18:04:27 CEST [6375]: [9-1] LOG: checkpoint starting: time > 2010-06-09 18:08:46 CEST [6375]: [10-1] LOG: checkpoint complete: wrote > 2695 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 2 > recycled; write=225.173 s, sync=33.789 s, total=258.972 s > 2010-06-09 18:14:27 CEST [6375]: [11-1] LOG: checkpoint starting: time > 2010-06-09 18:18:30 CEST [6375]: [12-1] LOG: checkpoint complete: wrote > 2868 buffers (4.5%); 0 transaction log file(s) added, 0 removed, 2 > recycled; write=215.561 s, sync=27.701 s, total=243.271 s > > What we saw is a rather long sync time. And exactly at this time, our > responses become slow and the server load increases. > > this is from pg_stat_bgwriter: > > checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | > maxwritten_clean | buffers_backend | buffers_alloc > -------------------+-----------------+--------------------+---------------+ >------------------+-----------------+--------------- 3495 | 0 > | 9070242 | 10798927 | 14421 | 6412707 | > 208340755 > > > We tried to increase checkpoint_timeout to 20 minutes but it was getting > worse: > > 2010-06-09 18:34:27 CEST [6375]: [13-1] LOG: checkpoint starting: time > 2010-06-09 18:42:49 CEST [6375]: [14-1] LOG: checkpoint complete: wrote > 2956 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 1 > recycled; write=448.265 s, sync=54.087 s, total=502.377 s > > here we see a 54 sec sync time and a much higher load on sync time compared > to the 10 minutes checkpoint_timeout. > > do you have any hints for us how to tune our configuration to avoid spikes? > > kind regards > Janning
pgsql-general by date: