Re: Checkpoint spikes - Mailing list pgsql-performance
From | Greg Smith |
---|---|
Subject | Re: Checkpoint spikes |
Date | |
Msg-id | 4B1762E1.7010002@2ndquadrant.com Whole thread Raw |
In response to | Re: Analyse without locking? (Richard Neill <rn214@cam.ac.uk>) |
Responses |
Re: Checkpoint spikes
|
List | pgsql-performance |
Richard Neill wrote: > On the other hand, the system never seems to be I/O bound. (we have at > least 25 MB/sec of write bandwidth, and use a small fraction of that > normally). I would bet that if you sampled vmstat or iostat every single second, you'd discover there's a large burst in write speed for the same few seconds that queries are stuck. If you're averaging out the data over a 5 second or longer period, you'll never see it--the spike will get lost in the average. You just can't monitor checkpoint spikes unless you're watching I/O with an extremely tight time resolution. Watching the "Writeback" figure in /proc/meminfo is helpful too, that is where I normally see everything jammed up. > Here's the typical checkpoint logs: > 2009-12-03 06:21:21 GMT LOG: checkpoint complete: wrote 12400 buffers > (2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled; > write=149.883 s, sync=5.143 s, total=155.040 s See that "sync" number there? That's your problem; while that sync operation is going on, everybody else is grinding to a halt waiting for it. Not a coincidence that the duration is about the same amount of time that your queries are getting stuck. This example shows 12400 buffers = 97MB of total data written. Since those writes are pretty random I/O, it's easily possible to get stuck for a few seconds waiting for that much data to make it out to disk. You only gave the write phase a couple of minutes to spread things out over; meanwhile, Linux may not even bother starting to write things out until 30 seconds into that, so the effective time between when writes to disk start and when the matching sync happens on your system is extremely small. That's not good--you have to give that several minutes of breathing room if you want to avoid checkpoint spikes. > We're using 8.4.1, on ext4 with SSD. Is it possible that something > exotic is occurring to do with write barriers (on by default in ext4, > and we haven't changed this). > Perhaps a low priority IO process for writing the previous WAL to disk > is blocking a high-priority transaction (which is trying to write to > the new WAL). If the latter is trying to sync, could the large amount > of lower priority IO be getting in the way thanks to write barriers? > If so, can I safely turn off write barriers? Linux is pretty dumb in general here. fsync operations will usually end up writing out way more of the OS buffer cache than they need to. And the write cache can get quite big before pdflush decides it should actually do some work, the whole thing is optimized for throughput rather than latency. I don't really trust barriers at all, so I don't know if there's some specific tuning you can do with those to improve things. Your whole system is bleeding edge craziness IMHO--SSD, ext4, write barriers, all stuff that just doesn't work reliably yet far as I'm concerned. ...but that's not what you want to hear. When I can suggest that should help is increasing checkpoint_segments (>32), checkpoint_timeout (>=10 minutes), checkpoint_completion_target (0.9), and lowering the amount of writes Linux will cache before it gets more aggressive about flushing them. Those things will fight the root cause of the problem, by giving more time between the "write" and "sync" phases of the checkpoint. It's ok if "write" takes a long while, decreasing the "sync" number is your goal you need to keep your eye on. I've written a couple of articles on this specific topic if you want more background on the underlying issues, it's kind of heavy reading: http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm http://www.westnet.com/~gsmith/content/linux-pdflush.htm http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
pgsql-performance by date: