Re: Spread checkpoint sync - Mailing list pgsql-hackers

From: Greg Smith
Subject: Re: Spread checkpoint sync
Msg-id: 4D357733.3090104@2ndquadrant.com
In response to: Re: Spread checkpoint sync (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Robert Haas wrote:
> Idea #4: For ext3 filesystems that like to dump the entire buffer
> cache instead of only the requested file, write a little daemon that
> runs alongside of (and completely independently of) PostgreSQL. Every
> 30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and
> closes the file, thus dumping the cache and preventing a ridiculous
> growth in the amount of data to be sync'd at checkpoint time.

Today's data suggests this problem has been resolved in the latest kernels. I saw the "giant flush/series of small flushes" pattern quite easily on the CentOS5 system I last did heavy pgbench testing on. The one I'm testing now has kernel 2.6.32 (Ubuntu 10.04), and it doesn't show it at all. Here's what a bad checkpoint looks like on this system:

LOG: checkpoint starting: xlog
DEBUG: checkpoint sync: number=1 file=base/24746/36596.8 time=7651.601 msec
DEBUG: checkpoint sync: number=2 file=base/24746/36506 time=0.001 msec
DEBUG: checkpoint sync: number=3 file=base/24746/36596.2 time=1891.695 msec
DEBUG: checkpoint sync: number=4 file=base/24746/36596.4 time=7431.441 msec
DEBUG: checkpoint sync: number=5 file=base/24746/36515 time=0.216 msec
DEBUG: checkpoint sync: number=6 file=base/24746/36596.9 time=4422.892 msec
DEBUG: checkpoint sync: number=7 file=base/24746/36596.12 time=954.242 msec
DEBUG: checkpoint sync: number=8 file=base/24746/36237_fsm time=0.002 msec
DEBUG: checkpoint sync: number=9 file=base/24746/36503 time=0.001 msec
DEBUG: checkpoint sync: number=10 file=base/24746/36584 time=41.401 msec
DEBUG: checkpoint sync: number=11 file=base/24746/36596.7 time=885.921 msec
DEBUG: checkpoint sync: number=12 file=base/24813/30774 time=0.002 msec
DEBUG: checkpoint sync: number=13 file=base/24813/24822 time=0.005 msec
DEBUG: checkpoint sync: number=14 file=base/24746/36801 time=49.801 msec
DEBUG: checkpoint sync: number=15 file=base/24746/36601.2 time=610.996 msec
DEBUG: checkpoint sync: number=16 file=base/24746/36596 time=16154.361 msec
DEBUG: checkpoint sync: number=17 file=base/24746/36503_vm time=0.001 msec
DEBUG: checkpoint sync: number=18 file=base/24746/36508 time=0.000 msec
DEBUG: checkpoint sync: number=19 file=base/24746/36596.10 time=9759.898 msec
DEBUG: checkpoint sync: number=20 file=base/24746/36596.3 time=3392.727 msec
DEBUG: checkpoint sync: number=21 file=base/24746/36237 time=0.150 msec
DEBUG: checkpoint sync: number=22 file=base/24746/36596.11 time=9153.437 msec
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 1057833 of relation base/24746/36596
[>800 more of these]
DEBUG: checkpoint sync: number=23 file=base/24746/36601.1 time=48697.179 msec
DEBUG: could not forward fsync request because request queue is full
DEBUG: checkpoint sync: number=24 file=base/24746/36597 time=0.080 msec
DEBUG: checkpoint sync: number=25 file=base/24746/36237_vm time=0.001 msec
DEBUG: checkpoint sync: number=26 file=base/24813/24822_fsm time=0.001 msec
DEBUG: checkpoint sync: number=27 file=base/24746/36503_fsm time=0.000 msec
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 20619 of relation base/24746/36601
DEBUG: checkpoint sync: number=28 file=base/24746/36506_fsm time=0.000 msec
DEBUG: checkpoint sync: number=29 file=base/24746/36596_vm time=0.040 msec
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 278967 of relation base/24746/36596
DEBUG: could not forward fsync request because request queue is full
CONTEXT: writing block 1582400 of relation base/24746/36596
DEBUG: checkpoint sync: number=30 file=base/24746/36596.6 time=0.002 msec
DEBUG: checkpoint sync: number=31 file=base/24813/11647 time=0.004 msec
DEBUG: checkpoint sync: number=32 file=base/24746/36601 time=201.632 msec
DEBUG: checkpoint sync: number=33 file=base/24746/36801_fsm time=0.001 msec
DEBUG: checkpoint sync: number=34 file=base/24746/36596.5 time=0.001 msec
DEBUG: checkpoint sync: number=35 file=base/24746/36599 time=0.000 msec
DEBUG: checkpoint sync: number=36 file=base/24746/36587 time=0.005 msec
DEBUG: checkpoint sync: number=37 file=base/24746/36596_fsm time=0.001 msec
DEBUG: checkpoint sync: number=38 file=base/24746/36596.1 time=0.001 msec
DEBUG: checkpoint sync: number=39 file=base/24746/36801_vm time=0.001 msec
LOG: checkpoint complete: wrote 9515 buffers (29.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.409 s, sync=111.615 s, total=144.052 s; sync files=39, longest=48.697 s, average=2.853 s

Here the file that's been brutally delayed via backend contention is #23, after already seeing quite long delays on the earlier ones. I never saw that with earlier kernels running ext3. This is good in that it makes it more likely a spread sync approach that works on XFS will also work on these newer kernels with ext4. Then the only group we wouldn't be able to help, if that works, is the ext3 + old kernel crowd.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books