
From Greg Smith
Subject Re: Spread checkpoint sync
Date
Msg-id 4D357733.3090104@2ndquadrant.com
In response to Re: Spread checkpoint sync  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote:
> Idea #4: For ext3 filesystems that like to dump the entire buffer
> cache instead of only the requested file, write a little daemon that
> runs alongside of (and completely independently of) PostgreSQL.  Every
> 30 s, it opens a 1-byte file, changes the byte, fsyncs the file, and
> closes the file, thus dumping the cache and preventing a ridiculous
> growth in the amount of data to be sync'd at checkpoint time.
>   
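
For reference, here's a minimal sketch of what such a daemon could look 
like; the file path and the 30 second interval are just placeholders, and 
this isn't code from anywhere in particular:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    const char *path = "/var/tmp/flush-nudge";  /* placeholder path */
    char        byte = 0;

    for (;;)
    {
        int         fd = open(path, O_WRONLY | O_CREAT, 0600);

        if (fd < 0)
        {
            perror("open");
            exit(1);
        }
        byte ^= 1;              /* make sure there's a dirty byte to write */
        if (write(fd, &byte, 1) != 1)
            perror("write");
        if (fsync(fd) != 0)     /* on ext3 this flushes far more than 1 byte */
            perror("fsync");
        close(fd);
        sleep(30);
    }
}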

Today's data suggests this problem has been resolved in the latest 
kernels.  I saw the "one giant flush instead of a series of small flushes" 
pattern quite easily on the CentOS5 system I last did heavy pgbench 
testing on.  The one I'm testing now has kernel 2.6.32 (Ubuntu 10.04), and 
it doesn't show that behavior at all.

Here's what a bad checkpoint looks like on this system:

LOG:  checkpoint starting: xlog
DEBUG:  checkpoint sync: number=1 file=base/24746/36596.8 time=7651.601 msec
DEBUG:  checkpoint sync: number=2 file=base/24746/36506 time=0.001 msec
DEBUG:  checkpoint sync: number=3 file=base/24746/36596.2 time=1891.695 msec
DEBUG:  checkpoint sync: number=4 file=base/24746/36596.4 time=7431.441 msec
DEBUG:  checkpoint sync: number=5 file=base/24746/36515 time=0.216 msec
DEBUG:  checkpoint sync: number=6 file=base/24746/36596.9 time=4422.892 msec
DEBUG:  checkpoint sync: number=7 file=base/24746/36596.12 time=954.242 msec
DEBUG:  checkpoint sync: number=8 file=base/24746/36237_fsm time=0.002 msec
DEBUG:  checkpoint sync: number=9 file=base/24746/36503 time=0.001 msec
DEBUG:  checkpoint sync: number=10 file=base/24746/36584 time=41.401 msec
DEBUG:  checkpoint sync: number=11 file=base/24746/36596.7 time=885.921 msec
DEBUG:  checkpoint sync: number=12 file=base/24813/30774 time=0.002 msec
DEBUG:  checkpoint sync: number=13 file=base/24813/24822 time=0.005 msec
DEBUG:  checkpoint sync: number=14 file=base/24746/36801 time=49.801 msec
DEBUG:  checkpoint sync: number=15 file=base/24746/36601.2 time=610.996 msec
DEBUG:  checkpoint sync: number=16 file=base/24746/36596 time=16154.361 msec
DEBUG:  checkpoint sync: number=17 file=base/24746/36503_vm time=0.001 msec
DEBUG:  checkpoint sync: number=18 file=base/24746/36508 time=0.000 msec
DEBUG:  checkpoint sync: number=19 file=base/24746/36596.10 time=9759.898 msec
DEBUG:  checkpoint sync: number=20 file=base/24746/36596.3 time=3392.727 msec
DEBUG:  checkpoint sync: number=21 file=base/24746/36237 time=0.150 msec
DEBUG:  checkpoint sync: number=22 file=base/24746/36596.11 time=9153.437 msec
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 1057833 of relation base/24746/36596

[>800 more of these]

DEBUG:  checkpoint sync: number=23 file=base/24746/36601.1 time=48697.179 msec
DEBUG:  could not forward fsync request because request queue is full
DEBUG:  checkpoint sync: number=24 file=base/24746/36597 time=0.080 msec
DEBUG:  checkpoint sync: number=25 file=base/24746/36237_vm time=0.001 msec
DEBUG:  checkpoint sync: number=26 file=base/24813/24822_fsm time=0.001 msec
DEBUG:  checkpoint sync: number=27 file=base/24746/36503_fsm time=0.000 msec
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 20619 of relation base/24746/36601
DEBUG:  checkpoint sync: number=28 file=base/24746/36506_fsm time=0.000 msec
DEBUG:  checkpoint sync: number=29 file=base/24746/36596_vm time=0.040 msec
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 278967 of relation base/24746/36596
DEBUG:  could not forward fsync request because request queue is full
CONTEXT:  writing block 1582400 of relation base/24746/36596
DEBUG:  checkpoint sync: number=30 file=base/24746/36596.6 time=0.002 msec
DEBUG:  checkpoint sync: number=31 file=base/24813/11647 time=0.004 msec
DEBUG:  checkpoint sync: number=32 file=base/24746/36601 time=201.632 msec
DEBUG:  checkpoint sync: number=33 file=base/24746/36801_fsm time=0.001 msec
DEBUG:  checkpoint sync: number=34 file=base/24746/36596.5 time=0.001 msec
DEBUG:  checkpoint sync: number=35 file=base/24746/36599 time=0.000 msec
DEBUG:  checkpoint sync: number=36 file=base/24746/36587 time=0.005 msec
DEBUG:  checkpoint sync: number=37 file=base/24746/36596_fsm time=0.001 msec
DEBUG:  checkpoint sync: number=38 file=base/24746/36596.1 time=0.001 msec
DEBUG:  checkpoint sync: number=39 file=base/24746/36801_vm time=0.001 msec
LOG:  checkpoint complete: wrote 9515 buffers (29.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.409 s, sync=111.615 s, total=144.052 s; sync files=39, longest=48.697 s, average=2.853 s

Here the file that's been brutally delayed by backend contention is #23, 
coming after quite long delays on several of the earlier files already; 
the sync phase alone took 111.6 seconds of the 144 second total.  I never 
saw that pattern with earlier kernels running ext3.

This is good, in that it makes it more likely that a spread sync approach 
that works on XFS will also work on these newer kernels with ext4.  If 
that pans out, the only group we wouldn't be able to help is the ext3 + 
old kernel crowd.
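
Just to spell out what I mean by a spread sync approach: the basic idea is 
to pace the fsync calls during the sync phase rather than issue them back 
to back, so each file's writeback can drain before the next one is pushed 
out.  Something along these lines, heavily simplified compared to what the 
actual patch has to deal with:

#include <stdio.h>
#include <unistd.h>

/*
 * Illustration only: fsync each pending file, sleeping between calls so
 * the writes from one sync can complete before the next file is synced.
 */
static void
spread_sync(const int fds[], int nfiles, unsigned int delay_ms)
{
    int     i;

    for (i = 0; i < nfiles; i++)
    {
        if (fsync(fds[i]) != 0)
            perror("fsync");
        if (i < nfiles - 1)
            usleep(delay_ms * 1000U);   /* the "spread" part */
    }
}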

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


