Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Spread checkpoint sync
Date
Msg-id 4D482B6F.9000302@2ndquadrant.com
In response to Re: Spread checkpoint sync  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: Spread checkpoint sync
Re: Spread checkpoint sync
List pgsql-hackers
Greg Smith wrote:
I think the right way to compute "relations to sync" is to finish the sorted writes patch, which I already sent over a not-quite-right-yet update to

The attached update now makes much more sense than the misguided patch I submitted two weeks ago.  This takes the original sorted write code, first adjusting it so it only allocates the memory its tag structure is stored in once (in a kind of lazy way I can improve on right now).  It then computes a bunch of derived statistics from a single walk of the sorted data on each pass through.  Here's an example of what comes out:

DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11809.0_0
DEBUG:  BufferSync 2 dirty blocks in relation.segment_fork 11811.0_0
DEBUG:  BufferSync 3 dirty blocks in relation.segment_fork 11812.0_0
DEBUG:  BufferSync 3 dirty blocks in relation.segment_fork 16496.0_0
DEBUG:  BufferSync 28 dirty blocks in relation.segment_fork 16499.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11638.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11640.0_0
DEBUG:  BufferSync 2 dirty blocks in relation.segment_fork 11641.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11642.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11644.0_0
DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11645.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11661.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11663.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11664.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11672.0_0
DEBUG:  BufferSync 1 dirty blocks in relation.segment_fork 11685.0_0
DEBUG:  BufferSync 2097 buffers to write, 17 total dirty segment file(s) expected to need sync

This is the first checkpoint after starting to populate a new pgbench database.  The next four show it extending into new segments:

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.1_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.2_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.3_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

DEBUG:  BufferSync 2048 dirty blocks in relation.segment_fork 16508.4_0
DEBUG:  BufferSync 2048 buffers to write, 1 total dirty segment file(s) expected to need sync

The fact that it's always showing 2048 dirty blocks on these makes me think I'm still computing something wrong, but the general idea here is working now.  I had to use some magic from the md layer to let bufmgr.c know how its writes were going to get mapped into file segments and, correspondingly, into fsync calls later.  I'm not happy about breaking the API encapsulation there, but I don't see an easy way to compute that data at the per-segment level otherwise--and it's not like that's going to change in the near future anyway.
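To make the per-segment counting concrete, here is a rough model of the single walk over the sorted data.  It is a Python sketch for illustration only, not the patch's C code: DirtyTag, segment_key, format_segment, and summarize_dirty_segments are made-up names, and RELSEG_SIZE assumes a default build (1 GB segments of 8 kB blocks).

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple, Tuple

# Assumption: default build, 1 GB segment files of 8 kB blocks.
RELSEG_SIZE = 131072


class DirtyTag(NamedTuple):
    """Hypothetical stand-in for the tag of one dirty buffer."""
    relation: int
    fork: int
    block: int


SegmentKey = Tuple[int, int, int]  # (relation, segment, fork)


def segment_key(tag: DirtyTag) -> SegmentKey:
    # A block lands in segment file number block // RELSEG_SIZE.
    return (tag.relation, tag.block // RELSEG_SIZE, tag.fork)


def format_segment(key: SegmentKey) -> str:
    # Mirrors the relation.segment_fork naming used in the DEBUG lines.
    relation, segment, fork = key
    return f"{relation}.{segment}_{fork}"


def summarize_dirty_segments(tags: Iterable[DirtyTag]) -> List[Tuple[SegmentKey, int]]:
    # Sort once, then count runs of tags that fall in the same segment file
    # during a single pass over the sorted list.
    ordered = sorted(tags, key=lambda t: (t.relation, t.fork, t.block))
    return [(key, sum(1 for _ in run))
            for key, run in groupby(ordered, key=segment_key)]


if __name__ == "__main__":
    sample = [
        DirtyTag(16508, 0, 5),
        DirtyTag(16508, 0, RELSEG_SIZE),  # first block of segment 1
        DirtyTag(11809, 0, 0),
        DirtyTag(16508, 0, 6),
    ]
    summary = summarize_dirty_segments(sample)
    for key, count in summary:
        print(f"{count} dirty blocks in relation.segment_fork {format_segment(key)}")
    print(f"{sum(c for _, c in summary)} buffers to write, "
          f"{len(summary)} total dirty segment file(s) expected to need sync")
```

Because the tags are sorted by relation, fork, and block, every segment file shows up as one contiguous run, so both the counts and the total number of files needing sync fall out of the same pass.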

I like this approach for providing a map of how to spread syncs out for a couple of reasons:

-It computes data that could be used to drive sync spread timing in a relatively short amount of simple code.

-You get write sorting at the database level helping out the OS.  Everything I've been seeing recently on benchmarks says Linux at least needs all the help it can get in that regard, even if block order doesn't necessarily align perfectly with disk order.

-It's obvious how to take this same data and build a future model where the time allocated for each fsync is proportional to how much that particular relation was touched.
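As a sketch of that last point, the per-segment dirty counts could be turned into a time budget along these lines.  This is a hypothetical Python illustration, not anything in the patch: allocate_sync_time and sync_window_secs are invented names, and a strictly linear split is only one possible model.

```python
from typing import Dict, Hashable


def allocate_sync_time(dirty_counts: Dict[Hashable, int],
                       sync_window_secs: float) -> Dict[Hashable, float]:
    """Split a sync window across segment files in proportion to dirty blocks."""
    total = sum(dirty_counts.values())
    if total <= 0:
        # Nothing dirty, so there is nothing to schedule.
        return {}
    return {segment: sync_window_secs * count / total
            for segment, count in dirty_counts.items()}


if __name__ == "__main__":
    shares = allocate_sync_time({"16508.0_0": 3, "11809.0_0": 1}, 8.0)
    for segment, seconds in shares.items():
        print(f"{segment}: {seconds:.1f} s")
```

With counts of 3 and 1 and an 8 second window, the two files get 6 and 2 seconds respectively.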

Benchmarks of just the impact of the sorting step and continued bug swatting to follow.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
