Re: Spread checkpoint sync - Mailing list pgsql-hackers

From: Greg Smith
Subject: Re: Spread checkpoint sync
Date:
Msg-id: 4D472A9E.2090901@2ndquadrant.com
In response to: Re: Spread checkpoint sync (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Spread checkpoint sync (Greg Smith <greg@2ndquadrant.com>)
List: pgsql-hackers
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> 3. Pause for 3 seconds after every fsync.
>
>> I think something along the lines of #3 is probably a good idea,
>
> Really?  Any particular delay is guaranteed wrong.

'3 seconds' is just a placeholder for whatever comes out of a "total time scheduled to sync / relations to sync" computation.  (Still doing all my thinking in terms of time, although I recognize a showdown with segment-based checkpoints is coming too.)
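To make that arithmetic concrete, here is a minimal sketch of the per-fsync pause I have in mind; the function and parameter names are mine, not from any patch:

static int
SyncPauseMsec(int msec_left_in_sync_phase, int relations_left_to_sync)
{
    if (relations_left_to_sync <= 0)
        return 0;

    /* Spread the remaining sync-phase time evenly over the pending fsyncs */
    return msec_left_in_sync_phase / relations_left_to_sync;
}

Recomputed before each sync, both numbers shrink as the phase progresses, so the pause adapts rather than being a fixed 3 seconds.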
I think the right way to compute "relations to sync" is to finish the sorted writes patch I already sent a not-quite-right update for, which is my next thing to work on here.  I remain pessimistic that any attempt to issue fsync calls without the maximum possible delay after asking the kernel to write things out first will work out well.  My recent tests with low values of dirty_bytes on Linux just reinforce how badly that can turn out.  In addition to computing the relation count while sorting them, placing writes in order by relation and then doing all writes followed by all syncs should place the database right in the middle of the throughput/latency trade-off here.  It will have had the maximum amount of time we can give it to sort and flush writes for any given relation before it is asked to sync it.  I don't want to try to be any smarter than that without trying to be a *lot* smarter -- timing individual sync calls, feedback loops on time estimation, etc.
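To spell out that write-then-sync ordering in the simplest possible terms, here is an illustrative sketch using plain POSIX calls; the real checkpointer works through shared buffers and md.c rather than raw file descriptors, so treat the names and signature as made up:

#include <unistd.h>

/*
 * Phase 1 issues every write, grouped by file, before any fsync happens,
 * giving the kernel the longest window we can to flush each file.
 * Phase 2 then syncs one file at a time with a pause between calls.
 */
static void
write_then_sync(int *fds, const char **bufs, const size_t *lens,
                int nfiles, long pause_usec)
{
    for (int i = 0; i < nfiles; i++)
        (void) write(fds[i], bufs[i], lens[i]);

    for (int i = 0; i < nfiles; i++)
    {
        (void) fsync(fds[i]);
        if (i < nfiles - 1)
            usleep(pause_usec);
    }
}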
At this point I have to agree with Robert's observation that splitting checkpoints into checkpoint_write_target and checkpoint_sync_target is the only reasonable thing left that might be possible to complete in a short period.  So that's how this can compute the total time numerator here.
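The numerator itself doesn't need to be anything fancier than the sketch below; checkpoint_sync_target doesn't exist yet, so both the variable and its 0.2 default are placeholders:

/* Hypothetical GUC; only checkpoint_completion_target exists today. */
double checkpoint_sync_target = 0.2;  /* fraction of the checkpoint interval spent syncing */

static int
SyncPhaseBudgetMsec(int checkpoint_timeout_secs)
{
    return (int) (checkpoint_timeout_secs * 1000 * checkpoint_sync_target);
}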
The main thing I will warn about in relation to today's discussion is the danger of truly deadline-oriented scheduling in this area.  The checkpoint process may discover the sync phase is falling behind expectations because the individual sync calls are taking longer than expected.  If that happens, aiming for the "finish on target anyway" goal puts you right back to a guaranteed nasty write spike again.  I think many people would prefer logging the overrun as tuning feedback for the DBA rather than accelerating, which is likely to make the problem even worse if the checkpoint is falling behind.  But since ultimately the feedback for this will be "make the checkpoints longer or increase checkpoint_sync_target", sync acceleration to meet the deadline isn't unacceptable; the DBA can try both of those themselves if seeing spikes.
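The "log it rather than chase the deadline" option could be as simple as this sketch; the function name and message wording are made up, and it assumes the sync-phase budget from the hypothetical checkpoint_sync_target above:

#include "postgres.h"

static void
ReportSyncOverrun(int budget_msec, int elapsed_msec)
{
    if (elapsed_msec > budget_msec)
        elog(LOG, "checkpoint sync phase overran its target by %d ms; "
             "consider a longer checkpoint or a larger checkpoint_sync_target",
             elapsed_msec - budget_msec);
}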
-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
