Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Spread checkpoint sync
Msg-id 4D46CCAB.8070601@enterprisedb.com
In response to Re: Spread checkpoint sync  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Spread checkpoint sync  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 31.01.2011 16:44, Robert Haas wrote:
> On Mon, Jan 31, 2011 at 3:04 AM, Itagaki Takahiro
> <itagaki.takahiro@gmail.com>  wrote:
>> On Mon, Jan 31, 2011 at 13:41, Robert Haas<robertmhaas@gmail.com>  wrote:
>>> 1. Absorb fsync requests a lot more often during the sync phase.
>>> 2. Still try to run the cleaning scan during the sync phase.
>>> 3. Pause for 3 seconds after every fsync.
>>>
>>> So if we want the checkpoint
>>> to finish in, say, 20 minutes, we can't know whether the write phase
>>> needs to be finished by minute 10 or 15 or 16 or 19 or only by 19:59.
>>
>> We probably need deadline-based scheduling, like what is already used in
>> the write() phase. If we want to sync 100 files in 20 minutes, each file
>> should be sync'ed within 12 seconds, assuming each fsync takes the same
>> amount of time. If we had a better estimation algorithm (file size?
>> dirty ratio?), each fsync could be given a weight factor, but
>> deadline-based scheduling would still be needed.
>
> Right.  I think the problem is balancing the write and sync phases.
> For example, if your operating system is very aggressively writing out
> dirty pages to disk, then you want the write phase to be as long as
> possible and the sync phase can be very short because there won't be
> much work to do.  But if your operating system is caching lots of
> stuff in memory and writing dirty pages out to disk only when
> absolutely necessary, then the write phase could be relatively quick
> without much hurting anything, but the sync phase will need to be long
> to keep from crushing the I/O system.  The trouble is, we don't really
> have any a priori way to know which it's doing.  Maybe we could try to
> tune based on the behavior of previous checkpoints, ...
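
Just to make the arithmetic above concrete, a minimal sketch of the
deadline-based fsync scheduling Itagaki describes might look like this
(hypothetical names, not the actual checkpointer code): each file gets a
share of the total sync budget, optionally weighted by file size or dirty
ratio, and we nap after any fsync that finishes early.

#include <time.h>
#include <unistd.h>

typedef struct PendingSync
{
    int     fd;         /* open file descriptor to fsync */
    double  weight;     /* e.g. file size or dirty ratio; 1.0 for equal shares */
} PendingSync;

static double
now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void
sync_with_deadlines(PendingSync *files, int nfiles, double budget_secs)
{
    double  total_weight = 0.0;

    for (int i = 0; i < nfiles; i++)
        total_weight += files[i].weight;

    for (int i = 0; i < nfiles; i++)
    {
        /* This file's share of the budget; with equal weights that's
         * budget_secs / nfiles, e.g. 1200 s / 100 files = 12 s. */
        double  share = budget_secs * files[i].weight / total_weight;
        double  start = now_seconds();

        fsync(files[i].fd);

        double  spent = now_seconds() - start;

        if (spent < share)
        {
            /* Finished early: sleep away the rest of this file's share so
             * the sync phase is spread over the whole budget. */
            struct timespec nap;
            double  remain = share - spent;

            nap.tv_sec = (time_t) remain;
            nap.tv_nsec = (long) ((remain - nap.tv_sec) * 1e9);
            nanosleep(&nap, NULL);
        }
    }
}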

IMHO we should reconsider the patch to sort the writes. Not so much
because of the performance gain it gives by itself, but because we can then
rearrange the fsyncs so that you write one file, then fsync it, then write
the next file, and so on. That way the time taken by the fsyncs is
distributed between the writes, so we don't need to accurately estimate how
long each one will take. If one fsync takes a long time, the writes that
follow are simply done a bit faster to catch up.
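
Roughly, as a sketch only (hypothetical names, not the actual sorting
patch; error handling and the usual write pacing between the pwrite()
calls are left out, and BLCKSZ is hard-coded to the default 8 kB page
size): sort the dirty buffers by file and block, write each file's pages
together, and fsync that file before starting the next one.

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

#define BLCKSZ 8192             /* page size, hard-coded for the sketch */

typedef struct DirtyBuffer
{
    int     fd;                 /* file the page belongs to */
    off_t   offset;             /* byte offset of the page within that file */
    char   *page;               /* BLCKSZ bytes of page image */
} DirtyBuffer;

static int
cmp_dirty(const void *a, const void *b)
{
    const DirtyBuffer *x = a;
    const DirtyBuffer *y = b;

    if (x->fd != y->fd)
        return (x->fd > y->fd) - (x->fd < y->fd);
    return (x->offset > y->offset) - (x->offset < y->offset);
}

static void
checkpoint_sorted(DirtyBuffer *bufs, int nbufs)
{
    /* Sort by (file, block) so each file's pages are written together. */
    qsort(bufs, nbufs, sizeof(DirtyBuffer), cmp_dirty);

    for (int i = 0; i < nbufs; i++)
    {
        pwrite(bufs[i].fd, bufs[i].page, BLCKSZ, bufs[i].offset);

        /* Last dirty page of this file?  fsync it now, before starting
         * the next file, so the fsync cost is interleaved with the
         * writes instead of piling up in a separate sync phase. */
        if (i == nbufs - 1 || bufs[i + 1].fd != bufs[i].fd)
            fsync(bufs[i].fd);
    }
}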

> ... but I'm wondering
> if we oughtn't to take the cheesy path first and split
> checkpoint_completion_target into checkpoint_write_target and
> checkpoint_sync_target.  That's another parameter to set, but I'd
> rather add a parameter that people have to play with to find the right
> value than impose an arbitrary rule that creates unavoidable bad
> performance in certain environments.

That is of course simpler.
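
Something like this in postgresql.conf, presumably (just to illustrate the
proposal -- neither GUC exists, checkpoint_completion_target is the single
knob today):

    # hypothetical settings, sketching the proposed split
    checkpoint_write_target = 0.5   # finish the write phase by 50% of the
                                    # checkpoint interval
    checkpoint_sync_target  = 0.9   # finish the fsyncs by 90% of the interval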

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

