Re: Spread checkpoint sync - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Spread checkpoint sync
Date
Msg-id 4CFB29A3.1060002@2ndquadrant.com
In response to Re: Spread checkpoint sync  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark wrote:
> Using sync_file_range you can specify the set of blocks to sync and
> then block on them only after some time has passed. But there's no
> documentation on how this relates to the I/O scheduler so it's not
> clear it would have any effect on the problem. 

As I understand it, this is exactly where we're stalled in getting this 
improved on the Linux side.  *The* answer for this class of problem on 
Linux is to use sync_file_range, and I don't think we'll ever get any 
sympathy from the kernel developers until we do.  But that's a 
Linux-specific call, so using it means adding a write path fork with 
platform-specific code to the database.  If I thought sync_file_range 
was a silver bullet guaranteed to make this better, maybe I'd go for 
that.  I think there's some relatively low-hanging fruit on the database 
side that we'd do better to try before going to that extreme though, 
thus the patch.
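
For illustration only, the two-step pattern described in the quoted text (start writeback on a range, block on it later), combined with the platform fork it would force on us, might look roughly like the C sketch below.  The helper names start_range_writeback and finish_range_writeback are made up for this example; they are not PostgreSQL functions and are not part of the patch.  The trailing fsync() reflects an assumption on my part that sync_file_range by itself is not a durability guarantee, since its man page says it does not write out file metadata.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes sync_file_range() in <fcntl.h> on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): ask the kernel to begin
 * writing back dirty pages in [offset, offset + nbytes) without waiting
 * for the I/O to complete.  Returns 0 on success, -1 on error.
 */
int
start_range_writeback(int fd, off_t offset, off_t nbytes)
{
#ifdef __linux__
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* No portable equivalent: do nothing and rely on the later fsync(). */
    (void) fd;
    (void) offset;
    (void) nbytes;
    return 0;
#endif
}

/*
 * Hypothetical helper: called some time later, blocks until writeback of
 * the range has finished, then calls fsync() for durability.
 * Returns 0 on success, -1 on error.
 */
int
finish_range_writeback(int fd, off_t offset, off_t nbytes)
{
#ifdef __linux__
    /* Wait for in-flight writes, write anything still dirty, wait again. */
    if (sync_file_range(fd, offset, nbytes,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) != 0 &&
        errno != ENOSYS)        /* call unavailable: fall through to fsync() */
        return -1;
#else
    (void) offset;
    (void) nbytes;
#endif
    /* Assumption: the range sync does not cover metadata, so still fsync. */
    return fsync(fd);
}
```

A caller would invoke start_range_writeback() right after writing a batch of blocks, let some time pass, and then call finish_range_writeback() on the same range.  Note that nothing here says how the I/O scheduler orders those writes, which is the open question raised in the quoted text.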

> We might still have to delay the beginning of the sync to allow the
> dirty blocks to be synced naturally and then when we issue it still
> end up catching a lot of other i/o as well.

Whether it's "lots" or not is really workload dependent.  I work from 
the assumption that the blocks being written out by the checkpoint are 
the most popular ones in the database, the ones that accumulate a high 
usage count and stay there.  If that's true, my guess is that the writes 
being done while the checkpoint is executing are a bit less likely to be 
touching the same files.  You raise a valid concern; I just haven't seen 
that actually happen in practice yet.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us



