Re: Experimental patch for inter-page delay in VACUUM - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Experimental patch for inter-page delay in VACUUM
Date
Msg-id 3FA7CAE6.1040402@dunslane.net
In response to Re: Experimental patch for inter-page delay in VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Experimental patch for inter-page delay in VACUUM
List pgsql-hackers
Tom Lane wrote:

>Jan Wieck <JanWieck@Yahoo.com> writes:
>
>>What still needs to be addressed is the I/O storm caused by checkpoints. I
>>see it much relaxed when stretching out the BufferSync() over most of
>>the time until the next one should occur. But the kernel sync at its
>>end still pushes the system hard against the wall.
>
>I have never been happy with the fact that we use sync(2) at all.  Quite
>aside from the "I/O storm" issue, sync() is really an unsafe way to do a
>checkpoint, because there is no way to be certain when it is done.  And
>on top of that, it does too much, because it forces syncing of files
>unrelated to Postgres.
>
>I would like to see us go over to fsync, or some other technique that
>gives more certainty about when the write has occurred.  There might be
>some scope that way to allow stretching out the I/O, too.
>
>The main problem with this is knowing which files need to be fsync'd.
>The only idea I have come up with is to move all buffer write operations
>into a background writer process, which could easily keep track of
>every file it's written into since the last checkpoint.  This could cause
>problems though if a backend wants to acquire a free buffer and there's
>none to be had --- do we want it to wait for the background process to
>do something?  We could possibly say that backends may write dirty
>buffers for themselves, but only if they fsync them immediately.  As
>long as this path is seldom taken, the extra fsyncs shouldn't be a big
>performance problem.
>
>Actually, once you build it this way, you could make all writes
>synchronous (open the files O_SYNC) so that there is never any need for
>explicit fsync at checkpoint time.  The background writer process would
>be the one incurring the wait in most cases, and that's just fine.  In
>this way you could directly control the rate at which writes are issued,
>and there's no I/O storm at all.  (fsync could still cause an I/O storm
>if there are lots of pending writes in a single file.)
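
To make the O_SYNC idea concrete, here is a rough sketch of what a paced synchronous writer might look like. It is only an illustration, not code from the tree: paced_sync_write(), SKETCH_BLCKSZ and the per-page delay parameter are all names made up for this example, and a real background writer would pick buffers from the shared pool instead of taking a flat array.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192      /* assumed page size, for illustration only */

/*
 * Hypothetical helper: write npages pages from buf to path through an
 * O_SYNC descriptor, sleeping delay_ms milliseconds between pages.
 * Returns the number of pages written, or -1 on any error.
 */
static int
paced_sync_write(const char *path, const char *buf, int npages, long delay_ms)
{
    struct timespec pause;
    int         fd;
    int         i;

    assert(buf != NULL && npages >= 0 && delay_ms >= 0);

    pause.tv_sec = delay_ms / 1000;
    pause.tv_nsec = (delay_ms % 1000) * 1000000L;

    fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    for (i = 0; i < npages; i++)
    {
        /* With O_SYNC, write() does not return until the kernel reports
         * the data as written, so no later fsync() is needed for it. */
        ssize_t     n = write(fd, buf + (size_t) i * SKETCH_BLCKSZ,
                              SKETCH_BLCKSZ);

        if (n != SKETCH_BLCKSZ)
        {
            close(fd);
            return -1;
        }

        /* The sleep is what bounds the rate at which writes are issued. */
        if (delay_ms > 0 && i + 1 < npages)
            nanosleep(&pause, NULL);
    }

    return close(fd) == 0 ? npages : -1;
}
```

The writer is the only process that waits on the disk here, and the delay gives a direct knob for the write rate.
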
>
Or maybe fdatasync() would be slightly more efficient - do we care about 
flushing metadata that much?
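
For what it's worth, fdatasync() still flushes the metadata needed to read the data back (the file size, for instance); what it may skip is things like timestamps. Combined with the idea of remembering which files were written since the last checkpoint, a minimal sketch could look like the following. The names remember_dirty_fd(), checkpoint_flush() and MAX_TRACKED are invented for this example, and a fixed-size array of descriptors is just the simplest thing that shows the shape of it.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_TRACKED 64          /* arbitrary limit for this sketch */

static int  tracked_fds[MAX_TRACKED];
static int  n_tracked = 0;

/*
 * Hypothetical helper: remember that fd has been written to since the
 * last checkpoint.  Returns 0 on success, -1 if the table is full.
 */
static int
remember_dirty_fd(int fd)
{
    int         i;

    for (i = 0; i < n_tracked; i++)
        if (tracked_fds[i] == fd)
            return 0;           /* already remembered */

    if (n_tracked >= MAX_TRACKED)
        return -1;

    tracked_fds[n_tracked++] = fd;
    return 0;
}

/*
 * Hypothetical helper: at checkpoint time, flush every remembered file
 * with fdatasync().  Returns the number of files flushed, or -1 on the
 * first failure (the list is left intact so the caller can retry).
 */
static int
checkpoint_flush(void)
{
    int         i;
    int         flushed = 0;

    for (i = 0; i < n_tracked; i++)
    {
        if (fdatasync(tracked_fds[i]) != 0)
            return -1;
        flushed++;
    }

    assert(flushed == n_tracked);
    n_tracked = 0;              /* start a fresh list for the next cycle */
    return flushed;
}
```

Unlike sync(), each fdatasync() call returns only once that particular file has been flushed, so the checkpoint knows when it is done, and nothing unrelated to Postgres gets synced.
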

cheers

andrew



