Re: Experimental patch for inter-page delay in VACUUM - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Experimental patch for inter-page delay in VACUUM
Date
Msg-id 3FA7D39A.8060502@Yahoo.com
In response to Re: Experimental patch for inter-page delay in VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Experimental patch for inter-page delay in VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>> Tom Lane wrote:
>>> I have never been happy with the fact that we use sync(2) at all.
> 
>> Sure, it does too much. But together with the other layer of 
>> indirection, the virtual file descriptor pool, what is the exact 
>> guaranteed behaviour of
>>      write(); close(); open(); fsync();
>> cross platform?
> 
> That isn't guaranteed, which is why we have to use sync() at the
> moment.  To go over to fsync or O_SYNC we'd need more control over which
> file descriptors are used to issue writes.  Which is why I was thinking
> about moving the writes to a centralized writer process.
> 
>>> Actually, once you build it this way, you could make all writes
>>> synchronous (open the files O_SYNC) so that there is never any need for
>>> explicit fsync at checkpoint time.
> 
>> Yes, but then the configuration leans more towards "take over the RAM" 
> 
> Why?  The idea is to try to issue writes at a fairly steady rate, which
> strikes me as much better than the current behavior.  I don't see why it
> would force you to have large numbers of buffers available.  You'd want
> a few thousand, no doubt, but that's not a large number.

That is part of the idea. The whole idea is to issue "physical" writes 
at a fairly steady rate, without increasing their number substantially 
or interfering too much with the drive's opinion about their order. I 
think O_SYNC for random access can conflict with write reordering.
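
To illustrate what I mean (just a sketch; the function names and the 
8K page size are made up, this is not backend code):

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * With O_SYNC every single write() blocks until the block is on the
     * platter, so the drive sees the pages one at a time in issue order.
     */
    void write_each_sync(const char *path, char *pages[], off_t offs[], int n)
    {
        int     fd = open(path, O_WRONLY | O_SYNC);
        int     i;

        for (i = 0; i < n; i++)
            pwrite(fd, pages[i], 8192, offs[i]);    /* forced out one by one */
        close(fd);
    }

    /*
     * With plain write() plus one fsync() at the end, the kernel and the
     * drive can reorder and merge the whole batch before it is flushed.
     */
    void write_batch_then_fsync(const char *path, char *pages[], off_t offs[], int n)
    {
        int     fd = open(path, O_WRONLY);
        int     i;

        for (i = 0; i < n; i++)
            pwrite(fd, pages[i], 8192, offs[i]);    /* stays in the OS cache */
        fsync(fd);                                  /* one flush for the lot */
        close(fd);
    }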

The way I see the background writer operating is that it keeps the 
buffers at the LRU end of the chain(s) clean, because those are the 
buffers most likely to get replaced soon. In my experimental ARC code 
it would traverse the T1 and T2 queues from LRU to MRU, write out n1 
and n2 dirty buffers respectively (n1+n2 configurable), then fsync all 
files that were involved in that, nap for a time depending on how far 
down the queues it got (so as to increase the write rate when running 
low on clean buffers), and do it all over again.
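
In pseudo-C that loop would look about like this (the queue and helper 
names are invented for illustration, this is not the actual 
experimental code):

    typedef struct BufDesc
    {
        struct BufDesc *next;       /* next buffer towards the MRU end */
        int             dirty;
    } BufDesc;

    extern BufDesc *t1_lru;         /* LRU end of the ARC T1 queue */
    extern BufDesc *t2_lru;         /* LRU end of the ARC T2 queue */
    extern int      n1, n2;         /* configurable per-round write quotas */

    extern void     write_buffer(BufDesc *buf);     /* plain write(), no O_SYNC */
    extern void     fsync_written_files(void);      /* fsync files touched above */
    extern void     nap(int depth);                 /* shorter nap if depth is small */

    /*
     * Write up to "quota" dirty buffers starting at the LRU end and
     * return how far down the queue we had to go to find them.
     */
    static int
    sweep_queue(BufDesc *lru, int quota)
    {
        int      written = 0;
        int      depth = 0;
        BufDesc *buf;

        for (buf = lru; buf != NULL && written < quota; buf = buf->next)
        {
            depth++;
            if (buf->dirty)
            {
                write_buffer(buf);
                buf->dirty = 0;
                written++;
            }
        }
        return depth;
    }

    void
    background_writer_main(void)
    {
        for (;;)
        {
            int depth = sweep_queue(t1_lru, n1) + sweep_queue(t2_lru, n2);

            fsync_written_files();  /* flush everything written this round */
            nap(depth);             /* low on clean buffers => small depth
                                     * => short nap => higher write rate */
        }
    }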

That way, everyone else doing a write must issue an fsync too, because 
it is not guaranteed that the fsync of one process flushes the writes 
of another. But as you said, if that is a relatively rare operation for 
a regular backend, it won't hurt.
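
So a backend that has to write a buffer itself would do roughly this 
(again just a sketch with a made-up name, and fd handling simplified to 
a plain descriptor instead of going through the virtual file descriptor 
pool):

    #include <unistd.h>

    void
    backend_write_buffer(int fd, const char *page, off_t offset)
    {
        pwrite(fd, page, 8192, offset);
        fsync(fd);          /* flush through our own descriptor; the
                             * background writer's fsync is not
                             * guaranteed to cover this write */
    }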


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


