Re: Experimental patch for inter-page delay in VACUUM - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Experimental patch for inter-page delay in VACUUM
Date
Msg-id 3FA7D39A.8060502@Yahoo.com
In response to Re: Experimental patch for inter-page delay in VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Experimental patch for inter-page delay in VACUUM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:

> Jan Wieck <JanWieck@Yahoo.com> writes:
>> Tom Lane wrote:
>>> I have never been happy with the fact that we use sync(2) at all.
> 
>> Sure, it does too much. But together with the other layer of 
>> indirection, the virtual file descriptor pool, what is the exact 
>> guaranteed behaviour of
>>      write(); close(); open(); fsync();
>> cross platform?
> 
> That isn't guaranteed, which is why we have to use sync() at the
> moment.  To go over to fsync or O_SYNC we'd need more control over which
> file descriptors are used to issue writes.  Which is why I was thinking
> about moving the writes to a centralized writer process.
> 
>>> Actually, once you build it this way, you could make all writes
>>> synchronous (open the files O_SYNC) so that there is never any need for
>>> explicit fsync at checkpoint time.
> 
>> Yes, but then the configuration leans more towards "take over the RAM" 
> 
> Why?  The idea is to try to issue writes at a fairly steady rate, which
> strikes me as much better than the current behavior.  I don't see why it
> would force you to have large numbers of buffers available.  You'd want
> a few thousand, no doubt, but that's not a large number.

That is part of the idea. The whole idea is to issue "physical" writes 
at a fairly steady rate, without increasing their number substantially 
or interfering too much with the drive's opinion about their order. I 
think O_SYNC for random access can conflict with write reordering.
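
To illustrate what I mean (just a sketch; the function names and the 
8K page size are made up, this is not backend code):

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * With O_SYNC every single write() blocks until the block is on the
     * platter, so the drive sees the pages one at a time in issue order.
     */
    void write_each_sync(const char *path, char *pages[], off_t offs[], int n)
    {
        int     fd = open(path, O_WRONLY | O_SYNC);
        int     i;

        for (i = 0; i < n; i++)
            pwrite(fd, pages[i], 8192, offs[i]);    /* forced out one by one */
        close(fd);
    }

    /*
     * With plain write() plus one fsync() at the end, the kernel and the
     * drive can reorder and merge the whole batch before it is flushed.
     */
    void write_batch_then_fsync(const char *path, char *pages[], off_t offs[], int n)
    {
        int     fd = open(path, O_WRONLY);
        int     i;

        for (i = 0; i < n; i++)
            pwrite(fd, pages[i], 8192, offs[i]);    /* stays in the OS cache */
        fsync(fd);                                  /* one flush for the lot */
        close(fd);
    }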

The way I see the background writer operating is that it keeps the 
buffers at the LRU end of the chain(s) clean, because those are the 
buffers most likely to get replaced soon. In my experimental ARC code 
it would traverse the T1 and T2 queues from LRU to MRU, write out n1 
and n2 dirty buffers respectively (n1+n2 configurable), then fsync all 
files that were involved in that, nap for a time depending on how far 
down the queues it got (so as to increase the write rate when running 
low on clean buffers), and do it all over again.
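
In pseudo-C that loop would look about like this (the queue and helper 
names are invented for illustration, this is not the actual 
experimental code):

    typedef struct BufDesc
    {
        struct BufDesc *next;       /* next buffer towards the MRU end */
        int             dirty;
    } BufDesc;

    extern BufDesc *t1_lru;         /* LRU end of the ARC T1 queue */
    extern BufDesc *t2_lru;         /* LRU end of the ARC T2 queue */
    extern int      n1, n2;         /* configurable per-round write quotas */

    extern void     write_buffer(BufDesc *buf);     /* plain write(), no O_SYNC */
    extern void     fsync_written_files(void);      /* fsync files touched above */
    extern void     nap(int depth);                 /* shorter nap if depth is small */

    /*
     * Write up to "quota" dirty buffers starting at the LRU end and
     * return how far down the queue we had to go to find them.
     */
    static int
    sweep_queue(BufDesc *lru, int quota)
    {
        int      written = 0;
        int      depth = 0;
        BufDesc *buf;

        for (buf = lru; buf != NULL && written < quota; buf = buf->next)
        {
            depth++;
            if (buf->dirty)
            {
                write_buffer(buf);
                buf->dirty = 0;
                written++;
            }
        }
        return depth;
    }

    void
    background_writer_main(void)
    {
        for (;;)
        {
            int depth = sweep_queue(t1_lru, n1) + sweep_queue(t2_lru, n2);

            fsync_written_files();  /* flush everything written this round */
            nap(depth);             /* low on clean buffers => small depth
                                     * => short nap => higher write rate */
        }
    }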

That way, everyone else doing a write must issue an fsync too, because 
it is not guaranteed that the fsync of one process flushes the writes 
of another. But as you said, if that is a relatively rare operation for 
a regular backend, it won't hurt.
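
So a backend that has to write a buffer itself would do roughly this 
(again just a sketch with a made-up name, and fd handling simplified to 
a plain descriptor instead of going through the virtual file descriptor 
pool):

    #include <unistd.h>

    void
    backend_write_buffer(int fd, const char *page, off_t offset)
    {
        pwrite(fd, page, 8192, offset);
        fsync(fd);          /* flush through our own descriptor; the
                             * background writer's fsync is not
                             * guaranteed to cover this write */
    }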


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


