Re: Experimental patch for inter-page delay in VACUUM - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: Experimental patch for inter-page delay in VACUUM
Msg-id 3FAF96BC.9050506@Yahoo.com
In response to Re: Experimental patch for inter-page delay in VACUUM  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Experimental patch for inter-page delay in VACUUM  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian wrote:
> I would be interested to know if you have the background write process
> writing old dirty buffers to kernel buffers continually if the sync()
> load is diminished.  What this does is to push more dirty buffers into
> the kernel cache in hopes the OS will write those buffers on its own
> before the checkpoint does its write/sync work.  This might allow us to
> reduce sync() load while preventing the need for O_SYNC/fsync().

I tried that first. Linux 2.4 does not do that unless you tell it to by 
reducing the dirty data block aging time with update(8). So you have to 
force it to use the write bandwidth in the meantime, and for that you 
have to call sync() or fsync() on something.
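
To illustrate what "call fsync() on something" means at the system-call level, here is a minimal sketch. It is written in Python only for brevity and is not code from the patch; write_and_force, BLOCK_SIZE and the file path are made-up names for the example.

```python
import os

BLOCK_SIZE = 8192  # assumed page size, for illustration only


def write_and_force(path, blocks):
    """Write blocks to path, then fsync() so the kernel cannot defer them."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    written = 0
    try:
        for block in blocks:
            # write() normally just copies the data into the kernel cache.
            written += os.write(fd, block)
        # fsync() returns only after this file's dirty data has been
        # handed to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Unlike sync(), this only forces out the one file, which is the kind of selectivity one would want here.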

Maybe O_SYNC is not as bad an option as it seems. In my patch, the 
checkpointer flushes the buffers in LRU order, meaning it flushes the 
least recently used ones first. This has the side effect that buffers 
returned for replacement (on a cache miss, when the backend needs to 
read the block) are most likely to be flushed/clean. So it reduces the 
write load of backends and thus the probability that a backend is ever 
blocked waiting on an O_SYNC'd write().
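
The effect can be shown with a toy model. This is only a sketch under simplifying assumptions (a single LRU list, no pinning, no ARC queues); BufferPool, access and flush_lru are hypothetical names and not taken from the patch.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool; the OrderedDict runs from LRU (first) to MRU (last)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.buffers = OrderedDict()  # block number -> dirty flag

    def access(self, block, modify=False):
        """Backend touches a block.

        Returns True if the backend had to write out a dirty victim
        before it could reuse the buffer.
        """
        backend_wrote = False
        if block in self.buffers:
            dirty = self.buffers.pop(block) or modify
        else:
            dirty = modify
            if len(self.buffers) >= self.nbuffers:
                # Cache miss with a full pool: replace the LRU buffer.
                _victim, victim_dirty = self.buffers.popitem(last=False)
                backend_wrote = victim_dirty
        self.buffers[block] = dirty  # (re)insert at the MRU end
        return backend_wrote

    def flush_lru(self, n):
        """Checkpointer cleans up to n dirty buffers, LRU end first."""
        flushed = 0
        for block, dirty in list(self.buffers.items()):
            if flushed >= n:
                break
            if dirty:
                self.buffers[block] = False  # position in the list is kept
                flushed += 1
        return flushed
```

With a pool of four dirty buffers, flush_lru(2) cleans the two oldest ones, so the next two cache misses find a clean victim and the backend does not have to write.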

I will add some counters and gather some statistics on how often the 
backends call write() compared to the checkpointer.
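
The kind of counter meant here could look roughly like the following sketch. WriteStats and the labels "backend" and "checkpointer" are assumptions for the example; the real counters would live in the server's C code.

```python
from collections import Counter


class WriteStats:
    """Counts write() calls per caller type."""

    def __init__(self):
        self.calls = Counter()

    def record(self, who):
        # who is a label such as "backend" or "checkpointer"
        self.calls[who] += 1

    def backend_share(self):
        """Fraction of all recorded write() calls issued by backends."""
        total = sum(self.calls.values())
        return self.calls["backend"] / total if total else 0.0
```

A low backend share would indicate that the checkpointer is keeping the replacement candidates clean.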

> 
> Perhaps sync() is bad partly because the checkpoint runs through all the
> dirty shared buffers and writes them all to the kernel and then issues
> sync() almost guaranteeing a flood of writes to the disk.  This method
> would find fewer dirty buffers in the shared buffer cache, and therefore
> fewer kernel writes needed by sync().

I don't understand this. Which method, and how would it reduce the number 
of page buffers the backends modify?


Jan

> 
> ---------------------------------------------------------------------------
> 
> Jan Wieck wrote:
>> Tom Lane wrote:
>> 
>> > Jan Wieck <JanWieck@Yahoo.com> writes:
>> > 
>> >> How I can see the background writer operating is that he's keeping the 
>> >> buffers in the order of the LRU chain(s) clean, because those are the 
>> >> buffers that most likely get replaced soon. In my experimental ARC code 
>> >> it would traverse the T1 and T2 queues from LRU to MRU, write out n1 and 
>> >> n2 dirty buffers (n1+n2 configurable), then fsync all files that have 
>> >> been involved in that, nap depending on where he got down the queues (to 
>> >> increase the write rate when running low on clean buffers), and do it 
>> >> all over again.
>> > 
>> > You probably need one more knob here: how often to issue the fsyncs.
>> > I'm not convinced "once per outer loop" is a sufficient answer.
>> > Otherwise this is sounding pretty good.
>> 
>> This is definitely heading in the right direction.
>> 
>> I currently have a crude and ugly hacked system that does checkpoints 
>> every minute but stretches them out over the whole time. It writes out 
>> the dirty buffers in T1+T2 LRU order intermixed, stretches out the flush 
>> over the whole checkpoint interval and does sync()+usleep() every 32 
>> blocks (if it has time to do this).
>> 
>> This is clearly the wrong way to implement it, but ...
>> 
>> The same system has ARC and delayed vacuum. With normal, unmodified 
>> checkpoints every 300 seconds, the transaction response time for 
>> new_order still peaks at over 30 seconds (5 is already too much), so the 
>> system basically comes to a freeze during a checkpoint.
>> 
>> Now with this high-frequency sync()ing and checkpointing by the minute, 
>> the entire system load levels out really nicely. Basically it's constantly 
>> checkpointing. So maybe the thing we're looking for is to make the 
>> checkpoint process the background buffer writer process and let it 
>> checkpoint 'round the clock. Of course, with a bit more selectivity on 
>> what to fsync and not doing system wide sync() every 10-500 milliseconds :-)
>> 
>> 
>> Jan
>> 
>> -- 
>> #======================================================================#
>> # It's easier to get forgiveness for being wrong than for being right. #
>> # Let's break this rule - forgive me.                                  #
>> #================================================== JanWieck@Yahoo.com #
>> 
>> 
>> 
> 


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


