Re: Background LRU Writer/free list - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Background LRU Writer/free list
Msg-id Pine.GSO.4.64.0704182304290.7075@westnet.com
In response to Re: Background LRU Writer/free list  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
On Wed, 18 Apr 2007, Gregory Stark wrote:

> In particular I'm worried about what happens on a very busy cpu-bound 
> system where adjusting the sleep times would result in it deciding to 
> not sleep at all. On such a system sleeping for even 10ms might be too 
> long... Anyways, if we have a working patch that works the other way 
> around we could experiment with that and see if there are actual 
> situations where sleeping for 0ms is necessary.

I've been waiting for 8.3 to settle down before packaging the prototype 
auto-tuning background writer concept I'm working on (you can peek at the 
code at http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c ), 
which already implements some of the ideas you're talking about in your 
messages today.  I estimate how much of the buffer pool is dirty, use that 
to compute an expected I/O rate, and try to adjust parameters to meet a 
quality of service guarantee for how often the entire buffer pool is 
scanned.  This is one of those problems that gets more difficult the more 
you dig into it; with all that done I still feel like I'm only halfway 
finished, and several parts worked radically differently in reality than I 
expected them to.

If you're allowing the background writer to write 1000 pages at a clip, 
that's 8MB each interval.  Doing that every 200ms makes for an I/O rate of 
40MB/s.  In a system that cares about data integrity, you'll exceed the 
ability of the WAL to sustain page writes (which limits how fast you can 
dirty pages) long before the interval approaches 0ms.  What I do in my 
code is set the interval to 200ms, compute what the maximum pages to write 
must be, and if it's >1000 then I reduce the interval.  I've tested 
dumping into a fairly fast disk array with tons of cache and I've never 
been able to get useful throughput below an 80ms interval; the OS just 
clamps down and makes you wait for I/O instead regardless of how little 
you intended to sleep.  Eventually, it's got to hit disk, and you can only 
buffer for so long before that starts to slow you down.
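
To make the arithmetic above concrete, here is a minimal sketch in Python. It is not taken from the bufmgr.c prototype; the function names, the pages_per_second input, and the rounding choices are assumptions made for illustration. It only reproduces the numbers quoted above (1000 pages of 8 kB every 200ms) and the "start at 200ms, shrink the interval if more than 1000 pages would be needed" rule.

```python
import math

PAGE_SIZE_BYTES = 8192  # default PostgreSQL block size (8 kB)


def io_rate_mb_per_s(pages_per_round, interval_ms, page_size=PAGE_SIZE_BYTES):
    """I/O rate implied by writing pages_per_round pages every interval_ms."""
    rounds_per_second = 1000.0 / interval_ms
    return pages_per_round * page_size * rounds_per_second / (1024 * 1024)


def plan_round(pages_per_second, base_interval_ms=200, max_pages=1000):
    """Return (interval_ms, pages_to_write) for one background writer round.

    pages_per_second is a hypothetical input: the write rate the writer
    is trying to sustain. Start from the base interval; if that would
    need more than max_pages per round, shorten the interval instead.
    """
    pages = pages_per_second * base_interval_ms / 1000.0
    if pages <= max_pages:
        return base_interval_ms, math.ceil(pages)
    # Too many pages for one round: keep the cap and sleep less.
    interval_ms = max_pages * 1000.0 / pages_per_second
    return interval_ms, max_pages


if __name__ == "__main__":
    print(io_rate_mb_per_s(1000, 200))  # roughly 39, i.e. the ~40MB/s above
    print(plan_round(2500))             # fits within the 200ms interval
    print(plan_round(10000))            # interval is reduced, pages capped
```

Note that this sketch puts no floor on the interval. The 80ms figure above is something I observed in testing, not a limit built into the calculation.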

Anyway, this is a tangential discussion.  The LRU patch that's in the queue 
doesn't really care if it runs with a short interval or a long one, 
because it automatically scales how much work it does according to how 
much time passed.  I think that may only be a bit of tweaking away from a 
solid solution.  Tuning the all scan, which is what you're talking about 
when you speak in terms of the statistics about the overall buffer pool, 
is a much harder job.
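
As an illustration of what "scales how much work it does according to how much time passed" can look like, here is a rough Python sketch. It is not the code from the LRU patch; the class name, the pages_per_second target, and the fractional carry-over are all assumptions chosen to show the idea that the total work over a second comes out the same whether the interval is short or long.

```python
class ScaledScanner:
    """Hypothetical helper: hands out work in proportion to elapsed time."""

    def __init__(self, pages_per_second):
        self.pages_per_second = pages_per_second  # assumed target scan rate
        self._carry = 0.0  # fractional pages owed from earlier rounds

    def pages_for(self, elapsed_ms):
        """Number of whole pages to scan after elapsed_ms since the last round."""
        owed = self._carry + self.pages_per_second * elapsed_ms / 1000.0
        whole = int(owed)
        self._carry = owed - whole  # keep the remainder for the next round
        return whole


if __name__ == "__main__":
    slow = ScaledScanner(500)
    fast = ScaledScanner(500)
    # Five 200ms rounds and twenty-five 40ms rounds both cover one second.
    print(sum(slow.pages_for(200) for _ in range(5)))
    print(sum(fast.pages_for(40) for _ in range(25)))
```

Carrying the fractional remainder forward keeps very short intervals from rounding the work down to zero on every round.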

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

