Re: Just-in-time Background Writer Patch+Test Results - Mailing list pgsql-hackers
| From | Kevin Grittner |
|---|---|
| Subject | Re: Just-in-time Background Writer Patch+Test Results |
| Date | |
| Msg-id | 46E052ED.EE98.0025.0@wicourts.gov |
| In response to | Re: Just-in-time Background Writer Patch+Test Results (Greg Smith <gsmith@gregsmith.com>) |
| Responses | Re: Just-in-time Background Writer Patch+Test Results; Re: Just-in-time Background Writer Patch+Test Results |
| List | pgsql-hackers |
>>> On Thu, Sep 6, 2007 at 11:27 AM, in message
<Pine.GSO.4.64.0709061121020.14491@westnet.com>, Greg Smith
<gsmith@gregsmith.com> wrote:

> On Thu, 6 Sep 2007, Kevin Grittner wrote:
>
> I have been staring carefully at your configuration recently, and I would
> wager that you could turn off the LRU writer altogether and still meet
> your requirements in 8.2.

I totally agree that it is of minor benefit compared to the all-writer, if
it even matters at all. I knew that when I chose the settings.

> Here's what you've got right now:
>
>> shared_buffers = 160MB (=20000 buffers)
>> bgwriter_lru_percent = 20.0
>> bgwriter_lru_maxpages = 200
>> bgwriter_all_percent = 10.0
>> bgwriter_all_maxpages = 600
>
> With the default delay of 200ms, this has the LRU-writer scanning the
> whole pool every 1 second,

Whoa! Apparently I've totally misread the documentation. I thought that
bgwriter_lru_percent was scanned from the LRU end each time; I would not
expect that it would ever get beyond the oldest 10%. I put that in just as
a guard to keep the backends from having to wait for the OS write. I've
always doubted whether it was helping, but "it wasn't broke"....

> while the all-writer scans every two
> seconds--assuming they don't hit the write limits. If some event were to
> dirty the whole pool in 200ms, it might take as much as 6.7 seconds to
> write everything out (20000 / 600 * 200 ms) via the all-scan.

Right. Since the file system didn't seem to be able to accept writes
faster than 800 PostgreSQL pages per second, and I wanted to leave a
LITTLE slack, I set that limit. We don't seem to hit it, as far as I can
tell. In fact, the output rate would be naturally fairly smooth, if not
for the "hold all dirty pages until the last possible moment, then write
them all to the OS and fsync" approach.

> There's a second low-level issue involved here. When a page becomes
> dirty, that implies it was also recently used, which means the LRU writer
> won't touch it. That page can't be written out by the LRU writer until an
> entire pass has been made over the shared_buffer pool while looking for
> buffers to allocate for new activity. When the allocation clock-sweep
> passes over the newly dirtied buffer again, its usage count will drop by
> one and it will no longer be considered recently used. At that point the
> LRU writer can write it out.

How low does the count have to go, or does it track the count when it
becomes dirty and look for a decrease?

> So unless there is other allocation activity
> going on, the scan_whole_pool_seconds mechanism will never provide the
> bound on time to scan and write everything you hope it will.

That may not be an issue for the environment where this has been a problem
for us -- the web hits are coming in at a pretty good rate 24/7. (We have
a couple dozen large companies scanning data through HTTP SOAP requests
all the time.) This should keep us reading new pages, which covers this,
yes?

> where the buffer cache was
> filled with mostly dirty buffers that couldn't be re-used

That would be the condition that would be the killer with a synchronous
checkpoint if the OS cache has already had some dirty pages trickled out.
If we can hit this condition in our web database, either the load
distributed checkpoint will save us, or we can't use 8.3. Period.

> The completely understandable line of thinking that led to your request
> here is one of my concerns with exposing scan_whole_pool_seconds as a
> tunable. It may suggest to people that if they set the number very low,
> it will assure all dirty buffers will be scanned and written within that
> time bound. That's certainly not the case; both the maxpages and the
> usage count information will actually drive the speed that mechanism
> plods through the buffer cache. It really isn't useful for scanning fast.

I'm not clear on the benefit of not writing the recently accessed dirty
pages when there are no less recently used dirty pages.
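[The clock-sweep interaction Greg describes can be sketched roughly as below. This is illustrative Python, not the actual bufmgr code; the structure and the assumption that a buffer becomes a write candidate only once its usage count decays to zero follow my reading of the thread, and the names are made up for the sketch.]

```python
# Simplified sketch of the clock-sweep / usage-count interaction
# described above. Not PostgreSQL source; names are illustrative.
# A freshly dirtied buffer was just accessed, so its usage count is
# nonzero; the sweep must decrement it to zero (one full pass per
# touch) before the LRU writer will consider writing it out.

class Buffer:
    def __init__(self):
        self.usage_count = 0
        self.dirty = False

def touch(buf, max_usage=5):
    """An access bumps the usage count (capped at a maximum)."""
    buf.usage_count = min(buf.usage_count + 1, max_usage)

def clock_sweep_candidates(pool, start=0):
    """One full pass: decrement nonzero counts, yield zero-count buffers."""
    n = len(pool)
    for i in range(n):
        buf = pool[(start + i) % n]
        if buf.usage_count > 0:
            buf.usage_count -= 1   # recently used: skip this pass
        else:
            yield buf              # candidate for write-out / reuse
```

In this model a buffer touched once is passed over on the first sweep (its count drops from 1 to 0) and only becomes a candidate on the second -- which is why, with no allocation activity driving the sweep, a dirtied buffer can sit unwritten indefinitely.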
I do trust the OS not to write them before they age out in that cache, and
the OS cache doesn't start writing dirty pages until they reach a certain
percentage of the cache space, so I'd just as soon let the OS know that
the MRU dirty pages are there, so it knows that it's time to start working
on the LRU pages in its cache.

-Kevin
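[A toy model of the threshold-triggered OS write-behind Kevin is relying on. Purely illustrative: the percentage threshold corresponds to knobs like Linux's vm.dirty_background_ratio, but the flushing policy sketched here is a simplification, not kernel code.]

```python
# Toy model of OS write-behind: nothing is flushed until dirty pages
# exceed a set percentage of the cache, after which the least recently
# used dirty pages are written first. Illustrative only -- real kernels
# (cf. Linux's vm.dirty_background_ratio) behave far more subtly.
def pages_to_flush(dirty_lru_first, cache_pages, threshold_pct=10):
    """Return the dirty pages (oldest first) above the threshold."""
    threshold = cache_pages * threshold_pct // 100
    excess = len(dirty_lru_first) - threshold
    return dirty_lru_first[:excess] if excess > 0 else []
```

Under this model, handing MRU dirty pages to the OS early doesn't force them out; it just raises the dirty total, which starts the clock on flushing the LRU dirty pages -- the behavior the paragraph above is counting on.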