Thread: Background LRU Writer/free list
I'm mostly done with my review of the "Automatic adjustment of bgwriter_lru_maxpages" patch. In addition to issues already brought up with that code, there are some small things that need to be done to merge it with the recent pg_stat_bgwriter patch, and I have some concerns about its unbounded scanning of the buffer pool; I'll write that up in more detail or just submit an improved patch as I get time this week. But there's a fundamental question that has been bugging me, and I think it impacts the direction that code should take. Unless I'm missing something in my reading, buffers written out by the LRU writer aren't ever put onto the free list. I assume this is to stop from prematurely removing buffers that contain useful data. In cases where a substantial percentage of the buffer cache is dirty, the LRU writer has to scan a significant portion of the pool looking for one of the rare clean buffers, then write it out. When a client goes to grab a free buffer afterward, it has to scan the same section of the pool to find the now clean buffer, which seems redundant. With the new patch, the LRU writer is fairly well bounded in that it doesn't write out more than it thinks it will need; you shouldn't get into a situation where many more pages are written than will be used in the near future. Given that mindset, shouldn't pages the LRU scan writes just get moved onto the free list? -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wed, Apr 18, 2007 at 09:09:11AM -0400, Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of
> bgwriter_lru_maxpages" patch. In addition to issues already brought up
> with that code, there are some small things that need to be done to merge
> it with the recent pg_stat_bgwriter patch, and I have some concerns about
> its unbounded scanning of the buffer pool; I'll write that up in more
> detail or just submit an improved patch as I get time this week.
>
> But there's a fundamental question that has been bugging me, and I think
> it impacts the direction that code should take. Unless I'm missing
> something in my reading, buffers written out by the LRU writer aren't ever
> put onto the free list. I assume this is to stop from prematurely
> removing buffers that contain useful data. In cases where a substantial
> percentage of the buffer cache is dirty, the LRU writer has to scan a
> significant portion of the pool looking for one of the rare clean buffers,
> then write it out. When a client goes to grab a free buffer afterward, it
> has to scan the same section of the pool to find the now clean buffer,
> which seems redundant.
>
> With the new patch, the LRU writer is fairly well bounded in that it
> doesn't write out more than it thinks it will need; you shouldn't get into
> a situation where many more pages are written than will be used in the
> near future. Given that mindset, shouldn't pages the LRU scan writes just
> get moved onto the free list?

I've wondered the same thing myself. If we're worried about freeing pages that we might want back, we could change the code so that ReadBuffer would also look at the free list if it couldn't find a page before going to the OS for it.

So if you make this change, will BgBufferSync start incrementing StrategyControl->nextVictimBuffer and decrementing buf->usage_count like StrategyGetBuffer does now?

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Greg Smith" <gsmith@gregsmith.com> writes:
> I'm mostly done with my review of the "Automatic adjustment of
> bgwriter_lru_maxpages" patch. In addition to issues already brought up with
> that code, there are some small things that need to be done to merge it with
> the recent pg_stat_bgwriter patch, and I have some concerns about its unbounded
> scanning of the buffer pool; I'll write that up in more detail or just submit
> an improved patch as I get time this week.

I had a thought on this. Instead of sleeping for a constant amount of time and then estimating the number of pages needed for that constant amount of time, perhaps what bgwriter should be doing is sleeping for a variable amount of time and estimating the length of time it needs to sleep to arrive at a constant number of pages being needed.

The reason I think this may be better is that "what percentage of the shared buffers the bgwriter allows to get old between wakeups" seems more likely to be a universal constant that people won't have to adjust than "fixed time interval between bgwriter cleanup operations".

Just a thought.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
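[Editorial note: the variable-sleep idea can be sketched as below. This is an illustration of the proposal only; `target_pages` and `pages_per_sec` are invented names, not actual GUCs or bufmgr variables.]

```c
/* Sketch of the variable-sleep idea: pick the delay so that a roughly
 * constant number of pages is expected to be needed per wakeup.
 * pages_per_sec would come from the recent allocation history. */
static int
compute_sleep_ms(double pages_per_sec, int target_pages,
                 int min_ms, int max_ms)
{
    if (pages_per_sec <= 0.0)
        return max_ms;          /* idle system: sleep the longest allowed */

    /* time, in ms, until target_pages are expected to be consumed */
    double ms = 1000.0 * target_pages / pages_per_sec;

    if (ms < min_ms)
        ms = min_ms;            /* clamp to the timer's resolution floor */
    if (ms > max_ms)
        ms = max_ms;
    return (int) ms;
}
```

A busy system (high pages/s) drives the computed sleep toward the floor, which is where the granularity concern raised later in the thread comes in.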
Greg Smith <gsmith@gregsmith.com> writes:
> With the new patch, the LRU writer is fairly well bounded in that it
> doesn't write out more than it thinks it will need; you shouldn't get into
> a situation where many more pages are written than will be used in the
> near future. Given that mindset, shouldn't pages the LRU scan writes just
> get moved onto the free list?

This just seems like a really bad idea: throwing away data we might want. Furthermore, if the page was dirty, then it's probably been accessed more recently than adjacent pages that are clean, so preferentially zapping just-written pages seems backwards.

			regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes:
> I had a thought on this. Instead of sleeping for a constant amount of time and
> then estimating the number of pages needed for that constant amount of time
> perhaps what bgwriter should be doing is sleeping for a variable amount of
> time and estimating the length of time it needs to sleep to arrive at a
> constant number of pages being needed.

That's an interesting idea, but a possible problem with it is that we can't vary the granularity of a sleep time as finely as we can vary the number of buffers processed per iteration. Assuming that the system's tick rate is the typical 100Hz, we have only 10ms resolution on sleep times.

> The reason I think this may be better is that "what percentage of the shared
> buffers the bgwriter allows to get old between wakeups" seems more likely to
> be a universal constant that people won't have to adjust than "fixed time
> interval between bgwriter cleanup operations".

Why? What you're really trying to determine, I think, is the I/O load imposed by the bgwriter, and pages-per-second seems a pretty natural way to think about that; percentage of shared buffers not so much.

			regards, tom lane
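[Editorial note: Tom's granularity point as arithmetic. With a 100 Hz tick, a requested sleep effectively rounds up to the next 10 ms boundary, so a "sleep 3 ms" actually sleeps 10 ms; varying pages-per-pass has no such floor. A minimal sketch, with invented names:]

```c
/* Effective sleep under a coarse scheduler tick: requests round up
 * to a whole number of ticks.  tick_ms == 10 models a 100 Hz kernel. */
static int
effective_sleep_ms(int requested_ms, int tick_ms)
{
    if (requested_ms <= 0)
        return 0;
    /* round up to the next tick boundary */
    return ((requested_ms + tick_ms - 1) / tick_ms) * tick_ms;
}
```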
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Why? What you're really trying to determine, I think, is the I/O load
> imposed by the bgwriter, and pages-per-second seems a pretty natural
> way to think about that; percentage of shared buffers not so much.

What I'm saying is that pages/s will vary from system to system. Busier systems will have higher I/O rates, so the DBA on a system with a higher rate will want to adjust the bgwriter sleep time lower than the DBA on a system where bgwriter isn't doing much work.

In particular I'm worried about what happens on a very busy CPU-bound system where adjusting the sleep times would result in it deciding to not sleep at all. On such a system sleeping for even 10ms might be too long. But we probably don't want to make the default even as low as 10ms.

Anyways, if we have a working patch that works the other way around we could experiment with that and see if there are actual situations where sleeping for 0ms is necessary. Perhaps a mixture of the two approaches will be necessary anyways because of the granularity issue.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On Wed, 18 Apr 2007, Tom Lane wrote:
> Furthermore, if the page was dirty, then it's probably been accessed
> more recently than adjacent pages that are clean, so preferentially
> zapping just-written pages seems backwards.

The LRU background writer only writes out pages that have a usage_count of 0, so they can't have been accessed too recently. Assuming the buffer allocation rate continues its historical trend, these are the pages that are going to be written out and then allocated for something new one way or another in the next interval; the content is expected to be lost shortly no matter what.

As for preferring dirty pages over clean ones, on a re-read my question wasn't as clear as I wanted it to be. I think that clean pages near the strategy point should also be moved to the free list by the background writer. You know clients are expected to require x buffers in the next y ms based on the history of the server (the new piece of information provided by the patch in the queue), and the LRU background writer is working in advance to make them available. If you're doing all that, doesn't it make sense to finish the job by putting the pages on the free list, where the clients can grab them without running their own scan over the buffer cache?

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
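[Editorial note: Greg's refined proposal, as a toy sketch. When the writer's sweep finds or produces a clean, zero-usage buffer, it would push the buffer's index onto a free list so a backend can pop it in O(1) rather than repeating the sweep. Structure and function names here are invented, not bufmgr's.]

```c
#include <stdbool.h>

#define NBUFFERS 8

typedef struct
{
    bool dirty;
    int  usage_count;
} ToyBuffer;

typedef struct
{
    int indexes[NBUFFERS];  /* stack of reusable buffer indexes */
    int nfree;
} ToyFreeList;

/* LRU-writer pass that also populates the free list: any buffer with
 * usage_count == 0 is written if dirty, then remembered as reusable. */
static void
lru_writer_pass(ToyBuffer *pool, ToyFreeList *fl)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        if (pool[i].usage_count != 0)
            continue;               /* recently used: leave it alone */
        if (pool[i].dirty)
            pool[i].dirty = false;  /* "write" it out */
        fl->indexes[fl->nfree++] = i;   /* clean and cold: free-listable */
    }
}

/* Backend allocation becomes a pop instead of a pool sweep. */
static int
backend_get_free(ToyFreeList *fl)
{
    return (fl->nfree > 0) ? fl->indexes[--fl->nfree] : -1;
}
```

Tom's objection still applies unchanged: a free-listed buffer's contents are gone for reuse purposes, whereas a merely cleaned buffer could still satisfy a hit.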
On Wed, 18 Apr 2007, Jim C. Nasby wrote:
> So if you make this change will BgBufferSync start incrementing
> StrategyControl->nextVictimBuffer and decrementing buf->usage_count like
> StrategyGetBuffer does now?

Something will need to keep advancing nextVictimBuffer. I hadn't really finished the implementation yet; I just wanted to get an idea whether this was even feasible, or whether there was some larger issue that made the whole idea moot.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wed, 18 Apr 2007, Gregory Stark wrote:
> In particular I'm worried about what happens on a very busy cpu-bound
> system where adjusting the sleep times would result in it deciding to
> not sleep at all. On such a system sleeping for even 10ms might be too
> long... Anyways, if we have a working patch that works the other way
> around we could experiment with that and see if there are actual
> situations where sleeping for 0ms is necessary.

I've been waiting for 8.3 to settle down before packaging the prototype auto-tuning background writer concept I'm working on (you can peek at the code at http://www.westnet.com/~gsmith/content/postgresql/bufmgr.c ), which already implements some of the ideas you're talking about in your messages today. I estimate how much of the buffer pool is dirty, use that to compute an expected I/O rate, and try to adjust parameters to meet a quality-of-service guarantee for how often the entire buffer pool is scanned. This is one of those problems that gets more difficult the more you dig into it; with all that done I still feel like I'm only halfway finished, and several parts worked radically differently in reality than I expected them to.

If you're allowing the background writer to write 1000 pages at a clip, that's 8MB each interval. Doing that every 200ms makes for an I/O rate of 40MB/s. In a system that cares about data integrity, you'll exceed the ability of the WAL to sustain page writes (which limits how fast you can dirty pages) long before the interval approaches 0ms. What I do in my code is set the interval to 200ms, compute what the maximum pages to write must be, and if it's >1000 then I reduce the interval. I've tested dumping into a fairly fast disk array with tons of cache, and I've never been able to get useful throughput below an 80ms interval; the OS just clamps down and makes you wait for I/O instead, regardless of how little you intended to sleep. Eventually, it's got to hit disk, and you can only buffer for so long before that starts to slow you down.

Anyway, this is a tangent discussion. The LRU patch that's in the queue doesn't really care if it runs with a short interval or a long one, because it automatically scales how much work it does according to how much time passed. I think that may only be a bit of tweaking away from a solid solution. Tuning the all scan, which is what you're talking about when you speak in terms of the statistics about the overall buffer pool, is a much harder job.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
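[Editorial note: the numbers and the clamping heuristic Greg describes, as arithmetic. 1000 pages of 8KB every 200ms is about 40MB/s; his code holds the pass size at 1000 pages and shortens the interval instead when demand exceeds that. The function and constant names below are invented to match the text, not taken from his bufmgr.c.]

```c
#define PAGE_BYTES       8192   /* default PostgreSQL block size */
#define MAX_PAGES_PASS   1000
#define BASE_INTERVAL_MS 200

/* Sustained write rate in bytes/sec for a given batch size and interval.
 * 1000 pages * 8 KB every 200 ms works out to roughly 40 MB/s. */
static double
write_rate_bytes_per_sec(int pages_per_pass, int interval_ms)
{
    return (double) pages_per_pass * PAGE_BYTES * 1000.0 / interval_ms;
}

/* The clamping heuristic: if demand would need more than MAX_PAGES_PASS
 * pages in the base interval, shorten the interval rather than grow
 * the batch. */
static int
clamp_interval_ms(double pages_per_sec)
{
    double pages = pages_per_sec * BASE_INTERVAL_MS / 1000.0;

    if (pages <= MAX_PAGES_PASS)
        return BASE_INTERVAL_MS;
    return (int) (1000.0 * MAX_PAGES_PASS / pages_per_sec);
}
```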
This has been saved for the 8.4 release:

	http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of
> bgwriter_lru_maxpages" patch. In addition to issues already brought up
> with that code, there are some small things that need to be done to merge
> it with the recent pg_stat_bgwriter patch, and I have some concerns about
> its unbounded scanning of the buffer pool; I'll write that up in more
> detail or just submit an improved patch as I get time this week.
>
> But there's a fundamental question that has been bugging me, and I think
> it impacts the direction that code should take. Unless I'm missing
> something in my reading, buffers written out by the LRU writer aren't ever
> put onto the free list. I assume this is to stop from prematurely
> removing buffers that contain useful data. In cases where a substantial
> percentage of the buffer cache is dirty, the LRU writer has to scan a
> significant portion of the pool looking for one of the rare clean buffers,
> then write it out. When a client goes to grab a free buffer afterward, it
> has to scan the same section of the pool to find the now clean buffer,
> which seems redundant.
>
> With the new patch, the LRU writer is fairly well bounded in that it
> doesn't write out more than it thinks it will need; you shouldn't get into
> a situation where many more pages are written than will be used in the
> near future. Given that mindset, shouldn't pages the LRU scan writes just
> get moved onto the free list?
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Added to TODO:

* Consider adding buffers the BGW finds reusable to the free list

	http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php

* Automatically tune bgwriter_delay based on activity rather than using a fixed interval

	http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php

---------------------------------------------------------------------------

Greg Smith wrote:
> I'm mostly done with my review of the "Automatic adjustment of
> bgwriter_lru_maxpages" patch. In addition to issues already brought up
> with that code, there are some small things that need to be done to merge
> it with the recent pg_stat_bgwriter patch, and I have some concerns about
> its unbounded scanning of the buffer pool; I'll write that up in more
> detail or just submit an improved patch as I get time this week.
>
> But there's a fundamental question that has been bugging me, and I think
> it impacts the direction that code should take. Unless I'm missing
> something in my reading, buffers written out by the LRU writer aren't ever
> put onto the free list. I assume this is to stop from prematurely
> removing buffers that contain useful data. In cases where a substantial
> percentage of the buffer cache is dirty, the LRU writer has to scan a
> significant portion of the pool looking for one of the rare clean buffers,
> then write it out. When a client goes to grab a free buffer afterward, it
> has to scan the same section of the pool to find the now clean buffer,
> which seems redundant.
>
> With the new patch, the LRU writer is fairly well bounded in that it
> doesn't write out more than it thinks it will need; you shouldn't get into
> a situation where many more pages are written than will be used in the
> near future. Given that mindset, shouldn't pages the LRU scan writes just
> get moved onto the free list?
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +