Thread: LRU
This patch replaces ARC with LRU in current sources. It is intended as a short-term measure to replace ARC; in the longer run, we should probably look at implementing a more effective replacement policy that is not patent encumbered. Notes: - WIP; it has survived a couple hundred thousand pgbench transactions and repeated regression tests, but I still want to do another code review or two. - It doesn't implement sequential scan activity hints, so it probably suffers from sequential flooding. This may be something we'll want to fix in the 8.0 timeframe, I'm not sure. - The LRU list is maintained via links in the BufferDesc structure, as was the case in 7.4 and earlier. In general the patch ought to be pretty similar to the 7.4 code, although I've taken the opportunity to (IMHO) simplify some things (the freelist is no longer circular, for instance). - One major behavioral change effects the bgwriter. In order to return pages in LRU order, we can _only_ scan the free list (since we don't bother to maintain a list of pinned buffers sorted by the recency of last access). I think this is actually a good idea: I'm not sure why the bgwriter should be writing out pinned buffers anyway, since it increases the likelihood that it will interfere with normal system activity, and a pinned buffer is pretty likely to be re-dirtied later on. So the bgwriter has been changed to only examine unpinned buffers. As a result of that, there is no easy way to say "flush at most n% of the dirty buffers", because the bgwriter no longer considers all the dirty buffers. So I've removed bgwriter_percent for now. Advice on the best way to resolve this situation would be welcome -- should we keep bgwriter_percent in one form or another, redefine bgwriter_maxpages according to the December discussion, or ...? - This patch has a lot of assertions -- I'll disable the more expensive and unnecessary ones once the patch is closer to being finished. - Added BufferGetBufferDescriptor(); I'm not sure if it is notationally cleaner to use this rather than manually looking up entries in the BufferDescriptor array. Also, at present BufferGetBufferDescriptor() doesn't handle local buffers; to make it do that, we'd need to doubly-evaluate the macro argument. Comments? -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > In other words, you might be able to somewhat reduce the size of the > patch by, say, not renaming some exported functions in freelist.c, but I > don't see that that will have a significant effect upon the complexity > of the patch or the risk that it might cause regressions. Well, the thing that's bothering me is that you are undoing a number of changes that we'll probably just have to redo later; with consequently *two* chances to introduce bugs. Case in point is moving the responsibility for this: + /* + * Clear the buffer's tag. This doesn't matter for the hash table, + * since the buffer is already removed from it, but it ensures that + * sequential searches through the buffer table won't think the buffer + * is still valid for its old page. + */ + CLEAR_BUFFERTAG(buf->tag); out of freelist.c and putting it back in bufmgr.c. Jan actually introduced a bug in his original ARC patch by deleting this code from BufTableDelete and forgetting to put equivalent defenses into freelist.c. I've got little confidence that this patch doesn't create some similar problem, and none that we won't make the same mistake over again when we re-revert the division of labor. In short I'd rather try to minimize the amount of API churn that bufmgr.c sees in this patch, because we'll probably just be changing it back later. If the problem is that the current freelist.c API exposes too much about the ARC implementation, the answer to that is not to go back to exposing the LRU-list implementation; it's to generalize the API. regards, tom lane
Neil Conway <neilc@samurai.com> writes: > This patch replaces ARC with LRU in current sources. I'm more than slightly uncomfortable with the size of this patch. I realize that what you were trying to do was revert the code to its 7.4 state, but is that the long-run direction we want to pursue? I think it would be better and safer to try to localize the changes into freelist.c. regards, tom lane
On Mon, 2005-01-24 at 17:39 -0500, Tom Lane wrote: > I'm more than slightly uncomfortable with the size of this patch. [...] > I think it would be better and safer to try to localize the changes > into freelist.c. IMHO that is basically what the patch does; the changes to other files are either cosmetic or are required to change the replacement strategy. More specifically, diffstat shows: doc/src/sgml/runtime.sgml | 48 - src/backend/catalog/index.c | 2 src/backend/commands/dbcommands.c | 4 src/backend/commands/vacuum.c | 27 src/backend/postmaster/bgwriter.c | 3 src/backend/storage/buffer/buf_init.c | 55 ! src/backend/storage/buffer/buf_table.c | 67 ! src/backend/storage/buffer/bufmgr.c | 177 -!! src/backend/storage/buffer/freelist.c | 1031 !!!!!!!!!!!!!!!!!!!!!!!!! src/backend/utils/misc/guc.c | 19 src/backend/utils/misc/postgresql.conf.sample | 1 src/include/postmaster/bgwriter.h | 1 src/include/storage/buf_internals.h | 85 !! src/include/storage/bufmgr.h | 2 14 files changed, 56 insertions(+), 139 deletions(-), 1327 modifications(!) A summary of the changes made to each file: runtime.sgml: doc updates index.c: trivial API change (BufferSync) dbcommands.c: trivial API change (BufferSync) vacuum.c: trivial API change (StrategyVacuumHint) bgwriter.c: trivial API change (BufferSync), remove bgwriter_percent buf_init.c: changes required for LRU buf_table.c: changes required for LRU (or more properly, reverting some modifications to the buf_table stuff that was part of the ARC patch: the patch should be very close to the buf_table in 7.4) bufmgr.c: move PinBuffer() and UnpinBuffer() to freelist.c, replace "StrategyXXX" calls with similar calls to BufTableXXX and so on. Some BufferSync API changes. Most of these changes are pretty straightforward. freelist.c: obvious :) guc.c: remove bgwriter_percent and debug_shared_buffers postgresql.conf.sample: remove bgwriter_percent bgwriter.h: remove bgwriter_percent buf_internals.h: changes necessary for LRU bufmgr.h: BufferSync API change In other words, you might be able to somewhat reduce the size of the patch by, say, not renaming some exported functions in freelist.c, but I don't see that that will have a significant effect upon the complexity of the patch or the risk that it might cause regressions. If you have specific suggestions about how to reduce the scope of the patch I'm all ears, but I don't think the size of the patch is necessarily the most reliable metric for the probability it will result in a regression. -Neil
On Mon, 2005-01-24 at 18:43 -0500, Tom Lane wrote: > Well, the thing that's bothering me is that you are undoing a number of > changes that we'll probably just have to redo later; with consequently > *two* chances to introduce bugs. I'm not sure if we should apply this patch to both HEAD and REL8_0_STABLE, or just the latter. I would guess there is no IP reason to remove ARC from HEAD as well; while it might introduce complications in backporting changes from HEAD, those same complications were present through most of the 8.0 cycle (i.e. HEAD used a different replacement policy than REL7_4_STABLE did). > If the problem is that the current freelist.c API exposes too much about > the ARC implementation, the answer to that is not to go back to exposing > the LRU-list implementation; it's to generalize the API. That would be one approach, yeah. My goal was to keep the code simple and reasonably similar to the 7.4 codebase (which has gotten a lot more testing than the ARC code, in addition to the fact that the ARC APIs aren't really appropriate for an LRU implementation). Worrying about the right freelist.c APIs seemed a little beside the point -- I didn't want to make _any_ assumptions about how the code is going to look in 8.1, as it may be fundamentally different. -Neil